
Comprehensive analysis of the helix-X-helix motif in soluble proteins


  • PROTEINS: Structure, Function, and Bioinformatics

    Comprehensive analysis of the helix-X-helix motif in soluble proteins

    Julie Deville, Julien Rey, and Marie Chabbert*

    CNRS UMR 6214-INSERM U771, Université d'Angers, Faculté de Médecine, 3 rue Haute de Reculée, 49045 Angers, France

    INTRODUCTION

    Understanding the relationship between amino acid sequence

    and protein structure is of primary importance to develop meth-

    ods aimed at improving structure predictions and designing de

    novo proteins. Secondary structure (SS) is a crucial level in the

    hierarchical classification of protein structure. α-Helices and β-sheets, the major SS elements, allow a simple description of protein structures and are used for protein classification (e.g. the

    SCOP or the CATH databases1,2). These regular backbone struc-

    tures are linked by loops or turns.

    Prediction of the secondary structure of a protein from its

    sequence is a key step for structure prediction. The different pro-

    pensities of each amino acid for helical, strand, or random coil

    conformations have been acknowledged long ago3,4 and form

    the bases of numerous SS prediction programs.5–8 All the SS prediction programs rely on pattern recognition techniques combined with statistical analysis of known protein structures. The accuracy rate for a three-state prediction (helix, strand, and coil) is

    about 60% for the initial algorithms based on a single sequence

    analysis3,6,9 and rises to about 75% for current algorithms

    based on multiple sequence alignment.8,10–14 One limitation of

    these programs is that they rely on the analysis of the sequence

    properties of a window surrounding each individual amino acid,

    implying a poor precision. For example, Jnet predictions are

    based on a 17-residue-long window.11 Although α-helices are generally better predicted than β-strands or coils, it is difficult to identify their extremities correctly, in spite of strong capping

    sequence signals.15 Another caveat due to the window size is

    that short irregular elements joining similar SS elements can be

    missed. Two helices joined by a linker of only a few residues may be predicted to form a single long helix.

    The capability to develop bioinformatics tools able to predict

    the position and the structure of such kinks between two α-helices should be very valuable for both structure predictions and

    protein design. Development of these tools requires a compre-

    hensive analysis of these structures. The properties of two or

    three residue long linkers joining α-helices have been widely

    *Correspondence to: Dr. Marie Chabbert, CNRS UMR 6214-INSERM U771, Faculté de Médecine d'Angers, 3 rue Haute de Reculée, 49045 Angers, France.

    E-mail: [email protected]

    Received 25 April 2007; Revised 12 September 2007; Accepted 11 October 2007

    Published online 23 January 2008 in Wiley InterScience (www.interscience.wiley.com).

    DOI: 10.1002/prot.21879

    ABSTRACT

    α-Helices are the most common secondary structures found in globular proteins. In this report, we analyze the stereochemical and sequence properties of helix-X-helix (HXH) motifs in which two α-helices are linked by a single residue, in search of characteristic structures and sequence signals. The analysis is carried out on a database of 837 nonredundant HXH motifs. The kinks are characterized by the bend angle between the axes of the N-terminal and C-terminal helices and the wobble angle corresponding to the rotation of the C-terminal helix axis on the plane perpendicular to the N-terminal one. The phi-psi dihedral angles of the linker residue are clustered in six distinct areas of the Ramachandran plot: two areas are located in the additional allowed α region (α1 and α2), two areas are in the additional allowed β region (β1 and β2) and two areas have positive phi values (αL and βM). Each phi/psi region corresponds to characteristic bend and wobble angles and amino acid distributions. Bend angles can vary from 0° to 160°. Most wobble angles correspond to a counter-clockwise rotation of the C-terminal helix. Proline residues are rigorously excluded from the linker position X but have a high propensity at position X+1 of the β1 and β2 motifs (12 and 7, respectively) and at position X+3 of the α1 motifs (9). Glycine linkers are located either in the αL region (20%) or in the βM region (80%). This latter conformation is characterized by a marked bend angle (124° ± 18°) and a clockwise wobble. Among other amino acids, Asn is remarkable for its high propensity (>3) at the linker position of the α2, β1, and β2 motifs. Stabilization of HXH motifs by H-bonds between polar side chains of the linker and polar groups of the backbone is determined. A method based on position-specific scoring matrices is developed for conformational prediction. The accuracy of the predictions reaches 80% when the method is applied to proline-induced kinks or to kinks with bend angles in the 50°–100° range.

    Proteins 2008; 72:115–135. © 2008 Wiley-Liss, Inc.

    Key words: 3D data mining; protein structure; helix kink; proline; glycine.


  • investigated,16–18 but one-residue linkers have not yet received much attention. However, an analysis of the Protein

    Data Bank17 indicated that a structural motif consisting

    of two consecutive helices separated by one residue (the

    helix-X-helix or HXH motif) is almost as frequent as

    motifs with two helices joined by two or three residues.

    In this study, we thus focus on one residue linkers

    joining α-helices in soluble proteins. We analyze the stereochemical and sequence properties of HXH motifs in a

    nonredundant database that we have developed. We show

    that the relative orientation between the two helices is

    determined by a few possible dihedral conformations of

    the protein backbone at the linker position and that each

    dihedral conformation exhibits a distinct amino acid dis-

    tribution at and around the linker. We analyze H-bonds

    within the HXH motif to determine the interactions sta-

    bilizing each conformation. Finally, we develop a method

    based on position-specific scoring matrices for conforma-

    tional prediction.

    MATERIALS AND METHODS

    Terminology

    The helices are named as helix 1 (H1) or helix 2 (H2)

    depending upon their relative position in the protein pri-

    mary structure. Following the common nomenclature, the

    first i positions of an α-helix are called N1, N2, N3, ..., Ni, the first preceding position Ncap and the second preceding position N′, whereas the last j positions are Cj, ..., C3, C2, and C1, followed by Ccap and C′. The linker position is called X and corresponds both to the Ccap position of helix 1 (Ccap(H1)) and to the Ncap position of helix 2 (Ncap(H2)). The ith position upstream is called X−i and the jth position downstream is X+j. Position X−1 corresponds to C1(H1) and to N′(H2), whereas position X+1 corresponds to N1(H2) and to C′(H1).

    Data sets

    PDB_25 and PDB_90 refer to nonhomologous sets of

    protein structures selected from the 25% and the 90%

    threshold lists, compiled by Hobohm and Sander19 and

    accessible at http://bioinfo.tg.fh-giessen.de/pdbselect (March

    2006 release). PDB_25S and PDB_90S refer to subsets of

    PDB_25 and PDB_90, respectively, containing only soluble

    proteins whose structure was determined by crystallography

    with a resolution ≤ 2.5 Å and an R-factor ≤ 0.25. Helix definition was determined by DSSP.20

    We developed a database of HXH motifs from PDB_90,

    according to the procedure summarized in Figure 1. First,

    soluble proteins whose structure was determined by X-ray

    crystallography with a resolution of 2.5 Å and an R-factor

  • C5 to C2 of H1 and from N2 to N5 of H2) did not alter

    the results.

    The motif was characterized by the bend angle θb and the wobble angle θw. Definition of these angles allows a description of the relative orientation of the two helices in spherical coordinates. The bend angle θb defines the angle between the axes of the two helices [Fig. 2(a)]. It is measured as:

    $$\cos\theta_b = \vec{H}_1 \cdot \vec{H}_2$$

    A bend angle of 0° indicates that there is no change in the direction of the helix axes, whereas a bend angle of 180° indicates a total reversal in the direction of helix 2 (antiparallel helices).

    The wobble angle θw [Fig. 2(b)] corresponds to the rotation of the projection of H2 in a plane perpendicular to H1. The reference vector in this plane joins the helix center to the Cα atom of the C4 residue of H1. This vector corresponds to the bisector of the H1 C5-C4-C3 angle. It was chosen as it allows an intuitive description of the kink geometry. Left-handed kinks correspond to a positive θw whereas right-handed kinks correspond to a negative θw. For small bend angles, determination of the wobble angle could be difficult. In these cases, the wobble angle

    was verified by shifting the reference atoms by one or two

    residues and visual inspection. In particular, each kink

    with a negative wobble angle was carefully controlled by

    visual inspection and detailed structural analysis.
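
    The two angle definitions above can be illustrated with a minimal Python sketch. It assumes that the axes of H1 and H2 and the reference vector are already available as 3D vectors; the function names and the sign convention (positive for a counter-clockwise rotation about the H1 axis, following the right-hand rule) are assumptions made for this example and are not taken from the original procedure.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _unit(a):
    norm = math.sqrt(_dot(a, a))
    return tuple(x / norm for x in a)


def bend_angle(h1, h2):
    """Angle (degrees) between the two helix axes: cos(theta_b) = H1 . H2."""
    cosine = _dot(_unit(h1), _unit(h2))
    # Clamp to [-1, 1] to guard against rounding errors before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))


def wobble_angle(h1, h2, reference):
    """Signed angle (degrees) from the reference vector to the projection
    of H2, both projected onto the plane perpendicular to H1.
    Sign convention (assumed): right-hand rule about H1."""
    axis = _unit(h1)

    def project(v):
        along = _dot(v, axis)
        return tuple(x - along * a for x, a in zip(v, axis))

    p_h2 = project(h2)
    p_ref = project(reference)
    sine_term = _dot(axis, _cross(p_ref, p_h2))
    cosine_term = _dot(p_ref, p_h2)
    return math.degrees(math.atan2(sine_term, cosine_term))
```

    For very small bend angles the projection of H2 becomes nearly a null vector, so the wobble angle is numerically ill-defined, which is consistent with the difficulty noted in the text.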

    Search of reverse false positives

    The search for residues that are identified as helical by

    DSSP but have phi/psi values similar to those of the

    HXH linker (reverse false positives) was carried out on

    PDB_90S. We considered helices that were at least eleven residues long. We measured the backbone dihedral angles of each residue included from position N6 to position C6 along with the dihedral angles of the neighbor residues. Residues surrounded by at least one residue (from position −5 to −1 and from position +1 to +5) out of the α-helical range were removed from the data set, as previously done with the HXH motifs, to consider only helix distortions due to a single residue.

    The search for α2-like structures was carried out on PDB_25S. We first searched for residues located within α-helices whose dihedral angles summed to −90° ± 10°. Among the 2200 residues found, we selected those whose psi was larger than −20°. This led to a set of 59 structures used for visual inspection and further characterization of the properties of the residues surrounding the α2-like residue.
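
    The two-step selection of α2-like residues can be sketched as a simple filter on dihedral angles. This is only an illustration: the function name, the parameter names, and the example residues are hypothetical, and the thresholds are the ones stated in the paragraph above (phi + psi within 10° of −90°, psi larger than −20°).

```python
def is_alpha2_like(phi, psi, target_sum=-90.0, tolerance=10.0, psi_min=-20.0):
    """Return True when phi + psi lies within target_sum +/- tolerance
    and psi is larger than psi_min (all angles in degrees)."""
    return abs((phi + psi) - target_sum) <= tolerance and psi > psi_min


# Hypothetical (phi, psi) pairs of residues located within helices.
residues = [(-97.0, 8.0), (-58.0, -47.0), (-75.0, -20.0)]
selected = [angles for angles in residues if is_alpha2_like(*angles)]
```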

    Proline-containing helices

    The analysis was carried out on PDB_25S. We consid-

    ered helices with a minimal length of 11 residues that

    contained a proline located from position N5 to position

    C5. This led to a set of 85 structures, used for determina-

    tion of the sequence and of the dihedral angles of the

    residues surrounding the proline.

    Amino acid propensity

    The propensity Pak of amino acid a to occur at position

    k of a motif K was determined by the ratio of the number

    of amino acids a observed at this position k (nak) to the

    number of amino acids a expected (naexp) from the amino

    acid distribution in the entire data set:

    $$P_{ak} = \frac{n_{ak}}{n_{a}^{exp}}$$

    $$P_{ak} = \frac{n_{ak}/N_K}{n_a/N}$$

    where nak is the number of amino acids a at position k,

    NK is the number of motifs K, na is the total number of

    amino acids a in the full data set and N the total number

    of amino acids in the full data set. The amino acid distri-

    bution of PDB_25 was used as reference after verifying

    the absence of any significant change in the amino acid

    distributions of PDB_25, of PDB_90 and of the database

    of the proteins containing the HXH motifs.

    The number NK of the motifs analyzed varied from 37

    to 218 and the expected number of amino acids could be

    very low. To avoid biases due to small sample sizes, for

    each amino acid a at position k, we calculated a Z-score

    defined as:

    $$Z_{ak} = \frac{n_{ak} - n_{a}^{exp}}{\sigma_a}$$

    where σa, the standard deviation of the observed number of an amino acid a, can be approximated by the square

    Figure 2. Definitions of the bend angle θb and of the wobble angle θw. (a) and (b) correspond to two perpendicular views of a schematic HXH motif, parallel and perpendicular to the axis of H1, respectively. The Cα atoms of the linker X and of residue C4 of H1 are shown as light grey and dark grey spheres, respectively.


  • root of the expected value. This formula, initially pro-

    posed by Engel and DeGrado,17 was experimentally vali-

    dated by random trials in PDB_25. |Z| of 2.0 and 2.6 cor-

    respond to 95% and 99% confidence levels, respectively.
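
    As a worked illustration of the propensity and Z-score definitions, the following Python sketch computes both quantities from raw counts. The function and argument names are chosen for this example only, and the counts in the usage note are invented; the standard deviation is approximated by the square root of the expected count, as stated above.

```python
import math


def expected_count(N_K, n_a, N):
    """Expected number of amino acid a among N_K motifs, given n_a
    occurrences among N residues in the reference data set."""
    return N_K * n_a / N


def propensity(n_ak, N_K, n_a, N):
    """P_ak = (n_ak / N_K) / (n_a / N) = n_ak / n_a_exp."""
    return n_ak / expected_count(N_K, n_a, N)


def z_score(n_ak, N_K, n_a, N):
    """Z_ak = (n_ak - n_a_exp) / sigma_a, with sigma_a ~ sqrt(n_a_exp)."""
    n_exp = expected_count(N_K, n_a, N)
    return (n_ak - n_exp) / math.sqrt(n_exp)
```

    With invented counts of 30 observations among 200 motifs, for an amino acid representing 5000 of 100000 residues in the reference set, the expected count is 10, the propensity is 3.0 and the Z-score is about 6.3, well above the 2.6 threshold quoted for the 99% confidence level.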

    Hydrophobicity profiles

    The hydrophobicity index H at each position sur-

    rounding the reference residue X corresponds to the aver-

    age of the hydrophobicity index at this position for all

    the motifs in the subset considered. The Eisenberg consensus scale22 was used after verifying that different scales23,24 led to similar results. Heterogeneity between subsets was tested using χ² homogeneity tests. For this analysis, amino acids were grouped in three classes:

    hydrophobic (L, M, I, V, F, Y, W), polar (H, K, R, E, Q,

    D, N, S, T), and other (P, G, A, C).
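
    The grouping into three classes and the χ² statistic of a homogeneity test can be sketched as follows. The class membership is the one given above; the function names and the layout of the contingency table (one row per subset, one column per class) are assumptions made for the example, and the conversion of the statistic into a p-value is left out.

```python
HYDROPHOBIC = set("LMIVFYW")
POLAR = set("HKREQDNST")
OTHER = set("PGAC")


def class_counts(residues):
    """Count residues per class, in the order (hydrophobic, polar, other)."""
    counts = [0, 0, 0]
    for aa in residues:
        if aa in HYDROPHOBIC:
            counts[0] += 1
        elif aa in POLAR:
            counts[1] += 1
        elif aa in OTHER:
            counts[2] += 1
        else:
            raise ValueError(f"unknown amino acid: {aa!r}")
    return counts


def chi2_homogeneity(table):
    """Return (chi-square statistic, degrees of freedom) for a table
    with one row per subset and one column per amino acid class."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof
```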

    Solvent accessibility

    Solvent accessibility was calculated with the NACCESS

    program25 (http://wolf.bms.umist.ac.uk/naccess). The program calculates the atomic accessible surface when a 1.4 Å probe is rolled around the van der Waals surface of the protein, according to the Lee and Richards method.26 Residue accessibility corresponds to the sum of the atomic

    accessibilities.

    H-bond analysis

    Analysis of hydrogen bonds was carried out with the

    HBPLUS v3.0 program accessible at http://www.biochem.ucl.ac.uk/bsm/hbplus/home.html.27 Default parameters were used throughout the analysis. We analyzed main chain–main chain H-bonds involving the carbonyl oxygens of residues X−4 to X and the amide nitrogens of residues X to X+4, in search of disruption of the helical pattern and of alternative H-bonds. When several H-bonds were possible, the H-bond conserving the helical pattern was preferred. The only exception was for H-bonds typical of the Schellman motif, for which bifurcated H-bonds were taken into account.

    We analyzed side chain–main chain hydrogen bonds for the polar side chains of Asn, Asp, Gln, Glu, Ser, Thr, and His located at the linker position X. We considered hydrogen bonds involving the γ-hydroxyl oxygen of Ser/Thr residues as donor or acceptor, the δ- and ε-nitrogens of Asn, Gln, and His as donor, the δ- and ε-oxygens of Asp/Asn and Glu/Gln as acceptor. We searched for putative H-bond partners among polar atoms of the backbone at positions from X−4 to X+4. Hydrogen bonds linking a donor side chain atom at position X to an acceptor backbone oxygen at position X−i are denoted by O(X−i). Similarly, hydrogen bonds linking a donor backbone nitrogen at position X+i to an acceptor side chain atom at position X are denoted by N(X+i).

    Prediction of the linker conformation

    Position-specific scoring matrices, based on the relative

    amino acid weights, were developed to predict linker

    conformations. The score S(C) of a sequence a1...ai...an for the conformation C was given by:

    $$S(C) = \sum_i w_C(a_i)$$

    where ai is the amino acid at position i in the sequence

    considered and wC (ai) its relative weight at the same

    position i in the conformation C.
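
    The scoring rule can be illustrated with a minimal Python sketch. The matrix layout (one dictionary of weights per position), the toy weights, and the choice of a zero weight for residues absent from a matrix column are all assumptions made for this example; how the relative weights are derived is not shown here.

```python
def score(sequence, weights):
    """S(C) = sum over positions i of w_C(a_i).
    weights[i] maps an amino acid to its weight at position i;
    missing residues get a weight of 0.0 (an assumption of this sketch)."""
    return sum(weights[i].get(aa, 0.0) for i, aa in enumerate(sequence))


def predict(sequence, matrices):
    """Assign the sequence to the conformation giving the highest score."""
    return max(matrices, key=lambda conf: score(sequence, matrices[conf]))


# Toy two-position matrices with invented weights, for illustration only.
matrices = {
    "beta1": [{"N": 0.5, "G": -1.0}, {"P": 1.2, "A": 0.0}],
    "betaM": [{"N": -0.5, "G": 1.5}, {"P": -0.2, "A": 0.3}],
}
```

    With these toy weights, the sequence "NP" scores higher for the hypothetical beta1 matrix and "GA" scores higher for the betaM matrix.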

    A sequence was assigned to the conformation giving

    the highest score. The accuracy of the prediction was

    defined by the Q-score matrix which gives both the ratio

    Qxobs of the number of correctly predicted conformations

    x to the number of observed conformations x:

    $$Q_x^{obs} = \frac{M_{xx}}{\sum_y M_{xy}}$$

    and the ratio Qxpred of the number of correctly predicted

    conformations x to the number of predicted conforma-

    tions x:

    $$Q_x^{pred} = \frac{M_{xx}}{\sum_y M_{yx}}$$

    where Mxy is the number of sequences observed in con-

    formation x and predicted in conformation y.
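
    A short Python sketch of the two Q-scores, computed from a confusion matrix M in which M[x][y] counts the sequences observed in conformation x and predicted in conformation y. The function name, the nested-dictionary layout, and the toy counts are assumptions made for the example; undefined ratios (empty row or column) are returned as None.

```python
def q_scores(M):
    """Return (Q_obs, Q_pred) dictionaries keyed by conformation."""
    labels = list(M)
    q_obs, q_pred = {}, {}
    for x in labels:
        n_observed = sum(M[x][y] for y in labels)   # sequences observed in x
        n_predicted = sum(M[y][x] for y in labels)  # sequences predicted in x
        q_obs[x] = M[x][x] / n_observed if n_observed else None
        q_pred[x] = M[x][x] / n_predicted if n_predicted else None
    return q_obs, q_pred


# Invented counts for two conformations.
M = {
    "alpha1": {"alpha1": 8, "beta1": 2},
    "beta1": {"alpha1": 1, "beta1": 9},
}
q_obs, q_pred = q_scores(M)
```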

    To build and test the scoring matrices on different data sets and to estimate standard deviations of the Q-scores, we used a 10-fold cross-validation procedure, that is, 9/10th of the data was used to build the matrices and the remaining 1/10th of the data was used to test them.
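
    The partition underlying such a 10-fold cross-validation can be sketched as below. The function name, the shuffling with a fixed seed, and the interleaved assignment of items to folds are choices made for this illustration; the original partitioning scheme is not described in the text.

```python
import random


def k_fold_splits(n_items, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Interleaved slicing gives folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_splits(837))
```

    Each motif appears in exactly one test fold, so the Q-scores obtained on the ten test folds can be averaged and their standard deviation estimated across folds.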

    Molecular graphics

    PYMOL (DeLano Scientific LLC, San Francisco) was used for figure preparation and molecular graphics analysis. Representative structures were chosen after visual inspection. They correspond to: (α1) O-acetylserine sulfhydrylase from Thermotoga maritima (PDB code: 1O58, chain D)28; (α2) gamma-glutamyl phosphate reductase from Saccharomyces cerevisiae (PDB code: 1VLU, chain A) (unpublished); (β1) 28 kDa glutathione S-transferase from Schistosoma haematobium (PDB code: 1OE8, chain A)29; (β2) human glutathione synthetase (PDB code: 2HGS, chain A)30; (αL) S-adenosylmethionine-dependent methyltransferase from Thermotoga maritima (PDB code: 1M6Y, chain A)31; (βM) periplasmic iron-binding protein from Neisseria gonorrhoeae (PDB code: 1XC1, chain A).32 The structure with the typical translation observed for α2 motifs with negative psi corresponds to chorismate mutase from Saccharomyces cerevisiae (PDB code: 5CSM, chain A).33


  • RESULTS

    The helix-X-helix database

    Using the DSSP definition of SS elements, we analyzed

    the length distribution of short segments (up to three

    residues) joining two α-helices in a subset of PDB_25 containing 1200 protein chains of soluble proteins solved by X-ray crystallography at a resolution of 2.5 Å or better. The subset contained 8044 helices with a length of at least five residues. This length was chosen as it corresponds to the minimal length required to reliably determine α-helices with DSSP.15

    A linker length of zero residues corresponds to two contiguous helices without any residue between them. It is seldom observed (3 examples). On the other hand, the

    numbers of observations for one, two, or three residue

    linkers are very similar (160, 144, and 192 observations,

    respectively). This result is in agreement with a previous

    study17 using a different helix definition. The distribution of the bend angles θb between the two helices as a function of the linker length is shown in Figure 3. Kinks with one-residue linkers dominate the kink distribution when the bend angle is [...] 1000) was sufficient for quantitative analysis but data could be severely biased. To overcome these limitations,

    we developed a HXH database from PDB_90, according

    to the procedure described in Materials and Methods

    (see Fig. 1). This ensured a sequence identity ≤ 45% (5 residues out of 11) for the minimal eleven-residue-long

    motif. This identity rate was the best balance between the

    size of the sample and the redundancy. A filter based on

    the backbone dihedral angles of the five residues sur-

    rounding the linker residue X was added to remove false

    positives (structures with at least one residue in a nonhelical conformation from positions X−5 to X−1 and X+1 to X+5 after the DSSP-based selection process). Nonhelical dihedral angles were found for 2% of positions X+1, 1% of positions X−1, and about 1% of all the other positions. This led to the removal of 4% of the

    motifs, without altering the overall results. The resulting

    database is composed of 837 HXH motifs with associated

    protein structures. 47% of the sequence pairs of the min-

    imal HXH motifs have no sequence identity, whereas

    94% of them have a sequence identity ≤ 27% (3 residues out of 11), highlighting the sequence diversity of the

    database.
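
    The pairwise identity statistics quoted above can be obtained by counting identical positions between aligned motifs of equal length. The sketch below is only an illustration: the function names are hypothetical, the example motifs are invented, and the motifs are assumed to be ungapped strings of eleven residues.

```python
from itertools import combinations


def identical_positions(a, b):
    """Number of positions at which two equal-length motifs are identical."""
    if len(a) != len(b):
        raise ValueError("motifs must have the same length")
    return sum(x == y for x, y in zip(a, b))


def identity_distribution(motifs):
    """Fraction of motif pairs sharing exactly k identical positions."""
    counts = {}
    n_pairs = 0
    for a, b in combinations(motifs, 2):
        k = identical_positions(a, b)
        counts[k] = counts.get(k, 0) + 1
        n_pairs += 1
    return {k: c / n_pairs for k, c in sorted(counts.items())}


# Invented eleven-residue motifs, for illustration only.
motifs = ["AAAAAAAAAAA", "AAAAABBBBBB", "CCCCCCCCCCC"]
distribution = identity_distribution(motifs)
```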

    Backbone geometry

    Using the HXH database, we analyzed the geometry of

    the backbone at the linker position X. The phi/psi dihe-

    dral angles of this residue cluster in six distinct areas of

    the Ramachandran plot, indicating six possible confor-

    mations of the protein backbone [Fig. 4(a)]. None of

    these conformations is located in the most favored

    regions of the Ramachandran plot. Two of these conformations, α1 and α2, correspond to distortions of the α conformation and are in the additional allowed region surrounding the most favored α-helix region. Together, they represent about 50% of the motifs (212 and 188 occurrences out of 837, for the α1 and α2 conformations, respectively). The α1 conformation is distorted as compared to a canonical α helix (phi = −58°, psi = −47°) with a large shift of the phi angle to −118° ± 15°, whereas the psi angle corresponds to canonical values (−58° ± 14°). This conformation corresponds to the backbone conformation observed in π helices.34 The α2 conformation is located at the upper limit of the allowed α region. It is characterized by a strong correlation between the phi and psi angles whose sum is almost constant (−90° ± 10°). The average value of psi is slightly positive (8° ± 19°) while phi is shifted to −97° ± 20°. This conformation encompasses the 3₁₀ helix area (phi = −74°, psi = −4°) and corresponds to the γ conformation reported by Brazhnikov and Efimov.16 It makes a

    Figure 3. Distribution of the bend angles between two helices separated by a linker of one residue (black bars), two residues (white bars), and three residues (grey bars). N represents the number of observations in a subset of PDB_25 containing 1200 protein chains.


  • transition from the α to the β regions of the Ramachandran plot.

    Two areas of the β region of the Ramachandran plot are also possible for the linker [Fig. 4(a)]. They are not in the core, but in the additional allowed β region. The β1 conformation represents 26% of the motifs (218 occurrences). It is characterized by a phi angle of −137° ± 15° and a psi angle of 83° ± 24°. This region was first described by Karplus35 and is largely associated with residues preceding proline.36 The β2 conformation is less frequent and represents only 12% of the kinks (104 occurrences). Its dihedral angles, phi = −73° ± 11° and psi = 122° ± 26°, are close to the values observed for residues in polyproline type II helix conformation (PPII).36,37 The PPII helix is a left-handed helical structure formed when sequential residues adopt backbone dihedral angles centered around −75° and 145°.38,39 This area is also frequently associated with pre-Pro residues.35,36

    Thirteen percent of the kinks have a positive phi value. They correspond to two distinct conformations [Fig. 4(a)]. The βM conformation (phi = 96° ± 15°, psi = 157° ± 23°), observed in 9% of the kinks (78 occurrences), corresponds to a mirror region of the β conformation.36,37 This conformation has been recently described by Lovell et al.37 It is associated with glycine residues, because the lack of a side chain for Gly produces mirror symmetry in steric constraints and in the phi/psi distribution.37 Finally, the αL conformation, corresponding to left-handed α-helices with phi = 73° ± 24° and psi = 25° ± 25°, is observed for 4% of the kinks (37 cases).

    It is worth noting that the relative weights of the six conformations are very similar in PDB_25S (α1: 0.22, α2: 0.21, β1: 0.29, β2: 0.14, αL: 0.04, βM: 0.09) and in the HXH database (α1: 0.25, α2: 0.22, β1: 0.26, β2: 0.13, αL: 0.04, βM: 0.09), strongly suggesting that the procedure for building the database does not lead to significant bias of the data.
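
    For readers who wish to label a linker residue from its dihedral angles, a nearest-center assignment is one simple possibility. The sketch below is an assumption of this illustration, not the procedure used to delineate the six clusters: it uses the mean phi/psi values reported for the six conformations as cluster centers, ignores their standard deviations, and measures distances with angular wrap-around.

```python
import math

# Mean (phi, psi) values in degrees reported for the six conformations.
CENTERS = {
    "alpha1": (-118.0, -58.0),
    "alpha2": (-97.0, 8.0),
    "beta1": (-137.0, 83.0),
    "beta2": (-73.0, 122.0),
    "alphaL": (73.0, 25.0),
    "betaM": (96.0, 157.0),
}


def angular_difference(a, b):
    """Smallest signed difference a - b in degrees, in [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def nearest_conformation(phi, psi):
    """Name of the cluster center closest to (phi, psi)."""
    def distance(center):
        return math.hypot(angular_difference(phi, center[0]),
                          angular_difference(psi, center[1]))
    return min(CENTERS, key=lambda name: distance(CENTERS[name]))
```

    The wrap-around matters for the βM region, whose psi values lie close to ±180°.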

    Distribution of bend and wobble angles

    The distribution of bend and wobble angles describing

    the relative orientation of the two α-helices was analyzed for each conformation of the linker residue [Fig. 4(b)]. Examples of each conformation are shown in Figure 5(a–f). The α1 conformation [Fig. 5(a)] allows only moderate deviations from linearity with an average bend angle of 30° ± 15°. The α2 conformation [Fig. 5(b)] allows a much larger range of bend angles from 10° to 100° with an average value of 52° ± 23°. Most α1 and α2 motifs have a positive wobble angle, corresponding to a left-handed helix motion. Negative wobble angles, corresponding to a right-handed helix motion, are observed only for a minor part of these motifs (3 and 6% for α1 and α2 motifs, respectively), usually with small bend angles (θb < 30°). The average wobble angles of these conformations are 77° ± 48° and 79° ± 50° for the α1 and α2 conformations, respectively.

    In addition to the α2 conformation, bend angles in the 40°–100° range can be reached through the β2 and the αL conformations [Fig. 4(b)]. However, the average wobble angles of these conformations are very different. The

    Figure 4. (a) Ramachandran plot of the linker residue X and (b) distribution of the bend and wobble angles between the two helices, for the 837 helix-X-helix motifs of the database. The color code indicates the HXH conformation (α1: dark blue; α2: sky blue; β1: violet; β2: pink; αL: green; βM: red).


    wobble angle is near zero for the αL conformation with an average value of −2° ± 31° [Fig. 5(e)]. For the β2 conformation [Fig. 5(d)], large amplitude counter-clockwise rotation of H2 leads to θw values that can be larger than 180° (average value of 162° ± 35°) and thus to positive and negative wobble values.

    Bend angles ranging from 90° to 160° can be reached through either the β1 or the βM conformations (average values of 113° ± 16° and 124° ± 18°, respectively [Fig. 4(b)]). However, these two conformations correspond to very different wobble angles. The β1 conformation leads to a left-handed motion of helix 2 with an average value of 88° ± 36° [Fig. 5(c)], whereas the βM conformation leads to a right-handed motion of helix 2 with an average value of −55° ± 35°. This conformation leads to a reversal of helix direction. In the example of βM conformation displayed in Figure 5(f), the bend angle between the two helices reaches 134°.

    Figure 5. Typical HXH motifs for each linker conformation. (a) α1 motif: residues 136–153 of 1O58 chain D, X = Thr-145, θb = 33°, θw = 128°; (b) α2 motif: residues 23–57 of 1VLU chain A, X = Asn-40, θb = 67°, θw = 99°; (c) β1 motif: residues 183–206 of 1OE8 chain A, X = Ser-196, θb = 107°, θw = 109°; (d) β2 motif: residues 69–91 of 2HGS chain A, X = Asn-82, θb = 90°, θw = 137°; (e) αL motif: residues 205–228 of 1M6CY chain A, X = Arg-217, θb = 53°, θw = 1°; (f) βM motif: residues 140–168 of 1XC1 chain A, X = Gly-154, θb = 134°, θw = −82°. The helix-X-helix motif is shown as a green ribbon. The axis of helix 1 is vertical. The side chains of the linker residues and of the prolines, when present, are shown as sticks. In (f), the Cα of Gly at position X is shown as a sphere and the side chain of Ala at position X+4 is shown as a stick. Dashed lines represent H-bonds either between polar linker side chains and the protein backbone (a–d) or typical of the Schellman motif (e). Polar atoms involved in H-bonds are shown as spheres.

    A striking result of this analysis is the anisotropy of the wobble motion. Most kink conformations correspond to a counter-clockwise rotation of helix 2. This is the case for the α1 and α2 conformations, except for a few motifs, and for the β1 and β2 conformations. The negative values measured for the β2 conformations correspond to counter-clockwise rotation larger than 180°. The αL conformation displays small amplitude wobble rotation. Only the βM conformation allows a marked clockwise rotation of helix 2 with a reversed wobble motion as compared to the other conformations.
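    The geometry discussed above can be illustrated with a short Python sketch. It computes a bend angle as the angle between two helix axis vectors and folds rotations larger than 180° into negative values, which is how counter-clockwise rotations beyond 180° end up with negative wobble values. The function names and the input vectors are hypothetical; how the helix axes and the wobble reference direction are actually obtained is described in Materials and Methods and is not reproduced here.

```python
import math

def bend_angle(axis1, axis2):
    """Angle in degrees between two helix axis vectors (3-tuples)."""
    dot = sum(a * b for a, b in zip(axis1, axis2))
    n1 = math.sqrt(sum(a * a for a in axis1))
    n2 = math.sqrt(sum(b * b for b in axis2))
    # Clamp to [-1, 1] to protect acos against rounding errors.
    cos_t = max(-1.0, min(1.0, dot / (n1 * n2)))
    return math.degrees(math.acos(cos_t))

def wrap_wobble(angle_deg):
    """Map a rotation in degrees onto the interval (-180, 180]."""
    wrapped = angle_deg % 360.0  # Python's % gives a value in [0, 360)
    if wrapped > 180.0:
        wrapped -= 360.0
    return wrapped
```

    With this convention, a counter-clockwise rotation of 200° is reported as −160°, whereas values such as 162° or −55° are left unchanged.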

    Main chain–main chain hydrogen bonds

    HBPLUS was used for a detailed analysis of the H-bonds involving the backbone NH and CO groups of the HXH motif from position X−4 to position X+4. We checked the conservation of the conventional NH(i) to CO(i−4) H-bonds within the motif and, when these H-bonds were not conserved, searched for alternative interactions. We thus analyzed the H-bonds involving the carbonyl groups of helix 1, the amine groups of helix 2, and either group for residue X for the six conformations (Table I). In most cases, the helical H-bond pattern involving the linker X is conserved and this residue is involved both in an NH to CO H-bond with residue X−4 and in a CO to NH H-bond with residue X+4. The only exception is the β2 conformation. For this conformation, only 50% of the NH(X) to CO(X−4) H-bonds are conserved, but the overwhelming majority (92%) of NH(X+4) to CO(X) H-bonds are conserved. On the other hand, the helical i to i−4 H-bond pattern disappears between residues X+1 to X+3 and residues X−3 to X−1. The lack of these bonds is observed even for conformations with small bend angles such as α1 and α2.

    Table I. Main Chain–Main Chain Hydrogen Bonds Within the HXH Motif (a)

                           α-helical H-bonds            Alternative H-bonds
    Conformation   N       NH(i)   CO(i−4)   Nbonds     NH(i)   CO(j)   Nbonds
    α1             212     X       X−4       194        X+1     X−4     10
                           X+1     X−3       4          X+2     X−3     164
                           X+2     X−2       0          X+3     X−2     12
                           X+3     X−1       0          X+3     X       13
                           X+4     X         177
    α2             188     X       X−4       135        X−1     X−4     9
                           X+1     X−3       1          X       X−3     42
                           X+2     X−2       2          X+1     X−2     130
                           X+3     X−1       0          X+2     X−1     25
                           X+4     X         157        X+3     X       26
    β1             218     X       X−4       183        X−1     X−4     18
                           X+1     X−3       0          X       X−3     27
                           X+2     X−2       0          X+1     X−2     1
                           X+3     X−1       0          X+3     X       43
                           X+4     X         168
    β2             104     X       X−4       50         X−1     X−4     25
                           X+1     X−3       0          X       X−3     44
                           X+2     X−2       0          X+3     X       6
                           X+3     X−1       0
                           X+4     X         96
    αL             37      X       X−4       24 (b)     X+1     X−4     14 (b)
                           X+1     X−3       0          X       X−3     18 (c)
                           X+2     X−2       0          Schellman motif (d)          9
                           X+3     X−1       0          Partial Schellman motif (e)  23
                           X+4     X         34
    βM             78      X       X−4       74         X−1     X−4     2
                           X+1     X−3       0          X       X−3     1
                           X+2     X−2       0          X+2     X−1     1
                           X+3     X−1       0          X+3     X       8
                           X+4     X         65

    (a) The number Nbonds of H-bonds was determined with HBPLUS as described in Materials and Methods for the six conformations of the linker X. N represents the number of motifs for each conformation.
    (b) For 10 of these H-bonds, interactions from NH(X) or NH(X+1) to CO(X−4) are equally probable.
    (c) For 10 of these H-bonds, interactions from NH(X) to CO(X−3) or CO(X−4) are equally probable.
    (d) Presence of both NH(X+1) to CO(X−4) and NH(X) to CO(X−3) H-bonds.
    (e) Presence of either NH(X+1) to CO(X−4) or NH(X) to CO(X−3) H-bond.
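    The percentages quoted in the text for the two H-bonds involving the linker can be recomputed from the counts of Table I. The following Python sketch does this for the NH(X) to CO(X−4) and NH(X+4) to CO(X) bonds; the counts are copied from the table, whereas the dictionary layout and the function name are only illustrative.

```python
# Counts from Table I:
# conformation -> (N motifs, NH(X)->CO(X-4) bonds, NH(X+4)->CO(X) bonds)
TABLE_I = {
    "alpha1": (212, 194, 177),
    "alpha2": (188, 135, 157),
    "beta1": (218, 183, 168),
    "beta2": (104, 50, 96),
    "alphaL": (37, 24, 34),
    "betaM": (78, 74, 65),
}

def conserved_percentages(table):
    """Return {conformation: (% upstream bond, % downstream bond)}."""
    result = {}
    for name, (n_motifs, upstream, downstream) in table.items():
        result[name] = (
            round(100.0 * upstream / n_motifs),
            round(100.0 * downstream / n_motifs),
        )
    return result
```

    For the β2 conformation this gives 48% and 92%, in line with the values of about 50% and 92% reported above; the six N values add up to the 837 motifs of the database.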


    Alternative interactions are possible for some conformations. The α1 conformation is characterized by NH(i) to CO(i−5) H-bonds, observed mainly between residues X+2 and X−3 (77% of the α1 motifs). A few examples of i to i−5 H-bonds are also observed between positions X+3 and X−2 when residue X+3 is not a proline. In most cases, however, the amide group of residue X+3 is not involved in H-bonds with neighbor carbonyl groups. Examples of NH(i) to CO(i−3) H-bonds are commonly found in α2 motifs, especially from position X+1 to position X−2; 70% of the α2 motifs are involved in this interaction. NH(i) to CO(i−3) H-bonds are also observed from position X to position X−3 in 42% of the β2 motifs. H-bonds from NH(X+3) to CO(X) and from NH(X) to CO(X−3) can also be observed for the β1 conformation (19 and 12%, respectively). For βM motifs, involvement of residues X−3 to X−1 and X+1 to X+3 in main chain–main chain H-bonds is very marginal.

    The occurrence of Schellman motifs [40] was checked for the αL conformation. When an α-helix terminates by a Ccap residue in the αL conformation, a characteristic H-bond pattern links the NH of the C′ and Ccap residues to the CO of the C4 and C3 residues, respectively. The occurrence of these two NH(C′) to CO(C4) and NH(Ccap) to CO(C3) H-bonds forms the so-called Schellman motif [40,41]. In the αL motifs, possibilities of X+1 to X−4 and/or X to X−3 H-bonds, typical of the Schellman motif, are observed in about 60% of the structures. A complete Schellman motif with both H-bonds is, however, observed in only 25% of the HXH motifs in the αL conformation, whether or not glycine is present at position X. An example of a Schellman motif with an Arg linker is displayed in Figure 5(e).
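    The distinction between complete and partial Schellman motifs reduces to a test on two H-bonds. A minimal Python sketch, assuming that the presence of each bond has already been determined (for instance from HBPLUS output) and using hypothetical argument names, could read:

```python
def classify_schellman(has_xp1_to_xm4, has_x_to_xm3):
    """Classify a motif from its two candidate Schellman H-bonds.

    has_xp1_to_xm4: True if an NH(X+1) to CO(X-4) H-bond is present.
    has_x_to_xm3:   True if an NH(X) to CO(X-3) H-bond is present.
    """
    if has_xp1_to_xm4 and has_x_to_xm3:
        return "complete"
    if has_xp1_to_xm4 or has_x_to_xm3:
        return "partial"
    return "none"
```

    In this sketch the three classes are mutually exclusive. To count motifs with one and/or the other H-bond, as in the figure of about 60% given above, the "complete" and "partial" classes would have to be added together.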

    Comparison with normal helices

    Kinks with bend angles smaller than 40° are commonly included within the DSSP definition of α-helices. Larger bend angles are infrequent but examples of extreme distortion have been reported [42]. We thus searched for reverse false positives (residues included in DSSP defined contiguous helices and having dihedral angles similar to those of the linker residue of the HXH motif). The PDB_90S subset was used for this analysis to obtain the maximal limit of reverse false positives. The search was carried out on helices with a length of at least 11 residues, which corresponds to the minimal length of the HXH motifs. The central residue of each 11 residue long window sliding from position N1 to position C1 was considered, after ensuring that the other residues of the window were helical. This led to 99,920 residues whose Ramachandran plot is shown in Figure 6. Comparison with the Ramachandran plot of the linker residues in HXH motifs [Fig. 4(a)] indicates the absence of significant overlap or of reverse false positives for most HXH conformations, except for the α2 and βM motifs. However, they correspond to two very different cases.

    The tail of the α2 conformation (psi

    resulting in a markedly bent motif [Fig. 5(b)]. It is noteworthy that there is a continuum between these conformations as the bend angle varies linearly with the psi/phi dihedral angles.

    Concerning the βM conformation, 36 residues with dihedral angles in this area are found in contiguous helices. We analyzed the corresponding structures. In 20% of these structures, H-bonds between the NH group at position X+3 and the CO group at position X−1 are detected with HBPLUS. In most cases, however, these motifs cannot be distinguished from the βM motifs of our HXH database. In particular, the dihedral angles of the surrounding residues are very similar, with the preceding residue markedly different from standard helix (phi = −98° ± 20°, psi = −18° ± 16°). Clearly, DSSP may be misled by the very severe folding back of the protein chain and underestimates the weight of the βM HXH motifs. Analysis of the DSSP data does not reveal any clear pattern for rationalizing the different assignments.

    The α1 conformation is clearly distinct from the conformations accessible to central positions of α-helices, with no significant overlap [Figs. 4(a) and 6], although the resulting bend angle is small (30° ± 15°). Visual inspection of the α1 motifs reveals an opening of the helix at this position [Fig. 5(a)], leading to a bulged structure, in agreement with the pattern of alternative H-bonds involving residues X+2 and X−3 (Table I). Finally, only a few reverse false positives are observed for the β1 conformations, whereas the β2 and αL conformations are excluded from internal positions of DSSP-defined α-helices (Fig. 6).
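    The search for reverse false positives relies on an 11 residue long window sliding along DSSP-defined helices. A minimal Python sketch of this selection step is given below. It assumes that the secondary structure is available as a string with one letter per residue and that α-helical residues carry the DSSP code H; the function name and its arguments are hypothetical, and the subsequent comparison of dihedral angles is not shown.

```python
def central_helical_residues(ss, window=11, helix_code="H"):
    """Indices of residues lying at the centre of an all-helical window.

    ss is a per-residue secondary structure string, one letter per residue.
    """
    half = window // 2
    centres = []
    for i in range(half, len(ss) - half):
        # Keep residue i only if the whole window around it is helical.
        if all(code == helix_code for code in ss[i - half:i + half + 1]):
            centres.append(i)
    return centres
```

    A helix of exactly 11 residues contributes a single central residue, and each additional helical residue adds one more window.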

    Amino acid propensities

    The propensities P of the different amino acids at and around the linker position X were analyzed from position X−4 to X+4 for the six geometrically defined conformations (Table II). To take into account the low expected number of some amino acids, data in Table II are highlighted by their Z-scores. Five out of six conformations have a characteristic glycine or proline residue with a very high propensity (P > 6, corresponding to Z > 13). The glycine residue is located at the linker position X of the αL and βM conformations, whereas the proline is located either at position X+1 of the β1 and β2 conformations or at position X+3 of the α1 conformation. In addition to these hallmark residues, each conformation has its own amino acid distribution.
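    To make the relation between propensities and Z-scores concrete, the following Python sketch uses one common set of definitions: the propensity is the ratio of observed to expected counts, and the Z-score is the deviation from the expected count in units of the binomial standard deviation. These formulas are an assumption made for illustration; the definitions actually used for Table II are those given in Materials and Methods. The function name and the numbers in the usage note are hypothetical.

```python
import math

def propensity_and_z(n_obs, n_motifs, background_freq):
    """Propensity and Z-score of one amino acid at one motif position.

    n_obs:           observed count of the amino acid at this position
    n_motifs:        number of motifs in the conformation
    background_freq: reference frequency of the amino acid (assumed known)
    """
    n_exp = n_motifs * background_freq
    propensity = n_obs / n_exp
    # Binomial standard deviation of the expected count (assumed model).
    sigma = math.sqrt(n_exp * (1.0 - background_freq))
    z_score = (n_obs - n_exp) / sigma
    return propensity, z_score
```

    For example, 20 observations in 100 motifs for a residue with a background frequency of 0.05 give a propensity of 4.0 under these definitions. With low expected counts the same propensity corresponds to a lower Z-score, which is why Z-scores are used to highlight the data.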

    More than forty percent of the HXH motifs in the α1 conformation possess a proline located at position X+3. Figure 5(a) displays such an HXH motif with proline located three residues downstream of the linker X. In addition, 9% of the α1 motifs possess a proline at position X+2. This conformation is the only one for which the linker residue has a marked preference for hydrophobic residues, especially Val and Phe. Positions X−4, X+1, and X+4 also have a marked preference for hydrophobic residues. Aromatic residues are overrepresented at position X−4 where they are observed in 24% of the α1 motifs. Position X−1 is polar with a high propensity for Asn. Other positions do not display clear tendencies.

    The α2 conformation does not possess a specific overwhelming Gly or Pro residue. Proline can be observed at position X+2 with a propensity of 2.5, corresponding to only 12% of the observations. The α2 linker has a high propensity for Asn and His (P > 2.8). These amino acids are favored at the Ccap position of α-helices [43,44]. An example of α2 motif with an Asn residue at position X is shown in Figure 5(b). Positions X−2, X−1, and X+2 have a marked tendency for polar residues, whereas position X+1 displays a high propensity for Ala, Lys, and Arg and position X+4 has a marked preference for hydrophobic residues.

    Both the β1 and β2 conformations have a high propensity for proline at position X+1. This position corresponds to the N1 position of helix 2 and to the C′ position of helix 1. The high propensity of proline for both the N1 and the C′ positions has been widely reported [15,43–46]. In these conformations, the dihedral angles of the X linker correspond to values accessible to pre-Pro residues [36,37]. Fifty-six percent of the β1 motifs have a proline at position X+1. An example of a β1 motif with a proline is displayed in Figure 5(c). The linker residue has a high propensity for Asp, Asn, His, and Tyr (Z > 2.6) and Ser (Z > 2.0), consistent with propensities for Ccap positions in the β-strand conformation [46]. These five residues are found at position X in two thirds of the β1 motifs. Positions X−2, X−1, and X+2 have a marked tendency for polar residues, whereas positions X−4, X−3, X+3, and X+4 are preferentially hydrophobic. When only β1 motifs without proline at position X+1 are considered, this position has a preference for bulky charged or polar residues, especially Lys (P = 2.9).

    Figure 7. Typical translation of the helix axis in α2− motifs. The HXH motif (shown as a ribbon) corresponds to residues 140–170 of 5CSM chain A, X = Phe-160, θb = 10°, θw = 61°. The phi and psi angles of the linker are −72° and −10°, respectively.

    Table II. Amino Acid Propensities (a)

    (a) The amino acid propensities were calculated from position X−4 to position X+4 for the six conformations of the linker X as described in Materials and Methods. Hallmark residues (Z > 13) are highlighted in yellow, favorable residues with Z > 2.0 and 2.6 are highlighted in light green and dark green, respectively, and disfavored residues with Z < −2.0 and −2.6 are highlighted in pink and red, respectively.

    Prolines are present in 32% of the β2 motifs at position X+1. Glu also has a high propensity for this position and is observed in 18% of the β2 motifs. The β2 linker has high propensities for Ser, Asp, and Asn (Z > 2.6), typical of Ncap positions [15,43,45,47]. These three residues are found at position X in 68% of the β2 motifs, whereas aliphatic residues are uncommon and aromatic residues rigorously excluded at this position (no case out of 188). Thr, which is usually favored at the Ncap position of α-helices, is not favored at the linker position (P = 0.5). This behavior of Thr is also observed for the other conformations.

    In spite of similarities in the amino acid propensities at the linker position, the β1 and β2 conformations display a phase shift in the hydrophobicity of the residues surrounding the kink, observable at positions X−4, X−2, and X+4. In β1 motifs, positions X−4 and X+4 are preferentially hydrophobic, whereas position X−2 is preferentially polar. In β2 motifs, position X−4 displays a preference for polar residues or Ala and position X+4 for Arg and Gln, which are present in 40% of these motifs, whereas position X−2 has a preference for hydrophobic residues.

    Both the αL and βM conformations have a very high propensity for glycine at the linker position. This residue is observed in 54% of αL motifs and in all the βM motifs except one. This single exception is a Gln residue (position 369) in the isocitrate lyase from Mycobacterium tuberculosis [48] (PDB access number: 1F8M, chain A). This high propensity for glycine is also observed in the βM reverse false positives (34 cases out of 36). The βM region is highly specific for glycine residues, while the αL region has a high propensity for glycine but can accommodate other residues, in agreement with literature data [36,37,49].

    The number of αL motifs (37) is too low to have high Z-scores and levels of significance. However, qualitative comparison of amino acid propensities for αL and βM motifs indicates a marked difference in the hydrophobicity pattern of these motifs. In αL motifs, positions X−1 and X+4 have a preference for polar residues and position X+2 for hydrophobic residues. On the other hand, in βM motifs, position X−1 is preferentially hydrophobic with high propensities for Leu and Phe (P = 2.3 and 2.9, respectively), but Lys, whose side chain is partly aliphatic, can also be observed. Hydrophobic residues and Ala are favored at position X+4 (P = 2.7, 2.1, and 3.2 for Ala, Leu, and Phe, respectively), whereas position X+2 has a marked polar character with high propensity (P > 3) for negatively charged residues.

    Hydrophobicity profiles

    To further compare the different conformations, we analyzed the average hydrophobicity profiles from position X−4 to position X+4, using Eisenberg's consensus scale (see Fig. 8). The use of different scales did not lead to significant differences in data (not shown). For this analysis, the α1, β1, β2, and αL conformations were split into two groups depending upon the presence or absence of proline at position X+1 (β1 and β2) or X+3 (α1) or of glycine at position X (αL). This led to the α1+ and α1− subsets (121 and 97 motifs, respectively), to the β1+ and β1− subsets (121 and 97 motifs, respectively), to the β2+ and β2− subsets (33 and 71 motifs, respectively) and to the αL+ and αL− subsets (20 and 17 motifs, respectively). In addition, we split the α2 motifs into two equal subsets by the median of θb (55°). This parameter appeared as the most pertinent because of its linear relationship with the phi/psi dihedral angles. This led to two subsets, α2inf and α2sup (94 motifs each), with average θb of 33° ± 15° and 72° ± 10°, respectively, allowing us to test the homogeneity of this conformation. As previously noted, the α1 motif is mainly hydrophobic or neutral [Fig. 8(a)]. The presence or absence of proline at position X+3 has, however, a marked effect on the hydrophobicity at positions X−1 and X+1 (99% significance level for χ2 homogeneity test). These positions have polar and hydrophobic characters, respectively, in the presence of proline, whereas no clear tendency is observed in its absence. The presence of proline at position X+1 does not significantly alter the hydrophobicity profile of the β1 and β2 motifs [Fig. 8(c,d)]. The only exception is observed at position X−4 of the β1 motif, which is markedly hydrophobic in the presence of the Pro. For the other positions (except the Pro position), differences observed in the average hydrophobicity index H do not correlate with significant changes in the distribution of hydrophobic or polar residues.

    Figure 8. Average hydrophobicity index H from position X−4 to position X+4 for the α1 (a), α2 (b), β1 (c), β2 (d), αL (e), and βM (f) motifs. Open bars represent the α1+ motifs with proline at position X+3, the β1+ and β2+ motifs with proline at position X+1, the αL+ motifs with glycine at position X and the α2inf motifs with θb < 55°. Closed bars represent the α1− motifs without proline at position X+3, the β1− and β2− motifs without proline at position X+1, the αL− motifs without glycine at position X and the α2sup motifs with θb > 55°. The 1F8M structure with X = Gln was removed from the βM data set.
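    An average hydrophobicity profile of this kind can be sketched in a few lines of Python. The scale values below are those commonly quoted for the normalized consensus scale of Eisenberg and coworkers and should be checked against the original publication before any quantitative use. The function name and the toy sequences are hypothetical; each input string is assumed to hold the residues of one motif over the same positions, in one-letter code.

```python
# Commonly quoted values of the Eisenberg normalized consensus scale
# (one-letter amino acid codes); verify against the original source.
EISENBERG = {
    "A": 0.62, "R": -2.53, "N": -0.78, "D": -0.90, "C": 0.29,
    "Q": -0.85, "E": -0.74, "G": 0.48, "H": -0.40, "I": 1.38,
    "L": 1.06, "K": -1.50, "M": 0.64, "F": 1.19, "P": 0.12,
    "S": -0.18, "T": -0.05, "W": 0.81, "Y": 0.26, "V": 1.08,
}

def average_hydrophobicity(segments, scale=EISENBERG):
    """Mean hydrophobicity index at each position of aligned segments.

    segments: list of equal-length strings, one per motif.
    """
    length = len(segments[0])
    if any(len(seg) != length for seg in segments):
        raise ValueError("all segments must have the same length")
    profile = []
    for pos in range(length):
        values = [scale[seg[pos]] for seg in segments]
        profile.append(sum(values) / len(values))
    return profile
```

    Applied separately to two subsets of motifs, for instance with and without proline, the function yields the two series of bars that are compared position by position.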

    Major differences in the hydrophobicity of the α2inf and α2sup subsets are observed, especially for positions X−2 and X+2 to X+4 (99% confidence level). The hydrophobicity profile of the α2inf subset is similar to that observed for the α1− subset with only position X−1 exhibiting a polar character. By contrast, the hydrophobicity profile of α2sup indicates that the positions surrounding the X linker from position X−2 to X+2 are polar. This is corroborated by the detailed analysis of the amino acid distribution from position X−4 to X+4 (not shown). In particular, at position X, the propensities of His and Asn rise from 0.47 and 1.21, respectively, for the α2inf motifs, to 5.2 and 5.6, respectively, for the α2sup motifs.

    The polar character at and around the linker X of the α2sup subset is also observed for the β1 motifs (positions X−2 to X+2) and for the β2 motifs (positions X to X+2), whether or not proline is present (Fig. 8 and Table II). The average hydrophobicity index at position X of the β1+ motifs has a moderate polar character due to the increased propensity of Tyr and Phe in that case (3.0 and 1.9, respectively, vs. 1.4 and 1.3 in the absence of proline). Similarly, the average hydrophobicity index of position X−1 for the β2+ motif is related to the high propensity of Ala (P = 3.4 in the presence of proline) and underestimates the polar character of this position. The length of the polar stretch decreases, however, from the α2sup to the β1 and β2 conformations. In either case, position X−3 is hydrophobic. This residue is followed by a polar position in the α2sup and β1 motifs and by a hydrophobic position for β2 motifs. On the other hand, position X+4 has a marked hydrophobic character in α2sup motifs and polar character in β2 motifs. This corroborates the differences in the phases of the helices observed by analyzing the amino acid propensities (Table II).

    Comparison of the hydrophobicity profiles of the αL+ and αL− motifs does not lead to significant differences (except for the Gly position), because of the small number of observations. On the other hand, marked differences are observed between the αL+ and βM motifs for positions X−1, X+2, and X+4 (99% confidence level). This reversed hydrophobicity corroborates the observations drawn for amino acid propensities [Fig. 8(e,f) and Table II].
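    The comparison of two subsets at a given position can be illustrated by a χ2 homogeneity test on a 2 × 2 table of hydrophobic and polar residue counts. The Python sketch below computes the statistic without external libraries and compares it with the critical value for one degree of freedom at the 99% level. The way residues are assigned to the hydrophobic or polar class, as well as the counts used in the usage note, are assumptions made for the example and are not taken from the study.

```python
# Chi-square critical value for 1 degree of freedom at the 99% level.
CRITICAL_99_DF1 = 6.635

def chi2_homogeneity_2x2(a, b, c, d):
    """Chi-square statistic for the 2x2 contingency table [[a, b], [c, d]].

    Rows are the two subsets; columns are hydrophobic and polar counts.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("a row or column of the table is empty")
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)

def differs_at_99(a, b, c, d):
    """True if the two subsets differ at the 99% level."""
    return chi2_homogeneity_2x2(a, b, c, d) > CRITICAL_99_DF1
```

    With hypothetical counts of 30 hydrophobic and 10 polar residues in one subset against 10 and 30 in the other, the statistic is 20.0 and the difference is significant at the 99% level. No continuity correction is applied in this sketch, which matters when counts are small.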

    Solvent accessibility

    The average accessible solvent area (ASA) of the residues located from position X−4 to position X+4 of the HXH motifs was measured for the different subsets (see Fig. 9). The most buried motifs are the α1− ones. In that case, the average solvent accessibility is lower than 50% for any position of the motif, in agreement with the hydrophobicity profile indicating low polarity [Fig. 8(a)]. The α2inf motifs are slightly more solvent exposed with the average ASA rising to 60% for positions X−1 and X+2. Helix 1 is more solvent exposed in the α1+ motifs than in α1− motifs, with an average ASA of 75% at position X−1, in agreement with the polar character of this position. This increased solvent accessibility is not observed for helix 2 (ASA 40%).

    The α2sup, β1, and β2 motifs are very solvent exposed with accessibilities 80% for individual positions. The presence of proline at position X+1 of the β1 and β2 motifs does not alter the solvent accessibility [Fig. 9(c,d)]. Both helices have highly solvent exposed positions. Position X−1 is the most solvent exposed position of helix 1 for the α2sup, β1, and β2 motifs, whereas a difference is observed for helix 2, whose most solvent exposed position is either X+2 for the α2sup and β1 motifs or X+1 for the β2 motifs [Fig. 9(b–d)]. These solvent accessibilities are consistent with the hydrophobicity profiles and amino acid propensities indicating that polar residues or Ala are favored at these positions. It is worth noting that, in spite of the polar character of the linker

    residue X in any of these three conformations, this resi-

    due has a limited solvent accessibility ( 558.


    Concerning the αL and βM conformations, the low accessible surface area of glycine is related to the absence of side chain. This caveat leads to very different ASA for position X of the αL+ and αL− subsets (20 and 73%, respectively) that are not related to changes in the solvent exposed location of this residue on the protein surface. Either the αL or the βM conformation leads to a break in the helical pattern of helix 1, with exposure of the linker residue [see Fig. 5(e,f) as examples]. Major differences between the two motifs come from the second helix. In the αL motifs, helix 2 initiates from the buried side, whereas in βM motifs, helix 2 initiates from the exposed side, leading to very different solvent exposures at positions X+2 and X+4 [Fig. 9(e,f)], in agreement with the hydrophobicity profiles [Fig. 8(e,f)].

    Side chain–main chain hydrogen bonds

    Side chains of polar and negatively charged residues (Asn, Asp, Gln, Glu, Ser, Thr, and His) can be involved in the formation of closed-loop conformations through side chain–main chain hydrogen bonds [50]. They have the capability to form C10- to C17-membered ring conformations, through H-bonding of the side chain oxygen or nitrogen to the backbone polar groups of residues located up to four positions upstream or downstream. We thus screened the HXH motifs in search of H-bonds between polar or negatively charged side chains of the linker X and polar groups of neighbor backbone (Table III). As previously observed [50], side chains acting as donor can form an H-bond only with the carbonyl oxygen of upstream residues whereas side chains acting as acceptor can interact only with amide nitrogen of downstream residues.

    Table III. H-Bonds Between the Linker Side Chain and the HXH Motif Main Chain (a)

    Residue   Donor/acceptor   Conformation   N     Nbonds   Acceptor/donor
    Asn       Oδ1, Nδ2         α1             9     0
                               α2             28    17       17 O(X−4)
                               β1             41    31       14 O(X−4), 4 N(X+2), 13 N(X+3)
                               β2             19    19       1 O(X−3), 3 N(X+2), 15 N(X+3)
                               αL             1     0
    Asp       Oδ               α1             4     0
                               α2             7     0
                               β1             44    36       11 N(X+2), 25 N(X+3)
                               β2             30    29       6 N(X+2), 23 N(X+3)
                               αL             2     2        2 N(X+3)
    Gln       Oε1, Nε2         α1             12    1        1 O(X−1)
                               α2             6     1        1 N(X+2)
                               β1             9     1        1 N(X+2)
                               β2             2     0
                               αL             2     1        1 O(X−3)
    Glu       Oε               α1             13    0
                               α2             7     0
                               β1             6     0
                               β2             4     1        1 N(X+3)
                               αL             1     0
    Ser       Oγ               α1             12    4        4 O(X−4)
                               α2             8     7        3 O(X−4), 4 O(X−3)
                               β1             22    19       1 O(X−4), 1 N(X+2), 17 N(X+3)
                               β2             22    20       1 O(X−3), 19 N(X+3)
                               αL             0     0
    Thr       Oγ1              α1             13    6        5 O(X−4), 1 O(X−3)
                               α2             6     5        2 O(X−4), 3 O(X−3)
                               β1             3     2        2 O(X−4)
                               β2             3     3        3 N(X+3)
                               αL             0     0
    His       Nδ1, Nε2         α1             9     0
                               α2             12    5        5 O(X−4)
                               β1             18    6        5 O(X−4), 1 N(X+2)
                               β2             5     3        2 N(X+2), 1 N(X+3)
                               αL             2     0

    (a) The number Nbonds of H-bonds was determined with HBPLUS as described in Materials and Methods for the indicated atoms of polar side chains at position X of HXH motifs and neighbor polar groups of the protein backbone. O(X−i) indicates an H-bond with the carbonyl group at position X−i. N(X+j) indicates an H-bond with the amide group at position X+j. N represents the number of amino acids a at position X in the considered conformation.

    Gln and Glu residues located at position X are seldom involved in side chain–main chain H-bonds (3%), although these residues can be involved in such H-bonds [50]. For Asn, Asp, Ser, Thr, and His, the H-bonding pattern depends upon the conformation of the linker (Table III). The percentage of polar side chains involved in H-bonds rises from less than 15% for αL and α1 motifs up to 80% for β2 motifs.

    In α1 motifs, only Thr or Ser at position X can be involved in H-bond interactions. These H-bonds involve only carbonyl groups, mainly at position X−4 (9 cases out of 10). The H-bond linking the Ser/Thr side chain to the X−4 carbonyl is typically observed for Ser or Thr residues located within α-helices [50]. Such an interaction involving a Thr side chain and the carbonyl group at position X−4 is shown in Figure 5(a). When they are present at position X of α1 motifs, Asp, Asn, and His are not involved in H-bonds.

    Ser and Thr are not favored at position X of the α2 motifs (Table II). However, when present, most of them (80%) are involved in H-bonds with either the X−4 or X−3 carbonyl groups. The interaction between the Ser/Thr side chains and the carbonyl groups at position X−3 is typical of helix C-terminus [50]. In addition, Asn and His, which have high propensities for position X of α2 motifs, can interact with the carbonyl group of residue X−4. These latter interactions are also typical of the C-terminal end of α-helices [50]. Two thirds of the Asn residues at position X of α2 motifs are involved in such H-bond interactions. An example of this interaction is given in Figure 5(b). Asp, which cannot form this H-bond, is seldom present in the α2 conformation (P = 0.65). No example of interaction of Asn or Asp side chains with the amide group at position X+3, typical of N-capping, is observed.

    In β1 motifs, polar side chains present a different H-bonding pattern and can be involved in interaction with either the carbonyl groups of residue X−4 (Asn, Ser, Thr, and His) or with the amide groups of residues X+2 or X+3 (Asn, Asp, Gln, Ser and His). When present, most Ser side chains (77%) are involved in H-bonds with the amide group at position X+3. An example of this interaction is given in Figure 5(c). Thr is seldom present at position X of the β1 conformation but two out of the three cases are involved in H-bonds with the carbonyl groups at X−4. Either in β1+ or β1− motifs, about 80% of Asn and Asp side chains are involved in H-bonds. Interestingly, the presence of proline favors H-bonding of Asn with carbonyl groups (10 cases out of 16) whereas its absence favors H-bonding with amide groups (11 cases out of 15) (Table IV). Asp is involved in H-bonds with amide nitrogens, either at position X+2 or X+3. About 30% of His are involved in H-bond with carbonyl groups at X−4. Four cases out of five for this interaction are observed for β1+ motifs (Table IV).

    Table IV. Effect of Proline on H-Bonds Between the Linker Side Chain and the HXH Motif Main Chain (a)

    Residue   Conformation   Pro   N     P     Nbonds   Acceptor/donor
    Asn       β1             −     21    4.9   15       4 O(X−4), 1 N(X+2), 10 N(X+3)
                             +     20    3.8   16       10 O(X−4), 1 N(X+2), 5 N(X+3)
              β2             −     16    5.2   16       1 O(X−3), 2 N(X+2), 13 N(X+3)
                             +     3     2.1   3        1 N(X+2), 2 N(X+3)
    Asp       β1             −     27    4.9   22       4 N(X+2), 18 N(X+3)
                             +     17    2.5   14       4 N(X+2), 10 N(X+3)
              β2             −     19    4.7   18       18 N(X+3)
                             +     11    5.8   11       4 N(X+2), 7 N(X+3)
    Ser       β1             −     13    2.1   13       1 O(X−4), 12 N(X+3)
                             +     9     1.2   6        1 N(X+2), 5 N(X+3)
              β2             −     18    4.0   17       1 O(X−3), 16 N(X+3)
                             +     4     1.9   3        3 N(X+3)
    Thr       β1             −     0     0.0   0
                             +     3     0.4   2        2 O(X−4)
              β2             −     1     0.2   1        1 N(X+3)
                             +     2     1.0   2        2 N(X+3)
    His       β1             −     6     2.7   1        1 O(X−4)
                             +     12    4.4   5        4 O(X−4), 1 N(X+2)
              β2             −     2     1.2   0
                             +     3     4.0   3        2 N(X+2), 1 N(X+3)

    (a) The number Nbonds of H-bonds was determined with HBPLUS as described in Materials and Methods for the indicated polar side chains at position X and neighbor polar groups of the protein backbone in β1 and β2 motifs, as a function of the presence (+) or not (−) of proline at position X+1. Propensities P with Z > 2.0 are in bold font. N represents the number of amino acids a at position X in the considered conformation. The total numbers of β1+ and β1− motifs are 121 and 97, respectively. The total numbers of β2+ and β2− motifs are 33 and 71, respectively.

    Whether or not proline is present, the polar residues Asn, Asp, Ser, Thr, and His are present in about 70% of the β2 motifs and more than 90% of their side chains are involved in H-bonds (Tables III and IV). These H-bonds involve almost exclusively the NH groups of residue X+2 or X+3. Only two out of 75 examples of H-bonds involve carbonyl groups. Most H-bonds involving the amide group at position X+2 are observed for β2+ motifs (7 cases out of 11), in spite of the limited number of these motifs (Table IV). This effect of proline is especially marked for Asp, which can interact with amide groups either at position X+2 or X+3 (4 and 7 cases, respectively) in β2+ motifs whereas only H-bonds with amide groups at position X+3 are observed in β2− motifs (18 cases).
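    The effect of proline on the H-bonding partner of Asn in β1 motifs can be checked directly from the counts of Table IV. In the Python sketch below, the counts are copied from the table; the data layout and the function name are only illustrative.

```python
# Asn rows of Table IV for the beta1 conformation.
# Keys: proline present ("+") or absent ("-") at position X+1.
ASN_BETA1 = {
    "+": {"O(X-4)": 10, "N(X+2)": 1, "N(X+3)": 5},
    "-": {"O(X-4)": 4, "N(X+2)": 1, "N(X+3)": 10},
}

def partner_split(bonds):
    """Return (carbonyl count, amide count, total) for one table row."""
    carbonyl = sum(n for site, n in bonds.items() if site.startswith("O("))
    amide = sum(n for site, n in bonds.items() if site.startswith("N("))
    return carbonyl, amide, carbonyl + amide
```

    The split is 10 carbonyl partners out of 16 H-bonds in the presence of proline and 11 amide partners out of 15 in its absence, as stated in the text.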

    DISCUSSION

    The helix-X-helix motif

    One of the difficulties in analyzing the secondary structure of proteins lies in the definition of SS elements, in particular when the attention is focused on the limits of these elements. Different algorithms, based on H-bond pattern, Cα geometry, backbone dihedral angles, or a combination of different criteria have been developed [20,51–55]. The analysis of the HXH motif is dependent upon the definition of the SS elements. For example, in their analysis of α-α linking motifs, Engel and DeGrado [17] used a broad phi/psi based definition of α-helix with psi ranging up to 45° and thus could not observe the α1 and α2 conformations that we have determined.

    In this article, we relied on the DSSP definition of SS elements [20]. DSSP is based on the detection of H-bonds and was developed to define SS elements in which not all possible H-bonds are formed, for example in bent or curved helices [20]. In addition, we added a phi/psi filter to ensure that the five residues surrounding the linker were in a helical conformation and to remove false positive motifs in which one of the helices was not correctly defined. This filter also suppressed potential problems with the definition of the helix termini. As a matter of fact, in 3% of the HXH motifs initially found with DSSP, the C-terminus of helix 1 or the N-terminus of helix 2 were out of the α region. These structures corresponded to helices linked by two residues and were erroneously assigned as HXH motifs by DSSP.
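    The two-step selection described above, DSSP assignment followed by a phi/psi filter, can be sketched in Python as follows. The secondary structure is assumed to be a string with one DSSP letter per residue, H marking α-helix. The helical phi/psi bounds, the number of flanking residues checked on each side, and all function names are hypothetical placeholders chosen for illustration and are not the parameters of this study.

```python
import re

def find_hxh_candidates(ss, min_helix=5):
    """Indices of single non-helical residues flanked by helical runs."""
    pattern = re.compile(r"(?<=H{%d})[^H](?=H{%d})" % (min_helix, min_helix))
    return [match.start() for match in pattern.finditer(ss)]

def in_helical_region(phi, psi, phi_range=(-180.0, 0.0), psi_range=(-90.0, 45.0)):
    """True if (phi, psi) falls within the assumed helical bounds."""
    return phi_range[0] <= phi <= phi_range[1] and psi_range[0] <= psi <= psi_range[1]

def passes_filter(index, phi_psi, flank=5, region=in_helical_region):
    """True if the `flank` residues on each side of the linker are helical.

    phi_psi: list of (phi, psi) tuples in degrees, one per residue.
    """
    neighbours = list(range(index - flank, index)) + list(range(index + 1, index + flank + 1))
    return all(region(*phi_psi[i]) for i in neighbours)
```

    Two helices separated by two non-helical residues are not matched by the pattern, and a candidate whose flanking residues leave the helical region is rejected by the filter.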

    Analysis of the dihedral angles of residues located in the middle of α-helices (see Fig. 6) indicates that the βM conformation is the only one for which a significant number of reverse false positives can be found. Although the reasons are not clear, DSSP does not cope well with the extreme distortion of the protein chain observed in these motifs. This is not the case for the other conformations. Only a few cases of reverse false positives are observed for the β1 conformation and none for the αL and β2 conformations. The dihedral angles of the α1 and α2+ motifs, albeit included in the additional allowed α-helical region, can only marginally be accessed by residues located in the middle of contiguous helices (see Fig. 6). Finally, detailed analysis of the α2− motifs indicates that, although the dihedral angles of the linker overlap the α-helix area, they are clearly distinct from kinks included in contiguous α-helices (see Fig. 7) and correctly assigned as HXH motifs by DSSP.

    The precise determination of the H-bond pattern of the HXH motif was carried out with HBPLUS (Table I). This motif is characterized by the lack of NH(i) to CO(i−4) hydrogen bonds between the three residues downstream and upstream of the linker residue. The linker residue X is usually involved in H-bond interactions with both residue X−4 and residue X+4, but there is complete disruption of the helical pattern between residues N1–N3 of helix 2 and residues C3–C1 of helix 1.
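The H-bond signature described above can be expressed as a simple test on a list of main-chain hydrogen bonds. The sketch below is illustrative only: the function name, the representation of H-bonds as (donor, acceptor) residue indices, and the reading of the linker bonds as NH(X) to CO(X−4) and NH(X+4) to CO(X) are assumptions, and no HBPLUS output parsing is attempted.

```python
def has_hxh_hbond_signature(hbonds, x, require_linker_bonds=False):
    """Test the helical H-bond disruption around linker index x.

    hbonds: set of (donor_index, acceptor_index) pairs, each standing for
            a main-chain NH(donor) -> CO(acceptor) hydrogen bond.
    """
    # NH(i) -> CO(i-4) must be absent for i = X+1 .. X+3 (N1-N3 of helix 2);
    # the corresponding acceptors are X-3 .. X-1 (C3-C1 of helix 1).
    disrupted = all((i, i - 4) not in hbonds for i in range(x + 1, x + 4))
    if not require_linker_bonds:
        return disrupted
    # Optional stricter test: the linker is bonded on both sides
    # (assumed reading: NH(X) -> CO(X-4) and NH(X+4) -> CO(X)).
    linker_ok = (x, x - 4) in hbonds and (x + 4, x) in hbonds
    return disrupted and linker_ok
```

Because the linker bonds are described as usual rather than systematic, the stricter test is left optional.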

    The limited number of motifs found in PDB_25 led us

    to develop an alternative strategy to build a HXH data-

    base (see Fig. 1). Although this may introduce some bias

    in the quantitative results, several lines of evidence

    strongly suggest that the procedure should not signifi-

    cantly affect the general conclusions of this study: (1)

    sequence diversity is high, with 94% of the sequence

    pairs having a sequence identity ≤ 27% (to be compared to 96% for the PDB_25 set); (2) the relative weights of

    the six conformations are very similar in both data sets;

    (3) sequence properties can be rationalized by energetic

    considerations, in relation with the specific structural

    properties of these motifs.

    A striking property of the HXH motifs is their solvent exposure (see Fig. 9). Disruption of the helical H-bonding pattern leaves several polar groups of the protein backbone free (Table I). The high solvent exposure of most motifs may be related to this property and required for energetic reasons. The α1 and α2inf motifs, for which alternative H-bonds are the most frequent, are the most buried ones (see Fig. 9). Other motifs are solvent exposed. However, the linker residue, in spite of its usual polar character, is more buried than its neighbors in most conformations (Figs. 6 and 7). This is consistent with its H-bonding properties. First, the amide and carbonyl groups of this residue are involved in main chain–main chain H-bonds with residues X−4 and X+4 (Table I), which makes it easier to bury the backbone at this position. Second, polar side chains at position X are involved in H-bonds with neighboring polar groups of the protein backbone and may contribute to stabilizing the kink.

    The propensities of Asn and Asp at position X are very consistent with the H-bonding patterns. Both Asn and Asp have high propensities in β1 and β2 motifs, where their side chains can interact with downstream amide groups, indicating that these N-capping interactions are stabilizing. Similarly, the high propensity of Asn in α2 motifs can be related to C-capping interactions with upstream carbonyl groups. His also has a high propensity at position X in the α2, β1, and β2 motifs, where it can form H-bonds. The possibility of H-bonds between Ser/Thr and carbonyl groups in α1 and α2 motifs is not related to an increased propensity of these residues (P ≤ 1). However, these H-bonds compete with the main chain–main chain H-bonds involving the X−4 and X−3 oxygens (Table I) and thus may not be stabilizing. On the other hand, a favorable propensity for Ser is observed when it can be involved in interactions with amide groups (P = 1.6 and 3.3 for the β1 and β2 conformations, respectively), indicating that these N-capping interactions stabilize these motifs.

    Proline-induced kinks

    Proline is rigorously excluded from the linker position X and the preceding positions in HXH motifs but is found at positions X+1 to X+3 in almost forty percent of these motifs, highlighting the role of this residue as a helix breaker. Proline has an average propensity of 1.4 at position X+2 and is observed there in 6% of the motifs, mainly for the α1 and α2 conformations. Most proline-induced kinks, however, correspond either to an α1 motif with proline at position X+3 (12% of the 837 motifs) or to a β1 or β2 motif with proline at position X+1 (20% of the 837 motifs).

    Because of the bulkiness of the pyrrolidine ring, the backbone of the residue preceding a proline can adopt only a limited range of dihedral conformations. This includes a narrow range in the α region and two subsets of the β region corresponding to the β1 and β2 areas defined in this study.36,37,49 When proline residues are found in the middle of α-helices, steric constraints lead to a kink in the helix to avoid a clash between the Cδ atom of Pro at position i and the carbonyl oxygen at position i−4. These kinks have bend angles in the 20°–30° range, with an average value of 26°,56 similar to the average bend angle of the α1 motifs. Analysis of the dihedral angles of the residues located around the proline reveals that, when proline is included in a contiguous helix, the distortion of the backbone is shared by the residues located two positions (phi = −80° ± 12°, psi = −30° ± 11°) and three positions upstream of the proline (phi = −77° ± 13°, psi = −35° ± 11°). On the other hand, the α1 motifs correspond to a large distortion of the residue located three positions upstream of the proline (phi = −122° ± 9°, psi = −56° ± 7°) and, to a lesser extent, of the preceding residue (phi = −91° ± 11°, psi = −23° ± 11°), whereas the following residue is not affected (phi = −64° ± 6°, psi = −52° ± 7°). In spite of these differences in the dihedral angles, the hydrophobicity patterns of the residues surrounding the proline are similar, with positions P+2 and P−2 having a marked hydrophobic character in both cases.

    When proline is present in β1 or β2 motifs, the linker conformation corresponds to one of the two conformations accessible to pre-Pro residues in the β area36,37,49 and proline is located at the next position. The linker is mainly in the β1 conformation (78% of the motifs with proline at X+1), which allows a dramatic change in the helix orientation, with bend angles larger than 90°. The β1 conformation corresponds to the well described Pro C-capping motif.57 This conformation allows a stabilizing electrostatic interaction of residues X and X+1 with the helix dipole. The high propensity of His and aromatic amino acids at position X for this motif in the presence of Pro (4.4, 1.9 and 3.0 for His, Phe, and Tyr, respectively) is consistent with stabilization of the Pro C-capping motif by interaction of these rings with the carbonyl group located four residues upstream.57 The presence of H-bonds between His at the linker position and the carbonyl group at position X−4 (Table IV) provides additional evidence of such an interaction.

    Analysis of the amino acid distributions and of the hydrophobicity patterns (Table II and Fig. 8) strongly suggests that position-specific scoring matrices could be used to predict the backbone conformation of the pre-Pro residues in proline-containing sequences (see Materials and Methods). The matrix for pre-Pro residues in the α conformation (α matrix) was built from the sequences of the α1 motifs and of Pro-containing helices in PDB_25 (89 and 85 sequences, respectively). The matrix for pre-Pro residues in the β conformation (β matrix) was built from the sequences of the β1 and β2 motifs (121 and 33 sequences, respectively). The limited number of β2 motifs did not allow them to be considered separately. In either case, the window ranged from five residues upstream to three residues downstream of the proline. Predictions were based on a 10-fold cross-validation procedure. The Q-score matrix is shown in Table V and corresponds to an average accuracy of 0.81 ± 0.06. Indeed, up to 85% of sequences with the pre-Pro residue in the β region were predicted successfully, clearly indicating the usefulness of these position-specific scoring matrices for prediction purposes.
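The use of two position-specific scoring matrices to assign the pre-Pro conformation can be sketched as follows. This is a generic log-odds construction, not the matrices of this study: the uniform background, the pseudocount, the function names, and the toy nine-residue windows (five residues upstream, the proline at index 5, three residues downstream) are all assumptions made for illustration. The procedure actually used is described in Materials and Methods.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(windows, background=None, pseudocount=1.0):
    """Build a log-odds matrix from equal-length sequence windows."""
    if background is None:
        # Assumption: uniform background; database frequencies would be
        # the natural choice for a real matrix.
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    pssm = []
    for pos in range(len(windows[0])):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for window in windows:
            counts[window[pos]] += 1.0
        total = sum(counts.values())
        pssm.append({aa: math.log((counts[aa] / total) / background[aa])
                     for aa in AMINO_ACIDS})
    return pssm


def score(pssm, window):
    """Sum the log-odds of the residues observed at each window position."""
    return sum(column[aa] for column, aa in zip(pssm, window))


def predict_pre_pro(alpha_pssm, beta_pssm, window):
    """Assign the conformation whose matrix gives the higher score."""
    if score(alpha_pssm, window) >= score(beta_pssm, window):
        return "alpha"
    return "beta"


# Toy training windows (hypothetical sequences, proline at index 5).
alpha_pssm = build_pssm(["AELLKPLEE", "AEALKPLEK"])
beta_pssm = build_pssm(["LKELNPSEL", "LRELHPTEL"])
print(predict_pre_pro(alpha_pssm, beta_pssm, "AELLKPLEK"))
```

In a 10-fold procedure, the matrices would be rebuilt ten times, each time leaving out one tenth of the sequences and predicting only the sequences left out.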

    Glycine-induced kinks

    Thirteen percent of the HXH motifs have a glycine at the linker position. This corresponds to an average propensity of 1.7. However, Gly is seldom observed in α1, α2, β1, and β2 motifs and has a propensity P < 1 at position X of these conformations (Table II). In most cases, the dihedral angles of glycine linkers are either in the βM or in the αL conformation, characterized by positive phi values. In addition to its high propensity for the Ccap position of α-helices, glycine is known to have a high propensity for the C′ position.43,45,47 However, a favorable propensity for Gly at position X+1, corresponding to the C′ position of helix 1, is not observed in any of the conformations (the value of 1.4 in αL motifs is not significant, due to the limited number of observations).
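The relation between the two figures quoted for glycine (13% of linkers, average propensity of 1.7) can be made explicit with a small calculation. The sketch assumes that a propensity is the ratio of the fraction of a residue at a given position to its fraction in a reference set, which is the usual definition; the exact definition and reference set are those of Materials and Methods, and the function names are hypothetical.

```python
def propensity(count_at_position, n_motifs, background_fraction):
    """Observed fraction at a position divided by the background fraction."""
    return (count_at_position / n_motifs) / background_fraction


def implied_background(fraction_at_position, reported_propensity):
    """Background fraction implied by a reported fraction and propensity."""
    return fraction_at_position / reported_propensity


# Values quoted in the text: Gly at 13% of linker positions, propensity 1.7.
gly_background = implied_background(0.13, 1.7)
print(f"implied Gly background fraction: {gly_background:.3f}")
```

Under this assumption, the quoted values imply a background glycine fraction of roughly 7 to 8%.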

    Glycine is characterized by the absence of a side chain, which allows its backbone dihedral angles to explore a much broader range than those of other residues.36,37,49 Its phi dihedral angle can be positive and glycine can access either the αL or the βM region of the Ramachandran plot, corresponding to mirror regions of the α-helix or of the β-strand, respectively. In particular, the βM region is very specific to Gly residues.36

    Termination of α-helices by a glycine residue in the αL conformation at the Ccap position is commonly observed in proteins.40,41,46 This conformation allows the formation of the Schellman motif,40,41 which involves two main chain–main chain hydrogen bonds joining NH(C′) to CO(C4) and NH(Ccap) to CO(C3). The propensity of Gly in αL motifs, 7.1, correlates well with the propensity of 8 observed for Gly at the C-cap position when the latter is in the αL conformation.46 However, αL motifs represent only 20% of the HXH motifs with a Gly at the linker position. This is in agreement with the limited number of αL-terminated helices followed by a second helix initiating at the next position (2%).46

    In most cases (80%), when a second helix initiates after a glycine, the dihedral angles of the glycine linker are in the βM conformation. This dihedral conformation has recently been described as a glycine-specific conformation.37 To the best of our knowledge, this specific motif, in which two α-helices are linked by a Gly residue in the βM conformation, has not been described yet. This conformation allows large bend angles (124° ± 18°) and a clockwise wobble rotation. It is the only conformation allowing such a wobble rotation, reversed as compared to the other conformations. The weight of the βM motifs is underestimated by DSSP (see Fig. 6). Nevertheless, they represent 9% of all the HXH motifs in our database, highlighting their structural importance.
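The two angles used to describe these kinks, the bend angle between the helix axes and the wobble angle describing the rotation of the C-terminal helix axis in the plane perpendicular to the N-terminal one, can be computed from axis vectors as sketched below. The sketch is not the procedure of this study: the helix axes are taken as given, the `reference` vector that fixes the zero of the wobble angle and the sign convention of the rotation are assumptions, and all names are hypothetical.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _unit(a):
    n = math.sqrt(_dot(a, a))
    return tuple(x / n for x in a)


def bend_angle(axis1, axis2):
    """Angle in degrees between two helix axes (both N- to C-terminal)."""
    cos_b = _dot(_unit(axis1), _unit(axis2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_b))))


def wobble_angle(axis1, axis2, reference):
    """Signed rotation (degrees) of axis2 about axis1, from `reference`.

    Both axis2 and reference are projected on the plane perpendicular to
    axis1. The choice of reference and the sign convention are assumptions.
    The angle is not meaningful when axis2 is parallel to axis1.
    """
    u = _unit(axis1)

    def project(v):
        d = _dot(v, u)
        return tuple(x - d * y for x, y in zip(v, u))

    p, r = project(axis2), project(reference)
    return math.degrees(math.atan2(_dot(_cross(r, p), u), _dot(r, p)))
```

With such a signed definition, a reversed wobble rotation simply appears as a change of sign of the wobble angle.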

    Because of the very limited number of Gly-containing αL⁺ motifs (20 cases), position-specific scoring matrices could not be developed for quantitative prediction of the linker conformation. However, comparison of the amino acid propensities (Table II) and hydrophobicity profiles (see Fig. 8) of αL and βM motifs clearly shows the reversed hydrophobicity of the helix following the glycine linker, especially at positions X+2 and X+4, and suggests rules of thumb to differentiate these motifs. In particular, in βM motifs, the residue located at position G+4 is hydrophobic or Ala and stabilizes the motif by interaction with hydrophobic residues in helix 1 (residues G−5 and/or G−4). Alanine, with its small size, favors the folding back of the protein backbone [Fig. 5(f)] and is usually observed for high bend angles (θb = 134° ± 8°). This position is polar and solvent exposed in αL motifs. It is also interesting to note the reversed polarity of the position preceding the glycine. Although partly exposed on the protein surface in both cases, position G−1 has a preference for hydrophobic residues in βM motifs and for polar residues in αL⁺ motifs. In particular, Ser is found at position G−1 in 30% of the αL⁺ motifs (propensity of 4.6).
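The qualitative rules stated above for glycine linkers (positions G−1 and G+4 hydrophobic or Ala in βM motifs, polar in αL motifs) can be turned into a toy voting scheme. This is only an illustration of the rules of thumb, not a validated predictor and not a method of this study: the residue classes `HYDROPHOBIC` and `POLAR`, the equal weighting of the two positions, and the function name are assumptions.

```python
# Assumed residue classes; Ala is counted with the hydrophobic residues,
# following the "hydrophobic or Ala" wording of the text.
HYDROPHOBIC = set("AVILMFWC")
POLAR = set("STNQDEKRH")


def guess_gly_linker_conformation(seq, g):
    """Vote between 'betaM' and 'alphaL' for a Gly linker at index g."""
    if seq[g] != "G":
        raise ValueError("position g must be a glycine")
    votes = 0
    for offset in (-1, 4):  # positions G-1 and G+4
        i = g + offset
        if 0 <= i < len(seq):
            if seq[i] in HYDROPHOBIC:
                votes += 1  # hydrophobic residue: argues for betaM
            elif seq[i] in POLAR:
                votes -= 1  # polar residue: argues for alphaL
    if votes > 0:
        return "betaM"
    if votes < 0:
        return "alphaL"
    return "undetermined"


print(guess_gly_linker_conformation("AAALGKKKLAA", 4))
```

When the two positions disagree, the sketch returns no assignment rather than forcing a choice.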

    Non-Gly, non-Pro motifs

    In spite of the high number of proline- or glycine-induced kinks, about half of the HXH motifs involve neither a proline nor a glycine residue. Among these motifs, the most frequent one is α2 (40%) and the least frequent one is αL (5%). The α1, β1 and β2 motifs each have a weight in the 15–20% range. The α2 motif is the only one without a characteristic proline or glycine residue. It has, however, high propensities for Asn and His at position X (Table II). These propensities depend on the bend angle: these two residues represent 10 and 36% of linker residues when θb is lower and higher than 55°, respectively. These residues may be involved in H-bond interactions with the carbonyl group at position X−4 (Table III). These H-bonds are typical of C-capping stabilization.46 The increased propensity of His and Asn in α2sup motifs indicates that the C-capping H-bonds involving these residues stabilize the α2 motifs with high bend angles.

    Either in the β1 or β2 conformation, the absence of proline at position X+1 is usually correlated with an increased propensity of Asn, Asp, and Ser. The only exception is Asp in the β2 conformation, whose propensity is not significantly altered by the presence of Pro (Table IV). These three residues represent 38 and 55% of the β1⁺ and β2⁺ linkers, respectively, and 63 and 75% of the β1⁻ and β2⁻ linkers. The presence of proline does not alter the percentage of these residues involved in H-bonds (80 and 95% for the β1 and β2 motifs, respectively). However, it alters the H-bond patterns (Table IV). In particular, the preference for H-bonds involving the amide group at position X+3 is more marked in β1⁻ and β2⁻ motifs (80 and 92%, respectively) than in β1⁺ and β2⁺ motifs (56 and 71%, respectively) (Table IV). These H-bonds are typical of helix N-termini15,50,58–60 and appear to stabilize β1 and β2 motifs, especially in the absence of proline.

    Table V. Conformational Predictions (a)

    Motif class                   Conformation   Qobs          Qpred
    Proline-induced kinks (b)     α              0.78 ± 0.07   0.86 ± 0.06
                                  β              0.85 ± 0.07   0.77 ± 0.06
    NonP, nonG motifs (c)         α2sup          0.87 ± 0.13   0.89 ± 0.07
                                  β2             0.86 ± 0.10   0.85 ± 0.13

    (a) The conformation of the pre-Pro residue in proline-induced kinks or of the linker residue in nonPro, nonGly HXH motifs was predicted as described in Materials and Methods. The Q-scores were determined by a 10-fold cross-validation procedure.
    (b) The position-specific scoring matrices included the five positions preceding the proline residue and the three positions following it. In the α pre-Pro conformation, proline could be in α1 motifs or in contiguous helices. In the β pre-Pro conformation, proline could be either in β1 or β2 motifs.
    (c) The position-specific scoring matrices included the four positions surrounding the linker residue.

    Asn has unique properties as a helix breaker. Its propensity at the linker position is 5.6, 4.9, and 5.2 for the α2sup, β1⁻ and β2⁻ motifs, respectively. These high propensities can be related to the H-bond pattern of its side chain. It may be involved in C-capping stabilizing interactions in the α2 conformation or in N-capping stabilizing interactions in the β1 or β2 conformations. As the Asp side chain can be involved only in H-bond interactions with upstream amides, it is seldom observed in the α2 motifs but displays high propensity in the β1⁻ and β2⁻ motifs. On the other hand, although Ser can be involved in C-capping interactions in α2 motifs, it has a low propensity (0.75) in this motif. Its high propensity in the β1⁻ and β2⁻ motifs appears to be related to its capability to be involved in N-capping interactions.

    Comparison of the hydrophobicity patterns of the α2sup, β1⁻ and β2⁻ motifs [Fig. 8(c)] highlights the general properties of HXH motifs when a large bend angle is observed between two α-helices in the absence of proline or glycine breakers. In the three cases, the linker has a marked polar character and is located within a polar stretch. Position X−3 is buried and helix 1 ends at its solvent exposed side. In β1⁻ and β2⁻ motifs, helix 2 initiates from the solvent exposed side. In α2sup motifs, the solvent exposed side of helix 2 starts at position X+2. However, position X+1 displays a preference for either polar residues or alanine. This leads to a characteristic five residue long polar stretch (which may include Ala at position X+1), which appears as a hallmark of the α2sup motifs.

    The α2sup and β2⁻ motifs correspond to kinks with bend angles in the same 50°–100° range, but with different wobble values [Fig. 4(b)]. A tool able to discriminate between these two conformations should be very useful both for protein modeling and protein design. We thus tested the capability of the position-specific scoring matrices to differentiate them. Accuracy of the prediction reached 85% (Table V), indicating that this method may be used to estimate the wobble motion of kinks with bend angles in the 50°–100° range.
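The two scores reported in Table V can be illustrated with a short calculation on observed and predicted class labels. The sketch assumes the definitions commonly used in secondary structure prediction: Qobs is the fraction of the sequences observed in a class that are predicted in that class, and Qpred is the fraction of the predictions for a class that are correct. The definitions actually used are those of Materials and Methods; the labels and data below are hypothetical.

```python
def q_scores(observed, predicted, label):
    """Return (Qobs, Qpred) for one class label."""
    correct = sum(1 for o, p in zip(observed, predicted) if o == p == label)
    n_obs = sum(1 for o in observed if o == label)
    n_pred = sum(1 for p in predicted if p == label)
    q_obs = correct / n_obs if n_obs else float("nan")
    q_pred = correct / n_pred if n_pred else float("nan")
    return q_obs, q_pred


# Hypothetical labels for six test sequences.
observed = ["a", "a", "a", "a", "b", "b"]
predicted = ["a", "a", "a", "b", "b", "b"]
print(q_scores(observed, predicted, "a"))
print(q_scores(observed, predicted, "b"))
```

In a 10-fold procedure, such scores would be computed on each left-out subset and then averaged, which is one way to obtain a mean and a standard deviation for each class.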

    CONCLUSIONS

    The present work provides the first systematic description of a structural motif characterized by two helices linked by a single residue. This motif is commonly found in soluble proteins, about 10% of which possess such an HXH structure. Most importantly, only a few backbone conformations are allowed at the linker position, leading to a classification of these kinks into six classes with characteristic amino acid distributions. The α1 conformation is mainly limited to small bend angles (θb < 30°), and displays a high propensity for proline at position N3 of helix 2. Larger amplitude kinks are usually located either at Gly, Ser, Asp, or Asn residues or at positions preceding proline residues. Bend angles larger than 90° can be obtained through the βM or the β1 conformations and are usually related either to the presence of a glycine at the linker position or to the presence of a proline at the following position. It is noteworthy that, when a glycine residue is located between two α-helices, the unconventional βM conformation is more frequent than the αL conformation.

    The analysis of the HXH motifs developed here should provide useful information both for molecular modeling and for de novo design of protein structures. Among possible applications in the modeling field, it should help (1) to determine correctly the location of the putative kink between two helices when SS predictions lead to an unrealistically long helix and (2) to determine the relative orientation of the two helices when the kink position is known. In the protein design field, it will be possible to test the compatibility of a sequence with specific conformations, especially for Pro-induced kinks. The position-specific scoring matrices that we developed should be particularly useful for this purpose.

    ACKNOWLEDGMENTS

    We thank NEC Computers Services SARL (Angers,

    France) for the kind availability of a multiprocessor

    server. We thank D. Thybert for the clustering algorithm.

    J.D. was supported by fellowships from INSERM-Region

    des Pays-de-la-Loire and from the Association pour la

    Recherche sur le Cancer (ARC). J.R. is supported by a

    fellowship from CNRS.

    REFERENCES

    1. Murzin AG, Brenner SE, Hubbard T, Chothia C. SCOP: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol 1995;247:536–540.

    2. Orengo CA, Michie AD, Jones S, Jones DT, Swindells MB, Thornton JM. CATH – a hierarchic classification of protein domain structures. Structure 1997;5:1093–1108.

    3. Chou PY, Fasman GD. Prediction of protein conformation. Biochemistry 1974;13:222–245.

    4. Chou PY, Fasman GD. Conformational parameters for amino acids in helical, beta-sheet, and random coil regions calculated from proteins. Biochemistry 1974;13:211–222.

    5. Chou PY, Fasman GD. Prediction of the secondary structure of proteins from their amino acid sequence. Adv Enzymol Relat Areas Mol Biol 1978;47:45–148.


    6. Garnier J, Osguthorpe DJ, Robson B. Analysis of the accuracy and implications of simple methods for predicting the secondary structure of globular proteins. J Mol Biol 1978;120:97–120.

    7. Albrecht M, Tosatto SC, Lengauer T, Valle G. Simple consensus procedures are effective and sufficient in secondary structure prediction. Protein Eng 2003;16:459–462.

    8. Ouali M, King RD. Cascaded multiple classifiers for secondary structure prediction. Protein Sci 2000;9:1162–1176.

    9. Kabsch W, Sander C. How good are predictions of protein secondary structure? FEBS Lett 1983;155:179–182.

    10. Cuff JA, Barton GJ. Evaluation and improvement of multiple sequence methods for protein secondary structure prediction. Proteins 1999;34:508–519.

    11. Cuff JA, Barton GJ. Application of multiple sequence alignment profiles to improve protein secondary structure prediction. Proteins 2000;40:502–511.

    12. Frishman D, Argos P. Seventy-five percent accuracy in protein secondary structure prediction. Proteins 1997;27:329–335.

    13. Rost B, Sander C. Improved prediction of protein secondary structure by use of sequence profiles and neural networks. Proc Natl Acad Sci USA 1993;90:7558–7562.

    14. Salamov AA, Solovyev VV. Prediction of protein secondary structure by combining nearest-neighbor algorithms and multiple sequence alignments. J Mol Biol 1995;247:11–15.

    15. Wilson CL, Hubbard SJ, Doig AJ. A critical assessment of the secondary structure alpha-helices and their termini in proteins. Protein Eng 2002;15:545–554.

    16. Brazhnikov EV, Efimov AV. [Structure of alpha-spiral hairpins with short connections in globular proteins]. Mol Biol (Mosk) 2001;35:100–108 (in Russian).

    17. Engel DE, DeGrado WF. Alpha-alpha linking motifs and interhelical orientations. Proteins 2005;61:325–337.

    18. Lahr SJ, Engel DE, Stayrook SE, Maglio O, North B, Geremia S, Lombardi A, DeGrado WF. Analysis and design of turns in alpha-helical hairpins. J Mol Biol 2005;346:1441–1454.

    19. Hobohm U, Sander C. Enlarged representative set of protein structures. Protein Sci 1994;3:522–524.

    20. Kabsch W, Sander C. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 1983;22:2577–2637.

    21. Kahn PC. Defining the axis of a helix. Comput Chem 1989;13:185–189.

    22. Eisenberg D, Schwarz E, Komaromy M, Wall R. Analysis of membrane and surface protein sequences with the hydrophobic moment plot. J Mol Biol 1984;179:125–142.

    23. Kyte J, Doolittle RF. A simple method for displaying the hydropathic character of a protein. J Mol Biol 1982;157:105–132.

    24. Hopp TP, Woods KR. Prediction of protein antigenic determinants from amino acid sequences. Proc Natl Acad Sci USA 1981;78:3824–3828.

    25. Hubbard SJ, Beynon RJ, Thornton JM. Assessment of conformational parameters as predictors of limited proteolytic sites in native protein structures. Protein Eng 1998;11:349–359.

    26. Lee B, Richards FM. The interpretation of protein structures: estimation of static accessibility. J Mol Biol 1971;55:379–400.

    27. McDonald IK, Thornton JM. Satisfying hydrogen bonding potential in proteins. J Mol Biol 1994;238:777–793.

    28. Heine A, Canaves JM, von Delft F, Brinen LS, Dai X, Deacon AM, Elsliger MA, Eshaghi S, Floyd R, Godzik A, Grittini C, Grzechnik SK, Guda C, Jaroszewski L, Karlak C, Klock HE, Koesema E, Kovarik JS, Kreusch A, Kuhn P, Lesley SA, McMullan D, McPhillips TM, Miller MA, Miller MD, Morse A, Moy K, Ouyang J, Page R, Robb A, Rodrigues K, Schwarzenbacher R, Selby TL, Spraggon G, Stevens RC, van den Bedem H, Velasquez J, Vincent J, Wang X, West B, Wolf G, Hodgson KO, Wooley J, Wilson IA. Crystal structure of O-acetylserine sulfhydrylase (TM0665) from Thermotoga maritima at 1.8 Å resolution. Proteins 2004;56:387–391.

    29. Johnson KA, Angelucci F, Bellelli A, Herve M, Fontaine J, Tsernoglou D, Capron A, Trottein F, Brunori M. Crystal structure of the 28 kDa glutathione S-transferase from Schistosoma haematobium. Biochemistry 2003;42:10084–10094.

    30. Polekhina G, Board PG, Gali RR, Rossjohn J, Parker MW. Molecular basis of glutathione synthetase deficiency and a rare gene permutation event. EMBO J 1999;18:3204–3213.

    31. Miller DJ, Ouellette N, Evdokimova E, Savchenko A, Edwards A, Anderson WF. Crystal complexes of a predicted S-adenosylmethionine-dependent methyltransferase reveal a typical AdoMet binding domain and a substrate recognition domain. Protein Sci 2003;12:1432–1442.

    32. Zhong W, Alexeev D, Harvey I, Guo M, Hunter DJ, Zhu H, Campopiano DJ, Sadler PJ. Assembly of an oxo-zirconium(IV) cluster in a protein cleft. Angew Chem Int Ed Engl 2004;43:5914–5918.

    33. Strater N, Schnappauf G, Braus G, Lipscomb WN. Mechanisms of catalysis and allosteric regulation of yeast chorismate mutase from crystal structures. Structure 1997;5:1437–1452.

    34. Fodje MN, Al-Karadaghi S. Occurrence, conformational features and amino acid propensities for the pi-helix. Protein Eng 2002;15:353–358.

    35. Karplus PA. Experimentally observed conformation-dependent geometry and hidden strain in proteins. Protein Sci 1996;5:1406–1420.

    36. Ho BK, Brasseur R. The Ramachandran plots of glycine and pre-proline. BMC Struct Biol 2005;5:14.

    37. Lovell SC, Davis IW, Arendall WB, III, de Bakker PI, Word JM, Prisant MG, Richardson JS, Richardson DC. Structure validation by Calpha geometry: phi, psi and Cbeta deviation. Proteins 2003;50:437–450.

    38. Adzhubei AA, Sternberg MJ. Left-handed polyproline II helices commonly occur in globular proteins. J Mol Biol 1993;229:472–493.

    39. Cubellis MV, Caillez F, Blundell TL, Lovell SC. Properties of polyproline II, a secondary structure element implicated in protein-protein interactions. Proteins 2005;58:880–892.

    40. Schellman C. The αL-conformation at the ends of helices. In: Jaenicke R, editor. Protein folding. Amsterdam: Elsevier; 1980. pp 53–61.

    41. Aurora R, Srinivasan R, Rose GD. Rules for alpha-helix termination by glycine. Science 1994;264:1126–1130.

    42. Cubellis MV, Cailliez F, Lovell SC. Secondary structure assignment that accurately reflects physical and evolutionary characteristics. BMC Bioinformatics 2005;6 (Suppl 4):S8.

    43. Richardson JS, Richardson DC. Amino acid preferences for specific locations at the ends of alpha helices. Science 1988;240:1648–1652.

    44. Kumar S, Bansal M. Dissecting alpha-helices: position-specific analysis of alpha-helices in globular proteins. Proteins 1998;31:460–476.

    45. Engel DE, DeGrado WF. Amino acid propensities are position-dependent throughout the length of alpha-helices. J Mol Biol 2004;337:1195–1205.

    46. Gunasekaran K, Nagarajaram HA, Ramakrishnan C, Balaram P. Stereochemical punctuation marks in protein structures: glycine and proline containing helix stop signals. J Mol Biol 1998;275:917–932.

    47. Kumar S, Bansal M. Geometrical and sequence characteristics of alpha-helices in globular proteins. Biophys J 1998;75:1935–1944.

    48. Sharma V, Sharma S, Hoener zu Bentrup K, McKinney JD, Russell

    DG, Jacobs WR, Jr, Sacchettini JC. Structure of isocitrate lyase