Proc. Natl. Acad. Sci. USA
Vol. 83, pp. 2521-2525, April 1986
Evolution

Evolutionary origin of autoreactive determinants (autogens)
(variability/autoantigenicity/cytochrome c/lysozyme)

THOMAS KIEBER-EMMONS AND HEINZ KOHLER

Department of Molecular Immunology, Roswell Park Memorial Institute, New York State Department of Health, 666 Elm Street, Buffalo, NY 14263

Communicated by E. Margoliash, November 15, 1985

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.

ABSTRACT The question addressed in this report focuses on the autoantigenicity of self antigens, principally cytochrome c and lysozyme. Of interest is whether the immune system produces autoantibodies to its host proteins reacting randomly with all potential antigen sites or is autoreactively selective for certain determinants. Based on experimental evidence from autoantibodies against cytochromes c, Jemmerson and Margoliash [Jemmerson, R. & Margoliash, E. (1979) Nature (London) 282, 468-471] have described a striking correlation between autoreactive sequence regions and evolutionary instability. While their analysis of evolutionary variation was based on simple sequence variability plots, we present here a refined approach that takes into account the distinction between evolutionary substitutions that induce a change in the protein surface and those that do not (surface-neutral substitutions). A quantitative aspect of surface variation (surface consensus) is included in the algorithm that produces a ranked order for autoantigenic determinants. The final plot, called surface variability, indicates sequence regions having a preference for autoimmune reaction. We propose the term "autogen" to designate such protein determinants.

Autoimmune disease is thought to result from an abnormal reaction of the immune system against self. The maintenance of self-tolerance is the result of complex immune regulatory mechanisms established by the immune network (1). The concept of regulatory control mechanisms implies that the immune system is intrinsically able to recognize self. However, it is not clear whether self determinants are randomly selected by the immune system as targets or whether certain determinants are more prone to become autoantigenic.

A favored hypothesis for autoreactivity states that immunological cross-reactivity between self and foreign antigen structures can bypass normal control mechanisms preventing autoimmune response. In an alternative explanation for the molecular basis of autoreactivity, the coevolution of self components and the immune system has been proposed to be a determining factor (2-6), leading to the proposition of a multideterminant regulatory model for autoreactivity (2). This model implies that the response to self proteins has been directed against those sequence regions that exhibit the highest evolutionary instability. It has been argued that insufficient time in evolution did not permit the elimination of self-reactive clones recognizing recent evolutionary substitutions in proteins (2, 3).

Typically, the evolutionary diversity of proteins is evaluated by the simple analysis of sequence variation (7). Jemmerson and Margoliash (3), using the variability approach of Wu and Kabat (7), demonstrated several sequence areas of variability on mammalian cytochrome c. Three of these peak areas coincided with experimentally determined autoreactive determinants in rabbits. In a predictive manner, the analysis of sequence variability does not, in general, discriminate between substitutions affecting and not affecting the antigenic structural properties of proteins. While an amino acid outside of an epitope may not directly affect determinant recognition (8, 9), such surface-neutral substitutions are registered by sequence variability plots. This condition is best typified by immunoglobulins whereby the sequence variability of the hypervariable regions can affect either antibody binding (10) or idiotypic activity (11, 12). Idiotypic determinants are autoantigens recognized by the immune system. The surface nature of idiotypic sequences is evidenced by anti-peptide antibodies interacting with intact immunoglobulin molecules (13-18) and analysis of crystallographic structures of Fab fragments (19, 20) and related proteins (21). Herein, a method is described that does not score surface-neutral substitutions in the analysis of evolutionary variability. This approach produces a representation of the shape variability of evolutionarily related proteins. Following the rationale of Jemmerson and Margoliash (3), the variation in the protein surface of cytochrome c and avian lysozyme is used to evaluate cross-reactivities as well as distinctive specificities for the respective protein families, subsequently designating possible autoreactive loci for the families.

RATIONALE AND METHODS

The approach for identifying possible autoreactive sequence regions from primary sequence data presumes that the autoreactive sequence repertoire is a subset of a continuum of potential antigenic sites on the protein surface. Therefore, the selection of an autoreactive locus is contingent on the evolutionary diversity in the surface features of the protein family. Thus, the identification of potential autoreactive loci may be inferred by evaluating those substitutions that affect the surface properties of the protein family. The described approach quantitates a relationship between intrinsic and extrinsic factors associated with protein antigenic structures (22).

Comparisons of antibody response to panels of evolutionarily variant proteins have demonstrated, by effects on cross-reactivities, that ~80% of the residue substitutions scattered around the protein surface were detectable by antibodies (2, 23). In general, residue variability is most likely to persist in those regions of the molecule where local changes in conformation can be tolerated. Differences in surface properties of variant proteins can be identified by comparison of hydropathy profiles of homologous sequence segments (24). Such a comparison accounts for substitutions of polar and nonpolar residues, which are both found on the protein surface, with both residue types implicated in protein antigenicity. Residue substitutions affecting idiotypic behavior in immunoglobulins (13, 15, 17, 25) have been shown to be associated with changes in hydropathic properties (26) such as Gly → Arg (25) and Val → Lys (15). Consequently, it is suggested that those hydropathic regions exhibiting the highest evolutionary diversity are likely to participate in
autoreactive determining regions by virtue of these regions being more apt to be surface exposed.

The polymorphism in surface properties in evolutionarily variant proteins can be assessed by comparing the thermodynamic similarity of homologous residue sequences by means of a sequence-independent parameter, the hydration parameter (27). The resulting hydropathy profiles are then averaged, describing a consensus hydropathy profile for the family. The variation in the set of hydropathy profiles is then determined, defining the hydropathic variable regions. The product of the average hydropathy profile for the family and the variability profile relates the extent of hydropathy to the extent of variability in defining surface-variable regions within the protein family.

The partial stripping away of solvent water from interacting groups is a necessary requirement in many processes of biological recognition. The equating of affinity relationships between amino acids and water with the relative distribution of amino acids between surfaces and interiors of native globular proteins (27-32) has indicated a sharp bias in the genetic code (27-33), thereby maintaining protein stability through evolution. Consequently, the thermodynamic similarity of segments of proteins can be demonstrated by comparing the relative hydropathy profiles of highly homologous proteins (24).

Since hydration potentials have been equated with the spatial distribution of residues with respect to solvent exposure in globular proteins, such relative hydropathy profiles provide a general pattern of the protein sequence properties characteristic for the homologous family. The average of such aligned hydrophilicity profiles defines a consensus hydropathy profile for the protein set and describes the structure of the set of sequence regions more apt to be on the protein exterior or surface. While hydrophilicity does not in itself adequately predict antigenic regions on proteins, polar regions are still representative of epitope regions (22, 34). Hydrophilic properties of sequence regions are perhaps more representative of the mobility properties of sequence regions, which in turn have a more fundamental relationship to antigenicity (35, 36). Since thermomobility parameters are only obtainable from high-resolution solutions of crystal structures, hydrophilicity is the only biophysical index presently available for de novo predictions. It will be demonstrated that the variability in the hydropathy profile of primary sequences within a protein family is a better indicator of antigenic regions than hydrophilicity profiles or sequence-variability plots by themselves. This approach is applied to the amino acid sequences of cytochrome c from 17 mammalian species and 6 avian species as well as to lysozyme c from 9 avian species.

Hydropathy Profiles. In the first step, for each protein in the respective samples, the calculation of the hydropathy profile follows that of Hopp and Woods (37), whereby the hydrophilicity index at each residue position is the average over six sequential residues (n to n + 5, starting at position n). The summation method accounts for the influence of the short- and medium-range environment on the hydropathic surface and imparts a region-specific description of sequence regions more apt to be involved in the expression of epitopes. For convenience, the first residue position was chosen to initiate the plots. In the calculation of the hydrophilicity profiles, five tailing glycines were added to the protein sequences and included in the calculation. The choice of glycine for this padding is arbitrary. The addition of five glycines provides for a continuum of sequences in the hydropathy profile calculation for the last five residues in the actual sequences. The addition of the tailing end, while artificial, does account for the relative difference in the last five residues of the actual sequence if substitutions are evident in this region.
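The window averaging described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' program: the function name `hydropathy_profile`, its parameters, and the `TOY_SCALE` dictionary are hypothetical, and the toy numbers are placeholders rather than the hydration potentials of Wolfenden et al. (27) or Moews et al. (24).

```python
from typing import Dict, List


def hydropathy_profile(seq: str, scale: Dict[str, float],
                       window: int = 6, pad: str = "G") -> List[float]:
    """Return one window-averaged index per residue of `seq`.

    The index at position n is the mean of the scale values for
    residues n .. n + window - 1. The sequence is extended with
    window - 1 copies of `pad` (five glycines for a window of six)
    so that the last residues also receive a full window.
    """
    padded = seq + pad * (window - 1)
    values = [scale[res] for res in padded]
    return [sum(values[i:i + window]) / window for i in range(len(seq))]


# Placeholder scale for illustration only (NOT published hydration potentials).
# Positive values stand for hydrophilic residues, matching the reversed sign
# convention described in the text.
TOY_SCALE = {"G": 0.0, "A": -1.0, "K": 3.0, "D": 2.0}

profile = hydropathy_profile("AKDA", TOY_SCALE)
print(profile)
```

Passing the scale as an argument keeps the sketch independent of any particular set of hydration potentials; a real analysis would supply the full 20-residue table.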

The hydration potentials reported by Wolfenden et al. (27) were used in the calculation of the hydropathy profiles. Hydration potentials for arginine and proline were those suggested by Moews et al. (24). A reverse sign convention from those reported was adopted so that the hydrophilicity profiles are in the positive direction. These hydration potentials have been shown to be well-correlated with residue-exposed surface areas (27) and protein conformation properties (24). Therefore, the advantage of these potentials in calculating hydropathy profiles is that these profiles are intrinsically correlated to protein structure and constitute a consistent set of values to evaluate protein surface structures reflective of epitope regions. These profiles are thus termed surface profiles.

Consensus and Variable Hydropathy. In the second step, the hydropathy profiles of the respective proteins in the sample are aligned according to established sequence alignments (38-40). A consensus profile is determined by calculating the average of the aligned hydropathy profiles. A variability index is then calculated and plotted for each residue position in the aligned profiles according to the formula of Wu and Kabat (7), i.e., the number of different hydropathy indexes for a given residue position divided by the frequency of the most common hydropathy index at that position. This plot is a measure of the variation in the hydropathy profiles of the sample and therefore depicts hydropathic variable regions.
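The second step can be sketched in the same spirit. The code below assumes the profiles are already aligned and of equal length. The function names are hypothetical, and the rounding used to decide when two hydropathy indexes count as "different" is an assumption, since the text does not state how equality was judged. The frequency of the most common index is taken as its count divided by the number of sequences, following the usual reading of the Wu and Kabat formula.

```python
from collections import Counter
from typing import List, Sequence


def consensus_profile(profiles: Sequence[Sequence[float]]) -> List[float]:
    """Arithmetic mean of the aligned profiles at each position."""
    return [sum(col) / len(col) for col in zip(*profiles)]


def variability_index(profiles: Sequence[Sequence[float]],
                      ndigits: int = 2) -> List[float]:
    """Wu-Kabat style variability applied to hydropathy indexes.

    variability = (number of different indexes at a position)
                  / (frequency of the most common index at that position)
    Indexes are rounded to `ndigits` before comparison (an assumption).
    """
    n_seqs = len(profiles)
    result = []
    for col in zip(*profiles):
        counts = Counter(round(v, ndigits) for v in col)
        most_common_count = counts.most_common(1)[0][1]
        # Dividing by (most_common_count / n_seqs) is the same as
        # multiplying by n_seqs / most_common_count.
        result.append(len(counts) * n_seqs / most_common_count)
    return result


# Three toy aligned profiles, two positions each (illustrative numbers).
toy_profiles = [[0.5, 1.0], [0.5, 2.0], [0.5, 1.0]]
consensus = consensus_profile(toy_profiles)
variability = variability_index(toy_profiles)
print(consensus, variability)
```

In the toy data the first position is invariant and receives the minimum variability of 1, while the second position carries two distinct indexes and scores higher.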

Surface Variability and Autoantigenicity. In the final step, the relative hydrophilicity of these variable regions is evaluated by calculating the product of the variability index and the average hydrophilicity index for each residue position in the family. This product quantitates the relative hydrophilicity changes accompanying residue substitution. The variability of the protein surface within an evolutionarily related protein family indicates evolutionary instabilities and diversity. Such substitutions define target determinants for autoreactive immune responses (2, 3, 13-18, 26). The product of the hydrophilicity consensus and the surface variability describes autogenic loci. The approach permits a more concise description of those regions that are more prone to be autoreactive targets than those observable from sequence variability plots alone. These regions are termed autogenic or, in short, autogens. This approach has been shown to be capable of identifying idiotypic-determining regions, which are autogens on immunoglobulins (26), in agreement with locations defined by anti-peptide antibodies against intact immunoglobulin molecules and sequence analysis (13-18).
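The final step is a position-by-position product, which might be sketched as below. The function name `surface_variability` and the `scale_factor` parameter are hypothetical; the factor is included only because the figure legend mentions that the plotted product is scaled by 10. The input numbers are illustrative, not taken from the paper.

```python
from typing import List, Sequence


def surface_variability(consensus: Sequence[float],
                        variability: Sequence[float],
                        scale_factor: float = 1.0) -> List[float]:
    """Product of mean hydrophilicity and variability index per position."""
    if len(consensus) != len(variability):
        raise ValueError("profiles must be aligned to the same length")
    return [scale_factor * c * v for c, v in zip(consensus, variability)]


# Illustrative inputs: a conserved position and a variable, hydrophilic one.
toy_consensus = [0.5, 1.2]
toy_variability = [1.0, 3.0]
scores = surface_variability(toy_consensus, toy_variability, scale_factor=10.0)

# Rank positions (1-based) from highest to lowest score.
ranked = sorted(range(1, len(scores) + 1), key=lambda p: scores[p - 1], reverse=True)
print(scores, ranked)
```

Sorting positions by the product gives the kind of ranked order for candidate determinants that the abstract describes, with the caveat that peak grouping and any thresholds are left out of this sketch.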

RESULTS

A large number of sequences of eukaryote cytochromes c have been determined (38, 39). High-resolution analysis of crystal structures of cytochromes c (41) shows that these proteins exhibit the same fold, with the only differences being observed for substituted side chains. The antigenic properties of evolutionarily variant cytochromes have been studied (2, 3), identifying autoreactive loci in the regions of 89-92, 60-62, and around residue position 44. A Wu and Kabat (7) variability plot of mammalian cytochrome c sequences has shown that these autoreactive regions are defined by high sequence variability (3).

The thermodynamic differences associated with evolutionary substitutions are assessed by comparing hydropathy profiles for each member of the sample. The consensus (average) hydropathy profile in Fig. 1 illustrates the hydropathic surface pattern within the protein family, which may influence the protein antigenicity of cytochrome c. This profile delineates the possible hydrophilic surface repertoire within the protein family and is different from typical hydrophilicity plots of a single sequence (37). It is possible that if a position is highly variable, the mean hydrophilicity may be close to zero because the larger positive and negative values of individual sequences average out. Such additive effects would imply that the hydrophilic surface changes (i.e., thermodynamic changes) accompanying residue substitutions within the family are not important to the intrinsic protein structure-function relationships of each individual species. Therefore, the consensus profile provides a measure of potential hydrophilic regions that are persistent in evolution. This information is not obtainable from individual hydropathy profiles.

FIG. 1. Mean hydropathy profile of 17 mammalian cytochrome c sequences. The hydrophilicity of each sequence was determined by using the algorithm of Hopp and Woods (37). The arithmetic mean of the hydrophilicity values for each residue is plotted against sequence position.

It has been tempting to relate hydrophilicity analysis to antigenicity (22, 34, 37). On face value, the characterization of the consensus plot suggests that within the family, NH2-terminal regions encompassing positions 89, 70, 50, and 100 are potential candidates for antigenic sequences. An insoluble synthetic peptide fragment encompassing the region around position 50 of beef cytochrome c has been shown to bind to rabbit antibodies (42). However, synthetic fragments of the regions around positions 89 and 62 were not found to bind to the appropriate antibodies (42).

The calculation of the variability in the sample hydropathy profile defines variable hydropathic regions. The product of the variability index for each residue position and the average hydropathy index for that position quantitates the amount of variability and extent of hydropathy in the consensus hydropathy profile of the family (Fig. 2). Therefore, the product accounts for the relative hydropathy of the variable regions. The method for calculating hydropathy profiles averages over groups of six residues (n to n + 5 for position n), resulting in nominal variability for 5-6 residues to the left of the truly variable positions 62 and 89. The peak at position 58 encompasses positions 58-63 and the peak at position 88 encompasses positions 88-93. The peak at position 44 encompasses residue positions 44-50.

FIG. 2. Surface variability plot for 17 mammalian cytochrome c sequences. The product of the variability index and the mean hydrophilicity index is calculated for each residue position and plotted against sequence position. The product is scaled by a factor of 10 in the plot.

An interesting observation stems from the relative peak heights in Fig. 2. The relative differences indicate a rank ordering in autogen recognition, which has been observed in binding assays of rabbit anti-cytochrome c to rabbit cytochrome c, using as competitive inhibitors various mammalian cytochromes c (3). In these experiments, it was shown that mouse cytochrome c, which differs from rabbit cytochrome c at positions 44 and 89, is a more effective competitor with rabbit cytochrome c than guanaco cytochrome c, which differs from rabbit at positions 62 and 89. These results indicate that the anti-rabbit antibody population distinguished the residue change at position 62 in guanaco cytochrome more effectively than the residue change at position 44 in mouse. The relative peak heights in Fig. 2 suggest this effect, even though the difference between rabbit cytochrome c and guanaco cytochrome involves a minute side-chain difference in going from aspartic acid to glutamic acid, respectively, at position 62. Therefore, Fig. 2 implies that those cytochromes with joint changes in the highly surface-variable regions around positions 62 and 89 would be less competitive with rabbit cytochrome c than those cytochromes exhibiting residue substitutions at regions 44 and 89. Consequently, joint changes in all three regions, such as in beef cytochrome c, would result in these proteins being even less autologous inhibitors in the competitive binding assay.

The pattern in the relative peak height corresponds to the trend in the competitive binding assay (3). The analysis of the surface variability in the cytochrome family therefore quantitates a relationship between protein surface properties and autoimmune response. These aspects are not observable from sequence variability plots. In addition, Fig. 2 designates possible fine surface specificities upon substitutions. The effect of substitutions at position 44 changes the hydrophilicity pattern more in the COOH terminus direction from this position, while the substitutions at positions 62 and 89 affect the NH2-terminal sides of these positions. That is, the effect on the summation of those hydrophilicity indexes to the left of position 44 is not as great as the effect on the summation to its right, which includes position 44. The substitutions at positions 62 and 89 have the opposite effect.
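The window bookkeeping behind the peak positions quoted above can be checked with a small sketch. It assumes only the six-residue window (n to n + 5) stated in the methods; the helper names `window_span` and `affected_positions` are hypothetical and positions are 1-based.

```python
from typing import List, Tuple


def window_span(n: int, window: int = 6) -> Tuple[int, int]:
    """Residues averaged into the profile value plotted at position n."""
    return (n, n + window - 1)


def affected_positions(p: int, window: int = 6) -> List[int]:
    """Profile positions whose window contains residue p.

    A substitution at residue p changes the plotted value at p and at
    up to window - 1 positions on its NH2-terminal (left) side.
    """
    return list(range(max(1, p - window + 1), p + 1))


span_58 = window_span(58)          # residues covered by the value at 58
span_88 = window_span(88)          # residues covered by the value at 88
touched_by_62 = affected_positions(62)
touched_by_89 = affected_positions(89)
print(span_58, span_88, touched_by_62, touched_by_89)
```

Under this convention the value plotted at position 58 averages residues 58-63 and therefore includes residue 62, and the value at 88 averages residues 88-93 and includes residue 89, consistent with the peak ranges given in the text.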

In Fig. 3, autogen loci are illustrated for avian cytochrome c derived from six bird species for avian immune hosts. An analysis of the mouse B-cell response to pigeon cytochrome c suggests that at least three residue positions (3, 100, and 104) are critical for the antigenic determinant (2, 43). In Fig. 3, these positions are included in the peak regions. Furthermore, of interest is the variable hydrophilic region around position 87-88. While it has been shown that antibodies against pigeon cytochrome c recognize determinants created by portions of the molecule (e.g., segments 1-65 and 66-104), most of the reaction is observed with the 66-104 segment. This type of response is suggested from the plot.

FIG. 3. Surface variability plot for six avian species. See legend for Fig. 2.

Similar to cytochromes c, lysozymes are excellent antigens whose cross-reactivities and distinct specificities have been extensively studied (5, 44, 45) and recently reviewed (2). Chicken lysozyme c from hen egg white (HEL) has long served as a prototype protein for investigating the specificity of immune recognition. The question of how much surface of HEL is antigenic has been controversial and extensively discussed (2).

In Fig. 4, the surface-variability profile of nine avian lysozymes (40) is depicted. The principal residue positions responsible for cross-reactivities have been reviewed (2), with the peak positions in Fig. 4 in good agreement with the experimental data. In addition, suspected autogenic locations (44) overlap with determinants observed on peptide fragments that have been shown to be antigenic when tested against anti-HEL (2). Analysis of surface accessibilities for the major peak areas in Fig. 4 shows the major regions to be surface exposed (46). Furthermore, the peak areas correspond to highly mobile areas, as deemed by temperature-factor analysis (36). While monoclonal antibodies have indicated positions 102-103 as antigenic positions (2), differences in the binding affinity of substrate analogues to lysozymes correlate with mutations at these positions (47). Therefore, these positions are more associated with the binding site of HEL, with this recessed nature suggested in Fig. 4. Recently, monoclonal antibodies have detected an antigenic site associated with position 91 (45), as indicated in Fig. 4, as well as for positions 38-45 (48).

FIG. 4. Surface variability plot for nine avian lysozymes. See legend for Fig. 2.

DISCUSSION

From a recent survey on the antigenicity of proteins and peptides, it appears that essentially all surface-exposed regions of proteins can be recognized by the immune system (2). The question addressed here focuses on the autoantigenicity of proteins. Does the immune system produce autoantibodies to host proteins that react with all potential antigen sites or is autoreactivity selective with respect to protein determinants? In other words, are there sequence regions in self-antigens that are preferred by an autoreactive immune response or is the selection of autoantigenic determinants random? Based on experimental evidence from autoantibodies against cytochromes c, Jemmerson and Margoliash (3) have described a striking correlation between autoreactive sequence regions and evolutionary instability. While their analysis of evolutionary variation was based on simple sequence-variability plots [Wu-Kabat plot (7)], we present a refined approach that takes into account the distinction between evolutionary substitutions that induce a change of the protein surface and those that do not (surface-neutral substitutions). A quantitative aspect of surface variation (surface consensus) is included in our algorithm. The final step produces a plot called surface variability, which indicates sequence regions having a preference for autoimmune reaction. We propose the term "autogen" to designate such protein determinants.

To a large extent, the use of panels of evolutionarily variant
proteins to define autoreactive sites implies that the sites should occur in variable regions. However, not all residue substitutions affect the autoantigenic repertoire. Simple variability plots do not distinguish between evolutionary substitutions that affect surface characteristics and those that do not. For example, classical sequence-variability analysis (7) cannot distinguish those residues in the hypervariable region of immunoglobulins that contribute to idiotypic diversity (surface antigenic determinants) from those that participate in antigen binding and selection. Presuming that antibodies recognize surface determinants, the identification of those evolutionarily variable sequence regions that are more apt to be on the exterior of a protein is a critical step toward understanding the potential antigenic repertoire for autoreactivity. The variability in the hydropathy profiles of evolutionarily distant members of a protein family is a refined approach to identify autoreactive regions. However, the complete description of autogen loci may entail the structural configuration of regions remote in sequence to the primary recognition regions.

For highly conserved proteins such as cytochrome c, polyclonal antisera can be fractionated into distinct nonoverlapping specificities that correspond to sites of sequence variation. Experiments designed to examine the full repertoire of monoclonal antibody specificities to protein antigens have shown complex patterns of overlap (2), as typified by lysozyme (45). Figs. 2-4 indicate the possible effects of single-residue substitutions on the immediate environment. Such effects may allow a single residue to be contained in several overlapping unit determinants. Consequently, it is conceivable that the number of discrete determinants is greater than the number of surface residues (2).

The predictive plot (Fig. 3) for avian cytochrome c implicates those regions recognized by the immune system of bird species. While position 89 overlaps with mammalian cytochrome c, other sites may be evident if avian cytochrome c is confronted with the mammalian immune system (2). The more differences that exist between avian and mammalian sequences, the more regions are likely to be recognized as antigenic by the mammalian immune system, and vice versa.

According to the multideterminant regulatory model (2), evolutionary variation determines what is antigenic and what is tolerogenic. Sequence regions of a protein family that are evolutionarily conserved due to structural and functional constraints induce tolerance, whereas highly variable regions remain antigenic. Naturally, evolutionary variation is permissive only in regions that are not important for maintaining protein function or structure. Therefore, regions that exhibit variability tolerate local changes in conformation and, in turn, correlate with thermal mobility (35). However, there may also be mobile regions that are intimately related to the function of the protein. This type of mobility is not stochastic in nature, but it is inherent to the function of the protein. Allosteric mobility is an example of nonstochastic mobility. Therefore, methods for identifying antigenic regions based on

Proc. Natl. Acad. Sci. USA 83 (1986)

Page 5: Evolutionary origin of autoreactive determinants (autogens)

Proc. Natl. Acad. Sci. USA 83 (1986) 2525

mobility plots per se (35) may implicate more regions asantigenic than are warranted.Although the possibility of eliciting antibody responses to

conserved regions has been documented (49), these studiesinvolve the presentation of sequence regions to the immunesystem in a nonphysiological manner. By denaturing proteinsthrough extensive crosslinking or presenting only fragments,the pattern of natural antigenicity may be destroyed andtherefore any protein region can, in principle, be renderedantigenic.The approach of using the evolutionary surface variability

for the identification of autogenic loci, as illustrated in Figs.2-4, requires sufficient sequence information on proteinfamilies obtained from evolutionarily distant species. Thedescribed approach can be used as a predictive analysis ofautoantigenic determinants for other proteins that are incontact with the immune system and have the potential ofbeing autoantigenic. The immunoglobulins themselves, viatheir idiotypic determinants, are a prime example for selfdeterminants being recognized by the immune system (50).Furthermore, the question of the evolutionary contact of theprotein family with the immune system must be carefullyconsidered. Cytochromes, lysozymes, hemoglobulins, immu-noglobulins, and major histocompatibility complex proteinsare examples of proteins that have evolutionary interactionswith the immune system. The relationship between what theimmune system recognizes and which structures are highlyvariable in evolution led to the proposal of a mechanism ofevolutionary surveillance for proteins that are in contact withthe immune system (51). It will be interesting to search forcoevolutionary events in protein families and the immunerecognition genes by using the described analysis of evolu-tionarily linked antigenicity.
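The contrast drawn in this discussion between simple sequence variability and variability in hydropathy profiles can be illustrated with a minimal sketch. It is not the algorithm used for Figs. 2-4 and omits the surface-consensus step. The first function follows the Wu-Kabat definition (7): the number of different residues at a position divided by the frequency of the most common one. The second measures the variance of smoothed hydropathy across aligned sequences. The Kyte-Doolittle scale, the window size, all function names, and the toy alignment are assumptions introduced here for illustration only.

```python
from collections import Counter
from statistics import mean, pvariance

# Kyte-Doolittle hydropathy values: an assumed stand-in scale,
# not necessarily the scale used for the plots in this paper.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def wu_kabat_variability(column):
    """Distinct residues at a position / frequency of the most common residue."""
    counts = Counter(column)
    top_frequency = counts.most_common(1)[0][1] / len(column)
    return len(counts) / top_frequency

def sequence_variability(aligned):
    """Wu-Kabat variability for every column of an ungapped alignment."""
    return [wu_kabat_variability(col) for col in zip(*aligned)]

def smoothed_hydropathy(seq, window=5):
    """Sliding-window mean hydropathy; the window is truncated at the ends."""
    half = window // 2
    values = [KD[res] for res in seq]
    return [mean(values[max(0, i - half): i + half + 1])
            for i in range(len(values))]

def hydropathy_profile_variance(aligned, window=5):
    """Per-position variance of the smoothed hydropathy across species."""
    profiles = [smoothed_hydropathy(seq, window) for seq in aligned]
    return [pvariance(col) for col in zip(*profiles)]

# Hypothetical toy alignment (not real cytochrome c or lysozyme data).
# Column index 2 varies among L/I/V (hydropathy nearly unchanged);
# column index 3 has one D -> A change (large hydropathy shift).
aligned = ["AKLDE", "AKIDE", "AKVDE", "AKLAE"]

if __name__ == "__main__":
    print("sequence variability:", sequence_variability(aligned))
    print("hydropathy variance: ",
          hydropathy_profile_variance(aligned, window=1))
```

In this toy case the L/I/V column ranks highest by sequence variability, whereas the column with the D-to-A change ranks highest by hydropathy variance. This is the kind of substitution that a plain variability plot registers without regard to its effect on surface character.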

We thank Mrs. Cheryl Zuber for preparation of the manuscript. This work was supported in part by Grant AG04180 from the National Institute on Aging, Department of Health and Human Services, and by American Cancer Society Institutional Research Grant IN-54X. This work was partially supported by NASA-NSG-7305 to R. Rein (Unit of Theoretical Biology).

1. Fishbach, M. & Talal, N. (1984) in Idiotypy in Biology and Medicine, eds. Kohler, H., Urbain, J. & Cazenave, P. A. (Academic, New York), pp. 417-428.

2. Benjamin, D. C., Berzofsky, J. A., East, I. J., Gurd, F. R. N., Hannum, C., Leach, S. J., Margoliash, E., Michael, J. G., Miller, A., Prager, E. M., Reichlin, M., Sercarz, E. E., Smith-Gill, S. J., Todd, P. E. & Wilson, A. C. (1984) Annu. Rev. Immunol. 2, 67-101.

3. Jemmerson, R. & Margoliash, E. (1979) Nature (London) 282, 468-471.

4. Ibrahimi, I. M., Prager, E. M., White, T. J. & Wilson, A. C. (1979) Biochemistry 18, 2736-2744.

5. Reichlin, M. (1975) Adv. Immunol. 20, 71-123.

6. White, T. J., Ibrahimi, I. M. & Wilson, A. C. (1978) Nature (London) 274, 92-94.

7. Wu, T. T. & Kabat, E. A. (1970) J. Exp. Med. 132, 211-250.

8. Urbanski, G. J. & Margoliash, E. (1977) J. Immunol. 118, 1170-1180.

9. Jemmerson, R. & Margoliash, E. (1979) J. Biol. Chem. 254, 12706-12716.

10. Kabat, E. A., Wu, T. T. & Bilofsky, H. (1976) Proc. Natl. Acad. Sci. USA 73, 617-619.

11. Capra, J. D. & Fougereau, M. (1983) Immunol. Today 4, 177-179.

12. Gridley, T., Margolies, M. N. & Gefter, M. L. (1985) J. Immunol. 134, 1236-1244.

13. Chen, P. P., Goni, F., Fong, S., Jirik, F., Vaughan, J. H., Frangione, B. & Carson, D. A. (1985) J. Immunol. 134, 3281-3285.

14. Chen, P. P., Fong, S., Normansell, D., Houghten, R. A., Karras, J. G., Vaughan, J. H. & Carson, D. A. (1984) J. Exp. Med. 159, 1502-1511.

15. Chen, P. P., Fong, S., Houghten, R. A. & Carson, D. A. (1985) J. Exp. Med. 161, 323-331.

16. Chen, P. P., Houghten, R. A., Fong, S., Rhodes, G. H., Gilbertson, T. A., Vaughan, J. H., Lerner, R. A. & Carson, D. A. (1984) Proc. Natl. Acad. Sci. USA 81, 1784-1788.

17. Seiden, M. V., Clevinger, B., McMillian, S., Srouji, A., Lerner, R. & Davie, J. M. (1984) J. Exp. Med. 159, 1338-1350.

18. Thielemans, K., Rothbard, J. B., Levy, S. & Levy, R. (1985) J. Exp. Med. 162, 19-34.

19. Poljak, R. J. (1984) in The Biology of Idiotypes, eds. Greene, M. I. & Nisonoff, A. (Plenum, New York), pp. 131-140.

20. Rudikoff, S. (1984) in The Biology of Idiotypes, eds. Greene, M. I. & Nisonoff, A. (Plenum, New York), pp. 115-128.

21. Chang, C.-H., Short, M. T., Westholm, F. A., Stevens, F. J., Wang, B.-C., Furey, W., Jr., Solomon, A. & Schiffer, M. (1985) Biochemistry 24, 4890-4896.

22. Berzofsky, J. A. (1985) Science 229, 932-940.

23. Prager, E. M., Welling, G. W. & Wilson, A. C. (1978) J. Mol. Evol. 10, 293-307.

24. Moews, P. C., Knox, J. R., Waxman, D. J. & Strominger, J. L. (1981) Int. J. Pept. Protein Res. 17, 211-218.

25. Radbruch, A., Zaiss, S., Kappen, C., Bruggemann, M., Beyreuther, K. & Rajewsky, K. (1985) Nature (London) 315, 506-508.

26. Kieber-Emmons, T. & Kohler, H. (1986) Immunol. Rev. 90, 29-48.

27. Wolfenden, R. V., Cullis, P. M. & Southgate, C. C. F. (1979) Science 206, 575-577.

28. Janin, J. (1979) Nature (London) 277, 491-492.

29. Chothia, C. H. (1976) J. Mol. Biol. 105, 1-12.

30. Chothia, C. H. (1975) Nature (London) 254, 304-308.

31. Argos, P. & Palau, J. (1982) Int. J. Pept. Protein Res. 19, 380-393.

32. Go, M. & Miyazawa, S. (1980) Int. J. Pept. Protein Res. 15, 211-224.

33. Lacey, J. W., Jr., & Mullins, D. W., Jr. (1983) Origins Life 13, 3-42.

34. Palfreyman, J. W., Aitcheson, T. C. & Taylor, P. (1984) J. Immunol. Methods 75, 383-393.

35. Tainer, J. A., Getzoff, E. D., Paterson, Y., Olson, A. J. & Lerner, R. A. (1985) Annu. Rev. Immunol. 3, 501-535.

36. Rose, G. D. (1978) Nature (London) 272, 586-590.

37. Hopp, T. P. & Woods, K. R. (1981) Proc. Natl. Acad. Sci. USA 78, 3824-3828.

38. Borden, D. & Margoliash, E. (1976) in Handbook of Biochemistry and Molecular Biology, Proteins, ed. Fasman, G. D. (Chemical Rubber Co., Cleveland, OH), Vol. 3, pp. 268-279.

39. Dayhoff, M. O. & Eck, R. A. (1982) Atlas of Protein Sequence and Structure (Natl. Biomed. Res. Found., Washington, DC).

40. Smith-Gill, S., Wilson, A. C., Potter, M., Prager, E. M., Feldmann, R. J. & Mainhart, C. R. (1982) J. Immunol. 128, 314-322.

41. Bernstein, F. C., Koetzle, T. F., Williams, G. J. B., Meyer, E. F., Jr., Brice, M. D., Rodgers, J. R., Kennard, O., Shimanouchi, T. & Tasumi, M. (1977) J. Mol. Biol. 112, 535-542.

42. Atassi, M. Z. (1981) Mol. Immunol. 18, 1021-1025.

43. Hannum, C., Ultee, M., Matis, L. A., Schwartz, R. H. & Margoliash, E. (1982) Advances in Experimental Medicine and Biology, ed. Atassi, M. Z. (Plenum, New York), Vol. 150, pp. 37-52.

44. Prager, E. M. & Wilson, A. C. (1971) J. Biol. Chem. 246, 5978-5989.

45. Metzger, D. W., Ching, L.-K., Miller, A. & Sercarz, E. E. (1984) Eur. J. Immunol. 14, 87-93.

46. Lee, B. & Richards, F. M. (1971) J. Mol. Biol. 55, 379-400.

47. Hornbeck, P. V. & Wilson, A. C. (1984) Biochemistry 23, 998-1002.

48. Hirayama, A., Takugaki, Y. & Karush, F. (1985) J. Immunol. 134, 3241-3247.

49. Jemmerson, R., Morrow, P. R., Klinman, N. R. & Paterson, Y. (1985) Proc. Natl. Acad. Sci. USA 82, 1508-1512.

50. Kieber-Emmons, T., Ward, R. E., Raychaudhuri, S., Rein, R. & Kohler, H. (1985) Int. Rev. Immunol. 1, 1-30.

51. Kohler, H. (1984) in Idiotypy in Biology and Medicine, eds. Kohler, H., Cazenave, P. A. & Urbain, J. (Academic, New York), pp. 3-13.
