10
On Characterization of Dose Variations of 2-D Proteomics Maps by Matrix Invariants Milan Randic ´ ,* Marjana Novic ˇ , and Marjan Vrac ˇ ko National Institute of Chemistry, Ljubljana, Slovenia Received December 23, 2001 We explore the characterization of 2-D electrophoresis proteomics maps by certain structural invariants derived from matrixes constructed by considering for all pairs of spots in a proteomics maps the shortest (Euclidean) distances and distances measured along zigzag lines connecting protein spots of the neighboring abundance. This paper is a sequel to previous papers in which we outlined the idea of characterizing 2-D proteomics maps by graph-theoretical descriptors. To illustrate the approach, we selected data of Anderson et al. (Anderson, N. L.; Esquer-Blasco, R.; Richardson, F.; Foxworthy, P.; Eacho, P. The effects of peroxisome proliferators on protein abundances in mouse liver. Toxicol. Appl. Pharmacol. 1996, 137, 75-89) on protein abundance in mouse liver under a series of dose of peroxisome proliferator LY1711883. We found strong linear correlation between the experimentally applied doses and the leading eigenvalue of a D/D-type matrix (Randic ´ , M.; Kleiner, A. F.; DeAlba, L. M. Distance/ distance matrices. J. Chem. Inf. Comput. Sci. 1994, 34, 277-286) constructed for the experimental proteomics maps. Keywords: map invariants chemical graph theory D/D matrix proteomics maps peroxisome proliferators Introduction Proteomics became one of the most expanding fields of applied biochemistry because for the first time experimental data have been collected on an abundance of proteins in single animal organ cells. The growth in accumulation of experimental data, mostly in the form of 2-D proteomics maps in which proteins are separated by charge (electrophoresis) and by mass (chromatography), is not followed by the development of theoretical methodologies that would assist in “digestion” of the plethora of data. We have recently initiated one such theoretical approach that shows promise 3-7 that is based on construction of map invariants, which are analogous to a degree to sequence invariants used for characterization of DNA 8-12 and ultimately to graph invariants used in QSAR (quantitative structure-activity relationship). 13-16 An invariant is a mathematical property of a system that may be a molecular graph, molecular structures in 3-D, DNA sequence, or a 2-D map, the special case of which are proteomics maps. The strategy that we have developed in order to arrive at map invariants is to associate with a map a graph or a zigzag line. After embedding such graph or zigzag line onto the map, one tries to associate a matrix with the so-embedded geometrical object. Matrix is a natural mathematical tool for characteriza- tion of such objects because one can associate with matrix element (i, j) information concerning pair of vertexes i and j. The elements of the particular matrix that we will consider are given as a quotient of distances. One of the distances consid- ered is the Euclidean distance between two points considered (protein spots in 2-D gel), and the other is obtained by measuring the distance along the edges of a graph or along the zigzag line. Hence, we succeeded to combine information on the adjacency of the spots along the zigzag line and on the mutual distances of all spots. In this way we obtain the so- called D/D matrix for the embedded zigzag line. The D/D matrix has been initially designed for characterization of conformations (i.e., the geometrical shape) of chainlike structures. 2,17-21 Once the D/D matrix is obtained by use of standard matrix algebra, one can extract various matrix invari- ants to serve as the map descriptor and generate additional structurally related map matrixes. The intuitive argument why the D/D matrix could in principle characterize a map is based on the fact that such a matrix combines information on the distances between the spots with the information on the sequential ordering of spots. Both the graph theoretical distance matrix 22 and the geometry- based distance matrix have been employed in chemistry for characterization of molecules. Although they are information- rich, they do not carry information on adjacencies, i.e., which atom is bonded to which. In the cases of molecules, however, the bonding pattern can be deduced by simply finding pairs of atoms associated with the shortest distances. The adjacency matrix, on the other hand, only tells which atoms in a molecule are bonded but is devoid of any information on molecular geometry. The D/D matrix combines information given by both matrixes in a single matrix, which is based on information that relates to through-space and through-bonding distances. This is essential for characterization of maps because the separations * Professor Emeritus, Department of Mathematics and Computer Science, Des Moines, IA 50311. Present address: 3225 Kingman Road, Ames, IA 50014. Fax: (515) 292 8629. 10.1021/pr0100117 CCC: $22.00 2002 American Chemical Society Journal of Proteome Research 2002, 1, 217-226 217 Published on Web 03/01/2002

On Characterization of Dose Variations of 2-D Proteomics Maps by Matrix Invariants

  • Upload
    marjan

  • View
    212

  • Download
    0

Embed Size (px)

Citation preview

Page 1: On Characterization of Dose Variations of 2-D Proteomics Maps by Matrix Invariants

On Characterization of Dose Variations of 2-D Proteomics Maps by

Matrix Invariants

Milan Randic,* Marjana Novic, and Marjan Vracko

National Institute of Chemistry, Ljubljana, Slovenia

Received December 23, 2001

We explore the characterization of 2-D electrophoresis proteomics maps by certain structural invariantsderived from matrixes constructed by considering for all pairs of spots in a proteomics maps the shortest(Euclidean) distances and distances measured along zigzag lines connecting protein spots of theneighboring abundance. This paper is a sequel to previous papers in which we outlined the idea ofcharacterizing 2-D proteomics maps by graph-theoretical descriptors. To illustrate the approach, weselected data of Anderson et al. (Anderson, N. L.; Esquer-Blasco, R.; Richardson, F.; Foxworthy, P.;Eacho, P. The effects of peroxisome proliferators on protein abundances in mouse liver. Toxicol. Appl.Pharmacol. 1996, 137, 75-89) on protein abundance in mouse liver under a series of dose of peroxisomeproliferator LY1711883. We found strong linear correlation between the experimentally applied dosesand the leading eigenvalue of a D/D-type matrix (Randic, M.; Kleiner, A. F.; DeAlba, L. M. Distance/distance matrices. J. Chem. Inf. Comput. Sci. 1994, 34, 277-286) constructed for the experimentalproteomics maps.

Keywords: map invariants • chemical graph theory • D/D matrix • proteomics maps • peroxisome proliferators

Introduction

Proteomics became one of the most expanding fields ofapplied biochemistry because for the first time experimentaldata have been collected on an abundance of proteins in singleanimal organ cells. The growth in accumulation of experimentaldata, mostly in the form of 2-D proteomics maps in whichproteins are separated by charge (electrophoresis) and by mass(chromatography), is not followed by the development oftheoretical methodologies that would assist in “digestion” ofthe plethora of data. We have recently initiated one suchtheoretical approach that shows promise3-7 that is based onconstruction of map invariants, which are analogous to adegree to sequence invariants used for characterization ofDNA8-12 and ultimately to graph invariants used in QSAR(quantitative structure-activity relationship).13-16 An invariantis a mathematical property of a system that may be a moleculargraph, molecular structures in 3-D, DNA sequence, or a 2-Dmap, the special case of which are proteomics maps. Thestrategy that we have developed in order to arrive at mapinvariants is to associate with a map a graph or a zigzag line.After embedding such graph or zigzag line onto the map, onetries to associate a matrix with the so-embedded geometricalobject. Matrix is a natural mathematical tool for characteriza-tion of such objects because one can associate with matrixelement (i, j) information concerning pair of vertexes i and j.The elements of the particular matrix that we will consider aregiven as a quotient of distances. One of the distances consid-

ered is the Euclidean distance between two points considered(protein spots in 2-D gel), and the other is obtained bymeasuring the distance along the edges of a graph or alongthe zigzag line. Hence, we succeeded to combine informationon the adjacency of the spots along the zigzag line and on themutual distances of all spots. In this way we obtain the so-called D/D matrix for the embedded zigzag line. The D/Dmatrix has been initially designed for characterization ofconformations (i.e., the geometrical shape) of chainlikestructures.2,17-21 Once the D/D matrix is obtained by use ofstandard matrix algebra, one can extract various matrix invari-ants to serve as the map descriptor and generate additionalstructurally related map matrixes.

The intuitive argument why the D/D matrix could inprinciple characterize a map is based on the fact that such amatrix combines information on the distances between thespots with the information on the sequential ordering of spots.Both the graph theoretical distance matrix22 and the geometry-based distance matrix have been employed in chemistry forcharacterization of molecules. Although they are information-rich, they do not carry information on adjacencies, i.e., whichatom is bonded to which. In the cases of molecules, however,the bonding pattern can be deduced by simply finding pairsof atoms associated with the shortest distances. The adjacencymatrix, on the other hand, only tells which atoms in a moleculeare bonded but is devoid of any information on moleculargeometry. The D/D matrix combines information given by bothmatrixes in a single matrix, which is based on information thatrelates to through-space and through-bonding distances. Thisis essential for characterization of maps because the separations

* Professor Emeritus, Department of Mathematics and Computer Science,Des Moines, IA 50311. Present address: 3225 Kingman Road, Ames, IA 50014.Fax: (515) 292 8629.

10.1021/pr0100117 CCC: $22.00 2002 American Chemical Society Journal of Proteome Research 2002, 1, 217-226 217Published on Web 03/01/2002

Page 2: On Characterization of Dose Variations of 2-D Proteomics Maps by Matrix Invariants

between the spots along the zigzag line may vary considerablyand thus one cannot deduce from the information on distancesbetween spots which pairs of spots are connected by zigzagline.

D/D Matrix. In Figure 1 we show schematically the pro-teomics map with 99 spots that has been reconstructed fromFigure 2 of the paper of Anderson et al.1 It illustrates distribu-tion of proteins in the control group of a study after the effectsof peroxisome proliferators on protein abundance in mouseliver. First we need the x and y coordinates (that were notreported in ref 1) for 107 protein spots that have been identifiedand labeled in ref 1. We superimposed a 200 × 200 grid overthe map in order to estimate the x, y coordinates and wereable to obtain coordinates for 99 individual spots. In Table 1we have listed coordinates (not reported in ref 1) and abun-dances (reported in ref 1) for the first 10 points of 99 that weselected from ref 1. To facilitate comparison with the originaldata, we used the same labels as given by Anderson et al. forthe protein spots (shown in the first column of Table 1). Theabundance in the column “0” is that of the control group. Theremaining six columns give the abundance for the followingsix concentrations, respectively, of LY171883 in mouse diet:0.003, 0.01, 0.03, 0.1, 0.3, and 0.6. The schematic chemicalstructure of LY171883 is shown in Figure 2.

Because charge, mass, and abundance are measured eachin their own units, they may result in numerical values of widely

different magnitude. In order to ensure that the three coordi-nates x, y, and z (charge, mass, and abundance, respectively)play approximately the same roles, the first step in theconstruction of the D/D matrix is to scale the input x, y, zcoordinates (Table 1). Following the recommendation of Kow-alski and Bender,23 we re-scaled coordinates and abundances,all of which are now in the interval (-1, +1). In Table 2 weshow for the first 10 spots listed in Table 1 the new re-scaledcoordinates and abundances. For example, fot the case of therelative abundance the simple regression

relates the scaled and the nonscaled abundance (the correlationcoefficient r ) 1.00000; the standard error s ) 0.051; the Fisherration F ) 2,736,440).

The next step is to construct a zigzag line, which connectsspots of adjacent values of the abundance. In preparation forconstruction of the D/D matrix for the proteomics maps wefirst have to order spots according to relative abundance inthe control group. We start with the most intensive spot no.51, which is then connected to the next most intensive spotno. 129, which is connected to the next most intensive spotno. 162, etc. In Figure 3, we have depicted the projection of aso constructed zigzag line in the (x, y) plane. The line for whichwe compute the D/D matrix is in 3-D, the third dimensionbeing given by normalized abundances (listed in Tables 2 and3).

Figure 1. Schematic representation of a two-dimensional proteinpattern for 99 proteins of whole mouse liver homogenanteconsidered in this paper and based on the map reported in ref1.

Figure 2. Schematic chemical structure of LY171883.

Table 1. Coordinates of Protein Spots and Their Abundancefor Different Concentrations of Peroxisome ProliferatorLY171883

spot x y 0 0.003 0.01 0.03 0.1 0.3 0.6

7 78 147 24.6 24.6 32.0 34.4 41.8 46.7 56.613 103 161 15.3 18.4 18.4 16.8 16.8 15.3 16.814 78 128 54.1 54.1 59.5 59.5 59.5 70.3 75.719 84 144 17.6 21.1 21.1 19.4 19.4 19.4 21.122 120 130 17.2 25.8 22.4 24.1 37.8 49.9 55.023 73 129 26.4 26.4 26.4 26.4 29.0 29.0 31.726 90 144 13.9 16.7 16.7 15.3 15.3 15.3 18.029 63 181 9.8 6.9 6.9 5.9 5.9 5.9 5.931 107 143 10.7 8.6 11.8 12.8 12.8 15.0 20.333 107 123 40.1 45.0 28.6 28.6 36.8 28.6 20.5

Figure 3. (x, y) projection of a zigzag line connecting the 20 mostabundant proteins of Figure 1 spots having neighboring abun-dance values.

scaled ) 0.00609 (nonscaled) - 0.07809

research articles Randic et al.

218 Journal of Proteome Research • Vol. 1, No. 3, 2002

Page 3: On Characterization of Dose Variations of 2-D Proteomics Maps by Matrix Invariants

In Table 3, we listed the 20 most intense gel spots. Observealready from the first few rows of Table 3 that intensities ofspots regularly decrease only in the control group. However,once spots are ordered we maintain the same ordering of spotsin the remaining six columns corresponding to differentconcentrations of LY1711883 (percent in diet). The D/D matrixis constructed by calculating the Euclidean distance throughthe space for any pair of spots (i, j) and then calculating thedistance between the same two spots i and j along the zigzagline. The matrix element is given as the quotient DE(i,j)/DL(i,j), where DE and DL are the distances measured throughthe space and along the zigzag line, respectively.

Leading Eigenvalue of D/D Matrix. Once we have associateda matrix with a map we can consider various matrix invariantsas potential descriptors for the map. As is known from chemicalgraph theory,13-16,24 where mathematical invariants are usedfor characterization of molecular structures, there are nogeneral rules of how to construct and how to select one set ofinvariants over the other set. Invariants, which are known inQSAR as topological indices, are judged by their utility: If theyoffer useful correlation between structure and property theybecame important, if not, they fade away and tend to beforgotten. However, it is important to realize (1) that judgmenton novel invariants comes after they have been proposed and(2) that often it not easy to predict in advance whether aninvariant will be useful or not for a particular task. This situationextends to characterization of proteomics maps. In our earlierwork we selected the leading eigenvalue of D/D matrix as aninvariant of choice even though the question of utility ofnumerical characterizations of proteomics maps has yet to be

better explored. In this paper, we will present an applicationof the leading eigenvalue of D/D matrixes for characterizationof proteomics maps, which demonstrates use of the leadingeigenvalue of D/D matrixes as useful map descriptor.

Let us comment on why we selected the leading eigenvalueof a matrix instead of several other possibilities, such as theaverage matrix element that is related to the Wiener index25

that has some prominence in QSAR, or an index analogous toBalaban’s J index,26 also having use in QSAR. The mentionedalternatives, as well as a dozen other descriptors, are legitimatechoices, and there is no doubt that they also will be examinedas descriptors for characterizations of maps. While manytopological indices may have somewhat unclear structuralinterpretation, the leading eigenvalues of D/D matrixes17-21,27-29

have been found to have rather interesting structural interpre-tation as descriptors. They offer, at least in the case of chainstructures (as is the case with zigzag line considered here), ameasure of the degree of bending of a structure. By extension,the leading eigenvalue of cyclic structures is some measure ofthe compactness of a structure. Because the zigzag 3-D curveof the control group is descending regularly, one expects thatthe leading eigenvalue of this curve will be accompanied withthe largest leading eigenvalue in comparison with the leadingeigenvalues of curves corresponding to different concentrationsof LY171883. The other zigzag curves will show some oscillatoryvariations in spot abundances along the zigzag line and, hence,will induce a greater “bending” of the line, thus reducing thecorresponding leading eigenvalue. To what extent this is thecase can be seen from the first numerical row of Table 4 in

Table 2. Scaled Coordinates and Abundances of Protein Spots for Different Concentrations of Peroxisome Proliferator LY171883

spot x y 0 0.003 0.01 0.03 0.1 0.3 0.6

7 -0.103 0.118 0.072 0.054 0.099 0.127 0.168 0.208 0.29513 -0.037 0.159 0.015 0.022 0.026 0.023 0.021 0.015 0.02514 -0.103 0.063 0.251 0.203 0.248 0.275 0.272 0.353 0.42419 -0.087 0.110 0.029 0.036 0.040 0.038 0.036 0.040 0.05422 0.008 0.069 0.027 0.060 0.047 0.066 0.145 0.227 0.28423 -0.116 0.066 0.083 0.063 0.069 0.079 0.093 0.099 0.12626 -0.071 0.110 0.007 0.014 0.016 0.014 0.012 0.015 0.03429 -0.143 0.218 -0.019 -0.036 -0.037 -0.042 -0.043 -0.043 -0.04931 -0.026 0.107 -0.013 -0.027 -0.010 -0.001 -0.002 0.013 0.04933 -0.026 0.048 0.171 0.157 0.081 0.093 0.139 0.097 0.050

Table 3. Ordered Spots by Their Abundance in the ControlGroup

spot x y 0 0.003 0.01 0.03 0.1 0.3 0.6

51 -0.198 -0.192 0.464 0.515 0.455 0.502 0.498 0.578 0.394129 -0.198 -0.063 0.438 0.402 0.522 0.425 0.371 0.181 0.198162 0.184 -0.163 0.353 0.467 0.385 0.304 0.422 0.313 0.295

14 -0.103 0.063 0.251 0.203 0.248 0.275 0.272 0.353 0.424111 0.143 -0.031 0.190 0.108 0.141 0.184 0.104 0.138 0.150

94 0.124 -0.025 0.182 0.168 0.158 0.151 0.149 0.131 0.11433 -0.026 0.048 0.171 0.157 0.081 0.093 0.139 0.097 0.05034 -0.127 -0.002 0.138 0.073 0.080 0.091 0.110 0.073 0.12879 0.037 -0.043 0.134 0.088 0.114 0.108 0.127 0.070 0.07673 0.106 0.030 0.126 0.116 0.107 0.102 0.100 0.106 0.070

262 0.129 -0.166 0.100 0.137 0.069 0.079 0.060 0.065 0.010219 0.103 -0.166 0.090 0.111 0.061 0.054 0.052 0.056 0.005127 0.066 -0.090 0.088 0.054 0.059 0.052 0.067 0.038 0.022217 0.132 -0.222 0.083 0.077 0.070 0.080 0.063 0.067 0.055

23 -0.116 0.066 0.083 0.063 0.069 0.079 0.093 0.099 0.12677 -0.161 -0.151 0.078 0.059 0.037 0.044 0.058 0.062 0.050

7 -0.103 0.118 0.072 0.054 0.099 0.127 0.168 0.208 0.29585 0.164 0.001 0.069 0.051 0.122 0.109 0.092 0.143 0.20591 0.119 0.004 0.068 0.051 0.069 0.079 0.078 0.098 0.10619 -0.087 0.110 0.029 0.036 0.040 0.038 0.036 0.040 0.054

Table 4. Leading Eigenvalues for Different Concentrations andDifferent Powers of nD/nD Matrixes

0 0.003 0.01 0.03 0.1 0.3 0.6

1 9.7811 9.6913 9.7450 9.7638 9.7180 9.6679 9.64182 4.2507 4.0793 4.2667 4.2661 4.1557 4.2476 4.20403 3.4662 3.2282 3.4098 3.4293 3.2978 3.5104 3.38894 3.1789 2.8988 3.0611 3.1042 2.9428 3.2337 3.04615 3.0265 2.7658 2.8654 2.9270 2.7732 3.0866 2.85566 2.9237 2.7267 2.7545 2.8062 2.7353 2.9884 2.73087 2.8447 2.7001 2.7312 2.7225 2.7097 2.9136 2.64038 2.7797 2.6799 2.7142 2.7046 2.6904 2.8523 2.57019 2.7386 2.6634 2.7006 2.6903 2.6747 2.8002 2.5133

10 2.7310 2.6492 2.6893 2.6782 2.6613 2.7548 2.465815 2.7074 2.5944 2.6479 2.6329 2.6110 2.5945 2.371320 2.6924 2.5514 2.6161 2.5974 2.5728 2.5010 2.350225 2.6800 2.5144 2.5879 2.5657 2.5401 2.4452 2.335430 2.6687 2.4815 2.5617 2.5364 2.5111 2.4236 2.323035 2.6582 2.4520 2.5371 2.5088 2.4850 2.4069 2.311940 2.6483 2.4254 2.5138 2.4829 2.4613 2.3936 2.301650 2.6297 2.3793 2.4707 2.4352 2.4198 2.3738 2.282560 2.6124 2.3406 2.4317 2.3924 2.3844 2.3596 2.264970 2.5960 2.3076 2.3961 2.3537 2.3538 2.3485 2.248680 2.5804 2.2792 2.3636 2.3188 2.3269 2.3393 2.2333

2-D Proteomics Maps by Matrix Invariants research articles

Journal of Proteome Research • Vol. 1, No. 3, 2002 219

Page 4: On Characterization of Dose Variations of 2-D Proteomics Maps by Matrix Invariants

which we have listed the leading eigenvalue of D/D matrixesfor the control group and the six concentrations of LY171883considered.

Higher Order nD/nD Matrixes. A single invariant, even if itoffers some interpretation, is by far too limited descriptor foruseful characterization of 2-D maps. We need one dozen, twodozen, and even more such descriptors if we hope to capturemore information on similarities and dissimilarities of differentmaps. A possible route to additional matrix invariants is toperform some matrix operation on the D/D matrix, such asraising matrixes to higher power, and thus generating ad-ditional map matrixes. This can be accomplished in two ways,either (1) to consider the Kronecker (or Hadamard) multiplica-tion of matrixes in which individual matrix elements aremultiplied (or raised to higher powers) or (2) to consider thestandard matrix multiplication. The Kronecker (or Hadamard)product of matrixes is referred to as “element-by-element arraymultiplication” in the MATLAB tutorial.30 MATLAB is softwaretool suitable for manipulation of matrixes and arrays andvisualization of the results of such operations.31 Since the matrixelements of D/D matrixes are already less than, or at most equalto 1, we decided to construct additional invariants applyingthe Kronecker product on the D/D matrix. In this way, wegenerate nD/nD matrixes of ever increasing power. In theremaining rows of Table 4 we have listed the leading eigen-values of so-constructed higher order nD/nD matrixes startingwith the exponent n ) 1 (which is the already considered D/Dmatrix) and stopping at n ) 80.

Each column in Table 4 corresponds to the leading eigen-value of the seven maps (the control map and the six mapscorresponding to the six different concentrations of LY171883).We included in Table 4 the results only for the powers 1-10,and then we show the eigenvalues after increasing the powersby 5 (from 10 to 40), and finally we show the eigenvalues afterincreasing the powers by 10 (from 40 to 80). We carried ourcalculations using all 80 powers, that is, an 80-componentvector (column) represents each map. There are two questionto consider: (1) how well the 80-component vectors character-ize the seven individual maps and (2) what is the smallestnumber of components of a vector that offers useful charac-terization of selected properties.

The plot of the leading eigenvalues against the exponent nthat defines the nth order nD/nD matrix is shown in Figure 4.The curve correspond to the leading eigenvalues of the controlgroup while the leading eigenvalues of the six different dosesof LY171883 show very similar dependence on n. Hence, allseven curves show the same general shape with some but notpronounced differences in details of the curves. To magnifythese minor differences in Figure 5a-f we show plots in whichinstead of the leading eigenvalues we chosen the quotient ofthe leading eigenvalues for various concentrations and thecorresponding leading eigenvalue of the control group. As hasbeen outlined in ref 5, this approach may for differentcompounds and concentration produced considerably differentcurves, but as we see from Figure 5-f no simple pattern ofquotient curves emerges that would suggest a regular increaseof dose concentrations in mouse diets. Hence, we have to resortto novel avenues of processing the numerical data of Table 4.

Principal Component Analysis (PCA) on nD/nD Eigenvalues.Principal component analysis is a multivariate technique forexamining relationships among several quantitative variables.The method originated 100 years ago by Pearson32 and was fullydeveloped later by Hotelling.33 We will use the principal

component analysis to evaluate derived numerical character-izations of the 2-D proteomics maps. In the Appendix, we haveoutlined a brief mathematical description of PCA. To analyzethe eigenvalues of Table 4 and their quotients illustrated inFigure 5a-f we will simplify the input data and consider vectorshaving 24 rather than 80 components. Besides the first 10eigenvalues associated with powers 1-10, we included for theremaining components only powers that increase by five: 15,20, 25, ..., 75, 80. However, to test use of novel map descriptors,we would need to have a large number of experimental pointsfor fitting. In view that we have in all seven input doseconcentrations (including the control group which has zeroconcentration of LY171883 in food diet), we have to reducefurther the number of vector components. We decided to usevectors with six components. To select descriptors we haveexamined more closely the leading eigenvalues of Table 4 andnoticed that the relative magnitudes vary as we change n, theexponent, which runs from n ) 1 to n ) 80. In Table 5, wehave listed the variation of the relative magnitudes by usinglabels 1-7, where 1 belongs to the largest leading eigenvalueand 7 to the smallest leading eigenvalue. As we see from Table5, in all there are eight different relative orderings associatedwith the displayed powers. These are: n ) 1, 2, 3-6, 7-10, 15,20-30, 35-50, and 60-80. For the six components of vectorscharacterizing maps, we have selected n ) 1, 5, 10, 20, 40, and80; thus, after n ) 5 each successive choice for the componentinvolves the exponent n, which has been doubled. Hence, wehave dropped the eigenvalue for n ) 2 and n ) 15 and keptonly one of the egenvalues of each of the remaining six groups.As will be seen from the following paragraph, the approachappears robust and alternative selections produce similarresults.

The next question to decide is how many descriptors willbe used in PCA. We performed the PCA using as input the sevenvectors corresponding to seven dose concentrations havingfrom three to five components, selected from the following sixpowers (1, 5, 10, 20, 40, 80). The components correspond tothe leading eigenvalues of concentrations of LY171883 in mousediet from 0 to 0.6. The selection of the six powers (1, 5, 10, 20,40, 80) is to some degree arbitrary, because equally otherchoices are possible, such as, for example, (3, 6, 15, 30, 50, 70).

Figure 4. Plot of the leading eigenvalue of nD/nD matrix againstthe exponent n for the control group.

research articles Randic et al.

220 Journal of Proteome Research • Vol. 1, No. 3, 2002

Page 5: On Characterization of Dose Variations of 2-D Proteomics Maps by Matrix Invariants

However, as long as a choice covers the whole range of thepowers, as the both above choices do, one may expect similarresults. This is because the leading eigenvalues of alternativechoices themselves are highly interrelated. Let consider moreclosely the above two choices for the case of the control group(given in the first column of eigenvalues of Table 4), which

translate into the following two vectors with components ofthe leading eigenvalues: (9.7811, 3.0265, 2.7310, 2.65924, 2.6483,2.5804) and (3.4662, 2.9237, 2.7074, 2.6687, 2.6297, 2.5960),respectively. A correlation between the corresponding compo-nents of the two vectors is shown in Figure 6. When the plot isfitted by cubic polynomial we obtain for the correlation

Figure 5. Plot of the quotient of the leading eigenvalue of nD/nD matrixes corresponding to various concentrations of LY171883 in dietand the leading eigenvalue of the control group against the exponent n.

2-D Proteomics Maps by Matrix Invariants research articles

Journal of Proteome Research • Vol. 1, No. 3, 2002 221

Page 6: On Characterization of Dose Variations of 2-D Proteomics Maps by Matrix Invariants

coefficient r ) 1.0000, the standard error s ) 0.005, and theFisher ratio F ) 465072. In Table 6, we show the r, s, and Fvalues for the similar correlations of the remaining six vectorscorresponding to different concentrations of LY17188 in micefood diet constructed by alternative choice of the powers ofD/D matrixes. As we see all cases show very great degree ofinterrelation that well exceed in quality the correlations be-

tween map descriptors and the considered concentrations ofLY17188 shown in Table 7.

As we see from Table 7, where we summarized the statisticsfor the principal component regressions based on three to fivecomponents, the first three PCA components do not offer usefulregression. This perhaps is not surprising, but nevertheless itis disappointing to see that three PCA components that accountfor 98.46% of variance of data do not yield useful regression.On the other hand, correlations based on four and five PCAcomponents give satisfactory results. However, fitting of sevenexperimental dose points with five descriptors may representan over-fitting. We are not questioning that aspect of theregression, but want to point out that our main objective wasnot so much as to arrive at a regression but to demonstratethat already a set of eight eigenvalues of selected nD/nD matrixeshave the capacity to capture and characterize the main featuresof a proteomics map and reduce the initial 99 data points tovectors having only five PC as components.

Multivariate Regression Analysis of nD/nD Eigenvalue.Finally, we decided to perform stepwise multivariate regressionusing the leading eigenvalues of nD/nD matrixes as descriptors.In Table 8, we give the statistical details of the stepwiseregression using from two to five descriptors. The selected fivedescriptors in their order of appearance in the stepwiseregression are 1, 40, 5, 80, 10. The order is based on the so-called “greedy algorithm” in which the first descriptor is thebest single descriptor among the eight possibilities, the secondone is the best from the remaining seven possibilities, and so

Figure 6. Fitting of cubic polynomial to plot of components ofvector (3, 6, 15, 30, 50, 70) against the corresponding componentsof vector (1, 5, 10, 20, 40, 80).

Table 5. Relative Magnitudes of the Leading Eigenvalues forDifferent Concentrations and Different Powers of nD/nD

Matrices

power 0 0.003 0.01 0.03 0.1 0.3 0.6

1 1 5 3 2 4 6 72 3 7 1 2 6 4 53 2 7 4 3 6 1 54 2 7 4 3 6 1 55 2 7 4 3 6 1 56 2 7 4 3 6 1 57 2 6 3 4 5 1 78 2 6 3 4 5 1 79 2 6 3 4 5 1 7

10 2 6 3 4 5 1 715 1 6 2 3 4 5 720 1 5 2 3 4 6 725 1 5 2 3 4 6 730 1 5 2 3 4 6 735 1 5 2 3 4 6 740 1 5 2 3 4 6 750 1 5 2 3 4 6 760 1 6 2 3 4 5 770 1 6 2 4 3 5 780 1 6 2 4 3 5 7

Table 6. Correlation Coefficient (r), the Standard Error (s), andthe Fisher Ratio (F) for Correlation among Vectors Using D/DMatrxes with Powers (1, 5, 10, 20, 40, 80) and Powers (3, 6, 15,30, 50, 70) for Different Concentrations of LY17188 Based onCubic Polynomial

concn (M) r s F

0 1.000 00 0.0055 465 0720.003 0.999 96 0.0419 81230.01 1.000 00 0.0128 86 5790.03 1.000 00 0.0042 665 6440.1 0.999 99 0.0206 33 6560.3 0.999 96 0.0399 87720.6 0.999 99 0.0244 24 258

Table 7. Principal Component Regression Based on Use ofThree to Five Components as Descriptorsa

r s F

3 0.3738 0.296 0.24 0.9925 0.048 32.95 0.9997 0.014 326.3

PC1 PC2 PC3 PC4 PC5 Cons.

69.38% 22.02% 7.06% 1.50% 0.04%0.162 533 0.398 297 -0.937 406 6.820 919 -5.685 923 0.149 0000.0030 0.0100 0.030 0.100 0.300 0.600

3 0.1355 -0.0223 0.117 0.214 0.203 0.1874 0.0057 0.0089 0.028 0.065 0.273 0.6125 0.0025 0.0091 0.033 0.105 0.289 0.604

a R, S, and F designate the correlation coefficient, the standard error, andthe Fisher ratio, respectively.

Table 8. Multivariate Linear Regression Using from Two toFive Descriptorsa

r s F

2 0.9234 0.106 123 0.9390 0.110 74 0.9702 0.095 85 0.9994 0.018 185

parameter value error t value probability

2 A 0.02195 0.04068B 0.85266 0.15851

3 A 0.01762 0.03706 0.47543 0.65453B 0.88176 0.14440 6.10617 0.00171

4 A 0.00875 0.02698 0.32428 0.75885B 0.94127 0.10514 8.95223 0.00029

5 A 0.00016 0.00377 0.04274 0.96752B 0.99946 0.01471 67.92877 <0.0001

a A and B are the coefficients in a linear regression Y ) Ax + B.

research articles Randic et al.

222 Journal of Proteome Research • Vol. 1, No. 3, 2002

Page 7: On Characterization of Dose Variations of 2-D Proteomics Maps by Matrix Invariants

on. It turns out that the best single descriptor is the leadingeigenvalue λ1 of nD/nD for n ) 1, briefly λ1 (1).

Observe from Table 8 that already two descriptors give agood regression with the regression coefficient r ) 0.9234.Naturally, as we increase the number of descriptors we obtainbetter and better regression. With five descriptors the regressioncoefficient r ) 0.9994, the standard error below 0.02 and theFisher ratio F ) 185. However, one must be at guard in view ofrather small number of the experimental points (doses) to befitted. On one side we have seen (Table 7) that already twoprincipal components account for 91.4% of the variance, threeprincipal components account for 98.6% and four principalcomponents account for 99.96%. Hence, use of five descriptors,with 100% account of the variance, is not likely to offerstatistically significant results. That already two descriptors offera good regression is noteworthy because it clearly demonstratethat the leading eigenvectors of nD/nD matrixes do encodeimportant information on proteomics maps.

To find out if we may use three and four descriptors tocharacterize proteome/dose response we have constructedtwenty additional MRA regression in which the experimentaldata (the seven dose concentrations) were permuted at ran-dom. The seven-digit numbers in the first column of Table 9are built from random selected permutations of first 7 digits.Thus, the first entry 5361274 (which stands for the order 5, 3,6, 1, 2, 7, 4) means that the seven doses: 0, 0.003, 0.01, 0.03,0.1, 0.3, 0.6 that correspond to the order given by 1234567 havebeen used in the following order: 0.1, 0.01, 0.3, 0, 0.003, 0.6,0.03. Using two, three, and four descriptors selected in astepwise fashion we preformed MRA and recorded the standarderrors and the regression coefficients, which are listed in Table9. The random “events” were ordered by increasing regressioncoefficients when four descriptors are used (shown in the lastcolumn of Table 9). As we see from Table 9 most of randomlyconstructed regressions have low correlation coefficients,certainly smaller than the critical values of r ) 0.9234 for twovariables, r ) 0.9390 for three variables, and r ) 0.9702 for fourvariables, respectively. Only two out of twenty regressions(printed in bold in Table 9) show correlation coefficient better

than the above critical values. This suggests that the probabilityfor a chance correlation be at most 1/10. In the last row ofTable 9 we included an additional “random” regression, whichhowever was constructed so at to produce good correlation byselecting permutation of data accordingly. This is merely toshow that when considering random data points one can getalso good regression. What is, however, more important is thatmany of randomly produced data cannot be well described bythe map invariants considered which results in correspondinglow regression coefficients.

In Figure 7a-c, we show the plot in which predicted dosesderived from a stepwise regression using from three to fivedescriptors are shown against the experimental dose concen-trations. The figures visually demonstrate that the leadingeigenvalue of nD/nD matrixes can serve as biodescriptors forproteomics maps.

Concluding Remarks

Visual inspection of proteomics maps, which is currentlyoften considered, has obvious limitations. It seems thereforeimperative to develop some numerical characterizations of suchmaps that will assist in establishing similarities and dissimilari-ties of different maps, and serve as biodescriptors of proteomicsmaps. While neither the zigzag approach toward constructionof nD/nD matrixes nor the leading eigenvalue of nD/nD matrixesare the only possible route to characterization of maps, theyclearly point out that characterization of proteomics maps ispossible using even a limited number of map invariants thatare extracted from matrixes associated with maps. Elsewhere,4,5

we described use of partial ordering as an intermediate stepin order to arrive at a graph embedded on a map, from whichthen graph invariants can be constructed and consulted.Alternative approaches, currently under investigation, includeconstruction of embedded graphs over map either obtainedby use of the relative abundances as a guide in connecting spotsof a map,34 using of the Voronoi polygons and correspondingDeleney triangulation (its dual) which is uniquely defined fora map,35 and combining abundances of adjacent spots inconstruction of novel matrixes.36,37 Finally, we should mentionwhat appears a promising direction in which the Laplacianmatrix is constructed for a graph embedded on a map, whichthen allows use of all eigenvalues of Laplacian matrix as mapinvariants.38 Each such an approach is likely to capture differentaspects of the geometry of protein spot distributions inproteomics maps, and hence combined they may offer novelnumerical characterizations of maps. An obvious advantage ofnumerical invariants is to allow computer processing of pro-teomics maps as input data. The ultimate goal is to integratebiodescriptors that characterize proteomics maps and chemo-descriptors that characterize chemical substances used inexperiments (here LY17188) causing the proteome variation.This will be in principle possible, at least for cases whenchanges in the relative abundance of proteins are due todifferent chemicals (toxicants). In summary the present workrepresents an extension of use of topological indices as widelyapplied to QSAR (Quantitative Structure Activity Relationship)13-16

and topological indices as used for characterization of DNA8-12

to use of topological indices and structural invariants for mapcharacterization. We hope that the present outline of charac-terization of dose related changes in proteomics on peroxisomeproliferator LY17188 by map invariants shows some promisefor use of matrix invariants as a tool for characterization ofproteomics maps.

Table 9. Statistical Parameters for Stepwise RegressionsUsing from Two to Four Descriptors for 20 RandomPermutations of the LY1711883 Doses

d1, d2 d1, d2, d3 d1, d2, d3, d4

random descriptors s r s r s r

5361274 10, 1, 80, 40 0.258 0.357 0.296 0.374 0.356 0.4116513472 20, 5, 10, 40 0.252 0.408 0.280 0.482 0.336 0.511

7362 145 5, 20, 80, 1 0.229 0.562 0.226 0.598 0.308 0.6157361245 5, 20, 80, 1 0.228 0.568 0.255 0.601 0.307 0.6187351426 5, 1, 40, 80 0.241 0.488 0.260 0.580 0.300 0.6422316745 20, 1, 10, 5 0.243 0.476 0.233 0.683 0.281 0.6967156243 80, 5, 40, 1 0.224 0.587 0.236 0.675 0.269 0.7254253176 1, 80, 5, 10 0.220 0.606 0.249 0.640 0.300 0.7552631745 1, 20, 80, 5 0.203 0.680 0.202 0.773 0.242 0.7856425713 5, 10, 1, 20 0.191 0.725 0.196 0.789 0.186 0.8802471365 10, 5, 1, 80 0.136 0.871 0.155 0.873 0.186 0.8804127536 80, 1, 20, 10 0.149 0.843 0.151 0.881 0.183 0.8843745162 5, 1, 10, 20 0.182 0.754 0.148 0.885 0.179 0.8891462537 80, 1, 40, 10 0.215 0.628 0.136 0.905 0.162 0.9106351427 1, 40, 80, 20 0.129 0.884 0.141 0.898 0.159 0.9135126437 1, 40, 80, 20 0.152 0.834 0.134 0.907 0.159 0.9141465327 10, 1, 20, 40 0.237 0.517 0.144 0.892 0.158 0.9153547126 80, 5, 10, 20 0.142 0.857 0.149 0.884 0.130 0.9365742631 1, 10, 80, 5 0.054 0.981 0.061 0.982 0.040 0.9825273416 10, 5, 1, 20 0.103 0.963 0.109 0.940 0.060 0.9885627134 80, 40, 20, 10 0.062 0.974 0.052 0.987 0.058 0.989

2-D Proteomics Maps by Matrix Invariants research articles

Journal of Proteome Research • Vol. 1, No. 3, 2002 223

Page 8: On Characterization of Dose Variations of 2-D Proteomics Maps by Matrix Invariants

Acknowledgment. M.R. thanks Professor Jure Zupanand the National Institute of Chemistry in Ljubljana, Slovenia,for the kind invitation and support for his annual visits toLjubljana. We would also like to thank Professor Zupan for hisinterest in this work and useful comments on the manuscript.This is contribution no. 311 from the Center for Water andEnvironment of the Natural Resources Research Institute.Research in this paper was supported in part by Grant No.F49620-01-1-0098 from the United States Air Force.

Appendix

Principal Component Analysis (PCA). PCA40 is usuallyapplied for analyzing many-dimensional data sets. In PCA, wesearch for novel coordinates for the set of data consideredwhich optimally coincide with the direction of the maximalvariation in the data. The transformed coordinates must beorthogonal to each other. The direction of the maximal

Figure 7. Plot of concentration (doses) predicted against theexperimental concentration obtained from MRA using from threeto five descriptors.

Figure 8. Plot of concentration (doses) predicted against theexperimental concentration obtained from PCA using from threeto five descriptors. PCA based on three descriptors does not givestatistically significant results.

research articles Randic et al.

224 Journal of Proteome Research • Vol. 1, No. 3, 2002

Page 9: On Characterization of Dose Variations of 2-D Proteomics Maps by Matrix Invariants

variation becomes the first new coordinate axis. The secondaxis is in a plane perpendicular to the first axis and its directionis found by rotating it around the first axis until variation alongthe new axis is maximal. The procedure is repeated m-timesfor m-dimensional data set.

In matrix notation, it can be shown that the PCA is basedon a decomposition of the data matrix X into two matrixes Vand U.

VT is transpose of matrix V. The matrixes V and U areorthogonal. The matrix V is usually referred to the loadingsmatrix, and U is the scores matrix. In other words, the matrixU contains the original data in a rotated coordinate system:

The main goal of PCA is to determine the loading matrix V,which defines the transformation from old to new coordinates.During the mathematical procedure, first the scattered matrixZ is calculated from the data matrix X:

Then the matrix Z is diagonalized so that its eigenvalues andeigenvectors are determined:

where m is the number of columns (variables) in X. Theeigenvectors ei are rows of the loading matrix V.

Two ways of normalization of original data are used: (1)mean centered data and (2) as standardized data. To obtainmean centered data the mean value in each column issubtracted from every column element. As a consequence allthe elements in each column add to zero. In the standardizeddata normalization the mean value in each column is sub-tracted from every column element and divided by the standarddeviation of the column elements. In the case of usingnormalized data matrix X the Z matrix becomes the variance-covariance matrix in the case of mean centered data oralternatively becomes the correlation matrix in the case of thestandardized data normalization.

Multivariate Regression Analysis (MRA). The MRA41 is astandard tool in linear statistics, often applied to multidimen-sional data sets. It is used for procedures in which a correlationbetween the multidimensional objects and their properties aresought. Let the property of an ith object from the data set bethe concentration ci, while the object representation vector bexij, j ) 1, 2, ... m for m descriptors (that is m components ofthe representation vector). A set of equations can be writtenfor the data set describing a linear relationship between theproperty (concentration ci) and representation vectors xij:

The above equations are written for n data vectors ofdimension m. In matrix notation the above set equationbecomes

Here, c is the vector of properties (dependent variables), Xis independent variables matrix, p is the vector of parametersto be estimated, and ε is the error vector. It can be shown43

that by minimizing the sum of the squared residuals, the least-squares estimates of p is obtained, denoted here as b:

The estimated properties c are calculated with the abovecoefficients given by the regression equation

From the estimated and the experimental values the stan-dard error s, an estimate of the experimental error, is deter-mined:

The obtained regression parameters define the MRA model.To evaluate the model additional parameters are consideredincluding the correlation coefficient r, and the Fisher ratio F.Ideally r should tend to the value 1, the standard error s tozero, and F to be as large as possible. The Fisher ratio F isdefined as the ratio of the mean sum of squares sue to lack offit and the mean sum of squares due to pure error (see pp 270-277 of ref 41). The larger is the F value the higher is the qualityof the particular regression in comparison with other regres-sions, when other factors remain constant. The experimentaland the estimated properties are linearly related:

A, the constant, is interpreted as intercept and B as the slopein parallel with the equation of line (y ) a + bx) in a plane.

References

(1) Anderson, N. L.; Esquer-Blasco, R.; Richardson, F.; Foxworthy,P.; Eacho, P. The effects of peroxisome proliferators on proteinabundances in mouse liver. Toxicol. Appl. Pharmacol. 1996, 137,75-89.

(2) Randic, M.; Kleiner, A. F.; DeAlba, L. M. Distance/distancematrices. J. Chem. Inf. Comput. Sci. 1994, 34, 277-286.

(3) Randic, M. On graphical and numerical characterization ofproteomics maps. J. Chem. Inf. Comput. Sci. 2001, 41, 1330-1338.

(4) Randic, M.; Zupan, J.; Novic, M. On 3-D graphical representationof proteomics maps and their numerical characterization. J.Chem. Inf. Comput. Sci. 2001, 41, 1339-1334.

(5) Randic, M.; Witzmann, F.; Vracko, M.; Basak, S. C. On charac-terization of proteomics maps and chemically induced changesin proteomics using matrix invariants: Application to peroxisomeproliferators. Med. Chem. Res. 2001, 10, 456-479.

(6) Randic, M. A graph theoretical characterization of proteomicsmaps. Int. J. Quantum Chem., in press.

(7) Randic, M.; Basak, S. C. A comparative study of proteomics mapsusing graph theoretical biodescriptors. J. Chem. Inf. Comput. Sci.,in press.

(8) Randic, M. Condensed representation of DNA primary sequencesby condensed matrix. J. Chem. Inf. Comput. Sci. 2000, 40, 50-56.

(9) Randic, M.; Vracko, M. On the similarity of DNA primarysequences. J. Chem. Inf. Comput. Sci. 2000, 40, 599-606.

(10) Randic, M.; Vracko, M.; Nandy, A, Basak, S. C. On 3-D graphicalrepresentation of DNA primary sequences and their numericalcharacterization. J. Chem. Inf. Comput. Sci. 2000, 40, 1235-1244.

(11) Randic, M.; Basak, S. C. Characterization of DNA primarysequences based on the average distance between bases. J. Chem.Inf. Comput. Sci. 2001, 41, 561-568.

(12) Guo, X.; Randic, M.; Basak, S. C. A novel 2-D graphical repre-sentation of DNA sequences of lower degeneracy. Lett., in press.

X ) VTU

U ) VX

Z ) XT X

Zei ) λiei

ZE ) Ediag (λ1, ..., λm)

c1 ) p0 + x11p1 + x12p2 + ... + x1mpm + ε1

c2 ) p0 + x21p1 + x22p2 + ... + x2mpm + ε2

...

cn ) p0 + xn1p1 + xn2p2 + ... + xnmpm + εn

c ) Xp + ε

b ) (XTX)-1XTc

c ) Xb

s2 ) [∑(ci - ci)2]/[n - m - 1]

c ) A + Bc

2-D Proteomics Maps by Matrix Invariants research articles

Journal of Proteome Research • Vol. 1, No. 3, 2002 225

Page 10: On Characterization of Dose Variations of 2-D Proteomics Maps by Matrix Invariants

(13) Randic, M. On characterization of chemical structure. J. Chem.Inf. Comput. Sci. 1997, 37, 672-687.

(14) Balaban, A. T. A personal view about topological indices forQSAR/QSPR. In: QSPR/QSAR Studies by Molecular Descriptors;Diudea, M. V., Ed.; Nova Sci. Publ. Inc.: Huntington, NY, 2001;Chapter 1, pp 1-30.

(15) Randic, M. Topological indices. In The Encyclopedia of Compu-tational Chemistry; Schleyer, P. v. R., Editor-in-chief; J. Wiley andSons: London 1998; pp 3018-3032.

(16) Todeschini, R.; Consonni, V. Handbook of Molecular Descriptors;Mannhold, R., Kubinyi, H., Timmerman, H., Eds.; Wiley-VCH:New York; Vol. 11, Methods and Principles in Medicinal Chem-istry.

(17) Randic, M. On characterization of the conformations of nine-membered rings. Int. J. Quantum Chem: Quantum Biol. Symp.1995, 22, 61-73.

(18) Randic, M. Molecular bonding profiles. J. Math. Chem. 1996, 19,375-392.

(19) Randic, M.; Krilov, G. Bond profiles for cuboctahedron and twistcuboctahedron. Int. J. Quantum Chem: Quantum Biol. Symp.1996, 23, 127-139.

(20) Randic, M.; Krilov, G. On characterization of 3-D sequences ofproteins. Chem. Phys. Lett. 1997, 272, 115-119.

(21) Randic, M.; Krilov, G. On characterization of the folding ofproteins. Int. J. Quantum Chem. 1999, 75, 1017-1026.

(22) Harary, F. Graph Theory; Addison-Wesley: Reading, MA 1969.(23) Kowalski, B. R. Bender, C. F. A powerful approach to interpreting

chemical data. J. Am. Chem. Soc. 1972, 94, 5632-5639.(24) Trinajstic, N. Chemical Graph Theory; CRC Press: Boca Raton,

FL, 1992(25) Wiener, H. Structural determination of paraffin boiling points. J.

Am. Chem. Soc. 1947, 69, 17-20.(26) Balaban, A. T. Highly discriminating distance-based topological

index. Chem. Phys. Lett. 1982, 89, 399-404.(27) Randic, M.; Vracko, M.; Novic, M.; Basak, S. C. On ordering of

folded structures. MATCH 2000, 42, 181-231.(28) Randic, M.; Krilov, G. On characterization of molecular surfaces.

Int. J. Quantum Chem. 1997, 65, 1065-1076.

(29) Randic, M.; Vracko, M.; Novic, M. Eigenvalues as moleculardescriptors. In QSPR/QSAR Studies by Molecular Descriptors;Diudea, M., Ed.; Nova Sci. Publ.: New York, 2001.

(30) Hanselman, D. Littlefield, B. Mastering MATLAB 5; PrenticeHall: Upper Saddle River, NJ, 1998.

(31) The Mathworks, Inc.(32) Pearson, K. On lines and planes of closest fit to systems of points

in space. Philos. Mag. 1901, 6, 559-572.(33) Hotelling, H. Analysis of a complex of statistical variables into

principal components. J. Educ. Psychol. 1933, 245, 417-441, 498-520.

(34) Randic, M.; Bajzer, Z. On characterization of proteomics mapsby clustering approach. Manuscript in preparation.

(35) Randic, M.; Pisanski, T.; Vracko, M.; Novic, M. Zupan, J. Char-acterization of proteomics maps by Delaney triangulation ap-proach. Manuscript in preparation.

(36) Randic, M.; Pisanski, T.; Vracko, M.; Novic, M. Zupan, J. On useof differences in abundances as weights for characterization ofproteomics maps based on adjacency matrix of embeddedgraphs. Manuscript in preparation.

(37) Randic, M.; Plavsic, D. On use of average abundance as weightsfor characterization of proteomics maps based on adjacencymatrix of embedded graphs. Manuscript in preparation.

(38) Randic, M.; Vracko, M. On Characterization of 2-D ProteomicsMaps by Eigenvalues of Laplacian Matrix. Manuscript in prepara-tion.

(39) Basak, S. C. Private communication.(40) Massart, D. L.; Vandeginste, B. G. M.; Buydens, L. M. C.; De Jong,

S.; Lewi, P. J.; Smeyers-Verbveke, J. Handbook of Chemometricsand Qualimetrics: Part A; Elsevier: Amsterdam, 1997; pp 519-556.

(41) Massart, D. L.; Vandeginste, B. G. M.; Buydens, L. M. C.; De Jong,S.; Lewi, P. J.; Smeyers-Verbveke, J. Handbook of Chemometricsand Qualimetrics: Part A; Elsevier: Amsterdam, 1997; pp 263-295.

(42) Draper, N. R.; Smith, H. Applied Regression Analysis, 2nd ed.; JohnWiley: New York, 1981.

PR0100117

research articles Randic et al.

226 Journal of Proteome Research • Vol. 1, No. 3, 2002