
Quality and Quantity, 12 (1978) 267-297 © Elsevier Scientific Publishing Company, Amsterdam - Printed in The Netherlands

THE PROBLEM OF MULTICOLLINEARITY IN A MULTISTAGE CAUSAL ALIENATION MODEL:

A COMPARISON OF ORDINARY LEAST SQUARES, MAXIMUM-LIKELIHOOD AND RIDGE ESTIMATORS

PETER SCHMIDT

Seminar für Sozialwissenschaften, University of Hamburg

EDWARD N. MULLER

Department of Political Science, State University of Tucson at Arizona

1. Introduction

Multistage causal modelling such as path analysis has been introduced into sociology and political science as a means of estimating parameters of complex empirical theories. However, the stability of parameter estimates depends heavily on the degree to which key assumptions of the estimation techniques are not seriously violated. One assumption refers to multicollinearity among the independent variables of a given model. As Rockwell (1975) has noted, there is more than one assumption concerning the multicollinearity problem:

Assumption 1 (Johnston, 1963: pp. 106-108) A nonsingular correlation matrix of the independent variables.

Assumption 1’ (Heise, 1969: p. 57) The sources of variation for each variable in the system must be sufficiently diverse such that the correlations between variables are not extremely large in absolute magnitude.

Whereas assumption 1 can easily be checked mathematically, there are no clear-cut guidelines for checking assumption 1’. Since assumption 1 is not violated in our data, we first compare how multicollinearity in the sense of assumption 1’ can be assessed at every stage of a given “causal model” by three different methods:

(a) Multiple correlation between the independent variables of every causal stage;

(b) Haitovsky test; and

(c) Determinant and eigenvalues of the correlation matrix of the independent variables.

Second, we analyze how the parameter estimates and standard errors of regression coefficients computed by the Ordinary Least Squares (OLS) Method and by the Maximum-Likelihood Method (ML) are affected by multicollinearity. Then we discuss a method to compensate for multicollinearity: Ridge Regression. After a general description of Ridge Regression we apply this method to our data and compare the parameter estimates of OLS, ML and Ridge Regression. Finally, we briefly discuss how the ridge approach could be combined with the Linear Structural Relations System, which uses Maximum Likelihood Estimation.

2. Diagnosis of Multicollinearity in an Alienation Model of Political Protest

A common explanation of political protest in the sociological and political science literature has entailed the protohypothesis that protest is a function of alienation. Many different “operational definitions” of the alienation concept have been proposed, and these have given rise to different hypotheses about protest. One operationalization of alienation is based on the concept of “political trust” or “diffuse support” (Easton, 1965; Gamson, 1968; Aberbach and Walker, 1970; Paige, 1971; Muller, 1972). Political trust is an attitude of generalized satisfaction with the overall performance and/or structure of the political system. Another proposal to measure alienation stresses the importance of the concept “powerlessness” (Zeitlin, 1966; Ransford, 1968; Allardt, 1970; Crawford and Naditch, 1970; Finifter, 1970; Seeman, 1972). The idea of powerlessness refers to a feeling that one’s own abilities are insufficient to obtain the rewards or outcomes that one seeks in life; specifically, political powerlessness refers to a feeling that individuals like oneself are unable to influence political decisions and outcomes.

A plausible multistage model linking these alienation variables to protest potential might run as follows [1]:

(1) If an individual feels that the political system affords little or no opportunity for the average member to exert influence in political affairs, he will come to distrust the political authorities in general. In addition, if he is a member of a low-status solidary group which has experienced a history of discrimination in a society, he will learn to distrust the political authorities, regardless of powerlessness, because, due to childhood socialization and adult life experiences, he will come to believe that the authorities as a whole are unjust, not beneficent, and, in general, cannot be trusted to act in his interest. Hence, political powerlessness and solidary group status will jointly determine political distrust.


(2) If an individual strongly distrusts the political authorities as a whole, he will be susceptible to mobilization for protest actions against the government. Since his feeling of political powerlessness will have been generalized into a sweeping rejection of the political authorities, political powerlessness, per se, will be related to protest potential only indirectly, via its effect on political distrust. Hence, political powerlessness will not have an independent effect on protest potential. Solidary group status, on the other hand, will have a small direct effect on protest potential because members of such groups may feel other kinds of dissatisfaction, apart from political distrust, which would motivate them to have a readiness for protest.

2.1. MEASUREMENT OF VARIABLES IN THE ALIENATION MODEL [2]

A 90-minute interview schedule was administered to two samples of residents (18 years and over) of Waterloo, Iowa, a community where acts of political protest had been relatively frequent in the past. The Citizen sample is a disproportionately stratified (by race) cluster sample of 503 persons. On the basis of reports obtained from the Citizen sample, an Influential sample was subsequently interviewed. This sample consists of 115 persons identified as influential members of various organizations in the community.

Respondents were asked whether they approved or disapproved of, and would engage or not engage in, a series of behaviors ranging from legally permitted protest which is supportive of the regime (in the sense that it does not violate regime rules about allowable means of participating in the political process), through illegal protest which adds disadvantages to the regime but does not involve the use of violence, to illegal protest which does involve the use of violence against regime personnel and property. The “approval” items show a satisfactory degree of fit to the Guttman scale model, as evidenced by Reproducibility coefficients of .91 for the Citizen sample and .95 for the Influential sample; the “engage” items have the same rank order as the “approval” items, and their Reproducibility is .93 for the Citizens and .94 for the Influentials. Since these scales are strongly correlated, they are added together to form a single summary measure of readiness for increasingly extreme forms of political protest [3]. This scale, labelled Potential for Violent Protest, ranges from 0 to 10.

The variable Solidary Group Status is measured by race. It is a dichotomy with the property “black” scored as 1 and the property “white” scored as 0. The Political Powerlessness variable is a scale constructed by summing three items adapted from scales used by Campbell et al. (1954) and Easton and Dennis (1967) [4]. It ranges from 0 to 18. Political distrust is measured by a variable labelled Distrust Political Authorities. This scale was constructed by summing the standardized scores on 23 items, each multiplied by the ratio of the item's loading on the first component of a principal components solution to the eigenvalue for that component [5]. It ranges from .010 to 5.20.

2.2. SPECIFICATION OF THE CAUSAL MODEL

We can now state our propositions in the more explicit form of a verbal propositional inventory. These propositions are general “succession laws” without explicit time lag between the independent and the dependent variable. They have the character of statistical laws for explaining the distributions of the dependent variables and not individual values. But in terms of the general theory they are deterministic propositions with means (e.g. Namboodiri et al., 1975: pp. 76-80).

The inventory of propositions (Hi) consists of:

H1. The higher the mean Political Powerlessness, the higher the mean Distrust in Political Authorities.

H2. The higher the mean Solidary Group Status, the higher the mean Distrust in Political Authorities.

H3. The higher the mean Solidary Group Status, the higher the mean Potential for Violent Protest.

H4. The higher the mean Distrust in Political Authorities, the higher the mean Potential for Violent Protest.

Specifically, this simplified version of an alienation model might be diagrammed as shown in Fig. 1. Here the unbroken arrows denote direct causal effects, the curved double-headed arrow denotes correlation due to factors outside the model which are assumed not to affect the endogenous variables in the model (D and V), and the absence of an arrow denotes no direct causal effect. P stands for political powerlessness, S for solidary group status, D for political distrust, and V for protest potential.

Fig. 1. Alienation and protest potential.


2.3. MEASURES OF MULTICOLLINEARITY

In this section we discuss three different measures of multicollinearity in samples:

(a) The multiple correlation between the independent variables of every stage of a multistage causal model.

(b) The Haitovsky Test.

(c) Determinant and eigenvalues of the correlation matrix of the independent variables.

Finally, we compare the adequacy of these different measures for diagnosing multicollinearity in structural equation models.

Let us now introduce the following definitions:

D1. Multicollinearity = Departures from orthogonality in a set of independent variables of one stage in a multistage causal model without linear dependence within this set.

D2. Exact Multicollinearity = Linear dependence within a set of independent variables of one stage in a multistage causal model.

In the limiting case of exact multicollinearity (D2) the matrix of correlation coefficients of the independent variables becomes singular and the unknown regression coefficients cannot be computed. In the case of multicollinearity the unknown coefficients can mathematically be computed but there may be other unwanted consequences, which we discuss in Section 3.

2.3.1. Multiple correlation of independent variables

In the case of more than two independent variables, the simple zero-order correlation coefficient underestimates the degree of multicollinearity. Only in the special case when $r_{ij} = R_{i\cdot jk\ldots m}$ is the simple zero-order correlation between two independent variables (for example the highest one) not an underestimation of the degree of multicollinearity of one independent variable. As this is hardly ever the case in empirical situations, one may take as a measure of “local multicollinearity” of every independent variable the multiple correlation $R_{i\cdot jk\ldots m}$, or its square $R^2_{i\cdot jk\ldots m}$, of one independent variable in one stage of a multistage causal model with all other independent variables of that stage (Farrar and Glauber, 1967: p. 102 et seq.). With these measures we can diagnose which of the independent variables is affected most and which is affected least by multicollinearity.
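The local measure does not require one auxiliary regression per predictor: for the correlation matrix R of the independent variables of one stage, the i-th diagonal element of $R^{-1}$ equals $1/(1 - R_i^2)$. The minimal Python/NumPy sketch below relies on that identity; the function name `local_r_squared` is ours, and the correlations of S, P and D are our reading of Table I. Because Table I is rounded to two decimals, the results only approximately reproduce the published diagonal values.

```python
import numpy as np

def local_r_squared(R):
    """R^2 of each predictor regressed on all other predictors of the same stage.

    Uses the identity R_i^2 = 1 - 1 / (R^{-1})_{ii}, where R is the
    correlation matrix of the independent variables.
    """
    R = np.asarray(R, dtype=float)
    return 1.0 - 1.0 / np.diag(np.linalg.inv(R))

# Correlations among S, P, D (in that order) as read from Table I.
citizens = np.array([[1.00, 0.28, 0.57],
                     [0.28, 1.00, 0.52],
                     [0.57, 0.52, 1.00]])
influentials = np.array([[1.00, 0.63, 0.71],
                         [0.63, 1.00, 0.78],
                         [0.71, 0.78, 1.00]])

for name, R in (("Citizens", citizens), ("Influentials", influentials)):
    print(name, "R^2 of S, P, D on the other two:", np.round(local_r_squared(R), 2))
```

The same function works for any stage: pass the correlation matrix of that stage's independent variables only.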

Let us now apply these measures to our alienation model. We want to measure the multicollinearity of the independent variables for the last stage of the model, with the dependent variable V. Table I presents the correlations between the variables in the model. The correlation matrices contain our index of multicollinearity, the $R^2$ for the dependence of the column variable on the other two (V excluded), along the diagonal. We have therefore computed $R^2_{S\cdot PD}$, $R^2_{P\cdot SD}$ and $R^2_{D\cdot SP}$ for both samples. Among the Citizens, none of the variables appears to be seriously affected by multicollinearity. But in the case of the Influential sample, both P and D are seriously affected by multicollinearity.

TABLE I
Correlations: Alienation theory of Potential for Violent Protest

Means and standard deviations:

        Citizens (N = 479)     Influentials (N = 111)
        Mean     SD            Mean     SD
S       0.37     0.48          0.39     0.49
P       8.34     5.46          6.14     5.55
D       1.98     1.00          1.92     1.22
V       2.42     2.16          2.89     2.35

Correlations:

        Citizens (N = 479)     Influentials (N = 111)
        S      P      D        S      P      D
S      [.33]                  [.52]
P       .28   [.27]            .63   [.61]
D       .57    .52   [.46]     .71    .78   [.69]
V       .36    .20    .45      .40    .20    .42

Bracketed coefficients on the diagonal are the multiple R² for the column variable dependent on the others, excluding V.

What is still unclear with this formulation is to what extent multicollinearity is or is not acceptable for the researcher. Rules of thumb such as that no zero-order correlation should be higher than .80, or that the multiple correlation of one independent variable with the other independent variables should not be higher than .70, seem to us unsatisfactory so long as one does not show to what extent the efficiency of the parameter estimates is affected. In the next two sections we therefore discuss global measures of multicollinearity which give a more precise explication of the seriousness of multicollinearity. These are the Haitovsky Test and a measure based on the eigenvalues of the correlation matrix of the independent variables.

2.3.2. The Haitovsky Test

Rockwell (1975) has recently proposed the use of a heuristic test developed by Y. Haitovsky (1969) to diagnose multicollinearity in the correlation matrices of the “independent” variables. We first describe that test briefly. Then we apply the test to our data. Finally, we discuss the adequacy of the test and Rockwell’s evaluation of it.

We need the following definitions (Haitovsky, 1969: p. 487; Rockwell, 1975: p. 313):

$\mathbf{R}_{xx}$ = correlation matrix of the explanatory variables
$N$ = sample size
$p$ = number of explanatory variables
$K = [1 + (2p + 5)/6 - N]$ (1)
$|\mathbf{R}_{xx}|$ = determinant of the correlation matrix of the explanatory variables
$v = p(p - 1)/2$ = degrees of freedom

Haitovsky’s chi-square statistic has the following form:

$$\chi^2_H(v) = K \log(1 - |\mathbf{R}_{xx}|) \qquad (2)$$

Since both K and the logarithm are negative, the statistic is positive. Under the assumption of multivariate normality it is asymptotically distributed as $\chi^2$ with v degrees of freedom.

The interpretation of the test is as follows (Haitovsky, 1969: p. 487; Rockwell, 1975: p. 314):

(a) If there is a random sample and the assumption of multivariate normality is valid, probability levels of the tests can be interpreted. This means that one can test the null hypothesis that the population correlation matrix of the independent variables is singular.

(b) If there is no random sample and the assumption of multivariate normality is not valid, one can use the test as a descriptive device in the following sense: the smaller the value of the $\chi^2$ statistic, the more severe the multicollinearity.

Now Rockwell (1975: p. 314) sets up the following thesis ($T_{R1}$): “We propose that the term ‘multicollinear’ be applied only to those matrices for which we are unable to reject the null hypothesis of singularity, thus giving a definitive meaning to the term.” In the remainder of this section we discuss the adequacy of this thesis ($T_{R1}$) by first applying the Haitovsky Test to our two sample correlation matrices and then reviewing some further counterarguments against $T_{R1}$.

(a) Computation of $\chi^2_H$ for the Citizen sample:

$|\mathbf{R}_{xx}| = .49$, $N = 479$, $p = 3$, $K = 1 + (2 \times 3 + 5)/6 - 479$

Now we get for $\chi^2_H$:

$\chi^2_H(3) = 139.489$

This $\chi^2$ value is significant at the .001 level: the null hypothesis that the population correlation matrix is singular can be rejected with a probability of error of .001 or less.

(b) Computation of $\chi^2_H$ for the Influential sample:

$|\mathbf{R}_{xx}| = .19$, $N = 111$, $p = 3$, $K = 1 + (2 \times 3 + 5)/6 - 111$

$\chi^2_H(3) = 9.975$

This means that the null hypothesis of singularity would be rejected at the .025 level.
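The two computations can be repeated with a few lines of Python. This is a minimal sketch with function names of our own choosing. It assumes a base-10 logarithm in eqn. (2): that is our inference from the published numbers, since with the rounded determinants .49 and .19 base 10 gives about 139.2 and 9.9, close to 139.489 and 9.975, whereas the natural logarithm would give values about 2.3 times as large. The upper-tail probability uses the closed form of the $\chi^2$ distribution with 3 degrees of freedom, so it applies only to p = 3.

```python
import math

def haitovsky_chi2(det_r, n, p, base=10.0):
    """Haitovsky's heuristic statistic, eqns. (1) and (2); returns (chi2, df).

    base=10 is an assumption inferred from the published values.
    """
    k = 1 + (2 * p + 5) / 6 - n           # eqn. (1): negative for any realistic n
    v = p * (p - 1) // 2                  # degrees of freedom
    chi2 = k * math.log(1 - det_r, base)  # both factors negative, so chi2 > 0
    return chi2, v

def chi2_sf_3df(x):
    """Upper-tail probability P(X > x) for a chi-square variable with 3 df."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

for name, det_r, n in (("Citizens", 0.49, 479), ("Influentials", 0.19, 111)):
    chi2, v = haitovsky_chi2(det_r, n, p=3)
    print(f"{name}: chi2_H({v}) = {chi2:.3f}, upper-tail p = {chi2_sf_3df(chi2):.4f}")
```

Small differences from the published statistics are expected, because the determinants are entered here rounded to two decimals.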

As the elite sample is not a random sample, we cannot interpret the probability levels in the sense of statistical inference; the $\chi^2$ statistic here has only descriptive value. The much smaller value of our $\chi^2$ statistic in the elite sample thus indicates higher multicollinearity in that sample than in the mass sample. But even if the elite sample were a random sample, we would reject the null hypothesis ($H_0$) that the population correlation matrix is singular. This leads to our first counterargument against $T_{R1}$. Following $T_{R1}$, we would say that both matrices are not singular and hence not “multicollinear”, although in the case of the Influential sample, which is not a random sample, Rockwell gives no precise numerical value of $\chi^2$ at which one can say that the matrices are singular. According to our definition D1 both matrices are multicollinear although, as the Haitovsky Test shows, not exactly multicollinear (D2). Our later analysis shows that, in the case of the elite sample, this even leads to a wrong sign of one regression coefficient. So it seems to us not sufficient to use only the Haitovsky Test to diagnose multicollinearity in samples. The Haitovsky Test can be used only to test for exact multicollinearity (D2), but general multicollinearity that is not exact (D1) may still have unintended effects and, as it is the usual case, should therefore also be measured.

Instead of $T_{R1}$ we suggest the following thesis ($T_{SM1}$): We propose that the term “exactly multicollinear” be applied only to those matrices for which we are unable to reject the null hypothesis of singularity.

For the measurement and analysis of general multicollinearity we need different devices. Rockwell’s explication of multicollinearity therefore seems too narrow for the general problem of multicollinearity. Besides that, the application of the significance test leads to all the problems that have so often been criticized. The most important point here seems to be the dependence of such tests on the size of the sample (cf. Morrison and Henkel, 1970).


2.3.3. Determinant and Eigenvalues

In this section we discuss two global measures of multicollinearity of the correlation matrix of independent variables.

First we refer to the determinant of the correlation matrix of independent variables. Farrar and Glauber (1967) show that the determinant of the correlation matrix indicates the severity of multicollinearity. In the case of exact multicollinearity the determinant of the correlation matrix becomes zero, whereas in the case of orthogonal independent variables the determinant becomes 1. Therefore one can conclude: the smaller the determinant, the higher the multicollinearity.

Further, Hoerl and Kennard (1970a) and Silvey (1969) propose another useful measure of the severity of multicollinearity. To analyse the effects of multicollinearity, Hoerl and Kennard (1970a, p. 56) consider the distance of the estimated vector of regression coefficients $\hat{\beta}$ from its expected value $\beta$.

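$L_1$ = distance from $\hat{\beta}$ to $\beta$ (3)

The squared distance of $\hat{\beta}$ from its expected value is given by:

$$L_1^2 = (\hat{\beta} - \beta)'(\hat{\beta} - \beta) \qquad (4)$$

By substitution and using the expectation operator we get from eqn. (4)

$$E(L_1^2) = \sigma^2 \operatorname{tr}(X'X)^{-1} \qquad (5)$$

where tr = trace. We now consider the eigenvalues of $X'X$, which are denoted by:

$$\lambda_1, \lambda_2, \ldots, \lambda_p$$

We substitute $\sum_{i=1}^{p} (1/\lambda_i)$ for $\operatorname{tr}(X'X)^{-1}$ in eqn. (5) and get after substitution the following expression:

$$E(L_1^2) = \sigma^2 \sum_{i=1}^{p} (1/\lambda_i) \qquad (7)$$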

Now the higher the multicollinearity, the smaller one or more of the eigenvalues of the matrix $X'X$. As a consequence, the expected distance between $\hat{\beta}$ and $\beta$ will be large. To determine how likely it is that the least-squares vector $\hat{\beta}$ is far from its expected value, we must compute $\sum_{i=1}^{p} (1/\lambda_i)$ for every matrix of correlation coefficients of independent variables (with standardized variables, $X'X$ is this correlation matrix), and compare $E(L_1^2)$ from a given data set with the corresponding value for an orthogonal data set, which is easily determined, as is demonstrated below.
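Equation (7) can be checked numerically. The sketch below is our own construction, not part of the original analysis: it draws a hypothetical design of N = 111 observations whose population correlations equal the Influential-sample correlations of S, P and D in Table I, centers and scales the columns so that $X'X$ is a correlation matrix, and compares the simulated mean of $L_1^2$ over repeated error draws with $\sigma^2 \sum 1/\lambda_i$. The seed, σ = 1 and the number of replications are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def unit_length_columns(X):
    """Center each column and scale it to unit length, so X'X is a correlation matrix."""
    Xc = X - X.mean(axis=0)
    return Xc / np.sqrt((Xc ** 2).sum(axis=0))

# Hypothetical design: population correlations of S, P, D as in the Influential sample.
R_pop = np.array([[1.00, 0.63, 0.71],
                  [0.63, 1.00, 0.78],
                  [0.71, 0.78, 1.00]])
N, sigma, reps = 111, 1.0, 20_000
X = unit_length_columns(rng.standard_normal((N, 3)) @ np.linalg.cholesky(R_pop).T)
XtX = X.T @ X

# Theory, eqns. (5) and (7): E(L_1^2) = sigma^2 * tr(X'X)^{-1} = sigma^2 * sum(1 / lambda_i).
eigenvalues = np.linalg.eigvalsh(XtX)
theory = sigma ** 2 * np.sum(1.0 / eigenvalues)

# Simulation: beta_hat - beta = (X'X)^{-1} X' e, so the true beta cancels out.
errors = sigma * rng.standard_normal((reps, N))
deviations = errors @ X @ np.linalg.inv(XtX)  # each row is (beta_hat - beta)'
simulated = np.mean(np.sum(deviations ** 2, axis=1))

print(f"sigma^2 * sum(1/lambda_i) = {theory:.3f}, simulated mean of L_1^2 = {simulated:.3f}")
```

For an orthogonal design all eigenvalues equal 1, and the same quantity reduces to $p\sigma^2$.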

Let us now apply these two measures to our two data sets. In Table II we present the eigenvalues of the variables in the two samples. The higher multicollinearity in the Influential sample is reflected by the fact that the eigenvalue of D in the Influential sample is nearer zero than the eigenvalue of D in the Citizen sample. From those eigenvalues we can compute our global statistic $\sum_{i=1}^{p} 1/\lambda_i$, which we present with the determinant of the correlation matrix of predictors in Table III. The stronger multicollinearity of the total correlation matrix of predictors in the Influential sample is reflected here by the value of the determinant of .19, which is nearer to zero than the determinant of the Citizen sample (.49).

TABLE II
Eigenvalues ($\lambda_i$) of the independent variables in the two samples

Variable    Citizen sample    Influential sample
S           1.92              2.14
P            .72               .31
D            .35               .21

We now compare the numbers in the second column of Table III. For an orthogonal data set with three predictors, $\sum_{i=1}^{3} 1/\lambda_i = 3$. Thus the probability that the least-squares coefficient vector is either too long or too short is not very high in the mass sample, with a value of 4.75, whereas it is much greater for the Influential sample, with a value of 7.27. Now we are again confronted with the tricky problem of deciding at which value of the determinant of $\mathbf{R}_{xx}$ or of $\sum_{i=1}^{p} 1/\lambda_i$ multicollinearity becomes damaging for the stability of the parameter estimates. We argued before that the application of the Haitovsky Test is not sufficient because we can only test for exact multicollinearity with it. As a descriptive measure it gave us the same substantive result as the determinant of the correlation matrix of the independent variables, $|\mathbf{R}_{xx}|$: stronger multicollinearity in the Influential sample than in the Citizen sample. We have to conclude that our two additional measures do not give us an exact value or interval at which multicollinearity is damaging for the stability of the parameter estimates either. A solution to this problem is discussed in Section 4, when we discuss the application of the ridge trace, a graphical technique to evaluate the effects of multicollinearity on the stability of the parameters. But before we turn to the technique of ridge regression, we analyze the effects of multicollinearity more precisely.

TABLE III
Global measures of multicollinearity in the two samples

Sample                $|\mathbf{R}_{xx}|$ (determinant of the correlation matrix)    $\sum_{i=1}^{p} 1/\lambda_i$
Citizen sample        .49                                                             4.75
Influential sample    .19                                                             7.27
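Both global measures can be recomputed from a correlation matrix of predictors. The NumPy sketch below (function name ours) returns the determinant, the eigenvalues and $\sum 1/\lambda_i$, with the identity matrix as the orthogonal benchmark; the correlations are our reading of Table I. With these two-decimal inputs the Citizen results come close to the published .49 and 4.75, and the Influential determinant to .19. For the Influential sample, $\sum 1/\lambda_i$ comes out near 7.9 rather than 7.27; because its smallest eigenvalue is small, this quantity is sensitive to the rounding of the correlations, so an exact match should not be expected.

```python
import numpy as np

def global_measures(R):
    """Determinant, eigenvalues (largest first) and sum of 1/lambda_i of R_xx."""
    R = np.asarray(R, dtype=float)
    eigenvalues = np.linalg.eigvalsh(R)[::-1]  # eigvalsh returns ascending order
    return np.linalg.det(R), eigenvalues, np.sum(1.0 / eigenvalues)

# Correlations among S, P, D as read from Table I.
citizens = np.array([[1.00, 0.28, 0.57],
                     [0.28, 1.00, 0.52],
                     [0.57, 0.52, 1.00]])
influentials = np.array([[1.00, 0.63, 0.71],
                         [0.63, 1.00, 0.78],
                         [0.71, 0.78, 1.00]])

for name, R in (("Orthogonal", np.eye(3)), ("Citizens", citizens), ("Influentials", influentials)):
    det, eigenvalues, inv_sum = global_measures(R)
    print(f"{name:>12}: |R_xx| = {det:.2f}, eigenvalues = {np.round(eigenvalues, 2)}, "
          f"sum(1/lambda) = {inv_sum:.2f}")
```

Note that the eigenvalues of a p × p correlation matrix always sum to p, which is a quick check on any reported set.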

3. Effects of Multicollinearity

The major justification for using the least-squares method and the maximum likelihood method is given by the Gauss-Markov Theorem (Wonnacott and Wonnacott, 1970: p. 22).

GAUSS-MARKOV THEOREM:

Within the class of linear unbiased estimators of β (or γ), the least squares (ML) estimator has minimum variance. It holds for both the OLS and the ML method, as they yield the same estimators when there are no cross-equation and within-equation constraints (Wonnacott and Wonnacott, 1970: p. 244).

What is important to remember is that the least squares (ML) estimates remain unbiased however severe multicollinearity is (Deegan, 1972: p. 43).

We now discuss more intensively the effect of multicollinearity on the efficiency of the estimates. Rockwell (1975: p. 329) gives without derivation the following formula for the variances of the estimates of the regression coefficients:

$$\operatorname{var}(b_i^*) = \frac{1 - R^2}{(1 - R_i^2)(N - p - 1)} \qquad (8)$$

where

R = multiple correlation of the dependent variable y with the set of explanatory variables $x_i$
$R_i$ = multiple correlation of some particular $x_i$ with the remaining p - 1 explanatory variables
N = sample size
$b_i^*$ = estimate of the standardized regression coefficient
p = number of explanatory variables

Now Rockwell states (1975: p. 310):

(1) If N, p, and R are fixed, the variance of the estimate $b_i^*$ clearly increases as the dependence of $x_i$ (as measured by $R_i$) on other explanatory variables increases.

(2) The variance of $b_i^*$ approaches infinity as $R_i$ approaches 1.

As a consequence of the increasing variance, the estimated coefficients may take a wrong sign and/or may be unreasonably large (Hoerl and Kennard, 1970a; Deegan, 1973).
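Rockwell's variance formula makes the inflation factor $1/(1 - R_i^2)$ explicit. The sketch below tabulates the standard error of $b_i^*$ for N = 111 and p = 3 as $R_i^2$ grows, including the local $R^2$ values of Table I. The equation-level value $R^2 = .25$ is a hypothetical choice for illustration only, and the function name is ours.

```python
import math

def var_std_coef(R2, Ri2, N, p):
    """Sampling variance of a standardized OLS coefficient (Rockwell's formula)."""
    return (1 - R2) / ((1 - Ri2) * (N - p - 1))

N, p = 111, 3
R2 = 0.25  # hypothetical R^2 of the whole equation, for illustration only
baseline = var_std_coef(R2, 0.0, N, p)  # orthogonal predictor (R_i^2 = 0)

for Ri2 in (0.0, 0.27, 0.46, 0.61, 0.69, 0.90, 0.99):
    variance = var_std_coef(R2, Ri2, N, p)
    inflation = variance / baseline  # equals 1 / (1 - R_i^2)
    print(f"R_i^2 = {Ri2:.2f}: SE(b_i*) = {math.sqrt(variance):.3f}, "
          f"variance inflation = {inflation:5.1f}")
```

Whatever value of $R^2$ is chosen, the ratio to the orthogonal case depends only on $R_i^2$.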

We now present the OLS and ML estimators and, for OLS only, their standard errors. In the more multicollinear sample, multicollinearity produces one wrong sign and larger standard errors for all estimated coefficients.

3.1. ORDINARY LEAST SQUARES (OLS) AND MAXIMUM-LIKELIHOOD (ML) ESTIMATORS FOR THE ALIENATION MODEL

The alienation model is a simple linear recursive sequence of equations. The path coefficients can be estimated by a single-equation technique like Ordinary Least Squares or a system-equation technique like Maximum Likelihood Estimation. Usually the OLS technique is applied, although Jöreskog (1970, pp. 248-251) has formulated three arguments, two of which are directly relevant for us, in favour of his Linear Structural Relations System, which uses the Maximum Likelihood Method.

(1) Since the regression technique (OLS) is applied to each equation separately, one does not get an overall test of the entire causal structure.

(2) In the case of overidentification of the equation system there are no clear-cut guidelines in Ordinary Least Squares about how to handle such situations.

The Maximum Likelihood Method offers the Likelihood Ratio Test for the first problem. The second problem is solved by fixing certain parameters to zero. Another argument in favor of the Maximum Likelihood Method is that we can use it to perform comparisons across populations: one can fix the parameters at the values estimated in one sample and test the specification in a new sample with the Likelihood Ratio Test.

So far we have mentioned only the advantages of the Maximum Likelihood Method in the form of Jöreskog’s Linear Structural Relations System. In one respect at least, the Maximum Likelihood Method seems inferior to OLS: in the presence of specification errors and high multicollinearity, OLS seems to be more robust than the Maximum Likelihood Method, especially when the sample size is not very large (Johnston, 1963: p. 282). The structural equations of the model in standardized form have different notations in the two methods:

OLS:

$$D = \beta_{DP} P + \beta_{DS} S + e_D \qquad (9)$$
$$V = 0P + \beta_{VS} S + \beta_{VD} D + e_V \qquad (10)$$

LISREL (Maximum Likelihood):

$$D = \gamma_{DP} P + \gamma_{DS} S + \zeta_D \qquad (9)$$
$$V = 0P + \gamma_{VS} S + \beta_{VD} D + \zeta_V \qquad (10)$$
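Because all variables are standardized, the OLS path coefficients of eqns. (9) and (10) can be obtained directly from the correlations as $b^* = R_{xx}^{-1} r_{xy}$. The NumPy sketch below does this for the Influential sample; the correlations are our reading of Table I and the helper name is ours. P is kept in the V equation, as in eqn. (10), so that its predicted zero effect can be examined. With these rounded inputs, the V equation gives a coefficient of about -.38 for P and about .53 for D, close to the values reported below.

```python
import numpy as np

def standardized_ols(R_xx, r_xy):
    """Standardized OLS (path) coefficients b* = R_xx^{-1} r_xy."""
    return np.linalg.solve(np.asarray(R_xx, dtype=float), np.asarray(r_xy, dtype=float))

# Influential sample, variables ordered S, P, D (correlations read from Table I).
R_spd = np.array([[1.00, 0.63, 0.71],
                  [0.63, 1.00, 0.78],
                  [0.71, 0.78, 1.00]])
r_v = np.array([0.40, 0.20, 0.42])  # correlations of V with S, P, D

# Eqn. (9): D on S and P (coefficients returned in the order S, P).
b_D = standardized_ols(R_spd[:2, :2], R_spd[:2, 2])
# Eqn. (10): V on S, P and D; P is kept so that its predicted zero effect can be inspected.
b_V = standardized_ols(R_spd, r_v)

print("D on (S, P):   ", np.round(b_D, 2))
print("V on (S, P, D):", np.round(b_V, 2))
```

Solving the normal equations with `np.linalg.solve` avoids forming the inverse explicitly, which is the numerically preferable choice when the predictor matrix is nearly singular.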

The a-priori knowledge concerning the signs of the coefficients is not tested in a formal way by either method. But we want to explicitly state our restrictions on the signs of the coefficients:

Estimates of parameters for the two samples computed by the OLS method are shown in Fig. 2. For the Citizens, the a-priori specifications concerning the signs of the coefficients and the relation between P and V (the a-priori specification of a value of zero) are confirmed by both methods in all instances (cf. Figs. 2 and 4). The differences between the two methods are minor; only in the second stage are there some small differences. But, as we indicated earlier, there is no severe problem of multicollinearity in the Citizen sample at any stage of the model.

Turning to the OLS estimates for the Influential sample, we see that the predictions of the alienation model are confirmed in all instances save one: whereas P was predicted to have no direct effect on V, the observed direct effect of P on V is strong and, moreover, negative. Supposedly, this coefficient means that, regardless of solidary group status and political distrust, the lower the sense of political powerlessness, the greater the potential for protest!

Fig. 2. OLS estimates for an alienation theory of Potential for Violent Protest: Citizens (N = 479) and Influentials (N = 111).

Unfortunately, there is a serious multicollinearity problem in the third stage of the model, namely, the prediction of V. This occurs because D is so accurately predicted by P and S in the stage immediately antecedent to V. Presumably due to the problem of multicollinearity, some rather strange results occur in the prediction equation for V:

(1) The relationship between political powerlessness and potential for protest changes direction and size as one goes from the bivariate to the multivariate case. The zero-order correlation between P and V is .20, i.e., the gross effect of P on V is small but in the expected (positive) direction. The path coefficient, however, which describes the effect of P on V independent of S and D, is -.39, i.e., the net effect of P on V is large and in an unexpected direction.

(2) The net effect of D on V is larger than the gross effect: the path coefficient is .53, whereas the zero-order correlation is only .40.

All of these results are curious, since one would normally expect that, when all the variables in the model are positively correlated, no sign changes would occur among the path coefficients, and that the path coefficients would be smaller than, or at most equal to, the zero-order correlations, according to the logic that the zero-order correlation represents the sum of the direct and indirect effects. Of course, if one accepts, for example, the observed strong negative path coefficient between P and V as a correct estimate, then the sign change and the inflation in size relative to the zero-order correlation between P and V can be explained by offsetting indirect effects whose sign differs from that of the direct effect. The correlation $r_{PV}$ can be expressed as [6]:

$$r_{PV} = .20 \approx -.39 + .29 + .17 + .12$$

According to the observed estimates of direct and indirect effects, political powerlessness is positively related to potential for protest via its relationship with political distrust (the indirect effect of P on V via D is .29); at the same time, however, political powerlessness is also negatively related to potential for protest (as indicated by the path coefficient of -.39, which describes the direct effect of P on V). From the standpoint of substantive theory, this state of affairs is not easy to interpret.
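The arithmetic behind such offsetting effects can be checked directly. The following Python sketch uses hypothetical correlations (not the paper's data) for just two predictors, P and D, with $r_{PD} = .90$. Standardized OLS coefficients solve the normal equations $R_{xx}b = r_{xy}$, so each zero-order correlation equals the predictor's own coefficient plus the other coefficients weighted by their correlations with it. With strongly collinear predictors this can produce the pattern described above: a negative direct coefficient for P and an inflated one for D.

```python
import numpy as np

# Hypothetical correlations (NOT the paper's data): two highly collinear
# predictors, P and D, each positively correlated with the criterion V.
r_PD = 0.90
R_xx = np.array([[1.0, r_PD],
                 [r_PD, 1.0]])      # correlations among the predictors P, D
r_xy = np.array([0.20, 0.40])      # zero-order correlations r_PV, r_DV

# Standardized OLS path coefficients solve the normal equations R_xx b = r_xy.
b = np.linalg.solve(R_xx, r_xy)
b_P, b_D = b

# First normal equation: r_PV = b_P + b_D * r_PD (direct term plus the term via D).
direct = b_P
via_D = b_D * r_PD
print(f"b_P = {b_P:.3f}, b_D = {b_D:.3f}")
print(f"r_PV = {direct:.3f} (direct) + {via_D:.3f} (via D) = {direct + via_D:.3f}")
```

With these made-up numbers the direct coefficient of P is about -.84 and that of D about 1.16, although both zero-order correlations are positive; the two terms still add back up to $r_{PV} = .20$.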

Since both P and D are seriously affected by multicollinearity, perhaps the most sensible interpretation of the path coefficients estimated for the second stage of the alienation model would be to acknowledge that they are probably nonsense. Hence, in this example, one faces the problem that, precisely because the theory seems to hold up so well in the first stage of the model, it is impossible to test it reliably in the final stage. Of course, if one knew beforehand that P had no direct effect on V, then the parameters of the final stage could be estimated with confidence. But, as a general rule, one cannot expect that information about some of the path coefficients in a model will be available prior to the testing of the model. Rather, the theorist normally wants to use path analysis precisely in order to acquire such information.

TABLE IV

OLS standardized regression coefficients (b), their standard errors (s) and the corresponding F-values

| Coefficient | Citizen sample: b | s | F-value | Influential sample: b | s | F-value |
|---|---|---|---|---|---|---|
| $b_{DP}$ | .39 | .04 | 125.59 | … | … | 63.50 |
| $b_{DS}$ | … | .03 | 114.15 | … | … | 21.40 |
| $b_{VP}$ | … | .05 | .08 | -.39 | … | 1.11 |
| $b_{VS}$ | .15 | .05 | 9.52 | .27 | … | 4.16 |
| $b_{VD}$ | .39 | .06 | 48.31 | .53 | … | 12.40 |

As predicted, the standard errors of the regression coefficients are higher in the Influential sample, which is more multicollinear. This can be seen in Table IV, in which we summarize the point estimates, their standard errors, and the corresponding F-values [7].
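Why collinearity inflates standard errors can be made concrete with the variance inflation factor (VIF), the j-th diagonal element of $R_{xx}^{-1}$. Holding the residual variance and the sample size fixed, the sampling variance of a standardized coefficient is proportional to it. The Python sketch below compares two hypothetical two-predictor correlation matrices (r = .3 and r = .9). It is illustrative only and does not use the paper's correlations; the helper name `variance_inflation` is our own.

```python
import numpy as np

def variance_inflation(R_xx):
    """Return the VIFs, i.e. the diagonal of the inverse predictor correlation matrix.

    For fixed residual variance and sample size, the sampling variance of the
    standardized coefficient b_j is proportional to VIF_j.
    """
    return np.diag(np.linalg.inv(R_xx))

# Hypothetical two-predictor correlation matrices (not the paper's data).
low = np.array([[1.0, 0.3], [0.3, 1.0]])
high = np.array([[1.0, 0.9], [0.9, 1.0]])

vif_low = variance_inflation(low)
vif_high = variance_inflation(high)
print("VIF, r = .3:", vif_low.round(3))
print("VIF, r = .9:", vif_high.round(3))
# Standard errors scale with the square root of the VIF.
print("SE ratio (r = .9 vs r = .3):", round(float(np.sqrt(vif_high[0] / vif_low[0])), 3))
```

For two predictors the VIF reduces to $1/(1 - r^2)$. That gives about 1.10 for r = .3 and about 5.26 for r = .9, so in the second case the standard error is a little more than twice as large.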

We now turn to the Maximum Likelihood estimators of the parameters. We performed the computation with the program LISREL I (K.G. Jöreskog and M. van Thillo, 1972), which computes only the point estimates and not the standard errors. To translate our symbols into the notation of the Linear Structural Relations System, we need the following:

$S = \xi_1 \qquad P = \xi_2 \qquad \text{residual of } D = \zeta_1$

$D = \eta_1 \qquad V = \eta_2 \qquad \text{residual of } V = \zeta_2$

The relationships between the theoretical variables ($\xi$ and $\eta$) are given by the following equation in matrix notation:

The relationships between the theoretical variables ($\xi$ and $\eta$) and their corresponding indicators ($x$ and $y$) must be specified as deterministic, since we have only one indicator for each theoretical variable and no a-priori knowledge of the amount of measurement error.
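Readers who are more used to LISREL's matrix notation may find it helpful to see the structural and measurement equations that the symbol assignments above imply, written out as a sketch. It assumes the standard LISREL form $B\eta = \Gamma\xi + \zeta$, with the β coefficients entering $B$ with a negative sign (a sign convention we assume here). It also includes the two zero restrictions on $\beta_{12}$ and $\gamma_{22}$, and fixes the measurement errors at zero because each construct has a single indicator:

$$
\begin{pmatrix} 1 & -\beta_{12} \\ -\beta_{21} & 1 \end{pmatrix}
\begin{pmatrix} \eta_1 \\ \eta_2 \end{pmatrix}
=
\begin{pmatrix} \gamma_{11} & \gamma_{12} \\ \gamma_{21} & \gamma_{22} \end{pmatrix}
\begin{pmatrix} \xi_1 \\ \xi_2 \end{pmatrix}
+
\begin{pmatrix} \zeta_1 \\ \zeta_2 \end{pmatrix},
\qquad \beta_{12} = \gamma_{22} = 0,
$$

$$
x_1 = \xi_1, \quad x_2 = \xi_2, \quad y_1 = \eta_1, \quad y_2 = \eta_2 .
$$

Multiplying out, the first row gives $D = \gamma_{11}S + \gamma_{12}P + \zeta_1$ and the second gives $V = \beta_{21}D + \gamma_{21}S + \zeta_2$. These correspond to the LISREL versions of eqns. (9) and (10), with no term for P in the equation for V.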


The matrix equations are as follows:

In eqn. (12) we can see that two coefficients are set to zero: $\beta_{12}$ is set to zero because we postulate a recursive model, and $\gamma_{22}$ is set to zero because we postulate that there is no direct effect of $\xi_2$ (P) on $\eta_2$ (V). Unlike in the OLS estimation, this last a-priori restriction is tested against the data, as the minimization procedure in the LISREL program is carried out subject to the values of the fixed and constrained parameters. The corresponding path diagram is given in Fig. 3, where:

$x_1$ and $x_2$ are indicators of the exogenous theoretical variables $\xi_1$ and $\xi_2$;
$y_1$ and $y_2$ are indicators of the endogenous theoretical variables $\eta_1$ and $\eta_2$;
$\zeta_1$ and $\zeta_2$ are the residuals of the endogenous theoretical variables $\eta_1$ and $\eta_2$;
$\delta_1$ and $\delta_2$ are the residuals of the x-indicators;
$\varepsilon_1$ and $\varepsilon_2$ are the residuals of the y-indicators.

Fig. 3. Path diagram for the relationship between theoretical variables and observed variables.

Fig. 4. LISREL (Maximum-Likelihood) estimators for an alienation theory of Potential for Violent Protest (panels: Citizens, N = 479; Influentials, N = 111, Model (b), unrestricted; Influentials, N = 111, Model (a), restricted).

Estimates of the parameters for the two samples are given in Fig. 4. With one exception, the parameters estimated in the Citizen sample have the same values as those estimated by OLS (Fig. 2), and this one exception differs by only .03 from the OLS estimate. There are two devices for evaluating the model: (a) the Likelihood Ratio Test, which we mentioned before, and (b) the matrix of differences between observed and expected correlations. The Likelihood Ratio Test does not give an unambiguous result: χ² with 1 degree of freedom = .8321; probability level = .3617. On the use of this test, Jöreskog (1969: p. 201) gives the following hint:

“If, on the other hand, a value of χ² is obtained which is close to the number of degrees of freedom, this is an indication that the model fits too well. Such a model is not likely to remain stable in future samples and all parameters may not have real meaning.”

From this point of view, we would have to impose even more restrictions on the model in order to increase the number of degrees of freedom. But we have two counterarguments against building a new model with more restrictions than the present one. The first argument is that we had theoretical reasons for formulating precisely this model, and Jöreskog himself argues (1969: p. 201) that when to stop fitting additional parameters cannot be decided on a purely statistical basis, adding: “This is largely a matter of the experimenter’s interpretations of the data based on substantive theoretical and conceptual considerations.” The second argument refers to the matrix of differences between observed and expected correlations, which should indicate where the model needs to be changed. This matrix of residuals is shown in Table V. In it, the largest difference between an estimated correlation and the corresponding observed correlation is .031. From this we can conclude that the restriction we have imposed on the model seems sound, i.e., it is not falsified.

For the Influentials, we first tested the same model as for the Citizens. We call it Model (a), the restricted model, as one coefficient is fixed to zero. Although the Likelihood Ratio Test is not strictly appropriate, since the Influential sample is not a true random sample, we give the result of the test for descriptive purposes: χ² with 1 degree of freedom = 7.6623; probability level = .0056.

TABLE V

Residuals = SIGMA - S, Citizen sample*

| | D | V | S | P |
|---|---|---|---|---|
| D | 0.000 | | | |
| V | 0.000 | 0.000 | | |
| S | 0.000 | 0.000 | 0.000 | |
| P | 0.000 | 0.031 | 0.000 | 0.000 |

* SIGMA = estimated correlation matrix; S = observed correlation matrix.
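The reported probability levels can be reproduced from the χ² values alone. For one degree of freedom, χ² is the square of a standard normal variable, so the upper-tail probability is $P(|Z| > \sqrt{\chi^2}) = \operatorname{erfc}(\sqrt{\chi^2/2})$. The Python sketch below uses only the standard library; the function name `chi2_sf_df1` is our own hypothetical helper.

```python
import math

def chi2_sf_df1(x):
    """P(chi-square with 1 d.f. > x).

    With one degree of freedom chi2 = Z**2 for a standard normal Z, so
    P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))

results = {
    "Citizens, restricted model": 0.8321,
    "Influentials, restricted Model (a)": 7.6623,
}
for label, chi2 in results.items():
    print(f"{label}: chi2 = {chi2}, p = {chi2_sf_df1(chi2):.4f}")
```

The two computed probabilities round to .3617 and .0056, matching the values reported for the Citizen and Influential samples.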

If the Influential sample were a true random sample, we could infer that the model is falsified. As the application of the Likelihood Ratio Test is not appropriate here, we analyze the matrix of residuals instead. Larger differences are a clue that the model is misspecified. From Table VI we can see that there is a considerable residual correlation between V and P.

We modify the model by giving up the restriction of a zero relationship between V and P and arrive at our unrestricted Model (b). The estimated parameters can be found in Fig. 4. This seems to be the adequate model for the Influential sample. A summary of the results for the three models is given in Table VII. From our previous discussion of the least squares estimators and the present results of the Maximum Likelihood estimators, we would conclude that the model which seemed appropriate in the Citizen sample was falsified in the Influential sample, because $\gamma_{VP} \neq 0$.

In the next section we discuss whether the falsification of our model in the Influential sample is justified on substantive grounds, or whether it is an effect of the sensitivity of the OLS and ML estimators to strong multicollinearity and therefore a methodologically produced artefact. For this purpose we introduce, as an alternative to the class of best linear unbiased estimators, the technique of ridge regression, which seems to be more robust in the presence of high multicollinearity than the OLS and ML techniques.

TABLE VI

Residuals = SIGMA - S, Influential sample, restricted Model (a)

| | D | V | S | P |
|---|---|---|---|---|
| D | 0.000 | | | |
| V | 0.000 | 0.000 | | |
| S | 0.000 | 0.000 | 0.000 | |
| P | 0.000 | 0.143 | 0.000 | 0.000 |

TABLE VII

Summary table of Maximum Likelihood analyses

| Model and data set | χ² | No. of parameters | d.f. | p | N |
|---|---|---|---|---|---|
| Restricted Model, Citizen sample | .8321 | 4 | 1 | .3617 | 479 |
| Restricted Model (a), Influential sample | 7.6623 | 4 | 1 | .0056 | 111 |
| Unrestricted Model (b), Influential sample | | 5 | 0 | | 111 |

4. A Method to compensate for Multicollinearity: Ridge Regression

An alternative to the well-known least squares estimator, which has minimum variance in the class of linear unbiased estimators, is ridge regression, developed by A. Hoerl (1962). Ridge regression lies outside the class of linear unbiased estimators (its estimates are biased), but it can have greater efficiency, in the sense of a smaller mean square error, than the ordinary least squares and ML estimators.

4.1. GENERAL DESCRIPTION OF RIDGE REGRESSION

The ridge estimator, denoted by $\hat B_k$, is defined as follows:

$$\hat B_k = (X'X + kI)^{-1} X'Y, \qquad k \ge 0 \qquad (12)$$
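Eqn. (12) translates directly into a few lines of numerical code. The Python sketch below uses NumPy with simulated data; the helper name `ridge` is our own. It solves the linear system $(X'X + kI)B = X'Y$ rather than forming the inverse explicitly, which is numerically safer, and checks that $k = 0$ reproduces the least squares estimator of eqn. (18).

```python
import numpy as np

def ridge(X, Y, k):
    """Ridge estimator B_k = (X'X + kI)^{-1} X'Y of eqn. (12), for k >= 0."""
    if k < 0:
        raise ValueError("k must be non-negative")
    p = X.shape[1]
    # Solve (X'X + kI) B = X'Y instead of inverting the matrix explicitly.
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ Y)

# Simulated, deliberately collinear data (illustrative only).
rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)       # x2 is nearly a copy of x1
X = np.column_stack([x1, x2])
Y = X @ np.array([1.0, 1.0]) + rng.normal(size=n)

B_ols = np.linalg.lstsq(X, Y, rcond=None)[0]   # least squares, eqn. (18)
print("OLS:", B_ols.round(3))
for k in (0.0, 0.1, 1.0):
    print(f"ridge, k = {k}:", ridge(X, Y, k).round(3))
```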

The corresponding definition for standardized ridge estimators is given by:

$$\hat B_k^* = (R_{xx} + kI)^{-1} R_{xy} \qquad (13)$$

where $\hat B_k^*$ is the vector of standardized ridge coefficients,

$$\hat B_k^* = (\hat b_{1k}^*, \hat b_{2k}^*, \ldots, \hat b_{pk}^*)' \qquad (14)$$


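$R_{xx}$ is the symmetric correlation matrix of the independent variables:

$$R_{xx} = \begin{pmatrix} 1 & r_{12} & r_{13} & \cdots & r_{1p} \\ r_{21} & 1 & r_{23} & \cdots & r_{2p} \\ \vdots & & \ddots & & \vdots \\ r_{p1} & r_{p2} & r_{p3} & \cdots & 1 \end{pmatrix} \qquad (15)$$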

$R_{xy}$ is the vector of the pairwise correlations of every independent variable with the dependent variable:

$$R_{xy} = \begin{pmatrix} r_{1y} \\ r_{2y} \\ r_{3y} \\ \vdots \\ r_{py} \end{pmatrix} \qquad (16)$$

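$kI$ is the identity matrix multiplied by the constant $k$:

$$kI = k\begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix} = \begin{pmatrix} k & 0 & \cdots & 0 \\ 0 & k & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & k \end{pmatrix} \qquad (17)$$

In comparison with OLS, which is given in its raw form by

$$\hat B = (X'X)^{-1} X'Y, \qquad (18)$$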

we add small quantities, denoted by k, to the main diagonal of $X'X$. When k > 0 and $\hat B_k$ satisfies eqn. (12), $\hat B_k$ minimizes the sum of squared residuals (SSR) among all coefficient vectors of the same length as $\hat B_k$; the SSR is given by (Deegan, 1973: p. 4):

$$\mathrm{SSR}(\hat B_k) = (Y - X\hat B_k)'(Y - X\hat B_k) \qquad (19)$$

Thus we see from eqns. (12) and (19) that $\mathrm{SSR}(\hat B_k)$ is a monotonically increasing function of k. Therefore Hoerl and Kennard (1970a: p. 57) argue that, when k > 0, the ridge solution requires some increase of the residual sum of squares above the least squares residual sum of squares. What, then, is the relationship between the least squares and the ridge estimator? Hoerl and Kennard (1970a: p. 57) and Deegan (1973: p. 4) discuss the relationship between $\hat B$ and $\hat B_k$. We follow Deegan's argument, as it is simpler than Hoerl and Kennard's original contribution. From our eqn. (12) we know the definition of the ridge estimators:

$$\hat B_k = (X'X + kI)^{-1} X'Y \qquad (12')$$

Since, from the least squares normal equations, we know that

$$X'Y = (X'X)\hat B, \qquad (20)$$

we can now substitute the right-hand side of eqn. (20) into eqn. (12’) and get:

$$\hat B_k = (X'X + kI)^{-1}(X'X)\hat B \qquad (21)$$

To simplify this expression we introduce the following definition:

$$(X'X + kI)^{-1}(X'X) = Z_k \qquad (22)$$

Now we substitute the right-hand side of eqn. (22) in eqn. (21) and get:

$$\hat B_k = Z_k \hat B \qquad (23)$$

From eqn. (23) we can see the important fact that the ridge estimator $\hat B_k$ is a biased estimator of $B$ if k > 0: since $\hat B$ is unbiased, $E(\hat B_k) = Z_k B$, and for k > 0 we have $Z_k \neq I$.
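The two routes to $\hat B_k$ can be checked numerically: directly through eqn. (12), and through $Z_k$ via eqn. (23). The Python sketch below uses NumPy, random illustrative data and an arbitrarily chosen $k = 0.5$. It computes both and confirms that $Z_k$ differs from the identity, which is the source of the bias.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))     # illustrative data
Y = rng.normal(size=50)
k = 0.5                          # arbitrary k > 0
I = np.eye(3)
XtX = X.T @ X

B_ols = np.linalg.solve(XtX, X.T @ Y)                # eqn. (18)
Z_k = np.linalg.solve(XtX + k * I, XtX)              # eqn. (22): (X'X + kI)^{-1} X'X
B_k_direct = np.linalg.solve(XtX + k * I, X.T @ Y)   # eqn. (12)
B_k_via_Z = Z_k @ B_ols                              # eqn. (23)

print("eqn. (12) and eqn. (23) agree:", np.allclose(B_k_direct, B_k_via_Z))
print("Z_k equals I:", np.allclose(Z_k, I))
```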

A further result of Hoerl and Kennard (1970a: p. 57), which we present without proof, is that for k > 0 the vector $\hat B_k$ is shorter than $\hat B$:

$$\hat B_k'\hat B_k < \hat B'\hat B \qquad (24)$$

That is, when k > 0 the ridge coefficient vector is always shorter than the least squares (or maximum likelihood) coefficient vector, although an individual coefficient need not shrink.
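Eqns. (19) and (24) describe a trade-off that is easy to see numerically. As k grows, the squared length $\hat B_k'\hat B_k$ falls while the residual sum of squares rises. The Python sketch below uses NumPy with simulated collinear data; all numbers are illustrative assumptions. It tabulates both quantities over a grid of k.

```python
import numpy as np

def ridge(X, Y, k):
    """Ridge estimator of eqn. (12)."""
    return np.linalg.solve(X.T @ X + k * np.eye(X.shape[1]), X.T @ Y)

# Simulated collinear data (illustrative only).
rng = np.random.default_rng(2)
n = 80
x1 = rng.normal(size=n)
X = np.column_stack([x1, x1 + 0.2 * rng.normal(size=n), rng.normal(size=n)])
Y = X @ np.array([0.5, -0.3, 0.8]) + rng.normal(size=n)

ks = np.linspace(0.0, 5.0, 11)
estimates = [ridge(X, Y, k) for k in ks]
lengths = [float(B @ B) for B in estimates]                  # B_k'B_k, eqn. (24)
ssr = [float(np.sum((Y - X @ B) ** 2)) for B in estimates]   # SSR(B_k), eqn. (19)

for k, length, s in zip(ks, lengths, ssr):
    print(f"k = {k:.1f}: squared length = {length:.4f}, SSR = {s:.3f}")
```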

The next point we want to discuss is the mean square error (MSE) of the ridge estimator $\hat B_k$. It is defined as:

$$\mathrm{MSE}(\hat B_k) = \text{variance} + (\text{bias})^2 \qquad (25)$$

The problem now is to find the point estimate that raises the bias only slightly while raising the efficiency of estimation as much as possible, i.e., one that keeps the mean square error as small as possible. We know from eqn. (5) that the expected squared distance between $\hat B$ and $B$ is:

$$E(L_1^2) = \sigma^2 \operatorname{tr}(X'X)^{-1}$$

The corresponding expression for the ridge estimators is:

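$$E[L_1^2(k)] = \operatorname{tr}[\operatorname{var}(\hat B_k)] + B'(Z_k - I)'(Z_k - I)B \qquad (26)$$

where $\operatorname{tr}[\operatorname{var}(\hat B_k)]$ represents the variance and $B'(Z_k - I)'(Z_k - I)B$ represents the squared bias. $Z_k$ is defined in eqn. (22), and $\operatorname{var}(\hat B_k) = \sigma^2 Z_k (X'X)^{-1} Z_k'$.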


To simplify eqn. (26) we introduce the following definitions:

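$$\gamma_1(k) = \operatorname{tr}[\operatorname{var}(\hat B_k)] = \text{variance} \qquad (27)$$

$$\gamma_2(k) = B'(Z_k - I)'(Z_k - I)B = \text{squared bias} \qquad (28)$$

Then, after substitution into eqn. (26), we get the following equation:

$$E[L_1^2(k)] = \gamma_1(k) + \gamma_2(k) \qquad (29)$$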
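Eqns. (27) to (29) can be evaluated directly once $X'X$, $B$ and $\sigma^2$ are known, which is only the case in a constructed example. The Python sketch below uses NumPy and rests on hypothetical inputs: a two-predictor correlation matrix with r = .9 in place of $X'X$, true coefficients $B = (.3, .1)'$, and $\sigma^2 = .01$. It reproduces the qualitative shape of Hoerl and Kennard's mean square error diagram, not any result from our data; the helper name `mse_components` is our own.

```python
import numpy as np

def mse_components(XtX, beta, sigma2, k):
    """Return (gamma1(k), gamma2(k)) of eqns. (27) and (28)."""
    I = np.eye(XtX.shape[0])
    Z = np.linalg.solve(XtX + k * I, XtX)               # Z_k, eqn. (22)
    var_Bk = sigma2 * Z @ np.linalg.inv(XtX) @ Z.T      # var(B_k)
    gamma1 = float(np.trace(var_Bk))                    # variance, eqn. (27)
    d = (Z - I) @ beta
    gamma2 = float(d @ d)                               # squared bias, eqn. (28)
    return gamma1, gamma2

# Hypothetical inputs (not estimated from the paper's samples).
R = np.array([[1.0, 0.9], [0.9, 1.0]])   # predictor correlation matrix used as X'X
beta = np.array([0.3, 0.1])              # "true" standardized coefficients
sigma2 = 0.01                            # error variance

ks = np.round(np.arange(0.0, 1.01, 0.1), 1)
rows = [(k, *mse_components(R, beta, sigma2, k)) for k in ks]
for k, var, bias2 in rows:
    print(f"k = {k:.1f}: variance = {var:.4f}  bias^2 = {bias2:.4f}  MSE = {var + bias2:.4f}")
```

Under these assumptions the least squares case (k = 0) has a mean square error of about .105. At k = .1 the variance has already fallen to about .030, while the squared bias is only about .005.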

The problem is that, according to Hoerl and Kennard (1970a), there is no automatic procedure for obtaining an optimal point estimate. As an alternative, they suggest a graphic technique called the ridge trace [8]. The ridge trace shows, for each predictor, how the estimated coefficient changes as k increases from 0 to 1. For working with the ridge trace, the following criteria are proposed (Hoerl and Kennard, 1970a: p. 65):

(1) At a certain value of k the system will stabilize and have the general characteristics of an orthogonal system.

(2) Coefficients will not have unreasonable values with respect to the factors for which they represent rates of change.

(3) Coefficients with apparently incorrect signs at k = 0 will have changed to have the proper sign.

(4) The residual sum of squares will not have been inflated to an unreasonable value.
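The following Python sketch (NumPy) computes a ridge trace in the standardized form of eqn. (13) over the grid k = 0, .1, ..., 1.0 used later in this paper. It applies criterion (3) mechanically by reporting the first k at which each coefficient takes the sign of its zero-order correlation. The correlation matrix is hypothetical, constructed only to show a collinear predictor whose sign is implausible at k = 0. Treating "proper sign" as "sign of the zero-order correlation" is likewise our simplifying assumption, and criteria (1), (2) and (4) still require judgment.

```python
import numpy as np

def ridge_trace(R_xx, r_xy, ks):
    """Standardized ridge coefficients B*_k = (R_xx + kI)^{-1} r_xy, eqn. (13), for each k."""
    I = np.eye(R_xx.shape[0])
    return np.array([np.linalg.solve(R_xx + k * I, r_xy) for k in ks])

# Hypothetical correlations for predictors (S, P, D) and criterion V;
# chosen only to mimic the pattern discussed in the text, not the paper's data.
R_xx = np.array([[1.0, 0.3, 0.4],
                 [0.3, 1.0, 0.9],
                 [0.4, 0.9, 1.0]])
r_xy = np.array([0.35, 0.20, 0.40])

ks = np.round(np.arange(0.0, 1.01, 0.1), 1)
trace = ridge_trace(R_xx, r_xy, ks)
for k, row in zip(ks, trace):
    print(f"k = {k:.1f}: " + "  ".join(f"{b:+.3f}" for b in row))

# Criterion (3): first k at which each coefficient has the sign of its
# zero-order correlation (None if it never does on this grid).
for name, j in (("S", 0), ("P", 1), ("D", 2)):
    agree = np.sign(trace[:, j]) == np.sign(r_xy[j])
    print(name, ks[agree][0] if agree.any() else None)
```

With these made-up correlations, the coefficient of the collinear predictor P starts at about -.79 and turns positive only at k = 1.0, while S and D keep their expected positive signs throughout.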

These criteria are proposed as heuristic guidelines for choosing the value of k that yields a more satisfactory estimate of the regression coefficients in the presence of multicollinearity. We will discuss the application of the ridge trace procedure and the above criteria to the alienation model in the next section. To illustrate the relationship between the squared bias and the variance of the estimates, Hoerl and Kennard (1970a: p. 61) use a diagram similar to that shown in Fig. 5. From the diagram we can see the following: (1) if k = 0, the bias of the ridge estimator is zero and the ridge and least squares estimators are identical; (2) the variance term is a continuous, monotonically decreasing function of k; and (3) the squared bias term is a continuous, monotonically increasing function of k. From these results Hoerl and Kennard (1970a: p. 61) propose the following rule for the application of ridge regression: “move to k > 0, take a little bias and substantially reduce the variance; thereby improving the mean square error of estimation and prediction.” But this rule is not precise enough for the practitioner. What, for example, is quantitatively a “little bias” and a “substantial reduction of variance”?

Hoerl and Kennard (1970b: p. 79) give the following explication of their strategy of dimension reduction and factor screening, which they seem to regard as the main aim of applying ridge regression: (1) examine the stable coefficients and eliminate the factors with the least predicting power; and (2) examine the unstable coefficients and eliminate those factors that cannot hold their predicting power.

Fig. 5. Mean square error functions (after Hoerl and Kennard): ridge mean square error, ridge squared bias, ridge variance and least squares mean square error, plotted against k from 0 to 1.0. Ridge mean square error = (bias)² + variance.

This strategy has been criticized by Deegan (1973: p. 12) on the following grounds. Based on his work on the effects of different types of specification errors (Deegan, 1972), he argues, first, that the combination of specification errors with biased ridge estimates can result in both unrecognizable error forms and uninterpretable biased coefficients. Second, he found that, of 52 models with known population parameters, 34 had least squares coefficient vectors that were longer than the true coefficient vectors, while 18 had coefficient vectors that were shorter. Yet ridge regression is predicated on the assumption that the least squares coefficient vector estimated from a particular sample is too long.

Deegan’s results do indeed show that the probability is high that least squares coefficient vectors are too long; nevertheless, there is some chance of guessing incorrectly. If, in fact, the sample least squares vector is too short, then the application of ridge regression leads to still smaller coefficients, and this would lead to faulty inferences concerning the specification of the model, namely, that the coefficients for some variables are zero when, in reality, they are nonzero. To circumvent this problem, Deegan (1973: p. 13) proposes the following alternative strategy:

(1) Ridge Regression should be used only with multiple samples of data, thus permitting application of his technique for “causal inference” based on specification error knowledge.

(2) Ridge Regression should be used for sensitivity testing of estimated coefficients, rather than as a point estimating procedure.

(3) The latter is to be accomplished by combining biased ridge estimators at k = 0.1 with unbiased least squares estimators to form a mixed estimator. Specifically, one first computes a ridge estimator at k = 0.1 and a least squares estimator for each sample, and then arrives at the mixed estimators by taking the mean of the ridge and least squares estimators.
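Step (3) can be written down in a few lines. The Python sketch below (NumPy) is one reading of it, assuming that the mixed estimator is the plain mean of all least squares and k = 0.1 ridge estimates across the samples; the helper names are our own. The two "samples" are hypothetical correlation structures, chosen so that the individual estimates disagree in sign, which anticipates the objection raised below.

```python
import numpy as np

def standardized_ridge(R_xx, r_xy, k):
    """B*_k = (R_xx + kI)^{-1} r_xy, eqn. (13); k = 0 gives standardized OLS."""
    return np.linalg.solve(R_xx + k * np.eye(R_xx.shape[0]), r_xy)

def deegan_mixed(samples, k=0.1):
    """Average of the OLS and ridge(k) estimates over all samples.

    Each sample is a pair (R_xx, r_xy). Returns the mixed estimate and the
    2 * len(samples) individual estimates it was averaged from.
    """
    estimates = []
    for R_xx, r_xy in samples:
        estimates.append(standardized_ridge(R_xx, r_xy, 0.0))   # least squares
        estimates.append(standardized_ridge(R_xx, r_xy, k))     # ridge at fixed k
    estimates = np.array(estimates)
    return estimates.mean(axis=0), estimates

# Two hypothetical samples whose estimates for predictor 1 have opposite signs.
R = np.array([[1.0, 0.9], [0.9, 1.0]])
samples = [(R, np.array([0.20, 0.40])),
           (R, np.array([0.40, 0.20]))]

mixed, estimates = deegan_mixed(samples)
print("individual estimates:\n", estimates.round(3))
print("mixed estimate:", mixed.round(3))
```

Here the individual estimates for the first predictor range from about -.84 to 1.16, yet the mixed estimate is about .15 for both predictors, so the disagreement between the samples disappears from the summary.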

This procedure is recommended by him regardless of the level of multicollinearity present. He sees two advantages in favor of his strategy (Deegan, 1973: p. 13):

(a) One can double the number of coefficient estimates for each replicated sample analyzed, thus contributing to a more stable approximation of the unknown population parameters.

(b) By securing two estimates of each coefficient, one can reduce the number of replicated samples needed for proper application of the technique, which may also have the side effect of encouraging and permitting wider use of the technique.

In the following we discuss arguments against the strategy of applying ridge regression as Deegan has proposed it. In our opinion, this mixed strategy violates the principal purpose of the ridge trace, which is to determine, for any given sample, the particular region of k where the system appears to stabilize. The a priori selection of k = 0.1 as the value where the system stabilizes is totally arbitrary and unwarranted. In addition, we do not know the sampling distribution of these mixed estimators and cannot compute the corresponding standard errors of the coefficients. Also, the a priori selection of a mixed estimator flies in the face of a basic principle in the philosophy of science, namely, that hypotheses should be subjected to repeated testing and that their acceptance or rejection depends to an important degree on whether or not consistent results are obtained across replications. If consistent results are obtained, then one can justifiably use a mixed estimator. But if the results are not consistent, one has no business combining them; rather, one should concentrate on attempting to account for the inconsistencies, by revising and respecifying the theory, paying closer attention to sources of measurement error, etc.


For example, using data from five samples to illustrate his strategy, Deegan has a number of instances where some estimators are strongly positive and others strongly negative; in other instances, even with the same sign, the magnitude of the estimators from different samples still varies greatly. In such a situation, the mixed estimator strategy is misleadingly insensitive to inconsistencies that should not be obscured when evaluating the degree of support for the hypothesis involved.

In short, although the ridge trace strategy has its own elements of arbitrariness (the determination of just where the system stabilizes is not exact), Deegan's mixed strategy seems considerably more arbitrary. Moreover, although the ridge trace strategy may occasionally result in misspecification of the true model, the mixed estimator strategy could just as often lead to the same problem while, in addition, masking possibly important inconsistencies between estimators. We have therefore decided to apply ridge regression in the way that Hoerl and Kennard have emphasized.

To remove this arbitrariness, two further steps would have to be taken, which we could not perform here. First, one needs an automatic procedure for obtaining the optimal point estimates, including the choice of the constants. Such work has been started by Ellingsen and Leathrum (1975) and Browne and Rock (1975). A second step should be the application of a test to see whether the mean square error of the ridge estimators is smaller than that of the OLS or LISREL estimators. Such work has been started by Mason and Brown (1975).
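As an illustration of what such an automatic procedure can look like, the Python sketch below (NumPy) implements one well-known later rule, usually credited to Hoerl, Kennard and Baldwin (1975). It is not among the procedures cited above. With predictors centred and scaled so that $X'X$ is the correlation matrix, the rule sets $k = p\,s^2 / (\hat B'\hat B)$, where $s^2$ is the least squares residual variance and $\hat B$ the least squares coefficient vector. The data are simulated, and the helper name `hkb_k` is our own. The rule is shown only to make the idea concrete, not as the choice made in this paper.

```python
import numpy as np

def hkb_k(X, Y):
    """Automatic ridge constant k = p * s^2 / (B'B) (Hoerl-Kennard-Baldwin rule).

    X is centred and scaled to unit column length so that X'X is the predictor
    correlation matrix; Y is centred. B and s^2 come from the least squares fit.
    """
    Xc = X - X.mean(axis=0)
    Xs = Xc / np.sqrt((Xc ** 2).sum(axis=0))
    Yc = Y - Y.mean()
    n, p = Xs.shape
    B = np.linalg.solve(Xs.T @ Xs, Xs.T @ Yc)   # least squares on scaled data
    resid = Yc - Xs @ B
    s2 = float(resid @ resid) / (n - p - 1)      # residual variance (one d.f. for the mean)
    return p * s2 / float(B @ B)

# Simulated collinear data (illustrative only).
rng = np.random.default_rng(3)
n = 111
x1 = rng.normal(size=n)
X = np.column_stack([x1, x1 + 0.3 * rng.normal(size=n), rng.normal(size=n)])
Y = X @ np.array([0.4, 0.2, 0.3]) + rng.normal(size=n)

k = hkb_k(X, Y)
print(f"HKB choice of k: {k:.4f}")
```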

In the following we discuss the application of ridge regression to our two samples.

4.2. RIDGE REGRESSION APPLIED TO THE ALIENATION MODEL

We have computed standardized ridge regression coefficients for the Citizen and Influential samples following the procedures discussed in the preceding section. The coefficients for increments of .1 in k between 0 and 1.0 form the ridge trace for each sample, depicted in Figs. 6 and 7. The ridge trace shows that, among the Influentials, P (x2) and D (x3) are clearly more unstable than S (x1). This was to be expected, since they are more collinear than S, as was indicated by their smaller eigenvalues. The estimated coefficient for P is dampened considerably when bias is introduced. It is shortened from a value of -.39 at k = 0 to a value of .01 at k = 1.0, a difference of .40. However, the implausible sign is reversed only at k = 1.0. The coefficient for D is also substantially dampened, decreasing from a value of .53 at k = 0 to a value of .13 at k = 1.0, again a difference of .40. The S coefficient is stable from the beginning, decreasing from a value of .27 at k = 0 to a value of .11 at k = 1.0, a difference of only .16.

Fig. 6. Ridge trace of S ($B^*_{41.23}$), P ($B^*_{42.13}$), and D ($B^*_{43.12}$): Citizen sample. Vertical axis: value of the standardized ridge regression coefficient; horizontal axis: k, the amount of the constant added to the main diagonal of the correlation matrix (0 to 1.0).

Inspection of the ridge trace for the system as a whole indicates that it stabilizes in the region k = 0.2 to k = 0.4. Therefore, we choose k = 0.3 as a stable point solution. At this value of k the unreasonably high coefficient for D has been dampened, yet the implausible sign for the P coefficient still remains. Nevertheless, the P coefficient has also been considerably reduced. It is now perhaps close enough to zero for us to regard it as being without substantive significance. Our ridge prediction equation, then, for the Influential sample, is:

$$V = .23S - .11P + .29D + e.$$

Turning to the Citizen sample, we see that P, D, and S are all rather stable. This is to be expected, since multicollinearity is much less of a problem in this sample. The estimated coefficients for P and S change hardly at all from k = 0 to k = 1.0, the respective differences being .05 and .03. The coefficient for D shows some change, but it is minimal, decreasing from a value of .39 at k = 0 to a value of .17 at k = 1.0, a difference of .22. The stability of this system is reflected by the fact that the ridge traces for each predictor are practically linear from the beginning.

Fig. 7. Ridge trace of S ($B^*_{41.23}$), P ($B^*_{42.13}$), and D ($B^*_{43.12}$): Influential sample. Vertical axis: value of the standardized ridge regression coefficient; horizontal axis: k, the amount of the constant added to the main diagonal of the correlation matrix (0 to 1.0).

Inspection of the ridge trace for the Citizen sample indicates that the system stabilizes in the region k = 0 to k = 0.2. Although one could perhaps accept the ordinary least squares coefficients as relatively satisfactory estimates, the introduction of a small amount of bias at k = 0.1 does yield a somewhat more stable solution. At k = 0.1 our ridge prediction equation for the Citizen sample is:

$$V = .15S - .01P + .34D + e.$$

It is clear that applying the ridge trace technique to our equations leads to considerably different substantive conclusions when the two samples are compared. Instead of two apparently quite different equations for V, we have two rather similar equations. Instead of interpreting the direct effect of D on V as being of clearly greater magnitude in the Influential sample than in the Citizen sample, we would now interpret the effect as being of approximately the same magnitude. Instead of interpreting the direct effect of P on V as “strong” and negative in the Influential sample but zero in the Citizen sample, we would now interpret the effect as essentially zero in both samples. And we would still interpret the direct effect of S on V as “weak” and positive in both samples. Finally, instead of trying to come up with possibly far-fetched speculations about the reasons for the implausible results in the Influential sample, given our theoretical expectations and the zero-order correlations, we now have results which make more sense.

5. Conclusion

No simple, clear-cut choice among the three methods is possible. LISREL is superior to OLS and Ridge Regression, as it allows one to test over-identifying restrictions in an efficient way and gives a global test of the whole causal model. On the other hand, in the presence of high multicollinearity, Ridge Regression can give parameter estimates with a lower Mean Square Error than OLS and LISREL.

The best future strategy would be to combine the advantages of both methods, that is, to build the Ridge algorithm into the LISREL program as a subroutine. One could then specify the a-priori restrictions and perform the global test of the causal model. Additionally, one could test whether, in cases of high multicollinearity, the Mean Square Error of the Ridge LISREL estimates would be higher than the Mean Square Error of the normal LISREL parameter estimates.

Another approach would be to handle the Linear Structural Relations System within the context of Bayesian statistics. This would have the advantage that one could use even more types of a-priori restrictions. Additionally, one could use different algorithms which are as efficient as Ridge estimators (cf. Lindley and Smith, 1972 and Dempster, 1973).

Acknowledgement

The authors thank J. Graff, C. Lüdemann, H. Renn and E. Weede for their comments.


Notes

1 The model presented here is purely for illustrative purposes and does not represent substantive theorizing on our part. Therefore the relationship between alienation and political trust or powerlessness is not tested here. A single-equation model for a general theory of collective political aggression is given in Muller (1976).

2 The samples analyzed herein, and the construction of some of the variables, are discussed in more detail in Muller (1972).

3 The “approval” and “engage” scales range from 0 to 5, with item 5 below defining the highest rank-order position and the remaining items defining lower rank-order positions in descending order. The scales correlate (r) at .68 for the Citizen sample and .77 for the Influential sample. The behaviours are: (1) Taking part in protest meetings or marches that are permitted by the local authorities; (2) Refusing to obey a law which one thinks is unjust, if the person feels so strongly about it that he is willing to go to jail rather than obey the law; (3) Trying to stop the government from functioning by engaging in sit-ins, mass demonstrations, takeovers of buildings, and things like that; (4) Trying to stop the government from functioning by engaging in violent protest demonstrations, including actions such as fighting with the police and destroying public and private property; (5) Trying to challenge the power of the government by arming oneself in preparation for battles with government authorities such as the police and National Guard.

4 Item intercorrelations (r) ranged from .48 to .52 for the Citizen sample, and from .59 to .67 for the Influential sample. The items (7-point agree-disagree) are: (1) The average man does not have any say about what the government does; (2) Public officials do not care much what the average man thinks; (3) What happens in the government will happen no matter what people do; it is like the weather, there is nothing people can do about it.

5 All items loaded at .5 or greater on the first component for both samples. Representative items (7-point agree-disagree) are: (1) The national government can be trusted to do what is right just about always; (2) Most of the people running the national government do not really mean what they say; (3) Considering everything, the police in the United States deserve great respect; (4) Most policemen in the United States are unnecessarily violent when arresting suspected lawbreakers who represent what they personally dislike; (5) The courts in the United States give everyone a fair trial, regardless of whether they are rich or poor, white or Negro.

6 Cf. Alwin and Hauser (1975). To date, the literature has not discussed the fact that “indirect effects” can also become highly unstable when multicollinearity in the correlation matrix of the independent variables becomes extreme.

7 A separate analysis with the same sample size in both samples did not change the results substantively (cf. also J. Deegan, 1972).

8 Hoerl and Kennard (1970a: p. 65) say that, based on their experience, “the best method for achieving a better estimate $\hat{\beta}^*$ is to use $k_i = k$ for all $i$ and use the Ridge Trace to select a single value of $k$ and a unique $\hat{\beta}^*$”; but they point out that another approach would be to compute estimates of the optimum values of $k_i$, according to a procedure they discuss on pp. 63-64. (In this quotation, $\hat{\beta}^*$ is their symbol for the biased ridge estimator.) Further steps toward an algorithm for an optimal estimate appear in the work of Ellingsen and Leathrum (1975), Browne and Rock (1975), Farebrother (1975) and Kasarda and Shih (1977).
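The single-constant procedure quoted in note 8 can be illustrated with a short sketch. It assumes standardized (correlation-form) variables, so that $X'X$ is the correlation matrix of the independent variables, and computes $\hat{\beta}^*(k) = (X'X + kI)^{-1}X'y$ over a grid of $k$ values; plotting the rows against $k$ gives the Ridge Trace. The function names, the grid of $k$, and the synthetic data are hypothetical illustrations, not the data or software used in this paper, and the sketch does not implement the optimal-$k_i$ procedures cited above. Choosing $k$ from the trace remains a judgment made by inspection.

```python
import numpy as np


def to_correlation_form(X, y):
    """Center each column and scale it to unit length, so that Xs.T @ Xs is the
    correlation matrix of the predictors and Xs.T @ ys holds their correlations with y."""
    Xc = X - X.mean(axis=0)
    Xs = Xc / np.sqrt((Xc ** 2).sum(axis=0))
    yc = y - y.mean()
    ys = yc / np.sqrt((yc ** 2).sum())
    return Xs, ys


def ridge_estimate(Xs, ys, k):
    """Ridge estimator with the same constant k for every coefficient (k_i = k):
    beta*(k) = (X'X + kI)^{-1} X'y. k = 0 reproduces ordinary least squares."""
    p = Xs.shape[1]
    return np.linalg.solve(Xs.T @ Xs + k * np.eye(p), Xs.T @ ys)


def ridge_trace(Xs, ys, ks):
    """One row of standardized coefficients per value of k (the Ridge Trace)."""
    return np.array([ridge_estimate(Xs, ys, k) for k in ks])


# Hypothetical synthetic data: x2 is nearly collinear with x1.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = x1 + x2 + x3 + rng.normal(size=n)

Xs, ys = to_correlation_form(X, y)
ks = np.linspace(0.0, 1.0, 21)  # assumed grid of candidate k values
trace = ridge_trace(Xs, ys, ks)
for k, row in zip(ks[::5], trace[::5]):
    print(f"k = {k:.2f}  beta* = {np.round(row, 3)}")
```

At $k = 0$ the first row equals the OLS path coefficients; as $k$ grows, the coefficients of the two nearly collinear predictors typically move toward each other and stabilize, which is the pattern the Ridge Trace is meant to reveal.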

References

Aberbach, J.L. and Walker, J.L. (1970). “Political Trust and Racial Ideology,” American Political Science Review 64: 1199-1299.

Allardt, E. (1970). “Types of Protest and Alienation,” in E. Allardt and S. Rokkan (Eds.), Mass Politics: Studies in Political Sociology. New York: Free Press.

Alwin, D.F. and Hauser, R.M. (1975). “The Decomposition of Effects in Path Analysis,” American Sociological Review 40: 37-47.

Browne, M.W. and Rock, D.A. (1975). “The Choice of Additive Constants in Ridge Regression,” unpublished manuscript, Princeton, 1975.

Campbell, A., Gurin, G. and Miller, W.E. (1954). The Voter Decides. Evanston, Illinois: Row, Peterson.

Crawford, T.J. and Naditch, M. (1970). “Relative Deprivation, Powerlessness, and Militancy: The Psychology of Social Protest,” Psychiatry 33: 208-223.

Deegan, J., Jr. (1972). “The Effects of Multicollinearity and Specification Error on Models of Political Behavior,” unpublished Ph.D. Dissertation, The University of Michigan, 1972.

Deegan, J., Jr. (1973). “The Process of Political Development: An Illustrative Use of a Technique for Regression in the Presence of Multicollinearity,” paper presented at the 1973 Annual Meeting of the American Political Science Association, New Orleans, Louisiana, September 4-8.

Dempster, A.P. (1973). “Alternatives to Least Squares in Multiple Regression,” in D.G. Kabe and R.P. Gupta (Eds.), Multivariate Statistical Inference. Amsterdam: North Holland, pp. 25-40.

Easton, D. (1965). A Systems Analysis of Political Life. New York: Wiley.

Easton, D. and Dennis, J. (1967). “The Child’s Acquisition of Regime Norms: Political Efficacy,” American Political Science Review 61: 25-38.

Ellingsen, W.R. and Leathrum, J.F. (1975). “On-Line Ridge Regression; Sequential Biased Estimation for Non-orthogonal Problems,” Journal of Statistical Computation and Simulation 3: 249-264.

Farebrother, R.W. (1975). “The Minimum Mean Square Error Linear Estimator and Ridge Regression,” Technometrics 17: 127-128.

Farrar, D.E. and Glauber, R.R. (1967). “Multicollinearity in Regression Analysis: The Problem Revisited,” Review of Economics and Statistics 49: 92-107.

Finifter, A.W. (1970). “Dimensions of Political Alienation,” American Political Science Review 64: 389-410.

Gamson, W. (1968). Power and Discontent. Homewood, Illinois: Dorsey.

Haitovsky, Y. (1969). “Multicollinearity in Regression Analysis: A Comment,” Review of Economics and Statistics 51: 486-489.

Heise, D.R. (1969). “Problems in Path Analysis and Causal Inference,” in E.F. Borgatta (Ed.), Sociological Methodology 1969. San Francisco: Jossey-Bass.

Hoerl, A.E. (1962). “Application of Ridge Analysis to Regression Problems,” Chemical Engineering Progress 58: 54-59.

Hoerl, A.E. and Kennard, R.W. (1970a). “Ridge Regression: Biased Estimation for Nonorthogonal Problems,” Technometrics 12: 55-67.


Hoerl, A.E. and Kennard, R.W. (1970b). “Ridge Regression: Applications to Nonorthogonal Problems,” Technometrics 12: 69-82.

Johnston, J. (1963). Econometric Methods. New York: McGraw-Hill.

Jöreskog, K.G. (1969). “A General Approach to Confirmatory Maximum Likelihood Factor Analysis,” Psychometrika 34: 183-202.

Jöreskog, K.G. (1970). “A General Method for Analysis of Covariance Structures,” Biometrika 57: 239-251.

Jöreskog, K.G. (1973). “A General Method for Estimating a Linear Structural Equation System,” in A.S. Goldberger and O.D. Duncan (Eds.), Structural Equation Models in the Social Sciences. New York: Seminar Press, pp. 85-112.

Jöreskog, K.G. and Van Thillo, M. (1972). “LISREL, A General Computer Program for Estimating a Linear Structural Equation System Involving Multiple Indicators of Unmeasured Variables.” Princeton: Educational Testing Service.

Kasarda, J.D. and Shih, W.P. (1977). “Optimal Bias in Ridge Regression Approaches to Multicollinearity,” Sociological Methods and Research 5: 461-470.

Lindley, D.V. and Smith, A.F.M. (1972). “Bayes Estimates for the Linear Model,” Journal of the Royal Statistical Society, Series B 34: 1-18.

Mason, R. and Brown, W.G. (1975). “Multicollinearity Problems and Ridge Regression in Sociological Models,” Social Science Research 4: 135-149.

Morrison, D.E. and Henkel, R.E. (1970). The Significance Test Controversy. Chicago: Aldine.

Muller, E.N. (1972). “A Test of a Partial Theory of Potential for Political Violence,” American Political Science Review 66: 928-959.

Muller, E.N. (1976). “A Model for Prediction of Participation in Collective Political Aggression,” unpublished paper, New York.

Namboodiri, N.K., Carter, L.F. and Blalock, H.M. (1975). Applied Multivariate Analysis and Experimental Designs. New York: McGraw-Hill.

Paige, J.M. (1971). “Political Orientation and Riot Participation,” American Sociological Review 36: 810-820.

Ransford, E.H. (1968). “Isolation, Powerlessness, and Violence: A Study of Attitudes and Participation in the Watts Riot,” American Journal of Sociology 73: 581-591.

Rockwell, R.C. (1975). “Assessment of Multicollinearity: The Haitovsky Test of the Determinant,” Sociological Methods and Research 3: 308-320.

Seeman, M. (1972). “The Signals of ‘68: Alienation in Pre-Crisis France,” American Sociological Review 37: 385-402.

Silvey, S.D. (1969). “Multicollinearity and Imprecise Estimation,” Journal of the Royal Statistical Society 31B: 539-552.

Zeitlin, M. (1966). “Alienation and Revolution,” Social Forces 45: 224-236.