
Multiple Regression in SPSS

GV917

Multiple Regression

Multiple Regression involves more than one predictor variable. For example in the turnout model

$$Y_i = a + b_1 X_{i1} + b_2 X_{i2} + e_i$$

If

$$\hat{Y}_i = a + b_1 X_{i1} + b_2 X_{i2}$$

then

$$Y_i - \hat{Y}_i = e_i$$

Where

$Y_i$ is the observed value of Reported Turnout

$X_{i1}$ is the observed value of Actual Turnout

$X_{i2}$ is the Effective Number of Parties Index

$a$ is the intercept, and $b_1$ and $b_2$ are the slope coefficients linking Reported Turnout to Actual Turnout and to the Effective Number of Parties respectively

$\hat{Y}_i$ is the predicted value of Reported Turnout from the linear relationship with Actual Turnout and the Effective Number of Parties

$e_i$ is the residual or error term

Add an Effective Number of Parties Index to the Turnout Model

This measure was devised by Laakso and Taagepera (Comparative Political Studies 1979). It is designed to summarize the degree of fragmentation of the party system in a country. It is defined as:

$$\frac{1}{\sum (P_v)^2}$$

Where $P_v$ is each party's proportion of the total vote

Two Examples

Suppose there is a two party system in a country and the votes are shared 60% to 40%. This is not a fragmented system, so that:

$$\frac{1}{\sum (P_v)^2} = \frac{1}{(0.60)^2 + (0.40)^2} = 1.92$$

Intuitively this means that the party system contains 1.92 'equally sized' parties.

But suppose in the country next door the vote is divided among four parties as follows: 35%, 30%, 20%, 15%. This is much more fragmented:

$$\frac{1}{\sum (P_v)^2} = \frac{1}{(0.35)^2 + (0.30)^2 + (0.20)^2 + (0.15)^2} = 3.64$$

In this case there are 3.64 'equally sized' parties.
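The two worked examples can be checked with a few lines of Python. This is only a sketch: the function name is made up for illustration, and rescaling the shares so that they sum to one is an added assumption, not part of the Laakso and Taagepera definition quoted above.

```python
def effective_number_of_parties(vote_shares):
    """Laakso-Taagepera index: 1 divided by the sum of squared vote proportions."""
    total = sum(vote_shares)
    # Assumption: rescale so the proportions sum to 1 (lets percentages be passed in).
    proportions = [share / total for share in vote_shares]
    return 1.0 / sum(p ** 2 for p in proportions)


two_party = effective_number_of_parties([0.60, 0.40])
four_party = effective_number_of_parties([0.35, 0.30, 0.20, 0.15])
print(round(two_party, 2), round(four_party, 2))
```

With perfectly equal shares the index simply returns the number of parties, which is where the 'equally sized parties' reading comes from.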

Country          Reported Turnout   Actual Turnout   Effective No Parties
Austria          80.88              84.30            3.02
Belgium          78.71              90.60            8.84
Switzerland      54.14              43.20            5.87
Czech Republic   63.43              57.90            4.82
Germany          77.89              79.10            4.09
Denmark          88.33              87.10            4.69
Spain            71.40              68.70            3.12
Finland          71.43              65.30            6.03
France           62.84              60.30            5.22
Britain          67.18              59.40            3.33
Greece           83.37              75.00            2.64
Hungary          78.59              73.50            2.94
Ireland          75.57              62.60            4.13
Israel           71.15              67.80            7.05
Italy            84.28              81.40            6.32
Luxembourg       80.99              79.10            4.71
Netherlands      81.03              75.00            6.04
Norway           61.42              46.20            6.19
Poland           68.71              62.80            4.50
Portugal         81.38              80.10            3.03
Slovenia         74.10              70.40            5.15

Reported Turnout Regression with Two Predictors

Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .928a   .861       .846                3.45050

a. Predictors: (Constant), epartyv Effective No of Parties by Votes, ActualX Actual turnout (IDEA data)

ANOVAb

Model 1      Sum of Squares   df   Mean Square   F        Sig.
Regression   1330.069         2    665.034       55.857   .000a
Residual     214.307          18   11.906
Total        1544.376         20

a. Predictors: (Constant), epartyv Effective No of Parties by Votes, ActualX Actual turnout (IDEA data)

b. Dependent Variable: ReportedY Reported turnout (ESS data)

Coefficientsa

Model 1                                    B        Std. Error   Beta    t        Sig.
(Constant)                                 33.932   5.037                6.737    .000
ActualX Actual turnout (IDEA data)         .636     .061         .908    10.337   .000
epartyv Effective No of Parties by Votes   -.884    .486         -.160   -1.819   .086

B and Std. Error are the Unstandardized Coefficients; Beta is the Standardized Coefficient.

a. Dependent Variable: ReportedY Reported turnout (ESS data)
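To see how the Coefficients table turns into a prediction, the sketch below plugs the reported (rounded) values of B into the model for Austria, the first row of the country table. The function name is illustrative only, and because the coefficients are rounded to three decimals the result will differ slightly from what SPSS would save as a predicted value.

```python
# Coefficients as printed in the SPSS Coefficients table (rounded to 3 decimals).
INTERCEPT = 33.932
B_ACTUAL = 0.636     # slope for Actual Turnout
B_PARTIES = -0.884   # slope for the Effective Number of Parties


def predict_reported_turnout(actual_turnout, effective_parties):
    """Predicted Reported Turnout from the two-predictor model."""
    return INTERCEPT + B_ACTUAL * actual_turnout + B_PARTIES * effective_parties


# Austria: reported 80.88, actual 84.30, effective number of parties 3.02
austria_pred = predict_reported_turnout(84.30, 3.02)
austria_resid = 80.88 - austria_pred   # residual = observed - predicted
print(round(austria_pred, 2), round(austria_resid, 2))
```

The prediction comes out at about 84.88, so the residual for Austria is about -4.00: reported turnout there is roughly four points lower than the model expects.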

Why this effect?

Note that the fragmentation of parties tends to reduce reported turnout: the coefficient on the Effective Number of Parties is negative (-0.884), although with a significance level of .086 it falls just short of the 0.05 threshold. This effect has been attributed to information processing costs. If the average citizen has to choose among a lot of alternatives before voting, this raises the costs of voting and so reduces turnout.

The parties effect is separate from the actual turnout effect, since in multiple regression we identify the effect of one predictor while controlling for all other predictors.

In the Turnout model we are fitting a regression plane to a Three Dimensional Scattergram

How Does Controlling Work?

Step One: Regress Reported Turnout on the Effective Number of Parties:

$$Y_i = a_1 + c_1 X_{i2} + v_i$$

Note that $v_i$ represents the variation in Reported Turnout NOT accounted for by the Effective Number of Parties. We have removed the number of parties as an influence on Reported Turnout.

Step Two: Regress Actual Turnout on the Effective Number of Parties:

$$X_{i1} = a_2 + c_2 X_{i2} + u_i$$

Thus $u_i$ represents the variation in Actual Turnout NOT accounted for by the Effective Number of Parties. We have removed the number of parties as an influence on Actual Turnout.

Here $a_1$, $c_1$, $a_2$ and $c_2$ are the intercepts and slopes of these two auxiliary regressions; they are not the intercept and slopes of the multiple regression model.

Controlling in Multiple Regression

Step Three: In the Multiple Regression Model

$$Y_i = a + b_1 X_{i1} + b_2 X_{i2} + e_i$$

$b_1$, the effect of Actual Turnout on Reported Turnout, can be found by regressing the residuals $v_i$ on the residuals $u_i$, because both have had the influence of the Effective Number of Parties removed.

This is in effect what multiple regression does.

[Figure: diagram relating Actual Turnout, Effective Number of Parties and Reported Turnout]

Controlling in Regression

Coefficientsa

Model 1                              B         Std. Error   Beta   t        Sig.
(Constant)                           -1E-014   .733                .000     1.000
resu Actual = f(Effective Parties)   .636      .060         .925   10.620   .000

a. Dependent Variable: rese Reported = f (Effective Parties)

In this model we are regressing the residuals of Reported Turnout ($v_i$) on the residuals of Actual Turnout ($u_i$), where both sets of residuals come from regressions on the Effective Number of Parties. This produces the same regression coefficient (0.636) as in the earlier multiple regression model.
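The three steps can be reproduced outside SPSS. The sketch below is a minimal standard-library Python version, assuming the country table shown earlier in these slides is the data behind the SPSS output; the helper names are made up for illustration. It fits the two-predictor model directly, then repeats the residual-on-residual regression, and the two slopes for Actual Turnout agree.

```python
# Data transcribed from the country table in these slides (21 cases).
reported = [80.88, 78.71, 54.14, 63.43, 77.89, 88.33, 71.40, 71.43, 62.84,
            67.18, 83.37, 78.59, 75.57, 71.15, 84.28, 80.99, 81.03, 61.42,
            68.71, 81.38, 74.10]
actual = [84.30, 90.60, 43.20, 57.90, 79.10, 87.10, 68.70, 65.30, 60.30,
          59.40, 75.00, 73.50, 62.60, 67.80, 81.40, 79.10, 75.00, 46.20,
          62.80, 80.10, 70.40]
parties = [3.02, 8.84, 5.87, 4.82, 4.09, 4.69, 3.12, 6.03, 5.22, 3.33,
           2.64, 2.94, 4.13, 7.05, 6.32, 4.71, 6.04, 6.19, 4.50, 3.03, 5.15]


def mean(values):
    return sum(values) / len(values)


def cross_products(x, y):
    """Sum of products of deviations from the means."""
    mx, my = mean(x), mean(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))


def simple_ols(x, y):
    """Bivariate least squares: returns intercept, slope and residuals."""
    slope = cross_products(x, y) / cross_products(x, x)
    intercept = mean(y) - slope * mean(x)
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    return intercept, slope, residuals


def two_predictor_ols(x1, x2, y):
    """Least squares for y = a + b1*x1 + b2*x2 via the normal equations."""
    s11, s22, s12 = cross_products(x1, x1), cross_products(x2, x2), cross_products(x1, x2)
    s1y, s2y = cross_products(x1, y), cross_products(x2, y)
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    a = mean(y) - b1 * mean(x1) - b2 * mean(x2)
    return a, b1, b2


# Multiple regression in one go.
a, b1_multiple, b2_multiple = two_predictor_ols(actual, parties, reported)

# Step One: residuals v of Reported Turnout after removing the parties index.
_, _, v = simple_ols(parties, reported)
# Step Two: residuals u of Actual Turnout after removing the parties index.
_, _, u = simple_ols(parties, actual)
# Step Three: regress v on u.
a_partial, b1_partial, _ = simple_ols(u, v)

print(b1_multiple, b1_partial, a_partial)
```

The equality of the two slopes is an algebraic property of least squares, so it holds whatever the data. If the table is indeed what SPSS was given, both values should also be close to the 0.636 in the output, and the intercept of the residual regression should be zero up to rounding error, as the -1E-014 in the SPSS table suggests.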

Another Look at ANOVA and the F test in Multiple Regression

The F test compares the Regression Mean Square with the Residual Mean Square. If F has a high value then the regression explains a lot more variation than is left unexplained.

If it has a low value then the regression explains very little variation.

The theoretical F distribution gives the probability that the F statistic will be at least as large as a particular value if the Null Hypothesis (the regression explains nothing) is correct.

F Test in Multiple Regression

ANOVAb

Model 1      Sum of Squares   df   Mean Square   F        Sig.
Regression   1330.069         2    665.034       55.857   .000a
Residual     214.307          18   11.906
Total        1544.376         20

a. Predictors: (Constant), epartyv Effective No of Parties by Votes, ActualX Actual turnout (IDEA data)

b. Dependent Variable: ReportedY Reported turnout (ESS data)

$$\text{Regression Mean Square} = \frac{\text{Regression Sum of Squares}}{\text{Degrees of Freedom}} = \frac{1330.069}{2} \approx 665.03$$

$$\text{Residual Mean Square} = \frac{\text{Residual Sum of Squares}}{\text{Degrees of Freedom}} = \frac{214.307}{18} \approx 11.906$$

$$F = \frac{\text{Regression Mean Square}}{\text{Residual Mean Square}} = \frac{665.03}{11.906} \approx 55.86$$
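The arithmetic is easy to check by hand or in code. The short Python sketch below recomputes the mean squares and F from the sums of squares and degrees of freedom in the ANOVA table; the function name is illustrative only.

```python
def f_statistic(ss_regression, df_regression, ss_residual, df_residual):
    """Return (regression mean square, residual mean square, F ratio)."""
    ms_regression = ss_regression / df_regression
    ms_residual = ss_residual / df_residual
    return ms_regression, ms_residual, ms_regression / ms_residual


# Sums of squares and degrees of freedom taken from the ANOVA table.
ms_reg, ms_res, f_value = f_statistic(1330.069, 2, 214.307, 18)
print(ms_reg, ms_res, f_value)
```

Using the unrounded mean squares gives an F of about 55.857, matching the SPSS table. Dividing mean squares that have already been rounded to two decimals gives a slightly different answer, which is why it is safer to round only at the end.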

What are Degrees of Freedom? They are usable bits of information.

Total: If we had one observation we could not say anything about the total variation; we need more than one case. This is why the degrees of freedom, or usable bits of information, for the total variation is n-1, or 20 (given 21 cases).

Residual: If we had two observations we could fit the regression line in a bivariate model, since two points define a straight line, but there would be no residuals because the line would fit perfectly. In a three variable model we would need three observations to fit the regression plane, since it lies in a three dimensional space. Residuals only appear once we have more than three cases, so the residual variation has n-3, or 18, degrees of freedom.

Since Total Variation = Explained Variation + Residual Variation, then Explained Variation = Total Variation - Residual Variation, and the degrees of freedom for the Explained Variation = (n-1) - (n-3) = 2.
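The same bookkeeping can be written as a small Python sketch. The helper name is made up for illustration. The second half shows one place where the degrees of freedom get used: the R Square and Adjusted R Square in the Model Summary can both be recovered from the ANOVA sums of squares, using the standard formula for Adjusted R Square, which is not spelled out on these slides.

```python
def degrees_of_freedom(n_cases, n_predictors):
    """Return (total, residual, regression) degrees of freedom."""
    total = n_cases - 1
    residual = n_cases - n_predictors - 1   # n-3 when there are two predictors
    regression = total - residual
    return total, residual, regression


df_total, df_resid, df_reg = degrees_of_freedom(21, 2)

# Sums of squares from the ANOVA table.
ss_resid, ss_total = 214.307, 1544.376
r_squared = 1 - ss_resid / ss_total
# Adjusted R Square compares variances (sums of squares divided by their df).
adj_r_squared = 1 - (ss_resid / df_resid) / (ss_total / df_total)
print(df_total, df_resid, df_reg, round(r_squared, 3), round(adj_r_squared, 3))
```

The results agree with the .861 and .846 reported in the Model Summary.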

The F test

F = Regression Mean Square / Residual Mean Square follows an F distribution.

If we start by assuming that the regression explains nothing, the F ratio will still not be exactly zero, because by chance we might get a small positive value.

The F distribution maps the probability that a ratio of a given size will occur if the regression actually explains nothing.

The larger the value of F, the smaller the likelihood that it will occur by chance if the regression explains nothing.

In this case the probability of an F of 55.86 occurring due to chance is much smaller than 0.05, so we can say that the F statistic is significant at the 0.05 level.
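SPSS prints the significance as .000, which only says the probability is below 0.0005. For the special case of two degrees of freedom in the numerator, which is what we have here, the upper tail of the F distribution has a simple closed form, so the probability can be sketched in plain Python. The function name is illustrative, and the shortcut does not apply to models with a different number of predictors.

```python
def f_upper_tail_two_numerator_df(f_value, df_residual):
    """P(F >= f_value) for an F distribution with (2, df_residual) degrees of freedom.

    Valid only when the numerator has exactly 2 degrees of freedom.
    """
    return (1.0 + 2.0 * f_value / df_residual) ** (-df_residual / 2.0)


p_value = f_upper_tail_two_numerator_df(55.857, 18)
print(p_value, p_value < 0.05)
```

As a sanity check, the same function returns roughly 0.05 at F = 3.55, the usual tabulated 5% critical value for 2 and 18 degrees of freedom.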

The F Distribution – (named after Ronald Fisher)

Another Model – Explaining Happiness in the ESS 2002 Dataset

happy How happy are you

                               Frequency   Percent   Valid Percent   Cumulative Percent
Valid     0 Extremely unhappy  247         .6        .6              .6
          1                    238         .6        .6              1.2
          2                    450         1.1       1.1             2.2
          3                    943         2.2       2.2             4.5
          4                    1149        2.7       2.7             7.2
          5                    4128        9.7       9.8             17.0
          6                    3349        7.9       7.9             24.9
          7                    7169        16.9      17.0            41.9
          8                    11859       28.0      28.1            70.1
          9                    7555        17.8      17.9            88.0
          10 Extremely happy   5069        12.0      12.0            100.0
          Total                42157       99.5      100.0
Missing   77 Refusal           29          .1
          88 Don't know        118         .3
          99 No answer         54          .1
          Total                201         .5
Total                          42358       100.0

Income Scale in the European Social Survey 2002

hinctnt Household's total net income, all sources

                          Frequency   Percent   Valid Percent   Cumulative Percent
Valid     1 J             713         1.7       2.1             2.1
          2 R             1752        4.1       5.3             7.4
          3 C             2762        6.5       8.3             15.7
          4 M             4722        11.1      14.2            29.9
          5 F             4736        11.2      14.2            44.2
          6 S             4113        9.7       12.4            56.5
          7 K             3738        8.8       11.2            67.8
          8 P             3136        7.4       9.4             77.2
          9 D             4719        11.1      14.2            91.4
          10 H            1978        4.7       5.9             97.4
          11 U            554         1.3       1.7             99.0
          12 N            326         .8        1.0             100.0
          Total           33248       78.5      100.0
Missing   77 Refusal      4876        11.5
          88 Don't know   3573        8.4
          99 No answer    660         1.6
          Total           9110        21.5
Total                     42358       100.0

Does Money Buy Happiness?

Model R R Square Adjusted R Square Std. Error of the Estimate

1 .271a .073 .073 1.857

a. Predictors: (Constant), income

ANOVAb

Model Sum of Squares df Mean Square F Sig.

1 Regression 9043.539 1 9043.539 2621.315 .000a

Residual 114347.222 33144 3.450

Total 123390.761 33145

a. Predictors: (Constant), income

b. Dependent Variable: happy How happy are you

Coefficientsa

Model 1        B       Std. Error   Beta   t         Sig.
(Constant)     6.150   .027                228.961   .000
income         .208    .004         .271   51.199    .000

B and Std. Error are the Unstandardized Coefficients; Beta is the Standardized Coefficient.

a. Dependent Variable: happy How happy are you

Is the Specification Correct? Perhaps we should use a Quadratic Version of the Income Variable

*Calculating Quadratic Functions in the ESS 2002.
compute income = hinctnt.
compute incomsq = hinctnt*hinctnt.

Where incomsq is the square of the hinctnt (household income) variable.

If we use incomsq in the model in addition to income, this captures a non-linear relationship between income and happiness: more income increases happiness, but at a declining rate of change.

Regression of Happiness on Income in the ESS 2002: Does Money Buy Happiness?

Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .278a   .077       .077                1.824

a. Predictors: (Constant), incomsq, hinctnt Household's total net income, all sources

ANOVAb

Model 1      Sum of Squares   df      Mean Square   F          Sig.
Regression   7993.407         2       3996.703      1200.938   .000a
Residual     95473.91         28688   3.328
Total        103467.3         28690

a. Predictors: (Constant), incomsq, hinctnt Household's total net income, all sources

b. Dependent Variable: happy How happy are you

Coefficientsa

Model 1                                             B       Std. Error   Beta    t         Sig.
(Constant)                                          5.221   .066                 79.038    .000
hinctnt Household's total net income, all sources   .545    .022         .703    24.839    .000
incomsq                                             -.027   .002         -.450   -15.875   .000

a. Dependent Variable: happy How happy are you
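One way to read the quadratic coefficients is to look at the slope of the fitted curve at different income bands. The Python sketch below uses the rounded values of B from the Coefficients table; the function names are illustrative, and the turning point it reports is only as precise as those rounded coefficients allow.

```python
# Coefficients as printed in the SPSS table (rounded to 3 decimals).
INTERCEPT, B_INCOME, B_INCOME_SQ = 5.221, 0.545, -0.027


def predicted_happiness(income_band):
    """Fitted happiness score for an income band on the 1 to 12 scale."""
    return INTERCEPT + B_INCOME * income_band + B_INCOME_SQ * income_band ** 2


def marginal_effect(income_band):
    """Slope of the fitted curve: the first derivative with respect to income."""
    return B_INCOME + 2 * B_INCOME_SQ * income_band


# Income band at which the fitted curve stops rising.
turning_point = -B_INCOME / (2 * B_INCOME_SQ)

for band in (1, 6, 12):
    print(band, round(predicted_happiness(band), 2), round(marginal_effect(band), 3))
print(round(turning_point, 2))
```

The slope is positive but shrinking across most of the scale, which is the 'declining rate of change'. With these rounded coefficients the curve flattens out at about band 10, near the top of the 12-point income scale.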

Quadratic Relationship Between Two Variables

Suppose we want to use Occupational Status as a predictor in the Happiness model. We would first have to create this variable.

This is done with the assistance of the variable iscoco, a classification of the many occupations which exist in Europe. For example:

iscoco   Occupation
100      Armed forces
1100     Legislators and senior officials
1110     Legislators, senior government officials
1140     Senior officials of special-interest org
1141     Senior officials of political-party org
1142     Senior officials of economic-interest org

To put this in a form which is usable in the regression model we recode it as follows:

recode iscoco (2000 thru 2470=6) (1000 thru 1319=5) (3000 thru 3480=4)
  (4000 thru 4223=3) (5000 thru 8340=2) (9000 thru 9330=1) (else=sysmis) into occup.

value labels occup
  1 'unskilled or semi-skilled manual workers'
  2 'skilled manual workers'
  3 'white collar clerical & administrative workers'
  4 'white collar technical workers'
  5 'middle managers'
  6 'professionals and senior managers'.
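For readers who want to trace what the recode does to an individual case, here is a minimal Python sketch of the same mapping. It is not SPSS syntax, the names are made up for illustration, and `None` stands in for SPSS system-missing.

```python
# (low, high, new value) triples copied from the recode statement.
RECODE_RULES = [
    (2000, 2470, 6),
    (1000, 1319, 5),
    (3000, 3480, 4),
    (4000, 4223, 3),
    (5000, 8340, 2),
    (9000, 9330, 1),
]


def recode_occupation(iscoco):
    """Map an ISCO code to the six-point occup scale, or None if no range matches."""
    if iscoco is None:
        return None
    for low, high, value in RECODE_RULES:
        if low <= iscoco <= high:   # 'thru' is inclusive at both ends
            return value
    return None                     # (else=sysmis)


for code in (100, 1110, 2100, 9330):
    print(code, recode_occupation(code))
```

Note that code 100 (Armed forces) falls outside every range and therefore ends up missing, as does any code in the gaps between ranges.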

The Recoded Occupational Status Variable in the ESS 2002 Data

occup

                                                                Frequency   Percent   Valid Percent   Cumulative Percent
Valid     1.00 unskilled or semi-skilled manual workers         3805        9.0       10.8            10.8
          2.00 skilled manual workers                           14349       33.9      40.7            51.5
          3.00 white collar clerical & administrative workers   4033        9.5       11.4            62.9
          4.00 white collar technical workers                   5474        12.9      15.5            78.5
          5.00 middle managers                                  2840        6.7       8.1             86.5
          6.00 professionals and senior managers                4752        11.2      13.5            100.0
          Total                                                 35253       83.2      100.0
Missing   System                                                7105        16.8
Total                                                           42358       100.0

Suppose we want to add a gender variable, to see if women are happier than men

If statements can be used to create new variables in SPSS. These are recodes which are carried out if certain conditions are met.

For example:

compute female=0.
(creates a new variable consisting only of zeroes)

if (gndr eq 2) female=1.
(changes this new variable to a score of 1 if the existing variable gndr has a score of 2)
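The logic of the two statements can be mirrored line by line in Python. This is a sketch for illustration only, with a made-up function name; it is not how SPSS evaluates the syntax internally.

```python
def female_dummy(gndr):
    """Return 1 when gndr is coded 2 (Female), otherwise 0."""
    female = 0        # compute female=0.
    if gndr == 2:     # if (gndr eq 2) female=1.
        female = 1
    return female


print([female_dummy(g) for g in (1, 2, 2, 1)])
```

In this literal translation any code other than 2 is scored 0, so a case coded 9 (No answer) would be counted alongside the men unless it is set to missing separately.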

If Statements in SPSS: gndr and Female

gndr Gender

                        Frequency   Percent   Valid Percent   Cumulative Percent
Valid     1 Male        20322       48.0      48.0            48.0
          2 Female      21981       51.9      52.0            100.0
          Total         42303       99.9      100.0
Missing   9 No answer   55          .1
Total                   42358       100.0

Female

                   Frequency   Percent   Valid Percent   Cumulative Percent
Valid     .00      20309       47.9      48.0            48.0
          1.00     21994       51.9      52.0            100.0
          Total    42303       99.9      100.0
Missing   System   55          .1
Total              42358       100.0

Revised Happiness Model

ANOVAb

Model Sum of Squares df Mean Square F Sig.

1 Regression 9393.557 4 2348.389 700.984 .000a

Residual 95396.295 28475 3.350

Total 104789.852 28479

a. Predictors: (Constant), incomsq, female, occup, income

b. Dependent Variable: happy How happy are you

Coefficientsa

Model 1        B       Std. Error   Beta    t         Sig.
(Constant)     5.050   .063                 80.658    .000
female         .090    .022         .024    4.160     .000
occup          .035    .007         .029    4.818     .000
income         .565    .020         .741    27.645    .000
incomsq        -.029   .002         -.481   -17.942   .000

B and Std. Error are the Unstandardized Coefficients; Beta is the Standardized Coefficient.

a. Dependent Variable: happy How happy are you

Model Summary

Model R R Square Adjusted R Square Std. Error of the Estimate

1 .299a .090 .090 1.830

a. Predictors: (Constant), incomsq, female, occup, income
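To make the revised model concrete, the sketch below computes predicted happiness for two hypothetical respondents who differ only in gender, using the rounded values of B from the Coefficients table. The respondent profile (occup 3, income band 5) and the function name are chosen purely for illustration.

```python
# Coefficients as printed in the SPSS table (rounded to 3 decimals).
COEFS = {"constant": 5.050, "female": 0.090, "occup": 0.035,
         "income": 0.565, "incomsq": -0.029}


def predicted_happiness_revised(female, occup, income):
    """Fitted happiness score from the four-predictor model."""
    return (COEFS["constant"]
            + COEFS["female"] * female
            + COEFS["occup"] * occup
            + COEFS["income"] * income
            + COEFS["incomsq"] * income ** 2)   # incomsq is income squared


woman = predicted_happiness_revised(female=1, occup=3, income=5)
man = predicted_happiness_revised(female=0, occup=3, income=5)
print(round(woman, 3), round(man, 3), round(woman - man, 3))
```

Because occupation and income are held at the same values, the gap between the two predictions is exactly the female coefficient (0.090): this is what 'controlling for' the other predictors means in practice.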

Conclusions

Multiple Regression is a relatively simple extension of two variable regression.

Unlike two variable regression, in multiple regression we are controlling for the influence of additional variables when examining the relationship between an independent variable and the dependent variable. It is a bit like a statistical experiment.

The great majority of social science models are multivariate models, and so multiple regression is very commonly used.
