UNICEF Workshop on Global Study, 18th to 28th August 2008
Centre for Global Health, Population, Poverty and Policy (GHP3)
Fitting Linear Regression in SPSS and Output Interpretation
Tuesday 26th August
The aim of this workshop is to introduce you to fitting linear regression in SPSS. We
will be using the DHS from Ghana, although the techniques shown are the same for
all datasets. This worksheet and the data associated with the workshop are
available on the course website.
At the end of this session you should be able to:
- fit a simple linear regression model in SPSS
- understand how to create dummy variables for use in linear regression with
categorical explanatory variables
- interpret output from linear regression analyses
1. Simple Linear Regression: Continuous Explanatory Variables
First of all, download the dataset from the course website at
www.southampton.ac.uk/socsci/ghp3/course/material.html to your desktop. The
dataset that will be used for this session is the same as for Computer Workshop 3. It
is a reduced version of the Ghana DHS 2003, with a line for each child aged under 5
years old in the selected households.
Open SPSS in the usual way and open up the dataset.
In the first part of the workshop we will be looking at the relationship between birth
weight and weight-for-age z-score. The hypothesis is that the lower the birth weight,
the lower the weight-for-age z-score against the reference population. We will start
with some data manipulation, followed by exploratory analyses, and then move on to
the simple linear regression.
It is always extremely important to get a feel for the data before you rush headlong
into some complicated statistical analysis.
1. Select Analyze | Descriptive Statistics | Explore. The following dialogue
box should appear.
2. Transfer the variable Wt/A Standard deviations to the Dependent List box
by clicking the right arrow next to the box, and then click on the OK button.
3. The output will appear in the right-hand pane of the Output Viewer window.
Scroll through this output carefully and note what SPSS has produced. The
default output will include the mean and standard deviation for your data, a
95% confidence interval for the population mean, a stem-and-leaf plot and a
boxplot. The stem-and-leaf plot is useful as it enables us to see whether the
distribution of our response variable (weight-for-age) is highly skewed or not;
in this case it is not! However, it is clear from the boxplot that there are
some strange values, with scores of about 10,000.
4. There are a number of children who have had their weight-for-age flagged.
This is because the values for weight-for-age for those children are outside
acceptable ranges; the measurement for height may have been incorrect.
These are coded as 9998, but are included in the analysis at the moment. We
need to change this (and while we are doing this we will change other
variables like this as well).
5. Go to Transform | Recode into Same Variables and recode values 9996 and
9998 into System-missing for Height-for-age, Weight-for-age, Weight-for-
height and birth weight. If you have forgotten how to recode variables please
ask.
6. Rerun the Explore command and study the results again. The results have
changed by a large amount.
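To see why the Explore results change so much, the recoding in step 5 can be sketched outside SPSS. This is only an illustration in Python: the scores below are made up, and the helper names are hypothetical. Only the flag codes 9996 and 9998 come from the worksheet.

```python
from statistics import mean

# Flag codes named in the worksheet; they mark values that are not real measurements.
FLAG_CODES = {9996, 9998}

def recode_flags_to_missing(values, flag_codes=FLAG_CODES):
    """Return a copy of values with flag codes replaced by None (missing)."""
    return [None if v in flag_codes else v for v in values]

def mean_ignoring_missing(values):
    """Mean of the non-missing entries only."""
    present = [v for v in values if v is not None]
    return mean(present)

# Hypothetical weight-for-age scores (not taken from the Ghana DHS).
wfa = [-150, -80, 9998, -120, 9996, -50]

raw_mean = mean(wfa)                      # flag codes treated as real scores
clean = recode_flags_to_missing(wfa)
clean_mean = mean_ignoring_missing(clean)

print(raw_mean)    # a large positive number, dominated by the flag codes
print(clean_mean)  # -100
```

Two flagged values are enough to drag the mean of six children from -100 to over 3000, which is why they must be set to missing before any analysis.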
7. We can investigate the relationship between weight-for-age and birth weight
by looking at the correlation between the two variables. Correlation is usually
calculated between two continuous variables. A correlation of 1 indicates
perfect positive correlation: as one variable increases the other also increases,
with the points lying exactly on a straight line. A correlation of -1 indicates
perfect negative correlation: as one variable increases the other decreases, again
exactly along a straight line. A correlation of 0 indicates no linear relationship
between the two variables.
- Go to Analyze | Correlate | Bivariate. The following box appears.
- Place Wt/A Standard Deviations and Birth weight in the right hand
Variables Box, as shown above. Click OK. The following table is
produced in the output.
Correlations
                                                     Wt/A Standard   Birth weight
                                                     deviations      (kilos - 3 dec.)
Wt/A Standard deviations        Pearson Correlation  1.000           .109**
                                Sig. (2-tailed)                      .002
                                N                    3094.000        837
Birth weight (kilos - 3 dec.)   Pearson Correlation  .109**          1.000
                                Sig. (2-tailed)      .002
                                N                    837             974.000
**. Correlation is significant at the 0.01 level (2-tailed).
- The correlation between Weight-for-age Standard deviations and birth
weight is 0.109. This is not that high, but the p-value (in the Sig. (2-
tailed) row) is 0.002. This is below 0.05 (for a 5% test) and thus is
significant at the 5% level. Thus there is evidence of a weak positive linear
relationship between the two variables. Also note that the number of children
included in this correlation is only 837. Many children do not have a recorded birth
weight, and some do not have a weight-for-age (the children without a
weight-for-age include those who have died between birth and the
survey).
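The link between a correlation of 0.109, a sample of 837 and a p-value of about 0.002 can be checked by hand. The sketch below is an illustration in Python, not what SPSS runs internally: it uses the standard t statistic for a correlation and, as a stated simplification, a normal approximation for the p-value (reasonable with over 800 cases). The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def correlation_t_statistic(r, n):
    """t statistic for testing whether a Pearson correlation differs from 0."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

def two_sided_p_normal_approx(t):
    """Two-sided p-value using the normal approximation to the t distribution."""
    return 2 * (1 - NormalDist().cdf(abs(t)))

r = 0.109   # Pearson correlation from the Correlations table
n = 837     # children with both a weight-for-age and a birth weight

t = correlation_t_statistic(r, n)
p = two_sided_p_normal_approx(t)

print(round(t, 2))  # 3.17
print(p)            # between 0.001 and 0.002
```

Even a weak correlation can be clearly significant when the sample is large, because the t statistic grows with the square root of the sample size.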
8. It is now time for the simple linear regression. Select Analyze | Regression |
Linear. The linear regression dialogue box appears (see next page).
9. Our dependent variable is Wt/A Standard deviations, so place this into the
dependent box. We are predicting weight-for-age using Birth Weight, so
place birth weight into the independent(s) box.
10. Click OK.
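Behind the dialogue box, simple linear regression finds the straight line that minimises the squared vertical distances to the data points. The sketch below shows the textbook least-squares formulas in Python. The four data points are hypothetical and chosen to lie exactly on a line; the function name is also an assumption for illustration.

```python
from statistics import mean

def fit_simple_linear_regression(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of cross-products and sum of squares about the means.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical data: birth weight in grams and weight-for-age score.
birth_weight = [2500, 3000, 3500, 4000]
weight_for_age = [-140, -120, -100, -80]

intercept, slope = fit_simple_linear_regression(birth_weight, weight_for_age)
print(intercept, slope)  # close to -240 and 0.04
```

Real data will not lie on a line, and children with a missing value on either variable have to be left out before fitting.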
11. The following output is produced:

Variables Entered/Removed (b)
Model   Variables Entered                   Variables Removed   Method
1       Birth weight (kilos - 3 dec.) (a)   .                   Enter
a. All requested variables entered.
b. Dependent Variable: Wt/A Standard deviations

This table simply states the variables in the model and the selection method chosen.

Model Summary
Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .109 (a)   .012       .011                120.660
a. Predictors: (Constant), Birth weight (kilos - 3 dec.)

The results indicate the correlation (0.109, as seen before) and the r-square. This indicates
how much variation is explained; in this case not much!

ANOVA (b)
Model           Sum of Squares   df    Mean Square   F       Sig.
1  Regression   145568.173       1     145568.173    9.999   .002 (a)
   Residual     1.216E7          835   14558.868
   Total        1.230E7          836
a. Predictors: (Constant), Birth weight (kilos - 3 dec.)
b. Dependent Variable: Wt/A Standard deviations

Do not worry about this box!
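For anyone curious where the Model Summary figures come from, they can be cross-checked against the ANOVA table with a few lines of arithmetic. The Python sketch below uses only numbers copied from the output above. The total sum of squares is used as SPSS displays it (rounded to 1.230E7), so the result is approximate.

```python
# Figures copied from the Model Summary and ANOVA tables.
r = 0.109
ss_regression = 145568.173
ss_total = 1.230e7          # displayed by SPSS in rounded form
ms_regression = 145568.173
ms_residual = 14558.868

# R Square is the squared correlation, and also the share of the total
# sum of squares that the regression accounts for.
r_square_from_r = r ** 2
r_square_from_ss = ss_regression / ss_total

# F is the ratio of the two mean squares.
f_statistic = ms_regression / ms_residual

print(round(r_square_from_r, 3))   # 0.012
print(round(r_square_from_ss, 3))  # 0.012
print(round(f_statistic, 3))       # 9.999
```

Both routes give an R Square of about 0.012: birth weight explains roughly 1% of the variation in weight-for-age.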
Coefficients (a)
                                   Unstandardized Coefficients   Standardized Coefficients
Model                              B           Std. Error        Beta      t        Sig.
1  (Constant)                      -146.216    18.112                      -8.073   .000
   Birth weight (kilos - 3 dec.)   .017        .005              .109      3.162    .002
a. Dependent Variable: Wt/A Standard deviations
The final box, labelled Coefficients, gives the results of the analysis. Each of the
columns is explained below:
- Unstandardized Coefficients B: This shows the values of the numbers in the
linear regression equation.
o The constant term is -146.2, indicating that a child who weighs 0g at birth
(impossible, but this is the theory) is predicted to have a weight-for-age score
of -146.2. DHS z-scores are stored multiplied by 100, so this is about 1.46
standard deviations below the reference population.
o The coefficient for birth weight is 0.017. For every gram increase in birth
weight, the weight-for-age score increases by 0.017 (on the same scale as above).
- Unstandardized Coefficients Std. Error: This is the standard error for the
coefficient; it is used in the calculation of significance.
- Standardized Coefficients Beta: Do not worry about this!
- t: This is the t-test to see if the coefficients are significantly different from 0. A
value over 1.96 (or under -1.96) indicates significance at the 5% level.
- Sig.: This is the p-value. If it is under 0.05 then the variable is significant at the
5% level. The value we have here is 0.002, which is highly significant. There is a
significant relationship between birth weight and weight-for-age.
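The coefficients can be put to work as a regression equation. The short Python sketch below uses the values from the Coefficients table; the 3000g birth weight is a hypothetical example, and the function names are made up for illustration.

```python
# Coefficients copied from the Coefficients table.
CONSTANT = -146.216
CONSTANT_SE = 18.112
B_BIRTH_WEIGHT = 0.017   # per gram of birth weight, as rounded by SPSS

def predict_weight_for_age(birth_weight_g):
    """Predicted weight-for-age score from the fitted regression line."""
    return CONSTANT + B_BIRTH_WEIGHT * birth_weight_g

def t_statistic(coefficient, std_error):
    """The t column is the coefficient divided by its standard error."""
    return coefficient / std_error

predicted = predict_weight_for_age(3000)   # hypothetical 3000g baby
t_constant = t_statistic(CONSTANT, CONSTANT_SE)

print(round(predicted, 3))   # -95.216
print(round(t_constant, 3))  # -8.073
```

Repeating the t calculation for birth weight with the displayed values (.017 / .005) gives 3.4 rather than 3.162. This is only because SPSS shows the coefficient and standard error rounded to three decimal places.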
2. Simple Linear Regression: Categorical Explanatory Variables
1. The procedure for conducting linear regression when there are categorical
explanatory variables is slightly different, as you need to create dummy
variables, as explained earlier. If you do not do this, the results that you
obtain will not be valid. We will look at the relationship between wealth index
and weight-for-age standard deviations.
2. Firstly, do some exploratory analysis. One way to do this with categorical
variables is to calculate the mean weight-for-age score for each wealth quintile.
To do this:
- Go to Analyze | Compare Means | Means
- Place Wt/A Standard Deviations in the Dependent List
- Put Wealth index into the Independent list box
- Click OK. The following results should be produced:
Report
Wt/A Standard deviations
Wealth index   Mean      N      Std. Deviation
Poorest        -135.55   1031   127.879
Poorer         -113.66   694    122.574
Middle         -110.86   556    117.847
Richer         -94.47    425    112.391
Richest        -68.28    388    117.536
Total          -112.12   3094   123.417
- There are large differences in weight-for-age by wealth. The average for the
poorest quintile is -135.55, while for the richest it is -68.28. As wealth
increases, weight-for-age against the reference population also increases.
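As a quick check on the Report table, the Total row is the average of the quintile means weighted by the number of children in each quintile. The Python sketch below uses only figures copied from the table; the variable names are illustrative.

```python
# (mean, N) for each wealth quintile, copied from the Report table.
quintiles = {
    "Poorest": (-135.55, 1031),
    "Poorer": (-113.66, 694),
    "Middle": (-110.86, 556),
    "Richer": (-94.47, 425),
    "Richest": (-68.28, 388),
}

total_n = sum(n for _, n in quintiles.values())
# Weight each quintile mean by its number of children.
overall_mean = sum(m * n for m, n in quintiles.values()) / total_n
# Difference between the richest and poorest quintile means.
gap = quintiles["Richest"][0] - quintiles["Poorest"][0]

print(total_n)                 # 3094
print(round(overall_mean, 2))  # -112.12
print(round(gap, 2))           # 67.27
```

The gap of about 67 points between the richest and poorest quintiles is the quantity that the regression in the next steps reproduces.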
3. We will now recreate this analysis by conducting linear regression. But first,
we will need to create dummy variables for the wealth index.
- Four new variables need to be created, as wealth has five categories
(remember that the number of dummy variables needed is one less than the
number of categories!)
- Go to Transform | Recode into Different Variables
- Place Wealth index into the central box. On the right hand side, under
Output Variable, enter Poorest as the variable name and label this
Dummy variable for Poorest Wealth Quintile. Click Change.
- Click Continue and then OK. A new variable is created called poorest.
4. You now need to create three more dummy variables for other categories of
wealth. To do this, go to Transform | Recode into Different Variables
and follow the process above for Poorer, Middle and Richer. Each time
you will need to recode a different value to be the dummy (for instance for
Middle, all those with a 3 in the original dataset need to be recoded as a 1,
and all other values as a 0). Please ask if you are confused!
Alternatively, use the syntax to do this automatically. A file is included on the
website for you to use to create your dummy variables.
5. Now the linear regression can be run. Go to Analyze | Regression |
Linear. The regression from the previous analysis will still be there. The
Dependent variable remains the same, Wt/A Standard deviations, but the
Independent variables are now different.
Remove Birth weight from the Independent(s) box. Enter instead the four
dummy variables: Poorest, Poorer, Middle and Richer.
Click OK.
6. Four boxes are produced, as before. Below is the final box, labelled
Coefficients.
Coefficients (a)
                                                Unstandardized Coefficients   Standardized Coefficients
Model                                           B          Std. Error         Beta      t         Sig.
1  (Constant)                                   -68.284    6.173                        -11.062   .000
   Dummy variable for poorest wealth quintile   -67.262    7.242              -.257     -9.288    .000
   Dummy variable for poorer wealth quintile    -45.381    7.707              -.153     -5.888    .000
   Dummy variable for middle wealth quintile    -42.576    8.043              -.132     -5.294    .000
   Dummy variable for richer wealth quintile    -26.189    8.537              -.073     -3.068    .002
a. Dependent Variable: Wt/A Standard deviations
You will see that all of the variables are highly significant! This is seen in the
final column, Sig., which shows the p-value. This indicates that each of the four
quintiles is different from the reference category, the Richest quintile, which is
represented by the Constant.
The value for the constant is -68.284, which is the same as seen previously for
the mean weight-for-age score for the Richest quintile!
For the poorest quintile the average score is -68.284 - 67.262 = -135.546. The
same as before! For all the wealth quintiles the results mirror the results seen
before.
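The same sum can be done for every quintile. The Python sketch below adds each dummy coefficient to the constant and compares the result with the means from the Compare Means output; all figures are copied from the two tables, and the variable names are illustrative.

```python
# Constant and dummy coefficients from the Coefficients table.
CONSTANT = -68.284
DUMMY_COEFFICIENTS = {
    "Poorest": -67.262,
    "Poorer": -45.381,
    "Middle": -42.576,
    "Richer": -26.189,
}

# Quintile means from the earlier Report table.
REPORTED_MEANS = {
    "Poorest": -135.55,
    "Poorer": -113.66,
    "Middle": -110.86,
    "Richer": -94.47,
    "Richest": -68.28,
}

# The reference category is predicted by the constant alone.
predicted = {"Richest": CONSTANT}
for quintile, coefficient in DUMMY_COEFFICIENTS.items():
    predicted[quintile] = CONSTANT + coefficient

for quintile, value in predicted.items():
    print(quintile, round(value, 3), REPORTED_MEANS[quintile])
```

Each predicted value agrees with the reported mean to within rounding, which is what should happen when the only explanatory variables are the dummies for a single categorical variable.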
3. Multiple Linear Regression
You may be wondering why we bothered doing the regression on weight-for-age and
wealth when we can get the results simply using the Compare Means command.
The reason is to show the differences when more than one variable is added into the
model at the same time.
We have seen that birth weight and wealth are related to weight-for-age when the
simple bivariate analysis is conducted. But what happens if we analyse them together?
Birth weight is highly related to wealth: infants born to poorer households are likely
to be lighter than infants born to richer households. So is the relationship between
wealth and weight-for-age only due to the relationship with birth weight? Those of a
lighter birth weight are likely to remain below the norm throughout childhood.
To test this we enter the variables into the model together.
1. Go to Analyze | Regression | Linear. The previous regression variables
will still be contained in the different boxes.
2. Click on Birth Weight and place it into the Independent(s) box, alongside
the wealth quintile dummy variables.
3. Click OK. The final table in the output is copied below.
Coefficients (a)
                                                Unstandardized Coefficients   Standardized Coefficients
Model                                           B          Std. Error         Beta      t         Sig.
1  (Constant)                                   -119.658   18.412                       -6.499    .000
   Dummy variable for poorest wealth quintile   -83.830    14.066             -.220     -5.960    .000
   Dummy variable for poorer wealth quintile    -37.202    12.418             -.115     -2.996    .003
   Dummy variable for middle wealth quintile    -42.243    12.494             -.130     -3.381    .001
   Dummy variable for richer wealth quintile    -39.684    11.140             -.138     -3.562    .000
   Birth weight (kilos - 3 dec.)                .018       .005               .120      3.491     .001
a. Dependent Variable: Wt/A Standard deviations
The results have changed! Partly this is because a different sample is
being used (only those with a birth weight AND a wealth quintile are included
in the analysis), but it is also due to having both variables in the model at one
time.
All the variables are still significant in the model, although after taking
account of birth weight the difference between richest and poorest actually
increases (from 67.3 to 83.8). This shows that even though birth weight is
significantly related to weight-for-age, there is a very large effect of wealth
after the birth on weight-for-age.
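With more than one variable in the model, each coefficient is read as the effect of that variable while holding the others fixed. The Python sketch below uses the coefficients from the multiple regression table to compare two children with the same birth weight but different wealth quintiles. The 3000g birth weight and the function name are hypothetical.

```python
# Coefficients copied from the multiple regression Coefficients table.
CONSTANT = -119.658
B_BIRTH_WEIGHT = 0.018   # per gram of birth weight
DUMMY_COEFFICIENTS = {
    "Poorest": -83.830,
    "Poorer": -37.202,
    "Middle": -42.243,
    "Richer": -39.684,
    "Richest": 0.0,      # reference category: no dummy in the model
}

def predict_weight_for_age(quintile, birth_weight_g):
    """Predicted weight-for-age score for a given quintile and birth weight."""
    return CONSTANT + DUMMY_COEFFICIENTS[quintile] + B_BIRTH_WEIGHT * birth_weight_g

# Two hypothetical children, both weighing 3000g at birth.
poorest = predict_weight_for_age("Poorest", 3000)
richest = predict_weight_for_age("Richest", 3000)

print(round(poorest, 3))            # -149.488
print(round(richest, 3))            # -65.658
print(round(richest - poorest, 3))  # 83.83
```

Whatever birth weight is chosen, the predicted gap between the two children stays at 83.83, the coefficient on the poorest dummy. That is what "controlling for birth weight" means in this model.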
4. The analysis can be extended to include other variables, such as Type of
Place of Residence, Educational Level and Place of Delivery.
However, all of these are categorical variables, so remember to recode
these as dummy variables first!
Exercises
1. Conduct multiple linear regression on Weight-for-age Standard deviations,
including as explanatory variables birth weight, wealth index, urban/rural and
highest educational level of the parent.
2. Conduct multiple linear regression on Weight-for-Height, using the same
variables as in Exercise 1. Are there any obvious differences that you can see?
What is the relationship between wealth and weight-for-height after
controlling for the other variables?