7/28/2019 Sense of Regression
1/18
Making Sense of Regression
Results
Kwamina Banson
Socio-Economics Department
30th 07 - 2009
BNARI Seminar Room
Linear Regression: Introduction
Interpreting SPSS regression output
Coefficients for independent variables
Fit of the regression: R Square
Statistical significance
How to reject the null hypothesis
Multivariate regressions
Academic Performance of Junior High Sch.
What is SPSS?
SPSS is a computer program used for a wide variety of statistical analyses. The name originally stood for Statistical Package for the Social Sciences and was later expanded as Statistical Product and Service Solutions.
In addition to statistical analysis, data management and data documentation are features of the base software.
Statistics included in the base software:
Descriptive statistics: Cross tabulation, Frequencies, Descriptives, Explore, Descriptive Ratio Statistics
Bivariate statistics: Means, t-test, ANOVA, Correlation (bivariate, partial, distances), Nonparametric tests
Prediction for numerical outcomes: Linear regression
Prediction for identifying groups: Factor analysis, cluster analysis (two-step, K-means, hierarchical), Discriminant
Interpreting SPSS regression
output
[Figure: scatterplot with Average SAT Score on the x-axis (0 to 1600), a 0 to 100 scale on the y-axis, and a fitted regression line, Rsq = 0.3454. Callouts: "How tight is the fit?", "Y-intercept or constant", "Slope or coefficient".]
y = mx + b, where m is the slope of the line and b is the y-intercept.
An SPSS regression output includes two
key tables for interpreting your results:
A Coefficients table that contains the y-intercept (or constant) of the regression, a
coefficient for every independent variable,
and the standard error of that coefficient.
A Model Summary table that gives you
information on the fit of your regression.
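As a rough illustration of where the constant and the coefficient come from, the sketch below fits a least-squares line by hand. The five data points and the function name fit_line are made up for this example; they are not the data behind the SPSS output in these slides.

```python
# Hypothetical data for illustration only (not the data used in the slides).
sat = [900, 1000, 1100, 1200, 1300]        # average SAT score per college
grad = [55.0, 62.0, 70.0, 73.0, 80.0]      # graduation rate in percent


def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line y = m*x + b."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares around the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(sat, grad)
print(f"coefficient (m) = {slope:.4f}")
print(f"constant (b)    = {intercept:.4f}")
```

The two printed numbers play the same role as the B column of the Coefficients table: one constant and one coefficient per independent variable.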
Interpreting SPSS regression
output: Coefficients
Coefficients (a)

Model 1              Unstandardized Coefficients    Standardized Coefficients
                     B           Std. Error         Beta         t         Sig.
(Constant)           4.236       7.048                           .601      .549
Average SAT Score    5.88E-02    .007               .588         8.778     .000

a. Dependent Variable: Graduation Rate
Here, we will ONLY LOOK AT UNSTANDARDIZED COEFFICIENTS!
The y-intercept is 4.2 percentage points, with a standard error of 7.0.
The coefficient for SAT Scores is 0.059 percentage points per SAT point, with a standard error of 0.007.
In the form y = mx + b, m is the slope of the line (here 0.0588) and b is the y-intercept (here 4.236).
The y-intercept or constant is the predicted value of the dependent
variable when the independent variable takes on the value of zero.
This basic model predicts that when a college admits a class of
students who averaged zero on their SAT, 4.2% of them will
graduate.
The constant is not the most helpful statistic here, because an average SAT score of zero is not a realistic value for any college class.
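To make the role of the constant concrete, here is a minimal sketch that plugs the unstandardized coefficients from the table into y = mx + b. The function name predicted_graduation_rate and the example score of 1000 are illustrative choices, not part of the SPSS output.

```python
# Unstandardized coefficients (column B) from the Coefficients table.
INTERCEPT = 4.236   # (Constant)
SLOPE = 0.0588      # Average SAT Score, reported as 5.88E-02


def predicted_graduation_rate(avg_sat):
    """Predicted graduation rate (percent) for a given average SAT score."""
    return INTERCEPT + SLOPE * avg_sat


# At an average SAT score of zero the prediction is just the constant.
print(round(predicted_graduation_rate(0), 3))
# An illustrative, more realistic score.
print(round(predicted_graduation_rate(1000), 3))
```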
The coefficient of an independent variable is the predicted change in the
dependent variable that results from a one unit increase in the
independent variable.
A college with students whose SAT scores are one point higher on average is predicted to have a graduation rate that is 0.059 percentage points higher.
Increasing SAT scores by 200 points is associated with a predicted (200)(0.059) = 11.8 percentage point rise in graduation rates.
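The arithmetic above can be checked with a short sketch. It uses the unrounded coefficient 0.0588 from the table, so the result is 11.76, which rounds to the 11.8 quoted on the slide. The helper name predicted_change is hypothetical.

```python
SLOPE = 0.0588  # unstandardized coefficient for Average SAT Score (5.88E-02)


def predicted_change(delta_sat, slope=SLOPE):
    """Predicted change in graduation rate (percentage points)
    for a change of delta_sat points in average SAT score."""
    return slope * delta_sat


print(round(predicted_change(1), 4))     # one-point increase
print(round(predicted_change(200), 2))   # 200-point increase
```

Note that the constant drops out: the predicted difference between two colleges depends only on the slope and the gap in their SAT scores.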
Interpreting SPSS regression
output: Fit of the Regression
Model Summary

Model    R           R Square    Adjusted R Square    Std. Error of the Estimate
1        .588 (a)    .345        .341                 12.45%

a. Predictors: (Constant), Average SAT Score
The R Square measures how closely a regression line
fits the data in a scatter plot.
It can range from zero (no explanatory power) to one
(perfect prediction).
An R Square of 0.345 means that differences in SAT
scores can explain 35% of the variation in college
graduation rates.
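One common way to compute R Square is one minus the ratio of unexplained variation (squared residuals) to total variation around the mean. The sketch below applies that definition to a small made-up data set and its least-squares line; the data, the fitted values 0.061 and 0.9, and the function name r_squared are assumptions for illustration, not values from the slides.

```python
import math

# Hypothetical data and its least-squares line (illustration only).
sat = [900, 1000, 1100, 1200, 1300]
grad = [55.0, 62.0, 70.0, 73.0, 80.0]
slope, intercept = 0.061, 0.9


def r_squared(x, y, slope, intercept):
    """R Square = 1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


r2 = r_squared(sat, grad, slope, intercept)
print(f"R Square = {r2:.4f}")

# With a single predictor, R is the square root of R Square.
# Using the Rsq = 0.3454 shown on the scatterplot slide:
print(round(math.sqrt(0.3454), 3))
```

The last line reproduces the R of .588 in the Model Summary table from the Rsq value on the scatterplot.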
Statistical Significance
What would the null hypothesis look like
in a scatterplot?
If the independent variable has no effect on the dependent variable, the scatterplot
should look random, the regression line
should be flat, and its slope should be zero.
Null hypothesis: The regression coefficient
for an independent variable equals zero.
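A hedged sketch of how the t and Sig. columns relate to this null hypothesis: t is the coefficient divided by its standard error, and the null is rejected when Sig. falls below a chosen threshold. The 0.05 threshold and the function names are assumptions made for this example (0.05 is a common convention, not something SPSS prescribes).

```python
def t_statistic(b, std_error):
    """t = coefficient / standard error, testing H0: coefficient = 0."""
    return b / std_error


def reject_null(sig, alpha=0.05):
    """Reject H0 when the reported Sig. (p-value) is below alpha.
    alpha = 0.05 is an assumed, conventional threshold."""
    return sig < alpha


# (Constant): B = 4.236, Std. Error = 7.048, Sig. = .549
t_const = t_statistic(4.236, 7.048)
print(round(t_const, 3), reject_null(0.549))

# Average SAT Score: t = 8.778, Sig. = .000 (that is, below .0005)
print(reject_null(0.000))

# The table rounds the SAT standard error to .007; the value implied by
# B = 0.0588 and t = 8.778 is slightly smaller, so dividing by the
# rounded .007 would give about 8.4 rather than 8.778.
implied_se = 0.0588 / 8.778
print(round(implied_se, 4))
```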
Multivariate Regressions
A multivariate regression uses more than
one independent variable (or confound) to
explain variation in a dependent variable.
The coefficient for each independent variable
reports its effect on the DV, holding constant all
of the other IVs in the regression.
Thought experiment: how do factors such as class size, the school feeding program, and teacher credentials affect the academic performance of Junior High Schools?
Let's perform a regression analysis using ap2000 as the outcome variable and the variables acs_JH, meals and full as predictors:
(ap2000) - a measure of the academic performance of the school
(acs_JH) - the average class size in Junior High School
(meals) - the percentage of students receiving free meals, which is an indicator of poverty
(full) - the percentage of teachers who have full teaching credentials
We expect that better academic performance would be associated with smaller class sizes, fewer students receiving free meals, and a higher percentage of teachers having full teaching credentials.
Coefficients (a)

Model 1       Unstandardized Coefficients    Standardized Coefficients
              B           Std. Error         Beta         t           Sig.
(Constant)    906.739     28.265                          32.080      .000
ACS_JH        -2.682      1.394              -.064        -1.924      .055
MEALS         -3.702      .154               -.808        -24.038     .000
FULL          .109        .091               .041         1.197       .232

a. Dependent Variable: AP2000

Model Summary

Model    R           R Square    Adjusted R Square    Std. Error of the Estimate
1        .821 (a)    .674        .671                 64.153

a. Predictors: (Constant), FULL, ACS_JH, MEALS

An R Square of 0.674 means that differences in ACS_JH, MEALS and FULL can explain 67% of the variation in academic performance.
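To show what "holding constant all of the other IVs" means in practice, the sketch below turns the unstandardized coefficients into a prediction equation. The function name predicted_ap2000 and the example school (class size 20, 50% free meals, 90% fully credentialed teachers) are hypothetical values chosen for illustration.

```python
# Unstandardized coefficients (column B) from the Coefficients table.
COEF = {"constant": 906.739, "acs_JH": -2.682, "meals": -3.702, "full": 0.109}


def predicted_ap2000(acs_jh, meals, full):
    """Predicted AP2000 score from the three predictors."""
    return (COEF["constant"]
            + COEF["acs_JH"] * acs_jh
            + COEF["meals"] * meals
            + COEF["full"] * full)


# A hypothetical school.
base = predicted_ap2000(acs_jh=20, meals=50, full=90)
print(round(base, 3))

# Same school with 10 more percentage points of students on free meals,
# class size and credentials held constant: the prediction shifts by
# 10 times the meals coefficient.
more_meals = predicted_ap2000(acs_jh=20, meals=60, full=90)
print(round(more_meals - base, 2))
```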
The average class size (acs_JH, b=-2.682) is not statistically significant at the 5% level (p=0.055), but the coefficient is negative, which would indicate that larger class sizes are related to lower academic performance, which is what we would expect.
Next, the effect of meals (b=-3.702, p=.000) is significant and its coefficient is negative, indicating that the greater the proportion of students receiving free meals, the lower the academic performance.
Please note that we are not saying that free meals are causing lower academic performance. The meals variable is highly related to income level and functions more as a proxy for poverty. Thus, higher levels of poverty are associated with lower academic performance. This result also makes sense.
Finally, the percentage of teachers with full credentials (full, b=0.109, p=.232) seems to be unrelated to academic performance. This would seem to indicate that the percentage of teachers with full credentials is not an important factor in predicting academic performance. This result was somewhat unexpected.
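The three significance judgments above can be roughly reproduced from the B and Std. Error columns. The sketch below recomputes each t ratio and an approximate two-sided p-value. It uses a normal approximation instead of the t distribution SPSS uses, because the sample size is not shown in these slides, so the p-values are close to but not exactly the Sig. column. The 5% threshold is an assumed convention.

```python
import math

# (B, Std. Error) pairs from the Coefficients table.
rows = {
    "(Constant)": (906.739, 28.265),
    "ACS_JH": (-2.682, 1.394),
    "MEALS": (-3.702, 0.154),
    "FULL": (0.109, 0.091),
}


def two_sided_p_normal(t):
    """Approximate two-sided p-value using the standard normal
    distribution (an assumption; SPSS uses the t distribution)."""
    return math.erfc(abs(t) / math.sqrt(2))


results = {}
for name, (b, se) in rows.items():
    t = b / se
    p = two_sided_p_normal(t)
    results[name] = (t, p)
    print(f"{name:<11} t = {t:8.3f}  approx. p = {p:.3f}  "
          f"significant at 5%: {p < 0.05}")
```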
Should we take these results and
write them up for publication?
From these results, we would conclude that:
lower class sizes are related to higher performance,
fewer students receiving free meals is associated with higher performance, and
the percentage of teachers with full credentials was not related to academic performance in the schools.
Before we write this up for publication, we should do a number of checks to make sure we can firmly stand behind these results.
We start by getting more familiar with the data file, doing preliminary data checking, and looking for errors in the data.
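As one possible starting point for such preliminary checks, the sketch below scans records for missing values and implausible ranges. The three school records, the function name find_problems, and the specific rules (class size must be positive, percentages must lie between 0 and 100) are all assumptions made for illustration, not checks taken from the original analysis.

```python
# Hypothetical records for illustration only.
schools = [
    {"ap2000": 693, "acs_JH": 21, "meals": 67, "full": 76},
    {"ap2000": 570, "acs_JH": -19, "meals": 92, "full": 79},     # negative class size
    {"ap2000": 546, "acs_JH": 20, "meals": None, "full": 150},   # missing / out of range
]


def find_problems(record):
    """Return a list of data problems found in one school record."""
    problems = []
    # Missing values.
    for name, value in record.items():
        if value is None:
            problems.append(f"{name} is missing")
    # Class size must be a positive number.
    acs = record.get("acs_JH")
    if acs is not None and acs <= 0:
        problems.append("acs_JH is not positive")
    # Percentages must lie between 0 and 100.
    for name in ("meals", "full"):
        value = record.get(name)
        if value is not None and not 0 <= value <= 100:
            problems.append(f"{name} is outside 0-100")
    return problems


for index, school in enumerate(schools):
    print(index, find_problems(school))
```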