Sense of Regression

  • 7/28/2019 Sense of Regression

    Making Sense of Regression Results

    Kwamina Banson, Socio-Economics Department

    30 July 2009

    BNARI Seminar Room

    Linear Regression: Introduction

    Interpreting SPSS regression output

    Coefficients for independent variables

    Fit of the regression: R Square

    Statistical significance

    How to reject the null hypothesis

    Multivariate regressions: academic performance of Junior High Schools

    What is SPSS?

    SPSS is a computer program used for a wide variety of statistical analyses. The name originally stood for Statistical Package for the Social Sciences and was later expanded as Statistical Product and Service Solutions.

    In addition to statistical analysis, data management and data documentation are features of the base software.

    Statistics included in the base software:

    Descriptive statistics: Cross tabulation, Frequencies, Descriptives, Explore, Descriptive Ratio Statistics

    Bivariate statistics: Means, t-test, ANOVA, Correlation (bivariate, partial, distances), Nonparametric tests

    Prediction for numerical outcomes: Linear regression

    Prediction for identifying groups: Factor analysis, Cluster analysis (two-step, K-means, hierarchical), Discriminant analysis

    Interpreting SPSS regression output

    [Figure: scatterplot with fitted regression line. Horizontal axis: Average SAT Score (0 to 1600). Vertical axis: 0 to 100. Rsq = 0.3454. Callouts: "How tight is the fit?", "Y-intercept or constant", "Slope or coefficient".]

    y = mx + b, where m is the slope of the line and b is the y-intercept.

    Interpreting SPSS regression output

    An SPSS regression output includes two key tables for interpreting your results:

    A Coefficients table that contains the y-intercept (or constant) of the regression, a coefficient for every independent variable, and the standard error of that coefficient.

    A Model Summary table that gives you information on the fit of your regression.

    Interpreting SPSS regression output: Coefficients

    Coefficients (dependent variable: Graduation Rate)

    Model 1             Unstandardized B   Std. Error   Standardized Beta   t       Sig.
    (Constant)          4.236              7.048                            .601    .549
    Average SAT Score   5.88E-02           .007         .588                8.778   .000

    Here, we will ONLY LOOK AT UNSTANDARDIZED COEFFICIENTS!

    The y-intercept is a graduation rate of 4.2%, with a standard error of 7.0 percentage points.

    The coefficient for SAT scores is 0.059 percentage points per SAT point, with a standard error of 0.007.

    In terms of y = mx + b, the slope m is 0.0588 and the y-intercept b is 4.236.
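
    The fitted line can be written out as a small function. This is only a sketch: the two constants are the unstandardized B values from the Coefficients table, while the function name and the SAT values passed to it are illustrative choices, not part of the original analysis.

```python
# Unstandardized coefficients as reported in the Coefficients table.
INTERCEPT = 4.236  # b: predicted graduation rate (%) at an average SAT of 0
SLOPE = 0.0588     # m: change in graduation rate (percentage points) per SAT point


def predict_graduation_rate(avg_sat: float) -> float:
    """Return the predicted graduation rate (%) for a given average SAT score."""
    return SLOPE * avg_sat + INTERCEPT


# The SAT values below are illustrative inputs, not data from the study.
for sat in (0, 1000, 1200):
    rate = predict_graduation_rate(sat)
    print(f"average SAT {sat}: predicted graduation rate {rate:.1f}%")
```

    Calling the function with 0 returns the constant itself, which is exactly what the y-intercept means.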

    Interpreting SPSS regression output: Coefficients

    The y-intercept or constant is the predicted value of the dependent variable when the independent variable takes on the value of zero.

    This basic model predicts that when a college admits a class of students who averaged zero on their SAT, 4.2% of them will graduate.

    The constant is not the most helpful statistic here, because no college has an average SAT score of zero.

    Interpreting SPSS regression output: Coefficients

    The coefficient of an independent variable is the predicted change in the dependent variable associated with a one-unit increase in the independent variable.

    A college with students whose SAT scores are one point higher on average is predicted to have a graduation rate that is 0.059 percentage points higher.

    An increase of 200 points in average SAT score corresponds to a predicted rise of (200)(0.059) = 11.8 percentage points in the graduation rate.
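
    The 200-point calculation can be checked in a few lines. This sketch assumes nothing beyond the reported coefficient; the function name is hypothetical. Because the intercept is the same for both colleges being compared, it cancels out and only the slope matters.

```python
SLOPE = 0.0588  # unstandardized coefficient for Average SAT Score (table value 5.88E-02)


def predicted_change(delta_sat: float, slope: float = SLOPE) -> float:
    """Predicted change in graduation rate (percentage points) for a change in average SAT."""
    return slope * delta_sat


print(round(predicted_change(200), 2))         # uses the unrounded table value
print(round(predicted_change(200, 0.059), 2))  # uses the rounded 0.059 quoted on the slide
```

    The two results differ slightly only because the slide rounds the coefficient before multiplying.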

    Interpreting SPSS regression output: Fit of the Regression

    Model Summary (predictors: (Constant), Average SAT Score)

    Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
    1       .588   .345       .341                12.45%

    The R Square measures how closely a regression line fits the data in a scatter plot.

    It can range from zero (no explanatory power) to one (perfect prediction).

    An R Square of 0.345 means that differences in SAT scores can explain 35% of the variation in college graduation rates.
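
    To make "share of variation explained" concrete, here is a minimal sketch that computes R Square by hand as 1 minus the ratio of residual variation to total variation. The data points are made up purely for illustration and are not the college data behind the slides; the function name is likewise an assumption.

```python
def r_squared(xs, ys):
    """R Square of the least-squares line through (xs, ys): 1 - SS_residual / SS_total."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Variation left over after fitting the line.
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    # Total variation of y around its mean.
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Made-up points for illustration only; these are not the college data.
sat = [900, 1000, 1100, 1200, 1300]
grad = [50, 62, 58, 75, 70]
print(round(r_squared(sat, grad), 3))

# With a single predictor, R Square is simply R squared.
print(round(0.588 ** 2, 3))
```

    Squaring the reported R of .588 gives about 0.346 rather than the reported .345; the small gap is what one would expect from R itself being rounded to three decimals.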

    Statistical Significance

    What would the null hypothesis look like in a scatterplot?

    If the independent variable has no effect on the dependent variable, the scatterplot should look random, the regression line should be flat, and its slope should be zero.

    Null hypothesis: The regression coefficient for an independent variable equals zero.
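
    The t column in an SPSS Coefficients table is the test statistic for this null hypothesis: the coefficient divided by its standard error. A minimal sketch, using the B and Std. Error values printed in the graduation-rate table (the function name is hypothetical):

```python
def t_ratio(coefficient: float, std_error: float) -> float:
    """t statistic for the null hypothesis that the coefficient equals zero."""
    return coefficient / std_error


# B and Std. Error values as printed in the Coefficients table.
print(round(t_ratio(4.236, 7.048), 3))   # constant; the table reports t = .601
print(round(t_ratio(0.0588, 0.007), 3))  # Average SAT Score; the table reports t = 8.778
```

    The constant reproduces the reported t. For Average SAT Score the hand calculation comes out somewhat lower than the reported 8.778, presumably because the table displays the standard error rounded to three decimals.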

    Multivariate Regressions

    A multivariate regression uses more than one independent variable (or confound) to explain variation in a dependent variable.

    The coefficient for each independent variable (IV) reports its effect on the dependent variable (DV), holding constant all of the other IVs in the regression.

    Thought experiment: how do factors such as class size, the school feeding program, and teacher credentials affect the academic performance of Junior High Schools?

    Multivariate Regressions

    Let's perform a regression analysis using ap2000 as the outcome variable and the variables acs_JH, meals and full as predictors:

    ap2000: the academic performance of the school

    acs_JH: the average class size in Junior High School

    meals: the percentage of students receiving free meals, which is an indicator of poverty

    full: the percentage of teachers who have full teaching credentials

    We expect that better academic performance would be associated with lower class size, fewer students receiving free meals, and a higher percentage of teachers having full teaching credentials.

    Multivariate Regressions

    Coefficients (dependent variable: AP2000)

    Model 1      Unstandardized B   Std. Error   Standardized Beta   t         Sig.
    (Constant)   906.739            28.265                           32.080    .000
    ACS_JH       -2.682             1.394        -.064               -1.924    .055
    MEALS        -3.702             .154         -.808               -24.038   .000
    FULL         .109               .091         .041                1.197     .232

    Model Summary (predictors: (Constant), FULL, ACS_JH, MEALS)

    Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
    1       .821   .674       .671                64.153

    An R Square of 0.674 means that differences in ACS_JH, MEALS and FULL can explain 67% of the variation in academic performance.
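
    "Holding the other variables constant" can be illustrated by plugging values into the fitted equation. In this sketch the coefficients are the unstandardized B values from the table; the two schools are hypothetical and differ only in the meals percentage.

```python
# Unstandardized coefficients (B) from the Coefficients table for AP2000.
COEFFICIENTS = {"constant": 906.739, "acs_JH": -2.682, "meals": -3.702, "full": 0.109}


def predict_ap2000(acs_jh: float, meals: float, full: float) -> float:
    """Predicted academic performance score from the fitted equation."""
    return (
        COEFFICIENTS["constant"]
        + COEFFICIENTS["acs_JH"] * acs_jh
        + COEFFICIENTS["meals"] * meals
        + COEFFICIENTS["full"] * full
    )


# Two hypothetical schools that differ only in the share of students on free meals.
school_a = predict_ap2000(acs_jh=20, meals=40, full=90)
school_b = predict_ap2000(acs_jh=20, meals=50, full=90)

# Ten more percentage points on meals, with class size and credentials held constant.
print(round(school_b - school_a, 2))
```

    The difference equals ten times the meals coefficient, whatever values are chosen for the other two predictors, as long as they are the same for both schools.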

    Multivariate Regressions

    The average class size (acs_JH, b=-2.682) is not significant at the 0.05 level (p=0.055), but the coefficient is negative, which would indicate that larger class sizes are related to lower academic performance, which is what we would expect.

    Multivariate Regressions

    Next, the effect of meals (b=-3.702, p=.000) is significant and its coefficient is negative, indicating that the greater the proportion of students receiving free meals, the lower the academic performance.

    Please note that we are not saying that free meals are causing lower academic performance. The meals variable is highly related to income level and functions more as a proxy for poverty. Thus, higher levels of poverty are associated with lower academic performance. This result also makes sense.

    Multivariate Regressions

    Finally, the percentage of teachers with full credentials (full, b=0.109, p=.232) seems to be unrelated to academic performance. This would seem to indicate that the percentage of teachers with full credentials is not an important factor in predicting academic performance. This result was somewhat unexpected.
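
    The significance readings for the three predictors all follow the same rule: compare the Sig. value with a chosen cutoff. The sketch below assumes the conventional cutoff of 0.05, which the slides imply but never state; the function and variable names are illustrative.

```python
ALPHA = 0.05  # conventional cutoff, assumed here; the slides do not state one

# Sig. values as printed in the Coefficients table (.000 means p < .0005).
p_values = {"acs_JH": 0.055, "meals": 0.000, "full": 0.232}


def is_significant(p: float, alpha: float = ALPHA) -> bool:
    """Reject the null hypothesis (coefficient equals zero) when p is below alpha."""
    return p < alpha


for name, p in p_values.items():
    verdict = "significant" if is_significant(p) else "not significant"
    print(f"{name}: p = {p:.3f} -> {verdict}")
```

    Note how close acs_JH is to the cutoff: a slightly looser threshold would change its verdict, which is one reason to treat the class-size result with caution.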

    Should we take these results and write them up for publication?

    From these results, we would conclude that:

    lower class sizes are related to higher performance (although this coefficient falls just short of significance, p=0.055),

    fewer students receiving free meals is associated with higher performance, and

    the percentage of teachers with full credentials was not related to academic performance in the schools.

    Before we write this up for publication, we should do a number of checks to make sure we can firmly stand behind these results.

    We start by getting more familiar with the data file, doing preliminary data checking, and looking for errors in the data.
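
    As a rough idea of what preliminary data checking might involve, the sketch below scans records for missing values and for values that are impossible given how the variables are defined. The specific checks and the sample records are assumptions made for illustration; they are not taken from the actual data file.

```python
def find_problems(record: dict) -> list:
    """Return a list of human-readable problems found in one school record."""
    problems = []
    for field in ("ap2000", "acs_JH", "meals", "full"):
        if record.get(field) is None:
            problems.append(f"{field} is missing")
    # meals and full are percentages, so they should lie between 0 and 100.
    for field in ("meals", "full"):
        value = record.get(field)
        if value is not None and not 0 <= value <= 100:
            problems.append(f"{field}={value} is outside 0-100")
    # A class size cannot be zero or negative.
    acs = record.get("acs_JH")
    if acs is not None and acs <= 0:
        problems.append(f"acs_JH={acs} is not positive")
    return problems


# Hypothetical records, invented to show the checks; not rows from the data file.
schools = [
    {"ap2000": 693, "acs_JH": 21, "meals": 67, "full": 76},
    {"ap2000": 570, "acs_JH": -19, "meals": 92, "full": 79},
    {"ap2000": None, "acs_JH": 20, "meals": 120, "full": 100},
]
for i, school in enumerate(schools):
    print(i, find_problems(school))
```

    Range checks like these only catch impossible values; plausible but wrong entries still need descriptive statistics and plots to spot.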