Chapter 7: Multiple Regression


  • 8/11/2019 Chapter 07MultipleRegression

    1/51

    Multiple Regression

    Dr. Andy Field


    Slide 2

    Aims

    Understand when to use multiple regression.

    Understand the multiple regression equation and what the betas represent.

    Understand different methods of regression: hierarchical, stepwise, forced entry.

    Understand how to do a multiple regression in PASW/SPSS.

    Understand how to interpret multiple regression.

    Understand the assumptions of multiple regression and how to test them.


    Regression: An Example

    A record company boss was interested in predicting record sales from advertising.

    Data: 200 different album releases.

    Outcome variable: sales (CDs and downloads) in the week after release.

    Predictor variables: the amount (in £s) spent promoting the record before release (see last lecture), and the number of plays on the radio (new variable).


    Slide 5

    The Model with One Predictor


    Slide 6

    Multiple Regression as an Equation

    With multiple regression the relationship is described using a variation of the equation of a straight line:

    yi = b0 + b1X1i + b2X2i + … + bnXni + εi
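    As a rough illustration, the equation can be estimated by ordinary least squares; this sketch uses NumPy and made-up numbers, not the lecture's album data:

    ```python
    import numpy as np

    # Made-up data for two predictors (NOT the lecture's album data)
    X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
    X2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0])
    noise = np.array([0.1, -0.2, 0.05, 0.0, -0.1, 0.15, -0.05, 0.1])
    y = 10 + 3 * X1 + 2 * X2 + noise   # true b0 = 10, b1 = 3, b2 = 2

    # Design matrix: a column of 1s (for the intercept b0) plus the predictors
    X = np.column_stack([np.ones_like(X1), X1, X2])

    # Least-squares estimates of b0, b1, b2
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(b)   # close to [10, 3, 2]
    ```

    The fitted values are then simply X @ b, i.e. the equation above evaluated at each case.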


    Slide 7

    b0

    b0 is the intercept.

    The intercept is the value of the Y variable when all Xs = 0.

    This is the point at which the regression plane crosses the Y-axis (vertical).


    Slide 8

    Beta Values

    b1 is the regression coefficient for variable 1.

    b2 is the regression coefficient for variable 2.

    bn is the regression coefficient for the nth variable.


    Slide 9

    The Model with Two Predictors

    [3-D scatterplot with fitted regression plane: slopes bAdverts and bAirplay, intercept b0]


    Slide 10

    Methods of Regression

    Hierarchical: the experimenter decides the order in which variables are entered into the model.

    Forced entry: all predictors are entered simultaneously.

    Stepwise: predictors are selected using their semi-partial correlation with the outcome.


    Slide 12

    Hierarchical Regression

    Known predictors (based on past research) are entered into the regression model first.

    New predictors are then entered in a separate step/block.

    The experimenter makes the decisions.


    Slide 13

    Hierarchical Regression

    It is the best method: it is based on theory testing, and you can see the unique predictive influence of a new variable on the outcome because known predictors are held constant in the model.

    Bad point: it relies on the experimenter knowing what they're doing!


    Slide 14

    Forced Entry Regression

    All variables are entered into the model simultaneously.

    The results obtained depend on the variables entered into the model.

    It is important, therefore, to have good theoretical reasons for including a particular variable.


    [Venn diagram: variance in Exam Performance shared with each predictor]

    Revision Time: 33.1% of variance accounted for
    Previous Exam: 1.7% of variance explained
    Difficulty: 1.3% of variance explained


    Slide 18

    Stepwise Regression II

    Step 2: having selected the 1st predictor, a second one is chosen from the remaining predictors.

    The semi-partial correlation is used as a criterion for selection.


    Slide 19

    Semi-Partial Correlation

    Partial correlation: measures the relationship between two variables, controlling for the effect that a third variable has on them both.

    Semi-partial correlation: measures the relationship between two variables, controlling for the effect that a third variable has on only one of them.
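    These two definitions can be sketched numerically via the usual residual-based construction; `residuals`, `partial_corr`, and `semipartial_corr` are illustrative helper names, and the data are made up:

    ```python
    import numpy as np

    def residuals(a, z):
        """Residuals of a after regressing it on z (with an intercept)."""
        Z = np.column_stack([np.ones_like(z), z])
        coef, *_ = np.linalg.lstsq(Z, a, rcond=None)
        return a - Z @ coef

    def partial_corr(x, y, z):
        # Remove z's effect from BOTH x and y, then correlate
        return np.corrcoef(residuals(x, z), residuals(y, z))[0, 1]

    def semipartial_corr(x, y, z):
        # Remove z's effect from x ONLY, then correlate with raw y
        return np.corrcoef(residuals(x, z), y)[0, 1]

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
    y = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
    z = np.array([1.0, 1.0, 2.0, 2.0, 3.0, 3.0])
    pr = partial_corr(x, y, z)
    sr = semipartial_corr(x, y, z)
    print(pr, sr)
    ```

    Because the semi-partial correlation removes the third variable from only one side, its absolute value can never exceed that of the partial correlation.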


    Slide 20

    [Diagrams: partial correlation vs. semi-partial correlation]


    Slide 21

    Semi-Partial Correlation in Regression

    The semi-partial correlation measures the relationship between a predictor and the outcome, controlling for the relationship between that predictor and any others already in the model.

    It measures the unique contribution of a predictor to explaining the variance of the outcome.


    Slide 23

    Problems with Stepwise Methods

    They rely on a mathematical criterion: variable selection may depend upon only slight differences in the semi-partial correlation.

    These slight numerical differences can lead to major theoretical differences.

    They should be used only for exploration.


    Slide 24

    Doing Multiple Regression


    Slide 25

    Doing Multiple Regression


    Regression Statistics


    Regression Diagnostics


    Slide 28

    Output: Model Summary


    Slide 29

    R and R²

    R: the correlation between the observed values of the outcome and the values predicted by the model.

    R²: the proportion of variance accounted for by the model.

    Adjusted R²: an estimate of R² in the population (shrinkage).
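    A sketch of how these three statistics can be computed from observed and predicted values; `r_and_r2` is an illustrative helper, and the adjustment uses the standard (n − 1)/(n − k − 1) correction:

    ```python
    import numpy as np

    def r_and_r2(y, y_hat, k):
        """R, R², and adjusted R² for a model with k predictors."""
        r = np.corrcoef(y, y_hat)[0, 1]            # observed vs. predicted
        ss_total = np.sum((y - y.mean()) ** 2)     # total variability
        ss_resid = np.sum((y - y_hat) ** 2)        # error left in the model
        r2 = 1 - ss_resid / ss_total               # proportion explained
        n = len(y)
        adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
        return r, r2, adj_r2

    # Tiny made-up example: near-perfect predictions from one predictor
    y     = np.array([1.0, 2.0, 3.0, 4.0])
    y_hat = np.array([1.1, 1.9, 3.1, 3.9])
    r, r2, adj = r_and_r2(y, y_hat, k=1)
    print(r2, adj)
    ```

    Adjusted R² is always at or below R², and the gap (the shrinkage) grows as predictors are added relative to the sample size.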


    Slide 31

    Analysis of Variance: ANOVA

    The F-test looks at whether the variance explained by the model (SSM) is significantly greater than the error within the model (SSR).

    It tells us whether using the regression model is significantly better at predicting values of the outcome than using the mean.
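    The same sums of squares give the F-ratio; a minimal sketch with made-up numbers (`f_ratio` is an illustrative helper name):

    ```python
    import numpy as np

    def f_ratio(y, y_hat, k):
        """F = (SSm / dfm) / (SSr / dfr) for a model with k predictors."""
        n = len(y)
        ss_m = np.sum((y_hat - y.mean()) ** 2)   # SSm: improvement over the mean
        ss_r = np.sum((y - y_hat) ** 2)          # SSr: error within the model
        return (ss_m / k) / (ss_r / (n - k - 1))

    y     = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
    y_hat = np.array([1.2, 1.8, 3.2, 3.8, 5.2, 5.8])
    F = f_ratio(y, y_hat, k=1)
    print(F)   # a large F: the model predicts far better than the mean
    ```

    The observed F is compared against an F distribution with (k, n − k − 1) degrees of freedom to get the p-value reported in the ANOVA table.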


    Slide 32

    Output: betas


    Slide 33

    How to Interpret Beta Values

    Beta values: the change in the outcome associated with a unit change in the predictor.

    Standardised beta values: tell us the same thing, but expressed in standard deviations.
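    A standardised beta can be obtained from an unstandardised b by rescaling with the two standard deviations; a sketch with made-up numbers (`standardised_beta` is an illustrative helper name):

    ```python
    import numpy as np

    def standardised_beta(b, x, y):
        """beta = b * SD(predictor) / SD(outcome)."""
        return b * np.std(x, ddof=1) / np.std(y, ddof=1)

    x = np.array([0.0, 3.0, 6.0])    # predictor, SD = 3
    y = np.array([0.0, 6.0, 12.0])   # outcome,  SD = 6
    beta = standardised_beta(2.0, x, y)
    print(beta)  # 1.0: a 1-SD change in x predicts a 1-SD change in y
    ```

    This is why standardised betas from the same model can be compared directly, while unstandardised bs depend on each predictor's units.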


    Slide 35

    Constructing a Model

    Sales = b0 + b1X1 + b2X2
          = 41,124 + (0.087 × Adverts) + (3,589 × plays)

    With £1 million spent on advertising and 15 plays:

    Sales = 41,124 + (0.087 × 1,000,000) + (3,589 × 15)
          = 41,124 + 87,000 + 53,835
          = 181,959
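    The arithmetic of this worked example can be checked directly:

    ```python
    # Coefficients from the worked example above
    b0, b_adverts, b_plays = 41124, 0.087, 3589

    # Predicted sales for £1,000,000 of advertising and 15 plays per week
    sales = b0 + b_adverts * 1_000_000 + b_plays * 15
    print(round(sales))  # 181959
    ```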


    Slide 37

    Interpreting Standardised Betas

    As advertising increases by one standard deviation (£485,655), record sales increase by 0.523 standard deviations (0.523 × 80,699 = 42,206 sales).

    If the number of plays on Radio 1 per week increases by one standard deviation (12 plays), record sales increase by 0.546 standard deviations (0.546 × 80,699 = 44,062 sales).


    Reporting the Model


    Slide 39

    How well does the Model fit the data?

    There are two ways to assess the accuracy of the model in the sample:

    Residual statistics: standardized residuals.

    Influential cases: Cook's distance.


    Slide 40

    Standardized Residuals

    In an average sample, 95% of standardized residuals should lie between ±2, and 99% should lie between ±2.5.

    Outliers: any case for which the absolute value of the standardized residual is 3 or more is likely to be an outlier.
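    These rules of thumb are easy to check case by case once the standardized residuals have been saved; `residual_checks` is an illustrative helper, and the residuals below are made up:

    ```python
    import numpy as np

    def residual_checks(std_resid):
        """Case-wise diagnostics on standardized residuals."""
        z = np.abs(np.asarray(std_resid))
        return {
            "pct_within_2":   np.mean(z < 2.0) * 100,   # expect about 95%
            "pct_within_2_5": np.mean(z < 2.5) * 100,   # expect about 99%
            "possible_outliers": np.where(z >= 3.0)[0], # cases worth inspecting
        }

    checks = residual_checks([0.1, -1.2, 2.2, 0.5, -3.4])
    print(checks["possible_outliers"])  # the last case looks like an outlier
    ```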


    Slide 41

    Cook's Distance

    Measures the influence of a single case on the model as a whole.

    Weisberg (1982): absolute values greater than 1 may be cause for concern.
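    Cook's distance combines each case's residual with its leverage; a hedged sketch of the standard formula (the data are made up, with the last case deliberately influential):

    ```python
    import numpy as np

    def cooks_distance(X, y):
        """Cook's distance per case; X must include a column of 1s."""
        n, p = X.shape
        H = X @ np.linalg.pinv(X)              # hat (projection) matrix
        h = np.diag(H)                         # leverage of each case
        resid = y - H @ y                      # residuals
        mse = np.sum(resid ** 2) / (n - p)     # residual mean square
        return (resid ** 2 / (p * mse)) * (h / (1 - h) ** 2)

    x = np.array([1.0, 2.0, 3.0, 4.0, 10.0])
    y = np.array([1.0, 2.0, 3.0, 4.0, 20.0])
    X = np.column_stack([np.ones_like(x), x])
    D = cooks_distance(X, y)
    print(D)  # the last case dominates, flagging it as influential
    ```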


    Slide 42

    Generalization

    When we run regression, we hope to be able to generalize the sample model to the entire population.

    To do this, several assumptions must be met.

    Violating these assumptions stops us generalizing conclusions to our target population.


    Slide 43

    Straightforward Assumptions

    Variable type: the outcome must be continuous; predictors can be continuous or dichotomous.

    Non-zero variance: predictors must not have zero variance.

    Linearity: the relationship we model is, in reality, linear.

    Independence: all values of the outcome should come from a different person.


    Slide 44

    The More Tricky Assumptions

    No multicollinearity: predictors must not be highly correlated.

    Homoscedasticity: for each value of the predictors, the variance of the error term should be constant.

    Independent errors: for any pair of observations, the error terms should be uncorrelated.

    Normally-distributed errors.


    Tolerance should be more than 0.2 (Menard, 1995).

    VIF should be less than 10 (Myers, 1990).
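    Tolerance and VIF can be computed by regressing each predictor on the others; a sketch (`vif` is an illustrative helper name; tolerance is simply 1/VIF, and the data are made up):

    ```python
    import numpy as np

    def vif(X):
        """VIF_j = 1 / (1 - R²_j), from regressing predictor j on the rest."""
        X = np.asarray(X, dtype=float)
        n, k = X.shape
        out = []
        for j in range(k):
            Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
            coef, *_ = np.linalg.lstsq(Z, X[:, j], rcond=None)
            resid = X[:, j] - Z @ coef
            r2 = 1 - resid @ resid / np.sum((X[:, j] - X[:, j].mean()) ** 2)
            out.append(1.0 / (1.0 - r2))
        return np.array(out)

    X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
    v = vif(X)
    print(v)         # about 1.56 for both predictors: well below 10
    print(1.0 / v)   # tolerance: about 0.64, above the 0.2 threshold
    ```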


    Regression Plots


    Homoscedasticity: ZRESID vs. ZPRED

    [Example scatterplots: good vs. bad]


    Normality of Errors: Histograms

    [Example histograms: good vs. bad]


    Normality of Errors: Normal Probability Plot

    [Normal P-P plot of regression standardized residual (dependent variable: Outcome): observed vs. expected cumulative probability, axes from 0.00 to 1.00]