8/11/2019 Chapter 07: Multiple Regression
Multiple Regression
Dr. Andy Field
Slide 2
Aims
Understand when to use multiple regression.
Understand the multiple regression equation and what the betas represent.
Understand different methods of regression: hierarchical, stepwise, forced entry.
Understand how to do a multiple regression on PASW/SPSS.
Understand how to interpret multiple regression.
Understand the assumptions of multiple regression and how to test them.
Regression: An Example

A record company boss was interested in predicting record sales from advertising.

Data:
200 different album releases

Outcome variable:
Sales (CDs and downloads) in the week after release

Predictor variables:
The amount (in £s) spent promoting the record before release (see last lecture)
Number of plays on the radio (new variable)
Slide 5
The Model with One Predictor
Slide 6
Multiple Regression as an Equation
With multiple regression the relationship is described using a variation of the equation of a straight line:

y_i = b0 + b1*X1_i + b2*X2_i + ... + bn*Xn_i + e_i
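As a minimal sketch, the prediction side of this equation can be written as a small Python function (the names and numbers here are illustrative, not from the slides):

```python
# Minimal sketch of the multiple regression prediction equation:
# y = b0 + b1*X1 + b2*X2 + ... + bn*Xn
def predict(b0, betas, xs):
    """Predicted outcome for one case: intercept plus the weighted predictors."""
    return b0 + sum(b * x for b, x in zip(betas, xs))

# Hypothetical model with two predictors:
print(predict(2.0, [0.5, 3.0], [4.0, 1.0]))  # 2.0 + 2.0 + 3.0 = 7.0
```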
Slide 7
b0
b0 is the intercept.
The intercept is the value of the Y variable when all Xs = 0.
This is the point at which the regression plane crosses the Y-axis (vertical).
Slide 8
Beta Values
b1 is the regression coefficient for variable 1.
b2 is the regression coefficient for variable 2.
bn is the regression coefficient for the nth variable.
Slide 9
The Model with Two Predictors
[Figure: regression plane over the Adverts and Airplay axes, with slopes b_Adverts and b_Airplay and intercept b0]
Slide 10
Methods of Regression
Hierarchical:
The experimenter decides the order in which variables are entered into the model.

Forced Entry:
All predictors are entered simultaneously.

Stepwise:
Predictors are selected using their semi-partial correlation with the outcome.
Slide 12
Hierarchical Regression
Known predictors (based on past research) are entered into the regression model first.
New predictors are then entered in a separate step/block.
The experimenter makes the decisions.
Slide 13
Hierarchical Regression
It is the best method:
Based on theory testing.
You can see the unique predictive influence of a new variable on the outcome, because known predictors are held constant in the model.

Bad point:
Relies on the experimenter knowing what they're doing!
Slide 14
Forced Entry Regression
All variables are entered into the model simultaneously.
The results obtained depend on the variables entered into the model.
It is important, therefore, to have good theoretical reasons for including a particular variable.
[Figure: Venn diagram of Exam Performance and its predictors. Revision Time uniquely accounts for 33.1% of the variance, Previous Exam for a further 1.7%, and Difficulty for 1.3%.]
Slide 18
Stepwise Regression II
Step 2:
Having selected the 1st predictor, a second one is chosen from the remaining predictors.
The semi-partial correlation is used as the criterion for selection.
Slide 19
Semi-Partial Correlation

Partial correlation:
Measures the relationship between two variables, controlling for the effect that a third variable has on them both.

Semi-partial correlation:
Measures the relationship between two variables, controlling for the effect that a third variable has on only one of them.
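The two definitions differ only in the denominator. A small Python sketch using the standard first-order formulas (the r values below are hypothetical):

```python
from math import sqrt

def partial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y after removing z from BOTH variables."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

def semipartial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y after removing z from x only."""
    return (r_xy - r_xz * r_yz) / sqrt(1 - r_xz ** 2)

# Hypothetical pairwise correlations:
print(round(partial_r(0.5, 0.3, 0.4), 3))      # 0.435
print(round(semipartial_r(0.5, 0.3, 0.4), 3))  # 0.398
```

Note that the semi-partial correlation is never larger in magnitude than the partial correlation for the same three variables.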
Slide 20
Partial Correlation vs. Semi-Partial Correlation
Slide 21
Semi-Partial Correlation in Regression
The semi-partial correlation:
Measures the relationship between a predictor and the outcome, controlling for the relationship between that predictor and any others already in the model.
It measures the unique contribution of a predictor to explaining the variance of the outcome.
Slide 22
Slide 23
Problems with Stepwise Methods
They rely on a mathematical criterion.
Variable selection may depend upon only slight differences in the semi-partial correlation.
These slight numerical differences can lead to major theoretical differences.
Stepwise methods should be used only for exploration.
Slide 24
Doing Multiple Regression
Slide 25
Doing Multiple Regression
Regression Statistics
Regression Diagnostics
Slide 28
Output: Model Summary
Slide 29
R and R2

R:
The correlation between the observed values of the outcome and the values predicted by the model.

R2:
The proportion of variance accounted for by the model.

Adjusted R2:
An estimate of R2 in the population (shrinkage).
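The shrinkage estimate can be illustrated with the usual adjustment formula, 1 - (1 - R^2)(n - 1)/(n - k - 1), for n cases and k predictors (the sample values below are hypothetical):

```python
def adjusted_r2(r2, n, k):
    """Shrunken estimate of R^2 in the population for n cases and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical sample: R^2 = .5 from 50 cases and 3 predictors.
print(round(adjusted_r2(0.5, 50, 3), 3))  # 0.467
```

The adjustment always shrinks R2 toward zero, and shrinks it more when the sample is small relative to the number of predictors.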
Slide 31
Analysis of Variance: ANOVA

The F-test:
Looks at whether the variance explained by the model (SSM) is significantly greater than the error within the model (SSR).
It tells us whether using the regression model is significantly better at predicting values of the outcome than using the mean.
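The F-ratio itself is just the two sums of squares converted to mean squares and divided (a sketch with made-up sums of squares):

```python
def f_statistic(ss_m, ss_r, n, k):
    """F-ratio for a regression with n cases and k predictors:
    mean squares for the model (df = k) divided by mean squares
    for the residuals (df = n - k - 1)."""
    ms_m = ss_m / k
    ms_r = ss_r / (n - k - 1)
    return ms_m / ms_r

# Hypothetical sums of squares:
print(round(f_statistic(80.0, 20.0, 12, 2), 4))  # 18.0
```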
Slide 32
Output: betas
Slide 33
How to Interpret Beta Values
Beta values:
The change in the outcome associated with a unit change in the predictor.

Standardised beta values:
Tell us the same thing, but expressed in standard deviations.
Slide 35
Constructing a Model
Sales_i = b0 + b1*Adverts_i + b2*Plays_i
        = 41,124 + (0.087 x Adverts_i) + (3,589 x Plays_i)

For £1 million of advertising and 15 plays:

Sales = 41,124 + (0.087 x 1,000,000) + (3,589 x 15)
      = 41,124 + 87,000 + 53,835
      = 181,959
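The slide's arithmetic, taking the coefficients as given (41,124; 0.087; 3,589), can be checked directly:

```python
def predicted_sales(adverts, plays):
    """Model from the slide: Sales = 41,124 + 0.087*Adverts + 3,589*Plays."""
    return 41124 + 0.087 * adverts + 3589 * plays

# GBP 1 million of advertising and 15 plays on the radio:
print(round(predicted_sales(1_000_000, 15)))  # 181959
```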
Slide 37
Interpreting Standardised Betas

As advertising increases by one standard deviation (£485,655), record sales increase by 0.523 standard deviations: 0.523 x 80,699 = 42,206 extra sales.
If the number of plays on Radio 1 per week increases by one standard deviation (12 plays), record sales increase by 0.546 x 80,699 = 44,062.
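Using the slide's figures (standardized betas 0.523 and 0.546, SD of sales = 80,699), the arithmetic works out as follows:

```python
def change_in_outcome(beta_std, sd_outcome):
    """Raw-unit change in the outcome for a one-SD change in a predictor:
    standardized beta times the outcome's standard deviation."""
    return beta_std * sd_outcome

# Figures from the slide: SD of record sales = 80,699.
print(round(change_in_outcome(0.523, 80699)))  # 42206 (advertising)
print(round(change_in_outcome(0.546, 80699)))  # 44062 (radio plays)
```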
Reporting the Model
Slide 39
How well does the Model fit the data?

There are two ways to assess the accuracy of the model in the sample:

Residual statistics:
Standardized residuals

Influential cases:
Cook's distance
Slide 40
Standardized Residuals

In an average sample, 95% of standardized residuals should lie between ±2.
99% of standardized residuals should lie between ±2.5.

Outliers:
Any case for which the absolute value of the standardized residual is 3 or more is likely to be an outlier.
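These rules of thumb translate into a simple screening routine (a sketch; the residuals below are made up):

```python
def screen_residuals(z_residuals):
    """Fractions of standardized residuals beyond +/-2 and +/-2.5, plus the
    indices of cases at or beyond +/-3 (likely outliers)."""
    n = len(z_residuals)
    beyond_2 = sum(abs(z) > 2 for z in z_residuals) / n
    beyond_2_5 = sum(abs(z) > 2.5 for z in z_residuals) / n
    outliers = [i for i, z in enumerate(z_residuals) if abs(z) >= 3]
    return beyond_2, beyond_2_5, outliers

print(screen_residuals([0.1, -1.2, 2.2, -3.4]))  # (0.5, 0.25, [3])
```

In a well-behaved sample, the first fraction should be near 0.05, the second near 0.01, and the outlier list short.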
Slide 41
Cook's Distance

Measures the influence of a single case on the model as a whole.
Weisberg (1982): absolute values greater than 1 may be cause for concern.
Slide 42
Generalization
When we run a regression, we hope to be able to generalize the sample model to the entire population.
To do this, several assumptions must be met.
Violating these assumptions stops us generalizing conclusions to our target population.
Slide 43
Straightforward Assumptions
Variable type:
The outcome must be continuous; predictors can be continuous or dichotomous.

Non-zero variance:
Predictors must not have zero variance.

Linearity:
The relationship we model is, in reality, linear.

Independence:
All values of the outcome should come from a different person.
Slide 44
The More Tricky Assumptions

No multicollinearity:
Predictors must not be highly correlated.

Homoscedasticity:
For each value of the predictors, the variance of the error term should be constant.

Independent errors:
For any pair of observations, the error terms should be uncorrelated.

Normally distributed errors.
Tolerance should be more than 0.2 (Menard, 1995).
VIF should be less than 10 (Myers, 1990).
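Tolerance and VIF are two views of the same quantity: regress each predictor on all the other predictors, take that R², and then (a sketch with a hypothetical R²):

```python
def tolerance_and_vif(r2_j):
    """Collinearity diagnostics for predictor j, where r2_j is the R^2 from
    regressing predictor j on all the other predictors."""
    tolerance = 1 - r2_j  # < 0.2 is worrying (Menard, 1995)
    vif = 1 / tolerance   # > 10 is worrying (Myers, 1990)
    return tolerance, vif

tol, vif = tolerance_and_vif(0.9)  # tolerance 0.1, VIF 10: both past the cut-offs
```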
Regression Plots
Homoscedasticity: ZRESID vs. ZPRED

[Figure: two scatterplots of standardized residuals against standardized predicted values, labelled Good and Bad]
Normality of Errors: Histograms
[Figure: two histograms of residuals, labelled Good and Bad]
Normality of Errors: Normal Probability Plot

[Figure: Normal P-P plot of regression standardized residual (dependent variable: Outcome); expected cumulative probability plotted against observed cumulative probability, both axes from 0.00 to 1.00]