1
Psych 5510/6510
Chapter 10. Interactions and Polynomial Regression: Models with Products of Continuous
Predictors
Spring, 2009
2
Broadening the Scope
So far we have been limiting our models by ignoring the possibility that the predictor variables might interact, and by using only straight lines for our regression (i.e. ‘linear’ regression). This chapter provides an approach that allows us to add both the interaction of variables and nonlinear regression to our models.
3
Our ‘Running’ Example
Throughout this chapter we will be working with the following example:
Y is the time (in minutes) taken to run a 5-kilometer race.
X1 is the age of the runner
X2 is how many miles per week the runner ran when in training for the race.
4
‘On Your Mark’
We will begin by taking another perspective of what we have been doing so far in the text, and then use that perspective to understand interactions and nonlinear regression.
5
Time and Age
The analysis of the data leads to the following 'simple' relationship between Time (Y) and Age (X1).
MODEL C: Ŷi=β0
MODEL A: Ŷi=β0+β1X1i
Ŷi=15.104 + .213X1i
PRE=.218. F*=21.7, p<.01
6
Simple Relationship between Time and Age
7
Time and Miles
The simple relationship between Time (Y) and Miles of Training (X2).
MODEL C: Ŷi=β0
MODEL A: Ŷi=β0+β2X2i
Ŷi=31.91 - .280X2i
PRE=.535. F*=89.6, p<.01
8
Simple Relationship between Race Time and Miles of Training
9
Both Predictors
Now regress Y on both Age (X1) and Miles of Training (X2).
MODEL C: Ŷi=β0
MODEL A: Ŷi=β0 +β1X1i +β2X2i
Ŷi=24.716 + 1.65X1i - .258X2i
PRE=.662. F*=75.55, p<.01
10
‘Get Set’
Now we will develop another way to think about multiple regression, one that re-expresses multiple regression in the form of a simple regression.
We will start with the Age (X1).
The simple regression of Y on X1 has this form:
Ŷi=(intercept) + (slope)X1i
11
The multiple regression model is: Ŷi=24.716 + 1.65X1i - .258X2i
We can make the multiple regression model fit the simple regression form:
Ŷi= (intercept) + (slope)X1i
Ŷi= (24.716 - .258X2i) + (1.65)X1i
When X2=10, then Ŷi= (22.136) + (1.65)X1i
When X2=30, then Ŷi= (16.976) + (1.65)X1i
From this it is clear that the value of X2 can be thought of as changing the intercept of the simple regression of Y on X1, without changing its slope.
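This re-expression is easy to check numerically. A minimal sketch in Python, using the coefficients estimated above (the helper names are mine):

```python
# Additive model from the text: Y-hat = 24.716 + 1.65*X1 - .258*X2
B0, B1, B2 = 24.716, 1.65, -0.258

def simple_intercept(x2):
    """Intercept of the simple regression of Y on X1 at a given X2."""
    return B0 + B2 * x2

def simple_slope(x2):
    """Slope of Y on X1; in the additive model it ignores X2 entirely."""
    return B1

print(simple_intercept(10))  # ≈ 22.136
print(simple_intercept(30))  # ≈ 16.976
print(simple_slope(10) == simple_slope(30))  # True: additive, slope never changes
```
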
12
The simple relationship of Time (Y) and Age (X1) at various levels of Training Miles (X2)
13
Of course we can also work in the other direction, and rearrange the multiple regression formula to examine the simple regression of Time (Y) on Miles of Training (X2).
14
The multiple regression model is: Ŷi=24.716 + 1.65X1i - .258X2i
We can make the multiple regression model fit the simple regression form:
Ŷi= (intercept) + (slope)X2i
Ŷi= (24.716 +1.65 X1i) + (-.258)X2i
When X1=20, then Ŷi= (57.716) + (-.258)X2i
When X1=60, then Ŷi= (123.72) + (-.258)X2i
From this it is clear that the value of X1 can be thought of as changing the intercept of the simple regression of Y on X2, without changing its slope.
15
The simple relationship of Time (Y) and Training Miles (X2) at various levels of Age (X1)
16
Additive Model
When we look at these simplified models it is clear that the effect of one variable gets added to the effect of the other, moving the line up or down the Y axis but not changing the slope.
This is known as the ‘additive model’.
17
Interactions Between Predictor Variables
Let’s take a look at a non-additive model. In this case, we raise the possibility that the relationship between age (X1) and time (Y) may differ across levels of the other predictor variable miles of training (X2). To say that the relationship between X1 and Y may differ across levels of X2 is to say that the slope of the regression line of Y on X1 may differ across levels of X2.
18
The slope of the relationship between age and time is less for runners who trained a lot than for those who trained less.
Non-Additive Relationship Between X1 and X2
19
Interaction=Non-Additive
Predictor variables interact when the value of one variable influences the relationship (i.e. slope) between the other predictor variable and Y.
20
Interaction & Redundancy
Whether or not there is an interaction between two variables in predicting a third is an issue that is totally independent of whether or not the two predictor variables are redundant with each other. Expunge from your mind any connection between these two issues (if it was there in the first place).
21
Adding Interaction to the Model
To add an interaction between variables to the model, simply add a new variable that is the product of the other two (i.e. create a new variable whose values are the score on X1 times the score on X2), then do a linear regression on that new model:
Ŷi=β0 +β1X1i +β2X2i +β3(X1iX2i)
Ŷi=19.20 +.302X1i +(-.076)X2i +(-.005)(X1iX2i)
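Computationally, adding the interaction really is just one extra column in the design matrix. A sketch with NumPy, generating noise-free data from the coefficients above so the fit recovers them exactly (the runner values are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.uniform(20, 60, 40)   # ages (hypothetical)
x2 = rng.uniform(10, 50, 40)   # training miles (hypothetical)

# Generate Y exactly from the model in the text (no error, so we recover the betas)
y = 19.20 + 0.302 * x1 - 0.076 * x2 - 0.005 * (x1 * x2)

# Design matrix: intercept, X1, X2, and the new product column X1*X2
X = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(b, 3))  # recovers 19.2, 0.302, -0.076, -0.005
```
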
22
Testing Significance of the Interaction
Test significance as you always do using the model comparison approach.
First, to test the overall model that includes the interaction term:
Model C: Ŷi=β0
Model A: Ŷi=β0 +β1X1i +β2X2i +β3(X1iX2i)
H0: β1 = β2 = β3 = 0
HA: at least one of those betas is not zero.
23
Testing Significance of the Interaction
Second, to test whether adding the interaction term is worthwhile compared to a purely additive model:
Model C: Ŷi= β0 +β1X1i +β2X2i
Model A: Ŷi= β0 +β1X1i +β2X2i +β3(X1iX2i)
H0: β3=0
HA: β3 ≠ 0
The test of the partial regression coefficient gives you: PRE=.055, PC=3, PA=4, F*=4.4, p=.039
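That F* follows directly from PRE and the parameter counts via the model-comparison formula. A sketch (the slide does not give n; n = 80 is an assumption chosen because it reproduces F* = 4.4):

```python
def f_star(pre, pc, pa, n):
    """Model-comparison F: (PRE / (PA - PC)) / ((1 - PRE) / (n - PA))."""
    return (pre / (pa - pc)) / ((1 - pre) / (n - pa))

# PRE=.055, PC=3, PA=4 from the slide; n=80 is an assumed sample size
print(round(f_star(0.055, 3, 4, 80), 1))  # 4.4
```
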
24
Understanding the Interaction of Predictor Variables
To develop an understanding of the interaction of predictor variables, we will once again take the full model:
Ŷi=β0 +β1X1i +β2X2i +β3(X1iX2i)
And translate it into the form of the simple relationship of one predictor variable (X1) and Y:
Ŷi=(intercept) + (slope)X1i
25
'Go'
Full model:
Ŷi=β0 +β1X1i +β2X2i +β3(X1iX2i) =
Ŷi=β0 +β2X2i +β1X1i +β3(X1iX2i) =
Ŷi=β0 +β2X2i +(β1 +β3X2i ) X1i
Simple relationship of Y (time) and X1 (age):
Ŷi= (intercept) + (slope)X1i
Ŷi= (β0 +β2X2i) + (β1 +β3X2i )X1i
26
Simple Relationship of Y (Time) and X1 (Age)
Ŷi= (intercept) + (slope)X1i
Ŷi= (β0 +β2X2i) + (β1 +β3X2i )X1i
It is clear in examining the relationship between X1 and Y, that the value of X2 influences both the intercept and the slope of that relationship.
27
Simple Relationship of Time and Age (cont.)
Ŷi= (intercept) + (slope)X1i
Ŷi= (β0 +β2X2i) + (β1 +β3X2i )X1i
b0=19.20 b1=.302 b2=-.076 b3=-.005
Ŷi=(19.20 - .076X2i) + (.302 - .005X2i)X1i
When X2 (i.e. miles) = 10, then Ŷi = 18.44 + .252X1i
When X2 (i.e. miles) = 50, then Ŷi = 15.4 + .052X1i
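These simple intercepts and slopes can be generated for any level of X2 with a small helper (coefficients from the slide; the function name is mine):

```python
b0, b1, b2, b3 = 19.20, 0.302, -0.076, -0.005

def simple_regression_on_x1(x2):
    """Return (intercept, slope) of Y on X1 at a given value of X2."""
    return b0 + b2 * x2, b1 + b3 * x2

print(simple_regression_on_x1(10))  # ≈ (18.44, 0.252)
print(simple_regression_on_x1(50))  # ≈ (15.40, 0.052)
```
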
28
Interactive Model
29
Simple Relationship of Y (Time) and X2 (Miles)
Full model: Ŷi=β0 +β1X1i +β2X2i +β3(X1iX2i) =
Ŷi= β0 +β1X1i +(β2 +β3X1i ) X2i
Simple relationship of Y (time) and X2 (miles):
Ŷi=(intercept) + (slope)X2i
Ŷi=(β0 +β1X1i) + (β2 +β3X1i )X2i
30
Simple Relationship of Time and Miles (cont.)
Ŷi= (intercept) + (slope)X2i
Ŷi= (β0 +β1X1i) + (β2 +β3X1i )X2i
b0=19.20 b1=.302 b2=-.076 b3=-.005
Ŷi=(19.20 + .302X1i) + (-.076 - .005X1i)X2i
When X1 (i.e. age) = 60, then Ŷi = 37.32 - .376X2i
When X1 (i.e. age) = 20, then Ŷi = 25.24 - .176X2i
31
Interactive Model
32
Back to the Analysis
We've already looked at how you test to see if it is worthwhile to move from the additive model to the interactive model:
Model C: Ŷi= β0 +β1X1i +β2X2i
Model A: Ŷi=β0 +β1X1i +β2X2i +β3(X1iX2i)
H0: β3=0
HA: β3 ≠ 0
The next topic involves the interpretation of the partial regression coefficients.
33
Interpreting Partial Regression Coefficients
Ŷi= β0 +β1X1i +β2X2i
Additive model: we've covered this in previous chapters. The values of β1 and β2 are the slopes of the regression of Y on that variable when the other variable is held constant (i.e. the slope across values of the other variable). Look back at the scatterplots for the additive model: β1 is the slope of the relationship between Y and X1 across various values of X2; note that the slope doesn't change.
34
Interpreting Partial Regression Coefficients
Ŷi=β0 +β1X1i +β2X2i +β3(X1iX2i)
Interactive model: when X1 and X2 interact, the slope of the relationship between Y and X1 changes across values of X2, so what does β1 reflect?
Answer: β1 is the slope of the relationship between Y and X1 when X2=0. Note: the slope will be different for other values of X2.
Likewise: β2 is the slope of the relationship between Y and X2 when X1=0.
35
Interpreting β1 and β2 (cont.)
So, β1 is the slope of the regression of Y on X1 when X2=0, or in other words, the slope of the regression of Time on Age for runners who trained 0 miles per week (even though none of our runners trained that little).
β2 is the slope of the regression of Y on X2 when X1=0, or in other words, the slope of the regression of Time on Miles for runners who are 0 years old!
This is not what we are interested in!
36
Better Alternative
A better alternative, when scores of zero on our predictor variables are not of interest, is to use mean deviation scores instead (this is called 'centering' our data):
X'1i = X1i - X̄1   X'2i = X2i - X̄2
Then regress Y on X'1 and X'2:
Ŷi=β0 +β1X'1i +β2X'2i +β3(X'1iX'2i)
37
Interpreting β1 and β2 Now
So, β1 is still the slope of the regression of Y on X1 when X'2=0, but now X'2=0 when X2 equals the mean of X2, which is much more relevant: we now have the relationship between Time and Age for runners who trained an average amount.
β2 is the slope of the regression of Y on X2 when X’1=0, but now X’1=0 when X1=the mean of X1, i.e., we now have the relationship between Time and Miles for runners who were at the average age of our sample.
38
Interpreting β0
For the model:Ŷi=β0 +β1X1i +β2X2i +β3(X1iX2i)
β0 is the value of Y when all the predictor scores equal zero (rarely of interest)
For the model:Ŷi=β0 +β1X’1i +β2X’2i +β3(X’1iX’2i)
β0 = μY (due to the use of mean deviation scores) and the confidence interval for β0 is thus the confidence interval for μY
39
Interpreting β3
Ŷi=β0 +β1X1i +β2X2i +β3(X1iX2i)
β3 represents how much the slope changes in one variable as the other variable changes by 1. It is not influenced by whether you use X1 or X'1, or X2 or X'2. So β3 would be the same in both of the following models:
Ŷi=β0 +β1X1i +β2X2i +β3(X1iX2i) Ŷi=β0 +β1X’1i +β2X’2i +β3(X’1iX’2i)
But the values of β0 , β1 and β2 would be different in the two models.
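The invariance of β3 under centering can be verified by fitting the model both ways and comparing coefficients. A sketch on invented data (any data set would show the same thing):

```python
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.uniform(20, 60, 50)   # ages (hypothetical)
x2 = rng.uniform(10, 50, 50)   # miles (hypothetical)
y = 19.20 + 0.302*x1 - 0.076*x2 - 0.005*x1*x2 + rng.normal(0, 0.1, 50)

def fit(a, b):
    """OLS fit of y on [1, a, b, a*b]; returns the four b coefficients."""
    X = np.column_stack([np.ones_like(a), a, b, a * b])
    return np.linalg.lstsq(X, y, rcond=None)[0]

b_raw = fit(x1, x2)                             # raw scores
b_cen = fit(x1 - x1.mean(), x2 - x2.mean())     # mean deviation scores

# b0, b1, b2 change with centering, but the interaction coefficient does not
print(np.isclose(b_raw[3], b_cen[3]))  # True
```
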
40
Interpreting β3 (cont.)
Important note: β3 represents the interaction of X1 and X2 only when both of those variables are included by themselves in the model.
For example, in the following model β3 would not represent the interaction of X1 and X2 because β2X2i is not included in the model:
Ŷi=β0 +β1X1i +β3(X1iX2i)
41
Other Transformations
As we have seen, using X'=(X - mean of X) allows us to have meaningful β's, as the partial regression coefficient is the simple relationship of the corresponding variable when the other variable equals its mean.
We can use other transformations. X1i”=(X1i-50) allows us to look at the simple relationship between miles (X2) and time (Y) when age (X1)=50.
42
Regular model: Ŷi=β0 +β1X1i +β2X2i +β3(X1iX2i)
Model with transformed X1i: Ŷi=β0 +β1X"1i +β2X2i +β3(X"1iX2i)
Transforming the X1i score to X"1i will:
1. Affect the value of β2 (as it now gives the slope for the relationship between X2 and Y when X1=50).
2. Not affect β1 (the slope of the relationship between X1 and Y when X2=0).
3. Not affect β3 (the slope of the interaction term is not affected by transformations of its components as long as all components are included in the model).
43
Power Considerations
The confidence interval formula is the same for all partial regression coefficients, whether of interactive terms or not:
CI for βp:  bp ± √[ Fcrit · MSE / ( Σ(Xpi - X̄p)² (1 - R²p.12...) ) ]
44
Power Considerations
Smaller confidence intervals mean more power:
1. Smaller MSE (i.e. error in the model) means more power.
2. Larger tolerance (1 - R²) means more power.
CI for βp:  bp ± √[ Fcrit · MSE / ( Σ(Xpi - X̄p)² (1 - R²p.12...) ) ]
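Both power claims can be checked against the half-width of that confidence interval. A sketch with invented values for Fcrit, MSE, the sum of squares, and tolerance:

```python
import math

def ci_half_width(f_crit, mse, ss_x, tolerance):
    """Half-width of the CI for a partial regression coefficient:
    sqrt(Fcrit * MSE / (SS_X * tolerance)), where tolerance = 1 - R^2."""
    return math.sqrt(f_crit * mse / (ss_x * tolerance))

base = ci_half_width(f_crit=4.0, mse=10.0, ss_x=500.0, tolerance=0.8)
print(base < ci_half_width(4.0, 20.0, 500.0, 0.8))  # True: smaller MSE, narrower CI
print(base < ci_half_width(4.0, 10.0, 500.0, 0.4))  # True: larger tolerance, narrower CI
```
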
45
Power, Transformations, and Redundancy
If you use transformed scores (e.g. mean deviations) then it can affect the redundancy of the interaction term with its component terms (which should then affect the confidence intervals and thus affect power) but this change in redundancy is completely counterbalanced by changes in MSE. Thus using transformed scores will not affect the confidence intervals or power. So...
46
The Point Being...
If your stat package won't let you include an interaction term because it is too redundant with its component terms (i.e. its tolerance is too low), then you can try using mean deviation component terms (which will change the redundancy of the interaction term with its components without altering the confidence interval of the interaction term).
47
Polynomial (Non-linear) Regressions
What we have learned about how to examine the interaction of variables also provides exactly what we need to see if there might be non-linear relationships between the predictor variables and the criterion variable (Y).
48
Polynomial (Non-linear) Regressions
Let’s say we suspect that the relationship between Time and Miles is not the same across all levels of Miles. In other words, adding 5 more miles per week of training when you are currently at 10 miles per week, will have a different effect than adding 5 more miles when you are currently training at 50 miles per week.
To say that Miles+5 has a different effect when Miles=10 than when Miles=50 is to say that the slope is different at 10 than at 50.
49
X2 Interacting With Itself
In essence we are saying that X2 is interacting with itself.
Previous model:Ŷi=β0 +β1X1i +β2X2i +β3(X1iX2i)
This model (ignore X1 and use X2 twice)
Ŷi=β0 +β1X2i +β2X2i +β3(X2iX2i)
50
Interaction Model
Ŷi=β0 +β1X2i +β2X2i +β3(X2iX2i), or,
Ŷi=β0 +β1X2i +β2X2i +β3(X2i²)
However, we cannot calculate the b’s because the variables that go with β1 and β2 are completely redundant (they are the same variable, thus tolerance =0), so we drop one of them (which makes conceptual sense in terms of model building), and get:
Ŷi=β0 +β1X2i +β2(X2i²)
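The tolerance-of-zero problem is visible directly in the design matrix: entering the same variable twice makes the matrix rank deficient, while replacing the duplicate with X² restores full rank. A sketch (the miles values are invented):

```python
import numpy as np

x2 = np.arange(10.0, 51.0)  # hypothetical training miles, 10..50
ones = np.ones_like(x2)

bad = np.column_stack([ones, x2, x2])        # X2 entered twice: completely redundant
good = np.column_stack([ones, x2, x2 ** 2])  # linear term plus quadratic term

print(np.linalg.matrix_rank(bad))   # 2: one column adds nothing, b's not estimable
print(np.linalg.matrix_rank(good))  # 3: full rank, b's can be estimated
```
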
51
In Terms of Simple Relationship
Now let’s once again organize this into the simple relationship between Y and X2 so we can see how it works.
Model: Ŷi= β0 +β1X2i+β2 X2i²
Ŷi= (intercept) + (slope)X2i
Ŷi= (β0 –β2X2i²) + (β1 +2β2X2i )X2i
Where did those terms for the intercept and slope come from? I'll show you later; for now, take my word for it.
52
Simple Relationship (cont.)
Ŷi= (intercept) + (slope)X2i
Ŷi= (β0 –β2X2i²) + (β1 +2β2X2i )X2i
Note that the value of X2 influences both the intercept and the slope of its simple relationship with Y. Thus the relationship (i.e. the slope) between X2 and Y changes across values of the predictor variable.
53
Ŷi=b0 +b1X2i +b2(X2i²)
b0=37.47 b1=-.753 b2=.008
Ŷi= 37.47 - .753X2i + .008(X2i²)
Ŷi= (intercept) + (slope)X2i
Ŷi= (b0 –b2X2i²) + (b1 +2b2X2i )X2i
Ŷi= (37.47–.008X2i²) + (-.753+2(.008)X2i )X2i
When X2=0, Ŷi=37.47 + (-.753)X2i
When X2=20, Ŷi=34.22 + (-.433)X2i
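The slope at any particular value of Miles comes straight from slope = b1 + 2·b2·X2. A sketch with the slide's coefficients (the function name is mine):

```python
b1, b2 = -0.753, 0.008  # from the fitted quadratic model on the slide

def tangent_slope(x2):
    """Slope of the quadratic regression curve at a given value of X2."""
    return b1 + 2 * b2 * x2

print(round(tangent_slope(0), 3))   # -0.753
print(round(tangent_slope(20), 3))  # -0.433
```
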
54
Nonlinear Relationship
The relationship between Time and Miles at any particular value of Miles is the line that is tangent to the curve at that point.
55
Nonlinear Relationship
More importantly: the above line is the regression line we are fitting to the data with the squared term in the Model.
56
Interpreting β0
Model: Ŷi= β0 +β1X2i+β2 X2i²
β0 is the predicted value of Y when X2 = 0. In other words, it is the predicted time for a runner who trained zero miles per week. If we use mean deviation scores then this would be the predicted time for a runner who trained an average number of miles per week.
57
Interpreting β1
Model: Ŷi= β0 +β1X2i+β2 X2i²
β1 is the slope of the relationship between Y and X2 when X2 = 0. The slope will be different at other values of X2.
58
Interpreting β2
Model: Ŷi= β0 +β1X2i+β2 X2i²
β2 times 2 is how much the slope of the relationship between Y and X2 changes when X2 increases by 1.
Why times 2? Ŷi=(intercept) + (slope)X2i
Ŷi=(β0 - β2X2i²) + (β1 + 2β2X2i)X2i
When X2 changes by 1, the slope changes by 2 times β2. Another way of saying it is that β2 is half of how much the slope changes.
59
Interpreting β2 (cont.)
This interpretation of the coefficient for a quadratic (or higher) term only applies if all of its component terms are included in the model.
Ŷi= β0 +β1X2i+β2 X2i²
The interpretation of β2 depends upon β1 being there.
Ŷi= β0 +β1X2i +β2X2i² +β3X2i³
The interpretation of β3 depends upon β1 and β2 being there.
60
Testing Significance of the Quadratic (i.e. X²) Term
Test significance as you always do using the model comparison approach.
To test the overall model that includes the quadratic term:
Model C: Ŷi=β0
Model A: Ŷi=β0 +β1X2i +β2(X2i²)
H0: β1 = β2 = 0
HA: at least one of those betas is not zero.
61
Testing Significance of the Quadratic (i.e. X²) Term
To test whether adding the quadratic term is worthwhile compared to a linear model:
Model C: Ŷi= β0 +β1X2i
Model A: Ŷi=β0 +β1X2i +β2(X2i²)
The test of the partial regression coefficient does this for you.
62
What About the Linear Term (i.e. X)?
Model: Ŷi=β0 +β1X2i +β2(X2i²)
The t tests for the regression coefficients will tell you whether each β is significantly different from 0. What if, in the example above, β2 is significant but β1 is not? Should you drop β1X2i from your model and keep β2(X2i²)? No: the components of X2² (in this case just X2) give the analysis of X2² its meaning. If the model included X³ we would need to include X² and X in the model for the analysis of X³ to have meaning, and so on.
63
Why?
Our goal is to move forward a step at a time in the complexity of the model. We start with what can be explained linearly, then see how much can be explained above and beyond that by including a quadratic term (i.e. the partial correlation of adding the quadratic term last to a model that contains the linear term). We lose that meaning of the quadratic partial correlation if the linear term is dropped from the model.
Also note that the correlation between two powers of a variable (e.g. X and X²) tends to be very high, meaning that they are quite redundant, and it is not surprising that the linear term might be non-significant when the quadratic term is in the model.
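The redundancy between a variable and its square, and how centering reduces it, is easy to demonstrate (the miles values are invented):

```python
import numpy as np

x = np.arange(10.0, 51.0)   # hypothetical training miles, 10..50
xc = x - x.mean()           # centered (mean deviation) scores

r_raw = np.corrcoef(x, x ** 2)[0, 1]
r_cen = np.corrcoef(xc, xc ** 2)[0, 1]

print(round(r_raw, 3))  # close to 1: X and X² are highly redundant
print(round(r_cen, 3))  # near 0 for these symmetric centered values
```
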
64
Mean Deviation Scores
If mean deviation scores are used (i.e. X') then:
Ŷi=β0 +β1X’2i +β2(X’2i²)
1. The coefficient for X (i.e. β1) is the slope of the simple relationship between X and Y when X equals its mean.
2. The coefficient for the quadratic term (i.e. β2) is not affected (as long as all of its components are included in the model).
65
General Approach for Arriving at 'Simple' Relationships
Being able to turn a complicated model into the simple relationship between Y and the various predictors can be a big aid in understanding how the model works.
In other words, we turn Ŷi = β0 + β1X1i + ... + βpXpi into:
Ŷi=(intercept)+(slope)X1i
Ŷi=(intercept)+(slope)X2i
etc.
66
General Approach
We need to find what is called the 'partial derivative' of the model for the particular variable whose simple relationship with Y we would like to examine. We will symbolize the partial derivative as: Modelpd
67
General Approach
Then to create the simple relationship of Ŷi=(intercept)+(slope)X, where:
1. Intercept = Model - (Modelpd)(X)
2. Slope = Modelpd
I would say stop there, but if you must know...combining the simple formula and the two pieces given above...
Ŷi=(Model – (Modelpd)(X))+(Modelpd)X, which while correct just looks confusing and there is no reason to go there.
68
Example
Model we will be working with: Ŷi=β0 +β1X1i +β2X2i +β3(X1i²)
We want to know the simple relationship between X1 and Y.
To make the notation simple, we will call the predictor variable of interest X, and the other predictor variables other letters (in this case we will use Z to stand for X2).
Ŷi=β0 +β1X +β2Z +β3(X²)
69
Rules for Arriving at the Partial Derivative
1. To find the partial derivative of items that are summed together, find the partial derivative of each item and add those together.
2. The partial derivative of aXᵐ is amXᵐ⁻¹. Note that:
a. X¹ = X, so the partial derivative of the term 3X² would be (3)(2)(X¹) = 6X
b. X⁰ = 1, so the partial derivative of the term 2X would be (2)(1)(X⁰) = (2)(1)(1) = 2
3. The partial derivative of any term that doesn’t contain X is 0.
70
Solution for our Example
Model: β0 + β1X + β2Z + β3X²
Modelpd: 0 + (1)(β1)(X⁰) + 0 + (2)(β3)(X¹)
Modelpd: β1 + 2β3X
intercept=Model – (Modelpd)X= β0 + β1X +β2Z + β3X² – (β1 + 2β3X)X
= β0 + β1X +β2Z + β3X² - (β1X + 2β3X²)
= β0 + β1X +β2Z + β3X² - β1X - 2β3X²
= β0 +β2Z - β3X²
slope= Modelpd=β1 + 2β3X
Ŷi=(intercept) + (slope)X
Ŷi=(β0 +β2Z - β3X²) + (β1 + 2β3X)X
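If you don't trust the calculus, the slope term Modelpd = β1 + 2β3X can be sanity-checked with a finite difference, using the example estimates b0=3, b1=2.5, b2=6.1, b3=7:

```python
b0, b1, b2, b3 = 3.0, 2.5, 6.1, 7.0

def model(x, z):
    """The full model: b0 + b1*X + b2*Z + b3*X^2."""
    return b0 + b1 * x + b2 * z + b3 * x ** 2

def model_pd(x):
    """The partial derivative with respect to X: b1 + 2*b3*X."""
    return b1 + 2 * b3 * x

# Central difference approximates the partial derivative at X=2 (Z held at 5)
h = 1e-6
numeric = (model(2 + h, 5) - model(2 - h, 5)) / (2 * h)
print(round(numeric, 3), model_pd(2))  # both 30.5
```
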
71
Interpretation
Ŷi=(β0 +β2Z - β3X²) + (β1 + 2β3X)X
So what does this tell us? It tells us that, at any particular value of X, the relationship between X and Y (i.e. the slope) is affected by the value of X. And the intercept (which moves the regression line up or down on the Y axis) is influenced by both Z and X. This may not seem all that important, but in some complex models it might lead to a better understanding of the relationship between X and Y (to see what role the other variables, and X itself, play in that relationship).
72
Interpretation
Ŷi=(β0 +β2Z - β3X²) + (β1 + 2β3X)X
We can also plug in specific values for X and Z, and SPSS’s estimates of the β’s, to see the relationship between Y and X at that point. For example, if SPSS computes b0=3, b1=2.5, b2=6.1, b3=7, and we want to know the relationship between Y and X when X=2 and Z=5, then we have:
Ŷi=(3 +6.1(5) – 7(4)) + (2.5 + 2(7)(2))X
Ŷi=5.5+30.5X
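The same plug-in calculation as a small function (the b values are the hypothetical SPSS estimates from the slide; the function name is mine):

```python
b0, b1, b2, b3 = 3.0, 2.5, 6.1, 7.0  # hypothetical SPSS estimates from the slide

def simple_relationship(x, z):
    """Return (intercept, slope) of Y on X at the point (X=x, Z=z)."""
    intercept = b0 + b2 * z - b3 * x ** 2
    slope = b1 + 2 * b3 * x
    return intercept, slope

print(simple_relationship(x=2, z=5))  # ≈ (5.5, 30.5)
```
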