Applications
The General Linear Model
Transformations
Transformations to Linearity
• Many non-linear curves can be put into a linear form by appropriate transformations of either
– the dependent variable Y, or
– some (or all) of the independent variables X1, X2, ..., Xp.
• This leads to the wide utility of the Linear Model.
• We have seen that through the use of dummy variables, categorical independent variables can be incorporated into a Linear Model.
• We will now see that through the technique of variable transformation, many examples of non-linear behaviour can also be converted to linear behaviour.
Intrinsically Linear (Linearizable) Curves

1. Hyperbolas
y = x/(ax − b)
Linear form: 1/y = a − b(1/x), i.e. Y = β0 + β1X
Transformations: Y = 1/y, X = 1/x, β0 = a, β1 = −b
[Figure: y = x/(ax − b) with vertical asymptote x = b/a and horizontal asymptote y = 1/a; positive curvature for b > 0, negative curvature for b < 0.]
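The hyperbola transformation can be checked numerically. A minimal sketch (not from the slides; the data, the parameter values a = 2, b = 3, and the small OLS helper are all illustrative): generate exact points from y = x/(ax − b), regress 1/y on 1/x, and recover a and b from the fitted intercept and slope.

```python
def fit_line(X, Y):
    """Ordinary least squares for Y = b0 + b1*X (one predictor)."""
    n = len(X)
    xbar, ybar = sum(X) / n, sum(Y) / n
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y)) / \
         sum((x - xbar) ** 2 for x in X)
    return ybar - b1 * xbar, b1

a, b = 2.0, 3.0                       # illustrative true values
xs = [2.0, 3.0, 4.0, 5.0, 10.0]
ys = [x / (a * x - b) for x in xs]

# Transform: 1/y = a - b*(1/x), so beta0 estimates a and beta1 estimates -b
X = [1 / x for x in xs]
Y = [1 / y for y in ys]
beta0, beta1 = fit_line(X, Y)
```

With noise-free data the transformation recovers a and b exactly; with real data the same regression gives least-squares estimates on the transformed scale.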
2. Exponential
y = αe^(βx) = αB^x
Linear form: ln y = ln α + βx, i.e. Y = β0 + β1X
Transformations: Y = ln y, X = x, β0 = ln α, β1 = β = ln B
[Figure: y = αB^x starts at α when x = 0; it grows when B > 1 and decays toward 0 when B < 1.]
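The same check works for the exponential curve. A sketch (illustrative values α = 5, β = 0.4, not from the slides): regress ln y on x, then undo the transformation to recover α and β.

```python
import math

def fit_line(X, Y):
    """Ordinary least squares for Y = b0 + b1*X (one predictor)."""
    n = len(X)
    xbar, ybar = sum(X) / n, sum(Y) / n
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y)) / \
         sum((x - xbar) ** 2 for x in X)
    return ybar - b1 * xbar, b1

alpha, beta = 5.0, 0.4                         # illustrative true values
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [alpha * math.exp(beta * x) for x in xs]

# ln y = ln(alpha) + beta*x, so exp(intercept) estimates alpha
b0, b1 = fit_line(xs, [math.log(y) for y in ys])
alpha_hat, beta_hat = math.exp(b0), b1
```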
3. Power Functions
y = a x^b
Linear form: ln y = ln a + b ln x, i.e. Y = β0 + β1X
[Figure: power functions for b > 0 (b > 1, b = 1, 0 < b < 1) and for b < 0 (b < −1, b = −1, −1 < b < 0).]
4. Logarithmic Functions
y = a + b ln x
Linear form: y = a + b ln x, i.e. Y = β0 + β1X
Transformations: Y = y, X = ln x, β0 = a, β1 = b
[Figure: increasing for b > 0, decreasing for b < 0.]
5. Other special functions
y = a e^(b/x)
Linear form: ln y = ln a + b(1/x), i.e. Y = β0 + β1X
Transformations: Y = ln y, X = 1/x, β0 = ln a, β1 = b
[Figure: shapes for b > 0 and b < 0.]
The Box-Cox Family of Transformations

transformed x = x^(λ) = (x^λ − 1)/λ if λ ≠ 0, and ln(x) if λ = 0
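A minimal sketch of the family as defined above (the function name is ours): the power form approaches ln(x) as λ approaches 0, which is why λ = 0 is defined to be the log.

```python
import math

def box_cox(x, lam):
    """Box-Cox transform: (x**lam - 1)/lam for lam != 0, ln(x) for lam == 0."""
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1) / lam

shifted = box_cox(10.0, 1)        # lam = 1 just re-centres: (10 - 1)/1 = 9
near_log = box_cox(10.0, 0.001)   # close to ln(10)
```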
The Transformation Staircase
[Figure: the transformations x^(λ) plotted together for λ from −4 to 4, forming a "staircase" of curves.]
Graph of ln(x)
[Figure: y = ln(x) for x from 0 to 180; the values range from 0 to about 6.]

The effect of the transformation x_new = ln(x)
[Figure: the same curve, showing how large x values are compressed.]
• The ln transformation is the member of the Box-Cox family of transformations with λ = 0.
• If you decrease the value of λ, the effect of the transformation will be greater.
• If you increase the value of λ, the effect of the transformation will be less.
The effect of the ln transformation
• It spreads out values that are close to zero.
• It compacts values that are large.
[Figure: x_new = ln(x) plotted against x from 0 to 60.]
The Bulging Rule
[Figure: the four bulge directions of a curved scatter, with the corresponding transformation moves — "x up" / "y up" when the curve bulges toward large x or y, "x down" / "y down" when it bulges toward small x or y.]
Non-Linear Models
Nonlinearizable models

Non-Linear Growth Models
• Many models cannot be transformed into a linear model.

The Mechanistic Growth Model
Equation: Y = α(1 − βe^(−kx)) + ε
or (ignoring ε) "rate of increase in Y" = dY/dx = k(α − Y)
The Logistic Growth Model
Equation: Y = α/(1 + βe^(−kx)) + ε
or (ignoring ε) "rate of increase in Y" = dY/dx = (k/α) Y(α − Y)
[Figure: logistic growth curves rising from 0 toward 1 over x from 0 to 10, for k = 1/4, 1/2, 1, 2, 4.]
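A small sketch of the logistic form above (the parameter values are illustrative; the plotted curves use α = 1 and vary k): Y starts at α/(1 + β) when x = 0 and climbs toward the ceiling α as x grows.

```python
import math

def logistic(x, alpha=1.0, beta=1.0, k=1.0):
    """Logistic growth curve Y = alpha / (1 + beta * exp(-k*x))."""
    return alpha / (1 + beta * math.exp(-k * x))

start = logistic(0.0)      # alpha/(1 + beta) = 0.5 with these values
late  = logistic(10.0)     # essentially at the ceiling alpha = 1
```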
The Gompertz Growth Model
Equation: Y = αe^(−βe^(−kx)) + ε
or (ignoring ε) "rate of increase in Y" = dY/dx = kY ln(α/Y)
[Figure: Gompertz growth curve rising from 0 toward 1 over x from 0 to 10, for k = 1.]
Polynomial Regression Models

Polynomial Models
y = β0 + β1x + β2x² + β3x³
Linear form: Y = β0 + β1X1 + β2X2 + β3X3
Variables: Y = y, X1 = x, X2 = x², X3 = x³
[Figure: a cubic polynomial plotted for x from 0 to 3.]
Suppose that we have two variables:
1. Y – the dependent variable (response variable)
2. X – the independent variable (explanatory variable, factor)

Assume that we have collected data on the two variables X and Y. Let
(x1, y1), (x2, y2), (x3, y3), …, (xn, yn)
denote the pairs of measurements on the two variables X and Y for n cases in a sample (or population).

The assumption will be made that y1, y2, y3, …, yn are:
1. Independent random variables.
2. Normally distributed.
3. Of common variance σ².
4. The mean of yi is μi = β0 + β1xi + β2xi² + β3xi³ + … + βkxi^k.

Each yi is assumed to be randomly generated from a normal distribution with mean μi = β0 + β1xi + β2xi² + … + βkxi^k and standard deviation σ.
[Figure: scatter plot of the data with a fitted polynomial curve; y from 0 to 160, x from 0 to 100.]

The Model
yi = β0 + β1xi + β2xi² + … + βkxi^k + εi,  i = 1, 2, …, n
The matrix formulation

y = Xβ + ε

where
y = [y1, y2, …, yn]′,  β = [β0, β1, β2, …, βk]′,  ε = [ε1, ε2, …, εn]′
and X is the n × (k + 1) design matrix whose i-th row is (1, xi, xi², xi³, …, xi^k).
The Normal Equations

(X′X) β̂ = X′y

Written out for the polynomial model, with β̂ = (β̂0, β̂1, …, β̂k)′ and all sums over i = 1, …, n:

n β̂0 + β̂1 Σxi + β̂2 Σxi² + … + β̂k Σxi^k = Σ yi
β̂0 Σxi + β̂1 Σxi² + … + β̂k Σxi^(k+1) = Σ xi yi
⋮
β̂0 Σxi^k + β̂1 Σxi^(k+1) + … + β̂k Σxi^(2k) = Σ xi^k yi
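The normal equations can be formed and solved directly. A sketch (not from the slides; the data are an exact quadratic with illustrative coefficients, and the solver is a plain Gaussian elimination so no libraries are needed):

```python
def solve(A, c):
    """Solve A b = c by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [ci] for row, ci in zip(A, c)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    b = [0.0] * n
    for r in range(n - 1, -1, -1):
        b[r] = (M[r][n] - sum(M[r][k] * b[k] for k in range(r + 1, n))) / M[r][r]
    return b

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0 + 2.0 * x - 0.5 * x * x for x in xs]     # exact quadratic, no noise

X = [[1.0, x, x * x] for x in xs]                  # design matrix rows (1, x, x^2)
XtX = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
Xty = [sum(r[i] * y for r, y in zip(X, ys)) for i in range(3)]
beta = solve(XtX, Xty)                             # recovers (1, 2, -0.5)
```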
Example
In the following example two quantities are being measured:
X = amount of an additive to a chemical process
Y = the yield of the process

X    Y      |  X    Y     |  X     Y
4    35.0   |  44   93.9  |  84    65.4
8    70.1   |  48   88.1  |  88    98.0
12   61.4   |  52   88.4  |  92   104.3
16  106.9   |  56   76.6  |  96   128.0
20   93.2   |  60   76.0  |  100  150.8
24  100.1   |  64   68.1  |
28   90.8   |  68   75.0  |
32   99.4   |  72   81.2  |
36  103.2   |  76   68.5  |
40   86.2   |  80   78.1  |
Graph X vs Y
[Figure: scatter plot of yield Y (0–160) against additive amount X (0–100).]
The Model – Cubic polynomial (degree 3)
yi = β0 + β1xi + β2xi² + β3xi³ + εi,  i = 1, 2, …, n

Comment:
A cubic polynomial in x can be fitted to y by defining the variables
X1 = x, X2 = x², and X3 = x³
and then fitting the linear model y = β0 + β1X1 + β2X2 + β3X3 + ε.
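The variable definitions above amount to building one design-matrix row per observation. A sketch using the X values from the additive example (4, 8, …, 100):

```python
# One row (1, x, x^2, x^3) per observed x; the leading 1 carries the intercept.
xs = list(range(4, 101, 4))
rows = [[1, x, x ** 2, x ** 3] for x in xs]
```

Any ordinary least-squares routine applied to these rows fits the cubic as a linear model.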
Response Surface Models
Extending polynomial regression models to k independent variables.

Response Surface Models (2 independent variables)
Dependent variable Y and two independent variables x1 and x2. (These ideas are easily extended to more than two independent variables.)

The Model (a cubic response surface model):
Y = β0 + β1x1 + β2x2 + β3x1² + β4x1x2 + β5x2² + β6x1³ + β7x1²x2 + β8x1x2² + β9x2³ + ε

Compare with a linear model:
Y = β0 + β1x1 + β2x2 + ε
[Figure: 3-D surface plots of the cubic response surface and of the linear model over the region 0 ≤ x1 ≤ 5, 0 ≤ x2 ≤ 4.]
The response surface model
Y = β0 + β1x1 + β2x2 + β3x1² + β4x1x2 + β5x2² + β6x1³ + β7x1²x2 + β8x1x2² + β9x2³ + ε
can be put into the form of a linear model:
Y = β0 + β1X1 + β2X2 + β3X3 + β4X4 + β5X5 + β6X6 + β7X7 + β8X8 + β9X9 + ε
by defining
X1 = x1, X2 = x2, X3 = x1², X4 = x1x2, X5 = x2²,
X6 = x1³, X7 = x1²x2, X8 = x1x2², X9 = x2³.
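The nine derived predictors above can be generated mechanically for each observation. A sketch (the helper name is ours):

```python
def surface_row(x1, x2):
    """Predictors X1..X9 for the cubic response surface in (x1, x2)."""
    return [x1, x2,
            x1 ** 2, x1 * x2, x2 ** 2,
            x1 ** 3, x1 ** 2 * x2, x1 * x2 ** 2, x2 ** 3]

row = surface_row(2.0, 3.0)
```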
More generally, consider the random variable Y with
1. E[Y] = g(U1, U2, …, Uk)
= β1φ1(U1, U2, …, Uk) + β2φ2(U1, U2, …, Uk) + … + βpφp(U1, U2, …, Uk)
= Σ_{i=1}^{p} βi φi(U1, U2, …, Uk)
and
2. var(Y) = σ²
• where β1, β2, …, βp are unknown parameters
• and φ1, φ2, …, φp are known functions of the nonrandom variables U1, U2, …, Uk.
• Assume further that Y is normally distributed.
Now suppose that n independent observations of Y, (y1, y2, …, yn), are made corresponding to n sets of values of (U1, U2, …, Uk):
(u11, u12, …, u1k),
(u21, u22, …, u2k),
…
(un1, un2, …, unk).
Let xij = φj(ui1, ui2, …, uik), j = 1, 2, …, p; i = 1, 2, …, n. Then
E[yi] = Σ_{j=1}^{p} βj φj(ui1, ui2, …, uik) = Σ_{j=1}^{p} βj xij
or
E[y] = Xβ, where xij = φj(ui1, ui2, …, uik).
Polynomial Regression Model: One variable U.
φj(u) = u^(j−1), j = 1, 2, …, k + 1
so that
E[Y] = β1 + β2u + β3u² + … + β(k+1)u^k
Quadratic Response Surface Model: Two variables U1, U2.
φj(u1, u2): 1 (constant); u1, u2 (linear); u1², u1u2, u2² (quadratic)
so that
E[Y] = β1 + β2u1 + β3u2 + β4u1² + β5u1u2 + β6u2²
Trigonometric Polynomial Models

y = β0 + γ1cos(2πf1x) + δ1sin(2πf1x) + … + γkcos(2πfkx) + δksin(2πfkx)
Linear form: Y = β0 + γ1C1 + δ1S1 + … + γkCk + δkSk
Variables: Y = y, C1 = cos(2πf1x), S1 = sin(2πf1x), …, Ck = cos(2πfkx), Sk = sin(2πfkx)
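The C and S predictors above are again just derived columns. A sketch (the frequencies are illustrative):

```python
import math

def trig_row(x, freqs):
    """Predictors [C1, S1, C2, S2, ...] for frequencies f1, f2, ..."""
    row = []
    for f in freqs:
        row += [math.cos(2 * math.pi * f * x), math.sin(2 * math.pi * f * x)]
    return row

row = trig_row(0.25, [1, 2])   # cos/sin at pi/2 and at pi
```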
[Figure: a trigonometric polynomial plotted over 0 ≤ x ≤ 1; y ranges from −20 to 30.]
General set of models

Y = β0 p0(X) + β1 p1(X) + … + βk pk(X) + ε

The Normal Equations: given data (x1, y1), (x2, y2), …, (xn, yn), with all sums over i = 1, …, n:

[ Σ p0(xi)²        Σ p0(xi)p1(xi)   …   Σ p0(xi)pk(xi) ] [β̂0]   [ Σ p0(xi)yi ]
[ Σ p1(xi)p0(xi)   Σ p1(xi)²        …   Σ p1(xi)pk(xi) ] [β̂1] = [ Σ p1(xi)yi ]
[ ⋮                                                      ] [⋮ ]   [ ⋮          ]
[ Σ pk(xi)p0(xi)   Σ pk(xi)p1(xi)   …   Σ pk(xi)²       ] [β̂k]   [ Σ pk(xi)yi ]
Two important special cases

Polynomial models: p0(X) = 1, p1(X) = X, p2(X) = X², …, pk(X) = X^k

Trig-polynomial models: p0(X) = 1, p1(X) = sin(2πf1X), p2(X) = cos(2πf1X), …
Orthogonal Polynomial Models

Definition
Consider the values x0, x1, …, xn and the polynomials
p0(x) = α00
p1(x) = α10 + α11x
p2(x) = α20 + α21x + α22x²
⋮
pk(x) = αk0 + αk1x + … + αkk x^k
The polynomials are orthogonal relative to x0, x1, …, xn if
Σ_{j=0}^{n} pm(xj) pl(xj) = 0 for all m ≠ l.
If in addition
Σ_{j=0}^{n} pm(xj)² = 1 for all m,
they are called orthonormal.

Consider the model Y = β0 p0(X) + β1 p1(X) + … + βk pk(X) + ε.
This is equivalent to a polynomial model. Rather than the basis for this model being
1, X, X², X³, etc.,
the basis is
p0(X), p1(X), p2(X), p3(X), etc. — polynomials of degree 0, 1, 2, 3, etc.
The Normal Equations: given the data (x1, y1), (x2, y2), …, (xn, yn), the equations have the same form as before, but when the polynomials are orthonormal the coefficient matrix becomes the identity:

[ 1 0 0 … 0 ] [β̂0]   [ Σ p0(xi)yi ]
[ 0 1 0 … 0 ] [β̂1] = [ Σ p1(xi)yi ]
[ ⋮          ] [⋮ ]   [ ⋮          ]
[ 0 0 0 … 1 ] [β̂k]   [ Σ pk(xi)yi ]

with solution
β̂m = Σ_i pm(xi) yi,  m = 0, 1, …, k.
Derivation of Orthogonal Polynomials
(with equally spaced data points)

Suppose x0 = a, x1 = a + b, x2 = a + 2b, …, xn = a + nb.

Derivation of p0(x) = α00:
Σ_{j=0}^{n} p0(xj)² = (n + 1)α00² = 1, hence
p0(xj) = 1/√(n + 1) for all j.
Derivation of p1(x) = α10 + α11x:

1. Orthogonality with p0:
Σ_{j=0}^{n} p0(xj) p1(xj) = (1/√(n + 1)) Σ_{j=0}^{n} (α10 + α11(a + jb)) = 0,
or (n + 1)α10 + α11 Σ_{j=0}^{n} (a + jb) = 0.
Since Σ_{j=0}^{n} j = n(n + 1)/2, this gives α10 = −α11(a + bn/2), and thus
p1(xj) = α11 b (j − n/2) = K1 (j − n/2), where K1 = α11 b.

2. To find K1 we use Σ_{j=0}^{n} p1(xj)² = 1:
K1² Σ_{j=0}^{n} (j − n/2)² = 1, i.e. K1 = 1/√( Σ_{j=0}^{n} (j − n/2)² ),
and hence
p1(xj) = (j − n/2) / √( Σ_{j=0}^{n} (j − n/2)² ).
Now we find p2(x) = α20 + α21x + α22x². The conditions are:

1. Σ_{j=0}^{n} p0(xj) p2(xj) = 0
2. Σ_{j=0}^{n} p1(xj) p2(xj) = 0
3. Σ_{j=0}^{n} p2(xj)² = 1

These three equations can be used to find α20, α21, α22. Substituting xj = a + jb and using
Σ_{j=0}^{n} j = n(n + 1)/2 and Σ_{j=0}^{n} j² = n(n + 1)(2n + 1)/6,
Equation 1 becomes
(n + 1)α20 + α21 Σ_{j=0}^{n} (a + jb) + α22 Σ_{j=0}^{n} (a + jb)² = 0,
and Equation 2 becomes
Σ_{j=0}^{n} (j − n/2)(α20 + α21(a + jb) + α22(a + jb)²) = 0.
Working through the algebra (the original slides carry this out term by term), the two conditions express α20 and α21 in terms of α22, and the result can be written as
p2(xj) = K2 (j² − nj + n(n − 1)/6), where K2 = α22 b².
(One can check directly that j² − nj + n(n − 1)/6 sums to zero over j = 0, …, n, and, being symmetric about j = n/2, is orthogonal to the antisymmetric j − n/2.)
We can now find K2 by using
3. Σ_{j=0}^{n} p2(xj)² = 1,
i.e. K2² Σ_{j=0}^{n} (j² − nj + n(n − 1)/6)² = 1, so that
K2 = 1/√( Σ_{j=0}^{n} (j² − nj + n(n − 1)/6)² ),
and thus
p2(xj) = (j² − nj + n(n − 1)/6) / √( Σ_{j=0}^{n} (j² − nj + n(n − 1)/6)² ).

In summary, writing j = (x − a)/b:
p0(x) = 1/√(n + 1)
p1(x) = (j − n/2) / √( Σ_{j=0}^{n} (j − n/2)² )
p2(x) = (j² − nj + n(n − 1)/6) / √( Σ_{j=0}^{n} (j² − nj + n(n − 1)/6)² )
We continue, to find p3(x) = α30 + α31x + α32x² + α33x³, using:
1. Σ_{j=0}^{n} p0(xj) p3(xj) = 0
2. Σ_{j=0}^{n} p1(xj) p3(xj) = 0
3. Σ_{j=0}^{n} p2(xj) p3(xj) = 0
4. Σ_{j=0}^{n} p3(xj)² = 1
These four equations can be used to find α30, α31, α32, α33.
The process is continued to find p4(x), p5(x), p6(x), etc.
With orthonormal polynomials, the normal equations have the solution
β̂i = Σ_j pi(xj) yj.

To do the calculations we need the values of pi(xj). These values depend only on:
1. n = the number of observations,
2. i = the degree of the polynomial, and
3. j = the index of xj.
Orthogonal Linear Contrasts for Polynomial Regression

k  Polynomial    1    2    3    4    5    6    7    Σci²
3  Linear       -1    0    1                          2
   Quadratic     1   -2    1                          6
4  Linear       -3   -1    1    3                    20
   Quadratic     1   -1   -1    1                     4
   Cubic        -1    3   -3    1                    20
5  Linear       -2   -1    0    1    2               10
   Quadratic     2   -1   -2   -1    2               14
   Cubic        -1    2    0   -2    1               10
   Quartic       1   -4    6   -4    1               70
6  Linear       -5   -3   -1    1    3    5          70
   Quadratic     5   -1   -4   -4   -1    5          84
   Cubic        -5    7    4   -4   -7    5         180
   Quartic       1   -3    2    2   -3    1          28
7  Linear       -3   -2   -1    0    1    2    3     28
   Quadratic     5    0   -3   -4   -3    0    5     84
   Cubic        -1    1    1    0   -1   -1    1      6
   Quartic       3   -7    1    6    1   -7    3    154
Orthogonal Linear Contrasts for Polynomial Regression

k   Polynomial     1     2     3     4     5     6     7     8     9    10    Σci²
8   Linear        -7    -5    -3    -1     1     3     5     7                 168
    Quadratic      7     1    -3    -5    -5    -3     1     7                 168
    Cubic         -7     5     7     3    -3    -7    -5     7                 264
    Quartic        7   -13    -3     9     9    -3   -13     7                 616
    Quintic       -7    23   -17   -15    15    17   -23     7                2184
9   Linear        -4    -3    -2    -1     0     1     2     3     4            60
    Quadratic     28     7    -8   -17   -20   -17    -8     7    28          2772
    Cubic        -14     7    13     9     0    -9   -13    -7    14           990
    Quartic       14   -21   -11     9    18     9   -11   -21    14          2002
    Quintic       -4    11    -4    -9     0     9     4   -11     4           468
10  Linear        -9    -7    -5    -3    -1     1     3     5     7     9     330
    Quadratic      6     2    -1    -3    -4    -4    -3    -1     2     6     132
    Cubic        -42    14    35    31    12   -12   -31   -35   -14    42    8580
    Quartic       18   -22   -17     3    18    18     3   -17   -22    18    2860
    Quintic       -6    14    -1   -11    -6     6    11     1   -14     6     780
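The tabulated contrasts can be checked mechanically: any two rows for the same k are orthogonal, and each row's sum of squares matches the Σci² column. A sketch using the k = 5 linear and quadratic rows:

```python
# Two rows from the k = 5 contrast table
linear    = [-2, -1, 0, 1, 2]
quadratic = [2, -1, -2, -1, 2]

dot = sum(l * q for l, q in zip(linear, quadratic))   # orthogonality: 0
ss_lin  = sum(c * c for c in linear)                  # tabulated 10
ss_quad = sum(c * c for c in quadratic)               # tabulated 14
```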
The Use of Dummy Variables
• In the examples so far the independent variables are continuous numerical variables.
• Suppose that some of the independent variables are categorical.
• Dummy variables are artificially defined variables designed to convert a model including categorical independent variables to the standard multiple regression model.
Example: Comparison of Slopes of k Regression Lines with Common Intercept
Situation:• k treatments or k populations are being compared.• For each of the k treatments we have measured
both – Y (the response variable) and – X (an independent variable)
• Y is assumed to be linearly related to X with – the slope dependent on treatment
(population), while – the intercept is the same for each treatment
The Model:
Y = β0 + β1^(i) X + ε  for treatment i (i = 1, 2, …, k)

Graphical Illustration of the above Model
[Figure: lines through a common intercept with different slopes, labelled Treat 1, Treat 2, Treat 3, …, Treat k. Common Intercept, Different Slopes.]
• This model can be artificially put into the form of the Multiple Regression model by the use of dummy variables to handle the categorical independent variable Treatments.
• Dummy variables are variables that are artificially defined
In this case we define a new variable for each category of the categorical variable.
That is, we will define Xi for each category of treatments as follows:
Xi = X if the subject receives treatment i, and 0 otherwise.
Then the model can be written as follows:

The Complete Model:
Y = β0 + β1^(1) X1 + β1^(2) X2 + … + β1^(k) Xk + ε
where
Xi = X if the subject receives treatment i, and 0 otherwise.

In this case:
Dependent Variable: Y
Independent Variables: X1, X2, ..., Xk

In the above situation we would likely be interested in testing the equality of the slopes, namely the Null Hypothesis
H0: β1^(1) = β1^(2) = … = β1^(k)  (q = k − 1)
The Reduced Model:
Y = β0 + β1X + ε
Dependent Variable: Y
Independent Variable: X = X1 + X2 + … + Xk
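The dummy construction above is mechanical. A sketch for k = 3 treatments (the helper name is ours): each observation contributes X to the column of its own treatment and 0 to the others.

```python
def slope_dummies(x, treatment, k=3):
    """X_i = x if the subject received treatment i, else 0 (i = 1..k)."""
    return [x if treatment == i else 0.0 for i in range(1, k + 1)]

row = slope_dummies(4.0, 2)   # a subject on treatment 2 with X = 4
```

Summing the columns back (X1 + X2 + X3 = X) gives the single predictor of the reduced model.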
Example:
In the following example we are measuring – Yield Y
as it depends on – the amount (X) of a pesticide.
Again we will assume that the dependence of Y on X will be linear.
(I should point out that the concepts that are used in this discussion can easily be adapted to the non-linear situation.)
• Suppose that the experiment is going to be repeated for three brands of pesticide: A, B and C.
• The quantity, X, of pesticide in this experiment was set at 3 different levels:
– 2 units/hectare,
– 4 units/hectare, and
– 8 units/hectare.
• Four test plots were randomly assigned to each of the nine combinations of brand and level of pesticide.
• Note that we would expect a common intercept for each brand of pesticide, since when the amount of pesticide, X, is zero the three brands would be equivalent.
The data for this experiment is given in the following table:
      X = 2   X = 4   X = 8
A     29.63   28.16   28.45
      31.87   33.48   37.21
      28.02   28.13   35.06
      35.24   28.25   33.99
B     32.95   29.55   44.38
      24.74   34.97   38.78
      23.38   36.35   34.92
      32.08   38.38   27.45
C     28.68   33.79   46.26
      28.70   43.95   50.77
      22.67   36.89   50.21
      30.02   33.56   44.14
[Figure: scatter plot of Y (0–60) against X (0–8) for brands A, B and C.]
The data as it would appear in a data file. The variables X1, X2 and X3 are the "dummy" variables.

Pesticide  X (Amount)  X1  X2  X3  Y
A          2           2   0   0   29.63
A          2           2   0   0   31.87
A          2           2   0   0   28.02
A          2           2   0   0   35.24
B          2           0   2   0   32.95
B          2           0   2   0   24.74
B          2           0   2   0   23.38
B          2           0   2   0   32.08
C          2           0   0   2   28.68
C          2           0   0   2   28.70
C          2           0   0   2   22.67
C          2           0   0   2   30.02
A          4           4   0   0   28.16
A          4           4   0   0   33.48
A          4           4   0   0   28.13
A          4           4   0   0   28.25
B          4           0   4   0   29.55
B          4           0   4   0   34.97
B          4           0   4   0   36.35
B          4           0   4   0   38.38
C          4           0   0   4   33.79
C          4           0   0   4   43.95
C          4           0   0   4   36.89
C          4           0   0   4   33.56
A          8           8   0   0   28.45
A          8           8   0   0   37.21
A          8           8   0   0   35.06
A          8           8   0   0   33.99
B          8           0   8   0   44.38
B          8           0   8   0   38.78
B          8           0   8   0   34.92
B          8           0   8   0   27.45
C          8           0   0   8   46.26
C          8           0   0   8   50.77
C          8           0   0   8   50.21
C          8           0   0   8   44.14
Fitting the complete model: ANOVA

             df   SS            MS            F             Significance F
Regression    3   1095.815813   365.2719378   18.33114788   4.19538E-07
Residual     32    637.6415754   19.92629923
Total        35   1733.457389

Coefficients
Intercept   26.24166667
X1           0.981388889
X2           1.422638889
X3           2.602400794
Fitting the reduced model: ANOVA

             df   SS            MS            F             Significance F
Regression    1    623.8232508   623.8232508   19.11439978   0.000110172
Residual     34   1109.634138     32.63629818
Total        35   1733.457389

Coefficients
Intercept   26.24166667
X            1.668809524
The ANOVA Table for testing the equality of slopes

                    df   SS            MS            F             Significance F
Common slope zero    1    623.8232508   623.8232508   31.3065283    3.51448E-06
Slope comparison     2    471.9925627   235.9962813   11.84345766   0.000141367
Residual            32    637.6415754    19.92629923
Total               35   1733.457389
Example: Comparison of Intercepts of k Regression Lines with a Common Slope (One-way Analysis of Covariance)

Situation:
• k treatments or k populations are being compared.
• For each of the k treatments we have measured both Y (the response variable) and X (an independent variable).
• Y is assumed to be linearly related to X with the intercept dependent on treatment (population), while the slope is the same for each treatment.
• Y is called the response variable, while X is called the covariate.

The Model:
Y = β0^(i) + β1X + ε  for treatment i (i = 1, 2, …, k)
Graphical Illustration of the One-way Analysis of Covariance Model
[Figure: parallel lines (common slope, different intercepts), labelled Treat 1, Treat 2, Treat 3, …, Treat k. Common Slopes.]
Equivalent Forms of the Model:
1) Y = μi + β1(X − X̄) + ε for treatment i, where μi = the adjusted mean for treatment i.
2) Y = μ + τi + β1(X − X̄) + ε for treatment i, where μ = the overall adjusted mean response and τi = the adjusted effect of treatment i, with μi = μ + τi.
• This model can be artificially put into the form of the Multiple Regression model by the use of dummy variables to handle the categorical independent variable Treatments.
In this case we define a new variable for each category of the categorical variable.
That is, we will define Xi for categories i = 1, 2, …, (k − 1) of treatments as follows:
Xi = 1 if the subject receives treatment i, and 0 otherwise.
Then the model can be written as follows:

The Complete Model:
Y = β0 + δ1X1 + δ2X2 + … + δ(k−1)X(k−1) + β1X + ε
where
Xi = 1 if the subject receives treatment i, and 0 otherwise.

In this case:
Dependent Variable: Y
Independent Variables: X1, X2, ..., X(k−1), X

In the above situation we would likely be interested in testing the equality of the intercepts, namely the Null Hypothesis
H0: δ1 = δ2 = … = δ(k−1) = 0  (q = k − 1)
The Reduced Model:
Y = β0 + β1X + ε
Dependent Variable: Y
Independent Variable: X
Example:
In the following example we are interested in comparing the effects of five workbooks (A, B, C, D, E) on the performance of students in Mathematics. For each workbook, 15 students are selected (Total of n = 15×5 = 75). Each student is given a pretest (pretest score ≡ X) and given a final test (final score ≡ Y). The data is given on the following slide
The data
      Workbook A    Workbook B    Workbook C    Workbook D    Workbook E
      Pre   Post    Pre   Post    Pre   Post    Pre   Post    Pre   Post
      43.0  46.4    43.6  52.5    57.5  61.9    59.9  56.1    43.2  46.0
      55.3  43.9    45.2  61.8    49.3  57.5    50.5  49.6    60.7  59.7
      59.4  59.7    54.2  69.1    48.0  52.5    45.0  46.1    42.7  45.4
      51.7  49.6    45.5  61.7    31.3  42.9    55.0  53.2    46.6  44.3
      53.0  49.3    43.4  53.3    65.3  74.5    52.6  50.8    42.6  46.5
      48.7  47.1    50.1  57.4    47.1  48.9    62.8  60.1    25.6  38.4
      45.4  47.4    36.2  48.7    34.8  47.2    41.4  49.5    52.5  57.7
      42.1  33.3    55.1  61.9    53.9  59.8    62.1  58.3    51.2  47.1
      60.0  53.2    48.9  55.0    42.7  49.6    56.4  58.1    48.8  50.4
      32.4  34.1    52.9  63.3    47.6  55.6    54.2  56.8    44.1  52.7
      74.4  66.7    51.7  64.7    56.1  62.4    51.6  46.1    73.8  73.6
      43.2  43.2    55.3  66.4    39.7  52.1    63.3  56.0    52.6  50.8
      44.5  42.5    45.2  59.4    32.3  49.7    37.3  48.8    67.8  66.8
      47.1  51.3    37.6  56.9    59.5  67.1    39.2  45.1    42.9  47.2
      57.0  48.9    41.7  51.3    46.2  55.2    62.1  58.0    51.7  57.0
The Model:
Y = β0^(i) + β1X + ε  for workbook i (i = A, B, C, D, E)
Graphical display of data
[Figure: Final Score against Pretest Score (0–80), with separate symbols for Workbooks A, B, C, D and E.]
Some comments
1. The linear relationship between Y (Final Score) and X (Pretest Score), models the differing aptitudes for mathematics.
2. The shifting up and down of this linear relationship measures the effect of workbooks on the final score Y.
The Model:
Y = β0^(i) + β1X + ε  for workbook i (i = A, B, C, D, E)

Graphical Illustration of the One-way Analysis of Covariance Model
[Figure: parallel lines with a common slope, one per workbook.]
The data as it would appear in a data file.

Pre    Final   Workbook
43.0   46.4    A
55.3   43.9    A
59.4   59.7    A
51.7   49.6    A
53.0   49.3    A
48.7   47.1    A
45.4   47.4    A
42.1   33.3    A
60.0   53.2    A
32.4   34.1    A
74.4   66.7    A
43.2   43.2    A
44.5   42.5    A
47.1   51.3    A
57.0   48.9    A
43.6   52.5    B
45.2   61.8    B
54.2   69.1    B
45.5   61.7    B
43.4   53.3    B
…
54.2   56.8    D
51.6   46.1    D
63.3   56.0    D
37.3   48.8    D
39.2   45.1    D
62.1   58.0    D
43.2   46.0    E
60.7   59.7    E
42.7   45.4    E
46.6   44.3    E
42.6   46.5    E
25.6   38.4    E
52.5   57.7    E
51.2   47.1    E
48.8   50.4    E
44.1   52.7    E
73.8   73.6    E
52.6   50.8    E
67.8   66.8    E
42.9   47.2    E
51.7   57.0    E
The data as it would appear in a data file with the dummy variables (X1, X2, X3, X4) added.

Pre    Final   Workbook   X1  X2  X3  X4
43.0   46.4    A          1   0   0   0
55.3   43.9    A          1   0   0   0
59.4   59.7    A          1   0   0   0
51.7   49.6    A          1   0   0   0
53.0   49.3    A          1   0   0   0
48.7   47.1    A          1   0   0   0
45.4   47.4    A          1   0   0   0
42.1   33.3    A          1   0   0   0
60.0   53.2    A          1   0   0   0
32.4   34.1    A          1   0   0   0
74.4   66.7    A          1   0   0   0
43.2   43.2    A          1   0   0   0
44.5   42.5    A          1   0   0   0
47.1   51.3    A          1   0   0   0
57.0   48.9    A          1   0   0   0
43.6   52.5    B          0   1   0   0
45.2   61.8    B          0   1   0   0
…
37.3   48.8    D          0   0   0   1
39.2   45.1    D          0   0   0   1
62.1   58.0    D          0   0   0   1
43.2   46.0    E          0   0   0   0
60.7   59.7    E          0   0   0   0
42.7   45.4    E          0   0   0   0
46.6   44.3    E          0   0   0   0
42.6   46.5    E          0   0   0   0
25.6   38.4    E          0   0   0   0
52.5   57.7    E          0   0   0   0
51.2   47.1    E          0   0   0   0
48.8   50.4    E          0   0   0   0
44.1   52.7    E          0   0   0   0
73.8   73.6    E          0   0   0   0
52.6   50.8    E          0   0   0   0
67.8   66.8    E          0   0   0   0
42.9   47.2    E          0   0   0   0
51.7   57.0    E          0   0   0   0

Here is the data file in SPSS with the dummy variables (X1, X2, X3, X4) added. They can be added within SPSS.
Fitting the complete model

The dependent variable is the final score, Y. The independent variables are the pre-score X and the four dummy variables X1, X2, X3, X4.

The Output

Variables Entered/Removed(b)
Model 1: Variables Entered: X4, PRE, X3, X1, X2; Method: Enter
a. All requested variables entered.
b. Dependent Variable: FINAL

Model Summary
Model 1: R = .908(a), R Square = .825, Adjusted R Square = .812, Std. Error of the Estimate = 3.594
a. Predictors: (Constant), X4, PRE, X3, X1, X2
The Output – continued

ANOVA(b)
Model 1       Sum of Squares   df   Mean Square   F        Sig.
Regression    4191.378          5   838.276       64.895   .000(a)
Residual       891.297         69    12.917
Total         5082.675         74
a. Predictors: (Constant), X4, PRE, X3, X1, X2
b. Dependent Variable: FINAL

Coefficients(a)
Model 1       B        Std. Error   Beta    t        Sig.
(Constant)    16.954   2.441                 6.944   .000
PRE             .709    .045         .809   15.626   .000
X1            -4.958   1.313        -.241   -3.777   .000
X2             8.553   1.318         .416    6.489   .000
X3             5.231   1.317         .254    3.972   .000
X4            -1.602   1.320        -.078   -1.214   .229
a. Dependent Variable: FINAL
The interpretation of the coefficients

From the Coefficients table:
• The common slope: the coefficient of PRE, B = .709.
• The intercept for workbook E: the (Constant), B = 16.954.
• The changes in the intercept when we change from workbook E to the other workbooks: the coefficients of X1, X2, X3, X4 (−4.958, 8.553, 5.231, −1.602).
The model can be written as follows:

The Complete Model:
Y = β0 + δ1X1 + δ2X2 + δ3X3 + δ4X4 + β1X + ε

1. When the workbook is E, then X1 = 0, …, X4 = 0 and
Y = β0 + β1X + ε
2. When the workbook is A, then X1 = 1, …, X4 = 0 and
Y = β0 + δ1 + β1X + ε,
hence δ1 is the change in the intercept when we change from workbook E to workbook A.

Testing for the equality of the intercepts
i.e. H0: δ1 = δ2 = δ3 = δ4 = 0

The reduced model is
Y = β0 + β1X + ε
The independent variable is only X (the pre-score).
Fitting the reduced model

The dependent variable is the final score, Y. The independent variable is only the pre-score X.

The Output for the reduced model

Variables Entered/Removed(b)
Model 1: Variables Entered: PRE; Method: Enter
a. All requested variables entered.
b. Dependent Variable: FINAL

Model Summary
Model 1: R = .700(a), R Square = .490, Adjusted R Square = .483, Std. Error of the Estimate = 5.956
a. Predictors: (Constant), PRE
(Note the lower R².)
The Output – continued

ANOVA(b)
Model 1       Sum of Squares   df   Mean Square   F        Sig.
Regression    2492.779          1   2492.779      70.263   .000(a)
Residual      2589.896         73     35.478
Total         5082.675         74
a. Predictors: (Constant), PRE
b. Dependent Variable: FINAL
(Note the increased residual sum of squares.)

Coefficients(a)
Model 1       B        Std. Error   Beta   t       Sig.
(Constant)    23.105   3.692               6.259   .000
PRE             .614    .073        .700   8.382   .000
a. Dependent Variable: FINAL
The F Test

F = (Reduction in R.S.S. / q) / (MSE for the complete model)
The Reduced model ANOVA(b): Regression 2492.779 (df 1), Residual 2589.896 (df 73, MS 35.478), Total 5082.675 (df 74).
a. Predictors: (Constant), PRE; b. Dependent Variable: FINAL

The Complete model ANOVA(b): Regression 4191.378 (df 5, MS 838.276, F 64.895), Residual 891.297 (df 69, MS 12.917), Total 5082.675 (df 74).
a. Predictors: (Constant), X4, PRE, X3, X1, X2; b. Dependent Variable: FINAL
reduced ANOVASum of Squares df Mean Square F Sig.
Regression 2492.77885 1 2492.77885 70.2626 4.56272E-13Residual 2589.89635 73 35.47803219Total 5082.6752 74
Complete ANOVASum of Squares df Mean Square F Sig.
Regression 4191.377971 5 838.2755942 64.89532 9.99448E-25Residual 891.297229 69 12.91735115Total 5082.6752 74
Sum of Squares df Mean Square F Sig.
slope 2492.77885 1 2492.77885 192.9791 1.13567E-21equality of int. 1698.599121 4 424.6497803 32.87437 2.46006E-15Residual 891.297229 69 12.91735115Total 5082.6752 74
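The equality-of-intercepts F statistic can be recomputed directly from the two residual sums of squares: the reduction in residual sum of squares, divided by q = 4, over the complete-model MSE (69 residual df).

```python
# Residual sums of squares from the reduced and complete fits
rss_reduced, rss_complete = 2589.896, 891.297
q, df_complete = 4, 69

F = ((rss_reduced - rss_complete) / q) / (rss_complete / df_complete)
# matches the tabulated F = 32.87 for equality of intercepts
```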
Testing for zero slope
i.e. H0: β1 = 0

The reduced model is
Y = β0 + δ1X1 + δ2X2 + δ3X3 + δ4X4 + ε
The independent variables are only X1, X2, X3, X4 (the dummies).
The Complete model ANOVA(b): Regression 4191.378 (df 5, MS 838.276, F 64.895), Residual 891.297 (df 69, MS 12.917), Total 5082.675 (df 74).
a. Predictors: (Constant), X4, PRE, X3, X1, X2; b. Dependent Variable: FINAL

The Reduced model ANOVA(b)
Model 1       Sum of Squares   df   Mean Square   F       Sig.
Regression    1037.475          4   259.369       4.488   .003(a)
Residual      4045.200         70    57.789
Total         5082.675         74
a. Predictors: (Constant), X4, X3, X2, X1
b. Dependent Variable: FINAL
The F test

Reduced
             Sum of Squares   df   Mean Square   F          Sig.
Regression   1037.4752         4    259.3688      4.488237  0.002757501
Residual     4045.2           70     57.78857
Total        5082.6752        74

Complete
             Sum of Squares   df   Mean Square   F          Sig.
Regression   4191.37797        5    838.27559    64.89532   9.99448E-25
Residual      891.29723       69     12.91735
Total        5082.6752        74

Testing for zero slope
             Sum of Squares   df   Mean Square    F          Sig.
Regression   1037.4752         4    259.3688      20.0791    5.30755E-11
zero slope   3153.90277        1   3153.90277    244.1602    2.3422E-24
Residual      891.29723       69     12.91735
Total        5082.6752        74
The Analysis of Covariance
• This analysis can also be performed by using a package that can perform Analysis of Covariance (ANACOVA)
• The package sets up the dummy variables automatically
Here is the data file in SPSS. The dummy variables are no longer needed.
In SPSS, to perform ANACOVA you select from the menu: Analyze → General Linear Model → Univariate.
This dialog box will appear
You now select:
1. The dependent variable Y (Final Score)
2. The Fixed Factor (the categorical independent variable – workbook)
3. The covariate (the continuous independent variable – pretest score)
The output: The ANOVA TABLE

Tests of Between-Subjects Effects (Dependent Variable: FINAL)
Source            Type III Sum of Squares   df   Mean Square   F         Sig.
Corrected Model   4191.378(a)                5    838.276      64.895    .000
Intercept          837.590                   1    837.590      64.842    .000
PRE               3153.903                   1   3153.903     244.160    .000
WORKBOOK          1698.599                   4    424.650      32.874    .000
Error              891.297                  69     12.917
Total           219815.6                    75
Corrected Total   5082.675                  74
a. R Squared = .825 (Adjusted R Squared = .812)

Compare this with the previously computed table:
                  Sum of Squares   df   Mean Square   F          Sig.
slope             2492.77885        1   2492.77885    192.9791   1.13567E-21
equality of int.  1698.59912        4    424.64978     32.87437  2.46006E-15
Residual           891.29723       69     12.91735
Total             5082.6752        74
In this table, the PRE sum of squares (3153.903) is the sum of squares in the numerator when we attempt to test if the slope is zero (and allow the intercepts to be different).
The Use of Dummy Variables (summary)

Example: Comparison of Slopes of k Regression Lines with Common Intercept
Situation:
• k treatments or k populations are being compared.
• For each of the k treatments we have measured both Y (the response variable) and X (an independent variable).
• Y is assumed to be linearly related to X with the slope dependent on treatment (population), while the intercept is the same for each treatment.
The Model:
Y = β0 + β1^(i) X + ε  for treatment i (i = 1, 2, …, k)
[Figure: lines through a common intercept with different slopes.]
The model can be written as follows:
The Complete Model:
Y = β0 + β1^(1) X1 + β1^(2) X2 + … + β1^(k) Xk + ε
where Xi = X if the subject receives treatment i, and 0 otherwise.
Example: Comparison of Intercepts of k Regression Lines with a Common Slope (One-way Analysis of Covariance)
Situation:
• k treatments or k populations are being compared.
• For each of the k treatments we have measured both Y (the response variable) and X (an independent variable).
• Y is assumed to be linearly related to X with the intercept dependent on treatment (population), while the slope is the same for each treatment.
• Y is called the response variable, while X is called the covariate.
The Model:
Y = β0^(i) + β1X + ε  for treatment i (i = 1, 2, …, k)
[Figure: parallel lines with a common slope.]
The model can be written as follows:
The Complete Model:
Y = β0 + δ1X1 + δ2X2 + … + δ(k−1)X(k−1) + β1X + ε
where Xi = 1 if the subject receives treatment i, and 0 otherwise.
Another application of the use of dummy variables
• The dependent variable, Y, is linearly related to X, but the slope changes at one or several known values of X (nodes).
[Figure: Y against X, with slope changes at the nodes.]
The model

Y = β0 + β1X + ε                                  if X ≤ x1
Y = β0 + β1x1 + β2(X − x1) + ε                    if x1 ≤ X ≤ x2
Y = β0 + β1x1 + β2(x2 − x1) + β3(X − x2) + ε      if x2 ≤ X ≤ x3
etc.

[Figure: the fitted line has slopes β1, β2, …, βk on the successive segments between the nodes x1, x2, …, xk.]

Now define
X1 = X if X ≤ x1; x1 if x1 < X
X2 = 0 if X ≤ x1; X − x1 if x1 ≤ X ≤ x2; x2 − x1 if x2 < X
X3 = 0 if X ≤ x2; X − x2 if x2 ≤ X ≤ x3; x3 − x2 if x3 < X
Etc.

Then the model can be written
Y = β0 + β1X1 + β2X2 + β3X3 + … + ε
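The node construction above can be sketched directly (the helper name and the node values 5 and 10 are illustrative): each observation's X splits into segment pieces X1, X2, X3, which sum back to X.

```python
def piecewise_vars(X, x1, x2):
    """Segment variables for two nodes x1 < x2."""
    X1 = X if X <= x1 else x1
    X2 = 0.0 if X <= x1 else (X - x1 if X <= x2 else x2 - x1)
    X3 = 0.0 if X <= x2 else X - x2
    return [X1, X2, X3]

before = piecewise_vars(3.0, 5.0, 10.0)    # below the first node
middle = piecewise_vars(7.0, 5.0, 10.0)    # between the nodes
after  = piecewise_vars(12.0, 5.0, 10.0)   # beyond the second node
```

Because X1 + X2 + X3 = X everywhere and each piece is continuous in X, the fitted line is continuous at the nodes while the slope may change there.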
An Example
In this example we are measuring Y at time X. Y is growing linearly with time. At time X = 10, an additive is added to the process which may change the rate of growth.

The data
X:  0.0  1.0  2.0  3.0  4.0  5.0  6.0
Y:  3.9  5.9  6.4  6.3  7.5  7.9  8.5
X:  7.0  8.0  9.0 10.0 11.0 12.0 13.0
Y: 10.7 10.0 12.4 11.0 11.5 13.9 17.6
X: 14.0 15.0 16.0 17.0 18.0 19.0 20.0
Y: 18.2 16.8 21.8 23.1 22.9 26.2 27.7
Graph
[Figure: scatter plot of Y (0–30) against X (0–20).]
Now define the dummy variables
X1 = X if X ≤ 10, and 10 if X > 10
X2 = 0 if X ≤ 10, and X − 10 if X > 10
The data as it appears in SPSS – x1, x2 are the dummy variables
We now regress y on x1 and x2.
The Output

Model Summary
Model 1: R = .990(a), R Square = .980, Adjusted R Square = .978, Std. Error of the Estimate = 1.0626
a. Predictors: (Constant), X2, X1

ANOVA(b)
Model 1       Sum of Squares   df   Mean Square   F         Sig.
Regression    1015.909          2   507.954       449.875   .000(a)
Residual        20.324         18     1.129
Total         1036.232         20
a. Predictors: (Constant), X2, X1
b. Dependent Variable: Y

Coefficients(a)
Model 1       B       Std. Error   Beta   t        Sig.
(Constant)    4.714   .577                 8.175   .000
X1             .673   .085         .325    7.886   .000
X2            1.579   .085         .761   18.485   .000
a. Dependent Variable: Y
Graph
[Figure: the fitted two-segment line over the data; the slope increases after X = 10.]
Testing for no change in slope
Here we want to test
H0: β1 = β2 vs HA: β1 ≠ β2
The reduced model is
Y = β0 + β1(X1 + X2) + ε = β0 + β1X + ε
Fitting the reduced model
We now regress y on x.

The Output

Model Summary
Model 1: R = .971(a), R Square = .942, Adjusted R Square = .939, Std. Error of the Estimate = 1.7772
a. Predictors: (Constant), X

ANOVA(b)
Model 1       Sum of Squares   df   Mean Square   F         Sig.
Regression     976.219          1   976.219       309.070   .000(a)
Residual        60.013         19     3.159
Total         1036.232         20
a. Predictors: (Constant), X
b. Dependent Variable: Y

Coefficients(a)
Model 1       B       Std. Error   Beta   t        Sig.
(Constant)    2.559   .749                 3.418   .003
X             1.126   .064         .971   17.580   .000
a. Dependent Variable: Y
Graph – fitting a common slope
[Figure: the single-slope fit over the same data.]
The test for the equality of slope

Reduced Model
             Sum of Squares   df   Mean Square   F          Sig.
Regression    976.21948        1   976.21948     309.0697   3.27405E-13
Residual       60.01290       19     3.15857
Total        1036.23238       20

Complete Model
             Sum of Squares   df   Mean Square   F          Sig.
Regression   1015.90858        2   507.95429     449.8753   0
Residual       20.32380       18     1.12910
Total        1036.23238       20

Equality of slope
                   Sum of Squares   df   Mean Square   F          Sig.
slope               976.21948        1   976.21948     864.5996   1.14256E-16
equality of slope    39.68910        1    39.68910      35.15109  1.30425E-05
Residual             20.32380       18     1.12910
Total              1036.23238       20
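As before, the equality-of-slope F statistic follows directly from the two residual sums of squares: one extra parameter (q = 1) and 18 residual df in the complete model.

```python
# Residual sums of squares from the common-slope and two-slope fits
rss_reduced, rss_complete, df_complete = 60.013, 20.324, 18

F = (rss_reduced - rss_complete) / (rss_complete / df_complete)
# matches the tabulated F = 35.15 for equality of slope
```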