Solutions to obligatorisk oppgave 1, STK2100
Vinnie Ko
May 14, 2018
Disclaimer: This document is made solely for my own personal use and can contain many errors.
Oppgave 1
(a)
ci can be seen as a categorical variable that indicates the j-th category, where 1 ≤ j ≤ K. xi,j can then be seen as a dummy variable, defined as

xi,j = 1j(ci) :=  1 if ci = j,
                  0 if ci ≠ j.
This way of “dummy coding” results in the following mapping:
                  xi,1   xi,2   · · ·   xi,K−1   xi,K
When ci = 1        1      0     · · ·     0       0
When ci = 2        0      1     · · ·     0       0
  ...
When ci = K − 1    0      0     · · ·     1       0
When ci = K        0      0     · · ·     0       1
The given models
Yi = β0 + β2xi,2 + · · ·+ βKxi,K + εi (1)
and
Yi = α1xi,1 + · · ·+ αKxi,K + εi (2)
correspond to 2 different ways of using this “dummy coding”.
For each model, we can see that:
                  Under equation (1):     Under equation (2):
                  Yi                      Yi
When ci = 1       β0 + εi                 α1 + εi
When ci = 2       β0 + β2 + εi            α2 + εi
  ...
When ci = K − 1   β0 + βK−1 + εi          αK−1 + εi
When ci = K       β0 + βK + εi            αK + εi
So, if we set β1 = 0 and αj = β0 + βj for j ∈ {1, · · · , K}, then model (1) and model (2) are the same.
Interpretation:
αj = the average value of {Yi | ci = j} in the data.
βj = the difference between the average of {Yi | ci = 1} (the reference group) and the average of {Yi | ci = j}.
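The dummy coding above is easy to sketch numerically. The snippet below (Python/NumPy purely for illustration; the exercises themselves use R) builds the matrix of xi,j from a vector of categories ci:

```python
import numpy as np

c = np.array([1, 1, 2, 3, 3, 3])   # categories c_i, with K = 3
K = 3

# x_{i,j} = 1_j(c_i): 1 when c_i = j, 0 otherwise
X = (c[:, None] == np.arange(1, K + 1)[None, :]).astype(float)
print(X)
```

Each row contains exactly one 1, in the column of its category, which is the mapping shown in the table above.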
(b)
The design matrix for model (2):

    [ x1,1  · · ·  x1,j  · · ·  x1,K ]
    [  ...          ...          ... ]
X = [ xi,1  · · ·  xi,j  · · ·  xi,K ]
    [  ...          ...          ... ]
    [ xn,1  · · ·  xn,j  · · ·  xn,K ]
X^T X =

[ x1,1  · · ·  xl,1  · · ·  xn,1 ]   [ x1,1  · · ·  x1,j  · · ·  x1,K ]
[  ...          ...          ... ]   [  ...          ...          ... ]
[ x1,j  · · ·  xl,j  · · ·  xn,j ] · [ xl,1  · · ·  xl,j  · · ·  xl,K ]
[  ...          ...          ... ]   [  ...          ...          ... ]
[ x1,K  · · ·  xl,K  · · ·  xn,K ]   [ xn,1  · · ·  xn,j  · · ·  xn,K ]

  [ Σ_{l=1}^n xl,1 xl,1   · · ·   Σ_{l=1}^n xl,1 xl,j   · · ·   Σ_{l=1}^n xl,1 xl,K ]
  [          ...                          ...                           ...          ]
= [ Σ_{l=1}^n xl,j xl,1   · · ·   Σ_{l=1}^n xl,j xl,j   · · ·   Σ_{l=1}^n xl,j xl,K ]
  [          ...                          ...                           ...          ]
  [ Σ_{l=1}^n xl,K xl,1   · · ·   Σ_{l=1}^n xl,K xl,j   · · ·   Σ_{l=1}^n xl,K xl,K ]
We can see that

(X^T X)i,j = Σ_{l=1}^n xl,i xl,j =  Σ_{l=1}^n 1j(cl)  if i = j,
                                    0                 if i ≠ j.

So, X^T X is a diagonal matrix with diagonal elements (X^T X)j,j = Σ_{l=1}^n 1j(cl).
X^T y =

[ x1,1  · · ·  xl,1  · · ·  xn,1 ]   [ y1  ]   [ Σ_{l=1}^n xl,1 yl ]   [ Σ_{l:cl=1} yl ]
[  ...          ...          ... ]   [ ... ]   [         ...        ]   [      ...      ]
[ x1,j  · · ·  xl,j  · · ·  xn,j ] · [ yl  ] = [ Σ_{l=1}^n xl,j yl ] = [ Σ_{l:cl=j} yl ]
[  ...          ...          ... ]   [ ... ]   [         ...        ]   [      ...      ]
[ x1,K  · · ·  xl,K  · · ·  xn,K ]   [ yn  ]   [ Σ_{l=1}^n xl,K yl ]   [ Σ_{l:cl=K} yl ]

where the last equality uses xl,j = 1j(cl).
Now, we derive the least squares estimator of α:

RSS = Σ_{i=1}^n (yi − Σ_{j=1}^K xi,j αj)²
    = ‖y − Xα‖²
    = (y − Xα)^T (y − Xα).

This leads us to:

α̂ = argmin over α of (y − Xα)^T (y − Xα).

Differentiate:

∂RSS/∂α = ∂(y^T y − y^T Xα − α^T X^T y + α^T X^T X α)/∂α
        = 0 − X^T y − X^T y + (X^T X + (X^T X)^T) α
        = −2 X^T y + 2 X^T X α.

This first derivative should equal 0. So,

−2 X^T y + 2 X^T X α = 0
X^T X α = X^T y
α̂ = (X^T X)^{−1} X^T y.
Therefore, the least squares estimate of α is:

α̂ = (X^T X)^{−1} X^T y =

[ Σ_{l:cl=1} yl / Σ_{l=1}^n 1_1(cl) ]
[                ...                ]
[ Σ_{l:cl=j} yl / Σ_{l=1}^n 1_j(cl) ]
[                ...                ]
[ Σ_{l:cl=K} yl / Σ_{l=1}^n 1_K(cl) ].

We can easily see that α̂j is the sample mean of {yi | ci = j}.
X^T X is a diagonal matrix, so X^T X is invertible if and only if all diagonal entries are non-zero. This means that we can obtain α̂ only if

Σ_{l=1}^n 1j(cl) ≠ 0 for every j ∈ {1, · · · , K}.

In other words, to compute α̂, we need at least one observation for every possible value of the categorical variable ci.
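As a numeric check of this result, the sketch below (Python/NumPy for illustration, not part of the R coursework) verifies that solving the normal equations on a dummy-coded design reproduces the group means, and that X^T X is the diagonal matrix of group counts:

```python
import numpy as np

rng = np.random.default_rng(0)
K = 4
c = np.repeat(np.arange(1, K + 1), 10)        # 10 observations per category
y = 25.0 + 2.0 * c + rng.normal(0.0, 1.0, c.size)

# Dummy-coded design matrix: x_{i,j} = 1 when c_i = j
X = (c[:, None] == np.arange(1, K + 1)[None, :]).astype(float)

XtX = X.T @ X                                  # diagonal matrix of group counts
alpha_hat = np.linalg.solve(XtX, X.T @ y)      # (X^T X)^{-1} X^T y
group_means = np.array([y[c == j].mean() for j in range(1, K + 1)])

print(np.allclose(alpha_hat, group_means))     # the estimator equals the group means
```

Because the dummy columns are disjoint, the normal equations decouple into one equation per group, which is why the OLS solution is exactly the vector of group means.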
(c)
β̂j =  α̂1        if j = 0,
      α̂j − α̂1   if j ∈ {1, · · · , K}.     (3)
α̂ is the least squares estimate of α, obtained via the least squares principle. And we know that there is a one-to-one mapping (3) between α and β. Therefore, the β̂ obtained through this one-to-one mapping is also the least squares estimate of β.
(d)
The given alternative model:
Yi = γ0 + γ1xi,1 + · · ·+ γKxi,K + εi (4)
This way of “dummy coding” results in the following mapping:
                  Under equation (1):    Under equation (2):    Under equation (4):
                  Yi                     Yi                     Yi
When ci = 1       β0 + εi                α1 + εi                γ0 + γ1 + εi
When ci = 2       β0 + β2 + εi           α2 + εi                γ0 + γ2 + εi
  ...
When ci = K − 1   β0 + βK−1 + εi         αK−1 + εi              γ0 + γK−1 + εi
When ci = K       β0 + βK + εi           αK + εi                γ0 + γK + εi
We examine how γ is related to α and β.
From

K·γ0 + Σ_{j=1}^K γj = Σ_{j=1}^K αj   and   Σ_{j=1}^K γj = 0,

we obtain

γj =  (1/K) Σ_{l=1}^K αl         if j = 0,
      αj − (1/K) Σ_{l=1}^K αl    if j ∈ {1, · · · , K}.
From

K·β0 + Σ_{j=2}^K βj = K·γ0 + Σ_{j=1}^K γj   and   Σ_{j=1}^K γj = 0,

we obtain

γj =  β0 + (1/K) Σ_{l=2}^K βl    if j = 0,
      βj − (1/K) Σ_{l=2}^K βl    if j ∈ {1, · · · , K}.
Interpretation of γ:
γj = the difference between the average of {Yi | ci = j} and the overall average Ȳ, for j ∈ {1, · · · , K}.
γ0 = Ȳ (the groups here are balanced, so the unweighted mean of the group means equals the overall mean).
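These relations can be checked against the numbers this data set actually produces: taking the fitted group means α̂ reported by the R fit in part (e) below, the sum-to-zero parameters follow directly (Python for illustration):

```python
import numpy as np

alpha_hat = np.array([26.08, 24.69, 29.95, 33.84])  # group means from part (e)

gamma0 = alpha_hat.mean()      # gamma_0 = (1/K) * sum_j alpha_j
gamma = alpha_hat - gamma0     # gamma_j = alpha_j - gamma_0

print(round(gamma0, 2))        # 28.64, the intercept under contr.sum
print(np.round(gamma, 2))      # deviations; compare with the fit3 output in (f)
```

The last deviation, 5.20, is exactly the "hidden" coefficient for form 4 recovered via predict() in part (f).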
(e)
> # Read data.
> Fe = read.table("http://www.uio.no/studier/emner/matnat/math/STK2100/data/fe.txt",
header=T, sep=",")
> Fe[,"form"] = as.factor(Fe[,"form"])
>
> # Model 2 (alpha’s)
> fit1 = lm(Fe~form+0,data=Fe)
> summary(fit1)
Call:
lm(formula = Fe ~ form + 0, data = Fe)
Residuals:
Min 1Q Median 3Q Max
-8.340 -1.255 -0.250 1.770 10.360
Coefficients:
Estimate Std. Error t value Pr(>|t|)
form1 26.080 1.251 20.85 <2e-16 ***
form2 24.690 1.251 19.74 <2e-16 ***
form3 29.950 1.251 23.95 <2e-16 ***
form4 33.840 1.251 27.06 <2e-16 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Residual standard error: 3.955 on 36 degrees of freedom
Multiple R-squared: 0.9834, Adjusted R-squared: 0.9815
F-statistic: 532.5 on 4 and 36 DF, p-value: < 2.2e-16
> logLik(fit1)
’log Lik.’ -109.6503 (df=5)
This model corresponds to model (2), i.e. the parameterization without an intercept (α0 = 0).
(f)
> # Model 1 (beta’s)
> fit2 = lm(Fe~form,data=Fe)
> summary(fit2)
Call:
lm(formula = Fe ~ form, data = Fe)
Residuals:
Min 1Q Median 3Q Max
-8.340 -1.255 -0.250 1.770 10.360
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 26.080 1.251 20.852 < 2e-16 ***
form2 -1.390 1.769 -0.786 0.4371
form3 3.870 1.769 2.188 0.0352 *
form4 7.760 1.769 4.387 9.6e-05 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Residual standard error: 3.955 on 36 degrees of freedom
Multiple R-squared: 0.4748, Adjusted R-squared: 0.431
F-statistic: 10.85 on 3 and 36 DF, p-value: 3.199e-05
> logLik(fit2)
’log Lik.’ -109.6503 (df=5)
This model corresponds to model (1) where β1 = 0.
> # Model 3 (gamma’s)
> options(contrasts=c("contr.sum","contr.sum"))
> options()$contrasts
[1] "contr.sum" "contr.sum"
> fit3 = lm(Fe~form,data=Fe)
> summary(fit3)
Call:
lm(formula = Fe ~ form, data = Fe)
Residuals:
Min 1Q Median 3Q Max
-8.340 -1.255 -0.250 1.770 10.360
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 28.6400 0.6254 45.798 < 2e-16 ***
form1 -2.5600 1.0831 -2.363 0.023622 *
form2 -3.9500 1.0831 -3.647 0.000833 ***
form3 1.3100 1.0831 1.209 0.234375
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Residual standard error: 3.955 on 36 degrees of freedom
Multiple R-squared: 0.4748, Adjusted R-squared: 0.431
F-statistic: 10.85 on 3 and 36 DF, p-value: 3.199e-05
> logLik(fit3)
’log Lik.’ -109.6503 (df=5)
>
> # The ’hidden’ coefficient of model 3
> coefficients(fit3)
(Intercept) form1 form2 form3
28.64 -2.56 -3.95 1.31
> new.data = data.frame(form = 4)
> new.data[,"form"] = as.factor(new.data[,"form"])
> new.data
form
1 4
> predict(fit3,new.data) - coefficients(fit3)[1]
1
5.2
This model corresponds to model (4), where Σ_{j=1}^K γj = 0.
Estimates of 3 models we obtained:
Model 1:
Yi = β0 + β1xi,1 + β2xi,2 + β3xi,3 + β4xi,4 + εi
Yi = 26.1 + 0 · xi,1 − 1.39 · xi,2 + 3.87 · xi,3 + 7.76 · xi,4 + εi
Model 2:
Yi = α1xi,1 + α2xi,2 + α3xi,3 + α4xi,4 + εi
Yi = 26.08 · xi,1 + 24.69 · xi,2 + 29.95 · xi,3 + 33.84 · xi,4 + εi
Model 3:
Yi = γ0 + γ1xi,1 + γ2xi,2 + γ3xi,3 + γ4xi,4 + εi
Yi = 28.64− 2.56 · xi,1 − 3.95 · xi,2 + 1.31 · xi,3 + 5.20 · xi,4 + εi
(g)
This test is called the 'model utility test', and the corresponding hypotheses are:
- Model (1): H0 : β2 = β3 = β4 = 0 vs. H1 : at least one βj ≠ 0.
- Model (2): H0 : α1 = α2 = α3 = α4 = 0 vs. H1 : at least one αj ≠ 0.
- Model (4): H0 : γ1 = γ2 = γ3 = 0 vs. H1 : at least one γj ≠ 0.
The F-test of all 3 models tells us to reject the null hypothesis. So, there is some form of difference between the 4 iron content categories in terms of their effect on the response variable.
If we want to check whether there is a difference between 2 specific iron content categories, we can use model (1) for hypothesis testing. This is because βj can be interpreted as the mean difference between the reference category and the compared category.
Note that the coefficients of model (1) only show the mean differences between type 1 and the other types, because type 1 is the reference category in the model. If we, for example, wish to compare type 2 and type 3, we can refit model (1) with type 2 or type 3 as the reference category.
(h)
From the t-test of the coefficients of model (1), we can see that there is no significant difference between the type 1 and type 2 iron contents. So, we can consider merging them into one type. This will result in a simpler model with only 3 categories.
Oppgave 2
(a)
> # Load nuclear data
> Nuclear = read.table("http://www.uio.no/studier/emner/matnat/math/STK2100/data/
nuclear.dat", header=T)
> n = nrow(Nuclear)
> Nuclear[,"logcost"] = log(Nuclear[,"cost"])
> Nuclear = Nuclear[,-which(names(Nuclear) %in% c("cost"))]
> head(Nuclear)
date t1 t2 cap pr ne ct bw cum.n pt logcost
1 68.58 14 46 687 0 1 0 0 14 0 6.131335
2 67.33 10 73 1065 0 0 1 0 1 0 6.115870
3 67.33 10 85 1065 1 0 1 0 1 0 6.094066
4 68.00 11 67 1065 0 1 1 0 12 0 6.480535
5 68.00 11 78 1065 1 1 1 0 12 0 6.464946
6 67.92 13 51 514 0 1 1 0 3 0 5.844674
>
> # Plot the scatterplot matrix to get an overview of the data.
> pairs(Nuclear[,-which(names(Nuclear) %in% c("pr","ne","ct","bw","pt"))])
>
> # Plot boxplots for categorical variables
> boxplot(logcost ~ pr, data = Nuclear, xlab = "pr", ylab = "logcost")
> boxplot(logcost ~ ne, data = Nuclear, xlab = "ne", ylab = "logcost")
> boxplot(logcost ~ ct, data = Nuclear, xlab = "ct", ylab = "logcost")
> boxplot(logcost ~ bw, data = Nuclear, xlab = "bw", ylab = "logcost")
> boxplot(logcost ~ pt, data = Nuclear, xlab = "pt", ylab = "logcost")
>
> cor(Nuclear)
date t1 t2 cap pr ne
date 1.00000000 0.85785460 -0.40398529 0.019629178 -0.05481837 0.097498341
t1 0.85785460 1.00000000 -0.47429212 -0.093526193 0.05081973 0.087038828
t2 -0.40398529 -0.47429212 1.00000000 0.313031186 0.44316550 -0.155187617
cap 0.01962918 -0.09352619 0.31303119 1.000000000 0.16070299 -0.006582732
pr -0.05481837 0.05081973 0.44316550 0.160702986 1.00000000 -0.077849894
ne 0.09749834 0.08703883 -0.15518762 -0.006582732 -0.07784989 1.000000000
ct -0.04560687 -0.12949678 0.18735426 0.028720001 -0.14585425 0.110207754
bw -0.16004369 -0.37417353 0.35802537 0.112013371 0.02159168 -0.092450033
cum.n 0.54940746 0.39965783 -0.22767928 0.193393590 -0.04666996 0.205620967
pt -0.50697195 -0.39831375 0.17803447 0.007195490 0.19432508 -0.277350098
logcost 0.62910712 0.45430974 -0.03622449 0.443081827 -0.10717916 0.385821416
ct bw cum.n pt logcost
date -0.04560687 -0.16004369 0.54940746 -0.50697195 0.62910712
t1 -0.12949678 -0.37417353 0.39965783 -0.39831375 0.45430974
t2 0.18735426 0.35802537 -0.22767928 0.17803447 -0.03622449
cap 0.02872000 0.11201337 0.19339359 0.00719549 0.44308183
pr -0.14585425 0.02159168 -0.04666996 0.19432508 -0.10717916
ne 0.11020775 -0.09245003 0.20562097 -0.27735010 0.38582142
ct 1.00000000 -0.07132097 0.04181119 -0.23434034 0.25653912
bw -0.07132097 1.00000000 0.19036439 0.38461538 -0.14159736
cum.n 0.04181119 0.19036439 1.00000000 0.06184835 0.27147634
pt -0.23434034 0.38461538 0.06184835 1.00000000 -0.67419415
logcost 0.25653912 -0.14159736 0.27147634 -0.67419415 1.00000000
[Figure: scatterplot-matrix panels (date, t1, t2, cap, cum.n, logcost); plot residue omitted.]
Figure 1: Scatterplot matrix of Nuclear dataset.
[Figure: boxplots of logcost against pr, ne, ct, bw and pt; plot residue omitted.]
Figure 2: Boxplots of Nuclear dataset.
(b)
In linear regression, we assume that

ε1, · · · , εn  i.i.d. ~ N(0, σ²).

The assumptions we make, in order of importance:
- εi follows a normal distribution
- ε1, · · · , εn are independent
- Var[εi] = σ² (constant variance)
- E[εi] = 0
> # Model with all predictors
> full.model = lm(logcost~.-cost, data = Nuclear)
> summary(full.model)
Call:
lm(formula = logcost ~ . - cost, data = Nuclear)
Residuals:
Min 1Q Median 3Q Max
-0.284032 -0.081677 0.009502 0.090890 0.266548
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.063e+01 5.710e+00 -1.862 0.07662 .
date 2.276e-01 8.656e-02 2.629 0.01567 *
t1 5.252e-03 2.230e-02 0.236 0.81610
t2 5.606e-03 4.595e-03 1.220 0.23599
cap 8.837e-04 1.811e-04 4.878 7.99e-05 ***
pr -1.081e-01 8.351e-02 -1.295 0.20943
ne 2.595e-01 7.925e-02 3.274 0.00362 **
ct 1.155e-01 7.027e-02 1.644 0.11503
bw 3.680e-02 1.063e-01 0.346 0.73261
cum.n -1.203e-02 7.828e-03 -1.536 0.13944
pt -2.220e-01 1.304e-01 -1.702 0.10352
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Residual standard error: 0.1697 on 21 degrees of freedom
Multiple R-squared: 0.8635, Adjusted R-squared: 0.7985
F-statistic: 13.28 on 10 and 21 DF, p-value: 5.717e-07
(c)
> # Remove the variable with the highest p-value.
> model.without.t1 = lm(logcost~.-cost-t1, data = Nuclear)
> summary(model.without.t1)
Call:
lm(formula = logcost ~ . - cost - t1, data = Nuclear)
Residuals:
Min 1Q Median 3Q Max
-0.28898 -0.07856 0.01272 0.08983 0.26537
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.161e+01 3.835e+00 -3.027 0.006187 **
date 2.431e-01 5.482e-02 4.435 0.000208 ***
t2 5.451e-03 4.449e-03 1.225 0.233451
cap 8.778e-04 1.755e-04 5.002 5.25e-05 ***
pr -1.035e-01 7.944e-02 -1.303 0.205922
ne 2.607e-01 7.738e-02 3.368 0.002772 **
ct 1.142e-01 6.853e-02 1.667 0.109715
bw 2.622e-02 9.423e-02 0.278 0.783401
cum.n -1.220e-02 7.626e-03 -1.599 0.124034
pt -2.157e-01 1.249e-01 -1.727 0.098181 .
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Residual standard error: 0.166 on 22 degrees of freedom
Multiple R-squared: 0.8631, Adjusted R-squared: 0.8072
F-statistic: 15.42 on 9 and 22 DF, p-value: 1.424e-07
It’s sensible to remove the variable with the highest p-value since, according to the Wald test, this variable has the lowest chance of having a significant (linear) relationship with logcost.
Compared to the model with all explanatory variables, the model without t1 has somewhat changed p-values. This is logical, because the same variables now appear in two different contexts in the two models. In general, one would expect the explanatory variables that were more strongly correlated with t1 to show the bigger changes in their p-values (usually decreased p-values), since these variables now take over the role of t1.
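The procedure described here — repeatedly dropping the predictor with the largest p-value — can be sketched compactly. The snippet below is a NumPy-only illustration (not the R function used in the next part): it ranks predictors by |t|-statistic, which orders them identically to ranking by p-value, and uses |t| ≥ 2 as a stand-in for the 5% threshold.

```python
import numpy as np

def backward_eliminate(X, y, names, t_crit=2.0):
    # Repeatedly drop the predictor with the smallest |t|-statistic
    # (equivalently, the largest p-value) until all remaining |t| >= t_crit.
    # Column 0 is assumed to be the intercept and is never dropped.
    keep = list(range(X.shape[1]))
    while True:
        Xk = X[:, keep]
        n, p = Xk.shape
        XtX_inv = np.linalg.inv(Xk.T @ Xk)
        beta = XtX_inv @ Xk.T @ y
        resid = y - Xk @ beta
        sigma2 = resid @ resid / (n - p)          # residual variance estimate
        se = np.sqrt(sigma2 * np.diag(XtX_inv))   # coefficient standard errors
        t = np.abs(beta / se)
        if len(keep) == 1:                        # only the intercept left
            return [names[i] for i in keep], beta
        cand = int(np.argmin(t[1:])) + 1          # weakest non-intercept term
        if t[cand] >= t_crit:
            return [names[i] for i in keep], beta
        del keep[cand]

rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n)] + [rng.normal(size=n) for _ in range(3)])
y = 1.0 + 3.0 * X[:, 1] + rng.normal(size=n)      # only x1 truly matters
kept, beta = backward_eliminate(X, y, ["intercept", "x1", "x2", "x3"])
print(kept)
```

With this setup the strong predictor survives the elimination; the R function in part (d) implements the same idea via summary()$coefficients.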
(d)
# This function fits lm() on the given data.
# It uses y.name as the response variable and all other variables as explanatory variables.
# It removes the variable with the highest p-value (according to the Wald test) from the model,
# and repeats this process until all variables left in the model have significant p-values.
p.value.backward.selection.func = function(y.name, data) {
  # Full model that contains all variables.
  full.model.formula = as.formula(paste(y.name, "~.", sep=""))
  full.model = lm(formula = full.model.formula, data = data)
  # Starting values
  itr = 0
  n.not.signif = sum(summary(full.model)$coefficients[,"Pr(>|t|)"] >= 0.05)
  trimmed.model = full.model
  # Make a frame
  trimmed.model.list = list()
  trimmed.model.list[[1]] = full.model
  # A loop to remove all variables with a non-significant p-value.
  while (n.not.signif > 0) {
    # Iteration
    itr = itr + 1
    # cat("Itr = ", itr, "\n", sep = "")
    # Extract coefficients from the model.
    coeff.result.mat = summary(trimmed.model)$coefficients
    # Remove intercept.
    coeff.result.mat = coeff.result.mat[-1, ]
    # Order such that the highest p-value comes on top.
    coeff.result.mat = coeff.result.mat[order(-coeff.result.mat[, "Pr(>|t|)"]), ]
    # Write down which variable should be removed.
    if (itr == 1) {
      variables.to.remove = rownames(coeff.result.mat)[1]
    } else {
      variables.to.remove = c(variables.to.remove, rownames(coeff.result.mat)[1])
    }
    # Fit the model without the variable with the highest p-value.
    trimmed.model.formula = as.formula(paste(y.name, "~.",
      paste("-", variables.to.remove, sep="", collapse=""), sep=""))
    trimmed.model = lm(trimmed.model.formula, data = data)
    # Save the model (offset by one, so the full model stays at position 1).
    trimmed.model.list[[itr + 1]] = trimmed.model
    # Check whether trimmed.model contains a non-significant p-value.
    coeff.result.mat = summary(trimmed.model)$coefficients
    # Remove intercept.
    coeff.result.mat = coeff.result.mat[-1, ]
    # Order such that the highest p-value comes on top.
    coeff.result.mat = coeff.result.mat[order(-coeff.result.mat[, "Pr(>|t|)"]), ]
    # Update n.not.signif
    n.not.signif = sum(coeff.result.mat[,"Pr(>|t|)"] >= 0.05)
    # When all non-significant variables are removed.
    if (n.not.signif == 0) {
      final.model = trimmed.model
      cat("Process done: all variables with a non-significant p-value are removed (itr = ", itr, ").", sep="", "\n")
      cat("The following variables are removed during the process:", "\n",
          paste(variables.to.remove, collapse=", "), "\n")
      cat("\n")
      cat("The summary of the resulting model:", "\n")
      # Print summary of the final model.
      print(summary(trimmed.model))
    }
  }
  result = list(final.model = final.model, full.model = full.model,
                variables.removed = variables.to.remove,
                all.models = trimmed.model.list, itr = itr)
  return(result)
}
>
>
> source("./Oblig 1/func p value backward selection.R")
> # Keep removing non-significant variables until only significant variables are left.
> backward.selection.result = p.value.backward.selection.func(y.name = "logcost", data = Nuclear)
Process done: all variables with a non-significant p-value are removed (itr = 6).
The following variables are removed during the process:
t1, bw, pr, t2, cum.n, ct
The summary of the resulting model:
Call:
lm(formula = trimmed.model.formula, data = data)
Residuals:
Min 1Q Median 3Q Max
-0.42160 -0.10554 -0.00070 0.07247 0.37328
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -4.5035539 2.5022087 -1.800 0.083072 .
date 0.1439104 0.0363320 3.961 0.000491 ***
cap 0.0008783 0.0001677 5.238 1.61e-05 ***
ne 0.2024364 0.0751953 2.692 0.012042 *
pt -0.3964878 0.0963356 -4.116 0.000326 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Residual standard error: 0.1767 on 27 degrees of freedom
Multiple R-squared: 0.8096, Adjusted R-squared: 0.7814
F-statistic: 28.7 on 4 and 27 DF, p-value: 2.255e-09
>
> # The model that contains only variables with significant p-values.
> final.model = backward.selection.result$final.model
>
> # Diagnostic plot of the model
> plot(final.model)
[Figure: diagnostic plots (Residuals vs Fitted, Normal Q−Q, Scale−Location, Residuals vs Leverage); observations 7, 10, 19 and 26 are flagged; plot residue omitted.]
Figure 3: Diagnostic plot of the final model that contains only significant variables.
According to the F-test, at least one βj ≠ 0. The diagnostic plots of the model suggest that there is a slight non-linearity in the upper quantile of the fitted values. However, this non-linearity is driven by 2 data points; we can say more about it once we have more data.
(e)
> # (e)
> # y.hat, computed on the training data (no separate test set is used here).
> y.hat = predict(final.model, newdata = Nuclear)
> # mean squared error
> MSE = 1/nrow(Nuclear)*sum((Nuclear[,"logcost"] - y.hat)^2)
> MSE
[1] 0.02635124
We have MSE = 0.0264, but this doesn't tell much about how good the model is. Instead, we compute

R² = Var(ŷ)/Var(y) = 1 − Σ_{i=1}^n (yi − ŷi)² / Σ_{i=1}^n (yi − ȳ)² = 1 − n·MSE / ((n − 1)·Var(y)).
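The equivalence between the two expressions is easy to verify numerically (a Python sketch with simulated data; it only relies on n·MSE = RSS and (n − 1)·Var(y) = TSS):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
y_hat = X @ beta

mse = np.mean((y - y_hat) ** 2)                        # n * MSE = RSS
r2_from_mse = 1 - n * mse / ((n - 1) * np.var(y, ddof=1))
r2_direct = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

print(np.isclose(r2_from_mse, r2_direct))              # identical by algebra
```

This mirrors "Method 1" versus the direct definition in the R transcript below.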
> # R-squared
> # Method 1
> 1 - n*MSE/((n-1)*var(Nuclear[,"logcost"]))
[1] 0.8095693
> # Method 2
> var(y.hat)/var(Nuclear[,"logcost"])
[1] 0.8095693
> # Method 3
> summary(best.model.Wald)$r.squared
[1] 0.8095693
So, the model explains about 80.96% of the variance in y, which is not bad. However, we used the same training data to measure the prediction performance of the model. This often gives a too optimistic result (i.e. an underestimated MSE), since we evaluate the model on data it has already seen. To prevent this problem, one can use a test set, not used during the model fitting process, to measure the prediction performance.
(f)
> library(MASS)  # stepAIC() comes from the MASS package
> best.model.AIC = stepAIC(full.model, direction = "backward")
Start: AIC=-105.01
logcost ~ date + t1 + t2 + cap + pr + ne + ct + bw + cum.n +
pt
Df Sum of Sq RSS AIC
- t1 1 0.00160 0.60603 -106.930
- bw 1 0.00345 0.60788 -106.832
<none> 0.60443 -105.014
- t2 1 0.04284 0.64727 -104.823
- pr 1 0.04826 0.65269 -104.556
- cum.n 1 0.06792 0.67235 -103.607
- ct 1 0.07781 0.68224 -103.140
- pt 1 0.08337 0.68781 -102.879
- date 1 0.19899 0.80343 -97.907
- ne 1 0.30859 0.91302 -93.815
- cap 1 0.68497 1.28940 -82.770
Step: AIC=-106.93
logcost ~ date + t2 + cap + pr + ne + ct + bw + cum.n + pt
Df Sum of Sq RSS AIC
- bw 1 0.00213 0.60816 -108.818
<none> 0.60603 -106.930
- t2 1 0.04135 0.64738 -106.818
- pr 1 0.04680 0.65283 -106.550
- cum.n 1 0.07045 0.67648 -105.411
- ct 1 0.07654 0.68257 -105.124
- pt 1 0.08216 0.68819 -104.862
- ne 1 0.31255 0.91858 -95.621
- date 1 0.54190 1.14793 -88.489
- cap 1 0.68916 1.29518 -84.627
Step: AIC=-108.82
logcost ~ date + t2 + cap + pr + ne + ct + cum.n + pt
Df Sum of Sq RSS AIC
<none> 0.60816 -108.818
- pr 1 0.05738 0.66554 -107.932
- t2 1 0.06379 0.67195 -107.626
- cum.n 1 0.06839 0.67656 -107.407
- ct 1 0.07440 0.68257 -107.124
- pt 1 0.08066 0.68882 -106.832
- ne 1 0.31375 0.92192 -97.505
- date 1 0.54592 1.15408 -90.318
- cap 1 0.68739 1.29556 -86.617
> best.model.BIC = stepAIC(full.model, direction = "backward", k = log(n))
Start: AIC=-88.89
logcost ~ date + t1 + t2 + cap + pr + ne + ct + bw + cum.n +
pt
Df Sum of Sq RSS AIC
- t1 1 0.00160 0.60603 -92.273
- bw 1 0.00345 0.60788 -92.175
- t2 1 0.04284 0.64727 -90.166
- pr 1 0.04826 0.65269 -89.899
- cum.n 1 0.06792 0.67235 -88.949
<none> 0.60443 -88.891
- ct 1 0.07781 0.68224 -88.482
- pt 1 0.08337 0.68781 -88.222
- date 1 0.19899 0.80343 -83.250
- ne 1 0.30859 0.91302 -79.158
- cap 1 0.68497 1.28940 -68.113
Step: AIC=-92.27
logcost ~ date + t2 + cap + pr + ne + ct + bw + cum.n + pt
Df Sum of Sq RSS AIC
- bw 1 0.00213 0.60816 -95.626
- t2 1 0.04135 0.64738 -93.626
- pr 1 0.04680 0.65283 -93.358
<none> 0.60603 -92.273
- cum.n 1 0.07045 0.67648 -92.219
- ct 1 0.07654 0.68257 -91.933
- pt 1 0.08216 0.68819 -91.670
- ne 1 0.31255 0.91858 -82.430
- date 1 0.54190 1.14793 -75.297
- cap 1 0.68916 1.29518 -71.435
Step: AIC=-95.63
logcost ~ date + t2 + cap + pr + ne + ct + cum.n + pt
Df Sum of Sq RSS AIC
- pr 1 0.05738 0.66554 -96.207
- t2 1 0.06379 0.67195 -95.900
- cum.n 1 0.06839 0.67656 -95.681
<none> 0.60816 -95.626
- ct 1 0.07440 0.68257 -95.398
- pt 1 0.08066 0.68882 -95.106
- ne 1 0.31375 0.92192 -85.779
- date 1 0.54592 1.15408 -78.592
- cap 1 0.68739 1.29556 -74.892
Step: AIC=-96.21
logcost ~ date + t2 + cap + ne + ct + cum.n + pt
Df Sum of Sq RSS AIC
- t2 1 0.02447 0.69001 -98.517
- cum.n 1 0.05351 0.71905 -97.198
<none> 0.66554 -96.207
- ct 1 0.10237 0.76791 -95.094
- pt 1 0.12015 0.78570 -94.361
- ne 1 0.28784 0.95339 -88.171
- date 1 0.49109 1.15664 -81.987
- cap 1 0.68019 1.34573 -77.141
Step: AIC=-98.52
logcost ~ date + cap + ne + ct + cum.n + pt
Df Sum of Sq RSS AIC
- cum.n 1 0.06006 0.75007 -99.312
<none> 0.69001 -98.517
- pt 1 0.11719 0.80720 -96.963
- ct 1 0.12931 0.81932 -96.486
- ne 1 0.27215 0.96216 -91.343
- date 1 0.46672 1.15673 -85.450
- cap 1 0.89456 1.58457 -75.379
Step: AIC=-99.31
logcost ~ date + cap + ne + ct + pt
Df Sum of Sq RSS AIC
<none> 0.75007 -99.312
- ct 1 0.09317 0.84324 -99.031
- ne 1 0.21478 0.96485 -94.720
- pt 1 0.37487 1.12494 -89.807
- date 1 0.55668 1.30675 -85.013
- cap 1 0.83451 1.58458 -78.845
>
> summary(best.model.AIC)
Call:
lm(formula = logcost ~ date + t2 + cap + pr + ne + ct + cum.n +
pt, data = Nuclear)
Residuals:
Min 1Q Median 3Q Max
-0.290513 -0.082692 0.008663 0.098259 0.260204
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.169e+01 3.748e+00 -3.118 0.004839 **
date 2.438e-01 5.366e-02 4.544 0.000145 ***
t2 6.018e-03 3.874e-03 1.553 0.134024
cap 8.739e-04 1.714e-04 5.099 3.65e-05 ***
pr -1.099e-01 7.458e-02 -1.473 0.154275
ne 2.611e-01 7.580e-02 3.445 0.002206 **
ct 1.111e-01 6.622e-02 1.677 0.106991
cum.n -1.176e-02 7.311e-03 -1.608 0.121414
pt -2.071e-01 1.186e-01 -1.747 0.094059 .
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Residual standard error: 0.1626 on 23 degrees of freedom
Multiple R-squared: 0.8627, Adjusted R-squared: 0.8149
F-statistic: 18.06 on 8 and 23 DF, p-value: 3.307e-08
> summary(best.model.BIC)
Call:
lm(formula = logcost ~ date + cap + ne + ct + pt, data = Nuclear)
Residuals:
Min 1Q Median 3Q Max
-0.36175 -0.08063 0.00584 0.08594 0.42355
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -5.4058393 2.4567323 -2.200 0.036861 *
date 0.1563992 0.0356036 4.393 0.000167 ***
cap 0.0008674 0.0001613 5.378 1.24e-05 ***
ne 0.1973468 0.0723259 2.729 0.011252 *
ct 0.1154229 0.0642266 1.797 0.083943 .
pt -0.3477717 0.0964752 -3.605 0.001299 **
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Residual standard error: 0.1698 on 26 degrees of freedom
Multiple R-squared: 0.8306, Adjusted R-squared: 0.798
F-statistic: 25.5 on 5 and 26 DF, p-value: 2.958e-09
When we use AIC, we remove t1, bw (in order of removal) from the model that contains all explanatory variables. When we use BIC, we remove t1, bw, pr, t2, cum.n (in order of removal) from the model that contains all explanatory variables.

With the Nuclear data, the BIC penalty per parameter is bigger than that of AIC, since ln(32) = 3.4657 > 2. As a result, BIC penalizes bigger models more severely.
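The penalty comparison can be made concrete. Under the usual conventions AIC = −2 log L + 2p and BIC = −2 log L + p·ln(n), so with n = 32 (as in the Nuclear data) each extra parameter costs ln 32 ≈ 3.47 under BIC versus 2 under AIC (Python for illustration):

```python
import math

n = 32                               # observations in the Nuclear data
aic_cost_per_param = 2.0             # AIC penalty: 2 * p
bic_cost_per_param = math.log(n)     # BIC penalty: ln(n) * p

print(round(bic_cost_per_param, 4))  # 3.4657
print(bic_cost_per_param > aic_cost_per_param)
```

This is exactly why stepAIC with k = log(n) (the BIC run above) keeps removing variables after the k = 2 run has stopped.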
(g)
When stepAIC evaluates all models that drop one variable, we can see that the ranking of those models is identical between AIC and BIC. This is because the penalty term is the number of parameters times a constant, the same across all models; so the penalty is identical for all models of the same dimension (within AIC or within BIC). Thus, the ranking is decided purely by the log-likelihood value, which is the same for AIC and BIC.
(h)
Table 1 shows which variables are (not) in the best model from each model selection framework. We see that the best model from the Wald-test framework has the smallest number of variables. This is because AIC and BIC evaluate the model as a whole (in terms of KL divergence), while the Wald-test framework evaluates the components of the model (i.e. the βj) separately. So, even though a variable has a non-significant p-value, it can happen that this variable still improves the model according to AIC or BIC.

In terms of the order in which variables are removed, there is no difference between the frameworks (for the variables they remove in common). This can, however, differ from application to application.
Table 1: An overview of the variables in the best models from 3 different model selection frameworks. A number indicates in which order that variable was removed from the model; an empty cell means the variable is kept.

Criterion    date  t1  t2  cap  pr  ne  ct  bw  cum.n  pt
Wald test          1   4        3       6   2   5
AIC                1                        2
BIC                1   4        3           2   5
(i)
We know that the moment generating function is defined as M_Y(t) = E[e^{Yt}]. For the normal distribution, the moment generating function is M_Y(t) = exp[μt + σ²t²/2]. Thus, E[e^Z] = M_Z(1) = exp[μ + 0.5σ²].

We are given that θ = E[Z] and η = E[e^Z].

The natural estimator of η = E[e^Z] can be obtained by replacing the expectation with its sample equivalent:

η̂3 = (1/n) Σ_{i=1}^n e^{zi}.

However, since we know E[e^Z] = exp[θ + 0.5σ²], we can estimate η more directly with a plug-in estimator:

η̂2 = exp[θ̂ + 0.5σ̂²].

exp[·] is not a linear operator, so it cannot be pulled out of the expectation. But when a naive statistician does this anyway, he/she gets η = E[e^Z] = e^{E[Z]} = e^θ, and the corresponding plug-in estimator

η̂1 = exp[θ̂],

which is wrong.

We choose η̂2 (or η̂3), since η̂1 is incorrect.
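A quick simulation makes the difference between the three estimators visible (Python; μ and σ are illustrative values roughly on the scale of θ̂ and σ̂ from part (j), not quantities taken from the exercise):

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma = 5.9, 0.17
z = rng.normal(mu, sigma, size=100_000)

eta_true = np.exp(mu + 0.5 * sigma**2)         # E[e^Z] via the normal MGF
eta1 = np.exp(z.mean())                        # naive exp(theta.hat): biased low
eta2 = np.exp(z.mean() + 0.5 * z.var(ddof=1))  # plug-in with the sigma^2 term
eta3 = np.exp(z).mean()                        # sample mean of e^{z_i}

print(eta_true, eta1, eta2, eta3)
```

η̂1 sits systematically below η, while η̂2 and η̂3 land close to it; the gap grows with σ².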
(j)
> d.new = data.frame(date = 70, t1 = 13, t2 = 50, cap = 800, pr = 1, ne = 0, ct = 0,
bw = 1, cum.n = 8, pt = 1)
> theta.hat.Wald = predict(object = best.model.Wald, newdata = d.new)
> theta.hat.AIC = predict(object = best.model.AIC, newdata = d.new)
> theta.hat.BIC = predict(object = best.model.BIC, newdata = d.new)
>
> sigma.hat.Wald = summary(best.model.Wald)$sigma
> sigma.hat.AIC = summary(best.model.AIC)$sigma
> sigma.hat.BIC = summary(best.model.BIC)$sigma
>
> se.theta.hat.Wald = predict(object = best.model.Wald, newdata = d.new, se.fit = TRUE
)$se.fit
> se.theta.hat.AIC = predict(object = best.model.AIC, newdata = d.new, se.fit = TRUE)
$se.fit
> se.theta.hat.BIC = predict(object = best.model.BIC, newdata = d.new, se.fit = TRUE)
$se.fit
>
> eta.hat.Wald = exp(theta.hat.Wald + 0.5*sigma.hat.Wald^2)
> eta.hat.AIC = exp(theta.hat.AIC + 0.5*sigma.hat.AIC^2)
> eta.hat.BIC = exp(theta.hat.BIC + 0.5*sigma.hat.BIC^2)
>
> theta.hat.Wald
1
5.876308
> theta.hat.AIC
1
5.968446
> theta.hat.BIC
1
5.888265
>
> se.theta.hat.Wald
[1] 0.1154351
> se.theta.hat.AIC
[1] 0.1500756
> se.theta.hat.BIC
[1] 0.1111445
>
> sigma.hat.Wald
[1] 0.1767232
> sigma.hat.AIC
[1] 0.1626095
> sigma.hat.BIC
[1] 0.1698493
>
> eta.hat.Wald
1
362.101
> eta.hat.AIC
1
396.1001
> eta.hat.BIC
1
366.0206
>
> #se.eta.hat.Wald
> #se.eta.hat.AIC
> #se.eta.hat.BIC
So,

θ̂_Wald = 5.8763
θ̂_AIC  = 5.9686
θ̂_BIC  = 5.8883

and

η̂_Wald = 362.1010
η̂_AIC  = 396.1001
η̂_BIC  = 366.0206.
NB. Y = X^T β + ε, so θ = E[Y | X] = X^T β and θ̂ = X^T β̂. Since y = X^T β + ε, Var(ŷ) = Var(θ̂) + σ². So, we can easily obtain Var(θ̂).
However,

Var(ψ̂) = Var(θ̂ + ½σ̂²) = Var(θ̂) + ¼Var(σ̂²) + Cov(θ̂, σ̂²)

is difficult to estimate without bootstrapping, since Var(σ̂²) and Cov(θ̂, σ̂²) are hard to obtain. We also cannot use the delta method for Var(η̂), since we do not know Var(σ̂²) and Cov(θ̂, σ̂²).
Oppgave 3
(a)
The variance of the estimated parameters in exercise 2 (j) ignores the uncertainty coming from the model selection itself and from the data. (Law of total variance: Var(Y) = E[Var(Y | X)] + Var(E[Y | X]).)
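The law of total variance can be checked numerically with a toy two-group example (illustrative Python sketch, unrelated to the Nuclear data; all names are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# X picks one of two groups; Y | X is normal with group-specific mean and sd.
x = rng.integers(0, 2, size=n)
means = np.array([0.0, 3.0])
sds = np.array([1.0, 2.0])
y = rng.normal(means[x], sds[x])

# With P(X=0) = P(X=1) = 1/2:
within = 0.5 * (1.0**2 + 2.0**2)                         # E[Var(Y|X)] = 2.5
between = 0.5 * (0.0 - 1.5)**2 + 0.5 * (3.0 - 1.5)**2    # Var(E[Y|X]) = 2.25

total_var = y.var()  # should be close to within + between = 4.75
```

The empirical variance of Y matches the sum of the within-group and between-group terms, which is exactly the decomposition Var(Y) = E[Var(Y|X)] + Var(E[Y|X]).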
(b)
> # We only concentrate on AIC
> theta.hat = theta.hat.AIC
> sigma.hat = sigma.hat.AIC
> psi.hat = theta.hat + 0.5*sigma.hat^2
> eta.hat = eta.hat.AIC
>
> # (b)
> # A function that draws a bootstrap sample.
> # Then, it fits the model with the bootstrap sample.
> # Then, it performs variable selection based on backward AIC.
> # Then, it predicts from the fitted model with given newdata.
> estimate.param.with.given.data.func = function(y.name, data, newdata, result.as.vector = FALSE) {
+ # Full model that contains all variables.
+ full.model.formula = as.formula(paste(y.name, "~.", sep=""))
+ full.model = lm(formula = full.model.formula, data = data)
+ # Perform variable selection based on AIC.
+ best.model.AIC = stepAIC(full.model, direction = "backward", trace = FALSE)
+ # Predict from the final model.
+ theta.hat.obj = predict(object = best.model.AIC, newdata = newdata, se.fit = TRUE)
+
+ y.hat = as.numeric(theta.hat.obj$fit)
+ theta.hat = y.hat
+ sigma.hat = summary(best.model.AIC)$sigma
+ psi.hat = theta.hat + 0.5*sigma.hat^2
+ eta.hat = exp(theta.hat + 0.5*sigma.hat^2)
+
+ if (result.as.vector == FALSE) {
+ # Report the result.
+ result = list(
+ theta.hat = theta.hat,
+ sigma.hat = sigma.hat,
+ psi.hat = psi.hat,
+ eta.hat = as.numeric(eta.hat)
+ )
+ } else if (result.as.vector == TRUE) {
+ result = c(theta.hat, sigma.hat, psi.hat, eta.hat)
+ }
+
+ return(result)
+ }
>
> # Bootstrap estimation
> B = 1000
> theta.star.vec = rep(NA,B)
> sigma.star.vec = rep(NA,B)
> psi.star.vec = rep(NA,B)
> eta.star.vec = rep(NA,B)
> for(b in 1:B) {
+ set.seed(b)
+ # Draw bootstrap sample index
+ boot.ind = sample(x = 1:n, size = n, replace = T)
+
+ # Fit lm with bootstrap sample and predict from the model.
+ lm.result = estimate.param.with.given.data.func(
+ y.name = "logcost",
+ data = Nuclear[boot.ind,],
+ newdata = d.new
+ )
+ theta.star.vec[b] = lm.result$theta.hat
+ sigma.star.vec[b] = lm.result$sigma.hat
+ psi.star.vec[b] = lm.result$psi.hat
+ eta.star.vec[b] = lm.result$eta.hat
+ }
>
(c)
> # (c)
> # Bias of theta.star.
> bias.theta.star = mean(theta.star.vec) - theta.hat
> # Standard error of theta.star.
> se.theta.star = sd(theta.star.vec)
>
> # Bias of sigma.star.
> bias.sigma.star = mean(sigma.star.vec) - sigma.hat
> # Standard error of sigma.star.
> se.sigma.star = sd(sigma.star.vec)
>
> # Bias of psi.star.
> bias.psi.star = mean(psi.star.vec) - psi.hat
> # Standard error of psi.star.
> se.psi.star = sd(psi.star.vec)
>
> # Bias of eta.star.
> bias.eta.star = mean(eta.star.vec) - eta.hat
> # Standard error of eta.star.
> se.eta.star = sd(eta.star.vec)
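The pattern above (bootstrap bias = mean of the replicates minus the original estimate; bootstrap standard error = standard deviation of the replicates) can be sketched in a few lines. This is an illustrative Python version on synthetic data, not the Nuclear data; all names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(loc=5.0, scale=0.5, size=100)  # synthetic "data"

theta_hat = z.mean()  # original estimate (here: the sample mean)

# Nonparametric bootstrap: resample the data with replacement B times
# and recompute the estimator on each resample.
B = 2000
theta_star = np.empty(B)
for b in range(B):
    idx = rng.integers(0, len(z), size=len(z))
    theta_star[b] = z[idx].mean()

bias_theta_star = theta_star.mean() - theta_hat  # bootstrap bias estimate
se_theta_star = theta_star.std(ddof=1)           # bootstrap standard error
```

For the sample mean the bias estimate is close to zero and the standard error is close to s/√n, which is a useful sanity check before applying the same recipe to more complicated statistics.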
>
> bias.theta.star
1
-0.032707
> se.theta.star
[1] 0.3013266
> bias.eta.star
1
2.47609
> se.eta.star
[1] 119.4501
(d)
> alpha = 0.05
> crit.val = qnorm(1 - alpha/2)
i)
> # Method 1 (section 4.1 from the course note): CI based on normal approximation
> CI.theta.norm.boot = c(theta.hat - crit.val*se.theta.star, theta.hat + crit.val*se.theta.star)
> CI.sigma.norm.boot = c(sigma.hat - crit.val*se.sigma.star, sigma.hat + crit.val*se.sigma.star)
> CI.psi.norm.boot = c(psi.hat - crit.val*se.psi.star, psi.hat + crit.val*se.psi.star)
> CI.eta.norm.boot = c(eta.hat - crit.val*se.eta.star, eta.hat + crit.val*se.eta.star)
>
> # Confidence interval based on normal approximation (bias corrected)
> CI.theta.norm.boot.unbiased = CI.theta.norm.boot - bias.theta.star
> CI.sigma.norm.boot.unbiased = CI.sigma.norm.boot - bias.sigma.star
> CI.psi.norm.boot.unbiased = CI.psi.norm.boot - bias.psi.star
> CI.eta.norm.boot.unbiased = CI.eta.norm.boot - bias.eta.star
ii)
> # Method 2 (section 4.2 from the course note): Standard bootstrap CI
> CI.theta.std.boot = theta.hat - quantile(x = theta.star.vec - theta.hat, probs = c(1-alpha/2, alpha/2))
> CI.sigma.std.boot = sigma.hat - quantile(x = sigma.star.vec - sigma.hat, probs = c(1-alpha/2, alpha/2))
> CI.psi.std.boot = psi.hat - quantile(x = psi.star.vec - psi.hat, probs = c(1-alpha/2, alpha/2))
> CI.eta.std.boot = eta.hat - quantile(x = eta.star.vec - eta.hat, probs = c(1-alpha/2, alpha/2))
iii)
> # Method 3 (section 4.3 from the course note): Percentile bootstrap CI
> CI.theta.perc.boot = quantile(theta.star.vec, c(alpha/2, 1 - alpha/2))
> CI.sigma.perc.boot = quantile(sigma.star.vec, c(alpha/2, 1 - alpha/2))
> CI.psi.perc.boot = quantile(psi.star.vec, c(alpha/2, 1 - alpha/2))
> CI.eta.perc.boot = quantile(eta.star.vec, c(alpha/2, 1 - alpha/2))
iv)
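For reference, the adjusted quantile levels computed in the code below are the standard BCa levels, with ẑ0 the bias-correction and â the acceleration estimated from the jackknife:

```latex
\alpha_1 = \Phi\left(\hat z_0 + \frac{\hat z_0 + z_{(\alpha/2)}}{1 - \hat a\left(\hat z_0 + z_{(\alpha/2)}\right)}\right),
\qquad
\alpha_2 = \Phi\left(\hat z_0 + \frac{\hat z_0 + z_{(1-\alpha/2)}}{1 - \hat a\left(\hat z_0 + z_{(1-\alpha/2)}\right)}\right),
```

where z_{(α/2)} = −crit.val and z_{(1−α/2)} = +crit.val, matching the (z0.hat − crit.val) and (z0.hat + crit.val) terms in the code.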
> # Method 4 (section 4.4 from the course note): BC_a bootstrap CI
> # Compute z0
> z0.hat.theta = qnorm(mean(theta.star.vec < theta.hat))
> z0.hat.sigma = qnorm(mean(sigma.star.vec < sigma.hat))
> z0.hat.psi = qnorm(mean(psi.star.vec < psi.hat))
> z0.hat.eta = qnorm(mean(eta.star.vec < eta.hat))
>
> # Compute the estimates by omitting one data point at a time (jackknife).
> theta.hat.without.i = rep(NA, n)
> sigma.hat.without.i = rep(NA, n)
> psi.hat.without.i = rep(NA, n)
> eta.hat.without.i = rep(NA, n)
>
> for(i in 1:n) {
+ lm.result = estimate.param.with.given.data.func(
+ y.name = "logcost",
+ data = Nuclear[-i,],
+ newdata = d.new
+ )
+ theta.hat.without.i[i] = lm.result$theta.hat
+ sigma.hat.without.i[i] = lm.result$sigma.hat
+ psi.hat.without.i[i] = lm.result$psi.hat
+ eta.hat.without.i[i] = lm.result$eta.hat
+ }
>
> # Compute a.hat
> a.hat.theta = sum((mean(theta.hat.without.i) - theta.hat.without.i)^3)/
+ (6*(sum((mean(theta.hat.without.i) - theta.hat.without.i)^2)^1.5))
> a.hat.sigma = sum((mean(sigma.hat.without.i) - sigma.hat.without.i)^3)/
+ (6*(sum((mean(sigma.hat.without.i) - sigma.hat.without.i)^2)^1.5))
> a.hat.psi = sum((mean(psi.hat.without.i) - psi.hat.without.i)^3)/
+ (6*(sum((mean(psi.hat.without.i) - psi.hat.without.i)^2)^1.5))
> a.hat.eta = sum((mean(eta.hat.without.i) - eta.hat.without.i)^3)/
+ (6*(sum((mean(eta.hat.without.i) - eta.hat.without.i)^2)^1.5))
>
> # Compute alpha
> alpha.1.theta = pnorm(q = z0.hat.theta + (z0.hat.theta - crit.val)/(1 - a.hat.theta*(z0.hat.theta - crit.val)))
> alpha.2.theta = pnorm(q = z0.hat.theta + (z0.hat.theta + crit.val)/(1 - a.hat.theta*(z0.hat.theta + crit.val)))
>
> alpha.1.sigma = pnorm(q = z0.hat.sigma + (z0.hat.sigma - crit.val)/(1 - a.hat.sigma*(z0.hat.sigma - crit.val)))
> alpha.2.sigma = pnorm(q = z0.hat.sigma + (z0.hat.sigma + crit.val)/(1 - a.hat.sigma*(z0.hat.sigma + crit.val)))
>
> alpha.1.psi = pnorm(q = z0.hat.psi + (z0.hat.psi - crit.val)/(1 - a.hat.psi*(z0.hat.psi - crit.val)))
> alpha.2.psi = pnorm(q = z0.hat.psi + (z0.hat.psi + crit.val)/(1 - a.hat.psi*(z0.hat.psi + crit.val)))
>
> alpha.1.eta = pnorm(q = z0.hat.eta + (z0.hat.eta - crit.val)/(1 - a.hat.eta*(z0.hat.eta - crit.val)))
> alpha.2.eta = pnorm(q = z0.hat.eta + (z0.hat.eta + crit.val)/(1 - a.hat.eta*(z0.hat.eta + crit.val)))
>
> CI.theta.BCa.boot = quantile(x = theta.star.vec, probs = c(alpha.1.theta, alpha.2.theta))
> CI.sigma.BCa.boot = quantile(x = sigma.star.vec, probs = c(alpha.1.sigma, alpha.2.sigma))
> CI.psi.BCa.boot = quantile(x = psi.star.vec, probs = c(alpha.1.psi, alpha.2.psi))
> CI.eta.BCa.boot = quantile(x = eta.star.vec, probs = c(alpha.1.eta, alpha.2.eta))
>
>
> # Report the result
> CI.result.mat = rbind(c(CI.theta.norm.boot, CI.sigma.norm.boot, CI.eta.norm.boot),
+                       c(CI.theta.norm.boot.unbiased, CI.sigma.norm.boot.unbiased, CI.eta.norm.boot.unbiased),
+                       c(CI.theta.std.boot, CI.sigma.std.boot, CI.eta.std.boot),
+                       c(CI.theta.perc.boot, CI.sigma.perc.boot, CI.eta.perc.boot),
+                       c(CI.theta.BCa.boot, CI.sigma.BCa.boot, CI.eta.BCa.boot))
> rownames(CI.result.mat) = c("Normal","Normal.with.bias.correc","Std","Perce","BCa")
> colnames(CI.result.mat) = c("theta.low","theta.upp","sigma.low","sigma.upp","eta.low","eta.upp")
> # Test for invariance under a monotone transformation function.
> CI.result.mat = cbind(
+   CI.result.mat,
+   exp(rbind(CI.psi.norm.boot, CI.psi.norm.boot.unbiased, CI.psi.std.boot, CI.psi.perc.boot, CI.psi.BCa.boot))
+ )
> colnames(CI.result.mat)[(length(colnames(CI.result.mat))-1):length(colnames(CI.result.mat))] = c("eta.trans.low","eta.trans.upp")
> round(CI.result.mat, 2)
                        theta.low theta.upp sigma.low sigma.upp eta.low eta.upp eta.trans.low eta.trans.upp
Normal                       5.38      6.56      0.12      0.21  161.98  630.22        219.46        714.93
Normal.with.bias.correc      5.41      6.59      0.15      0.25  159.51  627.74        227.93        742.52
Std                          5.46      6.64      0.16      0.25  134.30  590.70        238.48        778.64
Perce                        5.30      6.48      0.07      0.17  201.50  657.90        201.50        657.90
BCa                          5.34      6.52      0.16      0.18  202.37  658.28        209.73        684.04
>
> # By using boot package.
> library(boot)
> boot.func = function(data, ind) {
+   estimate.param.with.given.data.func(y.name = "logcost", data = data[ind,], newdata = d.new, result.as.vector = TRUE)
+ }
>
> boot.obj = boot(data = Nuclear, statistic = boot.func, R = 1000)
> show(boot.obj)
ORDINARY NONPARAMETRIC BOOTSTRAP
Call:
boot(data = Nuclear, statistic = boot.func, R = 1000)
Bootstrap Statistics :
original bias std. error
t1* 5.9684461 -0.03006729 0.30151863
t2* 0.1626095 -0.03768358 0.02519219
t3* 5.9816670 -0.03516797 0.30158719
t4* 396.1001200 3.72001956 122.06135759
>
> boot.ci.theta = boot.ci(boot.obj, index = 1, type = c("norm","basic","perc", "bca"))
> boot.ci.sigma = boot.ci(boot.obj, index = 2, type = c("norm","basic","perc", "bca"))
Warning message:
In norm.inter(t, adj.alpha) : extreme order statistics used as endpoints
> boot.ci.psi = boot.ci(boot.obj, index = 3, type = c("norm","basic","perc", "bca"))
> boot.ci.eta = boot.ci(boot.obj, index = 4, type = c("norm","basic","perc", "bca"))
>
> boot.ci.theta
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 1000 bootstrap replicates
CALL :
boot.ci(boot.out = boot.obj, type = c("norm", "basic", "perc",
"bca"), index = 1)
Intervals :
Level Normal Basic
95% ( 5.408, 6.589 ) ( 5.458, 6.655 )
Level Percentile BCa
95% ( 5.282, 6.479 ) ( 5.290, 6.510 )
Calculations and Intervals on Original Scale
> boot.ci.sigma
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 1000 bootstrap replicates
CALL :
boot.ci(boot.out = boot.obj, type = c("norm", "basic", "perc",
"bca"), index = 2)
Intervals :
Level Normal Basic
95% ( 0.1509, 0.2497 ) ( 0.1570, 0.2527 )
Level Percentile BCa
95% ( 0.0726, 0.1682 ) ( 0.1542, 0.2000 )
Calculations and Intervals on Original Scale
Warning : BCa Intervals used Extreme Quantiles
Some BCa intervals may be unstable
> boot.ci.psi
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 1000 bootstrap replicates
CALL :
boot.ci(boot.out = boot.obj, type = c("norm", "basic", "perc",
"bca"), index = 3)
Intervals :
Level Normal Basic
95% ( 5.426, 6.608 ) ( 5.478, 6.678 )
Level Percentile BCa
95% ( 5.285, 6.485 ) ( 5.318, 6.525 )
Calculations and Intervals on Original Scale
> boot.ci.eta
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 1000 bootstrap replicates
CALL :
boot.ci(boot.out = boot.obj, type = c("norm", "basic", "perc",
"bca"), index = 4)
Intervals :
Level Normal Basic
95% (153.1, 631.6 ) (136.7, 594.8 )
Level Percentile BCa
95% (197.4, 655.5 ) (204.9, 683.3 )
Calculations and Intervals on Original Scale
>
> # Test invariance
> boot.ci.eta.trans = rbind(
+ boot.ci.psi$normal[2:3],
+ boot.ci.psi$basic[4:5],
+ boot.ci.psi$percent[4:5],
+ boot.ci.psi$bca[4:5])
> boot.ci.eta.trans = exp(boot.ci.eta.trans)
> rownames(boot.ci.eta.trans) = c("Normal.with.bias.correc","Std","Perce","BCa")
> colnames(boot.ci.eta.trans) = c("eta.trans.low","eta.trans.upp")
> boot.ci.eta.trans
eta.trans.low eta.trans.upp
Normal.with.bias.correc 227.1782 740.9514
Std 239.3514 794.9614
Perce 197.3622 655.5019
BCa 203.9552 681.6720
(e)
From the results in (d), we can see that the percentile bootstrap confidence interval is invariant under monotone transformations (its endpoints for η coincide with exp applied to its endpoints for ψ), and the BCa interval is nearly so, while the normal-approximation and standard bootstrap intervals are not.
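The exact invariance of the percentile interval can be checked directly: for a monotone increasing transformation such as exp, the percentile CI of exp(ψ*) equals exp applied to the percentile CI of ψ*. A small self-contained Python check (stand-in replicates, not the bootstrap output above):

```python
import numpy as np

rng = np.random.default_rng(42)
# n = 1001 makes the 2.5% and 97.5% quantiles land exactly on order
# statistics (0.025 * (n-1) = 25), so the equality below is exact
# rather than only approximate under quantile interpolation.
psi_star = rng.normal(loc=6.0, scale=0.3, size=1001)

alpha = 0.05
# Percentile CI on the psi scale, then transformed with exp.
ci_psi = np.quantile(psi_star, [alpha / 2, 1 - alpha / 2])
ci_eta_from_psi = np.exp(ci_psi)

# Percentile CI computed directly on the eta = exp(psi) scale.
ci_eta_direct = np.quantile(np.exp(psi_star), [alpha / 2, 1 - alpha / 2])
```

Because exp is monotone increasing, it preserves the ordering of the replicates, so picking a quantile and then transforming gives the same endpoints as transforming and then picking the quantile.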