Solutions to obligatorisk oppgave 1, STK2100
Vinnie Ko
May 14, 2018
Disclaimer: This document is made solely for my own personal use and can contain many errors.
Oppgave 1
(a)
ci can be seen as a categorical variable that indicates the j-th category, where 1 ≤ j ≤ K. xi,j can then be seen as a dummy variable, defined as

xi,j = 1j(ci) :=  1 if ci = j,
                  0 if ci ≠ j.
This way of “dummy coding” results in the following mapping:
                  xi,1   xi,2   · · ·   xi,K−1   xi,K
When ci = 1        1      0     · · ·     0       0
When ci = 2        0      1     · · ·     0       0
  ...
When ci = K − 1    0      0     · · ·     1       0
When ci = K        0      0     · · ·     0       1
The given models
Yi = β0 + β2xi,2 + · · ·+ βKxi,K + εi (1)
and
Yi = α1xi,1 + · · ·+ αKxi,K + εi (2)
correspond to 2 different ways of using this “dummy coding”.
For each model, we can see that:
                  Under equation (1):     Under equation (2):
                  Yi                      Yi
When ci = 1       β0 + εi                 α1 + εi
When ci = 2       β0 + β2 + εi            α2 + εi
  ...
When ci = K − 1   β0 + βK−1 + εi          αK−1 + εi
When ci = K       β0 + βK + εi            αK + εi
So, if we set β1 = 0 and αj = β0 + βj for j ∈ {1, · · · , K}, then model (1) and model (2) are the same.
Interpretation:
αj = the average value of {Yi | ci = j} in the data.
βj = the difference between the average of {Yi | ci = 1} (the reference group) and the average of {Yi | ci = j}.
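The dummy coding above is easy to sketch numerically. The snippet below (Python/NumPy purely for illustration; the exercises themselves use R) builds the matrix of xi,j from a vector of categories ci:

```python
import numpy as np

c = np.array([1, 1, 2, 3, 3, 3])   # categories c_i, with K = 3
K = 3

# x_{i,j} = 1_j(c_i): 1 when c_i = j, 0 otherwise
X = (c[:, None] == np.arange(1, K + 1)[None, :]).astype(float)
print(X)
```

Each row contains exactly one 1, in the column of its category, which is the mapping shown in the table above.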
(b)
The design matrix for model (2):

    [ x1,1  · · ·  x1,j  · · ·  x1,K ]
    [  ...          ...          ... ]
X = [ xi,1  · · ·  xi,j  · · ·  xi,K ]
    [  ...          ...          ... ]
    [ xn,1  · · ·  xn,j  · · ·  xn,K ]
X^T X =

[ x1,1  · · ·  xl,1  · · ·  xn,1 ]   [ x1,1  · · ·  x1,j  · · ·  x1,K ]
[  ...          ...          ... ]   [  ...          ...          ... ]
[ x1,j  · · ·  xl,j  · · ·  xn,j ] · [ xl,1  · · ·  xl,j  · · ·  xl,K ]
[  ...          ...          ... ]   [  ...          ...          ... ]
[ x1,K  · · ·  xl,K  · · ·  xn,K ]   [ xn,1  · · ·  xn,j  · · ·  xn,K ]

  [ Σ_{l=1}^n xl,1 xl,1   · · ·   Σ_{l=1}^n xl,1 xl,j   · · ·   Σ_{l=1}^n xl,1 xl,K ]
  [          ...                          ...                           ...          ]
= [ Σ_{l=1}^n xl,j xl,1   · · ·   Σ_{l=1}^n xl,j xl,j   · · ·   Σ_{l=1}^n xl,j xl,K ]
  [          ...                          ...                           ...          ]
  [ Σ_{l=1}^n xl,K xl,1   · · ·   Σ_{l=1}^n xl,K xl,j   · · ·   Σ_{l=1}^n xl,K xl,K ]
We can see that

(X^T X)i,j = Σ_{l=1}^n xl,i xl,j =  Σ_{l=1}^n 1j(cl)  if i = j,
                                    0                 if i ≠ j.

So, X^T X is a diagonal matrix with diagonal elements (X^T X)j,j = Σ_{l=1}^n 1j(cl).
X^T y =

[ x1,1  · · ·  xl,1  · · ·  xn,1 ]   [ y1  ]   [ Σ_{l=1}^n xl,1 yl ]   [ Σ_{l:cl=1} yl ]
[  ...          ...          ... ]   [ ... ]   [         ...        ]   [      ...      ]
[ x1,j  · · ·  xl,j  · · ·  xn,j ] · [ yl  ] = [ Σ_{l=1}^n xl,j yl ] = [ Σ_{l:cl=j} yl ]
[  ...          ...          ... ]   [ ... ]   [         ...        ]   [      ...      ]
[ x1,K  · · ·  xl,K  · · ·  xn,K ]   [ yn  ]   [ Σ_{l=1}^n xl,K yl ]   [ Σ_{l:cl=K} yl ]

where the last equality uses xl,j = 1j(cl).
Now, we derive the least squares estimator of α:

RSS = Σ_{i=1}^n (yi − Σ_{j=1}^K xi,j αj)²
    = ‖y − Xα‖²
    = (y − Xα)^T (y − Xα).

This leads us to:

α̂ = argmin over α of (y − Xα)^T (y − Xα).

Differentiate:

∂RSS/∂α = ∂(y^T y − y^T Xα − α^T X^T y + α^T X^T X α)/∂α
        = 0 − X^T y − X^T y + (X^T X + (X^T X)^T) α
        = −2 X^T y + 2 X^T X α.

This first derivative should equal 0. So,

−2 X^T y + 2 X^T X α = 0
X^T X α = X^T y
α̂ = (X^T X)^{−1} X^T y.
Therefore, the least squares estimate of α is:

α̂ = (X^T X)^{−1} X^T y =

[ Σ_{l:cl=1} yl / Σ_{l=1}^n 1_1(cl) ]
[                ...                ]
[ Σ_{l:cl=j} yl / Σ_{l=1}^n 1_j(cl) ]
[                ...                ]
[ Σ_{l:cl=K} yl / Σ_{l=1}^n 1_K(cl) ].

We can easily see that α̂j is the sample mean of {yi | ci = j}.
X^T X is a diagonal matrix, so X^T X is invertible if and only if all diagonal entries are non-zero. This means that we can obtain α̂ only if

Σ_{l=1}^n 1j(cl) ≠ 0 for every j ∈ {1, · · · , K}.

In other words, to compute α̂, we need at least one observation for every possible value of the categorical variable ci.
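As a numeric check of this result, the sketch below (Python/NumPy for illustration, not part of the R coursework) verifies that solving the normal equations on a dummy-coded design reproduces the group means, and that X^T X is the diagonal matrix of group counts:

```python
import numpy as np

rng = np.random.default_rng(0)
K = 4
c = np.repeat(np.arange(1, K + 1), 10)        # 10 observations per category
y = 25.0 + 2.0 * c + rng.normal(0.0, 1.0, c.size)

# Dummy-coded design matrix: x_{i,j} = 1 when c_i = j
X = (c[:, None] == np.arange(1, K + 1)[None, :]).astype(float)

XtX = X.T @ X                                  # diagonal matrix of group counts
alpha_hat = np.linalg.solve(XtX, X.T @ y)      # (X^T X)^{-1} X^T y
group_means = np.array([y[c == j].mean() for j in range(1, K + 1)])

print(np.allclose(alpha_hat, group_means))     # the estimator equals the group means
```

Because the dummy columns are disjoint, the normal equations decouple into one equation per group, which is why the OLS solution is exactly the vector of group means.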
(c)
β̂j =  α̂1        if j = 0,
      α̂j − α̂1   if j ∈ {1, · · · , K}.     (3)
α̂ is the least squares estimate of α, obtained via the least squares principle. And we know that there is a one-to-one mapping (3) between α and β. Therefore, the β̂ obtained through this one-to-one mapping is also the least squares estimate of β.
(d)
The given alternative model:
Yi = γ0 + γ1xi,1 + · · ·+ γKxi,K + εi (4)
This way of “dummy coding” results in the following mapping:
                  Under equation (1):    Under equation (2):    Under equation (4):
                  Yi                     Yi                     Yi
When ci = 1       β0 + εi                α1 + εi                γ0 + γ1 + εi
When ci = 2       β0 + β2 + εi           α2 + εi                γ0 + γ2 + εi
  ...
When ci = K − 1   β0 + βK−1 + εi         αK−1 + εi              γ0 + γK−1 + εi
When ci = K       β0 + βK + εi           αK + εi                γ0 + γK + εi
We examine how γ is related to α and β.
From

K·γ0 + Σ_{j=1}^K γj = Σ_{j=1}^K αj   and   Σ_{j=1}^K γj = 0,

we obtain

γj =  (1/K) Σ_{l=1}^K αl         if j = 0,
      αj − (1/K) Σ_{l=1}^K αl    if j ∈ {1, · · · , K}.
From

K·β0 + Σ_{j=2}^K βj = K·γ0 + Σ_{j=1}^K γj   and   Σ_{j=1}^K γj = 0,

we obtain

γj =  β0 + (1/K) Σ_{l=2}^K βl    if j = 0,
      βj − (1/K) Σ_{l=2}^K βl    if j ∈ {1, · · · , K}.
Interpretation of γ:
γj = the difference between the average of {Yi | ci = j} and the overall average Ȳ, for j ∈ {1, · · · , K}.
γ0 = Ȳ (the groups here are balanced, so the unweighted mean of the group means equals the overall mean).
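These relations can be checked against the numbers this data set actually produces: taking the fitted group means α̂ reported by the R fit in part (e) below, the sum-to-zero parameters follow directly (Python for illustration):

```python
import numpy as np

alpha_hat = np.array([26.08, 24.69, 29.95, 33.84])  # group means from part (e)

gamma0 = alpha_hat.mean()      # gamma_0 = (1/K) * sum_j alpha_j
gamma = alpha_hat - gamma0     # gamma_j = alpha_j - gamma_0

print(round(gamma0, 2))        # 28.64, the intercept under contr.sum
print(np.round(gamma, 2))      # deviations; compare with the fit3 output in (f)
```

The last deviation, 5.20, is exactly the "hidden" coefficient for form 4 recovered via predict() in part (f).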
(e)
> # Read data.
> Fe = read.table("http://www.uio.no/studier/emner/matnat/math/STK2100/data/fe.txt",
header=T, sep=",")
> Fe[,"form"] = as.factor(Fe[,"form"])
>
> # Model 2 (alpha’s)
> fit1 = lm(Fe~form+0,data=Fe)
> summary(fit1)
Call:
lm(formula = Fe ~ form + 0, data = Fe)
Residuals:
Min 1Q Median 3Q Max
-8.340 -1.255 -0.250 1.770 10.360
Coefficients:
Estimate Std. Error t value Pr(>|t|)
form1 26.080 1.251 20.85 <2e-16 ***
form2 24.690 1.251 19.74 <2e-16 ***
form3 29.950 1.251 23.95 <2e-16 ***
form4 33.840 1.251 27.06 <2e-16 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Residual standard error: 3.955 on 36 degrees of freedom
Multiple R-squared: 0.9834, Adjusted R-squared: 0.9815
F-statistic: 532.5 on 4 and 36 DF, p-value: < 2.2e-16
> logLik(fit1)
’log Lik.’ -109.6503 (df=5)
This model corresponds to model (2), i.e. the parameterization without an intercept (α0 = 0).
(f)
> # Model 1 (beta’s)
> fit2 = lm(Fe~form,data=Fe)
> summary(fit2)
Call:
lm(formula = Fe ~ form, data = Fe)
Residuals:
Min 1Q Median 3Q Max
-8.340 -1.255 -0.250 1.770 10.360
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 26.080 1.251 20.852 < 2e-16 ***
form2 -1.390 1.769 -0.786 0.4371
form3 3.870 1.769 2.188 0.0352 *
form4 7.760 1.769 4.387 9.6e-05 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Residual standard error: 3.955 on 36 degrees of freedom
Multiple R-squared: 0.4748, Adjusted R-squared: 0.431
F-statistic: 10.85 on 3 and 36 DF, p-value: 3.199e-05
> logLik(fit2)
’log Lik.’ -109.6503 (df=5)
This model corresponds to model (1) where β1 = 0.
> # Model 3 (gamma’s)
> options(contrasts=c("contr.sum","contr.sum"))
> options()$contrasts
[1] "contr.sum" "contr.sum"
> fit3 = lm(Fe~form,data=Fe)
> summary(fit3)
Call:
lm(formula = Fe ~ form, data = Fe)
Residuals:
Min 1Q Median 3Q Max
-8.340 -1.255 -0.250 1.770 10.360
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 28.6400 0.6254 45.798 < 2e-16 ***
form1 -2.5600 1.0831 -2.363 0.023622 *
form2 -3.9500 1.0831 -3.647 0.000833 ***
form3 1.3100 1.0831 1.209 0.234375
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Residual standard error: 3.955 on 36 degrees of freedom
Multiple R-squared: 0.4748, Adjusted R-squared: 0.431
F-statistic: 10.85 on 3 and 36 DF, p-value: 3.199e-05
> logLik(fit3)
’log Lik.’ -109.6503 (df=5)
>
> # The ’hidden’ coefficient of model 3
> coefficients(fit3)
(Intercept) form1 form2 form3
28.64 -2.56 -3.95 1.31
> new.data = data.frame(form = 4)
> new.data[,"form"] = as.factor(new.data[,"form"])
> new.data
form
1 4
> predict(fit3,new.data) - coefficients(fit3)[1]
1
5.2
This model corresponds to model (4), where Σ_{j=1}^K γj = 0.
Estimates of 3 models we obtained:
Model 1:
Yi = β0 + β1xi,1 + β2xi,2 + β3xi,3 + β4xi,4 + εi
Yi = 26.1 + 0 · xi,1 − 1.39 · xi,2 + 3.87 · xi,3 + 7.76 · xi,4 + εi
Model 2:
Yi = α1xi,1 + α2xi,2 + α3xi,3 + α4xi,4 + εi
Yi = 26.08 · xi,1 + 24.69 · xi,2 + 29.95 · xi,3 + 33.84 · xi,4 + εi
Model 3:
Yi = γ0 + γ1xi,1 + γ2xi,2 + γ3xi,3 + γ4xi,4 + εi
Yi = 28.64− 2.56 · xi,1 − 3.95 · xi,2 + 1.31 · xi,3 + 5.20 · xi,4 + εi
(g)
This test is called the 'model utility test', and the corresponding hypotheses are:
- Model (1): H0 : β2 = β3 = β4 = 0 vs. H1 : at least one βj ≠ 0.
- Model (2): H0 : α1 = α2 = α3 = α4 = 0 vs. H1 : at least one αj ≠ 0.
- Model (4): H0 : γ1 = γ2 = γ3 = 0 vs. H1 : at least one γj ≠ 0.
The F-test of all 3 models tells us to reject the null hypothesis. So, there is some form of difference between the 4 iron content categories in terms of their effect on the response variable.
If we want to check whether there is a difference between 2 specific iron content categories, we can use model (1) for hypothesis testing. This is because βj can be interpreted as the mean difference between the reference category and the compared category.
Note that the coefficients of model (1) only show the mean differences between type 1 and the other types, because type 1 is the reference category in the model. If we, for example, wish to compare type 2 and type 3, we can refit model (1) with type 2 or type 3 as the reference category.
(h)
From the t-test of the coefficients of model (1), we can see that there is no significant difference between the type 1 and type 2 iron contents. So, we can consider merging them into one type. This will result in a simpler model with only 3 categories.
Oppgave 2
(a)
> # Load nuclear data
> Nuclear = read.table("http://www.uio.no/studier/emner/matnat/math/STK2100/data/
nuclear.dat", header=T)
> n = nrow(Nuclear)
> Nuclear[,"logcost"] = log(Nuclear[,"cost"])
> Nuclear = Nuclear[,-which(names(Nuclear) %in% c("cost"))]
> head(Nuclear)
date t1 t2 cap pr ne ct bw cum.n pt logcost
1 68.58 14 46 687 0 1 0 0 14 0 6.131335
2 67.33 10 73 1065 0 0 1 0 1 0 6.115870
3 67.33 10 85 1065 1 0 1 0 1 0 6.094066
4 68.00 11 67 1065 0 1 1 0 12 0 6.480535
5 68.00 11 78 1065 1 1 1 0 12 0 6.464946
6 67.92 13 51 514 0 1 1 0 3 0 5.844674
>
> # Plot the scatterplot matrix to get an overview of the data.
> pairs(Nuclear[,-which(names(Nuclear) %in% c("pr","ne","ct","bw","pt"))])
>
> # Plot boxplots for categorical variables
> boxplot(logcost ~ pr, data = Nuclear, xlab = "pr", ylab = "logcost")
> boxplot(logcost ~ ne, data = Nuclear, xlab = "ne", ylab = "logcost")
> boxplot(logcost ~ ct, data = Nuclear, xlab = "ct", ylab = "logcost")
> boxplot(logcost ~ bw, data = Nuclear, xlab = "bw", ylab = "logcost")
> boxplot(logcost ~ pt, data = Nuclear, xlab = "pt", ylab = "logcost")
>
> cor(Nuclear)
date t1 t2 cap pr ne
date 1.00000000 0.85785460 -0.40398529 0.019629178 -0.05481837 0.097498341
t1 0.85785460 1.00000000 -0.47429212 -0.093526193 0.05081973 0.087038828
t2 -0.40398529 -0.47429212 1.00000000 0.313031186 0.44316550 -0.155187617
cap 0.01962918 -0.09352619 0.31303119 1.000000000 0.16070299 -0.006582732
pr -0.05481837 0.05081973 0.44316550 0.160702986 1.00000000 -0.077849894
ne 0.09749834 0.08703883 -0.15518762 -0.006582732 -0.07784989 1.000000000
ct -0.04560687 -0.12949678 0.18735426 0.028720001 -0.14585425 0.110207754
bw -0.16004369 -0.37417353 0.35802537 0.112013371 0.02159168 -0.092450033
cum.n 0.54940746 0.39965783 -0.22767928 0.193393590 -0.04666996 0.205620967
pt -0.50697195 -0.39831375 0.17803447 0.007195490 0.19432508 -0.277350098
logcost 0.62910712 0.45430974 -0.03622449 0.443081827 -0.10717916 0.385821416
ct bw cum.n pt logcost
date -0.04560687 -0.16004369 0.54940746 -0.50697195 0.62910712
t1 -0.12949678 -0.37417353 0.39965783 -0.39831375 0.45430974
t2 0.18735426 0.35802537 -0.22767928 0.17803447 -0.03622449
cap 0.02872000 0.11201337 0.19339359 0.00719549 0.44308183
pr -0.14585425 0.02159168 -0.04666996 0.19432508 -0.10717916
ne 0.11020775 -0.09245003 0.20562097 -0.27735010 0.38582142
ct 1.00000000 -0.07132097 0.04181119 -0.23434034 0.25653912
bw -0.07132097 1.00000000 0.19036439 0.38461538 -0.14159736
cum.n 0.04181119 0.19036439 1.00000000 0.06184835 0.27147634
pt -0.23434034 0.38461538 0.06184835 1.00000000 -0.67419415
logcost 0.25653912 -0.14159736 0.27147634 -0.67419415 1.00000000
[Figure: scatterplot-matrix panels (date, t1, t2, cap, cum.n, logcost); plot residue omitted.]
Figure 1: Scatterplot matrix of Nuclear dataset.
[Figure: boxplots of logcost against pr, ne, ct, bw and pt; plot residue omitted.]
Figure 2: Boxplots of Nuclear dataset.
(b)
In linear regression, we assume that

ε1, · · · , εn  i.i.d. ~ N(0, σ²).

The assumptions we make, in order of importance:
- εi follows a normal distribution
- ε1, · · · , εn are independent
- Var[εi] = σ² (constant variance)
- E[εi] = 0
> # Model with all predictors
> full.model = lm(logcost~.-cost, data = Nuclear)
> summary(full.model)
Call:
lm(formula = logcost ~ . - cost, data = Nuclear)
Residuals:
Min 1Q Median 3Q Max
-0.284032 -0.081677 0.009502 0.090890 0.266548
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.063e+01 5.710e+00 -1.862 0.07662 .
date 2.276e-01 8.656e-02 2.629 0.01567 *
t1 5.252e-03 2.230e-02 0.236 0.81610
t2 5.606e-03 4.595e-03 1.220 0.23599
cap 8.837e-04 1.811e-04 4.878 7.99e-05 ***
pr -1.081e-01 8.351e-02 -1.295 0.20943
ne 2.595e-01 7.925e-02 3.274 0.00362 **
ct 1.155e-01 7.027e-02 1.644 0.11503
bw 3.680e-02 1.063e-01 0.346 0.73261
cum.n -1.203e-02 7.828e-03 -1.536 0.13944
pt -2.220e-01 1.304e-01 -1.702 0.10352
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Residual standard error: 0.1697 on 21 degrees of freedom
Multiple R-squared: 0.8635, Adjusted R-squared: 0.7985
F-statistic: 13.28 on 10 and 21 DF, p-value: 5.717e-07
(c)
> # Remove the variable with the highest p-value.
> model.without.t1 = lm(logcost~.-cost-t1, data = Nuclear)
> summary(model.without.t1)
Call:
lm(formula = logcost ~ . - cost - t1, data = Nuclear)
Residuals:
Min 1Q Median 3Q Max
-0.28898 -0.07856 0.01272 0.08983 0.26537
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.161e+01 3.835e+00 -3.027 0.006187 **
date 2.431e-01 5.482e-02 4.435 0.000208 ***
t2 5.451e-03 4.449e-03 1.225 0.233451
cap 8.778e-04 1.755e-04 5.002 5.25e-05 ***
pr -1.035e-01 7.944e-02 -1.303 0.205922
ne 2.607e-01 7.738e-02 3.368 0.002772 **
ct 1.142e-01 6.853e-02 1.667 0.109715
bw 2.622e-02 9.423e-02 0.278 0.783401
cum.n -1.220e-02 7.626e-03 -1.599 0.124034
pt -2.157e-01 1.249e-01 -1.727 0.098181 .
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Residual standard error: 0.166 on 22 degrees of freedom
Multiple R-squared: 0.8631, Adjusted R-squared: 0.8072
F-statistic: 15.42 on 9 and 22 DF, p-value: 1.424e-07
It’s sensible to remove the variable with the highest p-value since, according to the Wald test, this variable has the lowest chance of having a significant (linear) relationship with logcost.
Compared to the model with all explanatory variables, the model without t1 has somewhat changed p-values. This is logical, because the same variables now appear in two different contexts in the two models. In general, one would expect the explanatory variables that were more strongly correlated with t1 to show the bigger changes in their p-values (usually decreased p-values), since these variables now take over the role of t1.
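The procedure described here — repeatedly dropping the predictor with the largest p-value — can be sketched compactly. The snippet below is a NumPy-only illustration (not the R function used in the next part): it ranks predictors by |t|-statistic, which orders them identically to ranking by p-value, and uses |t| ≥ 2 as a stand-in for the 5% threshold.

```python
import numpy as np

def backward_eliminate(X, y, names, t_crit=2.0):
    # Repeatedly drop the predictor with the smallest |t|-statistic
    # (equivalently, the largest p-value) until all remaining |t| >= t_crit.
    # Column 0 is assumed to be the intercept and is never dropped.
    keep = list(range(X.shape[1]))
    while True:
        Xk = X[:, keep]
        n, p = Xk.shape
        XtX_inv = np.linalg.inv(Xk.T @ Xk)
        beta = XtX_inv @ Xk.T @ y
        resid = y - Xk @ beta
        sigma2 = resid @ resid / (n - p)          # residual variance estimate
        se = np.sqrt(sigma2 * np.diag(XtX_inv))   # coefficient standard errors
        t = np.abs(beta / se)
        if len(keep) == 1:                        # only the intercept left
            return [names[i] for i in keep], beta
        cand = int(np.argmin(t[1:])) + 1          # weakest non-intercept term
        if t[cand] >= t_crit:
            return [names[i] for i in keep], beta
        del keep[cand]

rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n)] + [rng.normal(size=n) for _ in range(3)])
y = 1.0 + 3.0 * X[:, 1] + rng.normal(size=n)      # only x1 truly matters
kept, beta = backward_eliminate(X, y, ["intercept", "x1", "x2", "x3"])
print(kept)
```

With this setup the strong predictor survives the elimination; the R function in part (d) implements the same idea via summary()$coefficients.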
(d)
# This function fits lm() on the given data.
# It uses y.name as the response variable and all other variables as explanatory variables.
# It removes the variable with the highest p-value (according to the Wald test) from the model,
# and repeats this process until all variables left in the model have significant p-values.
p.value.backward.selection.func = function(y.name, data) {
  # Full model that contains all variables.
  full.model.formula = as.formula(paste(y.name, "~.", sep=""))
  full.model = lm(formula = full.model.formula, data = data)
  # Starting values
  itr = 0
  n.not.signif = sum(summary(full.model)$coefficients[,"Pr(>|t|)"] >= 0.05)
  trimmed.model = full.model
  # Make a frame
  trimmed.model.list = list()
  trimmed.model.list[[1]] = full.model
  # A loop to remove all variables with a non-significant p-value.
  while (n.not.signif > 0) {
    # Iteration
    itr = itr + 1
    # cat("Itr = ", itr, "\n", sep = "")
    # Extract coefficients from the model.
    coeff.result.mat = summary(trimmed.model)$coefficients
    # Remove intercept.
    coeff.result.mat = coeff.result.mat[-1, ]
    # Order such that the highest p-value comes on top.
    coeff.result.mat = coeff.result.mat[order(-coeff.result.mat[, "Pr(>|t|)"]), ]
    # Write down which variable should be removed.
    if (itr == 1) {
      variables.to.remove = rownames(coeff.result.mat)[1]
    } else {
      variables.to.remove = c(variables.to.remove, rownames(coeff.result.mat)[1])
    }
    # Fit the model without the variable with the highest p-value.
    trimmed.model.formula = as.formula(paste(y.name, "~.",
      paste("-", variables.to.remove, sep="", collapse=""), sep=""))
    trimmed.model = lm(trimmed.model.formula, data = data)
    # Save the model (offset by one, so the full model stays at position 1).
    trimmed.model.list[[itr + 1]] = trimmed.model
    # Check whether trimmed.model contains a non-significant p-value.
    coeff.result.mat = summary(trimmed.model)$coefficients
    # Remove intercept.
    coeff.result.mat = coeff.result.mat[-1, ]
    # Order such that the highest p-value comes on top.
    coeff.result.mat = coeff.result.mat[order(-coeff.result.mat[, "Pr(>|t|)"]), ]
    # Update n.not.signif
    n.not.signif = sum(coeff.result.mat[,"Pr(>|t|)"] >= 0.05)
    # When all non-significant variables are removed.
    if (n.not.signif == 0) {
      final.model = trimmed.model
      cat("Process done: all variables with a non-significant p-value are removed (itr = ", itr, ").", sep="", "\n")
      cat("The following variables are removed during the process:", "\n",
          paste(variables.to.remove, collapse=", "), "\n")
      cat("\n")
      cat("The summary of the resulting model:", "\n")
      # Print summary of the final model.
      print(summary(trimmed.model))
    }
  }
  result = list(final.model = final.model, full.model = full.model,
                variables.removed = variables.to.remove,
                all.models = trimmed.model.list, itr = itr)
  return(result)
}
>
>
> source("./Oblig 1/func p value backward selection.R")
> # Keep removing non-significant variables until only significant variables are left.
> backward.selection.result = p.value.backward.selection.func(y.name = "logcost", data = Nuclear)
Process done: all variables with a non-significant p-value are removed (itr = 6).
The following variables are removed during the process:
t1, bw, pr, t2, cum.n, ct
The summary of the resulting model:
Call:
lm(formula = trimmed.model.formula, data = data)
Residuals:
Min 1Q Median 3Q Max
-0.42160 -0.10554 -0.00070 0.07247 0.37328
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -4.5035539 2.5022087 -1.800 0.083072 .
date 0.1439104 0.0363320 3.961 0.000491 ***
cap 0.0008783 0.0001677 5.238 1.61e-05 ***
ne 0.2024364 0.0751953 2.692 0.012042 *
pt -0.3964878 0.0963356 -4.116 0.000326 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Residual standard error: 0.1767 on 27 degrees of freedom
Multiple R-squared: 0.8096, Adjusted R-squared: 0.7814
F-statistic: 28.7 on 4 and 27 DF, p-value: 2.255e-09
>
> # The model that contains only variables with significant p-values.
> final.model = backward.selection.result$final.model
>
> # Diagnostic plot of the model
> plot(final.model)
[Figure: diagnostic plots (Residuals vs Fitted, Normal Q−Q, Scale−Location, Residuals vs Leverage); observations 7, 10, 19 and 26 are flagged; plot residue omitted.]
Figure 3: Diagnostic plot of the final model that contains only significant variables.
According to the F-test, at least one βj ≠ 0. The diagnostic plots of the model suggest that there is a slight non-linearity in the upper quantile of the fitted values. However, this non-linearity is driven by 2 data points; we can say more about it once we have more data.
(e)
> # (e)
> # y.hat, computed on the training data (no separate test set is used here).
> y.hat = predict(final.model, newdata = Nuclear)
> # mean squared error
> MSE = 1/nrow(Nuclear)*sum((Nuclear[,"logcost"] - y.hat)^2)
> MSE
[1] 0.02635124
We have MSE = 0.0264, but this doesn't tell much about how good the model is. Instead, we compute

R² = Var(ŷ)/Var(y) = 1 − Σ_{i=1}^n (yi − ŷi)² / Σ_{i=1}^n (yi − ȳ)² = 1 − n·MSE / ((n − 1)·Var(y)).
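The equivalence between the two expressions is easy to verify numerically (a Python sketch with simulated data; it only relies on n·MSE = RSS and (n − 1)·Var(y) = TSS):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
y_hat = X @ beta

mse = np.mean((y - y_hat) ** 2)                        # n * MSE = RSS
r2_from_mse = 1 - n * mse / ((n - 1) * np.var(y, ddof=1))
r2_direct = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

print(np.isclose(r2_from_mse, r2_direct))              # identical by algebra
```

This mirrors "Method 1" versus the direct definition in the R transcript below.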
> # R-squared
> # Method 1
> 1 - n*MSE/((n-1)*var(Nuclear[,"logcost"]))
[1] 0.8095693
> # Method 2
> var(y.hat)/var(Nuclear[,"logcost"])
[1] 0.8095693
> # Method 3
> summary(best.model.Wald)$r.squared
[1] 0.8095693
So, the model explains about 80.96% of the variance in y, which is not bad. However, we used the same training data to measure the prediction performance of the model. This often gives a too optimistic result (i.e. an underestimated MSE), since we evaluate the model on data it has already seen. To prevent this problem, one can use a test set, not used during the model fitting process, to measure the prediction performance.
(f)
> library(MASS)  # stepAIC() comes from the MASS package
> best.model.AIC = stepAIC(full.model, direction = "backward")
Start: AIC=-105.01
logcost ~ date + t1 + t2 + cap + pr + ne + ct + bw + cum.n +
pt
Df Sum of Sq RSS AIC
- t1 1 0.00160 0.60603 -106.930
- bw 1 0.00345 0.60788 -106.832
<none> 0.60443 -105.014
- t2 1 0.04284 0.64727 -104.823
- pr 1 0.04826 0.65269 -104.556
- cum.n 1 0.06792 0.67235 -103.607
- ct 1 0.07781 0.68224 -103.140
- pt 1 0.08337 0.68781 -102.879
- date 1 0.19899 0.80343 -97.907
- ne 1 0.30859 0.91302 -93.815
- cap 1 0.68497 1.28940 -82.770
Step: AIC=-106.93
logcost ~ date + t2 + cap + pr + ne + ct + bw + cum.n + pt
Df Sum of Sq RSS AIC
- bw 1 0.00213 0.60816 -108.818
<none> 0.60603 -106.930
- t2 1 0.04135 0.64738 -106.818
- pr 1 0.04680 0.65283 -106.550
- cum.n 1 0.07045 0.67648 -105.411
- ct 1 0.07654 0.68257 -105.124
- pt 1 0.08216 0.68819 -104.862
- ne 1 0.31255 0.91858 -95.621
- date 1 0.54190 1.14793 -88.489
- cap 1 0.68916 1.29518 -84.627
Step: AIC=-108.82
logcost ~ date + t2 + cap + pr + ne + ct + cum.n + pt
Df Sum of Sq RSS AIC
<none> 0.60816 -108.818
- pr 1 0.05738 0.66554 -107.932
- t2 1 0.06379 0.67195 -107.626
- cum.n 1 0.06839 0.67656 -107.407
- ct 1 0.07440 0.68257 -107.124
- pt 1 0.08066 0.68882 -106.832
- ne 1 0.31375 0.92192 -97.505
- date 1 0.54592 1.15408 -90.318
- cap 1 0.68739 1.29556 -86.617
> best.model.BIC = stepAIC(full.model, direction = "backward", k = log(n))
Start: AIC=-88.89
logcost ~ date + t1 + t2 + cap + pr + ne + ct + bw + cum.n +
pt
Df Sum of Sq RSS AIC
- t1 1 0.00160 0.60603 -92.273
- bw 1 0.00345 0.60788 -92.175
- t2 1 0.04284 0.64727 -90.166
- pr 1 0.04826 0.65269 -89.899
- cum.n 1 0.06792 0.67235 -88.949
<none> 0.60443 -88.891
- ct 1 0.07781 0.68224 -88.482
- pt 1 0.08337 0.68781 -88.222
- date 1 0.19899 0.80343 -83.250
- ne 1 0.30859 0.91302 -79.158
- cap 1 0.68497 1.28940 -68.113
Step: AIC=-92.27
logcost ~ date + t2 + cap + pr + ne + ct + bw + cum.n + pt
Df Sum of Sq RSS AIC
- bw 1 0.00213 0.60816 -95.626
- t2 1 0.04135 0.64738 -93.626
- pr 1 0.04680 0.65283 -93.358
<none> 0.60603 -92.273
- cum.n 1 0.07045 0.67648 -92.219
- ct 1 0.07654 0.68257 -91.933
- pt 1 0.08216 0.68819 -91.670
- ne 1 0.31255 0.91858 -82.430
- date 1 0.54190 1.14793 -75.297
- cap 1 0.68916 1.29518 -71.435
Step: AIC=-95.63
logcost ~ date + t2 + cap + pr + ne + ct + cum.n + pt
Df Sum of Sq RSS AIC
- pr 1 0.05738 0.66554 -96.207
- t2 1 0.06379 0.67195 -95.900
- cum.n 1 0.06839 0.67656 -95.681
<none> 0.60816 -95.626
- ct 1 0.07440 0.68257 -95.398
- pt 1 0.08066 0.68882 -95.106
- ne 1 0.31375 0.92192 -85.779
- date 1 0.54592 1.15408 -78.592
- cap 1 0.68739 1.29556 -74.892
Step: AIC=-96.21
logcost ~ date + t2 + cap + ne + ct + cum.n + pt
Df Sum of Sq RSS AIC
- t2 1 0.02447 0.69001 -98.517
- cum.n 1 0.05351 0.71905 -97.198
<none> 0.66554 -96.207
- ct 1 0.10237 0.76791 -95.094
- pt 1 0.12015 0.78570 -94.361
- ne 1 0.28784 0.95339 -88.171
- date 1 0.49109 1.15664 -81.987
- cap 1 0.68019 1.34573 -77.141
Step: AIC=-98.52
logcost ~ date + cap + ne + ct + cum.n + pt
Df Sum of Sq RSS AIC
- cum.n 1 0.06006 0.75007 -99.312
<none> 0.69001 -98.517
- pt 1 0.11719 0.80720 -96.963
- ct 1 0.12931 0.81932 -96.486
- ne 1 0.27215 0.96216 -91.343
- date 1 0.46672 1.15673 -85.450
- cap 1 0.89456 1.58457 -75.379
Step: AIC=-99.31
logcost ~ date + cap + ne + ct + pt
Df Sum of Sq RSS AIC
<none> 0.75007 -99.312
- ct 1 0.09317 0.84324 -99.031
- ne 1 0.21478 0.96485 -94.720
- pt 1 0.37487 1.12494 -89.807
- date 1 0.55668 1.30675 -85.013
- cap 1 0.83451 1.58458 -78.845
>
> summary(best.model.AIC)
Call:
lm(formula = logcost ~ date + t2 + cap + pr + ne + ct + cum.n +
pt, data = Nuclear)
Residuals:
Min 1Q Median 3Q Max
-0.290513 -0.082692 0.008663 0.098259 0.260204
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.169e+01 3.748e+00 -3.118 0.004839 **
date 2.438e-01 5.366e-02 4.544 0.000145 ***
t2 6.018e-03 3.874e-03 1.553 0.134024
cap 8.739e-04 1.714e-04 5.099 3.65e-05 ***
pr -1.099e-01 7.458e-02 -1.473 0.154275
ne 2.611e-01 7.580e-02 3.445 0.002206 **
ct 1.111e-01 6.622e-02 1.677 0.106991
cum.n -1.176e-02 7.311e-03 -1.608 0.121414
pt -2.071e-01 1.186e-01 -1.747 0.094059 .
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Residual standard error: 0.1626 on 23 degrees of freedom
Multiple R-squared: 0.8627, Adjusted R-squared: 0.8149
F-statistic: 18.06 on 8 and 23 DF, p-value: 3.307e-08
> summary(best.model.BIC)
Call:
lm(formula = logcost ~ date + cap + ne + ct + pt, data = Nuclear)
Residuals:
Min 1Q Median 3Q Max
-0.36175 -0.08063 0.00584 0.08594 0.42355
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -5.4058393 2.4567323 -2.200 0.036861 *
date 0.1563992 0.0356036 4.393 0.000167 ***
cap 0.0008674 0.0001613 5.378 1.24e-05 ***
ne 0.1973468 0.0723259 2.729 0.011252 *
ct 0.1154229 0.0642266 1.797 0.083943 .
pt -0.3477717 0.0964752 -3.605 0.001299 **
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Residual standard error: 0.1698 on 26 degrees of freedom
Multiple R-squared: 0.8306, Adjusted R-squared: 0.798
F-statistic: 25.5 on 5 and 26 DF, p-value: 2.958e-09
When we use AIC, we remove t1, bw (in order of removal) from the model that contains all explanatory variables. When we use BIC, we remove t1, bw, pr, t2, cum.n (in order of removal) from the model that contains all explanatory variables.

With the Nuclear data, the BIC penalty per parameter is bigger than that of AIC, since ln(32) = 3.4657 > 2. As a result, BIC penalizes bigger models more severely.
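The penalty comparison can be made concrete. Under the usual conventions AIC = −2 log L + 2p and BIC = −2 log L + p·ln(n), so with n = 32 (as in the Nuclear data) each extra parameter costs ln 32 ≈ 3.47 under BIC versus 2 under AIC (Python for illustration):

```python
import math

n = 32                               # observations in the Nuclear data
aic_cost_per_param = 2.0             # AIC penalty: 2 * p
bic_cost_per_param = math.log(n)     # BIC penalty: ln(n) * p

print(round(bic_cost_per_param, 4))  # 3.4657
print(bic_cost_per_param > aic_cost_per_param)
```

This is exactly why stepAIC with k = log(n) (the BIC run above) keeps removing variables after the k = 2 run has stopped.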
(g)
When stepAIC evaluates all models that drop one variable, we can see that the ranking of those models is identical between AIC and BIC. This is because the penalty term is the number of parameters times a constant, the same across all models; so the penalty is identical for all models of the same dimension (within AIC or within BIC). Thus, the ranking is decided purely by the log-likelihood value, which is the same for AIC and BIC.
(h)
Table 1 shows which variables are (not) in the best model from each model selection framework. We see that the best model from the Wald-test framework has the smallest number of variables. This is because AIC and BIC evaluate the model as a whole (in terms of KL divergence), while the Wald-test framework evaluates the components of the model (i.e. the βj) separately. So, even though a variable has a non-significant p-value, it can happen that this variable still improves the model according to AIC or BIC.

In terms of the order in which variables are removed, there is no difference between the frameworks (for the variables they remove in common). This can, however, differ from application to application.
Table 1: An overview of the variables in the best models from 3 different model selection frameworks. A number indicates in which order that variable was removed from the model; an empty cell means the variable is kept.

Criterion    date  t1  t2  cap  pr  ne  ct  bw  cum.n  pt
Wald test          1   4        3       6   2   5
AIC                1                        2
BIC                1   4        3           2   5
(i)
We know that the moment generating function is defined as M_Y(t) = E[e^{Yt}]. For the normal distribution, the moment generating function is M_Y(t) = exp[μt + σ²t²/2]. Thus, E[e^Z] = M_Z(1) = exp[μ + 0.5σ²].

We are given that θ = E[Z] and η = E[e^Z].

The natural estimator of η = E[e^Z] can be obtained by replacing the expectation with its sample equivalent:

η̂3 = (1/n) Σ_{i=1}^n e^{zi}.

However, since we know E[e^Z] = exp[θ + 0.5σ²], we can estimate η more directly with a plug-in estimator:

η̂2 = exp[θ̂ + 0.5σ̂²].

exp[·] is not a linear operator, so it cannot be pulled out of the expectation. But when a naive statistician does this anyway, he/she gets η = E[e^Z] = e^{E[Z]} = e^θ, and the corresponding plug-in estimator

η̂1 = exp[θ̂],

which is wrong.

We choose η̂2 (or η̂3), since η̂1 is incorrect.
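A quick simulation makes the difference between the three estimators visible (Python; μ and σ are illustrative values roughly on the scale of θ̂ and σ̂ from part (j), not quantities taken from the exercise):

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma = 5.9, 0.17
z = rng.normal(mu, sigma, size=100_000)

eta_true = np.exp(mu + 0.5 * sigma**2)         # E[e^Z] via the normal MGF
eta1 = np.exp(z.mean())                        # naive exp(theta.hat): biased low
eta2 = np.exp(z.mean() + 0.5 * z.var(ddof=1))  # plug-in with the sigma^2 term
eta3 = np.exp(z).mean()                        # sample mean of e^{z_i}

print(eta_true, eta1, eta2, eta3)
```

η̂1 sits systematically below η, while η̂2 and η̂3 land close to it; the gap grows with σ².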
(j)
> d.new = data.frame(date = 70, t1 = 13, t2 = 50, cap = 800, pr = 1, ne = 0, ct = 0,
bw = 1, cum.n = 8, pt = 1)
> theta.hat.Wald = predict(object = best.model.Wald, newdata = d.new)
> theta.hat.AIC = predict(object = best.model.AIC, newdata = d.new)
> theta.hat.BIC = predict(object = best.model.BIC, newdata = d.new)
>
> sigma.hat.Wald = summary(best.model.Wald)$sigma
> sigma.hat.AIC = summary(best.model.AIC)$sigma
> sigma.hat.BIC = summary(best.model.BIC)$sigma
>
> se.theta.hat.Wald = predict(object = best.model.Wald, newdata = d.new, se.fit = TRUE
)$se.fit
> se.theta.hat.AIC = predict(object = best.model.AIC, newdata = d.new, se.fit = TRUE)
$se.fit
> se.theta.hat.BIC = predict(object = best.model.BIC, newdata = d.new, se.fit = TRUE)
$se.fit
>
> eta.hat.Wald = exp(theta.hat.Wald + 0.5*sigma.hat.Wald^2)
> eta.hat.AIC = exp(theta.hat.AIC + 0.5*sigma.hat.AIC^2)
> eta.hat.BIC = exp(theta.hat.BIC + 0.5*sigma.hat.BIC^2)
>
> theta.hat.Wald
1
5.876308
> theta.hat.AIC
1
5.968446
> theta.hat.BIC
1
5.888265
>
> se.theta.hat.Wald
[1] 0.1154351
> se.theta.hat.AIC
[1] 0.1500756
> se.theta.hat.BIC
[1] 0.1111445
>
> sigma.hat.Wald
[1] 0.1767232
> sigma.hat.AIC
[1] 0.1626095
> sigma.hat.BIC
[1] 0.1698493
>
> eta.hat.Wald
1
362.101
> eta.hat.AIC
1
396.1001
> eta.hat.BIC
1
366.0206
>
> #se.eta.hat.Wald
> #se.eta.hat.AIC
> #se.eta.hat.BIC
So,

θ̂_Wald = 5.8763
θ̂_AIC  = 5.9686
θ̂_BIC  = 5.8883

and

η̂_Wald = 362.1010
η̂_AIC  = 396.1001
η̂_BIC  = 366.0206.
NB. Y = X^T β + ε, so θ = E[Y | X] = X^T β and θ̂ = X^T β̂. Since y = X^T β + ε, Var(ŷ) = Var(θ̂) + σ². So, we can easily obtain Var(θ̂).
However,

Var(ψ̂) = Var(θ̂ + ½σ̂²) = Var(θ̂) + ¼Var(σ̂²) + Cov(θ̂, σ̂²)

is difficult to estimate without bootstrapping, since Var(σ̂²) and Cov(θ̂, σ̂²) are hard to obtain. We also cannot use the delta method for Var(η̂), since we do not know Var(σ̂²) and Cov(θ̂, σ̂²).
Oppgave 3
(a)
The variance of the estimated parameters in exercise 2 (j) ignores the uncertainty coming from the model selection itself and from the data. (Law of total variance: Var(Y) = E[Var(Y | X)] + Var(E[Y | X]).)
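The law of total variance can be checked numerically with a toy two-group example (illustrative Python sketch, unrelated to the Nuclear data; all names are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# X picks one of two groups; Y | X is normal with group-specific mean and sd.
x = rng.integers(0, 2, size=n)
means = np.array([0.0, 3.0])
sds = np.array([1.0, 2.0])
y = rng.normal(means[x], sds[x])

# With P(X=0) = P(X=1) = 1/2:
within = 0.5 * (1.0**2 + 2.0**2)                         # E[Var(Y|X)] = 2.5
between = 0.5 * (0.0 - 1.5)**2 + 0.5 * (3.0 - 1.5)**2    # Var(E[Y|X]) = 2.25

total_var = y.var()  # should be close to within + between = 4.75
```

The empirical variance of Y matches the sum of the within-group and between-group terms, which is exactly the decomposition Var(Y) = E[Var(Y|X)] + Var(E[Y|X]).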
(b)
> # We only concentrate on AIC
> theta.hat = theta.hat.AIC
> sigma.hat = sigma.hat.AIC
> psi.hat = theta.hat + 0.5*sigma.hat^2
> eta.hat = eta.hat.AIC
>
> # (b)
> # A function that draws a bootstrap sample.
> # Then, it fits the model with the bootstrap sample.
> # Then, it performs variable selection based on backward AIC.
> # Then, it predicts from the fitted model with given newdata.
> estimate.param.with.given.data.func = function(y.name, data, newdata, result.as.vector = FALSE) {
+ # Full model that contains all variables.
+ full.model.formula = as.formula(paste(y.name, "~.", sep=""))
+ full.model = lm(formula = full.model.formula, data = data)
+ # Perform variable selection based on AIC.
+ best.model.AIC = stepAIC(full.model, direction = "backward", trace = FALSE)
+ # Predict from the final model.
+ theta.hat.obj = predict(object = best.model.AIC, newdata = newdata, se.fit = TRUE)
+
+ y.hat = as.numeric(theta.hat.obj$fit)
+ theta.hat = y.hat
+ sigma.hat = summary(best.model.AIC)$sigma
+ psi.hat = theta.hat + 0.5*sigma.hat^2
+ eta.hat = exp(theta.hat + 0.5*sigma.hat^2)
+
+ if (result.as.vector == FALSE) {
+ # Report the result.
+ result = list(
+ theta.hat = theta.hat,
+ sigma.hat = sigma.hat,
+ psi.hat = psi.hat,
+ eta.hat = as.numeric(eta.hat)
+ )
+ } else if (result.as.vector == TRUE) {
+ result = c(theta.hat, sigma.hat, psi.hat, eta.hat)
+ }
+
+ return(result)
+ }
>
> # Bootstrap estimation
> B = 1000
> theta.star.vec = rep(NA,B)
> sigma.star.vec = rep(NA,B)
> psi.star.vec = rep(NA,B)
> eta.star.vec = rep(NA,B)
> for(b in 1:B) {
+ set.seed(b)
+ # Draw bootstrap sample index
+ boot.ind = sample(x = 1:n, size = n, replace = T)
+
+ # Fit lm with bootstrap sample and predict from the model.
+ lm.result = estimate.param.with.given.data.func(
+ y.name = "logcost",
+ data = Nuclear[boot.ind,],
+ newdata = d.new
+ )
+ theta.star.vec[b] = lm.result$theta.hat
+ sigma.star.vec[b] = lm.result$sigma.hat
+ psi.star.vec[b] = lm.result$psi.hat
+ eta.star.vec[b] = lm.result$eta.hat
+ }
>
(c)
> # (c)
> # Bias of theta.star.
> bias.theta.star = mean(theta.star.vec) - theta.hat
> # Standard error of theta.star.
> se.theta.star = sd(theta.star.vec)
>
> # Bias of sigma.star.
> bias.sigma.star = mean(sigma.star.vec) - sigma.hat
> # Standard error of sigma.star.
> se.sigma.star = sd(sigma.star.vec)
>
> # Bias of psi.star.
> bias.psi.star = mean(psi.star.vec) - psi.hat
> # Standard error of psi.star.
> se.psi.star = sd(psi.star.vec)
>
> # Bias of eta.star.
> bias.eta.star = mean(eta.star.vec) - eta.hat
> # Standard error of eta.star.
> se.eta.star = sd(eta.star.vec)
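The pattern above (bootstrap bias = mean of the replicates minus the original estimate; bootstrap standard error = standard deviation of the replicates) can be sketched in a few lines. This is an illustrative Python version on synthetic data, not the Nuclear data; all names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(loc=5.0, scale=0.5, size=100)  # synthetic "data"

theta_hat = z.mean()  # original estimate (here: the sample mean)

# Nonparametric bootstrap: resample the data with replacement B times
# and recompute the estimator on each resample.
B = 2000
theta_star = np.empty(B)
for b in range(B):
    idx = rng.integers(0, len(z), size=len(z))
    theta_star[b] = z[idx].mean()

bias_theta_star = theta_star.mean() - theta_hat  # bootstrap bias estimate
se_theta_star = theta_star.std(ddof=1)           # bootstrap standard error
```

For the sample mean the bias estimate is close to zero and the standard error is close to s/√n, which is a useful sanity check before applying the same recipe to more complicated statistics.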
>
> bias.theta.star
1
-0.032707
> se.theta.star
[1] 0.3013266
> bias.eta.star
1
2.47609
> se.eta.star
[1] 119.4501
(d)
> alpha = 0.05
> crit.val = qnorm(1 - alpha/2)
i)
> # Method 1 (section 4.1 from the course note): CI based on normal approximation
> CI.theta.norm.boot = c(theta.hat - crit.val*se.theta.star, theta.hat + crit.val*se.theta.star)
> CI.sigma.norm.boot = c(sigma.hat - crit.val*se.sigma.star, sigma.hat + crit.val*se.sigma.star)
> CI.psi.norm.boot = c(psi.hat - crit.val*se.psi.star, psi.hat + crit.val*se.psi.star)
> CI.eta.norm.boot = c(eta.hat - crit.val*se.eta.star, eta.hat + crit.val*se.eta.star)
>
> # Confidence interval based on normal approximation (bias corrected)
> CI.theta.norm.boot.unbiased = CI.theta.norm.boot - bias.theta.star
> CI.sigma.norm.boot.unbiased = CI.sigma.norm.boot - bias.sigma.star
> CI.psi.norm.boot.unbiased = CI.psi.norm.boot - bias.psi.star
> CI.eta.norm.boot.unbiased = CI.eta.norm.boot - bias.eta.star
ii)
> # Method 2 (section 4.2 from the course note): Standard bootstrap CI
> CI.theta.std.boot = theta.hat - quantile(x = theta.star.vec - theta.hat, probs = c(1-alpha/2, alpha/2))
> CI.sigma.std.boot = sigma.hat - quantile(x = sigma.star.vec - sigma.hat, probs = c(1-alpha/2, alpha/2))
> CI.psi.std.boot = psi.hat - quantile(x = psi.star.vec - psi.hat, probs = c(1-alpha/2, alpha/2))
> CI.eta.std.boot = eta.hat - quantile(x = eta.star.vec - eta.hat, probs = c(1-alpha/2, alpha/2))
iii)
> # Method 3 (section 4.3 from the course note): Percentile bootstrap CI
> CI.theta.perc.boot = quantile(theta.star.vec, c(alpha/2, 1 - alpha/2))
> CI.sigma.perc.boot = quantile(sigma.star.vec, c(alpha/2, 1 - alpha/2))
> CI.psi.perc.boot = quantile(psi.star.vec, c(alpha/2, 1 - alpha/2))
> CI.eta.perc.boot = quantile(eta.star.vec, c(alpha/2, 1 - alpha/2))
iv)
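For reference, the adjusted quantile levels computed in the code below are the standard BCa levels, with ẑ0 the bias-correction and â the acceleration estimated from the jackknife:

```latex
\alpha_1 = \Phi\left(\hat z_0 + \frac{\hat z_0 + z_{(\alpha/2)}}{1 - \hat a\left(\hat z_0 + z_{(\alpha/2)}\right)}\right),
\qquad
\alpha_2 = \Phi\left(\hat z_0 + \frac{\hat z_0 + z_{(1-\alpha/2)}}{1 - \hat a\left(\hat z_0 + z_{(1-\alpha/2)}\right)}\right),
```

where z_{(α/2)} = −crit.val and z_{(1−α/2)} = +crit.val, matching the (z0.hat − crit.val) and (z0.hat + crit.val) terms in the code.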
> # Method 4 (section 4.4 from the course note): BC_a bootstrap CI
> # Compute z0
> z0.hat.theta = qnorm(mean(theta.star.vec < theta.hat))
> z0.hat.sigma = qnorm(mean(sigma.star.vec < sigma.hat))
> z0.hat.psi = qnorm(mean(psi.star.vec < psi.hat))
> z0.hat.eta = qnorm(mean(eta.star.vec < eta.hat))
>
> # Compute the estimates by omitting one data point at a time (jackknife).
> theta.hat.without.i = rep(NA, n)
> sigma.hat.without.i = rep(NA, n)
> psi.hat.without.i = rep(NA, n)
> eta.hat.without.i = rep(NA, n)
>
> for(i in 1:n) {
+ lm.result = estimate.param.with.given.data.func(
+ y.name = "logcost",
+ data = Nuclear[-i,],
+ newdata = d.new
+ )
+ theta.hat.without.i[i] = lm.result$theta.hat
+ sigma.hat.without.i[i] = lm.result$sigma.hat
+ psi.hat.without.i[i] = lm.result$psi.hat
+ eta.hat.without.i[i] = lm.result$eta.hat
+ }
>
> # Compute a.hat
> a.hat.theta = sum((mean(theta.hat.without.i) - theta.hat.without.i)^3)/
+ (6*(sum((mean(theta.hat.without.i) - theta.hat.without.i)^2)^1.5))
> a.hat.sigma = sum((mean(sigma.hat.without.i) - sigma.hat.without.i)^3)/
+ (6*(sum((mean(sigma.hat.without.i) - sigma.hat.without.i)^2)^1.5))
> a.hat.psi = sum((mean(psi.hat.without.i) - psi.hat.without.i)^3)/
+ (6*(sum((mean(psi.hat.without.i) - psi.hat.without.i)^2)^1.5))
> a.hat.eta = sum((mean(eta.hat.without.i) - eta.hat.without.i)^3)/
+ (6*(sum((mean(eta.hat.without.i) - eta.hat.without.i)^2)^1.5))
>
> # Compute alpha
> alpha.1.theta = pnorm(q = z0.hat.theta + (z0.hat.theta - crit.val)/(1 - a.hat.theta*(z0.hat.theta - crit.val)))
> alpha.2.theta = pnorm(q = z0.hat.theta + (z0.hat.theta + crit.val)/(1 - a.hat.theta*(z0.hat.theta + crit.val)))
>
> alpha.1.sigma = pnorm(q = z0.hat.sigma + (z0.hat.sigma - crit.val)/(1 - a.hat.sigma*(z0.hat.sigma - crit.val)))
> alpha.2.sigma = pnorm(q = z0.hat.sigma + (z0.hat.sigma + crit.val)/(1 - a.hat.sigma*(z0.hat.sigma + crit.val)))
>
> alpha.1.psi = pnorm(q = z0.hat.psi + (z0.hat.psi - crit.val)/(1 - a.hat.psi*(z0.hat.psi - crit.val)))
> alpha.2.psi = pnorm(q = z0.hat.psi + (z0.hat.psi + crit.val)/(1 - a.hat.psi*(z0.hat.psi + crit.val)))
>
> alpha.1.eta = pnorm(q = z0.hat.eta + (z0.hat.eta - crit.val)/(1 - a.hat.eta*(z0.hat.eta - crit.val)))
> alpha.2.eta = pnorm(q = z0.hat.eta + (z0.hat.eta + crit.val)/(1 - a.hat.eta*(z0.hat.eta + crit.val)))
>
> CI.theta.BCa.boot = quantile(x = theta.star.vec, probs = c(alpha.1.theta, alpha.2.theta))
> CI.sigma.BCa.boot = quantile(x = sigma.star.vec, probs = c(alpha.1.sigma, alpha.2.sigma))
> CI.psi.BCa.boot = quantile(x = psi.star.vec, probs = c(alpha.1.psi, alpha.2.psi))
> CI.eta.BCa.boot = quantile(x = eta.star.vec, probs = c(alpha.1.eta, alpha.2.eta))
>
>
> # Report the result
> CI.result.mat = rbind(c(CI.theta.norm.boot, CI.sigma.norm.boot, CI.eta.norm.boot),
+                       c(CI.theta.norm.boot.unbiased, CI.sigma.norm.boot.unbiased, CI.eta.norm.boot.unbiased),
+                       c(CI.theta.std.boot, CI.sigma.std.boot, CI.eta.std.boot),
+                       c(CI.theta.perc.boot, CI.sigma.perc.boot, CI.eta.perc.boot),
+                       c(CI.theta.BCa.boot, CI.sigma.BCa.boot, CI.eta.BCa.boot))
> rownames(CI.result.mat) = c("Normal","Normal.with.bias.correc","Std","Perce","BCa")
> colnames(CI.result.mat) = c("theta.low","theta.upp","sigma.low","sigma.upp","eta.low","eta.upp")
> # Test for invariance under a monotone transformation function.
> CI.result.mat = cbind(
+   CI.result.mat,
+   exp(rbind(CI.psi.norm.boot, CI.psi.norm.boot.unbiased, CI.psi.std.boot, CI.psi.perc.boot, CI.psi.BCa.boot))
+ )
> colnames(CI.result.mat)[(length(colnames(CI.result.mat))-1):length(colnames(CI.result.mat))] = c("eta.trans.low","eta.trans.upp")
> round(CI.result.mat, 2)
                        theta.low theta.upp sigma.low sigma.upp eta.low eta.upp eta.trans.low eta.trans.upp
Normal                       5.38      6.56      0.12      0.21  161.98  630.22        219.46        714.93
Normal.with.bias.correc      5.41      6.59      0.15      0.25  159.51  627.74        227.93        742.52
Std                          5.46      6.64      0.16      0.25  134.30  590.70        238.48        778.64
Perce                        5.30      6.48      0.07      0.17  201.50  657.90        201.50        657.90
BCa                          5.34      6.52      0.16      0.18  202.37  658.28        209.73        684.04
>
> # By using boot package.
> library(boot)
> boot.func = function(data, ind) {
+   estimate.param.with.given.data.func(y.name = "logcost", data = data[ind,], newdata = d.new, result.as.vector = TRUE)
+ }
>
> boot.obj = boot(data = Nuclear, statistic = boot.func, R = 1000)
> show(boot.obj)
ORDINARY NONPARAMETRIC BOOTSTRAP
Call:
boot(data = Nuclear, statistic = boot.func, R = 1000)
Bootstrap Statistics :
original bias std. error
t1* 5.9684461 -0.03006729 0.30151863
t2* 0.1626095 -0.03768358 0.02519219
t3* 5.9816670 -0.03516797 0.30158719
t4* 396.1001200 3.72001956 122.06135759
>
> boot.ci.theta = boot.ci(boot.obj, index = 1, type = c("norm","basic","perc", "bca"))
> boot.ci.sigma = boot.ci(boot.obj, index = 2, type = c("norm","basic","perc", "bca"))
Warning message:
In norm.inter(t, adj.alpha) : extreme order statistics used as endpoints
> boot.ci.psi = boot.ci(boot.obj, index = 3, type = c("norm","basic","perc", "bca"))
> boot.ci.eta = boot.ci(boot.obj, index = 4, type = c("norm","basic","perc", "bca"))
>
> boot.ci.theta
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 1000 bootstrap replicates
CALL :
boot.ci(boot.out = boot.obj, type = c("norm", "basic", "perc",
"bca"), index = 1)
Intervals :
Level Normal Basic
95% ( 5.408, 6.589 ) ( 5.458, 6.655 )
Level Percentile BCa
95% ( 5.282, 6.479 ) ( 5.290, 6.510 )
Calculations and Intervals on Original Scale
> boot.ci.sigma
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 1000 bootstrap replicates
CALL :
boot.ci(boot.out = boot.obj, type = c("norm", "basic", "perc",
"bca"), index = 2)
Intervals :
Level Normal Basic
95% ( 0.1509, 0.2497 ) ( 0.1570, 0.2527 )
Level Percentile BCa
95% ( 0.0726, 0.1682 ) ( 0.1542, 0.2000 )
Calculations and Intervals on Original Scale
Warning : BCa Intervals used Extreme Quantiles
Some BCa intervals may be unstable
> boot.ci.psi
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 1000 bootstrap replicates
CALL :
boot.ci(boot.out = boot.obj, type = c("norm", "basic", "perc",
"bca"), index = 3)
Intervals :
Level Normal Basic
95% ( 5.426, 6.608 ) ( 5.478, 6.678 )
Level Percentile BCa
95% ( 5.285, 6.485 ) ( 5.318, 6.525 )
Calculations and Intervals on Original Scale
> boot.ci.eta
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 1000 bootstrap replicates
CALL :
boot.ci(boot.out = boot.obj, type = c("norm", "basic", "perc",
"bca"), index = 4)
Intervals :
Level Normal Basic
95% (153.1, 631.6 ) (136.7, 594.8 )
Level Percentile BCa
95% (197.4, 655.5 ) (204.9, 683.3 )
Calculations and Intervals on Original Scale
>
> # Test invariance
> boot.ci.eta.trans = rbind(
+ boot.ci.psi$normal[2:3],
+ boot.ci.psi$basic[4:5],
+ boot.ci.psi$percent[4:5],
+ boot.ci.psi$bca[4:5])
> boot.ci.eta.trans = exp(boot.ci.eta.trans)
> rownames(boot.ci.eta.trans) = c("Normal.with.bias.correc","Std","Perce","BCa")
> colnames(boot.ci.eta.trans) = c("eta.trans.low","eta.trans.upp")
> boot.ci.eta.trans
eta.trans.low eta.trans.upp
Normal.with.bias.correc 227.1782 740.9514
Std 239.3514 794.9614
Perce 197.3622 655.5019
BCa 203.9552 681.6720
(e)
From the results in (d), we can see that the percentile bootstrap confidence interval is invariant under monotone transformations (its endpoints for η coincide with exp applied to its endpoints for ψ), and the BCa interval is nearly so, while the normal-approximation and standard bootstrap intervals are not.
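The exact invariance of the percentile interval can be checked directly: for a monotone increasing transformation such as exp, the percentile CI of exp(ψ*) equals exp applied to the percentile CI of ψ*. A small self-contained Python check (stand-in replicates, not the bootstrap output above):

```python
import numpy as np

rng = np.random.default_rng(42)
# n = 1001 makes the 2.5% and 97.5% quantiles land exactly on order
# statistics (0.025 * (n-1) = 25), so the equality below is exact
# rather than only approximate under quantile interpolation.
psi_star = rng.normal(loc=6.0, scale=0.3, size=1001)

alpha = 0.05
# Percentile CI on the psi scale, then transformed with exp.
ci_psi = np.quantile(psi_star, [alpha / 2, 1 - alpha / 2])
ci_eta_from_psi = np.exp(ci_psi)

# Percentile CI computed directly on the eta = exp(psi) scale.
ci_eta_direct = np.quantile(np.exp(psi_star), [alpha / 2, 1 - alpha / 2])
```

Because exp is monotone increasing, it preserves the ordering of the replicates, so picking a quantile and then transforming gives the same endpoints as transforming and then picking the quantile.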