Modern methods of statistical learning (SF2935)
Johan Westerborn
Lecture 6: Bootstrap
16 November 2015
Johan Westerborn Statistical learning (1) Bootstrap
Outline
1 Introduction to bootstrap
2 Non-parametric bootstrap
3 Parametric bootstrap
Introduction to bootstrap
What is the bootstrap?
We are given some data z = (z1, . . . , zn) and wish to calculate a value τ that depends on the distribution of the data. Using this data we calculate some estimator τ̂ = t(z).
- As an example, we can estimate the mean using
  t(z) = (1/n) ∑_{i=1}^{n} zi
How certain can we be of the value τ̂ that we get? The bootstrap method tries to answer this question.
Introduction to bootstrap
If we knew the distribution of Z:
- t(z) is just an observation of the random variable t(Z).
- The error in our estimator is ∆(z) = t(z) − τ, which is an observation of the random variable ∆(Z) = t(Z) − τ.
- Quantifying the uncertainty of the estimator requires us to study the distribution of ∆(Z).
If we would like to calculate a confidence interval for the estimator t(z), we would have to invert the distribution function of ∆(Z):
  Iα = ( t(z) − F⁻¹_{∆(Z)}(1 − α/2), t(z) − F⁻¹_{∆(Z)}(α/2) )
The bias of the estimator is E[∆(Z )].
Introduction to bootstrap
Normal distribution example
Normal distribution
Assume that zi, i = 1, . . . , n, are i.i.d. normally distributed random variables with mean µ and variance 1.
Our estimator of µ is µ̂ = t(z) = (1/n) ∑_{i=1}^{n} zi.
With this estimator we have that ∆(Z) = t(Z) − µ ∼ N(0, 1/n).
In this case we can calculate the error distribution exactly.
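Since the error distribution is known exactly here, the confidence interval obtained by inverting the distribution function of ∆(Z) has a closed form. A minimal sketch in Python (the sample size, mean, and seed are illustrative, not from the lecture):

```python
import random
import statistics

# Illustrative data: n i.i.d. N(mu, 1) draws (mu and the seed are made up).
random.seed(1)
n, mu = 50, 2.0
z = [random.gauss(mu, 1.0) for _ in range(n)]

t_z = sum(z) / n  # the estimator mu-hat = t(z)

# Delta(Z) = t(Z) - mu ~ N(0, 1/n), so inverting its distribution function
# gives the exact 95% interval ( t(z) - q_{0.975}/sqrt(n), t(z) + q_{0.975}/sqrt(n) ).
q = statistics.NormalDist().inv_cdf(0.975)
lo, hi = t_z - q / n**0.5, t_z + q / n**0.5
print(f"({lo:.3f}, {hi:.3f})")
```

The interval width 2·q/√n shrinks at the familiar 1/√n rate as the sample grows.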
Introduction to bootstrap
What if we don't know the distribution?
If we don't know the distribution we can use the bootstrap! The main idea is to substitute the distribution of Z with the empirical distribution based on the sample z:
  F̂n(x) = (1/n) ∑_{i=1}^{n} 1(zi ≤ x) = fraction of the zi's less than or equal to x
It is easy to show that
  n F̂n(x) ∼ Bin(n, F(x)),
  lim_{n→∞} F̂n(x) = F(x) a.s.
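The empirical distribution function translates directly to code; a minimal sketch (the toy sample values are made up):

```python
def ecdf(z):
    """Return the empirical distribution function F_n of the sample z."""
    n = len(z)
    def F_n(x):
        # Fraction of the z_i's less than or equal to x.
        return sum(z_i <= x for z_i in z) / n
    return F_n

# Toy sample (illustrative only).
z = [0.3, -1.2, 0.7, 2.1, -0.4]
F_n = ecdf(z)
print(F_n(0.0))  # 2 of the 5 values are <= 0, i.e. 0.4
```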
Introduction to bootstrap
Empirical versus true distribution
[Figure: ecdf(ran.vec) — the empirical distribution function F̂n(x) plotted against x.]
Introduction to bootstrap
The bootstrap algorithm
Given data z from the distribution of Z we replace the distribution function with the empirical distribution function. The algorithm goes as follows:
- Calculate τ̂ = t(z).
- Simulate B new datasets zb, b = 1, . . . , B, where each zb has the same size as z and is obtained by drawing from the empirical distribution (that is, resample with replacement from the vector z).
- Compute τ̂b = t(zb), b = 1, . . . , B.
- Calculate ∆b = τ̂b − τ̂. This can be used for uncertainty analysis.
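The steps above can be sketched in a few lines of Python; drawing with replacement via `choices` is exactly sampling from the empirical distribution (the function names and toy data are my own):

```python
import random

def bootstrap_deltas(z, t, B=1000, seed=0):
    """Non-parametric bootstrap: return tau-hat and the errors Delta_b."""
    rng = random.Random(seed)
    tau_hat = t(z)
    deltas = []
    for _ in range(B):
        zb = rng.choices(z, k=len(z))   # resample with replacement from z
        deltas.append(t(zb) - tau_hat)  # Delta_b = tau-hat_b - tau-hat
    return tau_hat, deltas

# Example: bootstrap the sample mean of a toy dataset.
mean = lambda z: sum(z) / len(z)
tau_hat, deltas = bootstrap_deltas([1.0, 2.0, 4.0, 8.0], mean, B=200)
print(tau_hat, len(deltas))  # tau_hat is 3.75 here
```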
Introduction to bootstrap
Running example
Yearly maximum water height in Port Pirie
We have a dataset of 65 yearly measurements of the highest sea level recorded in the city of Port Pirie in southern Australia. Can we say anything about the 10-year sea level? Or the 100-year sea level?
Introduction to bootstrap
Running example cont.
[Figure: yearly maximum sea level in Port Pirie; Year (1930–1980) on the x-axis, Sea Level (3.6–4.6) on the y-axis.]
Non-parametric bootstrap
The 10-year sea level
The 10-year sea level is defined as F⁻¹(1 − 1/10), where F is the distribution of the yearly maximum. Since we make no assumptions on the data, given a vector of 65 sorted values we choose the 65 · (1 − 1/10) = 58.5th value as the 10-year return level. Since we can't choose the 58.5th value, we take the mean of the 58th and 59th smallest values. We let the function t(z) be the one taking the "58.5th" value of the sorted vector z.
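The statistic t(z) described above is short to write out; a sketch (the 65 evenly spaced values stand in for the Port Pirie data, which we do not have here):

```python
def t_ten_year(z):
    """10-year return level: the '58.5th' smallest of 65 sorted values,
    i.e. the mean of the 58th and 59th order statistics."""
    s = sorted(z)
    k = len(s) * (1 - 1 / 10)      # 65 * 0.9 = 58.5
    i = int(k)                     # 58
    return (s[i - 1] + s[i]) / 2   # mean of the 58th and 59th smallest

# Toy sample of 65 values (illustrative, not the real data).
z = [3.5 + 0.02 * i for i in range(65)]
print(t_ten_year(z))  # mean of 4.64 and 4.66, i.e. 4.65
```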
Non-parametric bootstrap
The 10-year sea level cont.
We perform the bootstrap by following the algorithm:
- Calculate τ̂ = t(z).
- For every b ∈ {1, . . . , 1000} draw a vector zb by resampling from the data z and set τ̂b = t(zb).
- Set ∆b = τ̂b − τ̂.
- We can now estimate the bias as
  Estimated bias = (1/B) ∑_{b=1}^{B} ∆b
- We can estimate the standard deviation of the error in the usual way.
- We can estimate a confidence interval by taking the appropriate quantiles of the ∆b vector and using them together with τ̂.
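Given the vector of ∆b values, the three uncertainty summaries above are each a line or two of code. A sketch (the ∆b here are simulated rather than produced by an actual bootstrap, and τ̂ is a made-up value):

```python
import random
import statistics

# Illustrative Delta_b values; in practice these come from the bootstrap.
rng = random.Random(0)
tau_hat = 4.3
deltas = sorted(rng.gauss(0.0, 0.05) for _ in range(1000))
B = len(deltas)

bias = sum(deltas) / B            # (1/B) * sum of Delta_b
sd = statistics.stdev(deltas)     # standard deviation of the error

# 95% interval: tau-hat minus the upper/lower quantiles of Delta_b.
lo = tau_hat - deltas[int(0.975 * B)]
hi = tau_hat - deltas[int(0.025 * B)]
print(bias, sd, (lo, hi))
```

Note the quantiles enter with reversed roles, mirroring the interval Iα = ( t(z) − F⁻¹(1 − α/2), t(z) − F⁻¹(α/2) ) from the introduction.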
Non-parametric bootstrap
The 10-year sea level cont.
We get the estimated 10-year sea level to be τ̂ = 4.298.
Estimated bias = −0.0069.
95% confidence interval: (4.198, 4.370).
[Figure: histogram of the bootstrap estimates non_par.boot$t, ranging roughly from 4.1 to 4.5.]
Parametric bootstrap
Another way
What if we want to estimate the 100-year sea level?
- Notice how bad the previous estimator would be.
- We need to estimate something that is outside of our data range.
To solve this we assume a distribution, governed by some parameters θ, for our data. In our case we use the Gumbel distribution with distribution function
  F(x) = exp(−exp(−(x − µ)/β)), x ∈ R,
which has inverse
  F⁻¹(y) = µ − β log(− log(y)), y ∈ (0, 1).
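The Gumbel distribution function and its inverse translate directly to code; a sketch (the parameter values are made up, chosen only to be in the ballpark of the sea-level data):

```python
import math

def gumbel_cdf(x, mu, beta):
    """F(x) = exp(-exp(-(x - mu)/beta))."""
    return math.exp(-math.exp(-(x - mu) / beta))

def gumbel_inv(y, mu, beta):
    """F^{-1}(y) = mu - beta * log(-log(y)), 0 < y < 1."""
    return mu - beta * math.log(-math.log(y))

# Round trip with illustrative parameters.
mu, beta = 3.87, 0.2
x = gumbel_inv(0.99, mu, beta)  # the 100-year level under these parameters
print(round(gumbel_cdf(x, mu, beta), 6))
```

Having the inverse in closed form is what makes sampling from the fitted model easy: F⁻¹(U) with U uniform on (0, 1) is Gumbel-distributed.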
Parametric bootstrap
The parametric bootstrap
In the parametric bootstrap, instead of using the empirical distribution, we calculate θ̂ = θ̂(z) as an estimate of θ. The new samples zb are then generated from the distribution determined by θ̂, and we calculate θ̂b = θ̂(zb).
We let the function t depend on the estimated parameters θ̂b instead of the sample.
Parametric bootstrap
The 100-year sea level
We perform the parametric bootstrap to get the 100-year sea level in the following way:
- Estimate the parameters θ̂ using maximum likelihood and let τ̂ = t(θ̂) = F⁻¹(1 − 1/100; θ̂).
- For each b ∈ {1, . . . , 1000} draw a new sample of size 65 from the Gumbel distribution with parameters θ̂ and calculate θ̂b.
- Let τ̂b = t(θ̂b) and ∆b = τ̂b − τ̂.
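The three steps above can be sketched as follows. For brevity the fit uses a crude moment-matching estimator instead of the MLE (a simplification; the lecture uses maximum likelihood), and the data are simulated from illustrative "true" parameters since we do not have the Port Pirie sample here. Sampling uses the inverse method F⁻¹(U):

```python
import math
import random

def gumbel_inv(y, mu, beta):
    return mu - beta * math.log(-math.log(y))

def fit_gumbel(z):
    """Crude moment estimator of (mu, beta); a stand-in for the MLE."""
    n = len(z)
    m = sum(z) / n
    var = sum((x - m) ** 2 for x in z) / n
    beta = math.sqrt(6 * var) / math.pi    # Var = pi^2 beta^2 / 6
    return m - 0.5772 * beta, beta         # Mean = mu + gamma * beta

def t(theta):
    """tau = F^{-1}(1 - 1/100; theta), the 100-year level."""
    mu, beta = theta
    return gumbel_inv(1 - 1 / 100, mu, beta)

rng = random.Random(0)
mu0, beta0 = 3.87, 0.2                     # illustrative "true" parameters
z = [gumbel_inv(rng.random(), mu0, beta0) for _ in range(65)]

theta_hat = fit_gumbel(z)                  # step 1: fit, tau-hat = t(theta-hat)
tau_hat = t(theta_hat)
deltas = []
for _ in range(1000):                      # steps 2-3: B parametric resamples
    zb = [gumbel_inv(rng.random(), *theta_hat) for _ in range(65)]
    deltas.append(t(fit_gumbel(zb)) - tau_hat)
print(tau_hat, len(deltas))
```

From here the bias, standard deviation, and confidence interval are computed from the ∆b exactly as in the non-parametric case.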
Parametric bootstrap
The 100-year sea level cont.
We get τ̂ = 4.77.
Estimated bias: −0.0059.
Estimated one-sided 95% confidence interval: (4.60, ∞).
[Figure: histogram of the bootstrap estimates b1$t, ranging roughly from 4.4 to 5.0.]