Bootstrap Resampling
Nathaniel E. Helwig
Assistant Professor of Psychology and Statistics, University of Minnesota (Twin Cities)
Updated 04-Jan-2017
Copyright
Copyright © 2017 by Nathaniel E. Helwig
Outline of Notes
1) Background Information: statistical inference; sampling distributions; need for resampling
2) Bootstrap Basics: overview; empirical distribution; plug-in principle
3) Bootstrap in Practice: bootstrap in R; bias and mean-squared error; the jackknife
4) Bootstrapping Regression: regression review; bootstrapping residuals; bootstrapping pairs

For a thorough treatment see: Efron, B. & Tibshirani, R.J. (1993). An Introduction to the Bootstrap. Chapman & Hall/CRC.
Background Information
The Classic Statistical Paradigm
X is some random variable, e.g., age in years
X = {x1, x2, x3, . . .} is some population of interest, e.g.,
  - ages of all students at the University of Minnesota
  - ages of all people in the state of Minnesota
At the population level. . .
  F(x) = P(X ≤ x) for all x ∈ X is the population CDF
  θ = t(F) is the population parameter, where t is some function of F
At the sample level. . .
  x = (x1, . . . , xn)′ is a sample of data with xi iid∼ F for i ∈ {1, . . . , n}
  θ̂ = s(x) is the sample statistic, where s is some function of x
The Classic Statistical Paradigm (continued)
θ̂ is a random variable that depends on x (and thus on F).
The sampling distribution of θ̂ refers to the CDF (or PDF) of θ̂.
If F is known (or assumed to be known), then the sampling distribution of θ̂ may have some known form.
For example, if xi iid∼ N(µ, σ²), then x̄ ∼ N(µ, σ²/n), where x̄ = (1/n) ∑_{i=1}^n xi.
Note that in the above example, θ ≡ µ and θ̂ ≡ x̄.
How can we make inferences about θ using θ̂ when F is unknown?
The Hypothetical Ideal
Assume that X is too large to measure all members of the population.
If we had a really LARGE research budget, we could collect B independent samples from the population X:
  xj = (x1j , . . . , xnj)′ is the j-th sample, with xij iid∼ F
  θ̂j = s(xj) is the statistic (parameter estimate) for the j-th sample
The sampling distribution of θ̂ can then be estimated via the distribution of {θ̂j}_{j=1}^B.
The Hypothetical Ideal: Example 1 (Normal Mean)
Sampling distribution of x̄ with xi iid∼ N(0, 1) for n = 100:

[Figure: six histograms of x̄ for B = 200, 500, 1000, 2000, 5000, and 10000 simulated samples, each overlaid with the exact N(0, 1/n) pdf of x̄.]
The Hypothetical Ideal: Example 1 R Code
# hypothetical ideal: example 1 (normal mean)
set.seed(1)
n = 100
B = c(200,500,1000,2000,5000,10000)
xseq = seq(-0.4,0.4,length=200)
quartz(width=12,height=8)
par(mfrow=c(2,3))
for(k in 1:6){
  X = replicate(B[k], rnorm(n))
  xbar = apply(X, 2, mean)
  hist(xbar, freq=F, xlim=c(-0.4,0.4), ylim=c(0,5),
       main=paste("Sampling Distribution: B =",B[k]))
  lines(xseq, dnorm(xseq, sd=1/sqrt(n)))
  legend("topright",expression(bar(x)*" pdf"),lty=1,bty="n")
}
The Hypothetical Ideal: Example 2 (Normal Median)
Sampling distribution of median(x) with xi iid∼ N(0, 1) for n = 100:

[Figure: six histograms of the sample median for B = 200, 500, 1000, 2000, 5000, and 10000 simulated samples, each overlaid with the pdf of x̄ for comparison.]
The Hypothetical Ideal: Example 2 R Code
# hypothetical ideal: example 2 (normal median)
set.seed(1)
n = 100
B = c(200,500,1000,2000,5000,10000)
xseq = seq(-0.4,0.4,length=200)
quartz(width=12,height=8)
par(mfrow=c(2,3))
for(k in 1:6){
  X = replicate(B[k], rnorm(n))
  xmed = apply(X, 2, median)
  hist(xmed, freq=F, xlim=c(-0.4,0.4), ylim=c(0,5),
       main=paste("Sampling Distribution: B =",B[k]))
  # overlaid curve is the pdf of xbar, for comparison with Example 1
  lines(xseq, dnorm(xseq, sd=1/sqrt(n)))
  legend("topright",expression(bar(x)*" pdf"),lty=1,bty="n")
}
Back to the Real World
In most cases, we only have one sample of data. What do we do?
If n is large and we only care about x̄, we can use the CLT.
Sampling distribution of x̄ with xi iid∼ U[0, 1] for B = 10000:
[Figure: six histograms of x̄ for n = 3, 5, 10, 20, 50, and 100, each overlaid with the asymptotic normal pdf implied by the CLT.]
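The slides do not include the code for this figure; the following is a minimal sketch in the spirit of Examples 1–2, assuming only base R (axis limits are left at R's defaults):

# CLT illustration: sampling distribution of xbar for uniform data
set.seed(1)
n = c(3,5,10,20,50,100)
B = 10000
par(mfrow=c(2,3))
for(k in 1:6){
  xbar = replicate(B, mean(runif(n[k])))     # B sample means of size n[k]
  hist(xbar, freq=FALSE, main=paste("Sampling Distribution: n =",n[k]))
  xseq = seq(min(xbar), max(xbar), length=200)
  # U[0,1] has mean 1/2 and variance 1/12, so xbar is approx N(1/2, (1/12)/n)
  lines(xseq, dnorm(xseq, mean=0.5, sd=sqrt(1/12)/sqrt(n[k])))
  legend("topright","asymp pdf",lty=1,bty="n")
}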
The Need for a Nonparametric Resampling Method
For most statistics other than the sample mean, there is no theoretical argument to derive the sampling distribution.
To make inferences, we need to somehow obtain (or approximate) the sampling distribution of any generic statistic θ̂.
Note that parametric statistics overcome this issue by assuming some particular distribution for the data. The nonparametric bootstrap overcomes this problem by resampling the observed data to approximate the sampling distribution of θ̂.
Bootstrap Basics
Problem of Interest
In statistics, we typically want to know the properties of our estimates, e.g., precision, accuracy, etc.
In the parametric situation, we can often derive the distribution of our estimate given our assumptions about the data (or from MLE principles).
In the nonparametric situation, we can use the bootstrap to examine properties of our estimates in a variety of different situations.
Bootstrap Procedure
Suppose x = (x1, . . . , xn)′ with xi iid∼ F(x) for i ∈ {1, . . . , n}, and we want to make inferences about some statistic θ̂ = s(x).
Can use the Monte Carlo bootstrap:
1. Sample x*i with replacement from {x1, . . . , xn} for i ∈ {1, . . . , n}
2. Calculate θ̂*b = s(x*) for the b-th sample, where x* = (x*1, . . . , x*n)′
3. Repeat 1–2 a total of B times to get the bootstrap distribution of θ̂
4. Compare θ̂ = s(x) to the bootstrap distribution
The estimated standard error of θ̂ is the standard deviation of {θ̂*b}_{b=1}^B:
σ̂B = sqrt( (1/(B−1)) ∑_{b=1}^B (θ̂*b − θ̄*)² )
where θ̄* = (1/B) ∑_{b=1}^B θ̂*b is the mean of the bootstrap distribution of θ̂.
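As a concrete illustration of steps 1–4, here is a minimal sketch (not from the slides) for the sample median, using only base R; the bootsamp and bootse functions developed later generalize this loop:

# Monte Carlo bootstrap for the median (minimal sketch)
set.seed(1)
x = rnorm(100)                        # observed sample
B = 2000                              # number of bootstrap samples
thetastar = numeric(B)
for(b in 1:B){
  xstar = sample(x, replace=TRUE)     # step 1: resample with replacement
  thetastar[b] = median(xstar)        # step 2: bootstrap replication
}
sd(thetastar)                         # estimated standard error of the median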
Empirical Cumulative Distribution Functions
Suppose x = (x1, . . . , xn)′ with xi iid∼ F(x) for i ∈ {1, . . . , n}, and we want to estimate the cdf F.
The empirical cumulative distribution function (ecdf) F̂n is defined as
F̂n(x) = P̂(X ≤ x) = (1/n) ∑_{i=1}^n I{xi ≤ x}
where I{·} denotes an indicator function.
The ecdf assigns probability 1/n to each value xi, which implies that
P̂n(A) = (1/n) ∑_{i=1}^n I{xi ∈ A}
for any set A in the sample space of X.
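A quick sanity check (a sketch, assuming base R) that this definition matches R's built-in ecdf():

# the ecdf by hand vs. R's built-in ecdf()
set.seed(1)
x = rnorm(10)
Fn = function(t) mean(x <= t)   # (1/n) * sum of indicators I{xi <= t}
Fn(0)
ecdf(x)(0)                      # same value from the built-in function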
Some Properties of ECDFs
For any fixed value x, we have that
E[F̂n(x)] = F(x)
V[F̂n(x)] = (1/n) F(x)[1 − F(x)]
As n → ∞ we have that
sup_{x ∈ R} |F̂n(x) − F(x)| → 0 almost surely,
which is the Glivenko-Cantelli theorem.
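A small simulation (a sketch, not from the slides) illustrating the Glivenko-Cantelli convergence, approximating the supremum on a grid of points:

# sup-norm distance between the ecdf and the true cdf shrinks with n
set.seed(1)
xseq = seq(-4, 4, length=1000)
for(n in c(100, 1000, 10000)){
  x = rnorm(n)
  d = max(abs(ecdf(x)(xseq) - pnorm(xseq)))  # approximate sup distance
  cat("n =", n, "  sup|Fn - F| =", round(d, 4), "\n")
}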
ECDF Visualization for Normal Distribution
[Figure: empirical CDFs F̂n(x) of N(0, 1) samples with n = 100, 500, and 1000, each overlaid with the true normal CDF (blue).]
set.seed(1)
par(mfrow=c(1,3))
n = c(100,500,1000)
xseq = seq(-4,4,length=100)
for(j in 1:3){
  x = rnorm(n[j])
  plot(ecdf(x), main=paste("n = ",n[j]))
  lines(xseq, pnorm(xseq), col="blue")
}
ECDF Example
Table 3.1 from An Introduction to the Bootstrap (Efron & Tibshirani, 1993): the law school data.

School  LSAT (y)  GPA (z)
1       576       3.39
2       635       3.30
3       558       2.81
4       578       3.03
5       666       3.44
6       580       3.07
7       555       3.00
8       661       3.43
9       651       3.36
10      605       3.13
11      653       3.12
12      575       2.74
13      545       2.76
14      572       2.88
15      594       2.96

Defining A = {(y, z) : 0 < y < 600, 0 < z < 3.00}, we have
P̂15(A) = (1/15) ∑_{i=1}^{15} I{(yi, zi) ∈ A} = 5/15
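Typing the table into R confirms the count (a sketch, assuming base R):

# verify P15(A) = 5/15 for the law school data
y = c(576,635,558,578,666,580,555,661,651,605,653,575,545,572,594)
z = c(3.39,3.30,2.81,3.03,3.44,3.07,3.00,3.43,3.36,3.13,3.12,2.74,2.76,2.88,2.96)
mean(y < 600 & z < 3.00)   # = 5/15 = 0.333...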
Plug-In Parameter Estimates
Suppose x = (x1, . . . , xn)′ with xi iid∼ F(x) for i ∈ {1, . . . , n}, and we want to estimate some parameter θ = t(F) that depends on the cdf F.
Example: we want to estimate the expected value E(X) = ∫ x f(x) dx.
The plug-in estimate of θ = t(F) is given by
θ̂ = t(F̂)
which is the statistic calculated using the ecdf F̂ in place of the cdf F.
Plug-In Estimate of Mean
Suppose x = (x1, . . . , xn)′ with xi iid∼ F(x) for i ∈ {1, . . . , n}, and we want to estimate the expected value θ = E(X) = ∫ x f(x) dx.
The plug-in estimate of the expected value is the sample mean
θ̂ = E_F̂(x) = ∑_{i=1}^n xi f̂i = (1/n) ∑_{i=1}^n xi = x̄
where f̂i = 1/n is the sample probability from the ecdf.
Standard Error of Mean
Let µF = EF(x) and σ²F = VF(x) = EF[(x − µF)²] denote the mean and variance of X, and denote this using the notation X ∼ (µF, σ²F).
If x = (x1, . . . , xn)′ with xi iid∼ F(x) for i ∈ {1, . . . , n}, then the sample mean x̄ = (1/n) ∑_{i=1}^n xi has mean and variance x̄ ∼ (µF, σ²F/n).
The standard error of the mean x̄ is the square root of the variance of x̄:
SEF(x̄) = σF/√n
Plug-In Estimate of Standard Error of Mean
Suppose x = (x1, . . . , xn)′ with xi iid∼ F(x) for i ∈ {1, . . . , n}, and we want to estimate the standard error of the mean
SEF(x̄) = σF/√n = sqrt( EF[(x − µF)²]/n ).
The plug-in estimate of the standard deviation is given by
σ̂ = σ_F̂ = { (1/n) ∑_{i=1}^n (xi − x̄)² }^{1/2}
so the plug-in estimate of the standard error of the mean is
σ̂/√n = σ_F̂/√n = { (1/n²) ∑_{i=1}^n (xi − x̄)² }^{1/2}
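In R this is a one-liner; a small sketch (not from the slides) comparing the plug-in estimate to the usual n − 1 version:

# plug-in SE of the mean vs. the usual (n-1 denominator) estimate
set.seed(1)
x = rnorm(50)
n = length(x)
sqrt(sum((x - mean(x))^2)/n^2)   # plug-in SE: divides by n
sd(x)/sqrt(n)                    # usual SE: divides by n - 1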
Bootstrap in Practice
Bootstrap Standard Error (revisited)
Suppose x = (x1, . . . , xn)′ with xi iid∼ F(x) for i ∈ {1, . . . , n}, and we want to make inferences about some statistic θ̂ = s(x).
The estimated standard error of θ̂ is the standard deviation of {θ̂*b}_{b=1}^B:
σ̂B = sqrt( (1/(B−1)) ∑_{b=1}^B (θ̂*b − θ̄*)² )
where θ̄* = (1/B) ∑_{b=1}^B θ̂*b is the mean of the bootstrap distribution of θ̂.
As the number of bootstrap samples goes to infinity, we have that
lim_{B→∞} σ̂B = SE_F̂(θ̂)
where SE_F̂(θ̂) is the plug-in estimate of SEF(θ̂).
Illustration of Bootstrap Standard Error
Figure 6.1: An Introduction to the Bootstrap (Efron & Tibshirani, 1993).
[Figure 6.1, reproduced from Efron & Tibshirani (1993): the bootstrap algorithm for estimating the standard error of a statistic θ̂ = s(x). The empirical distribution F̂ generates B independent bootstrap samples x*1, . . . , x*B of size n; each yields a bootstrap replication θ̂*(b) = s(x*b), and the standard deviation of the replications gives the bootstrap estimate of standard error. For standard error estimation, B is usually between 25 and 200; as B → ∞, σ̂B approaches the plug-in estimate of SEF(θ̂).]
Illustration of Bootstrap Procedure
Figure 8.1: An Introduction to the Bootstrap (Efron & Tibshirani, 1993).
[Figure 8.1, reproduced from Efron & Tibshirani (1993): a schematic diagram of the bootstrap for one-sample problems. In the real world, an unknown probability distribution F gives the observed data x = (x1, . . . , xn) by random sampling, and from x we calculate the statistic of interest θ̂ = s(x). In the bootstrap world, the empirical distribution F̂ gives bootstrap samples x* = (x*1, . . . , x*n) by random sampling, yielding replications θ̂* = s(x*). There is only one observed value of θ̂, but we can generate as many bootstrap replications θ̂* as we can afford. The crucial step is constructing the estimate F̂ of the unknown population F from the data x.]
An R Function for Bootstrap Resampling
We can design our own bootstrap sampling function:
bootsamp <- function(x, nsamp=10000){
  x = as.matrix(x)
  nx = nrow(x)
  # resample the nx rows with replacement, nsamp times
  bsamp = replicate(nsamp, x[sample.int(nx, replace=TRUE),])
  bsamp
}
If x is a vector of length n, then bootsamp returns an n × B matrix, where B is the number of bootstrap samples (controlled via nsamp).
If x is a matrix of order n × p, then bootsamp returns an n × p × B array, where B is the number of bootstrap samples.
An R Function for Bootstrap Standard Error
We can design our own bootstrap standard error function:
bootse <- function(bsamp, myfun, ...){
  if(is.matrix(bsamp)){
    theta = apply(bsamp, 2, myfun, ...)   # vector data: n x B matrix
  } else {
    theta = apply(bsamp, 3, myfun, ...)   # matrix data: n x p x B array
  }
  if(is.matrix(theta)){
    return(list(theta=theta, cov=cov(t(theta))))
  } else {
    return(list(theta=theta, se=sd(theta)))
  }
}

Returns a list where theta contains the bootstrap statistics {θ̂*b}_{b=1}^B, and se contains the bootstrap standard error estimate (or cov contains the bootstrap covariance matrix).
Example 1: Sample Mean
> set.seed(1)
> x = rnorm(500, mean=1)
> bsamp = bootsamp(x)
> bse = bootse(bsamp, mean)
> mean(x)
[1] 1.022644
> sd(x)/sqrt(500)
[1] 0.04525481
> bse$se
[1] 0.04530694
> hist(bse$theta)

[Histogram of bse$theta]
Example 2: Sample Median
> set.seed(1)
> x = rnorm(500, mean=1)
> bsamp = bootsamp(x)
> bse = bootse(bsamp, median)
> median(x)
[1] 0.9632217
> bse$se
[1] 0.04299574
> hist(bse$theta)

[Histogram of bse$theta]
Example 3: Sample Variance
> set.seed(1)
> x = rnorm(500, sd=2)
> bsamp = bootsamp(x)
> bse = bootse(bsamp, var)
> var(x)
[1] 4.095996
> bse$se
[1] 0.2690615
> hist(bse$theta)

[Histogram of bse$theta]
Example 4: Mean Difference
> set.seed(1)
> x = rnorm(500, mean=3)
> y = rnorm(500)
> z = cbind(x, y)
> bsamp = bootsamp(z)
> myfun = function(z) mean(z[,1]) - mean(z[,2])
> bse = bootse(bsamp, myfun)
> myfun(z)
[1] 3.068584
> sqrt( (var(z[,1]) + var(z[,2]))/nrow(z) )
[1] 0.06545061
> bse$se
[1] 0.06765369
> hist(bse$theta)

[Histogram of bse$theta]
Example 5: Median Difference
> set.seed(1)
> x = rnorm(500, mean=3)
> y = rnorm(500)
> z = cbind(x, y)
> bsamp = bootsamp(z)
> myfun = function(z) median(z[,1]) - median(z[,2])
> bse = bootse(bsamp, myfun)
> myfun(z)
[1] 2.984479
> bse$se
[1] 0.07699423
> hist(bse$theta)

[Histogram of bse$theta]
Example 6: Correlation Coefficient
> set.seed(1)
> x = rnorm(500)
> y = rnorm(500)
> Amat = matrix(c(1, -0.25, -0.25, 1), 2, 2)
> Aeig = eigen(Amat, symmetric=TRUE)
> evec = Aeig$vec
> evalsqrt = diag(Aeig$val^0.5)
> Asqrt = evec %*% evalsqrt %*% t(evec)
> z = cbind(x, y) %*% Asqrt
> bsamp = bootsamp(z)
> myfun = function(z) cor(z[,1], z[,2])
> bse = bootse(bsamp, myfun)
> myfun(z)
[1] -0.2884766
> (1 - myfun(z)^2)/sqrt(nrow(z) - 3)
[1] 0.04112326
> bse$se
[1] 0.03959024
> hist(bse$theta)

[Histogram of bse$theta]
Example 7: Uniform[0, θ] = bootstrap failure
> set.seed(1)
> x = runif(500)
> bsamp = bootsamp(x)
> myfun = function(x) max(x)
> bse = bootse(bsamp, myfun)
> myfun(x)
[1] 0.9960774
> bse$se
[1] 0.001472801
> hist(bse$theta)

F̂ is not a good estimate of F in the extreme tails: a bootstrap sample reproduces the observed maximum with probability 1 − (1 − 1/n)^n ≈ 0.632, so the bootstrap distribution of max(x*) piles up at max(x) and can never exceed it.

[Histogram of bse$theta]
Measuring the Quality of an Estimator
We have focused on the standard error to measure the precision of θ̂.
Small standard error is good, but other qualities are important too!
Consider the toy example:
Suppose {xi}_{i=1}^n iid∼ (µ, σ²) and we want to estimate the mean of X.
Define µ̂ = 10 + x̄ to be our estimate of µ.
The standard error of µ̂ is σ/√n, and lim_{n→∞} σ/√n = 0.
But µ̂ = 10 + x̄ is clearly not an ideal estimate of µ.
Bias of an Estimator
Suppose x = (x1, . . . , xn)′ with xi iid∼ F(x) for i ∈ {1, . . . , n}, and we want to make inferences about some statistic θ̂ = s(x).
The bias of an estimate θ̂ = s(x) of θ = t(F) is defined as
BiasF = EF[s(x)] − t(F)
where the expectation is taken with respect to F.
Bias is the difference between the expectation of the estimate and the parameter.
In the example on the previous slide, we have that
BiasF = EF(µ̂) − µ = EF(10 + x̄) − µ = 10 + EF(x̄) − µ = 10
Bootstrap Estimate of Bias
The bootstrap estimate of bias substitutes F̂ for F in the bias definition:
Bias_F̂ = E_F̂[s(x)] − t(F̂)
where the expectation is taken with respect to the ecdf F̂.
Note that t(F̂) is the plug-in estimate of θ; t(F̂) is not necessarily equal to θ̂ = s(x).
Given B bootstrap samples, we can estimate the bias using
Bias_B = θ̄* − t(F̂)
where θ̄* = (1/B) ∑_{b=1}^B θ̂*b is the mean of the bootstrap distribution of θ̂.
An R Function for Bootstrap Bias
We can design our own bootstrap bias estimation function:
bootbias <- function(bse, theta, ...){
  if(is.matrix(bse$theta)){
    return(apply(bse$theta, 1, mean) - theta)   # vector-valued statistic
  } else {
    return(mean(bse$theta) - theta)             # scalar statistic
  }
}

The first input bse is the object output from bootse, and the second input theta is the plug-in estimate of theta used for the bias calculation.
Sample Mean is Unbiased
> set.seed(1)
> x = rnorm(500, mean=1)
> bsamp = bootsamp(x)
> bse = bootse(bsamp, mean)
> mybias = bootbias(bse, mean(x))
> mybias
[1] 0.0003689287
> mean(x)
[1] 1.022644
> sd(x)/sqrt(500)
[1] 0.04525481
> bse$se
[1] 0.04530694
Toy Example
> set.seed(1)
> x = rnorm(500, mean=1)
> bsamp = bootsamp(x)
> bse = bootse(bsamp, function(x) mean(x)+10)
> mybias = bootbias(bse, mean(x))
> mybias
[1] 10.00037
> mean(x)
[1] 1.022644
> sd(x)/sqrt(500)
[1] 0.04525481
> bse$se
[1] 0.04530694
Mean Squared Error (MSE)
The mean-squared error (MSE) of an estimate θ̂ = s(x) of θ = t(F) is
MSEF = EF{[s(x) − t(F)]²} = VF(θ̂) + [BiasF(θ̂)]²
where the expectation is taken with respect to F.
MSE is the expected squared difference between θ̂ and θ.
In the toy example on the previous slide, we have that
MSEF = EF{(µ̂ − µ)²} = EF{(10 + x̄ − µ)²} = EF(10²) + 2 EF[10(x̄ − µ)] + EF[(x̄ − µ)²] = 100 + σ²/n
Toy Example (revisited)
> set.seed(1)
> x = rnorm(500, mean=1)
> bsamp = bootsamp(x)
> bse = bootse(bsamp, function(x) mean(x)+10)
> mybias = bootbias(bse, mean(x))
> c(bse$se, mybias)
[1]  0.04530694 10.00036893
> c(bse$se, mybias)^2
[1] 2.052718e-03 1.000074e+02
> mse = (bse$se^2) + (mybias^2)
> mse
[1] 100.0094
> 100 + 1/length(x)
[1] 100.002
Balance between Accuracy and Precision
MSE quantifies both the accuracy (bias) and the precision (variance) of an estimator.
Ideal estimators are accurate (small bias) and precise (small variance).
Having some bias can be an acceptable (or even good) thing, despite the negative connotations of the word "biased".
For example:
Q: Would you rather have an estimator that is biased by 1 unit with a standard error of 1 unit, or one that is unbiased but has a standard error of 1.5 units?
A: The first estimator is better with respect to MSE: 1² + 1² = 2 < 2.25 = 1.5².
Accuracy and Precision Visualization
[Figure: four scatterplots of estimates (dots) around the true target, illustrating (1) low accuracy and low precision, (2) low accuracy and high precision, (3) high accuracy and low precision, and (4) high accuracy and high precision.]
Jackknife Sample
Before the bootstrap, the jackknife was used to estimate bias and SE.
The i-th jackknife sample of x = (x1, . . . , xn) is defined as
x(i) = (x1, . . . , xi−1, xi+1, . . . , xn)
for i ∈ {1, . . . , n}. Note that. . .
  x(i) is the original data vector without the i-th observation
  x(i) is a vector of length n − 1
Jackknife Replication
The i-th jackknife replication θ̂(i) of the statistic θ̂ = s(x) is
θ̂(i) = s(x(i))
which is the statistic calculated using the i-th jackknife sample.
For plug-in statistics θ̂ = t(F̂), we have that
θ̂(i) = t(F̂(i))
where F̂(i) is the empirical distribution of x(i).
Jackknife Estimate of Standard Error
The jackknife estimate of standard error is defined as
σ̂jack = sqrt( ((n−1)/n) ∑_{i=1}^n (θ̂(i) − θ̄(·))² )
where θ̄(·) = (1/n) ∑_{i=1}^n θ̂(i) is the mean of the jackknife estimates of θ̂.
Note that the (n−1)/n factor is derived by considering the special case θ̂ = x̄, which gives
σ̂jack = sqrt( (1/[(n−1)n]) ∑_{i=1}^n (xi − x̄)² )
which is an unbiased estimator of the standard error of x̄.
Jackknife Estimate of Bias
The jackknife estimate of bias is defined as
Bias_jack = (n − 1)(θ̄(·) − θ̂)
where θ̄(·) = (1/n) ∑_{i=1}^n θ̂(i) is the mean of the jackknife estimates of θ̂.
This approach only works for plug-in statistics θ̂ = t(F̂):
  only works if t(F) is smooth (e.g., mean or ratio)
  doesn't work if t(F) is unsmooth (e.g., median)
  gives a bias estimate using only n recomputations (typically n ≪ B)
A small sketch is given below.
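Since the slides give code only for the jackknife standard error, here is a minimal sketch (an assumption on my part, not from the slides) of the bias formula applied to the plug-in variance, a smooth statistic whose true bias is −σ²/n:

# jackknife bias estimate for the plug-in variance
varplug = function(x) mean((x - mean(x))^2)   # plug-in variance t(Fhat)
set.seed(1)
x = rnorm(100)
n = length(x)
thetahat = varplug(x)
thetai = sapply(1:n, function(i) varplug(x[-i]))   # jackknife replications
biasjack = (n - 1) * (mean(thetai) - thetahat)
biasjack   # close to the true bias -sigma^2/n = -0.01 here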
Smooth versus Unsmooth t(F )
Suppose we have a sample of data (x1, . . . , xn), and consider the mean and the median as a function of x1:
[Figure: the sample mean is a smooth (linear) function of x1, while the sample median is an unsmooth step function of x1.]
Smooth versus Unsmooth t(F ) R Code
# mean is smooth
meanfun <- function(x,z) mean(c(x,z))
set.seed(1)
z = rnorm(100)
x = seq(-4,4,length=200)
meanval = rep(0,200)
for(j in 1:200) meanval[j] = meanfun(x[j],z)
quartz(width=6,height=6)
plot(x,meanval,xlab=expression(x[1]),main="mean")

# median is unsmooth
medfun <- function(x,z) median(c(x,z))
set.seed(1)
z = rnorm(100)
x = seq(-4,4,length=200)
medval = rep(0,200)
for(j in 1:200) medval[j] = medfun(x[j],z)
quartz(width=6,height=6)
plot(x,medval,xlab=expression(x[1]),main="median")
Some R Functions for Jackknife Resampling
We can design our own jackknife functions:
jacksamp <- function(x){
  nx = length(x)
  jsamp = matrix(0, nx-1, nx)
  for(j in 1:nx) jsamp[,j] = x[-j]   # j-th column drops observation j
  jsamp
}

jackse <- function(jsamp, myfun, ...){
  nx = ncol(jsamp)
  theta = apply(jsamp, 2, myfun, ...)
  se = sqrt( ((nx-1)/nx) * sum( (theta - mean(theta))^2 ) )
  list(theta=theta, se=se)
}

These functions work similarly to the bootsamp and bootse functions if x is a vector and the statistic produced by myfun is unidimensional.
Example 1: Sample Mean (revisited)
> set.seed(1)
> x = rnorm(500, mean=1)
> jsamp = jacksamp(x)
> jse = jackse(jsamp, mean)
> mean(x)
[1] 1.022644
> sd(x)/sqrt(500)
[1] 0.04525481
> jse$se
[1] 0.04525481
> hist(jse$theta)

[Histogram of jse$theta]
Example 2: Sample Median (revisited)
> set.seed(1)
> x = rnorm(500, mean=1)
> jsamp = jacksamp(x)
> jse = jackse(jsamp, median)
> median(x)
[1] 0.9632217
> jse$se
[1] 0.01911879
> hist(jse$theta)

Note that the jackknife standard error (0.019) is well below the bootstrap estimate from Example 2 (0.043): the jackknife fails for unsmooth statistics such as the median.

[Histogram of jse$theta]
Bootstrapping Regression
Simple Linear Regression Model: Scalar Form
The simple linear regression model has the form
yi = b0 + b1xi + ei
for i ∈ {1, . . . , n}, where
  yi ∈ R is the real-valued response for the i-th observation
  b0 ∈ R is the regression intercept
  b1 ∈ R is the regression slope
  xi ∈ R is the predictor for the i-th observation
  ei iid∼ (0, σ²) is zero-mean measurement error
Implies that (yi | xi) ind∼ (b0 + b1xi, σ²)
Simple Linear Regression Model: Matrix Form
The simple linear regression model has the form
y = Xb + e
where
  y = (y1, . . . , yn)′ ∈ R^n is the n × 1 response vector
  X = [1n, x] ∈ R^{n×2} is the n × 2 design matrix
    • 1n is an n × 1 vector of ones
    • x = (x1, . . . , xn)′ ∈ R^n is the n × 1 predictor vector
  b = (b0, b1)′ ∈ R² is the 2 × 1 vector of regression coefficients
  e = (e1, . . . , en)′ ∼ (0n, σ²In) is the n × 1 error vector
Implies that (y | x) ∼ (Xb, σ²In)
Ordinary Least Squares: Scalar Form
The ordinary least squares (OLS) problem is
min_{b0, b1 ∈ R} ∑_{i=1}^n (yi − b0 − b1xi)²
and the OLS solution has the form
b̂0 = ȳ − b̂1x̄
b̂1 = ∑_{i=1}^n (xi − x̄)(yi − ȳ) / ∑_{i=1}^n (xi − x̄)²
where x̄ = (1/n) ∑_{i=1}^n xi and ȳ = (1/n) ∑_{i=1}^n yi
Ordinary Least Squares: Matrix Form
The ordinary least squares (OLS) problem is
min_{b ∈ R²} ‖y − Xb‖²
where ‖·‖ denotes the Frobenius norm; the OLS solution has the form
b̂ = (X′X)⁻¹X′y
where
(X′X)⁻¹ = 1/[n ∑_{i=1}^n (xi − x̄)²] ×
          [  ∑_{i=1}^n xi²   −∑_{i=1}^n xi ]
          [ −∑_{i=1}^n xi               n  ]
X′y = [ ∑_{i=1}^n yi    ]
      [ ∑_{i=1}^n xi yi ]
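A quick numerical check (a sketch, not from the slides) that the closed form matches R's lm():

# closed-form OLS solution vs. lm()
set.seed(1)
n = 50
x = rnorm(n)
y = 3 + 2*x + rnorm(n)
X = cbind(1, x)
solve(crossprod(X), crossprod(X, y))   # (X'X)^{-1} X'y
coef(lm(y ~ x))                        # same estimates from lm()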
OLS Coefficients are Random Variables
Note that b̂ is a linear function of y, so we can derive the following.
The expectation of b̂ is given by
E(b̂) = E[(X′X)⁻¹X′y]
     = E[(X′X)⁻¹X′(Xb + e)]
     = E[b] + (X′X)⁻¹X′E[e]
     = b
and the covariance matrix is given by
V(b̂) = V[(X′X)⁻¹X′y]
     = (X′X)⁻¹X′V[y]X(X′X)⁻¹
     = (X′X)⁻¹X′(σ²In)X(X′X)⁻¹
     = σ²(X′X)⁻¹
Fitted Values are Random Variables
Similarly, ŷ = Xb̂ is a linear function of y, so we can derive. . .
The expectation of ŷ is given by
E(ŷ) = E[X(X′X)⁻¹X′y]
     = E[X(X′X)⁻¹X′(Xb + e)]
     = E[Xb] + X(X′X)⁻¹X′E[e]
     = Xb
and the covariance matrix is given by
V(ŷ) = V[X(X′X)⁻¹X′y]
     = X(X′X)⁻¹X′V[y]X(X′X)⁻¹X′
     = X(X′X)⁻¹X′(σ²In)X(X′X)⁻¹X′
     = σ²X(X′X)⁻¹X′
Need for the Bootstrap
If the residuals are Gaussian, ei iid∼ N(0, σ²), then we have that
b̂ ∼ N(b, σ²(X′X)⁻¹)
ŷ ∼ N(Xb, σ²X(X′X)⁻¹X′)
so it is possible to make probabilistic statements about b̂ and ŷ.
If ei iid∼ F for some arbitrary distribution F with EF(ei) = 0, we can use the bootstrap to make inferences about b̂ and ŷ:
use the bootstrap with the ecdf F̂ as the distribution of ei.
Bootstrapping Regression Residuals
Can use the following bootstrap procedure:
1. Fit the regression model to obtain ŷ and ê = y − ŷ
2. Sample e*i with replacement from {ê1, . . . , ên} for i ∈ {1, . . . , n}
3. Define y*i = ŷi + e*i and b* = (X′X)⁻¹X′y*
4. Repeat 2–3 a total of B times to get the bootstrap distribution of b̂
We don't need Monte Carlo simulation to get bootstrap standard errors:
V(b*) = (X′X)⁻¹X′V(y*)X(X′X)⁻¹ = σ̂²_F̂ (X′X)⁻¹
given that V(y*) = σ̂²_F̂ In, where σ̂²_F̂ = (1/n) ∑_{i=1}^n êi²
Bootstrapping Regression Residuals in R
> set.seed(1)
> n = 500
> x = rexp(n)
> e = runif(n, min=-2, max=2)
> y = 3 + 2*x + e
> linmod = lm(y ~ x)
> linmod$coef
(Intercept)           x
   2.898325    2.004220
> yhat = linmod$fitted.values
> bsamp = bootsamp(linmod$residuals)
> bsamp = matrix(yhat, n, ncol(bsamp)) + bsamp
> myfun = function(y, x) lm(y ~ x)$coef
> bse = bootse(bsamp, myfun, x=x)
> bse$cov
             (Intercept)            x
(Intercept)  0.006105788 -0.003438571
x           -0.003438571  0.003605852
> sigsq = mean(linmod$residuals^2)
> solve(crossprod(cbind(1, x))) * sigsq
                           x
   0.006112136 -0.003412774
x -0.003412774  0.003573180
> par(mfcol=c(2,1))
> hist(bse$theta[1,], main=expression(hat(b)[0]))
> hist(bse$theta[2,], main=expression(hat(b)[1]))

[Histograms of the bootstrap distributions of b̂0 and b̂1]
Bootstrapping Pairs Instead of Residuals
We could also use the following bootstrap procedure:
1. Fit the regression model to obtain ŷ and ê = y − ŷ
2. Sample z*i = (x*i, y*i) with replacement from {(x1, y1), . . . , (xn, yn)} for i ∈ {1, . . . , n}
3. Define x* = (x*1, . . . , x*n)′, X* = [1n, x*], y* = (y*1, . . . , y*n)′, and b* = (X′*X*)⁻¹X′*y*
4. Repeat 2–3 a total of B times to get the bootstrap distribution of b̂
Bootstrapping pairs only assumes the (xi, yi) are iid from some F.
Bootstrapping Regression Pairs in R
> set.seed(1)
> n = 500
> x = rexp(n)
> e = runif(n, min=-2, max=2)
> y = 3 + 2*x + e
> linmod = lm(y ~ x)
> linmod$coef
(Intercept)           x
   2.898325    2.004220
> z = cbind(y, x)
> bsamp = bootsamp(z)
> myfun = function(z) lm(z[,1] ~ z[,2])$coef
> bse = bootse(bsamp, myfun)
> bse$cov
             (Intercept)       z[, 2]
(Intercept)  0.006376993 -0.003913989
z[, 2]      -0.003913989  0.004308720
> sigsq = mean(linmod$residuals^2)
> solve(crossprod(cbind(1, x))) * sigsq
                           x
   0.006112136 -0.003412774
x -0.003412774  0.003573180
> par(mfcol=c(2,1))
> hist(bse$theta[1,], main=expression(hat(b)[0]))
> hist(bse$theta[2,], main=expression(hat(b)[1]))

[Histograms of the bootstrap distributions of b̂0 and b̂1]
Bootstrapping Regression: Pairs or Residuals?
Bootstrapping pairs requires fewer assumptions about the data:
  bootstrapping pairs only assumes the (xi, yi) are iid from some F
  bootstrapping residuals assumes (yi | xi) ind∼ (b0 + b1xi, σ²)
Bootstrapping pairs can be dangerous when working with categorical predictors and/or continuous predictors with skewed distributions.
Bootstrapping residuals is preferable when the regression model is reasonably specified (because X remains unchanged).