28
Bootstrap Methods: Recent Bootstrap Methods: Recent Advances and New Applications Advances and New Applications 2007 ICTP Lectures 2007 ICTP Lectures Anestis Anestis Antoniadis Antoniadis UJF, LJK Lab UJF, LJK Lab Trieste October, 2007 Trieste October, 2007 1

Bootstrap Methods: Recent Advances and New …mescal.imag.fr/membres/yves.denneulin/LASCAR/page2/files/Bootstrap...Bootstrap Methods: Recent Advances and New Applications 2007 ICTP

  • Upload
    lydiep

  • View
    218

  • Download
    2

Embed Size (px)

Citation preview

Page 1: Bootstrap Methods: Recent Advances and New …mescal.imag.fr/membres/yves.denneulin/LASCAR/page2/files/Bootstrap...Bootstrap Methods: Recent Advances and New Applications 2007 ICTP

Bootstrap Methods: RecentBootstrap Methods: RecentAdvances and New ApplicationsAdvances and New Applications

2007 ICTP Lectures2007 ICTP LecturesAnestis Anestis AntoniadisAntoniadis

UJF, LJK LabUJF, LJK LabTrieste October, 2007Trieste October, 2007

11

Page 2: Bootstrap Methods: Recent Advances and New …mescal.imag.fr/membres/yves.denneulin/LASCAR/page2/files/Bootstrap...Bootstrap Methods: Recent Advances and New Applications 2007 ICTP

Bootstrap TopicsBootstrap Topics• Introduction to Bootstrap

• Wide Variety of Applications

• Confidence regions and hypothesis tests

• Examples where bootstrap is not consistent and remedies:(1) infinite variance case for a population mean and(2) extreme values

• Available Software

22

Page 3: Bootstrap Methods: Recent Advances and New …mescal.imag.fr/membres/yves.denneulin/LASCAR/page2/files/Bootstrap...Bootstrap Methods: Recent Advances and New Applications 2007 ICTP

IntroductionIntroduction

• The bootstrap is a general method for doingstatistical analysis without making strongparametric assumptions.

• Efron’s nonparametric bootstrap, resamplesthe original data.

• It was originally designed to estimate bias andstandard errors for statistical estimates muchlike the jackknife.

33

Page 4: Bootstrap Methods: Recent Advances and New …mescal.imag.fr/membres/yves.denneulin/LASCAR/page2/files/Bootstrap...Bootstrap Methods: Recent Advances and New Applications 2007 ICTP

Introduction (continued)Introduction (continued)

• The bootstrap is similar to earliertechniques which are also calledresampling methods:– (1) jackknife,– (2) cross-validation,– (3) delta method,– (4) permutation methods, and– (5) subsampling..

44

Page 5: Bootstrap Methods: Recent Advances and New …mescal.imag.fr/membres/yves.denneulin/LASCAR/page2/files/Bootstrap...Bootstrap Methods: Recent Advances and New Applications 2007 ICTP

Introduction (continued)Introduction (continued)

The technique was extended, modified and

refined to handle a wide variety of problems

including:– (1) confidence intervals and hypothesis tests,

– (2) linear and nonlinear regression,

– (3) time series analysis and other problems

55

Page 6: Bootstrap Methods: Recent Advances and New …mescal.imag.fr/membres/yves.denneulin/LASCAR/page2/files/Bootstrap...Bootstrap Methods: Recent Advances and New Applications 2007 ICTP

Introduction (continued)Introduction (continued)

Definition of Efron’s nonparametric bootstrap.Given a sample of n independent identicallydistributed (i.i.d.) observations X1, X2, …, Xn froma distribution F and a parameter θ of thedistribution F with a real valued estimatorθ(X1, X2, …, Xn ), the bootstrap estimates the

accuracy of the estimator by replacing F with Fn,the empirical distribution, where Fn placesprobability mass 1/n at each observation Xi.

66

Page 7: Bootstrap Methods: Recent Advances and New …mescal.imag.fr/membres/yves.denneulin/LASCAR/page2/files/Bootstrap...Bootstrap Methods: Recent Advances and New Applications 2007 ICTP

Introduction (continued)Introduction (continued)

• Let X1*, X2

*, …, Xn* be a bootstrap sample, that

is a sample of size n taken with replacementfrom Fn .

• The bootstrap, estimates the variance of

θ(X1, X2, …, Xn ) by computing orapproximating the variance of

θ* = θ(X1*, X2

*, …, Xn* ).

77

Page 8: Bootstrap Methods: Recent Advances and New …mescal.imag.fr/membres/yves.denneulin/LASCAR/page2/files/Bootstrap...Bootstrap Methods: Recent Advances and New Applications 2007 ICTP

Schematic of Bootstrap ProcessSchematic of Bootstrap Process

88

Page 9: Bootstrap Methods: Recent Advances and New …mescal.imag.fr/membres/yves.denneulin/LASCAR/page2/files/Bootstrap...Bootstrap Methods: Recent Advances and New Applications 2007 ICTP

Introduction (continued)Introduction (continued)

• Statistical Functionals - A functional is amapping that takes functions into realnumbers.

• Parameters of a distribution can usually beexpressed as functionals of the populationdistribution.

• Often the standard estimate of aparameter is the same functional appliedto the empirical distribution.

99

Page 10: Bootstrap Methods: Recent Advances and New …mescal.imag.fr/membres/yves.denneulin/LASCAR/page2/files/Bootstrap...Bootstrap Methods: Recent Advances and New Applications 2007 ICTP

Introduction (continued)Introduction (continued)

• Statistical Functionals and the bootstrap.• A parameter θ is a functional T(F) where T

denotes the functional and F is apopulation distribution.

• An estimator of θ is θh = T(Fn) where Fn isthe empirical distribution function.

• Many statistical problems involveproperties of the distribution of θ - θh , itsmean (bias of θh ), variance, median etc.

1010

Page 11: Bootstrap Methods: Recent Advances and New …mescal.imag.fr/membres/yves.denneulin/LASCAR/page2/files/Bootstrap...Bootstrap Methods: Recent Advances and New Applications 2007 ICTP

Introduction (continued)Introduction (continued)• Bootstrap idea: Cannot determine the distribution of

θ - θh but through the bootstrap we can determine, orapproximate through Monte Carlo, the distribution ofθh - θ*, where θ* = T(Fn

*) and Fn* is the empirical

distribution for a bootstrap sample X1*, X2

*,…,Xn*

(θ* is a bootstrap estimate of θ).• Based on k bootstrap samples the Monte Carlo

approximation to the distribution of θh - θ* is used toestimate bias, variance etc. for θh .

• In bootstrapping θh substitutes for θ and θ* substitutes forθh . Called the bootstrap principle.

1111

Page 12: Bootstrap Methods: Recent Advances and New …mescal.imag.fr/membres/yves.denneulin/LASCAR/page2/files/Bootstrap...Bootstrap Methods: Recent Advances and New Applications 2007 ICTP

Introduction (continued)Introduction (continued)

1212

Page 13: Bootstrap Methods: Recent Advances and New …mescal.imag.fr/membres/yves.denneulin/LASCAR/page2/files/Bootstrap...Bootstrap Methods: Recent Advances and New Applications 2007 ICTP

Introduction (continued)Introduction (continued)

• Basic Theory: Mathematical results show thatbootstrap estimates are consistent in particularcases.

• Basic Idea: Empirical distributions behave inlarge samples like population distributions.Glivenko-Cantelli Theorem tells us this.

• The smoothness condition is needed to transferconsistency to functionals of Fn, such as theestimate of the parameter θ.

1313

Page 14: Bootstrap Methods: Recent Advances and New …mescal.imag.fr/membres/yves.denneulin/LASCAR/page2/files/Bootstrap...Bootstrap Methods: Recent Advances and New Applications 2007 ICTP

Wide Variety of ApplicationsWide Variety of Applications

• Efron and others recognized that throughthe power of fast computing the MonteCarlo approximation could be used toextend the bootstrap to many differentstatistical problems .

1414

Page 15: Bootstrap Methods: Recent Advances and New …mescal.imag.fr/membres/yves.denneulin/LASCAR/page2/files/Bootstrap...Bootstrap Methods: Recent Advances and New Applications 2007 ICTP

Wide Variety of ApplicationsWide Variety of Applications(continued)(continued)

• It can estimate process capability indices fornon-Gaussian data.

• It is used to adjust p-values in a variety ofmultiple comparison situations.

• It can be extended to problems involvingdependent data including multivariate, spatialand time series data and in sampling fromfinite populations.

1515

Page 16: Bootstrap Methods: Recent Advances and New …mescal.imag.fr/membres/yves.denneulin/LASCAR/page2/files/Bootstrap...Bootstrap Methods: Recent Advances and New Applications 2007 ICTP

Wide Variety of ApplicationsWide Variety of Applications(continued)(continued)

• It also has been applied to problems involvingmissing data.

• In many cases, the theory justifying the use ofbootstrap (e.g. consistency theorems) has beenextended to these non i.i.d. settings.

• In other cases, the bootstrap has been modifiedto “make it work.” The general case ofconfidence interval estimation is a notableexample.

1616

Page 17: Bootstrap Methods: Recent Advances and New …mescal.imag.fr/membres/yves.denneulin/LASCAR/page2/files/Bootstrap...Bootstrap Methods: Recent Advances and New Applications 2007 ICTP

Confidence regions and hypothesisConfidence regions and hypothesisteststests

• The percentile method and other bootstrapvariations may require 1000 or morebootstrap replications to be very useful.

• The percentile method only works underspecial conditions.

• Bias correction and other adjustments aresometimes needed to make the bootstrap“accurate” and “correct” when the samplesize n is small or moderate.

1717

Page 18: Bootstrap Methods: Recent Advances and New …mescal.imag.fr/membres/yves.denneulin/LASCAR/page2/files/Bootstrap...Bootstrap Methods: Recent Advances and New Applications 2007 ICTP

Confidence regions and hypothesisConfidence regions and hypothesistests (continued)tests (continued)

• Confidence intervals are accurate or nearlyexact when the stated confidence level for theintervals is approximately the long run probabilitythat the random interval contains the “true” valueof the parameter.

• Accurate confidence intervals are said to becorrect if they are approximately the shortestlength confidence intervals possible for the givenconfidence level.

1818

Page 19: Bootstrap Methods: Recent Advances and New …mescal.imag.fr/membres/yves.denneulin/LASCAR/page2/files/Bootstrap...Bootstrap Methods: Recent Advances and New Applications 2007 ICTP

Examples where the bootstrap failsExamples where the bootstrap fails

• Athreya (1987) shows that the bootstrapestimate of the sample mean isinconsistent when the populationdistribution has an infinite variance.

• Angus (1993) provides similarinconsistency results for the maximumand minimum of a sequence ofindependent identically distributedobservations.

1919

Page 20: Bootstrap Methods: Recent Advances and New …mescal.imag.fr/membres/yves.denneulin/LASCAR/page2/files/Bootstrap...Bootstrap Methods: Recent Advances and New Applications 2007 ICTP

Bootstrap RemediesBootstrap Remedies

• In the past decade many of the problemswhere the bootstrap is inconsistentremedies have been found by researchersto give good modified bootstrap solutionsthat are consistent.

• For both problems describe thus far asimple procedure called the m-out-nbootstrap has been shown to lead toconsistent estimates .

2020

Page 21: Bootstrap Methods: Recent Advances and New …mescal.imag.fr/membres/yves.denneulin/LASCAR/page2/files/Bootstrap...Bootstrap Methods: Recent Advances and New Applications 2007 ICTP

The The m-out-of-n m-out-of-n BootstrapBootstrap

• This idea was proposed by Bickel and Ren (1996) forhandling doubly censored data.

• Instead of sampling n times with replacement from asample of size n they suggest to do it only m timeswhere m is much less than n.

• To get the consistency results both m and n need to getlarge but at different rates. We need m=o(n). That ism/n→0 as m and n both → ∞.

• This method leads to consistent bootstrap estimates inmany cases where the ordinary bootstrap has problems,particularly (1) mean with infinite variance and (2)extreme value distributions.

2121

Page 22: Bootstrap Methods: Recent Advances and New …mescal.imag.fr/membres/yves.denneulin/LASCAR/page2/files/Bootstrap...Bootstrap Methods: Recent Advances and New Applications 2007 ICTP

Available SoftwareAvailable Software• Resampling Stats from Resampling Stats Inc.

(provides basic bootstrap tools in easy to usesoftware and is good as an elementary teaching tool).

• SPlus from Insightful Corporation ( good foradvanced bootstrap techniques such as BCa, easy touse in new Windows based version). The currentmodule Resample is what I use in my bootstrap classat statistics.com.

• S functions provided by Tibshirani (see Appendix inEfron and Tibshirani text or visit Rob Tibshirani’s website http:/www.stat-stanford.edu/~tibs)

2222

Page 23: Bootstrap Methods: Recent Advances and New …mescal.imag.fr/membres/yves.denneulin/LASCAR/page2/files/Bootstrap...Bootstrap Methods: Recent Advances and New Applications 2007 ICTP

Available Software (continued)Available Software (continued)

• Stata has a bootstrap algorithm available thatsome users rave about.

• Mathworks and other examples (see SusanHolmes web page):

http:/www-stat.stanford.edu/~susan) or contact herby email

• SAS macros are available and Proc MULTTESTdoes bootstrap sampling.

2323

Page 24: Bootstrap Methods: Recent Advances and New …mescal.imag.fr/membres/yves.denneulin/LASCAR/page2/files/Bootstrap...Bootstrap Methods: Recent Advances and New Applications 2007 ICTP

References on confidence intervalsReferences on confidence intervalsand hypothesis testsand hypothesis tests

(1) Chernick, M.R. (1999). Bootstrap Methods: APractitioner’s Guide. Wiley, New York.(2) Chernick, M.R. (2007). Bootstrap Methods: AGuide for Practitioners and Researchers, 2nd

Edition. Wiley, New York.(3) Hall, P. (1992). The Bootstrap andEdgeworth Expansion. Springer-Verlag, NewYork.(4) Efron, B. (1982) The Jackknife, the Bootstrapand Other Resampling Plans. Society forIndustrial and Applied Mathematics CBMS-NSFRegional Conference Series 38, Philadelphia.

24

Page 25: Bootstrap Methods: Recent Advances and New …mescal.imag.fr/membres/yves.denneulin/LASCAR/page2/files/Bootstrap...Bootstrap Methods: Recent Advances and New Applications 2007 ICTP

(5) Carpenter, J. and Bithell, J. (2000).Bootstrap confidence intervals: when, which,what? A practical guide for medical statisticians.Statistics in Medicine 19, 1141-1164.(6) Bahadur, R.R. and Savage, L.J. (1956). Thenonexistence of certain statistical procedures innonparametric problems. Annals ofMathematical Statistics 27, 1115-1122.(7) Ewens, W.J. and Grant, G.R. (2001).Statistical Methods in Bioinformatics AnIntroduction.

References on confidence intervals andReferences on confidence intervals andhypothesis tests (continued)hypothesis tests (continued)

25

Page 26: Bootstrap Methods: Recent Advances and New …mescal.imag.fr/membres/yves.denneulin/LASCAR/page2/files/Bootstrap...Bootstrap Methods: Recent Advances and New Applications 2007 ICTP

References on when bootstrap failsReferences on when bootstrap fails(1) Angus, J. E. (1993). Asymptotic theory forbootstrapping the extremes. Communs. Statist.Theory and Methods 22, 15-30.(2) Athreya, K. B. (1987). Bootstrap estimationof the mean in the infinite variance case. Ann.Statist. 15, 724-731.(3) Bickel, P. J. and Freedman, D. A. (1981).Some asymptotic theory for the bootstrap. Ann.Statist. 9, 1196-1217.

26

Page 27: Bootstrap Methods: Recent Advances and New …mescal.imag.fr/membres/yves.denneulin/LASCAR/page2/files/Bootstrap...Bootstrap Methods: Recent Advances and New Applications 2007 ICTP

References on when bootstrap failsReferences on when bootstrap fails(continued)(continued)

(4) Chernick, M.R. (1999). Bootstrap Methods: APractitioner’s Guide. Wiley, New York.

(5) Chernick, M.R. (2007). Bootstrap Methods: AGuide for Practitioners and Researchers, 2nd

Edition. Wiley, New York.

(6) Cochran, W. (1977). Sampling Techniques.3rd ed., Wiley, New York

27

Page 28: Bootstrap Methods: Recent Advances and New …mescal.imag.fr/membres/yves.denneulin/LASCAR/page2/files/Bootstrap...Bootstrap Methods: Recent Advances and New Applications 2007 ICTP

References on when bootstrap failsReferences on when bootstrap fails(continued)(continued)

(7) Knight, K. (1989). On the bootstrap of thesample mean in the infinite variance case. Ann.Statist. 17, 1168-1175.

(8) LePage, R., and Billard, L. (editors). (1992).Exploring the Limit of Bootstrap. Wiley, NewYork.(9) Mammen, E. (1992). When Does theBootstrap Work? Asymptotic Results andSimulations Springer-Verlag, Heidelberg.

(10) Singh, K. (1981). On the asymptoticaccuracy of Efron’s bootstrap. Ann. Statist. 9,1187-1195.

28