Statistical Data Analysis 2011/2012 M. de Gunst Lecture 4

Statistical Data Analysis: Introduction

Topics:
Summarizing data
Exploring distributions
Bootstrap (continued)
Robust methods
Nonparametric tests
Analysis of categorical data
Multiple linear regression

Today’s topics: Bootstrap (Chapter 4: 4.3, 4.4)

4. Bootstrap
4.1. Simulation (read yourself) (last week)
4.2. Bootstrap estimators for distribution (last week)
4.3. Bootstrap confidence intervals
4.4. Bootstrap tests

Bootstrap: recap (1)

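Situation: x1, ..., xn realizations of X1, ..., Xn, independent, unknown distr. P

Bootstrap to estimate distribution of Tn = Tn(X1, ..., Xn), an estimator or test statistic

Which steps?

Step 1. Estimate the distribution of Tn under P by its distribution under an estimate P̂ of P (first error)

Step 2. Estimate that distribution in turn by simulation, i.e. by the empirical distribution of the bootstrap values Tn,1*, ..., Tn,B* (second error)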

Bootstrap: recap (2)

Step 1: Determine theoretical bootstrap estimator (first error)

i) Estimate P by P̂: the empirical distribution, or a parametric distribution with estimated parameter. P̂ is stochastic: estimator

ii) Estimate the distribution of Tn under P by the distribution of Tn under P̂. This is stochastic: bootstrap estimator

Bootstrap: recap (3)

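Step 2: From estimator to estimate: x1, ..., xn fixed (second error)

i) If the bootstrap estimate has an explicit expression, then done

ii) If not, then estimate the estimate: use the bootstrap (sampling) scheme to estimate it by the empirical distribution of Tn,1*, ..., Tn,B*, where Tn* = Tn(X1*, ..., Xn*) and X1*, ..., Xn* are drawn from P̂

The empirical distribution of Tn,1*, ..., Tn,B* is stochastic: estimator. The empirical distr. of simulated realizations of Tn* is the estimate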

Bootstrap: recap (4)

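Obtain empirical distr. of simulated realizations of Tn* with bootstrap (sampling) scheme:

1. Draw a bootstrap sample x1*, ..., xn* from the estimate P̂ of P
2. Compute the bootstrap value tn* = Tn(x1*, ..., xn*)
3. Repeat steps 1 and 2 B times

With the B bootstrap values get impression of (characteristics of) unknown distribution of Tn:

draw histogram
compute sample variance
compute sample sd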
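The sampling scheme and the three summaries can be sketched in R as follows. The data `x`, the choice of the median as Tn, and B = 1000 are assumptions made only for illustration; the sketch uses the empirical bootstrap, i.e. P is estimated by the empirical distribution of the sample.

```r
set.seed(1)                      # fixed seed so the sketch is reproducible
x <- rexp(30)                    # placeholder sample (assumption)
B <- 1000                        # number of bootstrap values

# Empirical bootstrap: resample x with replacement and recompute the
# statistic (here the median, as an example of Tn) B times.
tstar <- replicate(B, median(sample(x, replace = TRUE)))

hist(tstar, prob = TRUE)         # impression of the distribution of Tn
var(tstar)                       # bootstrap estimate of the variance of Tn
sd(tstar)                        # bootstrap estimate of the sd of Tn
```

For a parametric bootstrap, the call to `sample` would be replaced by a draw from the fitted parametric distribution.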

4.3. Bootstrap confidence intervals (1)

Tn : estimator of unknown parameter θ

Seen: accuracy of estimator Tn : variance of estimator’s distribution

Now: accuracy of estimator Tn : confidence interval

(1 - 2α) × 100% confidence interval for θ is an interval around Tn such that it contains the 'true' θ with probability at least 1 - 2α

If interval is [Tn - b1, Tn + b2], how to determine b1 and b2?

(blackboard)

Bootstrap confidence intervals (2)

(1 - 2α) × 100% confidence interval for θ is an interval around Tn such that it contains the 'true' θ with probability at least 1 - 2α

If interval is [Tn - b1, Tn + b2], then b1 and b2 determined by

P(Tn - b1 ≤ θ ≤ Tn + b2) = P(–b2 ≤ Tn – θ ≤ b1) ≥ 1 - 2α

[Tn - b1, Tn + b2] = [Tn - G^{-1}(1 - α), Tn - G^{-1}(α)]

with G the distribution function of Tn – θ

So b1 and –b2 are quantiles of the unknown distribution G
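The step from the coverage requirement to the quantiles can be written out as below. The symbol G for the distribution function of Tn – θ is a name assumed here, and G is taken to be continuous and strictly increasing so that the quantiles are unique. Splitting the error probability 2α equally over both tails is one possible choice.

```latex
\begin{align*}
P(T_n - b_1 \le \theta \le T_n + b_2)
  &= P(-b_2 \le T_n - \theta \le b_1)
   = G(b_1) - G(-b_2), \\
b_1 = G^{-1}(1-\alpha),\quad -b_2 = G^{-1}(\alpha)
  &\;\Longrightarrow\; G(b_1) - G(-b_2) = (1-\alpha) - \alpha = 1 - 2\alpha, \\
[T_n - b_1,\; T_n + b_2]
  &= \bigl[T_n - G^{-1}(1-\alpha),\; T_n - G^{-1}(\alpha)\bigr].
\end{align*}
```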

How to estimate the quantiles b1 and –b2?

Bootstrap confidence intervals (3)

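Interval is [Tn - b1, Tn + b2] = [Tn - G^{-1}(1 - α), Tn - G^{-1}(α)], with G the distribution function of Tn – θ

How to estimate quantiles b1 and –b2 of unknown distribution of Tn – θ?

Estimate G with Ĝ, the distribution of Tn* – Tn: use bootstrap

Gives estimate of conf interval: [Tn - Ĝ^{-1}(1 - α), Tn - Ĝ^{-1}(α)] (4.1)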

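Bootstrap confidence intervals (4)

Estimate of conf interval: [Tn - Ĝ^{-1}(1 - α), Tn - Ĝ^{-1}(α)] (4.1), with Ĝ the bootstrap estimate of the distribution of Tn – θ

In practice, determine in steps:

1. Estimate unknown distribution of Tn – θ with Ĝ: use bootstrap

Same as before? No: the statistic is now Tn – θ, so need bootstrap values Zn* = Tn* – Tn

2. Estimate quantiles by empirical quantiles of bootstrap values Zn*

3. Bootstrap confidence interval: [Tn - Z*_(1-α), Tn - Z*_(α)] (4.2), with Z*_(α) the sample α-quantile of the B bootstrap values Zn*

(You have to know this formula!!)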
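The three steps can be sketched in R as follows. The data `x`, the sample mean as Tn, α = 0.05 and B = 2000 are illustrative assumptions. Note also that `quantile` interpolates between order statistics by default, so its result can differ slightly from a quantile defined as a single order statistic.

```r
set.seed(2)
x <- rexp(40, rate = 0.5)        # placeholder data (assumption)
alpha <- 0.05                    # gives a (1 - 2*alpha) x 100% = 90% interval
B <- 2000

tn <- mean(x)                    # Tn on the original sample
# Step 1: bootstrap values Tn* and Zn* = Tn* - Tn
tstar <- replicate(B, mean(sample(x, replace = TRUE)))
zstar <- tstar - tn

# Step 2: empirical alpha- and (1 - alpha)-quantiles of the Zn* values
q <- quantile(zstar, c(alpha, 1 - alpha))

# Step 3: interval (4.2), [Tn - Z*_(1-alpha), Tn - Z*_(alpha)]
ci <- c(lower = tn - q[[2]], upper = tn - q[[1]])
ci
```

The upper quantile of Zn* determines the lower endpoint and vice versa, which is the point of formula (4.2).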

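Bootstrap confidence intervals (5)

We just discussed:

Estimate of confidence interval: [Tn - Ĝ^{-1}(1 - α), Tn - Ĝ^{-1}(α)] (4.1)

Corresponding bootstrap confidence interval: [Tn - Z*_(1-α), Tn - Z*_(α)] (4.2)

Here Ĝ is the bootstrap estimate of the distribution of Tn – θ and Z*_(α) is the sample α-quantile of the bootstrap values Zn* = Tn* – Tn.

This is the original bootstrap confidence interval, also called reflection method. We will use!!

Other method: percentile method

Estimate of confidence interval: [Tn + Ĝ^{-1}(α), Tn + Ĝ^{-1}(1 - α)]

Corresponding bootstrap confidence interval: [Tn + Z*_(α), Tn + Z*_(1-α)]

Only suitable if the distribution of Tn – θ is symmetric around 0. (Asymptotically the two methods give the same result)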
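To see how the two methods relate, the sketch below computes both intervals from the same bootstrap values. The skewed placeholder data, the median as Tn, α and B are assumptions for illustration only.

```r
set.seed(3)
x <- rexp(40)                    # skewed placeholder data (assumption)
alpha <- 0.05
B <- 2000

tn <- median(x)
tstar <- replicate(B, median(sample(x, replace = TRUE)))
zstar <- tstar - tn              # Zn* = Tn* - Tn

qz <- quantile(zstar, c(alpha, 1 - alpha), names = FALSE)
reflection <- c(tn - qz[2], tn - qz[1])   # interval (4.2)
percentile <- c(tn + qz[1], tn + qz[2])   # same as the quantiles of tstar

rbind(reflection, percentile)
```

Both intervals have the same length, and each is the mirror image of the other around Tn. They coincide only when the bootstrap values Zn* are distributed symmetrically around 0.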

Bootstrap confidence intervals (6)

How to obtain the (sample) α-quantile Z*_(α)?

R: if zstar contains the bootstrap values Zn*
> quantile(zstar, alpha)

Note: Tn* is always the same function of X1*, ..., Xn* as Tn is of X1, ..., Xn

For two samples X1, ..., Xn and Y1, ..., Ym the method is the same

Example: if Tn,m = X̄n - Ȳm, then Tn,m* = X̄n* - Ȳm*

and Zn,m* = X̄n* - Ȳm* - (X̄n - Ȳm) (cf. Example 4.4 in Reader)
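A minimal sketch of the two-sample case in R, assuming the statistic is the difference of the two sample means. The samples `x` and `y`, α and B are placeholders chosen for illustration, not the data of Example 4.4.

```r
set.seed(4)
x <- rnorm(25, mean = 5, sd = 2)   # placeholder sample 1 (assumption)
y <- rnorm(30, mean = 4, sd = 2)   # placeholder sample 2 (assumption)
alpha <- 0.05
B <- 2000

tnm <- mean(x) - mean(y)           # Tn,m on the original samples
# Resample each sample separately, keeping the sample sizes n and m.
tstar <- replicate(B, mean(sample(x, replace = TRUE)) -
                      mean(sample(y, replace = TRUE)))
zstar <- tstar - tnm               # Zn,m* = Tn,m* - Tn,m

q <- quantile(zstar, c(alpha, 1 - alpha), names = FALSE)
ci <- c(tnm - q[2], tnm - q[1])    # same formula as in the one-sample case
ci
```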

4.4. Bootstrap Tests (1)

Remember last week’s slide:

From lecture 3: Kolmogorov-Smirnov test (5)

Example

Data: y
H0: F is normal ← composite null hypothesis
H1: F is not normal
Test statistic: Dadj (computed with the mean and sd of y)

R:
> ks.test(y,pnorm)
D = 0.6922, p-value = 6.661e-16

Correct? Incorrect: this is test for
H0: F = N(0,1)
H1: F ≠ N(0,1)

> ks.test(y,pnorm,mean=mean(y),sd=sd(y))
D = 0.1081, p-value = 0.5655
> mean(y)
[1] 3.62158
> sd(y)
[1] 3.043356

Correct? Incorrect: this is test for
H0: F = N(3.62158, (3.04335)^2)
H1: F ≠ N(3.62158, (3.04335)^2)

We have not used Dadj!! p-value should be 0.126 (next week)

Bootstrap Tests (2)

Solve this with bootstrap test!

General idea on blackboard
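The blackboard material is not part of this transcript. One common construction for this problem, sketched here under assumptions, is a parametric bootstrap test: simulate samples under the null hypothesis, recompute the adjusted Kolmogorov-Smirnov statistic for each, and compare the observed value with the simulated ones. The data `y`, the helper `dadj` and B are hypothetical names and values, not taken from the lecture.

```r
set.seed(5)
y <- rexp(50, rate = 0.3)          # placeholder data (assumption)
B <- 1000

# Adjusted KS statistic: distance between the empirical distribution
# function and the normal distribution with estimated mean and sd.
# unname() drops the name "D" that ks.test attaches to the statistic.
dadj <- function(v) {
  unname(ks.test(v, "pnorm", mean = mean(v), sd = sd(v))$statistic)
}

d.obs <- dadj(y)
# Simulate under H0. The statistic does not change when the data are
# shifted or rescaled, so standard normal samples are sufficient.
dstar <- replicate(B, dadj(rnorm(length(y))))

# Bootstrap p-value: fraction of simulated values at least as large
# as the observed value.
p.value <- mean(dstar >= d.obs)
p.value
```

Unlike the p-value printed by `ks.test` with estimated parameters, this p-value accounts for the fact that mean and sd were estimated from the same data.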

Bootstrap Tests (3) Example

Bootstrap Tests (4) Example

> hist(dprec, prob=TRUE)
> qqnorm(dprec)

[Figures: histogram of dprec and normal QQ-plot of dprec]

Bootstrap Tests (5) Example

Bootstrap Tests (6) Example

Bootstrap Tests (7) Example

Bootstrap Tests (8) Example

Bootstrap Tests (9) Example

Recap

Bootstrap
4.3. Bootstrap confidence intervals
4.4. Bootstrap tests

Bootstrap

The end