Statistics 571: Statistical Methods
Ramón V. León
Unit 14: Nonparametric Statistical Methods
7/26/2004

Source: web.utk.edu/~leon/stat571/2004SummerPDFs/571Unit14.pdf


Introductory Remarks

• Most methods studied so far have been based on the assumption of normally distributed data
  – Frequently this assumption is not valid
  – Sample size may be too small to verify it
• Sometimes the data are measured on an ordinal scale
• Nonparametric or distribution-free statistical methods
  – Make very few assumptions about the form of the population distribution from which the data are sampled
  – Are based on ranks, so they can be used on ordinal data
• We will concentrate on hypothesis tests but will also mention confidence interval procedures.


Inference for a Single Sample

Consider a random sample $x_1, x_2, \ldots, x_n$ from a population with unknown median $\mu$.

(Recall that for nonnormal (especially skewed) distributions the median is a better measure of the center than the mean.)

$$H_0: \mu = \mu_0 \quad \text{vs.} \quad H_1: \mu > \mu_0$$

Example: Test whether the median household income of a population exceeds $50,000 based on a random sample of household incomes from that population.

For simplicity we sometimes present methods for one-sided tests. Modifications for two-sided tests are straightforward and are given in the textbook. Some examples in these notes are two-sided tests.


Sign Test for a Single Sample

Sign test for $H_0: \mu = \mu_0$ vs. $H_1: \mu > \mu_0$:

1. Count the number of $x_i$'s that exceed $\mu_0$. Denote this number by $s_+$, called the number of plus signs. Let $s_- = n - s_+$, which is the number of minus signs.
2. Reject $H_0$ if $s_+$ is large or, equivalently, if $s_-$ is small.

Test idea: Under the null hypothesis $s_+$ has a binomial distribution, Bin(n, ½). So this test is simply the test for binomial proportions.


Sign Test Example

A thermostat used in an electric device is to be checked for the accuracy of its design setting of 200ºF. Ten thermostats were tested to determine their actual settings, resulting in the following data:

202.2, 203.4, 200.5, 202.5, 206.3, 198.0, 203.7, 200.8, 201.3, 199.0

$$H_0: \mu = 200 \quad \text{vs.} \quad H_1: \mu \ne 200$$

(The t test based on the mean has P-value = 0.0453. However, recall that the t test assumes a normal population.)

$s_+ = 8$ = number of data values > 200, so

$$P\text{-value} = 2\sum_{i=8}^{10} \binom{10}{i}\left(\frac{1}{2}\right)^{10} = 2\sum_{i=0}^{2} \binom{10}{i}\left(\frac{1}{2}\right)^{10} = 0.110$$
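The binomial tail sum for the thermostat example can be checked numerically. The following is a minimal sketch, not part of the original slides; the function name `sign_test_two_sided` is hypothetical, and it assumes observations equal to the hypothesized median are simply dropped.

```python
from math import comb

def sign_test_two_sided(data, mu0):
    """Exact two-sided sign test of H0: median = mu0 (ties with mu0 are dropped)."""
    plus = sum(1 for x in data if x > mu0)    # s+ : number of plus signs
    minus = sum(1 for x in data if x < mu0)   # s- : number of minus signs
    n = plus + minus                          # effective sample size
    s_max = max(plus, minus)
    # Upper tail P(S >= s_max) under Bin(n, 1/2), doubled for the two-sided test
    tail = sum(comb(n, i) for i in range(s_max, n + 1)) / 2 ** n
    return plus, minus, min(1.0, 2 * tail)

thermostat = [202.2, 203.4, 200.5, 202.5, 206.3, 198.0, 203.7, 200.8, 201.3, 199.0]
s_plus, s_minus, p_value = sign_test_two_sided(thermostat, 200.0)
print(s_plus, s_minus, p_value)
```

For these data the tail sum is 2(45 + 10 + 1)/1024 = 112/1024 ≈ 0.109, which agrees with the 0.110 on the slide up to rounding.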


Normal Approximation to Test Statistic

If the sample size is large ($n \ge 20$), the common null distribution of $S_+$ and $S_-$ is approximated by a normal distribution with

$$E(S_+) = E(S_-) = np = n\left(\frac{1}{2}\right) = \frac{n}{2},$$

$$Var(S_+) = Var(S_-) = np(1-p) = n\left(\frac{1}{2}\right)\left(\frac{1}{2}\right) = \frac{n}{4}.$$

Therefore, one can perform a one-sided z-test with

$$z = \frac{s_+ - n/2 - 1/2}{\sqrt{n/4}}$$


P-values for Sign Test Using JMP

Based on the normal approximation to the binomial (= $z^2$)


Treatment of Ties

• Theory of the test assumes that the distribution of the data is continuous, so in theory ties are impossible
• In practice they do occur because of rounding
• A simple solution is to ignore the ties and work only with the untied observations. This does reduce the effective sample size of the test and hence its power, but the loss is not significant if there are only a few ties


Confidence Interval for µ

Let $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$ be the ordered data values. Then a $(1-\alpha)$-level CI for $\mu$ is given by

$$x_{(b+1)} \le \mu \le x_{(n-b)}$$

where $b = b_{n,1-\alpha/2}$ is the lower $\alpha/2$ critical point of the Bin(n, 1/2) distribution.

Note: Not all confidence levels are possible because of the discreteness of the binomial distribution.


Thermostat Setting: Sign Confidence Interval for the Median

From Table A.1 we see that for $n = 10$ and $p = 0.5$, the lower 0.011 critical point of the binomial distribution is 1 and by symmetry the upper 0.011 critical point is 9. Setting $\alpha/2 = 0.011$, which gives $1 - \alpha = 1 - 0.022 = 0.978$, we find that

$$199.0 = x_{(2)} \le \mu \le x_{(9)} = 203.7$$

is a 97.8% CI for $\mu$.
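The interval above can be reproduced directly from the order statistics. This is a small sketch rather than the textbook's procedure: the function name `sign_ci` is hypothetical, and the critical point b = 1 is passed in by hand (as read from the table) instead of being looked up.

```python
from math import comb

def sign_ci(data, b):
    """Return (x_(b+1), x_(n-b), confidence level) for the sign confidence interval."""
    xs = sorted(data)
    n = len(xs)
    # P(S <= b) under Bin(n, 1/2) plays the role of alpha/2
    half_alpha = sum(comb(n, i) for i in range(b + 1)) / 2 ** n
    # xs is 0-based: xs[b] is x_(b+1) and xs[n - b - 1] is x_(n-b)
    return xs[b], xs[n - b - 1], 1 - 2 * half_alpha

thermostat = [202.2, 203.4, 200.5, 202.5, 206.3, 198.0, 203.7, 200.8, 201.3, 199.0]
lower, upper, level = sign_ci(thermostat, b=1)
print(lower, upper, level)
```

The exact level is 1 − 22/1024 ≈ 0.9785, which is the 97.8% quoted above.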


Sign Test for Matched Pairs

Drop 3 tied pairs. Then s+ = 20; s- = 3



Sign Test for Matched Pairs in JMP

Pearson’s p-value is not the same as the book’s two-sided P-value because the book uses the continuity correction in the normal approximation to the binomial distribution, i.e., the book uses z = 3.336 (Page 567) rather than the z = 3.544745 used by JMP. Note that $(3.544745)^2 = 12.5652$.
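The two z values can be recovered from s+ = 20 and n = 23 (the 23 untied pairs). The sketch below is illustrative only; `sign_test_z` is a hypothetical helper that applies the normal approximation with or without the 1/2 continuity correction.

```python
from math import sqrt

def sign_test_z(s_plus, n, continuity=True):
    """Normal-approximation z statistic for the sign test (upper-tail form)."""
    correction = 0.5 if continuity else 0.0
    return (s_plus - n / 2 - correction) / sqrt(n / 4)

z_book = sign_test_z(20, 23, continuity=True)    # with continuity correction
z_jmp = sign_test_z(20, 23, continuity=False)    # without continuity correction
print(round(z_book, 3), round(z_jmp, 6), round(z_jmp ** 2, 4))
```

With the correction the numerator is 8, without it 8.5, and both are divided by sqrt(23/4); squaring the uncorrected z gives the value quoted for JMP.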


Wilcoxon Signed Rank Test

$$H_0: \mu = \mu_0 \quad \text{vs.} \quad H_1: \mu \ne \mu_0$$

More powerful than the sign test; however, it requires the assumption that the population distribution is symmetric.

Example 14.1 and 14.4: Thermostat Setting is 200°F

1. Rank the differences $d_i = x_i - \mu_0$ in order of their absolute values
2. Calculate $w_+$ = sum of the ranks of the positive differences: $w_+ = 6 + 8 + 1 + 7 + 10 + 9 + 2 + 4 = 47$
3. Reject $H_0$ if $w_+$ is large or small
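The three steps can be sketched in code for the thermostat data. This is an illustration under stated assumptions, not the book's tabulated method: it assumes no zero differences and no tied absolute differences, and it obtains the exact null distribution by counting subsets of the ranks {1, ..., n} (each of the 2^n sign patterns is equally likely under H0). The function names are hypothetical.

```python
def signed_rank_statistic(data, mu0):
    """Return (w_plus, n); assumes no zero differences and no tied |differences|."""
    diffs = [x - mu0 for x in data]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    w_plus = sum(rank for rank, i in enumerate(order, start=1) if diffs[i] > 0)
    return w_plus, len(diffs)

def signed_rank_null_counts(n):
    """counts[w] = number of subsets of {1, ..., n} whose ranks sum to w."""
    counts = [1] + [0] * (n * (n + 1) // 2)
    for r in range(1, n + 1):
        for w in range(len(counts) - 1, r - 1, -1):
            counts[w] += counts[w - r]
    return counts

thermostat = [202.2, 203.4, 200.5, 202.5, 206.3, 198.0, 203.7, 200.8, 201.3, 199.0]
w_plus, n = signed_rank_statistic(thermostat, 200.0)
counts = signed_rank_null_counts(n)
upper = sum(counts[w_plus:]) / 2 ** n        # P(W+ >= w_plus)
lower = sum(counts[:w_plus + 1]) / 2 ** n    # P(W+ <= w_plus)
p_two_sided = min(1.0, 2 * min(lower, upper))
print(w_plus, p_two_sided)
```

Here 25 of the 1024 sign patterns give a rank sum of 47 or more, so the one-sided tail is about 0.024 and the two-sided P-value about 0.049.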


Wilcoxon Signed Rank Test in JMP

This test finds a significant difference at α = 0.05, while the sign test did not, even at α = 0.1.


Normal Approximation in the Wilcoxon Signed Rank Test

For large $n$, the null distribution of $W \sim W_+ \sim W_-$ can be well approximated by a normal distribution with mean and variance given by

$$E(W) = \frac{n(n+1)}{4} \quad \text{and} \quad Var(W) = \frac{n(n+1)(2n+1)}{24}.$$

For large samples a one-sided (greater than median) z-test uses the statistic

$$z = \frac{w_+ - n(n+1)/4 - 1/2}{\sqrt{n(n+1)(2n+1)/24}}$$
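As a worked illustration only (with n = 10 the sample is not really large, so this is a plug-in check of the formula rather than a recommended analysis), the thermostat data with $w_+ = 47$ give:

```latex
E(W) = \frac{10 \cdot 11}{4} = 27.5, \qquad
Var(W) = \frac{10 \cdot 11 \cdot 21}{24} = 96.25,
\qquad
z = \frac{47 - 27.5 - 0.5}{\sqrt{96.25}} = \frac{19}{9.811} \approx 1.937
```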


Importance of Symmetric Population Assumption

Here even though H0 is true the long right hand tail makes the positive differences tend to be larger in magnitude than the negative differences, resulting in higher ranks. This inflates w+ and hence the test’s type I error probability.


Null Distribution of the Wilcoxon Signed Rank Statistics



Wilcoxon Signed Rank Statistic: Treatment of Ties

• There are two types of ties
  – Some of the data are equal to the median
    • Drop these observations
  – Some of the differences from the median may be tied
    • Use midrank, that is, the average rank

For example, suppose $d_1 = -1$, $d_2 = +3$, $d_3 = -3$, $d_4 = +5$. Then

$$r_1 = 1, \quad r_2 = r_3 = \frac{2+3}{2} = 2.5, \quad r_4 = 4$$

With ties, Table A.10 is only approximate.
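The midrank rule in the example can be written as a short routine. This is a sketch with a hypothetical function name `midranks`; it ranks the absolute differences and averages the ranks within each group of ties.

```python
def midranks(values):
    """Rank values in ascending order; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + 1 + end + 1) / 2   # mean of the 1-based ranks in the block
        for k in range(start, end + 1):
            ranks[order[k]] = average
        start = end + 1
    return ranks

d = [-1, 3, -3, 5]
r = midranks([abs(v) for v in d])
w_plus = sum(rank for rank, v in zip(r, d) if v > 0)
print(r, w_plus)
```

For the four differences in the example this gives ranks 1, 2.5, 2.5, 4, so the positive differences contribute 2.5 + 4 = 6.5.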


Wilcoxon Signed Rank Test: Matched Pair Design

Example 14.5: Comparing Two Methods of Cardiac Output

Notice that we drop the three zero differences.

Notice that we average the tied ranks.

Two-sided P-values
• Sign test: 0.0008
• Signed rank test: 0.0002
• t-test: 0.0000671 (Page 284)

(Notice that these tests require progressively more stringent assumptions about the population of differences.)


JMP Calculation


Signed Rank Confidence Interval for the Median


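Thermostat Setting: Wilcoxon Signed Rank Confidence Interval for Median

From Table A.10 we see that for $n = 10$, the upper 2.4% critical point is 47 and by symmetry the lower 2.4% critical point is

$$\frac{10(10+1)}{2} - 47 = 55 - 47 = 8.$$

Setting $\alpha/2 = 0.024$ and hence $1 - \alpha = 1 - 0.048 = 0.952$, we find that

$$200.10 = \bar{x}_{(8+1)} = \bar{x}_{(9)} \le \mu \le \bar{x}_{(47)} = 203.55$$

is a 95.2% CI for $\mu$, where $\bar{x}_{(k)}$ denotes the k-th smallest of the 55 pairwise averages $(x_i + x_j)/2$, $i \le j$.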
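The endpoints 200.10 and 203.55 are the 9th and 47th smallest of the 55 pairwise averages (x_i + x_j)/2, i ≤ j, of the thermostat data, often called Walsh averages. The sketch below recomputes them; the function names are hypothetical, and the lower critical point 8 is supplied by hand as read from the table.

```python
def walsh_averages(data):
    """All n(n+1)/2 pairwise averages (x_i + x_j)/2 with i <= j, sorted ascending."""
    n = len(data)
    return sorted((data[i] + data[j]) / 2 for i in range(n) for j in range(i, n))

def signed_rank_ci(data, lower_critical):
    """CI endpoints from the ordered pairwise averages.

    lower_critical is the lower alpha/2 critical point of the signed rank statistic.
    """
    avgs = walsh_averages(data)
    k = lower_critical + 1                  # 1-based position of the lower endpoint
    return avgs[k - 1], avgs[len(avgs) - k]

thermostat = [202.2, 203.4, 200.5, 202.5, 206.3, 198.0, 203.7, 200.8, 201.3, 199.0]
ci_low, ci_high = signed_rank_ci(thermostat, lower_critical=8)
print(len(walsh_averages(thermostat)), ci_low, ci_high)
```

Note that the interval [200.10, 203.55] excludes 200, consistent with the signed rank test rejecting at the 5% level.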


Inferences for Two Independent Samples

One wants to show that the observations from one population tend to be larger than those from another population based on independent random samples

$$x_1, x_2, \ldots, x_{n_1} \quad \text{and} \quad y_1, y_2, \ldots, y_{n_2}$$

Examples:
• Treated patients tend to live longer than untreated patients
• An equity fund tends to have a higher yield than a bond fund


Wilcoxon-Mann-Whitney Test Example: Time to Failure of Two Capacitor Groups

Reject for extreme values of w1.


Stochastic Ordering of Populations

$X$ is stochastically larger than $Y$ (denoted by $X \succ Y$ or equivalently by $F_1 < F_2$) if for all real numbers $u$,

$$P(X > u) \ge P(Y > u),$$

equivalently,

$$P(X \le u) = F_1(u) \le F_2(u) = P(Y \le u),$$

with strict inequality for at least some $u$.


Stochastic Ordering Special Case: Location Difference

$\theta$ is called a location parameter. Notice that $X_{\theta_2} \prec X_{\theta_1}$ iff $\theta_2 < \theta_1$.


Wilcoxon-Mann-Whitney Test

$$H_0: F_1 = F_2 \quad (X \sim Y)$$

Alternatives:
• One sided: $H_1: F_1 < F_2$ ($X \succ Y$)
• Two sided: $H_1: F_1 < F_2$ or $F_2 < F_1$ ($X \succ Y$ or $Y \succ X$)

Notice that the alternative is not $H_1: F_1 \ne F_2$. (The Kolmogorov-Smirnov test can handle this alternative.)


Wilcoxon Version of the Test

$H_0: F_1 = F_2$ ($X \sim Y$) vs. $H_1: F_1 < F_2$ ($X \succ Y$)

1. Rank all $N = n_1 + n_2$ observations, $x_1, x_2, \ldots, x_{n_1}$ and $y_1, y_2, \ldots, y_{n_2}$, in ascending order
2. Sum the ranks of the $x$'s and $y$'s separately. Denote these sums by $w_1$ and $w_2$
3. Reject $H_0$ if $w_1$ is large or, equivalently, $w_2$ is small


Mann-Whitney Test Version

The advantage of using the Mann-Whitney form of the test is that the same distribution applies whether we use $u_1$ or $u_2$:

$$P\text{-value} = P(U \ge u_1) = P(U \le u_2)$$


Null Distribution of the Wilcoxon-Mann-Whitney Test Statistic

Under the null hypothesis each of these $\binom{5}{2} = 10$ orderings has an equal chance of occurring, namely, 1/10.


$$P(W_1 \ge 8) = 0.1 + 0.1 = 0.2 \quad \text{(one-sided p-value for } w_1 = 8,\ H_1: X \succ Y\text{)}$$
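The small null distribution can be enumerated by brute force. The sketch below assumes the example has n1 = 2 observations in the first sample and n2 = 3 in the second (inferred from the 10 equally likely orderings and the tail probability of 0.2; the slide's table itself is not reproduced here), and the function name is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

def rank_sum_null_distribution(n1, n2):
    """Exact null distribution of the rank sum w1 of the first sample (no ties)."""
    N = n1 + n2
    # Under H0 every choice of n1 ranks out of 1..N is equally likely
    sums = Counter(sum(ranks) for ranks in combinations(range(1, N + 1), n1))
    total = sum(sums.values())
    return {w: Fraction(count, total) for w, count in sorted(sums.items())}

dist = rank_sum_null_distribution(2, 3)
p_upper = sum(p for w, p in dist.items() if w >= 8)
print(dist)
print(p_upper)
```

Only the rank sets {3, 5} and {4, 5} give w1 ≥ 8, so the tail probability is 2/10.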


Normal Approximation of Mann-Whitney Statistic

For large $n_1$ and $n_2$, the null distribution of $U$ can be well approximated by a normal distribution with mean and variance given by

$$E(U) = \frac{n_1 n_2}{2} \quad \text{and} \quad Var(U) = \frac{n_1 n_2 (N+1)}{12}.$$

A large-sample one-sided z-test ($H_1: X \succ Y$) can be based on the statistic

$$z = \frac{u_1 - n_1 n_2/2 - 1/2}{\sqrt{n_1 n_2 (N+1)/12}}$$


Treatment of Ties

A tie occurs when some x equals a y.
– A contribution of ½ is counted towards both $u_1$ and $u_2$ for each tied pair
– Equivalent to using the midrank method in computing the Wilcoxon rank sum statistic
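The equivalence between the ½-per-tie rule and midranks can be checked on a toy data set. Everything in this sketch is illustrative: the data are made up, the function names are hypothetical, and the check relies on the standard identity u1 = w1 − n1(n1 + 1)/2 linking the two forms of the statistic.

```python
def mann_whitney_u(x, y):
    """Return (u1, u2): u1 counts pairs with x > y; each tied pair adds 1/2 to both."""
    u1 = sum(1.0 if xi > yj else 0.5 if xi == yj else 0.0 for xi in x for yj in y)
    u2 = len(x) * len(y) - u1
    return u1, u2

def midrank_sum(x, y):
    """Wilcoxon rank sum w1 of the x's in the pooled sample, using midranks for ties."""
    pooled = x + y

    def midrank(v):
        below = sum(1 for p in pooled if p < v)
        equal = sum(1 for p in pooled if p == v)
        return below + (equal + 1) / 2       # average of ranks below+1 .. below+equal

    return sum(midrank(v) for v in x)

# Hypothetical data with a tie between the two samples at the value 7
x = [5, 7, 7, 10]
y = [3, 7, 8]
u1, u2 = mann_whitney_u(x, y)
w1 = midrank_sum(x, y)
n1 = len(x)
print(u1, u2, w1, w1 - n1 * (n1 + 1) / 2)
```

For these made-up data both routes give u1 = 7, which is the point of the second bullet.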


Wilcoxon-Mann-Whitney Confidence Interval

Example 14.8 shows that $[d_{(18)}, d_{(63)}] = [-1.1, 14.7]$ is a 95.6% CI for the difference of the two medians of the failure times of capacitors. This example is in the book errata since Table A.11 is not detailed enough.


Wilcoxon-Mann-Whitney Test in JMP

With continuity correction: used in the book, which gets a one-sided p-value of 0.0502.

Without continuity correction: $z^2 = 1.688^2$


Inference for Several Independent Samples: Kruskal-Wallis Test

Note that this is a completely randomized design


Kruskal-Wallis Test

$$H_0: F_1 = F_2 = \cdots = F_a \quad \text{vs.} \quad H_1: F_i < F_j \text{ for some } i \ne j$$

Distance from the average rank

Reject $H_0$ if $kw > \chi^2_{a-1,\alpha}$


Chi-Square Approximation

• For large samples the distribution of KW under the null hypothesis can be approximated by the chi-square distribution with a-1 degrees of freedom
• So reject $H_0$ if $kw > \chi^2_{a-1,\alpha}$


Kruskal-Wallis Test Example

Reject if kw is large.

$\chi^2_{3,.005} = 12.837$


Kruskal-Wallis Test in JMP


• Case method is different from Unitary method
• Formula method is different from Unitary method


Pairwise Comparisons: Is Any Pair of Treatments Different?

• One can use the Tukey Method on the average ranks to make approximate pairwise comparisons.

• This is one of many approximate techniques where ranks are substituted for the observations in the normal theory methods.


Tukey’s Test Applied to the Ranks Averaged

Lack of agreement with the more precise method of Example 14.10. Here Equation method also seems to be different from Formula and Case method


Example of Friedman’s Test

$\chi^2_{7,.025} = 16.012$; P-value = .0040 vs. .0003 for ANOVA table

Ranking is done within blocks


Inference for Several Matched Samples

Randomized Block Design:
• $a \ge 2$ treatment groups
• $b \ge 2$ blocks
• $y_{ij}$ = observation on the i-th treatment in the j-th block
• $F_{ij}$ = c.d.f. of the r.v. $Y_{ij}$ corresponding to the observed value $y_{ij}$

For simplicity assume $F_{ij}(y) = F(y - \theta_i - \beta_j)$
• $\theta_i$ is the "treatment effect"
• $\beta_j$ is the "block effect"

i.e., we assume that there is no treatment by block interaction


Friedman Test

$$H_0: \theta_1 = \theta_2 = \cdots = \theta_a \quad \text{vs.} \quad H_1: \theta_i > \theta_j \text{ for some } i \ne j$$

Reject if $fr > \chi^2_{a-1,\alpha}$

Distance from the total of the ranks from their expected value when there is no agreement between the blocks


Pairwise Comparisons


Rank Correlation Methods

• The Pearson correlation coefficient
  – Measures only the degree of linear association between two variables
  – Inferences use the assumption of bivariate normality of the two variables

• We present two correlation coefficients that
  – Take into account only the ranks of the observations
  – Measure the degree of monotonic (increasing or decreasing) association between two variables


Motivating Example

$$(x, y) = (1, e^1), (2, e^2), (3, e^3), (4, e^4), (5, e^5)$$

Note that there is a perfect positive association between x and y with $y = e^x$.

• The Pearson correlation coefficient is only 0.886 because the relationship is not linear
• The rank correlation coefficients we present yield a value of 1 for these data
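Both numbers in the motivating example can be verified with a few lines of code. This is a sketch with hypothetical helper names; it uses the fact that Spearman's coefficient is the Pearson correlation computed on the ranks, and the `ranks` helper assumes there are no ties.

```python
from math import exp, sqrt

def pearson(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """Ranks 1..n in ascending order (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

x = [1, 2, 3, 4, 5]
y = [exp(v) for v in x]
r_pearson = pearson(x, y)
r_spearman = pearson(ranks(x), ranks(y))   # Pearson correlation of the ranks
print(round(r_pearson, 3), r_spearman)
```

Because y increases strictly with x, the two rank vectors are identical and the rank correlation is exactly 1, while the Pearson coefficient stays below 1.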


Spearman’s Rank Correlation Coefficient

Ranges between –1 and +1 with rs = -1 when there is a perfect negative association and rs = +1 when there is a perfect positive association


Example 14.12 (Wine Consumption and Heart Disease Deaths per 100,000)


Calculation of Spearman’s Rho


Test for Association Based on Spearman’s Rank Correlation Coefficient


Hypothesis Testing Example

$H_0$: $X$ = Wine Consumption and $Y$ = Heart Disease Deaths are independent
vs.
$H_1$: $X$ and $Y$ are (negatively or positively) associated

$$z = r_S\sqrt{n-1} = -0.826\sqrt{19-1} = -3.504$$

Two-sided P-value = 0.0004

Evidence of negative association
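The z statistic above is simple enough to recompute. The sketch below is an illustration with a hypothetical function name; it takes r_S = -0.826 and n = 19 from the slide and obtains the two-sided normal tail area from the complementary error function in the standard library.

```python
from math import erfc, sqrt

def spearman_z_test(r_s, n):
    """Large-sample z statistic r_s * sqrt(n - 1) and its two-sided normal P-value."""
    z = r_s * sqrt(n - 1)
    p_two_sided = erfc(abs(z) / sqrt(2))   # equals 2 * (1 - Phi(|z|))
    return z, p_two_sided

z, p = spearman_z_test(-0.826, 19)
print(round(z, 3), p)
```

The resulting P-value is well below 0.001, in line with the slide's conclusion of a negative association.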


JMP Calculations: Pearson Correlation

[Figure: scatter plot of Heart Disease Deaths (vertical axis) versus Alcohol from Wine (horizontal axis), annotated with the Pearson correlation]

Plot is fairly linear


JMP Calculations: Spearman Rank Correlation


Kendall’s Rank Correlation Coefficient: Key Concept Examples

Concordant pairs:
• (1,2), (4,9): (1 - 4)(2 - 9) > 0
• (4,2), (3,1): (4 - 3)(2 - 1) > 0

Discordant pairs:
• (1,2), (9,1): (1 - 9)(2 - 1) < 0
• (2,4), (3,1): (2 - 3)(4 - 1) < 0

Tied pairs:
• (1,3), (1,5): (1 – 1)(3 – 5) = 0
• (1,4), (2,4): (1 – 2)(4 – 4) = 0
• (1,2), (1,2): (1 – 1)(2 – 2) = 0

Kendall’s idea is to compare the number of concordant pairs to the number of discordant pairs in bivariate data.


Kendall’s Tau Example

Data (X, Y): (1, 2), (3, 4), (2, 1)

Concordant pairs: (1,2) and (3,4); (3,4) and (2,1). $N_c = 2$

Discordant pairs: (1,2) and (2,1). $N_d = 1$

Number of pairwise comparisons: $N = \binom{n}{2} = \binom{3}{2} = 3$

$$\hat{\tau} = \frac{N_c - N_d}{N} = \frac{2 - 1}{3} = \frac{1}{3}$$
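The counting in this example can be automated with the sign of the product (x_i − x_j)(y_i − y_j), as in the key concept examples. This is a minimal sketch with a hypothetical function name; it handles only the no-ties formula.

```python
from itertools import combinations

def kendall_counts(points):
    """Return (Nc, Nd): the numbers of concordant and discordant pairs."""
    nc = nd = 0
    for (x1, y1), (x2, y2) in combinations(points, 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            nc += 1          # concordant pair
        elif product < 0:
            nd += 1          # discordant pair
        # product == 0 is a tied pair and is counted in neither total
    return nc, nd

points = [(1, 2), (3, 4), (2, 1)]
nc, nd = kendall_counts(points)
n_pairs = len(points) * (len(points) - 1) // 2
tau_hat = (nc - nd) / n_pairs
print(nc, nd, tau_hat)
```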


Kendall’s Rank Correlation Coefficient: Population Version


Kendall’s Rank Correlation Coefficient: Sample Estimate

Let $N_c$ = number of concordant pairs in the data.
Let $N_d$ = number of discordant pairs in the data.
Let $N = \binom{n}{2}$ be the number of pairwise comparisons among the observations $(x_i, y_i)$, $i = 1, 2, \ldots, n$. Then

$$N_c + N_d = N \quad \text{and} \quad \hat{\tau} = \frac{N_c - N_d}{N} \quad \text{if no ties,}$$

$$\hat{\tau} = \frac{N_c - N_d}{\sqrt{(N - T_x)(N - T_y)}} \quad \text{if ties,}$$

where $T_x$ and $T_y$ are corrections for the number of tied pairs.


Hypothesis of Independence Versus Positive Association

Wine data: -4.164, -.696


JMP Calculations: Kendall’s Rank Correlation Coefficient


Kendall’s Coefficient of Concordance

• Measure of association between several matched samples
• Closely related to Friedman’s test statistic
  – Consider a candidates (treatments) and b judges (blocks) with each judge ranking the a candidates
• If there is perfect agreement between the judges, then each candidate gets the same rank from every judge. Assuming the candidates are labeled in the order of their ranking, the rank sum for the i-th candidate would be $r_i = ib$
• If the judges rank the candidates completely at random (“perfect disagreement”), then the expected rank of each candidate would be [1+2+…+a]/a = [a(a+1)/2]/a = (a+1)/2, and the expected value of all the rank sums would equal b(a+1)/2
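The two extremes described above can be turned into a small computation. The formula on the corresponding slide is not reproduced in this text, so the sketch below assumes the common textbook definition: w is the sum of squared deviations of the rank sums from b(a+1)/2, divided by its value under perfect agreement, b²a(a²−1)/12, with the Friedman statistic recovered as fr = b(a−1)w. The function name and the toy rankings are hypothetical.

```python
def kendall_w(rankings):
    """rankings[j][i] = rank given by judge j to candidate i (ranks 1..a, no ties).

    Assumed definition: w = d / d_max, where d is the squared distance of the
    rank sums from their expected value b(a+1)/2 and d_max is its largest value.
    """
    b = len(rankings)                 # number of judges (blocks)
    a = len(rankings[0])              # number of candidates (treatments)
    rank_sums = [sum(judge[i] for judge in rankings) for i in range(a)]
    expected = b * (a + 1) / 2
    d = sum((r - expected) ** 2 for r in rank_sums)
    d_max = b ** 2 * a * (a ** 2 - 1) / 12
    w = d / d_max
    fr = b * (a - 1) * w              # assumed relation to Friedman's statistic
    return rank_sums, w, fr

# Perfect agreement: three judges give four candidates the same ranking
agree_sums, agree_w, agree_fr = kendall_w([[1, 2, 3, 4]] * 3)
# Two judges with exactly opposite rankings: every rank sum equals b(a+1)/2
split_sums, split_w, split_fr = kendall_w([[1, 2, 3], [3, 2, 1]])
print(agree_sums, agree_w, agree_fr)
print(split_sums, split_w, split_fr)
```

Under perfect agreement the rank sums are ib = 3, 6, 9, 12 and w = 1; when the rank sums all equal b(a+1)/2, w = 0.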


Kendall’s Coefficient of Concordance


Kendall’s Coefficient of Concordance and Friedman’s Test


$w = \dfrac{24.667}{4(8 - 1)} = 0.881$
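The arithmetic on this slide is consistent with the usual relation between the two statistics, w = (Friedman statistic) / (b(a − 1)), here with b = 4 judges and a = 8 candidates. The sketch below assumes the standard no-ties definitions of both statistics, since the defining slides did not survive extraction. The function names and the small rank matrix are hypothetical.

```python
def rank_sums(ranks):
    """ranks[j][i] is the rank that judge j gives candidate i (1..a, no ties)."""
    a = len(ranks[0])
    return [sum(judge[i] for judge in ranks) for i in range(a)]


def friedman_statistic(ranks):
    """Friedman statistic without tie correction (assumed standard form)."""
    b, a = len(ranks), len(ranks[0])
    r = rank_sums(ranks)
    return 12.0 / (b * a * (a + 1)) * sum(ri ** 2 for ri in r) - 3.0 * b * (a + 1)


def kendall_w(ranks):
    """Kendall's coefficient of concordance (assumed standard form)."""
    b, a = len(ranks), len(ranks[0])
    expected = b * (a + 1) / 2.0            # expected rank sum under random ranking
    s = sum((ri - expected) ** 2 for ri in rank_sums(ranks))
    return 12.0 * s / (b ** 2 * a * (a ** 2 - 1))


# Hypothetical example: 3 judges in perfect agreement on 4 candidates.
agree = [[1, 2, 3, 4]] * 3
print(rank_sums(agree))                      # rank sums are i * b
print(kendall_w(agree), friedman_statistic(agree) / (3 * (4 - 1)))

# The slide's own numbers: Friedman statistic 24.667 with b = 4, a = 8.
print(round(24.667 / (4 * (8 - 1)), 3))
```

Under these definitions w runs from 0 (all rank sums equal to b(a+1)/2) to 1 (perfect agreement).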


Do You Need to Know More?

“Nonparametric Statistical Methods, Second Edition” by Myles Hollander and Douglas A. Wolfe. (1999) Wiley-Interscience


Resampling Methods

• Conventional methods are based on the sampling distribution of a statistic computed for the observed sample. The sampling distribution is derived by considering all possible samples of size n from the underlying population.

• Resampling methods generate the sampling distribution of the statistic by drawing repeated samples from the observed sample itself. This eliminates the need to assume a specific functional form for the population distribution (e.g. normal).


Challenger Shuttle O-Ring Data

Do we have statistical evidence that cold temperature leads to more O-ring incidents?

• Notice that the assumptions of the two-sample t-test do not hold.

• The original analysis omitted the zeros. Was this justified?

• What do we do?


Wrong t-test Analysis

Notice that the assumptions of the independent-samples t-test do not hold, i.e., the data are not normally distributed within each group.

Difference of Low mean to High mean


Permutation Distribution of t Statistic

Also equal to the two-sided p-value

Equivalent to selecting all simple random samples without replacement of size 20 from the 24 data points, labeling these High and the rest Low
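A full enumeration of these relabelings can be sketched in Python. The slide's data table did not survive extraction, so the values below are an assumption: they are the commonly reproduced version of this data set, chosen because they give the Low-minus-High mean difference of 1.3 and the t statistic of 3.888 quoted elsewhere in this unit. For a fixed pooled data set the t statistic increases with the sum of the Low group, so counting relabelings whose Low sum is at least the observed one is equivalent to counting t statistics at least as large as the observed t.

```python
from itertools import combinations

# ASSUMED data: O-ring incidents per launch (not recoverable from the slide).
low = [1, 1, 1, 3]                  # 4 cold-weather launches
high = [0] * 17 + [1, 1, 2]         # 20 warm-weather launches
pooled = low + high

observed_low_sum = sum(low)
n_low = len(low)

count = total = 0
# Choosing which 4 of the 24 values are labeled Low is the same as
# choosing which 20 are labeled High.
for idx in combinations(range(len(pooled)), n_low):
    total += 1
    if sum(pooled[i] for i in idx) >= observed_low_sum:
        count += 1

p_value = count / total             # one-sided permutation P-value
print(total, count, p_value)
```

The loop visits every one of the C(24, 4) = 10,626 relabelings, so no random sampling is involved and the result is exact for the assumed data.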


Comments

• A randomization test is a permutation test applied to data from a randomized experiment. Randomization tests are the gold standard for establishing causality.

• A permutation test considers all possible simple random samples without replacement from the set of observed data values

• The bootstrap method considers a large number of simple random samples with replacement from the set of observed data values.


Calculation of t Statistics from 10,000 Bootstrap Samples

[Figure: histogram of the 10,000 bootstrap t statistics; horizontal axis from -2 to 6]

Imagine placing the 24 Challenger data values in a hat and randomly selecting 24 values with replacement from the hat, labeling the first 20 values High and the remaining 4 values Low. We repeat this process 10,000 times, and for each of these 10,000 bootstrap samples we calculate the t statistic (if s_p = 0, t is defined to be 0). Of the 10,000 t statistics, 35 were greater than or equal to 3.888. This gives a bootstrap P-value of 35/10000 = 0.0035.
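The hat-drawing procedure can be sketched as follows. This is not the JMP implementation used in the course: the function names are hypothetical, the data values are assumed (the commonly reproduced version of this data set, which gives t = 3.888), and the result depends on the random seed, so it will not match 35/10000 exactly.

```python
import random
from math import sqrt

# ASSUMED data: O-ring incidents per launch (not recoverable from the slide).
low = [1, 1, 1, 3]
high = [0] * 17 + [1, 1, 2]


def pooled_t(low_vals, high_vals):
    """Pooled-variance two-sample t statistic (Low mean minus High mean)."""
    n1, n2 = len(low_vals), len(high_vals)
    m1, m2 = sum(low_vals) / n1, sum(high_vals) / n2
    ss = sum((v - m1) ** 2 for v in low_vals) + sum((v - m2) ** 2 for v in high_vals)
    sp = sqrt(ss / (n1 + n2 - 2))
    if sp == 0:
        return 0.0                  # convention stated on the slide
    return (m1 - m2) / (sp * sqrt(1 / n1 + 1 / n2))


def bootstrap_p_value(low_vals, high_vals, n_boot=10_000, seed=0):
    rng = random.Random(seed)
    pooled = high_vals + low_vals
    n_high = len(high_vals)
    t_obs = pooled_t(low_vals, high_vals)
    hits = 0
    for _ in range(n_boot):
        draw = rng.choices(pooled, k=len(pooled))   # 24 draws with replacement
        # First 20 draws are labeled High, the remaining 4 Low.
        if pooled_t(draw[n_high:], draw[:n_high]) >= t_obs - 1e-9:
            hits += 1
    return t_obs, hits / n_boot


t_obs, p_boot = bootstrap_p_value(low, high)
print(round(t_obs, 3), p_boot)
```

The small tolerance in the comparison guards against floating-point round-off when a bootstrap sample reproduces the observed configuration.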


Bootstrap Distribution of Difference Between the Means

[Figure: histogram of the 10,000 bootstrap differences between the means; horizontal axis from -1 to 2]

67 of the 10,000 differences between the Low mean and the High mean were greater than or equal to 1.3. This gives a bootstrap P-value of 67/10000 = 0.0067.

Conclusion: Cold weather increases the chance of O-ring problems


Bootstrap Final Remarks

• The JMP files that we used to generate the bootstrap samples and to calculate the statistics are available at the course web site.

• There are bootstrap procedures for most types of statistical problems. All are based on resampling from the data.

• These methods do not assume specific functional forms for the distribution of the data, e.g. normal.

• The accuracy of bootstrap procedures depends on the sample size and the number of bootstrap samples generated.


How Were the Bootstrap Samples Generated?

(see next page)


Calculated Columns in JMP Samples File


Bootstrap Estimate of the Standard Error of the Mean

Summary: We calculate the standard deviation of the N bootstrap estimates of the mean


BSE for Arbitrary Statistic

Example: The bootstrap standard error of the median is calculated by drawing a large number N, e.g. 10000, of bootstrap samples from the data. For each bootstrap sample we calculate the sample median. Then we calculate the standard deviation of the N bootstrap medians.
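The recipe works for any statistic, which a short Python sketch makes concrete. The function name `bootstrap_se` and the data values are hypothetical, chosen only for illustration. The statistic is passed in as a function, so the same code gives the bootstrap standard error of the mean or of the median.

```python
import random
import statistics


def bootstrap_se(data, statistic, n_boot=10_000, seed=0):
    """Standard deviation of `statistic` over n_boot bootstrap samples."""
    rng = random.Random(seed)
    estimates = [
        statistic(rng.choices(data, k=len(data)))   # resample with replacement
        for _ in range(n_boot)
    ]
    return statistics.stdev(estimates)


# Hypothetical data, for illustration only.
sample = [2.1, 3.4, 3.9, 4.4, 5.0, 5.2, 6.8, 7.1, 9.5, 12.3]

se_median = bootstrap_se(sample, statistics.median)
se_mean = bootstrap_se(sample, statistics.mean)
print(se_median, se_mean)
```

For the mean, the result should be close to the population-style standard deviation of the sample divided by the square root of n, which gives a quick sanity check on the resampling code.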


Estimated Bootstrap Standard Error for t-statistics Using JMP

Note: N = 10,000


Bootstrap Standard Error Interpretation

• Many bootstrap statistics have an approximate normal distribution

• Confidence interval interpretation

– 68% of the time the bootstrap estimate (the average of the bootstrap estimates) will be within one standard error of the true parameter value

– 95% of the time the bootstrap estimate (the average of the bootstrap estimates) will be within two standard errors of the true parameter value


Bootstrap Confidence Intervals: Percentile Method, Median Example

1. Draw N (= 10000) bootstrap samples from the data and for each calculate the (bootstrap) sample median.

2. The 2.5 percentile of the N bootstrap sample medians will be the LCL for a 95% confidence interval.

3. The 97.5 percentile of the N bootstrap sample medians will be the UCL for a 95% confidence interval.

[Figure: distribution of the bootstrap sample medians, with area 0.025 below the LCL and area 0.025 above the UCL]
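The percentile method can be sketched in a few lines of Python. The function name `percentile_ci` and the data are hypothetical, and the way the 2.5 and 97.5 percentiles are read off the sorted bootstrap values is one simple convention among several, so other software may give slightly different limits.

```python
import random
import statistics


def percentile_ci(data, statistic=statistics.median, n_boot=10_000,
                  level=0.95, seed=0):
    """Percentile-method bootstrap confidence interval (LCL, UCL)."""
    rng = random.Random(seed)
    boot = sorted(
        statistic(rng.choices(data, k=len(data)))   # resample with replacement
        for _ in range(n_boot)
    )
    alpha = 1.0 - level
    lo_idx = round((alpha / 2) * n_boot)             # position of the 2.5 percentile
    hi_idx = round((1 - alpha / 2) * n_boot) - 1     # position of the 97.5 percentile
    return boot[lo_idx], boot[hi_idx]


# Hypothetical data, for illustration only.
sample = [2.1, 3.4, 3.9, 4.4, 5.0, 5.2, 6.8, 7.1, 9.5, 12.3]
lcl, ucl = percentile_ci(sample)
print(lcl, ucl)
```

Because the median of a resample is always either a data value or the midpoint of two data values, the limits move in discrete steps, which is more noticeable for small samples.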


Do You Need to Know More?

“An Introduction to the Bootstrap” by Bradley Efron and Robert J. Tibshirani. (1993) Chapman & Hall/CRC