
Statr session 21 and 22


Learning Objectives

• Understand the χ² goodness-of-fit test and how to use it.
• Analyze data using the χ² test of independence.
• Recognize the advantages and disadvantages of nonparametric statistics.
• Understand how to use the runs test to test for randomness.
• Know when and how to use the Mann-Whitney U test, the Wilcoxon matched-pairs signed rank test, the Kruskal-Wallis test, and the Friedman test.
• Learn when and how to measure correlation using Spearman’s rank correlation measurement.


Goodness-of-Fit Test

• The Chi-square goodness-of-fit test compares expected (theoretical) frequencies of categories from a population distribution to the observed (actual) frequencies from a distribution to determine whether there is a difference between what was expected and what was observed.

• The chi-square goodness-of-fit test is used to analyze probabilities of multinomial distribution trials along a single dimension.

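The formula used to compute the test statistic for a chi-square goodness-of-fit test is:

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}, \qquad df = k - 1 - c$$

where f_o is the observed frequency of a category, f_e is its expected frequency, k is the number of categories, and c is the number of parameters being estimated from the sample data.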


• The formula compares the frequency of observed values to the frequency of expected values across the distribution.
  – The test loses one degree of freedom because the total of the expected frequencies must equal the total of the observed frequencies.

• The chi-square distribution is the distribution of the sum of the squares of k independent standard normal random variables.
  – It can never be less than zero; it extends indefinitely in the positive direction.

Milk Sales Data for Demonstration Problem 16.1

Dairies would like to know whether the sales of milk are distributed uniformly over a year so they can plan for milk production and storage. A uniform distribution means that the frequencies are the same in all categories. In this situation, the producers are attempting to determine whether the amounts of milk sold are the same for each month of the year. They ascertain the number of gallons of milk sold by sampling one large supermarket each month during a year, obtaining the following data. Use α = .01 to test whether the data fit a uniform distribution.


Month        Gallons
January       1,610
February      1,585
March         1,649
April         1,590
May           1,540
June          1,397
July          1,410
August        1,350
September     1,495
October       1,564
November      1,602
December      1,655
Total        18,447

Hypotheses and Decision Rules for Demonstration Problem 16.1

H0: The monthly figures for milk sales are uniformly distributed.
Ha: The monthly figures for milk sales are not uniformly distributed.

$$\alpha = .01, \qquad df = k - 1 - c = 12 - 1 - 0 = 11, \qquad \chi^2_{.01,\,11} = 24.725$$

If $\chi^2_{Cal} > 24.725$, reject H0.
If $\chi^2_{Cal} \le 24.725$, do not reject H0.

Calculations for Demonstration Problem 16.1

Month        fo        fe          (fo - fe)²/fe
January      1,610     1,537.25     3.44
February     1,585     1,537.25     1.48
March        1,649     1,537.25     8.12
April        1,590     1,537.25     1.81
May          1,540     1,537.25     0.00
June         1,397     1,537.25    12.80
July         1,410     1,537.25    10.53
August       1,350     1,537.25    22.81
September    1,495     1,537.25     1.16
October      1,564     1,537.25     0.47
November     1,602     1,537.25     2.73
December     1,655     1,537.25     9.02
Total       18,447    18,447.00    74.38


• The observed chi-square value of 74.38 is greater than the critical value of 24.725.

• The decision is to reject the null hypothesis. The data provide enough evidence to indicate that the distribution of milk sales is not uniform.
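As a cross-check, base R's `chisq.test()` reproduces this result. With no `p` argument it tests against equal expected frequencies, which is exactly the uniform hypothesis here. A minimal sketch using the monthly gallons above (the variable names are illustrative):

```r
# Monthly gallons of milk sold (Demonstration Problem 16.1)
gallons <- c(Jan = 1610, Feb = 1585, Mar = 1649, Apr = 1590,
             May = 1540, Jun = 1397, Jul = 1410, Aug = 1350,
             Sep = 1495, Oct = 1564, Nov = 1602, Dec = 1655)

# With no 'p' argument, chisq.test() uses equal expected frequencies:
# fe = 18,447 / 12 = 1,537.25 for every month
gof <- chisq.test(gallons)

gof$expected    # 1537.25 for each month
gof$statistic   # X-squared, about 74.38
gof$parameter   # df = k - 1 - c = 12 - 1 - 0 = 11

# Critical value for alpha = .01 with 11 degrees of freedom
crit <- qchisq(0.99, df = 11)   # about 24.725
unname(gof$statistic) > crit    # TRUE, so reject H0
```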


Test of Independence

• The chi-square goodness-of-fit test is used to analyze the distribution of frequencies for categories of one variable to determine whether the distribution of these frequencies is the same as some hypothesized or expected distribution.

• The goodness-of-fit test cannot be used to analyze two variables simultaneously.

• The chi-square test of independence is used to analyze the frequencies of two variables with multiple categories to determine whether the two variables are independent.


• Two random variables x and y are called independent if the probability distribution of one variable is not affected by the value taken by the other.

Let f_ij be the observed frequency count of events belonging to both the i-th category of x and the j-th category of y, and let e_ij be the corresponding expected count if x and y are independent. The null hypothesis of independence is rejected if the p-value of the following chi-square test statistic is less than a given significance level α:

$$\chi^2 = \sum_i \sum_j \frac{(f_{ij} - e_{ij})^2}{e_{ij}}, \qquad df = (r - 1)(c - 1)$$

where r is the number of rows and c is the number of columns of the contingency table.

Test of Independence: Gasoline Preference Versus Income Category

Suppose a business researcher wants to determine whether the type of gasoline preferred is independent of a person’s income. She takes a random survey of gasoline purchasers, asking them one question about gasoline preference and a second question about income. The respondent checks which gasoline he or she prefers: (1) regular, (2) premium, or (3) extra premium. The respondent also checks his or her income bracket: (1) less than $30,000, (2) $30,000 to $49,999, (3) $50,000 to $99,999, or (4) $100,000 or more.


Test of Independence: Type of Gasoline Versus Income Category

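Hypotheses:

H0: Type of gasoline preferred is independent of income.
Ha: Type of gasoline preferred is not independent of income.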

Using α = .01, she uses the chi-square test of independence to determine whether type of gasoline preferred is independent of income level.

Type of Gasoline (columns, c = 3): Regular, Premium, Extra Premium

Income (rows, r = 4): Less than $30,000; $30,000 to $49,999; $50,000 to $99,999; At least $100,000


Gasoline preference Versus Income Category: Observed Frequencies

Income                Regular   Premium   Extra Premium   Total
Less than $30,000          85        16               6     107
$30,000 to $49,999        102        27              13     142
$50,000 to $99,999         36        22              15      73
At least $100,000          15        23              25      63
Total                     238        88              59     385

Gasoline Preference Versus Income Category: Observed and Expected Frequencies

Expected frequencies are shown in parentheses.

Income                Regular        Premium        Extra Premium   Total
Less than $30,000      85 (66.15)     16 (24.46)      6 (16.40)      107
$30,000 to $49,999    102 (87.78)     27 (32.46)     13 (21.76)      142
$50,000 to $99,999     36 (45.13)     22 (16.69)     15 (11.19)       73
At least $100,000      15 (38.95)     23 (14.40)     25 (9.65)        63
Total                 238             88             59              385

$$e_{ij} = \frac{(n_i)(n_j)}{N}$$

$$e_{11} = \frac{(107)(238)}{385} = 66.15, \qquad e_{12} = \frac{(107)(88)}{385} = 24.46, \qquad e_{13} = \frac{(107)(59)}{385} = 16.40$$

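Gasoline Preference Versus Income Category: Calculation

$$
\begin{aligned}
\chi^2 &= \sum\sum \frac{(f_o - f_e)^2}{f_e} \\
&= \frac{(85 - 66.15)^2}{66.15} + \frac{(16 - 24.46)^2}{24.46} + \frac{(6 - 16.40)^2}{16.40} \\
&\quad + \frac{(102 - 87.78)^2}{87.78} + \frac{(27 - 32.46)^2}{32.46} + \frac{(13 - 21.76)^2}{21.76} \\
&\quad + \frac{(36 - 45.13)^2}{45.13} + \frac{(22 - 16.69)^2}{16.69} + \frac{(15 - 11.19)^2}{11.19} \\
&\quad + \frac{(15 - 38.95)^2}{38.95} + \frac{(23 - 14.40)^2}{14.40} + \frac{(25 - 9.65)^2}{9.65} \\
&= 5.37 + 2.93 + 6.60 + 2.30 + 0.92 + 3.53 + 1.85 + 1.69 + 1.30 + 14.73 + 5.14 + 24.42 = 70.78
\end{aligned}
$$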


Gasoline preference Versus Income Category

• The observed chi-square value of 70.78 is greater than the critical value χ²(.01, 6) = 16.8119.

• The decision is to reject the null hypothesis. The data provide enough evidence to indicate that the type of gasoline preferred is not independent of income.
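The test of independence can be sketched the same way in base R by passing the observed table to `chisq.test()` as a matrix. The expected counts are computed explicitly first to mirror the e_ij formula. Because R does not round the expected frequencies or the individual terms, its statistic (about 70.73) differs slightly from the hand-calculated 70.78; the decision is unchanged. Variable names are illustrative.

```r
# Observed frequencies: rows = income category, columns = type of gasoline
observed <- matrix(c( 85, 16,  6,
                     102, 27, 13,
                      36, 22, 15,
                      15, 23, 25),
                   nrow = 4, byrow = TRUE,
                   dimnames = list(
                     Income = c("< $30,000", "$30,000-$49,999",
                                "$50,000-$99,999", ">= $100,000"),
                     Gasoline = c("Regular", "Premium", "Extra Premium")))

# Expected counts e_ij = (row total)(column total) / N
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
round(expected, 2)

# chisq.test() on a matrix runs the test of independence; no continuity
# correction is applied because the table is larger than 2 x 2
ind <- chisq.test(observed)
ind$statistic          # about 70.73
ind$parameter          # df = (r - 1)(c - 1) = 6
qchisq(0.99, df = 6)   # critical value, about 16.8119
```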


Parametric versus Nonparametric Statistics

• Parametric statistics are statistical techniques based on assumptions about the population from which the sample data are collected.
  – They assume that the data being analyzed are randomly selected from a normally distributed population.
  – They require quantitative measurement that yields interval- or ratio-level data.

• Nonparametric statistics are based on fewer assumptions about the population and the parameters.
  – They are sometimes called “distribution-free” statistics.
  – A variety of nonparametric statistics are available for use with nominal or ordinal data.


Advantages of Nonparametric Techniques

• Sometimes there is no parametric alternative to the use of nonparametric statistics.

• Certain nonparametric tests can be used to analyze nominal data.

• Certain nonparametric tests can be used to analyze ordinal data.

• The computations on nonparametric statistics are usually less complicated than those for parametric statistics, particularly for small samples.

• Probability statements obtained from most nonparametric tests are exact probabilities.


Disadvantages of Nonparametric Statistics

• Nonparametric tests can be wasteful of data if parametric tests are available for use with the data.

• Nonparametric tests are usually not as widely available and well known as parametric tests.

• For large samples, the calculations for many nonparametric statistics can be tedious.


Runs Test

• Test for randomness - Is the order or sequence of observations in a sample random or not?

• Each sample item possesses one of two possible characteristics

• Run – defined as a succession of observations which possess the same characteristic

• Example with two runs: F, F, F, F, F, F, F, F, M, M, M, M, M, M, M

• Example with fifteen runs: F, M, F, M, F, M, F, M, F, M, F, M, F, M, F


Runs Test: Sample Size Consideration

• Sample size: n

• Number of sample members possessing the first characteristic: n1

• Number of sample members possessing the second characteristic: n2

• n = n1 + n2

• If both n1 and n2 are ≤ 20, the small-sample runs test is appropriate.


Runs Test: Small Sample Example

Suppose 26 cola drinkers are sampled randomly to determine whether they prefer regular cola or diet cola. The random sample contains 18 regular cola drinkers and 8 diet cola drinkers. Let C denote regular cola drinkers and D denote diet cola drinkers. Suppose the sequence of sampled cola drinkers is DCCCCCDCCDCCCCDCDCCCDDDCCC.

Is this sequence of cola drinkers evidence that the sample is not random?


H0: The observations in the sample are randomly generated.
Ha: The observations in the sample are not randomly generated.

α = .05, n1 = 18, n2 = 8
If 7 < R < 17, do not reject H0. Otherwise, reject H0.

Runs 1 to 12: D | CCCCC | D | CC | D | CCCC | D | C | D | CCC | DDD | CCC
R = 12
Since 7 < R = 12 < 17, do not reject H0.
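Counting runs by hand is error-prone. A base R sketch that counts them with `rle()` (run-length encoding), where each encoded element corresponds to one run; the critical values 7 and 17 are taken from the decision rule above, not computed:

```r
# Cola sequence from the small-sample runs test example
# (C = regular cola drinker, D = diet cola drinker)
drinkers <- strsplit("DCCCCCDCCDCCCCDCDCCCDDDCCC", split = "")[[1]]

n1 <- sum(drinkers == "C")            # 18 regular cola drinkers
n2 <- sum(drinkers == "D")            # 8 diet cola drinkers
R  <- length(rle(drinkers)$lengths)   # each rle() element is one run: 12

# Decision rule at alpha = .05: do not reject H0 if 7 < R < 17
lower <- 7
upper <- 17
if (R > lower && R < upper) "do not reject H0" else "reject H0"
```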


Runs Test: Small Sample Example in R

```r
library(tseries)   # runs.test() for a two-level factor comes from tseries
x <- as.factor(c("c", "c", "c", "d", "d", "d"))
runs.test(x)
#   Runs Test
# data:  x
# Standard Normal = -1.8257, p-value = 0.06789
# alternative hypothesis: two.sided
```


Runs Test: Large Sample

Consider the following manufacturing example. A machine produces parts that are occasionally flawed. When the machine is working in adjustment, flaws still occur but seem to happen randomly. A quality-control person randomly selects 50 of the parts produced by the machine today and examines them one at a time in the order that they were made. The result is 40 parts with no flaws and 10 parts with flaws. The sequence of no flaws (denoted by N) and flaws (denoted by F) is shown on an upcoming slide. Using α = .05, the quality controller tests to determine whether the machine is producing randomly (that is, whether the flaws are occurring randomly).


If either n1 or n2 is > 20, the sampling distribution of R is approximately normal.
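In the standard large-sample form, R is standardized using its mean and standard deviation under the hypothesis of randomness:

$$\mu_R = \frac{2 n_1 n_2}{n_1 + n_2} + 1, \qquad \sigma_R = \sqrt{\frac{2 n_1 n_2 (2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)}}, \qquad z = \frac{R - \mu_R}{\sigma_R}$$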


Runs Test: Large Sample Example

Since -1.96 ≤ z = -1.81 ≤ 1.96, do not reject H0.


H0: The observations in the sample are randomly generated.
Ha: The observations in the sample are not randomly generated.

α = .05, n1 = 40, n2 = 10
If -1.96 ≤ z ≤ 1.96, do not reject H0. Otherwise, reject H0.

Runs 1 to 13: NNN | F | NNNNNNN | F | NN | FF | NNNNNN | F | NNNN | F | NNNNN | FFFF | NNNNNNNNNNNN
R = 13
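A short base R sketch of the large-sample calculation for the machine example. It uses only n1 = 40, n2 = 10, and R = 13 from the slide (so it does not depend on re-typing the sequence) and applies the standard large-sample formulas for the mean and standard deviation of R:

```r
# Large-sample runs test for the machine-flaw example
n1 <- 40   # parts with no flaws (N)
n2 <- 10   # parts with flaws (F)
R  <- 13   # observed number of runs

mu_R    <- 2 * n1 * n2 / (n1 + n2) + 1                  # 17
sigma_R <- sqrt(2 * n1 * n2 * (2 * n1 * n2 - n1 - n2) /
                ((n1 + n2)^2 * (n1 + n2 - 1)))          # about 2.213
z <- (R - mu_R) / sigma_R                               # about -1.81

# Two-tailed test at alpha = .05
z_crit <- qnorm(0.975)   # about 1.96
c(mu_R = mu_R, sigma_R = sigma_R, z = z)
if (abs(z) <= z_crit) "do not reject H0" else "reject H0"
```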


Mann-Whitney U Test

• Mann-Whitney U test: a nonparametric counterpart of the t test used to compare the means of two independent populations

• Does not require normally distributed populations

• May be applied to ordinal data

• Assumptions:
  – Independent samples
  – At least ordinal data


Mann-Whitney U Test: Sample Size Consideration

• Size of sample one: n1

• Size of sample two: n2

• If both n1 and n2 are ≤ 10, the small sample procedure is appropriate.

• If either n1 or n2 is greater than 10, the large sample procedure is appropriate.


Mann-Whitney U Test: Small Sample Example - Demonstration Problem 17.1

• H0: The health service population is identical to the educational service population on employee compensation.

• Ha: The health service population is not identical to the educational service population on employee compensation.

Health Service   Educational Service
20.10            26.19
19.80            23.88
22.36            25.50
18.75            21.64
21.90            24.85
22.96            25.30
20.75            24.12
                 23.45


• Since U2 < U1, U = 3.

• p-value = .0011 × 2 (for a two-tailed test) = .0022 < .05, so reject H0.

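$$U_1 = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - W_1 = (7)(8) + \frac{(7)(7 + 1)}{2} - 31 = 53$$

$$U_2 = n_1 n_2 + \frac{n_2(n_2 + 1)}{2} - W_2 = (7)(8) + \frac{(8)(8 + 1)}{2} - 89 = 3$$

where n1 = 7 and n2 = 8 are the sizes of the health service and educational service samples, and W1 = 31 and W2 = 89 are their sums of ranks in the combined sample.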
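A minimal base R sketch that reproduces W1, W2, U1, U2, and U for Demonstration Problem 17.1 and cross-checks them with `wilcox.test()`. R labels its statistic W, but it is a U-type count (here equal to U2) rather than a rank sum; with no ties, R computes an exact two-sided p-value of 14/6435 ≈ .0022. Variable names are illustrative.

```r
# Employee compensation data from Demonstration Problem 17.1
health      <- c(20.10, 19.80, 22.36, 18.75, 21.90, 22.96, 20.75)
educational <- c(26.19, 23.88, 25.50, 21.64, 24.85, 25.30, 24.12, 23.45)

n1 <- length(health)        # 7
n2 <- length(educational)   # 8

# Rank the combined sample and sum the ranks of each group
ranks <- rank(c(health, educational))
W1 <- sum(ranks[seq_len(n1)])    # 31
W2 <- sum(ranks[-seq_len(n1)])   # 89

U1 <- n1 * n2 + n1 * (n1 + 1) / 2 - W1   # 53
U2 <- n1 * n2 + n2 * (n2 + 1) / 2 - W2   # 3
U  <- min(U1, U2)                        # 3

# wilcox.test() reports W = number of (health, educational) pairs in which
# the health value is larger, which equals U2 in the slide's notation
mw <- wilcox.test(health, educational)
mw$statistic   # W = 3
mw$p.value     # about 0.0022 (exact, two-sided)
```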


Mann-Whitney U Test: Formulas for Large Sample Case
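In the standard large-sample form, U is approximately normally distributed with the mean and standard deviation below, so a z statistic can be used:

$$\mu_U = \frac{n_1 n_2}{2}, \qquad \sigma_U = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}, \qquad z = \frac{U - \mu_U}{\sigma_U}$$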


Incomes of PBS and Non-PBS Viewers

The Mann-Whitney U test can be used to determine whether there is a difference in the average income of families who view PBS television and families who do not view PBS television. Suppose a sample of 14 families that have identified themselves as PBS television viewers and a sample of 13 families that have identified themselves as non-PBS television viewers are selected randomly.


Ho: The incomes for PBS viewers and non-PBS viewers are identical.

Ha: The incomes for PBS viewers and non-PBS viewers are not identical.

PBS       Non-PBS
24,500    41,000
39,400    32,500
36,800    33,000
43,000    21,000
57,960    40,500
32,000    32,400
61,000    16,000
34,000    21,500
43,500    39,500
55,000    27,600
39,000    43,500
62,500    51,900
61,400    27,800
53,000

n1 = 14   n2 = 13

Ranks of Income from Combined Groups of PBS and Non-PBS Viewers

Income   Rank   Group      Income   Rank   Group
16,000   1      Non-PBS    39,500   15     Non-PBS
21,000   2      Non-PBS    40,500   16     Non-PBS
21,500   3      Non-PBS    41,000   17     Non-PBS
24,500   4      PBS        43,000   18     PBS
27,600   5      Non-PBS    43,500   19.5   PBS
27,800   6      Non-PBS    43,500   19.5   Non-PBS
32,000   7      PBS        51,900   21     Non-PBS
32,400   8      Non-PBS    53,000   22     PBS
32,500   9      Non-PBS    55,000   23     PBS
33,000   10     Non-PBS    57,960   24     PBS
34,000   11     PBS        61,000   25     PBS
36,800   12     PBS        61,400   26     PBS
39,000   13     PBS        62,500   27     PBS
39,400   14     PBS


PBS and Non-PBS Viewers: Calculation of U
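A base R sketch of the large-sample calculation, assuming the incomes exactly as listed in the ranks table (in particular a PBS income of 43,000). It computes W1, U, and z with the large-sample formulas and no continuity correction, giving W1 = 245.5, U = 41.5, and z ≈ -2.40; at α = .05 (assumed here for illustration), this falls beyond -1.96. Base R's `wilcox.test()` would report a slightly different result because it applies continuity and tie corrections.

```r
# Family incomes as listed in the combined-ranks table
pbs     <- c(24500, 39400, 36800, 43000, 57960, 32000, 61000,
             34000, 43500, 55000, 39000, 62500, 61400, 53000)
non_pbs <- c(41000, 32500, 33000, 21000, 40500, 32400, 16000,
             21500, 39500, 27600, 43500, 51900, 27800)

n1 <- length(pbs)       # 14
n2 <- length(non_pbs)   # 13

# rank() gives tied values their average rank, so both 43,500s get 19.5
ranks <- rank(c(pbs, non_pbs))
W1 <- sum(ranks[seq_len(n1)])             # 245.5
U  <- n1 * n2 + n1 * (n1 + 1) / 2 - W1    # 41.5

# Large-sample normal approximation (no continuity or tie correction)
mu_U    <- n1 * n2 / 2                           # 91
sigma_U <- sqrt(n1 * n2 * (n1 + n2 + 1) / 12)    # about 20.61
z <- (U - mu_U) / sigma_U                        # about -2.40
c(W1 = W1, U = U, z = z)
```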


PBS and Non-PBS Viewers: Conclusion


Wilcoxon Matched-Pairs Signed Rank Test

• The Mann-Whitney U test is a nonparametric alternative to the t test for two independent samples. If the two samples are related, the U test is not applicable.

• The Wilcoxon matched-pairs signed rank test handles related data; it serves as a nonparametric alternative to the t test for two related samples.

• Typical applications:
  – Before-and-after studies
  – Studies in which measures are taken on the same person or object under different conditions
  – Studies of twins or other relatives


• Differences of the scores of the two matched samples are computed.

• Differences are ranked, ignoring the sign.

• Ranks are given the sign of the difference.

• Positive ranks are summed.

• Negative ranks are summed.

• T is the smaller sum of ranks.

Wilcoxon Matched-Pairs Signed Rank Test: Sample Size Consideration

• n is the number of matched pairs.

• If n > 15, T is approximately normally distributed, and a z test is used.

• If n ≤ 15, a special “small sample” procedure is followed.

• Assumptions:
  – The paired data are randomly selected.
  – The underlying distributions are symmetric.

Wilcoxon Matched-Pairs Signed Rank Test: Small Sample Example

Consider the survey by American Demographics that estimated the average annual household spending on healthcare. The U.S. metropolitan average was $1,800. Suppose six families in Pittsburgh, Pennsylvania, are matched demographically with six families in Oakland, California, and their amounts of household spending on healthcare for last year are obtained.


H0: Md = 0
Ha: Md ≠ 0
n = 6, α = .05

If T_observed ≤ 1, reject H0.

Family Pair   Pittsburgh   Oakland
1             1,950        1,760
2             1,840        1,870
3             2,015        1,810
4             1,580        1,660
5             1,790        1,340
6             1,925        1,765

Family Pair   Pittsburgh   Oakland   d      Rank
1             1,950        1,760     190    +4
2             1,840        1,870     -30    -1
3             2,015        1,810     205    +5
4             1,580        1,660     -80    -2
5             1,790        1,340     450    +6
6             1,925        1,765     160    +3

T = minimum(T+, T-)
T+ = 4 + 5 + 6 + 3 = 18
T- = 1 + 2 = 3
T = 3

T = 3 > Tcrit = 1, do not reject H0.
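A base R sketch of the signed-rank calculation for the six family pairs, followed by `wilcox.test(..., paired = TRUE)` as a cross-check. R reports V, the sum of the positive ranks (T+ = 18), rather than the smaller sum T; its exact two-sided p-value of 0.15625 agrees with the decision not to reject H0. Variable names are illustrative.

```r
# Household healthcare spending for six demographically matched family pairs
pittsburgh <- c(1950, 1840, 2015, 1580, 1790, 1925)
oakland    <- c(1760, 1870, 1810, 1660, 1340, 1765)

d <- pittsburgh - oakland                 # 190 -30 205 -80 450 160
signed_ranks <- sign(d) * rank(abs(d))    # rank |d|, then restore the sign

T_plus  <- sum(signed_ranks[signed_ranks > 0])    # 18
T_minus <- -sum(signed_ranks[signed_ranks < 0])   # 3
T_obs   <- min(T_plus, T_minus)                   # 3

# With paired = TRUE, wilcox.test() reports V = T+ and an exact
# two-sided p-value when there are no ties or zero differences
wsr <- wilcox.test(pittsburgh, oakland, paired = TRUE)
wsr$statistic   # V = 18
wsr$p.value     # 0.15625
```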

Wilcoxon Matched-Pairs Signed Rank Test: Large Sample Formulas

For large samples, the T statistic is approximately normally distributed and a z score can be used as the test statistic. This technique can be applied to the airline industry, where an analyst might want to determine whether there is a difference in the cost per mile of airfares in the United States between 1979 and 2011 for various cities. The data in the next slide represent the costs per mile of airline tickets for a sample of 17 cities for both 1979 and 2011.
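In the standard large-sample form, T is standardized with its mean and standard deviation under H0, where n is the number of matched pairs:

$$\mu_T = \frac{n(n + 1)}{4}, \qquad \sigma_T = \sqrt{\frac{n(n + 1)(2n + 1)}{24}}, \qquad z = \frac{T - \mu_T}{\sigma_T}$$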


Airline Cost Data for 17 Cities, 1979 and 2011

City   1979   2011   d      Rank
1      20.3   22.8   -2.5   -8
2      19.5   12.7    6.8   17
3      18.6   14.1    4.5   13
4      20.9   16.1    4.8   15
5      19.9   25.2   -5.3   -16
6      18.6   20.2   -1.6   -4
7      19.6   14.9    4.7   14
8      23.2   21.3    1.9   6.5
9      21.8   18.7    3.1   10
10     20.3   20.9   -0.6   -1
11     19.2   22.6   -3.4   -11.5
12     19.5   16.9    2.6   9
13     18.7   20.6   -1.9   -6.5
14     17.7   18.5   -0.8   -2
15     21.6   23.4   -1.8   -5
16     22.4   21.3    1.1   3
17     20.8   17.4    3.4   11.5

H0: Md = 0
Ha: Md ≠ 0

Airline Cost Data: T Calculation
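A base R sketch of the large-sample T calculation for the 17 cities, using the costs from the table above. It gives T+ = 99 and T- = 54, so T = 54, then standardizes T with μT = 76.5 and σT ≈ 21.12 to get z ≈ -1.07; at α = .05 (assumed here for illustration), |z| < 1.96 would lead to not rejecting H0. Variable names are illustrative.

```r
# Cost per mile for 17 cities, 1979 and 2011
cost_1979 <- c(20.3, 19.5, 18.6, 20.9, 19.9, 18.6, 19.6, 23.2, 21.8,
               20.3, 19.2, 19.5, 18.7, 17.7, 21.6, 22.4, 20.8)
cost_2011 <- c(22.8, 12.7, 14.1, 16.1, 25.2, 20.2, 14.9, 21.3, 18.7,
               20.9, 22.6, 16.9, 20.6, 18.5, 23.4, 21.3, 17.4)

# round() removes floating-point noise so that equal absolute differences
# (1.9 and -1.9, 3.4 and -3.4) receive tied average ranks
d <- round(cost_1979 - cost_2011, 1)
signed_ranks <- sign(d) * rank(abs(d))

n <- length(d)                                    # 17 matched pairs
T_plus  <- sum(signed_ranks[signed_ranks > 0])    # 99
T_minus <- -sum(signed_ranks[signed_ranks < 0])   # 54
T_obs   <- min(T_plus, T_minus)                   # 54

mu_T    <- n * (n + 1) / 4                        # 76.5
sigma_T <- sqrt(n * (n + 1) * (2 * n + 1) / 24)   # about 21.12
z <- (T_obs - mu_T) / sigma_T                     # about -1.07
c(T = T_obs, mu_T = mu_T, sigma_T = sigma_T, z = z)
```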

Airline Cost Data: Conclusion


Kruskal-Wallis Test

• Kruskal-Wallis test: a nonparametric alternative to one-way analysis of variance

• May be used to analyze ordinal data

• No assumed population shape

• Assumes that the treatment (C) groups are independent

• Assumes random selection of individual items


Kruskal-Wallis K Statistic
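In its usual form, with C groups, n_j observations and rank total T_j in group j, and n observations in all, the Kruskal-Wallis statistic is computed as below and compared with a chi-square distribution with C - 1 degrees of freedom (the standard large-sample approximation):

$$K = \frac{12}{n(n + 1)} \sum_{j=1}^{C} \frac{T_j^2}{n_j} - 3(n + 1), \qquad df = C - 1$$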

Number of Patients per Day per Physician in Three Organizational Categories

Suppose a researcher wants to determine whether the number of physicians in an office produces significant differences in the number of office patients seen by each physician per day. She takes a random sample of physicians from practices in which (1) there are only two partners, (2) there are three or more partners, or (3) the office is a health maintenance organization (HMO).


Ho: The three populations are identical.
Ha: At least one of the three populations is different.

Two Partners   Three or More Partners   HMO
13             24                       26
15             16                       22
20             19                       31
18             22                       27
23             25                       28
               14                       33
               17

Patients per Day Data: Kruskal-Wallis Test Preliminary Calculations

n = n1 + n2 + n3 = 5 + 7 + 6 = 18

Two Partners         Three or More Partners   HMO
Patients   Rank      Patients   Rank          Patients   Rank
13         1         24         12            26         14
15         3         16         4             22         9.5
20         8         19         7             31         17
18         6         22         9.5           27         15
23         11        25         13            28         16
                     14         2             33         18
                     17         5
T1 = 29              T2 = 52.5                T3 = 89.5
n1 = 5               n2 = 7                   n3 = 6


Patients per Day Data: Kruskal-Wallis Test Calculations and Conclusion
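A base R sketch of the Kruskal-Wallis calculation for the patients-per-day data. It reproduces T1 = 29, T2 = 52.5, and T3 = 89.5 and gives K ≈ 9.56, which exceeds the chi-square critical values with 2 degrees of freedom at both α = .05 (5.991) and α = .01 (9.210); both are printed because the significance level is not stated in this example. Variable names are illustrative.

```r
# Patients per day per physician, by organizational category
two_partners <- c(13, 15, 20, 18, 23)
three_plus   <- c(24, 16, 19, 22, 25, 14, 17)
hmo          <- c(26, 22, 31, 27, 28, 33)

groups  <- list(two_partners, three_plus, hmo)
all_obs <- unlist(groups)
n <- length(all_obs)                        # 18

# Rank all 18 observations together (ties get average ranks),
# then sum the ranks within each group
ranks  <- rank(all_obs)
labels <- rep(seq_along(groups), lengths(groups))
T_j <- tapply(ranks, labels, sum)           # 29, 52.5, 89.5
n_j <- lengths(groups)                      # 5, 7, 6

K <- 12 / (n * (n + 1)) * sum(T_j^2 / n_j) - 3 * (n + 1)
K                                                 # about 9.56
qchisq(c(0.95, 0.99), df = length(groups) - 1)    # 5.991, 9.210

# kruskal.test() divides K by a tie correction, so its statistic is
# slightly larger (about 9.57) because of the two tied 22s
kruskal.test(groups)$statistic
```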


Friedman Test

• Friedman Test - A nonparametric alternative to the randomized block design

• Assumptions:
  – The blocks are independent.
  – There is no interaction between blocks and treatments.
  – Observations within each block can be ranked.

• Hypotheses:
  – Ho: The treatment populations are equal.
  – Ha: At least one treatment population yields larger values than at least one other treatment population.
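In its usual form, with b blocks, C treatment levels, and R_j the total of the within-block ranks for treatment j, the Friedman statistic is approximately chi-square distributed with C - 1 degrees of freedom:

$$\chi_r^2 = \frac{12}{bC(C + 1)} \sum_{j=1}^{C} R_j^2 - 3b(C + 1), \qquad df = C - 1$$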


Friedman Test: Tensile Strength of Plastic Housings

A manufacturing company assembles microcircuits that contain a plastic housing. Managers are concerned about an unacceptably high number of the products that sustained housing damage during shipment. The housing component is made by four different suppliers.

Managers have decided to conduct a study of the plastic housing by randomly selecting five housings made by each of the four suppliers. One housing is selected for each day of the week. That is, for each supplier, a housing made on Monday is selected, one made on Tuesday is selected, and so on. In analyzing the data, the treatment variable is supplier and the treatment levels are the four suppliers. The blocking effect is day of the week with each day representing a block level. The quality control team wants to determine whether there is any significant difference in the tensile strength of the plastic housing by supplier.


Day         Supplier 1   Supplier 2   Supplier 3   Supplier 4
Monday      62           63           57           61
Tuesday     63           61           59           65
Wednesday   61           62           56           63
Thursday    62           60           57           64
Friday      64           63           58           66

Ho: The supplier populations are equal.
Ha: At least one supplier population yields larger values than at least one other supplier population.


Day         Supplier 1   Supplier 2   Supplier 3   Supplier 4
Monday      3            4            1            2
Tuesday     3            2            1            4
Wednesday   2            3            1            4
Thursday    3            2            1            4
Friday      3            2            1            4
R_j         14           13           5            18
R_j²        196          169          25           324
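A base R sketch of the Friedman calculation for the tensile-strength data: it ranks within each day, reproduces R_j = 14, 13, 5, 18, and gives χr² = 10.68 with 3 degrees of freedom. At α = .05 (assumed here for illustration), the critical value is about 7.815, so H0 would be rejected. `friedman.test()` gives the same statistic because no day contains tied values. Variable names are illustrative.

```r
# Tensile strength: rows = days (blocks), columns = suppliers (treatments)
strength <- matrix(c(62, 63, 57, 61,
                     63, 61, 59, 65,
                     61, 62, 56, 63,
                     62, 60, 57, 64,
                     64, 63, 58, 66),
                   nrow = 5, byrow = TRUE,
                   dimnames = list(Day = c("Mon", "Tue", "Wed", "Thu", "Fri"),
                                   Supplier = paste("Supplier", 1:4)))

b <- nrow(strength)   # 5 blocks
C <- ncol(strength)   # 4 treatments

# Rank within each block (row); apply() returns one column per row,
# so transpose to keep days as rows
within_block_ranks <- t(apply(strength, 1, rank))
R_j <- colSums(within_block_ranks)          # 14, 13, 5, 18

chi_r <- 12 / (b * C * (C + 1)) * sum(R_j^2) - 3 * b * (C + 1)
chi_r                                       # 10.68
qchisq(0.95, df = C - 1)                    # about 7.815

# friedman.test() accepts the same matrix (blocks as rows)
friedman.test(strength)$statistic
```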


Spearman’s Rank Correlation

• Spearman’s rank correlation: analyzes the degree of association of two variables

• Applicable to ordinal level data (ranks)


Spearman’s Rank Correlation: Example

Listed below are the average prices in dollars per 100 pounds for choice spring lambs and choice heifers over a 10-year period. The data were published by the National Agricultural Statistics Service of the U.S. Department of Agriculture. Suppose the researcher wants to determine the strength of association of the prices between these two commodities by using Spearman’s rank correlation.

Spearman’s Rank Correlation Test for Heifer and Lamb Prices

• The lamb prices are ranked, and the heifer prices are ranked.

• The difference in ranks is computed for each year.

• The differences are squared and summed, producing ∑d² = 108.

• The number of pairs, n, is 10.

• r_s = 1 - 6∑d² / (n(n² - 1)) = 1 - 6(108) / (10(100 - 1)) = 1 - 648/990 = 0.345

• The value of r_s = 0.345 indicates that there is a very modest, if not poor, positive correlation between lamb and heifer prices.
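A small base R sketch of the shortcut formula for r_s applied to the summary values on the slide (∑d² = 108, n = 10). The price series themselves are not listed in this section, so the cross-check against `cor(..., method = "spearman")` uses hypothetical, made-up rankings with no ties. The helper name `spearman_from_d2` is illustrative.

```r
# Hypothetical helper: Spearman's r_s from the sum of squared rank differences
spearman_from_d2 <- function(sum_d2, n) {
  # r_s = 1 - 6 * sum(d^2) / (n * (n^2 - 1)); exact when there are no ties
  1 - 6 * sum_d2 / (n * (n^2 - 1))
}

r_s <- spearman_from_d2(sum_d2 = 108, n = 10)
round(r_s, 3)   # 0.345

# Cross-check on small hypothetical rankings with no ties:
# the shortcut formula agrees with cor(..., method = "spearman")
x <- c(3, 1, 4, 2, 5)
y <- c(2, 1, 5, 3, 4)
d <- rank(x) - rank(y)
c(shortcut = spearman_from_d2(sum(d^2), length(x)),
  builtin  = cor(x, y, method = "spearman"))   # both 0.8
```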