62
Introduction to Introduction to Biostatistics (ZJU Biostatistics (ZJU 2008) 2008) Wenjiang Fu, Ph.D Wenjiang Fu, Ph.D Associate Professor Associate Professor Division of Biostatistics, Division of Biostatistics, Department of Epidemiology Department of Epidemiology Michigan State University Michigan State University East Lansing, Michigan 48824, USA East Lansing, Michigan 48824, USA Email: Email: [email protected] [email protected] www: www: http://www.msu.edu/~fuw http://www.msu.edu/~fuw

Introduction to Biostatistics (ZJU 2008)

Embed Size (px)

DESCRIPTION

Introduction to Biostatistics (ZJU 2008). Wenjiang Fu, Ph.D Associate Professor Division of Biostatistics, Department of Epidemiology Michigan State University East Lansing, Michigan 48824, USA Email: [email protected] www: http://www.msu.edu/~fuw. Categorical Data Analysis. - PowerPoint PPT Presentation

Citation preview

Page 1: Introduction to Biostatistics (ZJU 2008)

Introduction to Introduction to Biostatistics (ZJU Biostatistics (ZJU

2008)2008)Wenjiang Fu, Ph.DWenjiang Fu, Ph.DAssociate ProfessorAssociate Professor

Division of Biostatistics, Department of Division of Biostatistics, Department of Epidemiology Epidemiology

Michigan State UniversityMichigan State UniversityEast Lansing, Michigan 48824, USAEast Lansing, Michigan 48824, USA

Email: Email: [email protected]@msu.eduwww: www: http://www.msu.edu/~fuwhttp://www.msu.edu/~fuw

Page 2: Introduction to Biostatistics (ZJU 2008)

Categorical Data AnalysisCategorical Data Analysis Examples of categorical data. Examples of categorical data.

Qualitative Random Variables Yield Qualitative Random Variables Yield Responses That Can Be Put In Categories. Responses That Can Be Put In Categories. Example: Gender (Male, Female), disease Example: Gender (Male, Female), disease (+, -).(+, -). Measurement or Count Reflect # in CategoryMeasurement or Count Reflect # in Category Nominal (no order) or Ordinal Scale (order)Nominal (no order) or Ordinal Scale (order) Data can be collected as continuous but Data can be collected as continuous but

recoded to categorical data. Example (Systolic recoded to categorical data. Example (Systolic Blood Pressure - Hypotension, Normal tension, Blood Pressure - Hypotension, Normal tension, hypertension )hypertension )

Counts of episodes of incidence. Counts of episodes of incidence.

Page 3: Introduction to Biostatistics (ZJU 2008)

Categorical Data AnalysisCategorical Data Analysis Responses are not continuous, Responses are not continuous,

cannot use linear regression cannot use linear regression model or ANOVA model.model or ANOVA model.

Not even to use transformations Not even to use transformations to make response variable to make response variable meaningful under new scale.meaningful under new scale.

Need to introduce new methods Need to introduce new methods to analyze data. to analyze data.

Page 4: Introduction to Biostatistics (ZJU 2008)

Test for Test for TwoTwo ProportionsProportions

Research Questions

Hypothesis No DifferenceAny Difference

Pop 1 Pop 2Pop 1 < Pop 2

Pop 1 Pop 2Pop 1 > Pop 2

H0 p1 - p2 = 0 p1 - p2 0 p1 - p2

0

Ha p1 - p2 0 p1 - p2 < 0 p1 - p2 > 0

Research Questions

Hypothesis No DifferenceAny Difference

Pop 1 Pop 2Pop 1 < Pop 2

Pop 1 Pop 2Pop 1 > Pop 2

H0 p1 - p2 = 0 p1 - p2 0 p1 - p2

0

Ha p1 - p2 0 p1 - p2 < 0 p1 - p2 > 0

Page 5: Introduction to Biostatistics (ZJU 2008)

Z Test for Two Z Test for Two ProportionsProportions

1.1. AssumptionsAssumptions Populations Are IndependentPopulations Are Independent Populations Follow Binomial Populations Follow Binomial

DistributionDistribution Normal Approximation Can Be Used Normal Approximation Can Be Used

for large samples for large samples (All Expected (All Expected Counts Counts 5) 5)

2.2. Z-Test Statistic for Two ProportionsZ-Test Statistic for Two Proportions

1 2 1 2 1 2

1 2

1 2

ˆ ˆwhere

1 11

p p p p X XZ p

n np p

n n

1 2 1 2 1 2

1 2

1 2

ˆ ˆwhere

1 11

p p p p X XZ p

n np p

n n

Page 6: Introduction to Biostatistics (ZJU 2008)

Sample Distribution for Sample Distribution for Difference Between Difference Between

Proportions Proportions

1 1 2 21 2 1 2

1 2

0 1 21 2

1 2

1 2

1 1 N ;

1 1N 0; :

,

p p p pp p p p

n n

pq under H p pn n

x xp

n n

2 21 2

1 2 1 21 2

~ N ;X Xn n

Page 7: Introduction to Biostatistics (ZJU 2008)

Z Test for Two Z Test for Two Proportions Proportions

Thinking Challenge Thinking Challenge You’re an epidemiologist for the You’re an epidemiologist for the

US Department of Health and US Department of Health and Human Services. You’re studying Human Services. You’re studying the prevalence of disease X in two the prevalence of disease X in two states (MA and CA). In states (MA and CA). In MAMA, , 7474 of of 15001500 people surveyed were people surveyed were diseased and in diseased and in CACA, , 129 129 of of 15001500 were diseased. At were diseased. At .05.05 level, does level, does MAMA have a have a lowerlower prevalence prevalence rate? rate?

MA

CA

Page 8: Introduction to Biostatistics (ZJU 2008)

Test Statistic: Test Statistic:

Decision:Decision:

Conclusion:Conclusion:

Z Test for Two Z Test for Two Proportions Proportions Solution*Solution*

HH00: : ppMAMA - - ppCACA = 0 = 0

HHaa: : ppMAMA - - ppCACA < 0 < 0

== .05 .05

nnMAMA == 1500 1500 nnCACA == 15001500

Critical Value(s):Critical Value(s):

Z0-1.645

.05

Reject

Z0-1.645

.05

Reject

Page 9: Introduction to Biostatistics (ZJU 2008)

Z Test for Two Z Test for Two Proportions Solution*Proportions Solution*

74 129ˆ ˆ.0493 .0860

1500 1500

74 129 .0677

1500 1500

.0493 .0860 0

1 1.0677 1 .0677

1500 1500

4.00

CAMAMA CA

MA CA

MA CA

MA CA

XXp p

n n

X Xp

n n

Z

74 129ˆ ˆ.0493 .0860

1500 1500

74 129 .0677

1500 1500

.0493 .0860 0

1 1.0677 1 .0677

1500 1500

4.00

CAMAMA CA

MA CA

MA CA

MA CA

XXp p

n n

X Xp

n n

Z

Page 10: Introduction to Biostatistics (ZJU 2008)

Z = -4.00Z = -4.00

Z Test for Two Z Test for Two Proportions Proportions Solution*Solution*

HH00: : ppMAMA - - ppCACA = 0 = 0

HHaa: : ppMAMA - - ppCACA < 0 < 0

== .05 .05

nnMAMA = = 1500 1500 nnCACA == 15001500

Critical Value(s):Critical Value(s):

Test Statistic: Test Statistic:

Decision:Decision:

Conclusion:Conclusion:

Z0-1.645

.05

Reject

Z0-1.645

.05

Reject

Page 11: Introduction to Biostatistics (ZJU 2008)

Z = -4.00Z = -4.00

Z Test for Two Z Test for Two Proportions Proportions Solution*Solution*

HH00: : ppMAMA - - ppCACA = 0 = 0

HHaa: : ppMAMA - - ppCACA < 0 < 0

= = .05.05

nnMAMA = = 1500 1500 nnCACA == 15001500

Critical Value(s):Critical Value(s):

Test Statistic: Test Statistic:

Decision:Decision:

Conclusion:Conclusion:

Z0-1.645

.05

Reject

Z0-1.645

.05

Reject Reject at Reject at = .05 = .05

Page 12: Introduction to Biostatistics (ZJU 2008)

Z = -4.00Z = -4.00

Z Test for Two Z Test for Two Proportions Proportions Solution*Solution*

HH00: : ppMAMA - - ppCACA = 0 = 0

HHaa: : ppMAMA - - ppCACA < 0 < 0

== .05 .05

nnMAMA == 1500 1500 nnCACA == 15001500

Critical Value(s):Critical Value(s):

Test Statistic: Test Statistic:

Decision:Decision:

Conclusion:Conclusion:

Z0-1.645

.05

Reject

Z0-1.645

.05

Reject Reject at Reject at = .05 = .05

There is evidence MA There is evidence MA is less than CAis less than CA

Page 13: Introduction to Biostatistics (ZJU 2008)

Chi-Square (Chi-Square (22) Test ) Test for for kk Proportions Proportions

1.1. Tests Equality (=) of Proportions OnlyTests Equality (=) of Proportions OnlyExample: Example: pp11 = .2, = .2, pp22=.3, =.3, pp33 = .5 = .5

2.2. One Variable With Several LevelsOne Variable With Several Levels

3.3. AssumptionsAssumptionsMultinomial ExperimentMultinomial Experiment

Large Sample Size (All Expected Counts Large Sample Size (All Expected Counts 5) 5)

4.4. Uses One-Way Contingency TableUses One-Way Contingency Table

Page 14: Introduction to Biostatistics (ZJU 2008)

Multinomial ExperimentMultinomial Experiment

1.1. nn Identical Trials Identical Trials2.2. kk Outcomes to Each Trial Outcomes to Each Trial

3.3. Constant Outcome Probability, Constant Outcome Probability, pp1 1 ,…, ,…, ppkk

4.4. Independent TrialsIndependent Trials

5.5. Random Variable is Count, nRandom Variable is Count, n1 1 ,…, n,…, nkk

6.6. Example: In health services research, Example: In health services research, ask 100 workers Which of 3 health ask 100 workers Which of 3 health insurance plans (insurance plans (k=3k=3) they prefer) they prefer

Page 15: Introduction to Biostatistics (ZJU 2008)

Health insurance plan

HMO PPO Other Total

35 20 45 100

Health insurance plan

HMO PPO Other Total

35 20 45 100

One-Way Contingency One-Way Contingency TableTable

1.1. Shows # Observations in Shows # Observations in kk Independent Groups (Outcomes or Independent Groups (Outcomes or Variable Levels)Variable Levels) Outcomes (Outcomes (kk = 3) = 3)

Number of responsesNumber of responses

Page 16: Introduction to Biostatistics (ZJU 2008)

1.1. HypothesesHypothesesHH00: : pp11 = = pp1,01,0, , pp22 = = pp2,02,0, ..., , ..., ppkk = = ppkk,0,0

HHaa: Not all : Not all ppii are equal are equal

2.2. Test StatisticTest Statistic

3.3. Degrees of Freedom: Degrees of Freedom: kk - 1 - 1

22

n E n

E ni i

i

afafall cells

22

n E n

E ni i

i

afafall cells

22 Test for Test for kk Proportions Proportions

Hypotheses & StatisticHypotheses & Statistic

Observed countObserved count

Expected countExpected count

Number of Number of outcomesoutcomes

Hypothesized Hypothesized probabilityprobability

Page 17: Introduction to Biostatistics (ZJU 2008)

22 Test Basic Idea Test Basic Idea

1.1. Compares Observed Count to Compares Observed Count to Expected Count under the Null Expected Count under the Null HypothesisHypothesis

2.2. The Closer Observed Count to The Closer Observed Count to Expected Count, the More Likely the Expected Count, the More Likely the HH00 Is True Is True Measured by Squared Difference Measured by Squared Difference

Relative to Expected CountRelative to Expected Count Reject Large ValuesReject Large Values

Page 18: Introduction to Biostatistics (ZJU 2008)

Upper Tail AreaDF .995 … .95 … .051 ... … 0.004 … 3.8412 0.010 … 0.103 … 5.991

Upper Tail AreaDF .995 … .95 … .051 ... … 0.004 … 3.8412 0.010 … 0.103 … 5.991

Finding Critical Value Finding Critical Value ExampleExample

20 5.991

Reject

20 5.991

Reject

What is the critical What is the critical 22 value if value if kk = 3, & = 3, & =.05? =.05?

= .05= .05

22 Table Table (Portion)(Portion)

dfdf = = kk - 1 = 2 - 1 = 2

If If nnii = = EE((nnii)), , 22 = 0. = 0.

Do not reject HDo not reject H00

Page 19: Introduction to Biostatistics (ZJU 2008)

As an MD epidemiologist, you want to test As an MD epidemiologist, you want to test the difference in prevention measures for the difference in prevention measures for patients with CVD diseases Hx. Of patients with CVD diseases Hx. Of 180180 patientspatients, , 6363 were following strict were following strict prevention measures regularlyprevention measures regularly (taking beta- (taking beta-blockers, aspirin, etc…), blockers, aspirin, etc…), 4545 were taking were taking some prevention measures but not on a some prevention measures but not on a regular basis regular basis and and 7272 were not following were not following any preventionsany preventions. At the . At the .05.05 level, is there a level, is there a differencedifference in prevention measures?in prevention measures?

22 Test for Test for kk Proportions ExampleProportions Example

Page 20: Introduction to Biostatistics (ZJU 2008)

22 Test for Test for kk Proportions SolutionProportions Solution

HH00: : pp11 = = pp22 = = pp33 = = 1/3 1/3

HHaa: : At least 1 is At least 1 is differentdifferent

= = .05.05

nn11 = = 63 63 nn22 = = 45 45 nn33 = = 72 72

Critical Value(s):Critical Value(s):

Test Statistic: Test Statistic:

Decision:Decision:

Conclusion:Conclusion:

20 5.991

Reject

20 5.991

Reject

= .05= .05

Page 21: Introduction to Biostatistics (ZJU 2008)

E n np

E n E n E n

n E n

E n

n n n

i i

i i

i

afaf af af a f

afaf

,

.

0

1 2 3

22

12

22

32

2 2 2

180 1 3 60

60

60

60

60

60

60

63 60

60

45 60

60

72 60

606 3

all cells

E n np

E n E n E n

n E n

E n

n n n

i i

i i

i

afaf af af a f

afaf

,

.

0

1 2 3

22

12

22

32

2 2 2

180 1 3 60

60

60

60

60

60

60

63 60

60

45 60

60

72 60

606 3

all cells

22 Test for Test for kk Proportions SolutionProportions Solution

Page 22: Introduction to Biostatistics (ZJU 2008)

22 Test for Test for kk Proportions SolutionProportions Solution

HH00: : pp11 = = pp22 = = pp33 = = 1/3 1/3

HHaa: : At least 1 is At least 1 is differentdifferent

= = .05.05

nn11 = = 63 63 nn22 = = 45 45 nn33 = = 72 72

Critical Value(s):Critical Value(s):

Test Statistic: Test Statistic:

Decision:Decision:

Conclusion:Conclusion:

Reject at Reject at = .05 = .05

There is evidence of a There is evidence of a difference in proportions difference in proportions 20 5.991

Reject

20 5.991

Reject

= .05= .05

22 = 6.3 = 6.3

Page 23: Introduction to Biostatistics (ZJU 2008)

22 Test for Test for kk Proportions ProportionsSAS CodesSAS Codes

DataData CVD; CVD; input method count;input method count;datalines;datalines;1 631 632 452 453 723 72;;runrun;;

procproc freqfreq data=CVD order=data; data=CVD order=data; weight Count;weight Count; tables method/nocum testp=(tables method/nocum testp=(0.3330.333 0.3330.333 0.3330.333););runrun;;

Page 24: Introduction to Biostatistics (ZJU 2008)

22 Test for Test for kk Proportions ProportionsSAS OutputSAS Output

The FREQ Procedure

Test method Frequency Percent

Percent --------------------------------------------------------- 1 63 35.00 33.30 2 45 25.00 33.30 3 72 40.00 33.30

Chi-Square Test for Specified Proportions

-------------------------------------------- Chi-Square 6.3065 DF 2 Pr > ChiSq 0.0427

Sample Size = 180

Page 25: Introduction to Biostatistics (ZJU 2008)

R program for Chisq.testR program for Chisq.test

chisq.test(c(63,45,72),p=c(1/3,1/3,1/chisq.test(c(63,45,72),p=c(1/3,1/3,1/3))3))

Chi-squared test for given Chi-squared test for given probabilitiesprobabilities

data: c(63, 45, 72) data: c(63, 45, 72)

X-squared = 6.3, df = 2, p-value = X-squared = 6.3, df = 2, p-value = 0.042850.04285

Page 26: Introduction to Biostatistics (ZJU 2008)

Relation to the Z test for Relation to the Z test for comparison of two comparison of two

proportions?proportions? When k=2, only two groups are to be When k=2, only two groups are to be

compared, the Z test and the compared, the Z test and the 22 Test for Test for 2 Proportions are equivalent tests.2 Proportions are equivalent tests.(p(p11=p=p22=1/2)=1/2)

Notice that when k=2, the Notice that when k=2, the 22 Test for 2 Test for 2 Proportions will have 1dfProportions will have 1df

In distribution, In distribution, 22 = Z = Z22

Page 27: Introduction to Biostatistics (ZJU 2008)

22 Test of Homogeneity Test of Homogeneity

1.1. Shows If a Relationship Exists Shows If a Relationship Exists Between 2 Qualitative Variables, but Between 2 Qualitative Variables, but does does NotNot Show Causality (no Show Causality (no specification of dependency)specification of dependency)

2.2. AssumptionsAssumptionsMultinomial ExperimentMultinomial Experiment

All Expected Counts All Expected Counts 5 5

3.3. Uses Two-Way Contingency Uses Two-Way Contingency TableTable

Page 28: Introduction to Biostatistics (ZJU 2008)

Residence Disease Status

Urban Rural Total

Disease 63 49 112 No disease 15 33 48 Total 78 82 160

Residence Disease Status

Urban Rural Total

Disease 63 49 112 No disease 15 33 48 Total 78 82 160

22 Test of Test of Homogeneity Homogeneity

Contingency Table Contingency Table 1.1. Shows # Observations From 1 Shows # Observations From 1

Sample Jointly in 2 Qualitative Sample Jointly in 2 Qualitative VariablesVariables Levels of variable 2Levels of variable 2

Levels of variable 1Levels of variable 1

Page 29: Introduction to Biostatistics (ZJU 2008)

22 Test of Test of Homogeneity Homogeneity Hypotheses & Hypotheses &

StatisticStatistic1.1. HypothesesHypothesesHH00: Homogeneity or equal proportions : Homogeneity or equal proportions

HHaa: Heterogeneity or unequal proportions: Heterogeneity or unequal proportions

2.2. Test StatisticTest Statistic

Degrees of Freedom: (Degrees of Freedom: (rr - 1)( - 1)(cc - 1) - 1)RowsRows Columns Columns

Observed countObserved count

Expected Expected countcount 2

2

n E n

E n

ij ij

ij

c h

c hall cells

2

2

n E n

E n

ij ij

ij

c h

c hall cells

Page 30: Introduction to Biostatistics (ZJU 2008)

22 Test of Homogeneity Test of Homogeneity Expected CountsExpected Counts

1.1. Statistical Independence Means Statistical Independence Means Joint Probability Equals Product of Joint Probability Equals Product of Marginal ProbabilitiesMarginal Probabilities

2.2. Compute Marginal Probabilities Compute Marginal Probabilities & Multiply for Joint Probability& Multiply for Joint Probability

3.3. Expected Count Is Sample Size Expected Count Is Sample Size Times Joint ProbabilityTimes Joint Probability

Page 31: Introduction to Biostatistics (ZJU 2008)

Residence Disease Urban Rural

Status Obs. Obs. Total

Disease 63 49 112

No Disease 15 33 48

Total 78 82 160

Residence Disease Urban Rural

Status Obs. Obs. Total

Disease 63 49 112

No Disease 15 33 48

Total 78 82 160

Expected Count ExampleExpected Count Example

112 112 160160

Marginal probability = Marginal probability =

Page 32: Introduction to Biostatistics (ZJU 2008)

Residence Disease Urban Rural

Status Obs. Obs. Total

Disease 63 49 112

No Disease 15 33 48

Total 78 82 160

Residence Disease Urban Rural

Status Obs. Obs. Total

Disease 63 49 112

No Disease 15 33 48

Total 78 82 160

Expected Count ExampleExpected Count Example

112 112 160160

78 78 160160

Marginal probability = Marginal probability =

Marginal probability = Marginal probability =

Page 33: Introduction to Biostatistics (ZJU 2008)

Residence Disease Urban Rural

Status Obs. Obs. Total

Disease 63 49 112

No Disease 15 33 48

Total 78 82 160

Residence Disease Urban Rural

Status Obs. Obs. Total

Disease 63 49 112

No Disease 15 33 48

Total 78 82 160

Expected Count ExampleExpected Count Example

112 112 160160

78 78 160160

Marginal probability = Marginal probability =

Marginal probability = Marginal probability =

Joint probability = Joint probability = 112 112 160160

78 78 160160

Expected count = 160· Expected count = 160· 112 112 160160

78 78 160160

= 54.6 = 54.6

Page 34: Introduction to Biostatistics (ZJU 2008)

Residence Disease Urban Rural

Status Obs. Exp. Obs. Exp. Total

Disease 63 54.6 49 57.4 112

No Disease 15 23.4 33 24.6 48

Total 78 78 82 82 160

Residence Disease Urban Rural

Status Obs. Exp. Obs. Exp. Total

Disease 63 54.6 49 57.4 112

No Disease 15 23.4 33 24.6 48

Total 78 78 82 82 160

Expected Count Expected Count CalculationCalculation

112x82 112x82 160160

48x78 48x78 160160

48x82 48x82 160160

112x78 112x78 160160

Expected count = Row total Column total

Sample sizea fa f

Expected count = Row total Column total

Sample sizea fa f

Page 35: Introduction to Biostatistics (ZJU 2008)

HIV STDs Hx No Yes Total

No 84 32 116 Yes 48 122 170 Total 132 154 286

HIV STDs Hx No Yes Total

No 84 32 116 Yes 48 122 170 Total 132 154 286

You randomly sample You randomly sample 286286 sexually active sexually active individuals and collect information on their individuals and collect information on their HIV status and History of STDs. At the HIV status and History of STDs. At the .05.05 level, is there evidence of a level, is there evidence of a relationshiprelationship??

22 Test of Test of Homogeneity Homogeneity

Example on HIVExample on HIV

Page 36: Introduction to Biostatistics (ZJU 2008)

22 Test of Test of Homogeneity Homogeneity

SolutionSolutionHH00: : No No

Relationship Relationship

HHaa: : Relationship Relationship

= = .05.05

df = df = (2 - 1)(2 - 1) (2 - 1)(2 - 1) = 1 = 1

Critical Value(s):Critical Value(s):

Test Statistic: Test Statistic:

Decision:Decision:

Conclusion:Conclusion:

20 3.841

Reject

20 3.841

Reject

= .05= .05

Page 37: Introduction to Biostatistics (ZJU 2008)

HIV No Yes

STDs HX Obs. Exp. Obs. Exp. Total

No 84 53.5 32 62.5 116

Yes 48 78.5 122 91.5 170

Total 132 132 154 154 286

HIV No Yes

STDs HX Obs. Exp. Obs. Exp. Total

No 84 53.5 32 62.5 116

Yes 48 78.5 122 91.5 170

Total 132 132 154 154 286

EE((nnijij)) 5 in all 5 in all

cellscells

170x132 170x132 286286

170x154 170x154 286286

116x132 116x132 286286

154x132 154x132 286286

22 Test of Test of Homogeneity Homogeneity

SolutionSolution

Page 38: Introduction to Biostatistics (ZJU 2008)

2

2

11 11

2

11

12 12

2

12

22 22

2

22

2 2 284 53 5

53 5

32 62 5

62 5

122 915

91554 29

n E n

E n

n E n

E n

n E n

E n

n E n

E n

ij ij

ij

.

.

.

.

.

..

c hc h

a fa f

a fa f

a fa f

all cells

2

2

11 11

2

11

12 12

2

12

22 22

2

22

2 2 284 53 5

53 5

32 62 5

62 5

122 915

91554 29

n E n

E n

n E n

E n

n E n

E n

n E n

E n

ij ij

ij

.

.

.

.

.

..

c hc h

a fa f

a fa f

a fa f

all cells

22 Test of Test of Homogeneity Homogeneity

SolutionSolution

Page 39: Introduction to Biostatistics (ZJU 2008)

22 Test of Test of Homogeneity Homogeneity

SolutionSolutionHH00: : No No

Relationship Relationship

HHaa: : Relationship Relationship

= .05= .05

dfdf = (2 - 1)(2 - 1) = (2 - 1)(2 - 1) = 1 = 1

Critical Value(s):Critical Value(s):

Test Statistic: Test Statistic:

Decision:Decision:

Conclusion:Conclusion:

Reject at Reject at = .05 = .05

20 3.841

Reject

20 3.841

Reject

= .05= .05

22 = 54.29 = 54.29

Page 40: Introduction to Biostatistics (ZJU 2008)

22 Test of Homogeneity Test of Homogeneity SAS CODESSAS CODES

DataData dis; dis;input STDs HIV input STDs HIV

count;count;cards;cards;1 1 84 1 1 84 1 2 321 2 322 1 482 1 482 2 1222 2 122;;runrun;;

ProcProc freqfreq data=dis data=dis order=data;order=data;

weight weight Count;Count;

tables tables STDs*HIV/STDs*HIV/chischisqq;;

runrun;;

Page 41: Introduction to Biostatistics (ZJU 2008)

22 Test of Homogeneity Test of Homogeneity SAS OUTPUTSAS OUTPUT

Statistics for Table of STDs by HIV

Statistic DF Value Prob ------------------------------------------------------- Chi-Square 1 54.1502 <.0001 Likelihood Ratio Chi-Square 1 55.7826 <.0001 Continuity Adj. Chi-Square 1 52.3871 <.0001 Mantel-Haenszel Chi-Square 1 53.9609 <.0001

Phi Coefficient 0.4351 Contingency Coefficient 0.3990 Cramer's V 0.4351

Continuity Correction

∑i (|Oi-Ei|-0.5)2 / Ei

Page 42: Introduction to Biostatistics (ZJU 2008)

Fisher’s Exact TestFisher’s Exact Test

Fisher’s Exact TestFisher’s Exact Test is a test for is a test for independence in a 2 X 2 table. It is independence in a 2 X 2 table. It is most useful when the total sample most useful when the total sample size and the expected values are size and the expected values are small. The test holds the marginal small. The test holds the marginal totals fixed and computes the totals fixed and computes the hypergeometric probability that nhypergeometric probability that n1111 is at least as large as the observed is at least as large as the observed valuevalue

Page 43: Introduction to Biostatistics (ZJU 2008)

Fisher’s Exact Test Fisher’s Exact Test ExampleExample

Is HIV Infection related to Hx of STDs Is HIV Infection related to Hx of STDs in Sub Saharan African Countries? in Sub Saharan African Countries? Test at 5% level.Test at 5% level.

yesyes nono totatotall

yesyes 33 77 1010

nono 55 1010 1515

totatotall

88 1717

HIV Infection

Hx of STDs

Page 44: Introduction to Biostatistics (ZJU 2008)

Data dis;Data dis;input STDs $ HIV $ count;input STDs $ HIV $ count;cards;cards;no no 10 no no 10 No Yes 5No Yes 5yes no 7yes no 7yes yes 3yes yes 3;;run;run;

proc freq data=dis order=data;proc freq data=dis order=data; weight Count;weight Count; tables STDs*HIV/chisq;tables STDs*HIV/chisq;run;run;

Fisher’s Exact Test SAS Fisher’s Exact Test SAS CodesCodes

Page 45: Introduction to Biostatistics (ZJU 2008)

Fisher’s Exact Test SAS Fisher’s Exact Test SAS OutputOutput Statistics for Table of STDs by HIV

Statistic DF Value Prob

ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ Chi-Square 1 0.0306 0.8611 Likelihood Ratio Chi-Square 1 0.0308 0.8608 Continuity Adj. Chi-Square 1 0.0000 1.0000 Mantel-Haenszel Chi-Square 1 0.0294 0.8638 Phi Coefficient -0.0350 Contingency Coefficient 0.0350 Cramer's V -0.0350

WARNING: 50% of the cells have expected counts less than 5. Chi-Square may not be a valid test.

Fisher's Exact Test ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ Cell (1,1) Frequency (F) 10 Left-sided Pr <= F 0.6069 Right-sided Pr >= F 0.7263 Table Probability (P) 0.3332 Two-sided Pr <= P 1.0000

Page 46: Introduction to Biostatistics (ZJU 2008)

Fisher’s Exact TestFisher’s Exact Test The output consists of three p-values and one

table probability. Left: Use this when the alternative to

independence is that there is negative association between the variables. That is, the observations tend to lie in lower left and upper right.

Right: Use this when the alternative to independence is that there is positive association between the variables. That is, the observations tend to lie in upper left and lower right.

2-Tail: Use this when there is no prior alternative.

Table probability – the probability for this table to occur – not a tail probability or not a p-value.

Page 47: Introduction to Biostatistics (ZJU 2008)

McNemar’s Test for Correlated McNemar’s Test for Correlated (Dependent) Proportions(Dependent) Proportions

The approximate test previously presented for The approximate test previously presented for assessing a difference in proportions is based assessing a difference in proportions is based upon the assumption that the two samples are upon the assumption that the two samples are independentindependent..

Suppose, however, that we are faced with a Suppose, however, that we are faced with a situation where this is not true. Suppose we situation where this is not true. Suppose we randomly-select 100 people, and find that 20% randomly-select 100 people, and find that 20% of them have flu. Then, imagine that we apply of them have flu. Then, imagine that we apply some type of treatment to all sampled peoples; some type of treatment to all sampled peoples; and on a post-test, we find that 20% have flu. and on a post-test, we find that 20% have flu.

Basis / Rationale for the TestBasis / Rationale for the Test

Page 48: Introduction to Biostatistics (ZJU 2008)

McNemar’s Test for Correlated McNemar’s Test for Correlated (Dependent) Proportions(Dependent) Proportions

We might be tempted to suppose that no hypothesis We might be tempted to suppose that no hypothesis test is required under these conditions, in that the test is required under these conditions, in that the ‘Before’ and ‘After’ probabilities are identical, and ‘Before’ and ‘After’ probabilities are identical, and would surely result in a test statistic value of 0.00.would surely result in a test statistic value of 0.00.

The problem with this thinking, however, is that the The problem with this thinking, however, is that the two sample two sample pp values are dependent, in that each values are dependent, in that each person was assessed twice. It person was assessed twice. It isis possible that the 20 possible that the 20 people that had flu originally still had flu. It is also people that had flu originally still had flu. It is also possible that the 20 people that had flu on the possible that the 20 people that had flu on the second test second test were a completely different setwere a completely different set of 20 of 20 people!people!

Page 49: Introduction to Biostatistics (ZJU 2008)

McNemar’s Test for Correlated McNemar’s Test for Correlated (Dependent) Proportions(Dependent) Proportions

It is for precisely this type of situation that It is for precisely this type of situation that McNemar’s Test for Correlated (Dependent) McNemar’s Test for Correlated (Dependent) Proportions is applicable.Proportions is applicable.

McNemar’s Test employs two unique McNemar’s Test employs two unique features for testing the two proportions:features for testing the two proportions:

* a special fourfold contingency table; with a* a special fourfold contingency table; with a

* special-purpose chi-square (* special-purpose chi-square ( 22) test ) test statistic (the approximate test).statistic (the approximate test).

Page 50: Introduction to Biostatistics (ZJU 2008)

McNemar’s Test for Correlated McNemar’s Test for Correlated (Dependent) Proportions(Dependent) Proportions

Sample ProblemSample ProblemA randomly selected group of 120 students taking a standardized test for entrance into college exhibits a failure rate of 50%. A company which specializes in coaching students on this type of test has indicated that it can significantly reduce failure rates through a four-hour seminar. The students are exposed to this coaching session, and re-take the test a few weeks later. The school board is wondering if the results justify paying this firm to coach all of the students in the high school. Should they? Test at the 5% level.

Page 51: Introduction to Biostatistics (ZJU 2008)

McNemar’s Test for Correlated McNemar’s Test for Correlated (Dependent) Proportions(Dependent) Proportions

Sample ProblemSample ProblemThe summary data for this study appear as follows:

Number ofStudents

Status BeforeSession

Status AfterSession

4 Fail Fail4 Pass Fail

56 Fail Pass56 Pass Pass

Page 52: Introduction to Biostatistics (ZJU 2008)

McNemar’s Test for Correlated McNemar’s Test for Correlated (Dependent) Proportions(Dependent) Proportions

Nomenclature for the Fourfold (2 x 2) Nomenclature for the Fourfold (2 x 2) Contingency TableContingency Table

AA BB

DDCC

(B + D)(B + D)(A + C)(A + C)

(C + D)(C + D)

(A + B)(A + B)

nn

where(A+B) + (C+D) =(A+C) + (B+D) = n = number of units evaluated

and where

df = 1

Page 53: Introduction to Biostatistics (ZJU 2008)

McNemar’s Test for Correlated McNemar’s Test for Correlated (Dependent) Proportions(Dependent) Proportions

1. Construct a 2x2 table where the paired 1. Construct a 2x2 table where the paired observations are the sampling units.observations are the sampling units.

2. Each observation must represent a single 2. Each observation must represent a single joint event possibility; that is, classifiable joint event possibility; that is, classifiable in only one cell of the contingency table.in only one cell of the contingency table.

3. In it’s Exact form, this test may be 3. In it’s Exact form, this test may be conducted as a One Sample Binomial for conducted as a One Sample Binomial for the B & C cellsthe B & C cells

Underlying Assumptions of the TestUnderlying Assumptions of the Test

Page 54: Introduction to Biostatistics (ZJU 2008)

McNemar’s Test for Correlated McNemar’s Test for Correlated (Dependent) Proportions(Dependent) Proportions

4. The expected frequency (4. The expected frequency (ffee) for ) for the B and C cells on the the B and C cells on the contingency table must be equal to contingency table must be equal to or greater than 5; where or greater than 5; where

ffe e = (B + C) / 2 = (B + C) / 2

from the Fourfold tablefrom the Fourfold table

Underlying Assumptions of the TestUnderlying Assumptions of the Test

Page 55: Introduction to Biostatistics (ZJU 2008)

McNemar’s Test for Correlated McNemar’s Test for Correlated (Dependent) Proportions(Dependent) Proportions

The data are then entered into the Fourfold Contingency table:

5656 5656

4444

60606060

88

112112

120120

BeforePass Fail

Pass

Fail

After

Page 56: Introduction to Biostatistics (ZJU 2008)

Step I : State the Null & Research HypothesesStep I : State the Null & Research Hypotheses

HH00 : : = =

HH11 : :

where where and and relate to the proportion of observations reflecting relate to the proportion of observations reflecting changeschanges in status (the B & C cells in the table) in status (the B & C cells in the table)

Step II : Step II : 0.050.05

McNemar’s Test for Correlated McNemar’s Test for Correlated (Dependent) Proportions(Dependent) Proportions

Page 57: Introduction to Biostatistics (ZJU 2008)

Step III : State the Associated Test StatisticStep III : State the Associated Test Statistic

B – C| - 1 }2

2 = B + C

McNemar’s Test for Correlated McNemar’s Test for Correlated (Dependent) Proportions(Dependent) Proportions

Continuity Correction

Page 58: Introduction to Biostatistics (ZJU 2008)

Step IV : State the distribution of the Test Step IV : State the distribution of the Test Statistic When HStatistic When Hoo is True is True

22 = = 22 with 1 with 1 dfdf when H when Hoo is True is True

Step V : Reject Ho if test statistic 2 > 3.84

d

McNemar’s Test for Correlated McNemar’s Test for Correlated (Dependent) Proportions(Dependent) Proportions

X² Distribution

X²0 3.84

Chi-Square Curve w. 1 df

Page 59: Introduction to Biostatistics (ZJU 2008)

Step VI : Calculate the Value of the Test StatisticStep VI : Calculate the Value of the Test Statistic

McNemar’s Test for Correlated McNemar’s Test for Correlated (Dependent) Proportions(Dependent) Proportions

56 – 4| - 1 }2

2 = = 43.35 56 + 4

Page 60: Introduction to Biostatistics (ZJU 2008)

McNemar’s Test for Correlated McNemar’s Test for Correlated (Dependent) Proportions-SAS (Dependent) Proportions-SAS

CodesCodesDataData test; test;input Before $ After $ input Before $ After $

count;count;cards;cards;pass pass 56 pass pass 56 pass fail 56pass fail 56fail pass 4fail pass 4fail fail 4;fail fail 4;runrun;;

procproc freqfreq data=test data=test order=data;order=data;

weight Count;weight Count; tables tables

Before*After/Before*After/agreeagree;;runrun;;

Page 61: Introduction to Biostatistics (ZJU 2008)

McNemar’s Test for Correlated McNemar’s Test for Correlated (Dependent) Proportions-SAS (Dependent) Proportions-SAS

OutputOutput

Statistics for Table of Before by After

McNemar's Test ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ Statistic (S) 45.0667 DF 1 Pr > S <.0001

Sample Size = 120

Without the correction

Page 62: Introduction to Biostatistics (ZJU 2008)

R program for R program for McNemar’s TestMcNemar’s Test

> mcnemar.test(matrix(c(56,56,4,4),nrow=2,byr=T),corr=T)> mcnemar.test(matrix(c(56,56,4,4),nrow=2,byr=T),corr=T)

McNemar's Chi-squared test with continuity correctionMcNemar's Chi-squared test with continuity correction

data: matrix(c(56, 56, 4, 4), nrow = 2, byr = T) data: matrix(c(56, 56, 4, 4), nrow = 2, byr = T) McNemar's chi-squared = 43.35, df = 1, p-value = 4.577e-11McNemar's chi-squared = 43.35, df = 1, p-value = 4.577e-11

> mcnemar.test(matrix(c(56,56,4,4),nrow=2,byr=T),corr=F)> mcnemar.test(matrix(c(56,56,4,4),nrow=2,byr=T),corr=F)

McNemar's Chi-squared testMcNemar's Chi-squared test

data: matrix(c(56, 56, 4, 4), nrow = 2, byr = T) data: matrix(c(56, 56, 4, 4), nrow = 2, byr = T) McNemar's chi-squared = 45.0667, df = 1, p-value = 1.904e-McNemar's chi-squared = 45.0667, df = 1, p-value = 1.904e-

1111