Contingency analysis

Sample → Test statistic

Null hypothesis → Null distribution

Compare the test statistic with the null distribution: how unusual is this test statistic?

P < 0.05: Reject H0

P > 0.05: Fail to reject H0

Using one tail in the χ² test

• We always use only one tail for a χ² test

• Why?

χ² = 0: Data match null expectation exactly

χ² > 0: Data deviate from null expectation in some way

                            Reality
Result               H0 true         H0 false
Reject H0            Type I error    correct
Do not reject H0     correct         Type II error

If null hypothesis is really true…

• Do not reject H0: correct answer

• Reject H0: Type I error

If null hypothesis is really false…

• Do not reject H0: Type II error

• Reject H0: correct

Errors and statistics

• These are theoretical - you usually don’t know for sure if you’ve made an error

• Pr[Type I error] = α

• Pr[Type II error] = …

– Requires power analysis

– Depends on sample size

Contingency analysis

• Estimates and tests for an association between two or more categorical variables

Music and wine buying (observed)

                              French music   German music   Totals
Bottles of French wine sold        40             12           52
Bottles of German wine sold         8             22           30
Totals                             48             34           82

Mosaic plot

Odds ratio

• Odds of success = probability of success divided by the probability of failure

$$O = \frac{p}{1 - p}$$

Estimating the Odds ratio

• Odds of success = probability of success divided by the probability of failure

$$\hat{p} = \frac{x}{n}, \qquad \hat{O} = \frac{\hat{p}}{1 - \hat{p}}$$

Music and wine buying (observed, French music only)

                              French music playing
Bottles of French wine sold        40
Bottles of German wine sold         8
Totals                             48

Example

• Out of 48 bottles of wine, 40 were French

$$\hat{p} = \frac{40}{48} = 0.833$$

$$\hat{O} = \frac{0.833}{1 - 0.833} = 5.00$$

Interpretation: when French music is playing, people are about 5 times more likely to buy a French wine than a German wine
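The odds estimate above can be reproduced with a few lines of Python. This is only a minimal sketch: the helper name `odds` and the variable names are illustrative and do not come from the slides.

```python
def odds(successes: int, total: int) -> float:
    """Estimated odds: p_hat / (1 - p_hat), where p_hat = successes / total."""
    p_hat = successes / total
    return p_hat / (1 - p_hat)


# French music column: 40 of the 48 bottles sold were French wine.
o_french_music = odds(40, 48)
print(round(o_french_music, 2))
```

Because the code keeps the unrounded proportion 40/48, it does not pick up the small rounding error that comes from using 0.833 by hand.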

O = 1: Success and failure equally likely

O > 1: Success more likely

O < 1: Failure more likely

Odds ratio

• The odds of success in one group divided by the odds of success in a second group

$$OR = \frac{O_1}{O_2}$$

Estimating the Odds ratio

• The odds of success in one group divided by the odds of success in a second group

$$\widehat{OR} = \frac{\hat{O}_1}{\hat{O}_2}$$

Music and wine buying

• Group 1 = French music, Group 2 = German music

• Success = French wine

Group 2

• Out of 34 bottles of wine, 12 were French

$$\hat{p} = \frac{12}{34} = 0.353$$

$$\hat{O}_2 = \frac{0.353}{1 - 0.353} = 0.55$$

Music and wine buying

• Group 1 = French music, Group 2 = German music

• Success = French wine

$$\hat{O}_1 = 5.00, \qquad \hat{O}_2 = 0.55$$

$$\widehat{OR} = \frac{\hat{O}_1}{\hat{O}_2} = \frac{5.00}{0.55} = 9.09$$

Interpretation: the odds of buying French wine are about 9 times higher when French music is playing (Group 1) than when German music is playing (Group 2)
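A short Python sketch of the same odds ratio calculation follows. The function and variable names are illustrative, not from the slides. Note that the code works with unrounded odds, so it gives roughly 9.17 rather than the 9.09 obtained above from the rounded value 0.55.

```python
def odds(successes: int, total: int) -> float:
    """Estimated odds: p_hat / (1 - p_hat), where p_hat = successes / total."""
    p_hat = successes / total
    return p_hat / (1 - p_hat)


# Group 1 = French music: 40 of 48 bottles were French wine.
# Group 2 = German music: 12 of 34 bottles were French wine.
odds_group1 = odds(40, 48)
odds_group2 = odds(12, 34)

# Odds ratio: odds of success in Group 1 divided by odds in Group 2.
odds_ratio = odds_group1 / odds_group2
print(round(odds_group1, 2), round(odds_group2, 2), round(odds_ratio, 2))
```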

OR = 1: Success equally likely in both groups

OR > 1: Success more likely in Group 1

OR < 1: Success more likely in Group 2

Hypothesis testing

• Contingency analysis

• Is there a difference in odds between two groups?

• Is there an association between two categorical variables?

Contingency analysis

• Is there a difference in the odds of buying French wine depending on the music that is playing?

• Is there an association between wine bought and music playing?

• Is the nationality of the wine independent of the music playing when it is sold?

Hypotheses

• H0: The nationality of the bottle of wine is independent of the nationality of the music played when it is sold.

• HA: The nationality of the bottle of wine sold depends on the nationality of the music being played when it is sold.

Calculating the expectations

With independence,

Pr[French wine AND French music] = Pr[French wine] × Pr[French music]

Calculating the expectations

Pr[French wine] = 52/82=0.634

Pr[French music] = 48/82= 0.585

Observed totals

                    French music   German music   Totals
French wine sold                                    52
German wine sold                                    30
Totals                   48             34          82

By H0, Pr[French wine AND French music] = (0.634)(0.585) ≈ 0.371

Calculating the expectations

Expected

                    French music          German music   Totals
French wine sold    0.371 × 82 = 30.4         21.6          52
German wine sold         17.6                 12.4          30
Totals                    48                   34           82
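The expected counts can be sketched in Python by multiplying the row and column totals and dividing by the grand total, which is the same as Pr[row] × Pr[column] × n under independence. The variable names here are illustrative only.

```python
# Observed counts: rows = French wine, German wine; columns = French music, German music.
observed = [[40, 12],
            [8, 22]]

row_totals = [sum(row) for row in observed]        # [52, 30]
col_totals = [sum(col) for col in zip(*observed)]  # [48, 34]
n = sum(row_totals)                                # 82

# Under independence: E = Pr[row] * Pr[col] * n = row_total * col_total / n
expected = [[r * c / n for c in col_totals] for r in row_totals]

for row in expected:
    print([round(e, 1) for e in row])
```

Each row and column of the expected table sums to the same totals as the observed table, which is a quick check on the arithmetic.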

χ²

$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i} = \frac{(40 - 30.4)^2}{30.4} + \frac{(12 - 21.6)^2}{21.6} + \frac{(8 - 17.6)^2}{17.6} + \frac{(22 - 12.4)^2}{12.4} = 20.0$$
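As a rough check of this sum, the χ² statistic can be computed in Python directly from the observed table. The names below are illustrative. Because the code uses unrounded expected counts, it gives about 19.8 rather than exactly 20.0; the difference comes only from rounding the expectations to one decimal place.

```python
# Observed counts: rows = French wine, German wine; columns = French music, German music.
observed = [[40, 12],
            [8, 22]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Sum (O - E)^2 / E over all four cells, with E = row_total * col_total / n.
chi_squared = 0.0
for i, r in enumerate(row_totals):
    for j, c in enumerate(col_totals):
        e = r * c / n
        chi_squared += (observed[i][j] - e) ** 2 / e

print(round(chi_squared, 2))
```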

Degrees of freedom

For a χ² contingency test, df = # categories − 1 − # parameters

df= (# columns -1)(# rows -1)

For music/wine example, df = (2-1)(2-1) = 1

Conclusion

$\chi^2 = 20.0 \gg \chi^2_{0.05,1} = 3.84$,

So we can reject the null hypothesis of independence, and say that the nationality of the wine sold did depend on what music was played.
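To see how unusual χ² = 20.0 is, one can compute the upper-tail probability. The sketch below relies on an identity that holds only for df = 1: a χ² variable with one degree of freedom is the square of a standard normal, so Pr[χ² > x] = erfc(√(x/2)). The helper name `chi2_upper_tail_df1` is hypothetical, and a statistics package would normally be used for other degrees of freedom.

```python
import math


def chi2_upper_tail_df1(x: float) -> float:
    """Upper-tail probability Pr[chi-squared > x], valid for df = 1 only."""
    return math.erfc(math.sqrt(x / 2))


# The critical value 3.84 should leave about 5% in the upper tail.
p_at_critical = chi2_upper_tail_df1(3.84)

# P-value for the music and wine example.
p_value = chi2_upper_tail_df1(20.0)

print(round(p_at_critical, 3), p_value)
```

Only the upper tail is used, matching the one-tailed rule for χ² tests stated earlier.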

Assumptions

• This χ² test is just a special case of the χ² goodness-of-fit test, so the same rules apply.

• No expected count may be less than 1, and no more than 20% of the expected counts may be less than 5.
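These two rules are easy to check mechanically. The following is a minimal sketch; the function name `expected_counts_ok` is hypothetical, and the expected counts are those calculated for the music and wine example.

```python
def expected_counts_ok(expected: list[list[float]]) -> bool:
    """Check the rules: no expected count < 1, and at most 20% of them < 5."""
    cells = [e for row in expected for e in row]
    none_below_one = all(e >= 1 for e in cells)
    fraction_below_five = sum(e < 5 for e in cells) / len(cells)
    return none_below_one and fraction_below_five <= 0.20


# Expected counts from the music and wine example.
wine_expected = [[30.4, 21.6],
                 [17.6, 12.4]]
print(expected_counts_ok(wine_expected))
```

In a 2 x 2 table a single expected count below 5 is already 25% of the cells, so the check fails as soon as any one cell drops below 5.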

Fisher’s exact test

• For 2 x 2 contingency analysis

• Does not make assumptions about the size of expectations

• JMP will do it, but cumbersome to do by hand
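For illustration, here is a small Python sketch of Fisher's exact test for a 2 x 2 table. It assumes one common two-sided convention: with the margins fixed, sum the hypergeometric probabilities of every table that is no more probable than the observed one. The function name `fisher_exact_2x2` and that convention are assumptions made here; the slides do not specify how the test is computed.

```python
from math import comb


def fisher_exact_2x2(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact P-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x,
        # given fixed row and column totals.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)

    # Small relative tolerance so ties with the observed table are included.
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-7))


# Music and wine example: rows = French wine, German wine.
p_fisher = fisher_exact_2x2(40, 12, 8, 22)
print(p_fisher)
```

Nothing here depends on the size of the expected counts, which is why the test suits small tables.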

Other extensions you might see

• Yates correction for continuity

• G-test

• Read about these in your book
