Contingency analysis
Sample → Test statistic
Null hypothesis → Null distribution
Compare the test statistic with the null distribution: how unusual is this test statistic?
• P < 0.05: reject H0
• P > 0.05: fail to reject H0
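Using one tail in the χ² test
• We always use only one tail for a χ² test
• Why?
– χ² = 0: data match the null expectation exactly
– χ² > 0: data deviate from the null expectation in some way (deviations are squared, so every kind of mismatch lands in the right tail)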
                      Reality
Result                H0 true         H0 false
Reject H0             Type I error    correct
Do not reject H0      correct         Type II error
If null hypothesis is really true… (x-axis: test statistic)
• Do not reject H0: correct answer
• Reject H0: Type I error
If null hypothesis is really false… (x-axis: test statistic)
• Do not reject H0: Type II error
• Reject H0: correct
Errors and statistics
• These are theoretical - you usually don’t know for sure if you’ve made an error
• Pr[Type I error] = α
• Pr[Type II error] = …
– Requires power analysis
– Depends on sample size
Contingency analysis
• Estimates and tests for an association between two or more categorical variables
Music and wine buying
OBSERVED                       French music playing   German music playing   Totals
Bottles of French wine sold             40                     12               52
Bottles of German wine sold              8                     22               30
Totals                                  48                     34               82
Mosaic plot
Odds
• Odds of success = probability of success divided by the probability of failure
$$O = \frac{p}{1 - p}$$
Estimating the odds
• Odds of success = probability of success divided by the probability of failure, estimated from x successes out of n trials
$$\hat{O} = \frac{\hat{p}}{1 - \hat{p}}, \qquad \hat{p} = \frac{x}{n}$$
Music and wine buying
OBSERVED                       French music playing
Bottles of French wine sold             40
Bottles of German wine sold              8
Totals                                  48
Example
• Out of 48 bottles of wine, 40 were French
$$\hat{p} = \frac{40}{48} = 0.833$$
$$\hat{O} = \frac{0.833}{1 - 0.833} = 5.00$$
Interpretation: when French music is playing, people are about 5 times as likely to buy a French wine as a German wine
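The odds estimate above can be sketched in a few lines of Python. The function name `odds` and its argument names are illustrative choices, not part of the lecture; the counts are the ones from the French-music column.

```python
def odds(successes: int, trials: int) -> float:
    """Estimated odds of success: p_hat / (1 - p_hat), where p_hat = successes / trials."""
    if not 0 <= successes < trials:
        raise ValueError("need 0 <= successes < trials so that the odds are finite")
    p_hat = successes / trials
    return p_hat / (1 - p_hat)


# French music playing: 40 of the 48 bottles sold were French.
french_music_odds = odds(40, 48)
print(round(french_music_odds, 2))
```

Odds above 1 mean success is more likely than failure; odds below 1 mean the reverse.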
O = 1: success and failure equally likely
O > 1: success more likely
O < 1: failure more likely
Odds ratio
• The odds of success in one group divided by the odds of success in a second group
$$OR = \frac{O_1}{O_2}$$
Estimating the Odds ratio
• The odds of success in one group divided by the odds of success in a second group
$$\widehat{OR} = \frac{\hat{O}_1}{\hat{O}_2}$$
Music and wine buying
• Group 1 = French music, Group 2 = German music
• Success = French wine
Group 2
• Out of 34 bottles of wine, 12 were French
$$\hat{p} = \frac{12}{34} = 0.353$$
$$\hat{O}_2 = \frac{0.353}{1 - 0.353} = 0.55$$
Music and wine buying
$$\hat{O}_1 = 5.00, \qquad \hat{O}_2 = 0.55$$
$$\widehat{OR} = \frac{\hat{O}_1}{\hat{O}_2} = \frac{5.00}{0.55} = 9.09$$
(Without rounding $\hat{O}_2$ to 0.55, the ratio is 9.17.)
Interpretation: the odds of buying French wine are about 9 times higher when French music is playing (Group 1) than when German music is playing (Group 2)
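As a rough check on the odds ratio, the sketch below works directly from the four observed counts, so no intermediate rounding is involved. The function name `odds_ratio` and its parameter names are hypothetical, chosen only for this illustration.

```python
def odds_ratio(success_1: int, failure_1: int, success_2: int, failure_2: int) -> float:
    """Estimated odds ratio: (odds of success in group 1) / (odds of success in group 2)."""
    odds_1 = success_1 / failure_1
    odds_2 = success_2 / failure_2
    return odds_1 / odds_2


# Group 1 = French music: 40 French bottles, 8 German bottles.
# Group 2 = German music: 12 French bottles, 22 German bottles.
wine_or = odds_ratio(40, 8, 12, 22)
print(round(wine_or, 2))
```

An odds ratio of 1 would mean the odds of buying French wine are the same under both kinds of music.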
OR = 1: success equally likely in both groups
OR > 1: success more likely in Group 1
OR < 1: success more likely in Group 2
Hypothesis testing
• Contingency analysis
– Is there a difference in odds between two groups?
– Is there an association between two categorical variables?
Contingency analysis
• Is there a difference in the odds of buying French wine depending on the music that is playing?
• Is there an association between wine bought and music playing?
• Is the nationality of the wine independent of the music playing when it is sold?
Hypotheses
• H0: The nationality of the bottle of wine is independent of the nationality of the music played when it is sold.
• HA: The nationality of the bottle of wine sold depends on the nationality of the music being played when it is sold.
Calculating the expectations
With independence,
Pr[French wine AND French music] = Pr[French wine] × Pr[French music]
Calculating the expectations
Pr[French wine] = 52/82 = 0.634
Pr[French music] = 48/82 = 0.585
OBS.               French music   German music   Totals
French wine sold                                   52
German wine sold                                   30
Totals                  48             34          82
By H0, Pr[French wine AND French music] = (0.634)(0.585) = 0.371
Calculating the expectations
By H0, Pr[French wine AND French music] = (0.634)(0.585) = 0.371
Expected count = 0.371 × 82 = 30.4; equivalently, (row total × column total) / grand total. The other cells follow the same way.
EXP.               French music   German music   Totals
French wine sold       30.4           21.6         52
German wine sold       17.6           12.4         30
Totals                  48             34          82
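The expected counts under independence can be reproduced with a short Python sketch. The variable names are illustrative; the only inputs are the four observed counts from the music and wine table.

```python
# Observed counts: rows = wine sold (French, German),
# columns = music playing (French, German).
observed = [[40, 12],
            [8, 22]]

row_totals = [sum(row) for row in observed]        # 52, 30
col_totals = [sum(col) for col in zip(*observed)]  # 48, 34
n = sum(row_totals)                                # 82

# Under independence, Pr[row AND column] = Pr[row] * Pr[column],
# so the expected count is (row total * column total) / grand total.
expected = [[r * c / n for c in col_totals] for r in row_totals]

for row in expected:
    print([round(e, 1) for e in row])
```

Note that the expected counts keep the same row and column totals as the observed table.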
χ²
$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i} = \frac{(40 - 30.4)^2}{30.4} + \frac{(12 - 21.6)^2}{21.6} + \frac{(8 - 17.6)^2}{17.6} + \frac{(22 - 12.4)^2}{12.4} = 20.0$$
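A minimal Python sketch of the same calculation follows; `chi_square_statistic` is a hypothetical helper name. It uses unrounded expected counts, so it gives about 19.8 rather than the 20.0 obtained from expectations rounded to one decimal place.

```python
def chi_square_statistic(observed):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count under independence
            stat += (o - e) ** 2 / e
    return stat


wine_chi2 = chi_square_statistic([[40, 12], [8, 22]])
print(round(wine_chi2, 1))
```

The small gap between the two values comes only from rounding and does not change the conclusion of the test.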
Degrees of freedom
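For a χ² contingency test,
df = # categories − 1 − # parameters
A table with r rows and c columns has rc categories, and (r − 1) + (c − 1) row and column proportions are estimated from the data, so
df = (# columns − 1)(# rows − 1)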
For music/wine example, df = (2-1)(2-1) = 1
Conclusion
$\chi^2 = 20.0 \gg \chi^2_{0.05,1} = 3.84$, so we can reject the null hypothesis of independence and say that the nationality of the wine sold did depend on what music was played.
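To see how far into the tail the statistic lies, the sketch below computes the right-tail probability for df = 1 only. It relies on the fact that a χ² variable with 1 degree of freedom is a squared standard normal, so Pr[χ² > x] = erfc(√(x/2)). The function name `chi2_tail_df1` is an assumption made for this illustration; for other degrees of freedom a statistics package would be needed.

```python
import math


def chi2_tail_df1(x: float) -> float:
    """Right-tail probability Pr[chi-square > x] for 1 degree of freedom (x >= 0)."""
    return math.erfc(math.sqrt(x / 2))


# The critical value 3.84 leaves about 5% in the right tail.
print(round(chi2_tail_df1(3.84), 3))

# The observed statistic lies far beyond it, so P is far below 0.05.
p_value = chi2_tail_df1(20.0)
print(p_value < 0.05)
```

Only the right tail is used because any deviation from the null expectation makes the statistic larger.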
Assumptions
• This χ² test is just a special case of the χ² goodness-of-fit test, so the same rules apply.
• No expected count may be less than 1, and no more than 20% of the expected counts may be less than 5
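The rule about expected counts can be written as a small check. This is a sketch with a hypothetical function name, `expectations_ok`, applied to a table of expected counts given as a list of rows.

```python
def expectations_ok(expected) -> bool:
    """True if no expected count is below 1 and at most 20% of them are below 5."""
    cells = [e for row in expected for e in row]
    any_below_1 = any(e < 1 for e in cells)
    fraction_below_5 = sum(e < 5 for e in cells) / len(cells)
    return (not any_below_1) and fraction_below_5 <= 0.20


# Expected counts from the music and wine example: all are well above 5.
print(expectations_ok([[30.4, 21.6], [17.6, 12.4]]))
```

In a 2 x 2 table a single expected count below 5 already makes up 25% of the cells, so the check fails.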
Fisher’s exact test
• For 2 x 2 contingency analysis
• Does not make assumptions about the size of expectations
• JMP will do it, but cumbersome to do by hand
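For readers without JMP, here is a minimal Python sketch of a two-sided Fisher's exact test for a 2 x 2 table. It is not the JMP procedure. It assumes the common convention of summing the hypergeometric probabilities of all tables (with the same margins) that are no more probable than the observed one; the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact P-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small tolerance so that ties with the observed table are counted.
    return sum(p for p in (prob(x) for x in range(lowest, highest + 1))
               if p <= p_observed * (1 + 1e-9))


fisher_p = fisher_exact_two_sided(40, 12, 8, 22)
print(fisher_p < 0.05)
```

Because the probabilities are computed exactly, the result does not depend on the size of the expected counts.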
Other extensions you might see
• Yates correction for continuity
• G-test
• Read about these in your book
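As a rough illustration of the two extensions, the sketch below applies their standard textbook forms to the music and wine table. The formulas used are assumptions brought in for this example: the Yates correction subtracts 0.5 from each |O − E| before squaring, and the G statistic is 2 Σ O ln(O / E). Check your book for the exact versions it uses.

```python
import math

observed = [[40, 12],
            [8, 22]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Pair every observed count with its expected count under independence.
pairs = [(o, row_totals[i] * col_totals[j] / n)
         for i, row in enumerate(observed)
         for j, o in enumerate(row)]

# Yates correction for continuity (2 x 2 tables): shrink each deviation by 0.5.
yates_chi2 = sum((abs(o - e) - 0.5) ** 2 / e for o, e in pairs)

# G statistic: based on log ratios; requires every observed count to be > 0.
g_statistic = 2 * sum(o * math.log(o / e) for o, e in pairs)

print(round(yates_chi2, 1), round(g_statistic, 1))
```

Both statistics are compared with the same χ² distribution with 1 degree of freedom, and both are far above 3.84 here.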