Categorical Data Analysis CDA

Outline

• Contingency Table

• Graphical Display of Categorical Data: Bar Chart, Pie Chart, Mosaic Plot

• Measures of Association: Pearson Correlation Coefficient, Cramer’s V

• Test of Independence

• Test of Symmetry

Contingency Table

• A contingency table is a rectangular table having I rows for categories of X and J columns for categories of Y.

• The cells of the table represent the I×J possible outcomes.

Contingency Table: Example 1 (Heart Attack vs. Aspirin Use)

• The table below is from a report on the relationship between aspirin use and heart attacks by the Physicians’ Health Study Research Group at Harvard Medical School.

• The 2×3 contingency table is

                     Myocardial Infarction
Treatment    Fatal Attack    Nonfatal Attack    No Attack
Placebo      18              171                10,845
Aspirin      5               99                 10,933

Generating Contingency Table in R

• Input the 2×3 table in R as a 2×3 matrix

• Change the matrix to a table using the function as.table(), because some functions work better with tables than with matrices
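
The two steps above can be sketched as follows, using the counts from Example 1. The object name `aspirin` and the dimension labels `Treatment` and `Outcome` are illustrative choices, not names taken from the slides.

```r
# Step 1: enter the 2x3 counts as a matrix (filled row by row).
aspirin <- matrix(c(18, 171, 10845,
                     5,  99, 10933),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(
                    Treatment = c("Placebo", "Aspirin"),
                    Outcome   = c("Fatal Attack", "Nonfatal Attack", "No Attack")))

# Step 2: convert the matrix to a table object.
aspirin <- as.table(aspirin)

# Row and column totals, and outcome proportions within each treatment.
addmargins(aspirin)
prop.table(aspirin, margin = 1)
```

Naming the dimensions when the matrix is created means later output (plots, test summaries) carries readable labels.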

Graphical Display of Categorical Data

• One Categorical Variable

Bar Chart: a chart with rectangular bars whose lengths are proportional to the values they represent.

Pie Chart: a circular chart divided into sectors, illustrating proportion.
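
As a minimal sketch, both charts can be drawn with base R graphics. The example assumes the built-in HairEyeColor data set (introduced later in these slides) and summarizes the single variable Hair; the object name `hair` and the plot titles are illustrative.

```r
# Counts for one categorical variable: hair color, summed over Eye and Sex.
hair <- margin.table(HairEyeColor, 1)

# Bar chart: bar heights are proportional to the category counts.
barplot(hair, xlab = "Hair color", ylab = "Count",
        main = "Hair color of 592 students")

# Pie chart: sector sizes are proportional to the category shares.
pie(hair, main = "Hair color of 592 students")
```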

Graphical Display of Categorical Data

• Two Categorical Variables

Mosaic Plot: a graphical display that examines the relationship among two or more categorical variables.

Mosaic Plot Construction

• A mosaic plot starts with a square with side length one. The square is first divided into horizontal bars whose widths are proportional to the marginal probabilities of the first categorical variable. Then each bar is split vertically into bars that are proportional to the conditional probabilities of the second categorical variable, given the first. Additional splits can be made, if wanted, using a third variable, a fourth variable, and so on.
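
The construction can be checked numerically. The sketch below assumes the built-in HairEyeColor data collapsed over Sex, with Hair as the first variable and Eye as the second; all object names are illustrative. It computes the proportions behind the two splits and confirms that each tile's area equals the joint proportion of its cell.

```r
# Two-way table of Hair by Eye (Sex summed out).
hair_eye <- margin.table(HairEyeColor, c(1, 2))

# First split: marginal proportions of Hair.
first_split <- prop.table(margin.table(hair_eye, 1))

# Second split: conditional proportions of Eye within each Hair category.
second_split <- prop.table(hair_eye, margin = 1)

# Tile area = marginal x conditional, which is the joint proportion.
tile_area <- second_split * as.vector(first_split)
all.equal(as.vector(tile_area), as.vector(prop.table(hair_eye)))
```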

Mosaic Plot: Example 2 (HairEyeColor)

• The HairEyeColor data set comes from a survey of students at the University of Delaware (1974). It has 592 observations on 3 variables (Hair, Eye, Sex). Here we omit Sex.

Mosaic Plot in R

• Option 1: install package vcd, use function mosaic()

• Option 2: use function mosaicplot()
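
A minimal sketch of both options, applied to the HairEyeColor data with Sex omitted. The vcd call is guarded so the code still runs when that package is not installed; the `main`, `color`, and `shade` arguments are optional choices made here for illustration.

```r
# Collapse the 3-way table over Sex to get Hair by Eye.
hair_eye <- margin.table(HairEyeColor, c(1, 2))

# Option 2: base R graphics, no extra package needed.
mosaicplot(hair_eye, main = "Hair vs. Eye Color", color = TRUE)

# Option 1: the vcd package, if available (install.packages("vcd") otherwise).
if (requireNamespace("vcd", quietly = TRUE)) {
  vcd::mosaic(hair_eye, shade = TRUE)
}
```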

Measures of Association

• Continuous Variables: Pearson Correlation Coefficient

• Ordinal Variables: Pearson Correlation Coefficient

• Nominal Variables: Cramer’s V

Cramer’s V

• Cramer’s V measures the association between two nominal variables. It varies from 0 (no association) to 1 (complete association) and can reach 1 only when the two variables are equal to each other.

Cramer’s V (cont’d)

Comments:

1. When the two variables are binary, Cramer’s V equals the absolute value of the Phi Coefficient (which measures the association between two binary variables).

2. In R, load the package with library(vcd) and use the function assocstats().
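
For reference, Cramer’s V can also be computed by hand from the Pearson chi-square statistic as V = sqrt(X² / (n × min(I − 1, J − 1))). The sketch below applies this standard formula to the HairEyeColor data with Sex omitted, and calls assocstats() only if vcd is installed; the object names are illustrative.

```r
# Hair by Eye table (Sex summed out).
hair_eye <- margin.table(HairEyeColor, c(1, 2))

# Pearson chi-square statistic for the table.
chi2 <- unname(chisq.test(hair_eye, correct = FALSE)$statistic)

n <- sum(hair_eye)            # total sample size
k <- min(dim(hair_eye)) - 1   # min(I - 1, J - 1)

# Cramer's V from its definition.
cramers_v <- sqrt(chi2 / (n * k))
cramers_v

# The same measure as reported by vcd, when the package is available.
if (requireNamespace("vcd", quietly = TRUE)) {
  print(vcd::assocstats(hair_eye))
}
```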

Contingency Table Analysis

• Large Sample Size: Chi-square Test

• Small Sample Size: Fisher’s Exact Test

Test of Independence (Chi-square Test)

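           Column 1      Column 2      Total
Row 1      $\pi_{11}$    $\pi_{12}$    $\pi_{1+}$
Row 2      $\pi_{21}$    $\pi_{22}$    $\pi_{2+}$
Total      $\pi_{+1}$    $\pi_{+2}$    1

H0: Row and Column are independent: $\pi_{ij} = \pi_{i+}\,\pi_{+j}$ for all i, j

Ha: Row and Column are not independent: $\pi_{ij} \neq \pi_{i+}\,\pi_{+j}$ for some i and j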

Test of Independence (Chi-square Test)

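Under H0: $\pi_{ij} = \pi_{i+}\,\pi_{+j}$ for all i, j

The expected count in each cell is $\mu_{ij} = n\,\pi_{i+}\,\pi_{+j}$, estimated from the data by $\hat{\mu}_{ij} = n_{i+}\,n_{+j} / n$, where $n_{i+}$ and $n_{+j}$ are the observed row and column totals and $n$ is the total sample size.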
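
As a worked sketch, the expected counts under independence, estimated as (row total × column total) / n, and the Pearson statistic X² = Σ (observed − expected)² / expected can be computed by hand for the Example 1 data. The object names are illustrative, and the degrees of freedom use the standard (I − 1)(J − 1) rule.

```r
# Example 1 counts: treatment (rows) by outcome (columns).
aspirin <- as.table(matrix(c(18, 171, 10845,
                              5,  99, 10933),
                           nrow = 2, byrow = TRUE,
                           dimnames = list(
                             Treatment = c("Placebo", "Aspirin"),
                             Outcome   = c("Fatal", "Nonfatal", "None"))))

n <- sum(aspirin)

# Estimated expected counts under H0: row total x column total / n.
expected <- outer(rowSums(aspirin), colSums(aspirin)) / n

# Pearson chi-square statistic, degrees of freedom, and p-value.
x2 <- sum((aspirin - expected)^2 / expected)
df <- (nrow(aspirin) - 1) * (ncol(aspirin) - 1)
p_value <- pchisq(x2, df, lower.tail = FALSE)

c(statistic = x2, df = df, p.value = p_value)
```

The same statistic is returned by chisq.test(aspirin), which can serve as a check on the manual calculation.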

Test of Independence (Fisher’s Exact Test)

• When any of the expected counts falls below 5, the Chi-square test is not appropriate. Instead, we use Fisher’s Exact Test.

Example 3: The following data are from a Stanford University study of the effectiveness of the antidepressant Celexa in the treatment of compulsive shopping.

                 Outcome
Treatment    Worse    Same    Better
Celexa       2        3       7
Placebo      2        8       2

Test of Independence in R

• Chi-Square Test: use R function chisq.test()

• Fisher’s Exact Test: use R function fisher.test()
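
A minimal sketch using the Example 3 (Celexa) data: it first inspects the expected counts, several of which fall below 5, and then runs Fisher’s Exact Test. The object names are illustrative; suppressWarnings() is used only because chisq.test() warns about the small expected counts.

```r
# Example 3 counts: treatment (rows) by outcome (columns).
celexa <- as.table(matrix(c(2, 3, 7,
                            2, 8, 2),
                          nrow = 2, byrow = TRUE,
                          dimnames = list(
                            Treatment = c("Celexa", "Placebo"),
                            Outcome   = c("Worse", "Same", "Better"))))

# Expected counts under independence; some are below 5.
expected <- suppressWarnings(chisq.test(celexa))$expected
expected

# Small expected counts, so use Fisher's Exact Test.
fisher_result <- fisher.test(celexa)
fisher_result$p.value
```

For a table with large expected counts, such as Example 1, chisq.test() would be called on the table in the same way.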

Test of Symmetry: Matched Pairs

• Example 4: Suppose two surveys on the President’s job approval were conducted one month apart on the same 1600 Americans, and the results are summarized in the following table. (Source: Agresti, 1990) Is there a significant difference in job approval rating?

                    2nd Survey
1st Survey    Approve    Disapprove
Approve       794        150
Disapprove    86         570

Test of Symmetry: Matched Pairs
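
One standard test of symmetry for a 2×2 matched-pairs table is McNemar’s test, which compares the two off-diagonal counts (respondents who changed their answer). The sketch below assumes this is the test intended for Example 4; the object names are illustrative, and the continuity correction is turned off so the result matches the hand calculation.

```r
# Example 4 counts: 1st survey (rows) by 2nd survey (columns).
approval <- matrix(c(794, 150,
                      86, 570),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(
                     First  = c("Approve", "Disapprove"),
                     Second = c("Approve", "Disapprove")))

# McNemar's test without continuity correction.
result <- mcnemar.test(approval, correct = FALSE)
result

# The same statistic by hand: (n12 - n21)^2 / (n12 + n21), with 1 df.
n12 <- approval[1, 2]
n21 <- approval[2, 1]
by_hand <- (n12 - n21)^2 / (n12 + n21)
by_hand
```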

Useful Resource

• Quick-R: http://www.statmethods.net/index.html