© 2008 McGraw-Hill Higher Education The Statistical Imagination Chapter 13: Nominal Variables: The Chi-Square and Binomial Distributions


The Chi-Square Test

• Chi-Square is a test for a relationship between two nominal variables

• Calculations are made using a cross-tabulation (or “crosstab”) table, which reports frequencies of joint occurrences of attributes

Crosstab Tables

• Cross-tabulation or “crosstab” tables are designed to compare the frequencies of two nominal/ordinal variables at once

Sample Crosstab Table

• Spent night on streets in last 2 weeks by gender among homeless persons

On streets    Male    Female    Total
Yes             28        10       38
No              79        44      123
Total          107        54      161

Reading a Crosstab Table

• The number in a cell is the frequency of joint occurrences, where a joint occurrence is the combination of categories of the two variables for a single individual

• From the cell, look up to find the column category, then look to the left to find the row category

• E.g., in the table above, the joint occurrence of “male and on-street” is 28, the number in the sample who are both male and spent a night on the streets

Reading a Crosstab Table (cont.)

• The numbers in the margins on the right side and the bottom present marginal totals, the total number of subjects in a category

• The grand total (n, the sample size) is presented in the bottom right-hand corner
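
As a rough illustration, the marginal totals and the grand total of the sample crosstab can be recomputed from the four cell frequencies. This is a minimal Python sketch; the variable names (`observed`, `row_totals`, `col_totals`) are illustrative choices and do not come from the text.

```python
# Observed joint frequencies from the sample crosstab.
# Rows: on streets (Yes, No); columns: gender (Male, Female).
observed = [
    [28, 10],  # Yes
    [79, 44],  # No
]

# Row marginal totals: sum across each row.
row_totals = [sum(row) for row in observed]

# Column marginal totals: sum down each column.
col_totals = [sum(col) for col in zip(*observed)]

# Grand total (n, the sample size): bottom right-hand corner of the table.
n = sum(row_totals)

print(row_totals)  # [38, 123]
print(col_totals)  # [107, 54]
print(n)           # 161
```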

Crosstab Tables and the Chi-Square Test

• For the chi-square test, the categories of the independent variable (X) go in the columns of the table, and those of the dependent variable (Y), in the rows

• E.g.: Is gender a good predictor of who among homeless persons is likely to spend a night on the streets?

Calculating Expected Frequencies

• In addition to the observed joint frequencies, the chi-square test involves calculating the expected frequency of each table cell

• The expected frequency of a cell is equal to the column marginal total for the cell (look down) times the row marginal total for the cell (look to the right), divided by the grand total: E = (column total × row total) / n
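
The rule above can be sketched in Python for the homeless-persons crosstab. This is only an illustration of the arithmetic; the variable names are assumptions made for the example.

```python
# Observed frequencies: rows = on streets (Yes, No), columns = (Male, Female).
observed = [[28, 10], [79, 44]]

row_totals = [sum(row) for row in observed]        # [38, 123]
col_totals = [sum(col) for col in zip(*observed)]  # [107, 54]
n = sum(row_totals)                                # 161

# Expected frequency of each cell:
# (column marginal total * row marginal total) / grand total.
expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]

for row in expected:
    print([round(e, 2) for e in row])
# [25.25, 12.75]
# [81.75, 41.25]
```

Note that the expected frequencies in each row and column add up to the same marginal totals as the observed frequencies.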

Using Expected Frequencies to Test the Hypothesis

• The expected frequencies are those that would occur if there is no relationship between the two nominal/ordinal variables

• The chi-square statistic measures the gap between expected and observed frequencies

• If there is no relationship, then the expected and observed frequencies are the same and chi-square computes to zero

The Chi-Square Statistic

• The sampling distribution is generated using the chi-square equation:

$\chi^2 = \sum \left[ \frac{(O - E)^2}{E} \right]$

where O is the observed frequency of a cell, and E is the expected frequency

• Chi-square tells us whether the summed squared differences between the observed and expected cell frequencies are so great that they are not simply the result of sampling error
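
To make the equation concrete, here is a minimal Python sketch that applies it to the sample crosstab of gender by nights on the streets. The function name `chi_square` is a hypothetical helper written for this example.

```python
def chi_square(observed):
    """Sum of (O - E)^2 / E over all cells of a crosstab (list of rows)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    total = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected frequency
            total += (o - e) ** 2 / e
    return total


# Rows = on streets (Yes, No), columns = (Male, Female).
observed = [[28, 10], [79, 44]]
chi2 = chi_square(observed)
print(round(chi2, 3))  # 1.165
```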

When to Use the Chi-Square Statistic

1) There is one population with a representative sample from it

2) There are two variables, both of a nominal/ordinal level of measurement

3) The expected frequency of each cell in the crosstab table is at least five
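
Criterion 3 can be checked mechanically before running the test. The sketch below is one possible way to do so in Python; `min_expected_ok` is a hypothetical helper name, and only the third criterion is checked (the first two depend on the study design, not on arithmetic).

```python
def min_expected_ok(observed, minimum=5):
    """Return True if every expected cell frequency is at least `minimum`."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    expected = [rt * ct / n for rt in row_totals for ct in col_totals]
    return min(expected) >= minimum


# Sample crosstab: smallest expected frequency is about 12.75, so the check passes.
print(min_expected_ok([[28, 10], [79, 44]]))  # True

# A hypothetical sparse table where one expected frequency falls below five.
print(min_expected_ok([[2, 3], [40, 60]]))    # False
```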

Features of the Chi-Square Hypothesis Test

• Step 1. The H0 states that there is no relationship between the two variables. When this is the case, chi-square calculates to a value of zero, give or take some sampling error

• This null hypothesis asserts no difference in observed and expected frequencies

Features of the Chi-Square Hypothesis Test (cont.)

• Step 2. The sampling distribution is the chi-square distribution. It describes all possible outcomes of the chi-square statistic with repeated sampling when there is no relationship between X and Y

• Degrees of freedom are determined by the number of rows (r) and columns (c) in the crosstab table: df = (r - 1)(c - 1)

Features of the Chi-Square Hypothesis Test (cont.)

• Step 4. The test effects are the differences between expected and observed frequencies

• The test statistic is the chi-square statistic

• The p-value is obtained by comparing the calculated chi-square value to the critical values of the chi-square distribution in Statistical Table G of Appendix B
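
As a hedged illustration of this step, the sketch below computes the degrees of freedom for the sample 2 × 2 crosstab and compares the chi-square value with the commonly tabulated critical value of 3.841 (df = 1, .05 level of significance). It also computes the p-value directly, using the identity p = erfc(√(χ²/2)), which holds only for df = 1; for larger tables a statistical table or a statistics library would be needed. The text itself uses Table G rather than this calculation.

```python
import math

# Rows = on streets (Yes, No), columns = (Male, Female).
observed = [[28, 10], [79, 44]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Chi-square statistic: sum of (O - E)^2 / E over all cells.
chi2 = sum(
    (o - rt * ct / n) ** 2 / (rt * ct / n)
    for row, rt in zip(observed, row_totals)
    for o, ct in zip(row, col_totals)
)

# Degrees of freedom: (rows - 1) * (columns - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Assumption: critical value for df = 1 at the .05 level, as found in
# standard chi-square tables.
CRITICAL_05_DF1 = 3.841
reject_h0 = chi2 > CRITICAL_05_DF1

# Exact p-value, valid for df = 1 only.
p_value = math.erfc(math.sqrt(chi2 / 2))

print(df)                 # 1
print(reject_h0)          # False
print(round(p_value, 2))  # roughly 0.28
```

With these data the chi-square value falls well short of the critical value, so the null hypothesis of no relationship between gender and spending a night on the streets would not be rejected at the .05 level.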

The Existence of a Relationship for the Chi-Square Test

• Existence: Test the H0 that $\chi^2 = 0$; that is, there is no relationship between X and Y

• If the H0 is rejected, a relationship exists

Direction and Strength of a Relationship for Chi-Square

• Direction: Not applicable (because the variables are nominal level)

• Strength: Measures of strength exist but are seldom reported because they are prone to misinterpretation

Nature of a Relationship for the Chi-Square Test

• Nature: Report the differences between the observed and expected cell frequencies for a couple of outstanding cells

• Calculate column percentages for selected cells

Column and Row Percentages

• A column percentage is a cell’s frequency as a percentage of the column marginal total

• A row percentage is a cell’s frequency as a percentage of the row marginal total
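
Both kinds of percentage can be computed directly from the sample crosstab. The following Python sketch is an illustration only; the variable names are assumptions for the example.

```python
# Rows = on streets (Yes, No), columns = (Male, Female).
observed = [[28, 10], [79, 44]]

row_totals = [sum(row) for row in observed]        # [38, 123]
col_totals = [sum(col) for col in zip(*observed)]  # [107, 54]

# Column percentage: cell frequency as a percentage of its column total.
col_pct = [[100 * o / ct for o, ct in zip(row, col_totals)] for row in observed]

# Row percentage: cell frequency as a percentage of its row total.
row_pct = [[100 * o / rt for o in row] for row, rt in zip(observed, row_totals)]

print([round(p, 1) for p in col_pct[0]])  # [26.2, 18.5]
print([round(p, 1) for p in row_pct[0]])  # [73.7, 26.3]
```

Read by column, about 26.2% of the men and 18.5% of the women in the sample spent a night on the streets. Read by row, about 73.7% of those who spent a night on the streets were men.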

Chi-Square as a Difference of Proportions Test

• The chi-square test is frequently used to compare proportions of categories of a nominal/ordinal variable for two or more groups of a second nominal/ordinal variable

• Thus, it may be viewed as a difference of proportions test as illustrated in Figure 13-2 in the text

The Binomial Distribution

• The binomial distribution test is a small single-sample proportions test. Contrast it to the large single-sample proportions test of Chapter 10

• The test hinges on mathematically expanding the binomial distribution equation, $(P + Q)^n$

When to Use the Binomial Distribution

1) There is only one nominal variable and it is dichotomous, with P = p [of success] and Q = p [of failure]

2) There is a single, representative sample from one population

3) Sample size is such that $(p_{smaller})(n) < 5$, where $p_{smaller}$ = the smaller of $P_u$ and $Q_u$

4) There is a target value of the variable to which we may compare the sample proportion
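
Criterion 3 is a simple arithmetic check. A minimal Python sketch follows, assuming that Q_u = 1 - P_u (the variable is dichotomous); the function name `is_small_sample` is hypothetical, and the other three criteria are not checked here.

```python
def is_small_sample(p_u, n):
    """Return True if (p_smaller)(n) < 5, i.e. the small-sample condition holds."""
    q_u = 1 - p_u               # probability of failure
    p_smaller = min(p_u, q_u)   # the smaller of P_u and Q_u
    return p_smaller * n < 5


print(is_small_sample(0.5, 4))    # True:  (.5)(4) = 2 < 5
print(is_small_sample(0.5, 100))  # False: (.5)(100) = 50
```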

Expansion of the Binomial Distribution Equation

• Expansion of the binomial distribution equation, $(P + Q)^n$, provides the sampling distribution for dichotomous events. That is, the equation describes all possible sampling outcomes and the probability of each, where there are only two possible categories of a nominal variable

An Example of an Expanded Binomial Equation

• The equation reveals, for example, the possible outcomes of the tossing of 4 coins

• P = p [heads] = .5; Q = p [tails] = .5; n = 4 coins

• $(P + Q)^4 = P^4 + 4P^3Q^1 + 6P^2Q^2 + 4P^1Q^3 + Q^4$

• Add the coefficients to get the total number of possible outcomes = 16

• The probability of 3 heads and 1 tails is the coefficient of $P^3Q^1$ over the sum of coefficients = 4 over 16 = .25
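
The coin-toss example can be reproduced with a short Python sketch. It relies on the standard binomial coefficient (`math.comb`) to generate the coefficients of the expansion. Dividing a coefficient by the sum of coefficients gives the probability only because P = Q = .5 here.

```python
from math import comb

n = 4  # number of coins tossed

# Coefficient of P^k Q^(n-k) is "n choose k"; list from k = 4 heads down to 0.
coefficients = [comb(n, k) for k in range(n, -1, -1)]

# Total number of possible outcomes is the sum of the coefficients.
total_outcomes = sum(coefficients)

# Probability of 3 heads and 1 tails when P = Q = .5.
p_three_heads = comb(n, 3) / total_outcomes

print(coefficients)    # [1, 4, 6, 4, 1]
print(total_outcomes)  # 16
print(p_three_heads)   # 0.25
```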

Pascal’s Triangle

• Pascal’s Triangle provides a shortcut method for expanding the binomial equation

• It provides the coefficients for small samples and allows a quick computation of the probabilities of all possible outcomes when P and Q are equal to .5

• See Table 13-7 in the text
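
For readers without the table at hand, the triangle is easy to generate: each row begins and ends with 1, and every interior entry is the sum of the two entries above it. The sketch below is one way to build it in Python; `pascal_rows` is a hypothetical helper name, and the probabilities shown assume P = Q = .5.

```python
def pascal_rows(max_n):
    """Return rows 0 through max_n of Pascal's Triangle."""
    rows = [[1]]
    for _ in range(max_n):
        prev = rows[-1]
        # Interior entries are sums of adjacent pairs in the previous row.
        rows.append([1] + [a + b for a, b in zip(prev, prev[1:])] + [1])
    return rows


rows = pascal_rows(4)
for row in rows:
    print(row)
# [1]
# [1, 1]
# [1, 2, 1]
# [1, 3, 3, 1]
# [1, 4, 6, 4, 1]

# Probabilities of each outcome for n = 4 when P = Q = .5:
# each coefficient divided by the row sum (2 ** 4 = 16).
probabilities = [c / sum(rows[4]) for c in rows[4]]
print(probabilities)  # [0.0625, 0.25, 0.375, 0.25, 0.0625]
```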

Features of the Binomial Distribution Test

• Step 1. H0: $P_u$ = a target value

• Step 2. The sampling distribution is an expanded binomial equation for the given sample size

Features of the Binomial Distribution Test (cont.)

• Step 4. The effect is the observed combination of successes and failures, which corresponds to a term in the equation (e.g., 3 heads and 1 tails is represented by the term $4P^3Q^1$)

• The test statistic is the expanded binomial equation

• The p-value is taken directly from the equation (not from a statistical table)
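
The sketch below shows one way such a p-value might be read off the expanded equation in Python. It assumes a one-tailed test in which the p-value is the probability of the observed outcome plus all more extreme outcomes in the same direction; the slides do not spell out this convention, so treat it as an assumption. The function names are hypothetical.

```python
from math import comb


def binomial_term(n, k, p):
    """Probability of exactly k successes in n trials: C(n, k) P^k Q^(n-k)."""
    q = 1 - p
    return comb(n, k) * p**k * q ** (n - k)


def upper_tail_p_value(n, k_observed, p):
    """Probability of k_observed or more successes under H0 (one-tailed)."""
    return sum(binomial_term(n, k, p) for k in range(k_observed, n + 1))


# Four coin tosses, H0: P = .5, observed 3 heads and 1 tails.
print(binomial_term(4, 3, 0.5))       # 0.25   (the term 4 P^3 Q^1)
print(upper_tail_p_value(4, 3, 0.5))  # 0.3125 (3 heads or 4 heads)
```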

Statistical Follies: Statistical Power and Sample Size

• For a given level of significance, statistical power is a test statistic’s probability of not incurring a Type II error (i.e., unknowingly making the incorrect decision of failing to reject a false null hypothesis)

• Low statistical power can result from having too small a sample size
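
A small numerical illustration may help. The Python sketch below estimates the power of a one-tailed binomial test for two sample sizes. All of the specifics are assumptions chosen for the example and are not taken from the text: H0 states P = .5, the true population proportion is taken to be .9, and the level of significance is .05.

```python
from math import comb


def upper_tail(n, k, p):
    """Probability of k or more successes in n trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def power(n, p_null, p_true, alpha=0.05):
    """Probability of rejecting H0 (one-tailed, upper) when p_true holds."""
    for k in range(n + 1):
        # Smallest number of successes that is extreme enough to reject H0.
        if upper_tail(n, k, p_null) <= alpha:
            return upper_tail(n, k, p_true)
    return 0.0  # no outcome is extreme enough to reject H0


print(round(power(5, 0.5, 0.9), 3))  # about 0.59
print(round(power(9, 0.5, 0.9), 3))  # about 0.775
```

Under these assumptions, raising the sample size from 5 to 9 raises the chance of correctly rejecting the false null hypothesis from roughly .59 to roughly .77.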