Chi Square (2)

Chi Square

Up to this point, inference to the population has been concerned with “scores” on one or more variables, such as CAT scores, mathematics achievement, and hours spent on the computer.

We used these scores to make inferences about population means. To be sure, not all research questions involve score data.

Today the data that we analyze consist of frequencies; that is, the number of individuals falling into categories. In other words, the variables are measured on a nominal scale.

The test statistic for frequency data is Pearson Chi-Square. The magnitude of Pearson Chi-Square reflects the amount of discrepancy between observed frequencies and expected frequencies.

Chi Square

• Used when data are nominal (both IV and DV)
• Compares frequencies of distributions occurring in different categories or groups
• Tests whether group distributions are different
  Example: shoppers’ preference for the taste of 3 brands of candy
• Determines the association between IV and DV by counting the frequencies of distribution
  Example: gender relative to study preference (alone or in group)

Chi Square

• The test statistic is based on counts that represent the number of items that fall in each category

• For testing significance of patterns in qualitative data

• The test statistic measures the agreement between actual counts and the counts expected under the null hypothesis

Chi Square

1. Determine the appropriate test
2. Establish the level of significance: α
3. Formulate the statistical hypothesis
4. Calculate the test statistic
5. Determine the degrees of freedom
6. Compare the computed test statistic against a tabled/critical value

Chi Square

χ² tables are included in most statistics texts and consist of columns and rows, with columns representing areas under the curve and rows associated with the degrees of freedom (df), which for χ² tests of homogeneity and independence are df = (r-1)(c-1).

Chi Square

The chi-square distribution can be used to see whether or not observed counts agree with expected counts. Let

o = observed count and e = expected count

Formula

$$\chi^2 = \sum \frac{(F_o - F_e)^2}{F_e}$$

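4. Calculating Test Statistics

$$\chi^2 = \sum \frac{(F_o - F_e)^2}{F_e}$$

where $F_o$ is the observed frequency in a cell and $F_e$ is the expected frequency for that cell. The expected frequency appears both in the numerator and as the denominator.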

An example of a one-variable problem or “Goodness of fit”

Example from book: Four candidates for a school board position.

You poll a random sample of 200 voters regarding their candidate of choice. Do differences exist among the proportions of registered voters preferring each school board candidate?

Example continued

                      Martzial   Breece     Dunton     Artesani   Total
Observed frequency    fo = 40    fo = 62    fo = 56    fo = 42    Σfo = 200
Expected frequency    fe = 50    fe = 50    fe = 50    fe = 50    Σfe = 200
(fo - fe)²/fe         2.00       2.88       .72        1.28

Martzial: (40 - 50)²/50 = (-10)²/50 = 2.00
Breece: (62 - 50)²/50 = (12)²/50 = 2.88
Dunton: (56 - 50)²/50 = (6)²/50 = .72
Artesani: (42 - 50)²/50 = (-8)²/50 = 1.28

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 2.00 + 2.88 + .72 + 1.28 = 6.88$$
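The arithmetic in this example can be checked with a few lines of Python. This is only a minimal sketch: the variable names are illustrative, and the expected counts assume the null hypothesis of equal preference across the four candidates.

```python
# Observed votes per candidate, in the order Martzial, Breece, Dunton, Artesani.
observed = [40, 62, 56, 42]

# Assumption: under the null hypothesis every candidate is expected
# to receive the same share, n / k.
n = sum(observed)
k = len(observed)
expected = [n / k] * k

# Per-cell contribution (fo - fe)^2 / fe, then the sum over all cells.
contributions = [(fo - fe) ** 2 / fe for fo, fe in zip(observed, expected)]
chi_square = sum(contributions)

print([round(c, 2) for c in contributions])  # contribution of each candidate
print(round(chi_square, 2))                  # chi-square statistic
```

Each entry of `contributions` corresponds to one of the four hand calculations, and their sum is the chi-square statistic.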

Example continued

The one-variable chi square test is quite simple.

The expected frequencies are simply the total n divided by the number of groups. In this case there were four candidates and n = 200, therefore, 50 voters go in each cell.

The observed frequencies are simply what is truly obtained in the sample.

Next you compute the squared discrepancy between the observed and expected frequency for each cell, divide it by the expected frequency, and then sum across cells to get your chi square statistic. In this case chi square = 6.88.

The computed value of the Pearson chi-square statistic is compared with the critical value to determine whether the computed value is improbable under the null hypothesis.

The critical tabled values are based on sampling distributions of the Pearson chi-square statistic.

If the calculated χ² is greater than the tabled χ² value, reject Ho.

Example continued

Next you compute your df. For a one-variable case it is simple: df = C - 1, where C = number of columns. Here that would be 4 - 1 = 3.

Now you find your critical chi square in Table F in the back of the book for 3 degrees of freedom and an alpha of .05. The result is 7.81.

Now we see that our obtained chi square of 6.88 falls short of the critical chi square of 7.81, so we retain our null hypothesis. This means the sample does not provide sufficient evidence that the proportions of voters preferring each of the four candidates differ.
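The comparison with the tabled value can also be checked numerically. The sketch below uses only the standard library and relies on the closed-form upper-tail probability of a chi-square variable with 3 degrees of freedom; the function name `chi2_sf_df3` is hypothetical, and the formula applies to df = 3 only.

```python
import math

def chi2_sf_df3(x):
    """Upper-tail probability P(X > x) for chi-square with 3 df.

    Closed form valid only for df = 3:
    erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2)
    """
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

alpha = 0.05
chi_square_obtained = 6.88  # from the school board example
critical_value = 7.81       # tabled value for df = 3, alpha = .05

# Area beyond the tabled critical value should be close to alpha.
tail_at_critical = chi2_sf_df3(critical_value)

# p-value of the obtained statistic and the resulting decision.
p_value = chi2_sf_df3(chi_square_obtained)
reject_null = chi_square_obtained > critical_value

print(round(tail_at_critical, 3))
print(round(p_value, 3), reject_null)
```

Comparing the statistic with the critical value and comparing the p-value with alpha are two views of the same decision, so they always agree.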

Chi Square for two variables

In a two-variable chi square the principle is the same as before. The big difference is in calculating the expected frequencies for each cell. See page 349 for an example.

The formula for calculating the expected frequencies is

$$f_e = \frac{(f_{row})(f_{col})}{n}$$

The easiest way to learn chi square is by example.

Example

Suppose a researcher is interested in voting preferences on gun control issues.

A questionnaire was developed and sent to a random sample of 90 voters.

The researcher also collects information about the political party membership of the sample of 90 respondents.

Bivariate Frequency Table or Contingency Table

             Favor   Neutral   Oppose   f row
Democrat     10      10        30       50
Republican   15      15        10       40
f column     25      25        40       n = 90

The cell entries are the observed frequencies. The f row entries are the row frequencies and the f column entries are the column frequencies.

1. Determine Appropriate Test

1. Party Membership (2 levels) and Nominal
2. Voting Preference (3 levels) and Nominal

2. Establish Level of Significance

Alpha of .05

3. Determine The Hypothesis

• Ho: There is no difference between Democrats and Republicans in their opinions on the gun control issue.

• Ha: There is an association between responses to the gun control survey and party membership in the population.



             Favor       Neutral     Oppose      f row
Democrat     fo = 10     fo = 10     fo = 30     50
             fe = 13.9   fe = 13.9   fe = 22.2
Republican   fo = 15     fo = 15     fo = 10     40
             fe = 11.1   fe = 11.1   fe = 17.8
f column     25          25          40          n = 90

4. Calculating Test Statistics

The expected frequency for each cell is (f row)(f column)/n. For example:

Democrat row, Favor or Neutral column: fe = 50*25/90 = 13.9
Republican row, Favor or Neutral column: fe = 40*25/90 = 11.1
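The expected frequencies for the whole table can be generated from the row and column totals. The following is a minimal sketch with illustrative variable names, assuming the observed counts are stored as a list of rows.

```python
# Observed counts: rows are Democrat, Republican;
# columns are Favor, Neutral, Oppose.
observed = [
    [10, 10, 30],
    [15, 15, 10],
]

row_totals = [sum(row) for row in observed]        # f row
col_totals = [sum(col) for col in zip(*observed)]  # f column
n = sum(row_totals)

# fe = (f row)(f column) / n for every cell.
expected = [[r * c / n for c in col_totals] for r in row_totals]

for row in expected:
    print([round(fe, 1) for fe in row])
```

Because every expected count is built from the marginal totals, each row and column of `expected` sums to the same total as the corresponding row or column of `observed`.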


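$$\chi^2 = \frac{(10 - 13.89)^2}{13.89} + \frac{(10 - 13.89)^2}{13.89} + \frac{(30 - 22.2)^2}{22.2} + \frac{(15 - 11.11)^2}{11.11} + \frac{(15 - 11.11)^2}{11.11} + \frac{(10 - 17.8)^2}{17.8} = 11.03$$

The value 11.03 is obtained when the expected frequencies are carried unrounded (the exact sum is 11.025). Using the rounded expected frequencies shown above gives a slightly larger sum.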
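As a check on the hand calculation, the statistic can be computed from the observed table with unrounded expected frequencies. This is a sketch under the same assumptions as the example, with illustrative variable names.

```python
# Observed counts: rows are Democrat, Republican;
# columns are Favor, Neutral, Oppose.
observed = [
    [10, 10, 30],
    [15, 15, 10],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Sum (fo - fe)^2 / fe over all six cells, with fe left unrounded.
chi_square = 0.0
for i, row in enumerate(observed):
    for j, fo in enumerate(row):
        fe = row_totals[i] * col_totals[j] / n
        chi_square += (fo - fe) ** 2 / fe

print(round(chi_square, 3))
```

The unrounded result is 11.025, which the example reports as 11.03.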

5. Determine Degrees of Freedom

• df = (R-1)(C-1), where R is the number of levels in the row variable and C is the number of levels in the column variable

df = (R-1)(C-1) = (2-1)(3-1) = 2

6. Compare computed test statistic against a tabled/critical value

• α = 0.05
• df = 2
• Critical tabled value = 5.991
• Test statistic, 11.03, exceeds critical value
• Null hypothesis is rejected
• Democrats & Republicans differ significantly in their opinions on gun control issues
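The last two steps can be sketched in Python using only the standard library. For df = 2 the upper-tail probability of the chi-square distribution is exp(-x/2), so the critical value is -2 ln(α). This shortcut holds for df = 2 only, and the function name `critical_value_df2` is hypothetical.

```python
import math

def critical_value_df2(alpha):
    """Chi-square critical value for df = 2 (solves exp(-x / 2) = alpha)."""
    return -2 * math.log(alpha)

rows, cols = 2, 3            # party membership levels, voting preference levels
df = (rows - 1) * (cols - 1)
alpha = 0.05
chi_square_obtained = 11.03  # from the gun control example

critical = critical_value_df2(alpha)
p_value = math.exp(-chi_square_obtained / 2)  # valid for df = 2 only
reject_null = chi_square_obtained > critical

print(df, round(critical, 3), reject_null)
```

For other degrees of freedom the critical value would come from a table or a statistics library rather than from this closed form.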
