Upload
kevin-nyakundi
View
219
Download
0
Embed Size (px)
DESCRIPTION
Introduction to Chi Square
Citation preview
Chi SquareUp to this point, the inference to the population has
been concerned with “scores” on one or more variables, such as CAT scores, mathematics achievement, and hours spent on the computer.
We used these scores to make the inferences about population means. To be sure not all research questions involve score data.
Today the data that we analyze consists of frequencies; that is, the number of individuals falling into categories. In other words, the variables are measured on a nominal scale.
The test statistic for frequency data is Pearson Chi-Square. The magnitude of Pearson Chi-Square reflects the amount of discrepancy between observed frequencies and expected frequencies.
Chi Square used when data are nominal (both IV and
DV) Comparing frequencies of distributions
occurring in different categories or groups Tests whether group distributions are
different Shoppers’ preference for the taste of 3 brands of
candy determines the association between IV and
DV by counting the frequencies of distribution Gender relative to study preference (alone or in
group)
Chi Square• Test statistic is based on counts that
represent the number of items that fall in each category
• For testing significance of patterns in qualitative data
• Test statistics measures the agreement between actual counts and expected counts assuming the null hypothesis
Chi Square1. Establish the level of significance:α2. Formulate the statistical hypothesis3. Calculate the test statistic4. Determine the degree of freedom5. Compare computed test statistic against a
tabled/critical value6. Determine the appropriate test
Chi Square2 tables are included in most statistics texts
and consist of columns and rows, with columns representing areas under the curve and rows associated with the degrees of freedom (df) which, for 2 tests of homogeneity and independence are: df = (r-1)(c-1).
Chi SquareThe chi-square distribution can be used to
see whether or not an observed counts agree with an expected counts. Let
o = observed count ande = Expected count
Formula
e
eo
FFF 2
2 )(
8
4. Calculating Test Statistics
e
eo
FFF 2
2 )(
Observed
frequencies
Expe
cted
fre
quen
cy
Expected
frequency
An example of a one variable problem or “Goodness of fit”Example from book:Four candidates for a school board position.
You poll a random sample of 200 voters regarding their candidate of choice. Do differences exist among the proportions of registered voters preferring each school board candidate.
Example continuedMartzial Breece Dunton artesaniObserved Frequency
fo = 40 fo = 62 fo = 56 fo = 42 Σfo = 200
ExpectedFrequency
fe =50 fe = 50 fe = 50 fe = 50 Σfe = 200
(fo - fe )2
fe
(40 – 50)2
50 (-10)2
50= 2.00
(62 – 50)2
50= (12)2
50= 2.88
( 56 – 50)2
50 (6)2
50= .72
(42 – 50)2
50 (-8)2
50= 1.28
χ = Σ fo – fe )2
fe
= 2.00 + 2.88 + .72 + 1.28 + 6.88
Example continuedThe one varialbe chi square test is quite simple.
The expected frequencies are simply the total n divided by the number of groups. In this case there were four candidates and n = 200, therefore, 50 voters go in each cell.
The observed frequencies are simply what is truly obtained in the sample.
Next you compute the discrepancies between the observed and expected frequencies divided by the expected frequencies and then sum to get your chi square statistic. In this case chi square = 6.88.
The computed value of the Pearson chi- square statistic is compared with the critical value to determine if the computed value is improbable
The critical tabled values are based on sampling distributions of the Pearson chi-square statistic
If calculated 2 is greater than 2 table value, reject Ho
Example continuedNext you compute your df.For a one variable case it is simple df = C – 1; where C = number of columns here that would be 4 – 1 = 3.Now you find your chi square critical in Table F in
the back of the book for 3 degrees of freedom and alpha is .05. The result is 7.81.
Now we see that our chi square obtained of 6.88 falls short of the chi square critical of 7.81 and we would retain our Null hypothesis. This means that all four candidates are doing equally well in the polls.
Chi Square for two variablesIn a two variable chi square the principal is
the same as before. The big difference is in calculating the expected frequencies for each cell. See page 349 for example.
The formula for calculating the expected frequencies is
fe = (frow ) (fcol ) nThe easiest way to learn chi square is by
example.
ExampleSuppose a researcher is interested in voting
preferences on gun control issues.A questionnaire was developed and sent to a
random sample of 90 voters. The researcher also collects information
about the political party membership of the sample of 90 respondents.
Bivariate Frequency Table or Contingency Table
Favor Neutral Oppose f row
Democrat 10 10 30 50
Republican 15 15 10 40
f column 25 25 40 n = 90
17
Bivariate Frequency Table or Contingency Table
Favor Neutral Oppose f row
Democrat 10 10 30 50
Republican 15 15 10 40
f column 25 25 40 n = 90
Obser
ved
frequ
encie
s
18
Bivariate Frequency Table or Contingency Table
Favor Neutral Oppose f row
Democrat 10 10 30 50
Republican 15 15 10 40
f column 25 25 40 n = 90
Row
frequency
19
Bivariate Frequency Table or Contingency Table
Favor Neutral Oppose f row
Democrat 10 10 30 50
Republican 15 15 10 40
f column 25 25 40 n = 90Column frequency
20
1. Determine Appropriate Test1. Party Membership ( 2 levels) and Nominal2. Voting Preference ( 3 levels) and Nominal
2. Establish Level of Significance
Alpha of .05
21
22
3. Determine The Hypothesis• Ho : There is no difference between D & R in
their opinion on gun control issue.
• Ha : There is an association between responses to the gun control survey and the party membership in the population.
23
4. Calculating Test Statistics
Favor Neutral Oppose f row
Democrat fo =10fe =13.9
fo =10fe =13.9
fo =30fe=22.2
50
Republican fo =15fe =11.1
fo =15fe =11.1
fo =10fe =17.8
40
f column 25 25 40 n = 90
24
4. Calculating Test StatisticsFavor Neutral Oppose f row
Democrat fo =10fe =13.9
fo =10fe =13.9
fo =30fe=22.2
50
Republican fo =15fe =11.1
fo =15fe =11.1
fo =10fe =17.8
40
f column 25 25 40 n = 90
= 50*25/90
25
4. Calculating Test StatisticsFavor Neutral Oppose f row
Democrat fo =10fe =13.9
fo =10fe =13.9
fo =30fe=22.2
50
Republican fo =15fe =11.1
fo =15fe =11.1
fo =10fe =17.8
40
f column 25 25 40 n = 90
= 40* 25/90
26
4. Calculating Test Statistics
8.17)8.1710(
11.11)11.1115(
11.11)11.1115(
2.22)2.2230(
89.13)89.1310(
89.13)89.1310(
222
2222
= 11.03
5. Determine Degrees of Freedom
•df = (R-1)(C-1)
27
Num
ber of
levels in
column
variable
Num
ber of levels in row
variable
5. Determine Degrees of Freedom
df = (R-1)(C-1) =(2-1)(3-1) = 2
28
29
6. Compare computed test statistic against a tabled/critical valueα = 0.05df = 2Critical tabled value = 5.991Test statistic, 11.03, exceeds critical valueNull hypothesis is rejectedDemocrats & Republicans differ significantly
in their opinions on gun control issues