Wednesday, Dec. 2
Chi-square Goodness of Fit
Chi-square Test of Independence: Two Variables
Summing up!
[Figure: pea genotype crosses. gg and yy; then yg, yg, yg, yg; then yy, yg, gg, gy at 25% each.]
Pea Color   freq Observed   freq Expected
Yellow      158             150
Green       42              50
TOTAL       200             200
\[ \chi^2 = \sum_{i=1}^{k} \frac{(f_o - f_e)^2}{f_e} \]
Chi Square Goodness of Fit
d.f. = k - 1, where k = number of categories in the variable.
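As a minimal sketch, the goodness-of-fit statistic for the pea-color table (158 yellow and 42 green observed, 150 and 50 expected) can be computed in a few lines of Python. The helper name `chi_square_gof` is hypothetical, chosen only for illustration.

```python
def chi_square_gof(observed, expected):
    """Sum (fo - fe)^2 / fe over all k categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))


# Pea-color counts from the table: [yellow, green]
observed = [158, 42]
expected = [150, 50]

chi2 = chi_square_gof(observed, expected)
df = len(observed) - 1  # d.f. = k - 1

print(f"chi-square = {chi2:.2f}, d.f. = {df}")
```

Only the expected counts and the number of categories enter the calculation; the total n does not set the degrees of freedom.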
“… the general level of agreement between Mendel’s expectations and his reported results shows that it is closer than would be expected in the best of several thousand repetitions. The data have evidently been sophisticated systematically, and after examining various possibilities, I have no doubt that Mendel was deceived by a gardening assistant, who knew only too well what his principal expected from each trial made…”
-- R. A. Fisher
Pea Color   freq Observed   freq Expected
Yellow      151             150
Green       49              50
TOTAL       200             200
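To put a number on how close this table is to expectation, here is a quick sketch of the same statistic for the 151/49 counts. The helper `chi_square_gof` is a hypothetical name, and the interpretation in light of Fisher's remark is left to the reader.

```python
def chi_square_gof(observed, expected):
    """Sum (fo - fe)^2 / fe over all k categories."""
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))


# Pea-color counts from the table: [yellow, green]
observed = [151, 49]
expected = [150, 50]

chi2 = chi_square_gof(observed, expected)
df = len(observed) - 1  # d.f. = k - 1

print(f"chi-square = {chi2:.3f}, d.f. = {df}")
```

A chi-square value this near zero means the observed counts agree with the expected counts almost exactly.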
Peas to Kids: Another Example
Goodness of Fit
At my children’s school science fair last year, where participation was voluntary but strongly encouraged, I counted about 60 boys and 40 girls who had submitted entries. Since I would expect a 50:50 ratio if there were no gender preference for submission, does this observation deviate beyond chance level?
            Boys   Girls
Expected:   50     50
Observed:   60     40
\[ \chi^2 = \sum_{i=1}^{k} \frac{(f_o - f_e)^2}{f_e} \]
For each of k categories, square the difference between the observed and the expected frequency, divide by the expected frequency, and sum over all k categories.
\[ \chi^2 = \frac{(60-50)^2}{50} + \frac{(40-50)^2}{50} = 4.00 \]
This value, chi-square, follows a distribution with known probability values, where the degrees of freedom are a function of the number of categories (not n). In this one-variable case, d.f. = k - 1.
The critical value of chi-square at α = .05, d.f. = 1 is 3.84. The obtained value of 4.00 exceeds it, so reject H0.
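The decision for the science-fair example can be sketched in Python using only the standard library. For d.f. = 1, a chi-square variable is the square of a standard normal, so its upper-tail probability equals `erfc(sqrt(x / 2))`. That shortcut holds for d.f. = 1 only. The helper names `chi_square_gof` and `p_value_df1` are hypothetical.

```python
import math


def chi_square_gof(observed, expected):
    """Sum (fo - fe)^2 / fe over all k categories."""
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))


def p_value_df1(chi2):
    """Upper-tail probability of a chi-square variable with 1 d.f. only."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Science-fair counts: [boys, girls]
observed = [60, 40]
expected = [50, 50]

chi2 = chi_square_gof(observed, expected)
p = p_value_df1(chi2)

critical = 3.84  # critical value at alpha = .05, d.f. = 1, as quoted in the slides
reject_h0 = chi2 > critical

print(f"chi-square = {chi2:.2f}, p = {p:.4f}, reject H0: {reject_h0}")
```

Comparing the statistic with the critical value and comparing the p-value with α = .05 are two routes to the same decision.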
Chi-square Test of Independence
Are two nominal-level variables related to, or independent of, each other?
Is race related to SES, or are they independent?
SES     White   Black   Total
Lo      12      3       15
Hi      16      16      32
Total   28      19      47
The expected frequency of any given cell is
\[ f_e = \frac{\text{Row } n \times \text{Column } n}{\text{Total } n} \]
\[ \chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(f_o - f_e)^2}{f_e} \]
At d.f. = (r - 1)(c - 1)
SES     White        Black        Total
Lo      (15x28)/47   (15x19)/47   15
Hi      (32x28)/47   (32x19)/47   32
Total   28           19           47
Expected frequencies:
SES     White   Black
Lo      8.94    6.06
Hi      19.06   12.94
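The expected frequencies can be reproduced from the margins alone. The sketch below applies (row n x column n) / total n to every cell of the SES table; the variable names are illustrative.

```python
# Observed counts. Rows: Lo, Hi SES. Columns: White, Black.
observed = [[12, 3],
            [16, 16]]

row_totals = [sum(row) for row in observed]        # 15, 32
col_totals = [sum(col) for col in zip(*observed)]  # 28, 19
n = sum(row_totals)                                # 47

# Expected frequency of each cell = row n * column n / total n
expected = [[r * c / n for c in col_totals] for r in row_totals]

for label, row in zip(["Lo", "Hi"], expected):
    print(label, [round(fe, 2) for fe in row])
```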
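Observed frequencies, with expected frequencies in parentheses:
SES     White        Black        Total
Lo      12 (8.94)    3 (6.06)     15
Hi      16 (19.06)   16 (12.94)   32
Total   28           19           47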
Please calculate:
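One way to check a hand calculation is the short sketch below, which computes the statistic and the degrees of freedom for any r x c table of observed counts. The function name `chi_square_independence` is hypothetical. The result can be compared with the 3.84 critical value quoted earlier for α = .05, d.f. = 1.

```python
def chi_square_independence(observed):
    """Return (chi-square, d.f.) for an r x c table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / n  # expected cell frequency
            chi2 += (fo - fe) ** 2 / fe

    df = (len(row_totals) - 1) * (len(col_totals) - 1)  # (r - 1)(c - 1)
    return chi2, df


# Rows: Lo, Hi SES. Columns: White, Black.
chi2, df = chi_square_independence([[12, 3], [16, 16]])
print(f"chi-square = {chi2:.2f}, d.f. = {df}")
```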
Important assumptions:
Independent observations.
Observations are mutually exclusive.
Expected frequencies should be reasonably large:
  d.f. = 1: at least 5
  d.f. = 2: > 2
  d.f. > 3: all expected frequencies but one are greater than or equal to 5, and the one that is not is at least equal to 1.
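The expected-frequency assumption can be checked before running the test. This sketch covers only the d.f. = 1 rule (every expected frequency at least 5) and applies it to the 2 x 2 SES table. The helper names `expected_frequencies` and `meets_df1_rule` are hypothetical.

```python
def expected_frequencies(observed):
    """Expected cell counts under independence: row n * column n / total n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def meets_df1_rule(expected, minimum=5):
    """True if every expected frequency is at least `minimum` (d.f. = 1 rule only)."""
    return all(fe >= minimum for row in expected for fe in row)


# Rows: Lo, Hi SES. Columns: White, Black.
expected = expected_frequencies([[12, 3], [16, 16]])
smallest = min(fe for row in expected for fe in row)
ok = meets_df1_rule(expected)

print(f"smallest expected frequency = {smallest:.2f}, rule satisfied: {ok}")
```

The rule is about expected counts, not observed ones: the observed cell of 3 does not by itself violate it.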
Univariate Statistics:
Interval   Mean     one-sample t-test
Ordinal    Median
Nominal    Mode     Chi-squared goodness of fit
Bivariate Statistics
                                Y
X          Nominal   Ordinal                      Interval
Nominal    χ²        Rank-sum, Kruskal-Wallis H   t-test, ANOVA
Ordinal              Spearman rs (rho)
Interval                                          Pearson r, Regression