victor-varnell
Finish Anova
And then Chi-Square
Fcrit
• Table A-5: 4 pages of values
• Left-hand column: df denominator
df for MSW = n-k where k is the number of groups
• Next column: area in upper tail, e.g. 0.100. This is the alpha level.
• Across the top, df numerator
df for MSA = k - 1
Conclusions for Anova
• Look at the graph of the F distribution on page 290.
• If Fcalc is greater than Fcrit, we reject the null hypothesis that the means of the populations for the three or more groups are equal.
• We can accept the hypothesis that at least one of the groups comes from a different population.
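To make the Fcalc side of the comparison concrete, here is a minimal sketch of the calculation, F = MSA / MSW. The data and the function name one_way_anova_f are made up for illustration and are not from the course; Fcrit still comes from Table A-5.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (Fcalc, df numerator, df denominator) for a list of samples."""
    k = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # total number of observations
    grand_mean = mean(x for g in groups for x in g)
    # Sum of squares among groups: group size times the squared distance
    # of each group mean from the grand mean.
    ssa = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Sum of squares within groups: squared deviations around each group mean.
    ssw = sum((x - mean(g)) ** 2 for g in groups for x in g)
    msa = ssa / (k - 1)                  # df numerator = k - 1
    msw = ssw / (n - k)                  # df denominator = n - k
    return msa / msw, k - 1, n - k

# Hypothetical data: three groups of three measurements each.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_calc, df_num, df_den = one_way_anova_f(groups)
print(f_calc, df_num, df_den)
```

Look up Fcrit in the table using df_num across the top and df_den down the left-hand column, then reject the null hypothesis only if f_calc is larger.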
Bonferroni Method
• Anova tells us if there is a difference, but not if all the groups come from different populations
• Maybe 2 or more are from the same population and only one is different
• How do we find out which one(s) are different?
• Bonferroni or Tukey’s Method or others
• We aren’t going to do them until later in the course.
Degrees of Freedom
• You may notice in each of these cases, we are still using n-1 for the degrees of freedom
• Within the groups, take n-1 for each and add them up. Equals n-k.
• Among the groups, take number of groups minus one. Equals k-1.
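A quick check of the two counts above, as a sketch. The group sizes are hypothetical and chosen only to show that adding up n-1 for each group gives n-k.

```python
# Hypothetical group sizes, for illustration only.
group_sizes = [8, 10, 12]

k = len(group_sizes)      # number of groups
n = sum(group_sizes)      # total number of observations

# Within the groups: take (size - 1) for each group and add them up.
df_within = sum(size - 1 for size in group_sizes)

# Among the groups: number of groups minus one.
df_among = k - 1

print(df_within, n - k, df_among)
```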
Any idea what degrees of freedom means??
• You can just remember n-1 or you can consider this.
• Degrees of freedom means how many values are independent given that there is a sum of the values. Once we have a sum of the values and we have n-1 of the values, then the last one is fixed, not free.
• We can consider that there is a sum or a mean or whatever, that fixes that last value.
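The "last one is fixed" idea can be shown in a few lines. This is only an illustration with made-up numbers.

```python
# Hypothetical sample, for illustration only.
values = [4, 7, 9, 10]
total = sum(values)            # the known sum (equivalently, the known mean)

# Suppose we are told the total and the first n-1 values.
known = values[:-1]

# The last value has no freedom left: it must make up the difference.
last_value = total - sum(known)
print(last_value)
```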
When do we use df?
• One situation is for the sample standard deviation.
• That uses the deviation around the mean.
• Since we have the mean, then only n-1 values within the sample are “free” or “variable”.
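As a sketch of where n-1 shows up in the sample standard deviation: the deviations around the mean always add up to zero, so only n-1 of them are free. The sample is made up; statistics.stdev is Python's built-in sample standard deviation, used here only as a cross-check.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical sample, for illustration only.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(sample)
m = mean(sample)

# Deviations around the mean sum to zero, which fixes the last one.
deviations = [x - m for x in sample]

# Sample standard deviation: divide the squared deviations by n - 1.
s = sqrt(sum(d ** 2 for d in deviations) / (n - 1))

print(sum(deviations), s, stdev(sample))
```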
Chi Square
Testing for Associations
NEW QUESTIONS
Up till now we looked for relationships by comparing the means of measurements from samples or populations.
Here we are using counts instead of means of measurements
A non-parametric Test
• Can be used for nominal data
• Count cases within categories
• Chi Square Tests have many applications
• We will use the method to test for independence of categories
Hypotheses
NULL
• The categories are independent
• There is no association between them
ALTERNATIVE
• The categories are related
What are we trying to prove?
As usual, we are trying to reject the null hypothesis.
We are looking for evidence that events are related.
(In other applications of chi-square, we may be looking for independence. Not in this course.)
Example
• Outbreak of stomach upset on the day following a community picnic
• At the doctor’s office, we note two facts.
• Does the person have an upset stomach?
• Was the person at the picnic?
• Set up a contingency table - what’s that?
The Table, set it up by counts or proportions

                 Upset stomach   Not upset   Totals
At picnic              6             19         25
Not at picnic         14             61         75
Totals                20             80        100
Null Hypothesis
• H0: The upset stomachs have nothing to do with the picnic
• HA: Presence at the picnic is related to upset stomach
• If H0 is correct, having an upset stomach is independent of being at the picnic
• The proportion in each category will be just by “chance”
Let’s leave the actual numbers
• Consider the situation
• General concepts of the chance of getting sick if you were at the picnic
• And if you were not at the picnic
Think it through a bit
• 100 persons
• 20 of them are sick, 80 are fine.
• 1/5th and 4/5ths.
• What about the proportions who were at the picnic?
• If being sick is independent of being at the picnic, then 1/5 of those at the picnic will be sick and 4/5 will be fine
• And the same for those not at the picnic
                 Upset stomach   Not upset      Totals
At picnic        1/5th of 25     4/5ths of 25      25
Not at picnic    1/5th of 75     4/5ths of 75      75
Totals               20              80           100
We call these the “expected” values
• Of course, we set up mechanisms so that we calculate these expected values automatically.
• For any cell, we take the total of the row times the total of the column and divide by the grand total.
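The row-total-times-column-total rule can be sketched as follows, using the observed counts from the picnic table. The variable names are just for illustration.

```python
# Observed counts from the picnic example.
# Rows: at picnic, not at picnic. Columns: upset stomach, not upset.
observed = [[6, 19],
            [14, 61]]

row_totals = [sum(row) for row in observed]          # 25 and 75
col_totals = [sum(col) for col in zip(*observed)]    # 20 and 80
grand_total = sum(row_totals)                        # 100

# Expected count for a cell = row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
print(expected)  # [[5.0, 20.0], [15.0, 60.0]]
```

These are the same numbers as 1/5th and 4/5ths of 25 and of 75.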
                 Upset stomach   Not upset   Totals
At picnic                                       25
Not at picnic                                   75
Totals                20             80        100
Compare “expected” frequencies with actual frequencies
(expected values in parentheses)

                 Upset stomach   Not upset   Totals
At picnic            6 (5)        19 (20)       25
Not at picnic       14 (15)       61 (60)       75
Totals                20             80        100
Sum of Deviations
• How different is the actual from the expected?
• Use a formula
• (E – A)²/E for each cell, and take the sum
• The general form looks familiar – a deviation squared and divided
χ² = ∑[(Observed – Expected)² / Expected]
χ² = (6-5)²/5 + (19-20)²/20 + (14-15)²/15 + (61-60)²/60
   = 1/5 + 1/20 + 1/15 + 1/60
   = 12/60 + 3/60 + 4/60 + 1/60
   = 20/60 = 0.33
Hypothesis Test for χ²
• Use Table A8
• Need a value for alpha
• And degrees of freedom
• For χ², degrees of freedom = (number of rows – 1) * (number of columns – 1)
• Calculate it for our example
• In our example, a 2 X 2 table, df = (2 – 1) * (2 – 1) = 1
The test
For alpha = 0.05 and df = 1, χ²crit = 3.84
Our calculated χ² was 0.33
Since 0.33 is less than 3.84, we have failed to reject the null hypothesis.
We have not proven an association between the potato salad, hot dogs & beer & the upset stomachs.
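The whole test can be sketched in a few lines, working directly from the observed counts and the expected counts in the table (5, 20, 15, 60). Each observed count is 1 away from its expected count, so the sum is 1/5 + 1/20 + 1/15 + 1/60. The critical value 3.84 is the table value quoted above. The p-value line is an extra, not part of the slides, and the erfc shortcut is only valid when df = 1.

```python
from math import erfc, sqrt

# Observed counts. Rows: at picnic, not at picnic.
# Columns: upset stomach, not upset.
observed = [[6, 19],
            [14, 61]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi-square statistic: sum of (observed - expected)^2 / expected over all cells.
chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Degrees of freedom = (rows - 1) * (columns - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

chi_crit = 3.84                  # table value for alpha = 0.05, df = 1
reject_null = chi_sq > chi_crit

# Assumption: this shortcut for the upper-tail area holds only for df = 1.
p_value = erfc(sqrt(chi_sq / 2))

print(round(chi_sq, 2), df, reject_null)
```

Since chi_sq is well below chi_crit, reject_null comes out False, which matches the conclusion that we fail to reject the null hypothesis.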