Finish Anova And then Chi- Square. Fcrit Table A-5: 4 pages of values Left-hand column: df denominator df for MSW = n-k where k is the number of groups

Finish Anova

And then Chi-Square

Fcrit

• Table A-5: 4 pages of values

• Left-hand column: df denominator

df for MSW = n-k where k is the number of groups

• Next column: area in the upper tail, e.g., 0.100. This is alpha.

• Across the top, df numerator

df for MSA = k - 1

Conclusions for Anova

• Look at the graph of the F distribution on page 290.

• If Fcalc is greater than Fcrit, we reject the null hypothesis that the means of the populations for the three or more groups are equal.

• We then accept the alternative hypothesis that at least one of the groups comes from a different population.
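As a rough illustration of this decision rule, the sketch below computes Fcalc = MSA / MSW for three small groups and compares it with Fcrit. The data are made up for illustration and are not from the course. The Fcrit of 3.89 (alpha = 0.05, df numerator = 2, df denominator = 12) is an assumed table value, so check it against Table A-5.

```python
# Hypothetical data: three groups of five measurements each (not from the course).
groups = [
    [4, 5, 6, 5, 5],
    [6, 7, 8, 7, 7],
    [9, 10, 11, 10, 10],
]

k = len(groups)                    # number of groups
n = sum(len(g) for g in groups)    # total number of observations
grand_mean = sum(sum(g) for g in groups) / n

# Sum of squares among groups: group size times squared distance
# of each group mean from the grand mean.
ss_among = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)

# Sum of squares within groups: squared distance of each value
# from its own group mean.
ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

df_among = k - 1                   # df numerator (for MSA)
df_within = n - k                  # df denominator (for MSW)

msa = ss_among / df_among
msw = ss_within / df_within
f_calc = msa / msw

f_crit = 3.89                      # assumed table value for alpha = 0.05, df = (2, 12)
reject_null = f_calc > f_crit

print(f"Fcalc = {f_calc:.2f}, Fcrit = {f_crit}, reject H0: {reject_null}")
```

With these invented numbers the group means are far apart relative to the spread inside each group, so Fcalc lands well above Fcrit.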

Bonferroni Method

• Anova tells us if there is a difference but not if all the groups come from different populations

• Maybe 2 or more are from the same population and only one is different

• How do we find out which one(s) are different?

• Bonferroni or Tukey’s Method or others

• We aren’t going to do them until later in the course.

Degrees of Freedom

• You may notice in each of these cases, we are still using n-1 for the degrees of freedom

• Within the groups, take (group size - 1) for each group and add them up. Equals n-k, where n is the total number of observations.

• Among the groups, take number of groups minus one. Equals k-1.
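The two bullets above can be checked with a few lines of arithmetic. This is a minimal sketch; the group sizes are hypothetical and were chosen only to show that the sum of (group size - 1) equals n - k.

```python
# Hypothetical group sizes for k = 3 groups (not from the course).
group_sizes = [8, 10, 12]

k = len(group_sizes)          # number of groups
n = sum(group_sizes)          # total number of observations

# Within groups: (size - 1) for each group, added up.
df_within = sum(size - 1 for size in group_sizes)

# Among groups: number of groups minus one.
df_among = k - 1

print(f"n = {n}, k = {k}")
print(f"df within = {df_within} (n - k = {n - k})")
print(f"df among  = {df_among}")
```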

Any idea what degrees of freedom means??

• You can just remember n-1 or you can consider this.

• Degrees of freedom means how many values are independent given that there is a sum of the values. Once we have a sum of the values and we have n-1 of the values, then the last one is fixed, not free.

• Any fixed summary of the values, such as a sum or a mean, is what fixes that last value.
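A small sketch of the idea, using made-up numbers: once the sum is known and n-1 values have been chosen freely, there is only one possible value left for the last one.

```python
# Hypothetical example: n = 5 values whose sum is known to be 50.
n = 5
known_sum = 50

# We are free to pick any n - 1 = 4 values.
free_values = [8, 12, 9, 11]

# The last value is not free: it must make the sum come out right.
last_value = known_sum - sum(free_values)

all_values = free_values + [last_value]
print(f"free values: {free_values}")
print(f"forced last value: {last_value}")
print(f"check sum: {sum(all_values)}")
```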

When do we use df?

• One situation is for the sample standard deviation.

• That uses the deviation around the mean.

• Since we have the mean, then only n-1 values within the sample are “free” or “variable”.
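To see where n-1 shows up, here is a minimal sketch of the sample standard deviation on a made-up sample. It also shows that the deviations around the mean add up to zero, which is why only n-1 of them are free. The comparison with `statistics.stdev` is just a sanity check using Python's standard library.

```python
import math
import statistics

# Hypothetical sample (not from the course).
sample = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(sample)
mean = sum(sample) / n

# Deviations around the mean always sum to zero,
# so knowing n - 1 of them fixes the last one.
deviations = [x - mean for x in sample]
sum_of_deviations = sum(deviations)

# Sample standard deviation: divide the sum of squares by n - 1, not n.
sum_of_squares = sum(d ** 2 for d in deviations)
sample_sd = math.sqrt(sum_of_squares / (n - 1))

print(f"mean = {mean}, sum of deviations = {sum_of_deviations}")
print(f"sample sd (n - 1) = {sample_sd:.4f}")
print(f"statistics.stdev  = {statistics.stdev(sample):.4f}")
```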

Chi Square

Testing for Associations

NEW QUESTIONS

Up till now we looked for relationships by comparing the means of measurements from samples or populations.

Here we are using counts instead of means of measurements

A non-parametric Test

• Can be used for nominal data

• Count cases within categories

• Chi Square Tests have many applications

• We will use the method to test for independence of categories

Hypotheses

NULL

• The categories are independent

• There is no association between them

ALTERNATIVE

• The categories are related

What are we trying to prove?

As usual, we are trying to reject the null hypothesis.

We are looking for evidence that events are related.

(In other applications of chi-square, we may be looking for independence, but not in this course.)

Example

• Outbreak of stomach upset on the day following a community picnic

• At the doctor’s office, we note two facts.

• Does the person have an upset stomach?

• Was the person at the picnic?

• Set up a contingency table - what’s that?

The Table, set it up by counts or proportions

                 Upset stomach   Not upset   Totals
At picnic
Not at picnic
Totals

                 Upset stomach   Not upset   Totals
At picnic                    6          19       25
Not at picnic               14          61       75
Totals                      20          80      100

Null Hypothesis

• The upset stomachs have nothing to do with the picnic

• HA: Presence at the picnic is related to upset stomach

• If the H0 is correct, each event will be independent of the others

• The proportion in each category will be just by “chance”

Let’s leave the actual numbers aside

• Consider the situation

• General concepts of the chance of getting sick if you were at the picnic

• And if you were not at the picnic

Think it through a bit

• 100 persons

• 20 of them are sick, 80 are fine.

• 1/5th and 4/5ths.

• What about the proportions who were at the picnic?

• If getting sick is independent of being at the picnic, then 1/5 of those at the picnic will be sick and 4/5 will be fine

• And the same for those not at the picnic

                 Upset stomach   Not upset      Totals
At picnic        1/5th of 25     4/5ths of 25       25
Not at picnic    1/5th of 75     4/5ths of 75       75
Totals           20              80                100

We call these the “expected” values

• Of course, we set up mechanisms so that we calculate these expected values automatically.

• For any cell, we take the total of the row times the total of the column and divide by the grand total.
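The rule in the last bullet can be sketched as a short calculation. This uses the observed counts from the picnic table (6, 19, 14, 61). The variable names are only illustrative.

```python
# Observed counts from the picnic example.
# Rows: at picnic, not at picnic. Columns: upset stomach, not upset.
observed = [
    [6, 19],
    [14, 61],
]

row_totals = [sum(row) for row in observed]           # [25, 75]
col_totals = [sum(col) for col in zip(*observed)]     # [20, 80]
grand_total = sum(row_totals)                         # 100

# Expected count for each cell = row total * column total / grand total.
expected = [
    [r * c / grand_total for c in col_totals]
    for r in row_totals
]

for label, row in zip(["At picnic", "Not at picnic"], expected):
    print(f"{label:15s} {row}")
```

Note that this gives the same numbers as taking 1/5th and 4/5ths of each row total.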

                 Upset stomach   Not upset   Totals
At picnic                                        25
Not at picnic                                    75
Totals                      20          80      100

Compare “expected” frequencies with actual frequencies

                 Upset stomach   Not upset   Totals
At picnic        6 (5)           19 (20)         25
Not at picnic    14 (15)         61 (60)         75
Totals           20              80             100

Sum of Deviations

• How different is the actual from the expected

• Use a formula

• (E – A)²/E for each cell and take the sum

• The general form looks familiar – a deviation squared and divided

$$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$$

$$\chi^2 = \frac{(6-5)^2}{5} + \frac{(19-20)^2}{20} + \frac{(14-15)^2}{15} + \frac{(61-60)^2}{60}$$

$$\chi^2 = \frac{1}{5} + \frac{1}{20} + \frac{1}{15} + \frac{1}{60} = \frac{12 + 3 + 4 + 1}{60} = \frac{20}{60} \approx 0.33$$

Hypothesis Test for χ²

• Use Table A8

• Need a value for alpha

• And degrees of freedom

• For χ², degrees of freedom =

(number of rows – 1) * (number of columns – 1)

• Calculate it for our example

• In our example, a 2 X 2 table, df = 1

The test

For alpha = 0.05 and df = 1, χ²crit = 3.84

Our calculated χ² was 0.33

We have failed to reject the null hypothesis.

We have not proven an association between the potato salad, hot dogs & beer & the upset stomachs
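The whole test can be retraced with a short sketch. It takes the observed counts from the picnic table, builds the expected counts as row total times column total divided by the grand total (giving 5, 20, 15 and 60), and sums (Observed - Expected)² / Expected over the four cells. The critical value 3.84 is the one quoted above for alpha = 0.05 and df = 1. Variable names are illustrative only.

```python
# Observed counts from the picnic example.
# Rows: at picnic, not at picnic. Columns: upset stomach, not upset.
observed = [
    [6, 19],
    [14, 61],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected counts under the null hypothesis of independence.
expected = [
    [r * c / grand_total for c in col_totals]
    for r in row_totals
]

# Chi-square statistic: sum over all cells of (O - E)^2 / E.
chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Degrees of freedom = (rows - 1) * (columns - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

chi_crit = 3.84               # from the table, alpha = 0.05, df = 1
reject_null = chi_sq > chi_crit

print(f"chi-square = {chi_sq:.2f}, df = {df}, critical value = {chi_crit}")
print("reject H0" if reject_null else "fail to reject H0")
```

The statistic falls below the critical value, so the sketch reaches the same conclusion: we fail to reject the null hypothesis.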
