Midterm: Next Thursday, Oct. 22nd, at 2pm

Last name starts with    Building and room number
A-C                      MATX 1100
D-K                      MCML 166
L-Z                      WESB 100
Assignment #5
Chapter 8: 16, 19
Chapter 9: 19
Due two Fridays from now, Oct. 30th, by 2pm in your TA’s homework box
Goodness-of-fit tests
Compare an observed frequency distribution with the frequency distribution expected under a simple probability model.
• Binomial test: Limited to categorical variables with only two possible outcomes
• χ2 test: Can handle categorical and discrete numerical variables having more than two outcomes
χ2 Goodness-of-fit test
Uses a test statistic called χ2 to measure the discrepancy between an observed discrete frequency distribution and the frequencies expected under a simple probability model serving as the null hypothesis.
Hypotheses for χ2 test
H0: The data come from a particular discrete probability distribution.
HA: The data do not come from that distribution.
Degrees of freedom for χ2 test
df = (Number of categories) – (Number of parameters estimated from the data) – 1
χ2 test as approximation of binomial test
• χ2 goodness-of-fit test works even when there are only two categories, so it can be used as a substitute for the binomial test.
• Very useful if the number of data points is large.
  – Imagine if, in our red/blue wrestler example, rather than 16/20 wins by red, we had 1600/2000 wins by red. Imagine calculating:
    $$\Pr[1600] = \frac{2000!}{1600!\,400!}\,(0.5)^{1600}(0.5)^{400}$$
  – And then imagine calculating:
    $$P = 2\,(\Pr[1600] + \Pr[1601] + \cdots + \Pr[2000])$$
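A computer can still do the exact sum that the slide calls unpleasant by hand. The sketch below is only an illustration: it assumes the hypothetical 1600/2000 counts from the bullet above, uses Python's exact integer arithmetic for the binomial tail, and computes the χ2 statistic for the same data so the two approaches can be compared.

```python
from math import comb

n, k = 2000, 1600  # hypothetical large-sample version of the wrestler data

# Exact two-sided binomial P-value under p = 0.5:
# P = 2 * (Pr[1600] + Pr[1601] + ... + Pr[2000])
tail = sum(comb(n, x) for x in range(k, n + 1))  # exact integer sum of binomial coefficients
p_exact = 2 * tail / 2**n                        # each term carries the factor 0.5**2000

# Chi-square goodness-of-fit statistic for the same two categories
observed = [k, n - k]
expected = [n * 0.5, n * 0.5]
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print("exact binomial P:", p_exact)
print("chi-square statistic:", chi2)
```

Both approaches point the same way here; the χ2 version just needs two subtractions and two divisions.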
Assumptions of χ2 test
• No more than 20% of categories have Expected<5
• No category with Expected ≤ 1
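The two rules of thumb above can be checked mechanically before running the test. This is a minimal sketch; the function name `chi2_assumptions_ok` and the example expected counts are made up for illustration and do not come from the lecture.

```python
def chi2_assumptions_ok(expected):
    """Return True if a list of expected counts satisfies both rules of thumb."""
    n_small = sum(1 for e in expected if e < 5)
    no_tiny = all(e > 1 for e in expected)        # no category with Expected <= 1
    few_small = n_small <= 0.20 * len(expected)   # no more than 20% with Expected < 5
    return no_tiny and few_small

# Hypothetical expected counts, for illustration only
ok_example = chi2_assumptions_ok([12.0, 30.5, 8.2, 4.1, 6.0])   # one of five is below 5
bad_example = chi2_assumptions_ok([9.0, 3.0, 3.0, 1.0])         # most are below 5, one equals 1
print(ok_example, bad_example)
```

When the check fails, the usual remedies are to combine categories or to use a different test.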
Example: Fitting the Binomial Distribution
One thousand coins were flipped eight times, and the number of heads was recorded for each coin. Were they fair coins?

Number of heads    Number of coins
0                  6
1                  32
2                  105
3                  186
4                  236
5                  201
6                  98
7                  33
8                  103
Total              1000
H0: The number of heads has a binomial distribution with p = 0.5
HA: The number of heads does not have a binomial distribution with p = 0.5
Under H0, the probability of each number of heads is:
$$\Pr[0] = \binom{8}{0}(0.5)^{0}(0.5)^{8} = 0.0039$$
$$\Pr[1] = \binom{8}{1}(0.5)^{1}(0.5)^{7} = 0.0313$$
Etc…
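The "Etc…" can be filled in with a few lines of code. This is a sketch of the binomial formula used above for n = 8 flips and p = 0.5; the helper name `binomial_pmf` is introduced here for illustration.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Probability of 0, 1, ..., 8 heads in eight flips of a fair coin
probs = [binomial_pmf(x, 8, 0.5) for x in range(9)]

for heads, pr in enumerate(probs):
    print(heads, f"{pr:.4f}")
```

The distribution is symmetric because p = 0.5, so only the first five values need to be worked out by hand.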
Number of heads    Number of coins    Binomial expectation
0                  6                  0.0039
1                  32                 0.0313
2                  105                0.1094
3                  186                0.2188
4                  236                0.2734
5                  201                0.2188
6                  98                 0.1094
7                  33                 0.0313
8                  103                0.0039
Total              1000               1.0
Expected values = Expected probability * Total number of sets of trials
Expected[0 heads] = 0.0039 * 1000 = 3.91
Expected[1 heads] = 0.0313 * 1000 = 31.25
Etc…
Number of heads    Number of coins    Binomial expectation    Expected
0                  6                  0.0039                  3.91
1                  32                 0.0313                  31.25
2                  105                0.1094                  109.38
3                  186                0.2188                  218.75
4                  236                0.2734                  273.44
5                  201                0.2188                  218.75
6                  98                 0.1094                  109.38
7                  33                 0.0313                  31.25
8                  103                0.0039                  3.91
Total              1000               1.0                     1000
The expected counts for 0 heads and 8 heads (3.91 each) are below 5. That is 2 of 9 categories (22%), more than the 20% allowed, so the extreme categories are combined with their neighbours:

Number of heads    Number of coins    Expected
0 or 1             38                 35.16
2                  105                109.38
3                  186                218.75
4                  236                273.44
5                  201                218.75
6                  98                 109.38
7 or 8             136                35.16
Total              1000               1000
Number of heads    Number of coins    Expected    (O-E)2 / E
0 or 1             38                 35.16       0.23
2                  105                109.38      0.18
3                  186                218.75      4.90
4                  236                273.44      5.13
5                  201                218.75      1.44
6                  98                 109.38      1.18
7 or 8             136                35.16       289.2
Total              1000               1000        χ2 = 302.27
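The whole calculation for the coin example can be reproduced in a short script. This sketch assumes the observed counts from the table and a fair coin (p = 0.5). It works with unrounded expected counts, so the statistic comes out near 302.3 and differs slightly from the hand calculation, which used rounded values.

```python
from math import comb

observed = [6, 32, 105, 186, 236, 201, 98, 33, 103]  # coins showing 0..8 heads
total = sum(observed)                                 # 1000 coins

# Expected count = binomial probability * number of coins
expected = [total * comb(8, x) * 0.5**8 for x in range(9)]

# Combine 0 with 1 and 7 with 8 so that no expected count is below 5
obs_pooled = [observed[0] + observed[1]] + observed[2:7] + [observed[7] + observed[8]]
exp_pooled = [expected[0] + expected[1]] + expected[2:7] + [expected[7] + expected[8]]

chi2 = sum((o - e) ** 2 / e for o, e in zip(obs_pooled, exp_pooled))

# df = categories - parameters estimated from the data - 1 (p was fixed at 0.5, so 0 estimated)
df = len(obs_pooled) - 0 - 1

print(obs_pooled)
print([round(e, 2) for e in exp_pooled])
print(round(chi2, 2), df)
```

Almost all of the statistic comes from the "7 or 8" category, where far more coins were observed than a fair coin predicts.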
χ2 = 302.27
df = 7 – 0 – 1 = 6
Critical values: $\chi^2_{0.05,6} = 12.59$ and $\chi^2_{0.001,6} = 22.46$
P < 0.001
We reject the null hypothesis. The coins were not fair.
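Instead of looking up critical values in a table, the upper-tail probability can be computed directly. The sketch below uses a closed form for the χ2 distribution that holds only when df is even, which happens to cover the df = 6 case here. The function name `chi2_sf_even_df` is hypothetical; statistical packages provide a general version.

```python
from math import exp

def chi2_sf_even_df(x, df):
    """Upper-tail probability Pr[chi-square(df) >= x]. Valid only for even df > 0."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2
    term, total = 1.0, 1.0
    # Sum (x/2)^k / k! for k = 0 .. df/2 - 1
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return exp(-half) * total

p_at_crit_05 = chi2_sf_even_df(12.59, 6)     # should be close to 0.05
p_at_crit_001 = chi2_sf_even_df(22.46, 6)    # should be close to 0.001
p_coins = chi2_sf_even_df(302.27, 6)         # P-value for the coin example

print(p_at_crit_05, p_at_crit_001, p_coins)
```

The first two values confirm the tabled critical values, and the third shows how far below 0.001 the coin result falls.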
In-class Exercise
The parasitic nematode Camallanus oxycephalus infects many freshwater fish, including shad. Do they infect fish at random?

Number of nematodes    Number of fish
0                      103
1                      72
2                      44
3                      14
4                      3
5                      1
6                      1
H0: The number of nematodes per fish has a Poisson distribution
HA: The number of nematodes per fish does not have a Poisson distribution
Poisson expectation:
$$\Pr[X] = \frac{e^{-\mu}\mu^{X}}{X!}$$
The mean μ is estimated from the data:
$$\bar{Y} = \frac{103(0) + 72(1) + 44(2) + 14(3) + 3(4) + 1(5) + 1(6)}{238} = 0.945$$
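The sample mean and the Poisson probabilities can be computed as follows. This is a sketch using the fish counts from the exercise; the helper name `poisson_pmf` is introduced here for illustration.

```python
from math import exp, factorial

# Number of nematodes -> number of fish
counts = {0: 103, 1: 72, 2: 44, 3: 14, 4: 3, 5: 1, 6: 1}

n_fish = sum(counts.values())
mean = sum(x * f for x, f in counts.items()) / n_fish   # estimate of mu

def poisson_pmf(x, mu):
    """Poisson probability of exactly x events when the mean is mu."""
    return exp(-mu) * mu**x / factorial(x)

probs = {x: poisson_pmf(x, mean) for x in counts}

print(n_fish, round(mean, 3))
for x, pr in probs.items():
    print(x, f"{pr:.3f}")
```

Because μ was estimated from the data, one extra degree of freedom is lost when the test is run.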
Number of nematodes    Number of fish    Poisson expectation
0                      103               0.389
1                      72                0.367
2                      44                0.174
3                      14                0.055
4                      3                 0.013
5                      1                 0.002
6                      1                 0.000
Number of nematodes    Number of fish    Poisson expectation    Expected
0                      103               0.389                  92.58
1                      72                0.367                  87.35
2                      44                0.174                  41.41
3                      14                0.055                  13.09
4                      3                 0.013                  3.09
5                      1                 0.002                  0.48
6                      1                 0.000                  0.00
The expected counts for 5 and 6 nematodes are below 1, so categories 4, 5 and 6 are combined into "≥4":

Number of nematodes    Number of fish    Expected    (O-E)2 / E
0                      103               92.58       1.17
1                      72                87.35       2.70
2                      44                41.41       0.16
3                      14                13.09       0.06
≥4                     5                 3.57        0.57
                                                     χ2 = 4.66
χ2 = 4.66
df = 5 – 1 – 1 = 3
Critical value: $\chi^2_{0.05,3} = 7.81$
P > 0.05
We do not reject the null hypothesis. There is no evidence that nematodes do not infect fish randomly.
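The nematode exercise can be checked end to end in code. This sketch makes two choices that differ from the hand calculation: it uses unrounded Poisson probabilities, and it gives the "≥4" category all of the remaining probability rather than stopping at 6. The statistic therefore comes out near 4.57 rather than 4.66, but the conclusion is unchanged. The critical value 7.81 is taken from the slide.

```python
from math import exp, factorial

observed = [103, 72, 44, 14, 3, 1, 1]   # fish with 0..6 nematodes
n_fish = sum(observed)
mean = sum(x * f for x, f in enumerate(observed)) / n_fish   # estimated from the data

# Poisson probabilities for 0..3, with ">= 4" taking whatever probability is left
probs = [exp(-mean) * mean**x / factorial(x) for x in range(4)]
probs.append(1 - sum(probs))

obs_pooled = observed[:4] + [sum(observed[4:])]
exp_pooled = [n_fish * p for p in probs]

chi2 = sum((o - e) ** 2 / e for o, e in zip(obs_pooled, exp_pooled))

# One parameter (the mean) was estimated, so it costs a degree of freedom
df = len(obs_pooled) - 1 - 1

critical_05 = 7.81          # chi-square critical value for df = 3, alpha = 0.05 (from the slide)
reject = chi2 > critical_05

print(round(chi2, 2), df, reject)
```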
Odds
The probability of success divided by the probability of failure.
$$O = \frac{p}{1 - p}$$
Odds of survival:
$$O_{men} = \frac{0.20}{1 - 0.20} = \frac{0.20}{0.80} = 0.25$$
Or “1 to 4”
$$O_{women} = \frac{0.74}{1 - 0.74} = \frac{0.74}{0.26} = 2.85$$
Or roughly “3 to 1”
Odds ratio
The odds of success in one group divided by the odds of success in another group.
$$OR = \frac{O_1}{O_2}$$
Odds ratio of female to male survival:
$$OR = \frac{O_{women}}{O_{men}} = \frac{2.85}{0.25} = 11.4$$
If interested, see text for how to calculate standard error and confidence interval.
Used often in medical research
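Odds and the odds ratio are one-line calculations. The sketch below uses the survival probabilities from the slides (0.20 for men, 0.74 for women); the function name `odds` is introduced for illustration. It keeps full precision, so the ratio is about 11.38 rather than the 11.4 obtained from the rounded odds.

```python
def odds(p):
    """Odds of success: probability of success divided by probability of failure."""
    return p / (1 - p)

odds_men = odds(0.20)
odds_women = odds(0.74)

# Odds ratio of female to male survival
odds_ratio = odds_women / odds_men

print(round(odds_men, 2), round(odds_women, 2), round(odds_ratio, 2))
```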
Contingency analysis
• Test the independence of two or more categorical variables
• We’ll learn one kind: χ2 contingency analysis
Music and wine buying

OBSERVED                       French music playing    German music playing    Totals
Bottles of French wine sold    40                      12                      52
Bottles of German wine sold    8                       22                      30
Totals                         48                      34                      82
Hypotheses
• H0: The nationality of the bottle of wine is independent of the nationality of the music played when it is sold.
• HA: The nationality of the bottle of wine sold depends on the nationality of the music being played when it is sold.
Calculating the expectations
With independence, Pr[ French wine AND French music] =
Pr[French wine] × Pr[French music]
Calculating the expectations
Pr[French wine] = 52/82 = 0.634
Pr[French music] = 48/82 = 0.585
If H0 is true, Pr[French wine AND French music] = (0.634)(0.585) = 0.37112

EXP.                French music    German music    Totals
French wine sold                                    52
German wine sold                                    30
Totals              48              34              82
Calculating the expectations
By H0, Pr[French wine AND French music] = (0.634)(0.585) = 0.37112

EXP.                French music        German music    Totals
French wine sold    0.37 (82) = 30.4    21.6            52
German wine sold    17.6                12.4            30
Totals              48                  34              82
χ2
$$\chi^2 = \sum_i \frac{(\text{Observed}_i - \text{Expected}_i)^2}{\text{Expected}_i}$$
$$= \frac{(40 - 30.4)^2}{30.4} + \frac{(12 - 21.6)^2}{21.6} + \frac{(8 - 17.6)^2}{17.6} + \frac{(22 - 12.4)^2}{12.4} = 20.0$$
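The expected counts and the χ2 statistic for the wine data can be reproduced as follows. This sketch multiplies the marginal probabilities, as on the slides, but keeps full precision. The statistic then comes out near 19.8 rather than 20.0, a difference due only to rounding of the expected counts in the hand calculation.

```python
# Rows: French wine, German wine. Columns: French music, German music.
observed = [[40, 12],
            [8, 22]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Under independence: Pr[row AND column] = Pr[row] * Pr[column]
expected = [
    [(r / grand_total) * (c / grand_total) * grand_total for c in col_totals]
    for r in row_totals
]

chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

print([[round(e, 1) for e in row] for row in expected])
print(round(chi2, 1))
```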
Conclusion
χ2 = 20.0 >> $\chi^2_{1,\alpha=0.05}$ = 3.84, so we can reject the null hypothesis of independence, and say that the nationality of the wine sold did depend on what music was played.
Moreover, χ2 = 20.0 >> $\chi^2_{1,\alpha=0.001}$ = 10.83, so we can say P < 0.001.
Assumptions
• This χ2 test is just a special case of the χ2 goodness-of-fit test, so the same rules apply.
• You can’t have any expected count less than 1, and no more than 20% of the expected counts can be less than 5.
Fisher’s exact test
• For 2 x 2 contingency analysis
• Does not make assumptions about the size of expectations
• JMP (or other programs) will do it, but cumbersome to do by hand
Winter Wren (Troglodytes troglodytes)
• Are western and eastern forms (currently considered subspecies) actually reproductively isolated, and therefore separate species?
Tumbler Ridge, BC or ?
T. (t.) pacificus T. t. hiemalis
Photos by D. Irwin
Association of DNA and song: The winter wren contact zone

OBSERVED         Western song    Eastern song    Totals
Western mtDNA    12              0               12
Eastern mtDNA    0               4               4
Totals           12              4               16

Data from Toews & Irwin 2008, Molecular Ecology
Calculating the expectations
A shortcut for calculating expectations (assuming H0 is true):

EXP.             Western song    Eastern song    Totals
Western mtDNA                                    12
Eastern mtDNA                                    4
Totals           12              4               16

$$\text{Exp}[\text{row } i, \text{column } j] = \frac{(\text{row } i \text{ total})(\text{column } j \text{ total})}{\text{grand total}}$$
Exp[w mtDNA, w song] = 12*12/16 = 9
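The shortcut translates directly into code. This is a minimal sketch applied to the wren table; the function name `expected_counts` is made up for illustration.

```python
def expected_counts(observed):
    """Expected counts under independence: (row total)(column total) / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# Rows: western mtDNA, eastern mtDNA. Columns: western song, eastern song.
wren = [[12, 0],
        [0, 4]]

expected = expected_counts(wren)
print(expected)
```

The same function works for any table size, because it only uses the row and column totals.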
Comparing observed and expected

EXP.             Western song    Eastern song    Totals
Western mtDNA    9               3               12
Eastern mtDNA    3               1               4
Totals           12              4               16

OBS.             Western song    Eastern song    Totals
Western mtDNA    12              0               12
Eastern mtDNA    0               4               4
Totals           12              4               16
Too many of the expected are below 5, so we cannot use the χ2 contingency test. Instead, we use a computer to do Fisher’s exact test:
P = 0.00055, so we reject the H0 of no association.
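For a 2 x 2 table this small, the exact P-value can be reproduced from the hypergeometric distribution. The sketch below assumes one common two-sided convention: add up the probabilities of all tables with the same margins that are no more probable than the observed one. It is not necessarily what JMP does internally. The function name `fisher_exact_two_sided` is hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum over tables at least as unlikely as the observed one (small tolerance for float ties)
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )

# Wren data: western mtDNA (12 western song, 0 eastern), eastern mtDNA (0 western, 4 eastern)
p_wren = fisher_exact_two_sided(12, 0, 0, 4)
print(round(p_wren, 5))
```

Here the observed table is the single most extreme arrangement, so the P-value is just 1 divided by the number of ways to choose 4 of the 16 birds.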
In-class Exercise
Do mosquitos infected with malaria bite more people?

                  Infected    Uninfected    Total
Multiple Bites    20          16            36
Single Bite       69          157           226
Total             89          173           262
H0: Biting multiple times is independent of malaria infection
HA: Biting multiple times is dependent on malaria infection
Pr[Infected] = 89/262 = 0.340
Pr[Multiple] = 36/262 = 0.137
If H0 is true, Pr[Infected AND multiple bites] = (0.340)(0.137) = 0.047
Keeping more decimal places: Pr[Infected] = 89/262 = 0.3400 and Pr[Multiple] = 36/262 = 0.1374
If H0 is true, Pr[Infected AND multiple bites] = (0.3400)(0.1374) = 0.0467

Expected:
                  Infected               Uninfected    Total
Multiple Bites    262(0.0467) = 12.23    23.77         36
Single Bite       76.77                  149.23        226
Total             89                     173           262
χ2
$$\chi^2 = \sum_i \frac{(\text{Observed}_i - \text{Expected}_i)^2}{\text{Expected}_i}$$
$$= \frac{(20 - 12.23)^2}{12.23} + \frac{(69 - 76.77)^2}{76.77} + \frac{(16 - 23.77)^2}{23.77} + \frac{(157 - 149.23)^2}{149.23} = 8.67$$
χ2 = 8.67
df = (2-1)(2-1) = 1
0.005 > P > 0.001
We reject the null hypothesis. Biting multiple times is dependent on malaria infection.
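The mosquito exercise can be verified in code, including the P-value bracket. This sketch relies on a fact that holds only for df = 1: a χ2 variable with one degree of freedom is the square of a standard normal variable, so its upper tail can be written with the complementary error function. For other df a statistics package would be needed.

```python
from math import erfc, sqrt

# Rows: multiple bites, single bite. Columns: infected, uninfected.
observed = [[20, 16],
            [69, 157]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# df = (rows - 1)(columns - 1)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Valid for df = 1 only: Pr[chi-square(1) >= x] = erfc(sqrt(x / 2))
p_value = erfc(sqrt(chi2 / 2))

print(round(chi2, 2), df, round(p_value, 4))
```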