
The Analysis Of Biological Data Practice problem answers

  • The Analysis of Biological Data - Whitlock and Schluter
    Solutions to practice problems

    Chapter 1

    1. (a) Ordinal (b) Nominal (c) Nominal (d) Ordinal

    2. (a) Observational (cats were observed to fall a given number of stories; the researchers did not assign the number of stories fallen to the cats). (b) The number of floors fallen. (c) The number of injuries per cat.

    3. (a) Discrete (b) Continuous (c) Discrete (d) Continuous

    4. (a) Collectors do not usually sample randomly, but prefer rare and unusual specimens. Therefore, the moths are probably not a random sample. (b) A sample of convenience. (c) Bias (rare color types are likely to be over-represented in the sample compared with the population).

    5. Accuracy.

    6. (a) Categorical (b) No random sampling procedure was followed, so the survey is not random; the respondents were volunteers. (c) Volunteer bias (those most interested in the issue were more likely to respond).

    7. (a) US army personnel stationed in Iraq. (b) Yes. The subjects interviewed were a random sample of only 100 from the population; therefore the responses of the particular individuals who happened to be sampled will differ from those of the population of interest by chance. (c) The advantage of random sampling is that it minimizes bias and allows the precision of estimates of stress levels to be measured.

    8. (a) The population being estimated is all the small mammals of Kruger National Park. (b) No, the sample is not likely to be random. In a random sample, every individual has the same chance of being selected. But some small mammals might be easier to trap than others (for example, trapping only at night might miss all the mammals active only in the daytime). In a random sample individuals are selected independently. Multiple animals caught in the same trap might not be independent if they are related or live near one another (this is harder to judge). (c) The number of species in the sample might underestimate the number in the Park if sampling was not random (e.g., if daytime mammals were missed), or if rare species happened to avoid capture. In this case m would be a biased estimate of M.

    9. Chicks were not sampled randomly. The multiple chicks from the same nest are not independent. Chicks from the same nest are likely to be more similar to one another in their measurements than chicks chosen randomly from the population. Ignoring this problem will lead to erroneous measurements of the precision of survival estimates.

    Chapter 2

    1. (a) 25% (b) 33% (c) 33%

  • 2. (a) Skewed right (b) Modes: 7.5–8, 2.5–3, 11–11.5 (c) The bimodal distribution in this example is left skewed.

    3. The histogram on the right is correct. Histograms use area, not height, to represent frequency.

    4.

    5. Contingency table.

                    Source of fry
                    Hatchery    Wild    Total
    Survived              27      51       78
    Perished            3973    3949     7922
    Total               4000    4000     8000

    6. (a) The following table orders the taxa from those with the most endangered species to those with the fewest endangered species. (Other orderings could make sense, such as by phylum.)

    Taxon          Number of species
    Plants         745
    Fish           115
    Birds           92
    Mammals         74
    Clams           70
    Insects         44
    Reptiles        36
    Snails          32
    Amphibians      22
    Crustaceans     21
    Arachnids       12

    (b) Frequency table. (c) Bar graph, because data are categorical.

  • (d) The baseline should be 0 so that bar height and area correspond to frequency.

    7. (a) The variables are year (or famine stage) and incidence of schizophrenia. (b) Year (and famine stage) is categorical ordinal. Incidence of schizophrenia is categorical nominal. (c) Contingency table.

                           1956            1960        1965
                           (pre-famine)    (famine)    (post-famine)    Total
    Schizophrenic               483           192           695          1370
    Non-schizophrenic         58605         13556         82841        155002
    Total                     59088         13748         83536        156372

    (d) Proportions: 1956: 0.0082, 1960: 0.0140, 1965: 0.0083. Other variations of the line graph are valid (e.g., famine stage could be the explanatory variable instead of year).

    Pattern revealed: incidence of schizophrenia highest in 1960, the famine year.

    8. (a) Grouped histograms. (b) Groups are disease types (diseases transmitted directly from one individual to another and diseases transferred by insect vectors). (c) The variable is the virulence of the disease. It is a continuous numeric variable. (d) Relative frequency, the fraction of diseases occurring in each interval of virulence. (e) Directly transmitted diseases tend to be less virulent than insect-transmitted diseases. Human diseases transmitted directly more frequently have low virulence, and less frequently have high virulence, than diseases transmitted by insect vector.

    9. (a) Map. (b) Date of first appearance of rabies (measured by the number of months following 1March 1991). (c) Geographic location - the township.

  • 10. (a) Launch temperature is the explanatory variable and number of O-ring failures is the response variable, because we wish to predict failures from temperature.

    (b) There is a negative association. (c) Low temperature is predicted to increase the risk of O-ring failure.

    11. (a) The distribution is left skewed. (b) The mode is between 65 and 80 degrees (if you used an interval width of 10 degrees instead of 5 degrees, then the mode is between 70 and 80 degrees).

    12. (a) Husbands of adopted women resemble a woman's adoptive father more than they resemble the women themselves or the women's adoptive mother. (b) Inappropriate baseline. The eye is receptive to bar height and area, but because of the baseline, height and area do not depict magnitude. A baseline of 0 would probably be best (adding a horizontal line at resemblance = 25 might also be a good idea to indicate the random expectation). (c) No, the graph is depicting a measurement (resemblance) in each of several categories, not frequency of occurrence.

    13. (a) Mosaic plot. The explanatory variable is fruit set in previous years. The response variable is fruiting in the given year. Both variables are categorical. (b) Line graph. The explanatory variable is year. The response variable is the density of the wood. Both variables are numerical. (c) Grouped cumulative frequency distributions. The explanatory variable is the taxonomic group (butterflies, birds, or plants). The response variable is the percent change in range size. Group is a categorical variable. Range change is numerical.

  • Chapter 3

    1. (a) The number of doctors studied who sterilized a given percentage of patients under age 25. (b) The median would be more informative. The mean would be sensitive to the presence of the outlier (sterilizing more than 30% of female patients under 25), whereas the median is not affected.

    2. (a) Box plot. (b) Median body mass of the mammals in each group. (c) The first and third quartiles of body mass in each group. (d) Extreme values, those lying farther than 1.5 times the interquartile range from the box edge. (e) Whiskers. They extend to the smallest and largest values in the data, excluding extreme values (those lying farther than 1.5 times the interquartile range from the box edge). (f) Living mammals have the smallest median body size (with a log body mass of about 2), and mammals that went extinct in the last ice age had the largest median size (around 5.5). Mammals that went extinct recently were intermediate in size, with a median of around 3.2. (g) The body size distribution of living mammals is right skewed (long tail toward larger values), whereas the size frequency distribution in mammals that went extinct in the last ice age is left skewed. The frequency distribution of sizes is nearly symmetric in mammals that went extinct recently. (h) One way to answer this is to use the interquartile range as a measure of spread, in which case the mammals that went extinct in the last ice age have the lowest spread. Another way to compare spread is to calculate the standard deviation of log body size in each group, but we are unable to calculate this without the raw data. (i) It is likely that extinctions have reduced the median body size of mammals.

    3. (a) Grouped histograms. (b) Explanatory variable: sex. Response variable: number of words per day. (c) Men: 8,000-12,000 words per day. Women: 16,000-20,000 words per day. (d) Women. (e) Men.

    4. (a) The averages are 7.5 g for the crimson-rumped waxbill (CW), 15.4 g for the cutthroat finch (CF), and 37.9 g for the white-browed sparrow weaver (WS). (b) Standard deviations are 0.6 g for CW, 1.2 g for CF, and 3.1 g for WS. The standard deviation is greater when the mean is greater. (c) Coefficients of variation: 8.3% for CW, 8.0% for CF, and 8.2% for WS. The coefficients of variation are much more similar than the standard deviations. (d) Compare coefficients of variation to compare variation relative to the mean. Beak length: 3.8%. Body mass: 8.2%. Body mass is most variable relative to the mean.

    5. Use the cumulative frequency distribution to estimate the median (0.50 quantile, approximately 1.4%) and the first and third quartiles (approximately 0.7% and 2.4%). The interquartile range is about 2.4 − 0.7 = 1.6% and 1.5 times this amount is about 2.4%, which sets the maximum length of each whisker. The smallest value is about 1.1% and the largest value is about 4.3%. Both these values are within 1.5 times the interquartile range, so the whiskers extend from 1.1% to 4.3%. The boxplot is:

  • 6. (a) cm/s.

    (b)

    (c) Asymmetric. The range of changes in speed is much greater above the median than below the median. This is true whether we look at the 3rd quartile or the total range. (d) The span of the box indicates where the middle 50% of the measurements occur (first quartile to third quartile). (e) Mean is 1.19 cm/s, which is greater than the median (0.94). The distribution is right skewed, and the large values influence the mean more than the median. (f) 1.15 (cm/s)² (g) The interval from Ȳ − s to Ȳ + s is 0.11 to 2.26. This includes 11 of the 16 observations, or 69%.

    7. (a) Increase the mean by k. (b) No effect.

    8. (a) Increase Ȳ by 10 times (11.9 mm/s) (b) Increase s by 10 times (10.7 mm/s) (c) Increase median by 10 times (9.4 mm/s) (d) Increase interquartile range by 10 times (16.8 mm/s) (e) No effect. (f) Increase s² by 100 times (115.3 (mm/s)²)

    9. (a) 5.7 hours. (b) 2.4 hours. (c) 83/114 = 0.73 (73%). (d) Median = 5 hours, which is less than the mean. The distribution is right skewed, and the large values influence the mean more than the median.

    Chapter 4

    1. (a) 0.22 hours. (b) The sampling distribution of the estimates of the mean time to rigor mortis. (c) Random sampling from the population of corpses.

    2. The median will probably be smaller than the mean. The distribution is right skewed. Extreme values have a greater effect on the mean, pulling it upwards, than on the median.

    3. Sample size.

  • 4. (a) False (p is just an estimate based on a sample). (b) True (c) True (d) False, this fraction is a constant (leaving aside the possibility of gene copy number variation between people). (e) True

    5. (a) The fraction of all Canadians who agree with the statement. (b) The sample estimate is 73%. (c) The sample size is 1641 people. (d) The 95% confidence interval for the population fraction.

    6. (a) 95.9 msec. The population mean flash duration. (b) No, because the calculation was based on a sample rather than the whole population. By chance, the sample mean will differ from the population mean. (c) 1.9 msec. (d) The spread of the sampling distribution of the sample mean. (e) Using the 2SE rule, 95.9 ± 2(1.9) msec yields 92.2 < μ < 99.7. (f) The interval represents the most plausible values for the population mean μ. In roughly 95% of random samples from the population, when we compute the 95% confidence interval the interval will include the true population mean.

    7. The population mean is likely to be small or zero.

    Chapter 5

    1. (a) 5/8 (b) 1/4 (c) 7/8 ("either" in this case means pepperoni or anchovies or both) (d) No (some slices have both pepperoni and anchovies). (e) Yes. Olives and mushrooms are mutually exclusive. (f) No. Pr[mushrooms] = 3/8; Pr[anchovies] = 1/2; if independent, Pr[mushrooms & anchovies] = Pr[mushrooms] × Pr[anchovies] = 3/16. Actual probability = 1/8. Not independent. (g) Pr[anchovies | olives] = 1/2 (two slices have olives; one of these two has anchovies) (h) Pr[olives | anchovies] = 1/4 (four slices have anchovies; one of these has olives) (i) Pr[last slice has olives] = 1/4 (two slices of the eight have olives; you still get one slice, and it doesn't matter whether your friends pick before you or after you). (j) Pr[two slices with olives] = Pr[first slice has olives] × Pr[second slice has olives | first slice has olives] = 2/8 × 1/7 = 1/28. (k) Pr[slice without pepperoni] = 1 − Pr[slice with pepperoni] = 3/8 (l) Each piece has either one or no topping.

    2. Pr[successful hunting] = Pr[finds prey] × Pr[captures prey | finds prey] = 0.8 × 0.1 = 0.08.

    3. 45 of 273 trees have cavities, so the probability of choosing a tree with a cavity is 45/273 = 0.165.

    4. (a) Pr[vowel] = Pr[A] + Pr[E] + Pr[I] + Pr[O] + Pr[U] = 8.2% + 12.7% + 7.0% + 7.5% + 2.8% = 38.2%. (b) Pr[five randomly chosen letters from an English text would spell "STATS"] = Pr[S] × Pr[T] × Pr[A] × Pr[T] × Pr[S] = 0.063 × 0.091 × 0.082 × 0.091 × 0.063 = 2.7 × 10⁻⁶. (Each draw is independent, but all must be successful to satisfy the conditions, so we must multiply the probability of each independent event.) (c) Pr[2 letters from an English text = "e"] = 0.127 × 0.127 = 0.016.

    5. (a) Pr[A1 or A4] = Pr[A1] + Pr[A4] = 0.1 + 0.05 = 0.15 (b) Pr[A1A1] = Pr[A1] × Pr[A1] = 0.1 × 0.1 = 0.01. (This is only true because the individuals mate at random. If we did not know this, we could not calculate the probability.) (c) Pr[not A1A1] = 1 − Pr[A1A1] = 1 − 0.01 = 0.99 (d) Pr[two individuals not A1A1] = Pr[not A1A1] × Pr[not A1A1] = 0.99 × 0.99 = 0.9801. (e) Pr[at least one of two individuals chosen at random has genotype A1A1] = 1 − Pr[neither individual is A1A1] = 1 − 0.9801 = 0.0199. (f) Pr[three randomly chosen individuals have no A2 or A3 alleles] = Pr[six randomly chosen alleles are not A2 or A3] = (Pr[one allele is neither A2 nor A3])⁶;

  • Pr[one allele is neither A2 nor A3] = 1 − Pr[allele is either A2 or A3] = 1 − (Pr[A2] + Pr[A3]) = 1 − (0.15 + 0.6) = 0.25. Pr[all three lack these alleles] = 0.25⁶ = 0.00024.
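    The chain of addition and multiplication rules in problem 5 can be retraced numerically. The sketch below is only an illustration: the variable names are invented here, and the allele frequencies are the ones quoted in the answer.

```python
# Allele frequencies as quoted in the answer to problem 5.
# Variable names are illustrative, not from the original text.
p_a1, p_a2, p_a3, p_a4 = 0.10, 0.15, 0.60, 0.05

p_a1_or_a4 = p_a1 + p_a4            # (a) mutually exclusive outcomes: add
p_a1a1 = p_a1 * p_a1                # (b) random mating: multiply
p_not_a1a1 = 1 - p_a1a1             # (c) complement
p_two_not_a1a1 = p_not_a1a1 ** 2    # (d) two independent individuals
p_at_least_one = 1 - p_two_not_a1a1 # (e) complement of "neither"

# (f) three individuals carry six alleles; each must avoid A2 and A3
p_allele_ok = 1 - (p_a2 + p_a3)
p_three_lack = p_allele_ok ** 6

print(round(p_a1_or_a4, 2), round(p_two_not_a1a1, 4),
      round(p_at_least_one, 4), round(p_three_lack, 5))
```

    Each step uses only the addition rule for mutually exclusive events, the multiplication rule for independent events, or the complement rule.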

    6.

    Pr[sum to 7] = 1/6

    7. (a) Pr[no venomous snakes] = Pr[not venomous in the left hand] × Pr[not venomous in the right hand] = 3/8 × 2/7 = 6/56 = 0.107

    (b) Pr[bite] = Pr[bite | 0 venomous snakes] × Pr[0 venomous snakes]
                 + Pr[bite | 1 venomous snake] × Pr[1 venomous snake]
                 + Pr[bite | 2 venomous snakes] × Pr[2 venomous snakes]

    Pr[0 venomous snakes] = 0.107 (from part a)
    Pr[1 venomous snake] = (5/8 × 3/7) + (3/8 × 5/7) = 0.536
    Pr[2 venomous snakes] = 5/8 × 4/7 = 0.357
    Pr[bite | 0 venomous snakes] = 0
    Pr[bite | 1 venomous snake] = 0.8
    Pr[bite | 2 venomous snakes] = 1 − (1 − 0.8)² = 0.96

    Putting these all together:
    Pr[bite] = (0 × 0.107) + (0.8 × 0.536) + (0.96 × 0.357) = 0.772

    (c) Pr[defanged | no bite] = Pr[no bite | defanged] × Pr[defanged] / Pr[no bite]

    Pr[no bite | defanged] = 1; Pr[defanged] = 3/8;
    Pr[no bite] = Pr[defanged] × Pr[no bite | defanged] + Pr[venomous] × Pr[no bite | venomous] = (3/8 × 1) + (5/8 × (1 − 0.8)) = 0.5
    So Pr[defanged | one snake did not bite] = (1.0 × 3/8) / 0.5 = 0.75.
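    The law of total probability in 7(b) and Bayes' theorem in 7(c) can be checked with exact fractions. This is a minimal sketch with invented variable names, assuming 5 venomous and 3 defanged snakes out of 8 and a bite probability of 0.8 per venomous snake, as the answer implies.

```python
from fractions import Fraction as F

# Assumed setup, read off the answer: 5 venomous + 3 defanged snakes,
# two picked without replacement, each venomous snake bites with prob. 0.8.
n_venomous, n_defanged = 5, 3
n = n_venomous + n_defanged
p_bite_one = F(4, 5)

# Number of venomous snakes among the two picked
p0 = F(n_defanged, n) * F(n_defanged - 1, n - 1)
p2 = F(n_venomous, n) * F(n_venomous - 1, n - 1)
p1 = 1 - p0 - p2

# (b) law of total probability over 0, 1, or 2 venomous snakes
p_bite_given = {0: F(0), 1: p_bite_one, 2: 1 - (1 - p_bite_one) ** 2}
p_bite = p0 * p_bite_given[0] + p1 * p_bite_given[1] + p2 * p_bite_given[2]

# (c) Bayes' theorem for a single snake that did not bite
p_defanged = F(n_defanged, n)
p_no_bite = p_defanged * 1 + (1 - p_defanged) * (1 - p_bite_one)
p_defanged_given_no_bite = (1 * p_defanged) / p_no_bite

print(p_bite, float(p_bite))
print(p_defanged_given_no_bite)
```

    With exact arithmetic Pr[bite] is 27/35, about 0.771; the 0.772 in the answer comes from summing terms that were already rounded to three decimals.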

  • 8. (a) Pr[all five researchers calculate a 95% CI that includes the true value]? Each one has a 95% chance and all samples are independent, so Pr = 0.95⁵ = 0.774. (b) Pr[at least one does not include true parameter] = 1 − Pr[all include true parameter] = 1 − 0.774 = 0.226.

    9. (a) Pr[cat survives seven days] = (Pr[cat not poisoned one day])⁷ = 0.99⁷ = 0.932 (b) Pr[cat survives a year] = (Pr[cat not poisoned one day])³⁶⁵ = 0.99³⁶⁵ = 0.026 (c) Pr[cat dies within year] = 1 − Pr[cat survives year] = 1 − 0.026 = 0.974

    10. (a) Pr[single seed lands in suitable habitat] = 0.3 (b) Pr[two independent seeds land in suitable habitat] = 0.3² = 0.09. (c) Pr[two of three independent seeds land in suitable habitat] = 3 × (0.7 × 0.3 × 0.3) = 0.189

    11. (a) False positive rate = Pr[positive test | no HIV] = 8 out of 4000 = 0.002 (b) False negative rate = Pr[negative test | HIV] = 20 out of 1000 = 0.02 (c) 988 tested positive, of which 980 had HIV; 980/988 = 0.9919.

    12. Each win has probability 0.09, each loss probability 0.91.

    (a) Pr[WWLWWW] = 0.09⁵ × 0.91 = 5.4 × 10⁻⁶ (b) Pr[WWWWWL] = 0.09⁵ × 0.91 = 5.4 × 10⁻⁶

    (c) Pr[LWWWWW] = 0.09⁵ × 0.91 = 5.4 × 10⁻⁶ (d) Pr[WLWLWL] = 0.09³ × 0.91³ = 5.5 × 10⁻⁴

    (e) Pr[WWWLLL] = 0.09³ × 0.91³ = 5.5 × 10⁻⁴ (f) Pr[WWWWWW] = 0.09⁶ = 5.3 × 10⁻⁷
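    Problem 12 applies the multiplication rule to a fixed sequence of independent outcomes, so the order of wins and losses does not matter, only their counts. A small sketch makes this explicit; the helper name `sequence_probability` is hypothetical.

```python
def sequence_probability(sequence, p_win=0.09):
    """Probability of one specific sequence of independent wins (W) and losses (L)."""
    prob = 1.0
    for outcome in sequence:
        prob *= p_win if outcome == "W" else 1 - p_win
    return prob

for seq in ["WWLWWW", "WWWWWL", "LWWWWW", "WLWLWL", "WWWLLL", "WWWWWW"]:
    print(seq, f"{sequence_probability(seq):.1e}")
```

    Sequences with the same number of wins print the same probability, which is why (a) to (c) agree, as do (d) and (e).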

    13. (a)

    The probability that the match lasts two sets is 1/4 + 1/4 = 1/2. The probability that it lasts three sets is 1/8 + 1/8 + 1/8 + 1/8 = 1/2.

  • (b)

    The probability of the weaker player winning is (0.45² × 0.55) + (0.45² × 0.55) + 0.45² = 0.42525
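    The probability trees for problem 13 are not reproduced here, but the same paths can be enumerated in code. The sketch below assumes a best-of-three match with independent sets; `match_paths` is a hypothetical helper, and 0.45 is the weaker player's per-set win probability used in the answer.

```python
def match_paths(p_win, sets_to_win=2):
    """Map each way the match can unfold (from one player's view) to its probability.

    'W' is a set won by that player, 'L' a set lost; play stops as soon as
    either player has won `sets_to_win` sets.
    """
    paths = {}

    def play(path, prob):
        if path.count("W") == sets_to_win or path.count("L") == sets_to_win:
            paths[path] = prob
            return
        play(path + "W", prob * p_win)
        play(path + "L", prob * (1 - p_win))

    play("", 1.0)
    return paths

# (a) evenly matched players: how long does the match last?
even = match_paths(0.5)
p_two_sets = sum(pr for path, pr in even.items() if len(path) == 2)
p_three_sets = sum(pr for path, pr in even.items() if len(path) == 3)

# (b) weaker player wins each set with probability 0.45
weaker = match_paths(0.45)
p_weaker_wins = sum(pr for path, pr in weaker.items() if path.endswith("W"))

print(p_two_sets, p_three_sets, round(p_weaker_wins, 5))
```

    A path ends in "W" exactly when the player being tracked wins the deciding set, so summing those paths gives that player's chance of winning the match.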

    14. Pr[next person will wash his / her hands] = Pr[wash | man] × Pr[man] + Pr[wash | woman] × Pr[woman] = 0.74 × 0.4 + 0.83 × 0.6 = 0.794

    15. Total probability: 4/36 = 1/9 = 0.111

    16. (a) Pr[one person not blinking] = 1 − Pr[person blinks] = 1 − 0.04 = 0.96 (b) Pr[at least one blink in 10 people] = 1 − Pr[no one blinks] = 1 − 0.96¹⁰ = 0.335

    Chapter 6

    1. (a) True. (b) False.

    2. (a) H0: The rate of correct guesses is 1/6. (b) HA: The rate of correct guesses is not 1/6. (We would argue that a two-tailed test is more appropriate than a one-tailed test; who can say that ESP would work the way we want it to?)

  • 3. (a) Alternative hypothesis. (b) Alternative hypothesis. (c) Null hypothesis. (d) Alternative hypothesis. (e) Null hypothesis.

    4. (a) Lowers the probability of committing a Type I error. (b) Increases the probability of committing a Type II error. (c) Lowers power of a test. (d) No effect.

    5. (a) No effect. (b) Decreases the probability of committing a Type II error. (c) Increases the power of a test. (d) No effect.

    6. (a) P = 2 × (Pr[15] + Pr[16] + Pr[17] + Pr[18]) = 0.0075. (b) P = 2 × (Pr[13] + Pr[14] + ... + Pr[18]) = 0.096. (c) P = 2 × (Pr[10] + Pr[11] + Pr[12] + ... + Pr[18]) = 0.815. (d) P = 2 × (Pr[1] + Pr[2] + Pr[3] + ... + Pr[7]) = 0.481.
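    The two-tailed P-values in problem 6 are sums of binomial probabilities. The sketch below assumes n = 18 trials and a null value of p = 0.5 (the sums in the answer run to Pr[18]); the function names are hypothetical. The lower tail in the code starts at Pr[0], which is too small to change the rounded result in (d).

```python
from math import comb

def binom_pmf(k, n, p):
    """Pr[k successes in n independent trials with success probability p]."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def two_tailed_p(x, n, p=0.5):
    """Double the tail on the observed side of the expected count; cap at 1."""
    if x >= n * p:
        tail = sum(binom_pmf(k, n, p) for k in range(x, n + 1))
    else:
        tail = sum(binom_pmf(k, n, p) for k in range(0, x + 1))
    return min(1.0, 2 * tail)

for observed in (15, 13, 10, 7):
    print(observed, round(two_tailed_p(observed, 18), 4))
```

    Doubling one tail is the convention the answers use; it is exact for a symmetric null distribution such as p = 0.5.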

    7. False. The hypotheses aren't variables subject to chance. Rather, the P-value measures how unusual the data are if the null hypothesis is true.

    8. Failing to reject H0 does not mean H0 is correct, because the power of the test might be limited. The null hypothesis is the default and is either rejected or not rejected.

    9. Begin by stating the hypotheses. H0: Size on islands does not differ in a consistent direction from size on mainlands in Asian large mammals (i.e., p = 0.5); HA: Size on islands differs in a consistent direction from size on mainlands in Asian large mammals (i.e., p ≠ 0.5), where p is the true fraction of large mammal species that are smaller on islands than on the mainland. Note that this is a 2-tailed test. The test statistic is the observed number of mammal species for which size is smaller on islands than mainland: 16. The P-value is the probability of a result as unusual as 16 out of 18 when H0 is true: P = 2 × (Pr[16] + Pr[17] + Pr[18]) = 0.0013. Since P < 0.05, reject H0. Conclude that size on islands is usually smaller than on mainlands in Asian large mammals.

    10. (a) Not correct. The P-value does not give the size of the effect. (b) Correct. H0 was rejected, so we conclude that there is indeed an effect. (c) Not correct. The probability of committing a Type I error is set by the significance level, 0.05, which is decided beforehand. (d) Not correct. The probability of committing a Type II error depends on the effect size, which wasn't known. (e) Correct.

    Chapter 7

    1. (a) 91/220 had cancer, so the estimated probability of a cast or crew member developing cancer is 0.414. (b) The standard error of the proportion is 0.033, the square root of (0.414)(1 − 0.414)/(220 − 1). This quantity measures the standard deviation of the sampling distribution of the proportion. (c) Using the Agresti-Coull method to generate confidence intervals, we first calculate p' = (x + 2)/(n + 4) = (91 + 2)/(220 + 4) = 0.415.

    The lower bound for the 95% confidence interval is 0.415 − 1.96 × √[p'(1 − p')/(n + 4)] = 0.351.

    The upper bound is 0.415 + 1.96 × √[p'(1 − p')/(n + 4)] = 0.480. The confidence interval does not bracket the typical 14% cancer rate for the age group. It is unlikely that 14% is the true cancer rate for this group.
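    The Agresti-Coull interval used in problem 1 (and again in later problems of this chapter) follows directly from the formula p' = (x + 2)/(n + 4). The function below is a minimal sketch of that formula as the answers apply it; the name `agresti_coull` and its signature are assumptions made for illustration.

```python
from math import sqrt

def agresti_coull(x, n, z=1.96):
    """Approximate confidence interval for a proportion.

    x: number of successes, n: number of trials,
    z: standard normal critical value (1.96 for 95%, 2.58 for 99%).
    Returns (adjusted proportion p', lower bound, upper bound).
    """
    p_adj = (x + 2) / (n + 4)
    half_width = z * sqrt(p_adj * (1 - p_adj) / (n + 4))
    return p_adj, p_adj - half_width, p_adj + half_width

p_adj, lower, upper = agresti_coull(91, 220)
print(round(p_adj, 3), round(lower, 3), round(upper, 3))
```

    The bounds are not truncated to the range 0 to 1 here, so an upper bound above 1 should be reported as 1.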

    2. (a) 46/50 bills had cocaine, so the estimated proportion is 0.92. (b) The 99% confidence interval, calculated using the Agresti-Coull method: p' = 48/54 = 0.89. Z for 99% = 2.58. The bounds of

  • the confidence interval are 0.779 and 0.999. (c) We are 99% confident that the true proportion of US one-dollar bills that have a measurable cocaine content lies between 0.779 and 0.999.

    3. (a) No; the probability of drawing a red card changes depending on the cards that have been drawn. (b) Yes; because the sampling is with replacement, the probability of success remains constant. (c) No, the probability of drawing a red ball will change with each draw. (d) No, the number of trials is not known. (e) Yes, the number of red-eyed flies in a sample from a large population can be described using the binomial distribution. (Because the population sampled is large, we will assume that the sampling of each individual has a negligible effect on the probability of drawing a red fly on the next draw.) (f) No, the individuals within different families may have different probabilities of having red eyes due to shared genetic differences.

    4. (a) This estimate pools numbers from two different groups. Women with rosacea are not a random sample of the population. (b) If the control group women are a random sample, 15/16 have mites, 0.938. (c) p' = 17/20 = 0.85. The 95% confidence interval is 0.69 to 1.00. (Proportions cannot be larger than 1, so the confidence interval is truncated at 1.) (d) For the women with rosacea, p' = 18/20 = 0.90. The 95% confidence interval is: 0.769 < p < 1.0.

    5. (a) Pr[male is eaten] = 21/52 = 0.404. p' = 23/56 = 0.411. The 95% confidence interval is from 0.282 to 0.540. (b) This estimate is consistent with a 50% capture of males, but not with a 10% capture of males. (c) The larger sample would not change the estimate of the proportion (the fraction is exactly the same), but it would reduce the width of the confidence interval. (The sample size is in the denominator of the confidence interval equations.)

    6. Estimated probability of moving north is 22/24 = 0.917. To test the hypothesis that north and south movements of ranges are equally probable, we need to calculate the probability of 22, 23, or 24 species moving northward if p = 0.5. We will make this a two-tailed test, just to satisfy the skeptics, so we'll multiply this probability by two.

    Pr[22] = (24 choose 22) × 0.5²² × (1 − 0.5)² = 1.6 × 10⁻⁵

    Doing the same for 23 and 24, summing all three, and multiplying by two, we find that P = 3.6 × 10⁻⁵. We can confidently reject the null hypothesis.

    7. (a) The first study with the narrower confidence interval probably had the larger sample size. (b) The estimate with the smaller confidence interval is the more believable, in the sense that the true value is likely to be somewhere near the estimate. (c) The differences are what we might expect due to random sampling: the second confidence interval overlaps the estimate of the first study.

    8. (a) This is evidence that males' choices may be affected by the positioning of the females. Under the null hypothesis that there is no preference (p = 0.5), a binomial test gives the probability of a result as extreme as 19 of 24 males choosing the M0 female:

    P = 2 × {Pr[19] + Pr[20] + Pr[21] + Pr[22] + Pr[23] + Pr[24]} = 2 × {0.00253 + 0.0006 + 0.0001 + ...} = 0.0066. Therefore P < 0.05, and we can reject the null hypothesis that the males have no preference. (b) However, the males may have preferred one female over the other for a number of reasons. If the females were sisters, this would reduce the number of differences due to genes and maternal

  • environment, and so would be more likely to test for fetal position. Ideally, this would be done with 24 sets of sisters so that each trial is independent.

    9. (a) In 20 samples, the average proportion of A will be 30%. (b) We would expect the number of A strain bacteria in samples to conform to the binomial distribution. (c) The standard deviation of the proportion of A cells is the standard error: √[0.3 × (1 − 0.3)/15] = 0.118. (d) 95% of the technicians should construct a confidence interval for the proportion of A cells that includes 0.3.

    10. (a) 1/6 of the 12 dice should have "3", or 2 dice. (b) Pr[no three out of 12 rolls] = (1 − Pr[3 on one roll])¹² = (5/6)¹² = 0.112. (c) Pr[3 threes out of 12] = (12 choose 3) × 0.167³ × (1 − 0.167)⁹ = 0.197.

    (d) Each die has six faces, each with probability 1/6 of showing. The average value for the die will be the sum of each value times its probability, or (1 × 1/6) + (2 × 1/6) + . . . + (6 × 1/6), or 1/6 × 21 = 3.5. For twelve dice, the average sum of the numbers showing will be 12 × 3.5 = 42. (e) Pr[all dice show 1 or 6] = (Pr[1] + Pr[6])¹² = 1.88 × 10⁻⁶.
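    The dice answers in problem 10 combine the binomial formula with a simple expected value. The sketch below recomputes them with exact fractions, assuming fair six-sided dice; all variable names are illustrative.

```python
from fractions import Fraction
from math import comb

n_dice = 12
p_three = Fraction(1, 6)  # fair die assumed

expected_threes = n_dice * p_three                            # (a)
p_no_threes = (1 - p_three) ** n_dice                         # (b)
p_three_threes = comb(n_dice, 3) * p_three**3 * (1 - p_three) ** 9  # (c)

# (d) mean face value, then the mean total over twelve dice
mean_face = sum(Fraction(face, 6) for face in range(1, 7))
expected_sum = n_dice * mean_face

# (e) every die shows a 1 or a 6
p_all_1_or_6 = (Fraction(1, 6) + Fraction(1, 6)) ** n_dice

print(expected_threes, round(float(p_no_threes), 3),
      round(float(p_three_threes), 3), expected_sum, float(p_all_1_or_6))
```

    Using 1/6 exactly rather than 0.167 gives the same rounded answers for (b) and (c).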

    11. (a) 6052 out of 12028 of the deaths occurred in the week before Christmas, or 0.503. (b) Using the Agresti-Coull confidence interval approximation for proportions, p' = (X + 2)/(n + 4), where X is the number of successes (= 'died before Christmas') and n is the total number of trials (= 'died within a week of Christmas'), p' = 6054/12032 = 0.503. Then the confidence interval is 0.494 to 0.512. (c) This interval includes 50%, which is what is expected by chance. There is no statistical support for the belief that the living hang on until that special day.

    12. (a) We can calculate the P-value by summing the probabilities of 13, 14, 15, or 16 resin clades having more species, then multiplying by two.

    Pr[13] = (16 choose 13) × 0.5¹³ × (1 − 0.5)³ = 0.0085.

    Summing the probabilities for 13 through 16, then multiplying by two, P = 0.021. (b) This is an observational study, not an experimental study. We did not randomly assign clades to have latex/resin. It is possible that clades with resin have higher diversity not due to the resin, but due to some third factor that affects both resin and the number of species.

    Chapter 8

    1. (a) There would be six categories, one for each outcome, and so five degrees of freedom, since no parameters are estimated. (b) There are 11 categories (from 0 heads to 10 heads), and ten degrees of freedom, since no parameters are estimated. (c) There are now nine degrees of freedom, since p must be estimated from the data. (d) There would be five categories (0 to 4 insect heads per sample), and three degrees of freedom, since the mean number of heads per sample would need to be estimated from the data.

    2. No, the tickets purchased on a given day are not all independent; for example, a single person could buy multiple tickets at one purchase. Therefore this is not a random sample as presented.

    3. To test whether nematodes are distributed at random among fish, we can see whether the data fit the Poisson distribution. The mean number of parasites per fish is

    X̄ = [103(0) + 72(1) + 44(2) + 14(3) + 3(4) + 1(5) + 1(6)] / 238 = 0.94538.

  • The expected values from a Poisson distribution with this mean are given below, from the proportion expected from the Poisson distribution times the total sample size, n = 238:

    Number of parasites    Observed    Expected
    0                         103        92.47
    1                          72        87.42
    2                          44        41.32
    3                          14        13.02
    4                           3         3.08
    5                           1         0.58
    6                           1         0.09

    The expected values for 5 and 6 are below one, so we will combine 4 through 6 into one category.

    Number of parasites    Observed    Expected    (Observed − Expected)² / Expected
    0                         103        92.47      1.20
    1                          72        87.42      2.72
    2                          44        41.32      0.17
    3                          14        13.02      0.07
    ≥ 4                         5         3.75      0.42

    χ² = 4.58. There are five categories, one parameter estimated, so df = 3.

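    χ²(0.05, 3) = 7.81. Since 4.58 < 7.81, P > 0.05, and we cannot reject the null hypothesis that the number of parasites per fish follows a Poisson distribution.

    [...] > 10.83, the critical value for P = 0.001, so we can reject the hypothesis that the cast and crew suffered the population-wide cancer mortality rate, P < 0.001.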
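    The Poisson goodness-of-fit calculation in problem 3 can be reproduced step by step: estimate the mean, compute expected counts, pool the sparse categories, and sum the χ² contributions. The sketch below uses only the observed counts from the table; the helper and variable names are assumptions made for illustration.

```python
from math import exp, factorial

# Observed number of fish carrying 0..6 parasites (from the table above)
observed = {0: 103, 1: 72, 2: 44, 3: 14, 4: 3, 5: 1, 6: 1}
n = sum(observed.values())
mean = sum(k * count for k, count in observed.items()) / n

def poisson_pmf(k, mu):
    """Pr[X = k] for a Poisson distribution with mean mu."""
    return exp(-mu) * mu**k / factorial(k)

expected = {k: n * poisson_pmf(k, mean) for k in observed}

# Pool counts for 4, 5 and 6 parasites, as in the answer
obs_pooled = [observed[k] for k in range(4)] + [sum(observed[k] for k in (4, 5, 6))]
exp_pooled = [expected[k] for k in range(4)] + [sum(expected[k] for k in (4, 5, 6))]

chi2 = sum((o - e) ** 2 / e for o, e in zip(obs_pooled, exp_pooled))
# One df lost for the total, one for the estimated mean
df = len(obs_pooled) - 1 - 1

print(round(mean, 5), round(chi2, 2), df)
```

    The pooled expected count here is the sum of the expected counts for 4, 5, and 6 parasites, matching the 3.75 in the table; it ignores the very small Poisson probability of more than 6 parasites.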

    10. The average admissions are 20 per night. What is the probability of five or fewer admissions? Assume that each admission is independent (no riots, please). Then, they should be Poisson distributed. To find the probability of a quiet night, we sum the probabilities of 0 to 5 admissions.

    Admissions    Probability
    0             2.06E-09
    1             4.12E-08
    2             4.12E-07
    3             2.75E-06
    4             1.37E-05
    5             5.50E-05

    Total probability = 7.2 × 10⁻⁵. Best to keep that coffee brewing: you'll need to work 58 years on average before you'll have a quiet night!
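    The table of Poisson probabilities in problem 10 can be regenerated in a few lines. This sketch assumes a mean of 20 admissions per night, as stated in the answer; the variable names are illustrative.

```python
from math import exp, factorial

mu = 20  # mean admissions per night

# Pr[k admissions] for k = 0..5
probs = [exp(-mu) * mu**k / factorial(k) for k in range(6)]
p_quiet = sum(probs)

for k, pr in enumerate(probs):
    print(k, f"{pr:.2e}")
print("total", f"{p_quiet:.1e}")
print("expected nights until a quiet one:", round(1 / p_quiet))
```

    The expected wait is 1/p nights; converting that to years depends on how many nights are worked per year, which the answer does not state.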

    Chapter 9

    1. (a)

    Observed:
              capture   escape   total
    white     9         92       101
    blue      92        10       102
    total     101       102      203

    Expected:
              capture   escape   total
    white     50.25     50.75    101
    blue      50.75     51.25    102
    total     101       102      203

    (Observed - Expected)² / Expected:
              capture   escape   total
    white     33.86     33.53
    blue      33.53     33.20
    total                        134.13

    χ² = 134.13 >> 10.83 (critical value for P = 0.001 for 1 df), so P < 0.001. The two types of pigeons differ in their rate of capture.

    (b) Odds of capture for white rumped: (9 / 101) / (92 / 101) = 0.098.

    Odds of capture for blue rumped: (92 / 102) / (10 / 102) = 9.2.

    Odds ratio: 0.098 / 9.2 = 0.0106, or about 0.01.

    For the 95% confidence interval, we use the log odds ratio, ln(0.0106) = -4.54.

    $$\mathrm{SE}[\ln(\widehat{OR})] = \sqrt{\frac{1}{9} + \frac{1}{92} + \frac{1}{92} + \frac{1}{10}} = 0.48$$

    For the 95% confidence interval, we use Z = 1.96.

    $$-4.54 - 1.96(0.48) < \ln(OR) < -4.54 + 1.96(0.48)$$

    $$-5.49 < \ln(OR) < -3.60$$

    $$0.004 < OR < 0.027$$
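The odds-ratio interval can be reproduced with a few lines of code. The sketch below assumes the usual large-sample standard error for the log odds ratio (the square root of the sum of reciprocal cell counts); the function name `odds_ratio_ci` and its argument order are illustrative choices.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate CI from a 2x2 table.

    a, b: event / non-event counts in group 1 (white-rumped: captured, escaped)
    c, d: event / non-event counts in group 2 (blue-rumped: captured, escaped)
    """
    odds_ratio = (a / b) / (c / d)
    log_or = math.log(odds_ratio)
    # Standard error of ln(OR): sqrt of the sum of reciprocal cell counts.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(log_or - z * se)
    upper = math.exp(log_or + z * se)
    return odds_ratio, se, lower, upper


odds_ratio, se, lower, upper = odds_ratio_ci(9, 92, 92, 10)
print(f"OR = {odds_ratio:.4f}, SE[ln OR] = {se:.2f}")
print(f"95% CI: {lower:.3f} < OR < {upper:.3f}")
```

Working on the log scale and exponentiating the bounds keeps the interval inside the valid range for an odds ratio (greater than zero).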

    2.

    Observed:
              one     multi   total
    malar     69      20      89
    healthy   157     16      173
    total     226     36      262

    Expected:
              one      multi   total
    malar     76.77    12.23   89
    healthy   149.23   23.77   173
    total     226      36      262

    (Observed - Expected)² / Expected:
              one     multi   total
    malar     0.79    4.94
    healthy   0.40    2.54
    total                     8.67

    χ² = 8.67 > 7.88, the critical value for P = 0.005 for 1 df (but less than 10.83, the critical value for P = 0.001), so P < 0.005. Infected and uninfected mosquitoes differ in the probability of multiple blood meals.
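The expected counts and the χ² statistic for a contingency table follow mechanically from the row and column totals. The sketch below is one way to write that calculation; `contingency_chi_square` is a hypothetical helper, and the input is the observed mosquito table (rows: malaria-infected, healthy; columns: one meal, multiple meals).

```python
def contingency_chi_square(table):
    """Chi-square contingency statistic and df for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count under independence: row total * column total / n.
            exp = row_totals[i] * col_totals[j] / grand_total
            chi2 += (obs - exp) ** 2 / exp

    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


mosquito_table = [[69, 20], [157, 16]]
chi2, df = contingency_chi_square(mosquito_table)
print(f"chi-square = {chi2:.2f}, df = {df}")
```

The same function applies unchanged to tables with more rows or columns, since df is computed as (rows - 1)(columns - 1).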

    3. (a) The odds of the second suitor being accepted if the first was eaten are 0.5, while the odds of the second suitor being accepted if the first escaped are 22.0. There is a much higher probability of the second suitor being rejected if the first was eaten. (b) There are only four cells, so if any cell has an expected frequency of less than five, the assumptions of the χ² contingency test will be violated. Here it seems likely that one cell will have an expected frequency of less than five, so it would be necessary to use Fisher's exact test.

    4.

    Observed:
            fire   rev fire   white   total
    leave   18     6          0       24
    stay    2      14         20      36
    total   20     20         20      60

    Expected:
            fire   rev fire   white   total
    leave   8      8          8       24
    stay    12     12         12      36
    total   20     20         20      60

    (Observed - Expected)² / Expected:
            fire   rev fire   white   total
    leave   12.5   0.5        8       21
    stay    8.33   0.333      5.33    14
    total   20.8   0.833      13.3    35

  • χ² = 35.0, for 2 df (since there are two rows and three columns, (2-1)(3-1) = 2 df). 35.0 > 13.82, the critical value for P = 0.001 for 2 df, so P < 0.001. Yes, there is evidence that reed frogs react consistently to the sound of fire.

    5.

    Observed:
              juv m   juv f   total
    adult m   1       11      12
    adult f   6       4       10
    total     7       15      22

    Expected:
              juv m   juv f   total
    adult m   3.82    8.18    12
    adult f   3.18    6.82    10
    total     7       15      22

    (a) Two of the expected values are less than 5, so we cannot use a χ² contingency analysis. Fisher's exact test is a good alternative. (b) The effect is in the expected direction.

    6. (a) Proportions of kids in each TV viewing class having violent records eight years later, and confidence intervals (using Agresti-Coull calculations from chapter 7):

    class    record   total   proportion   p'       low CI   high CI
    <1 hr    5        88      0.057        0.0761   0.022    0.130
    1-3 hr   87       386     0.225        0.2282   0.187    0.270
    3+ hr    67       233     0.288        0.2911   0.233    0.349
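The p' column and the interval bounds are consistent with the Agresti-Coull adjustment p' = (X + 2) / (n + 4). A minimal sketch of that calculation follows; the function name `agresti_coull` is illustrative, and Z = 1.96 is assumed for a 95% interval.

```python
import math


def agresti_coull(x, n, z=1.96):
    """Agresti-Coull adjusted proportion and approximate 95% CI."""
    p_adj = (x + 2) / (n + 4)
    half_width = z * math.sqrt(p_adj * (1 - p_adj) / (n + 4))
    return p_adj, p_adj - half_width, p_adj + half_width


# (records, total) for each TV viewing class, taken from the table.
classes = {"<1 hr": (5, 88), "1-3 hr": (87, 386), "3+ hr": (67, 233)}

results = {name: agresti_coull(x, n) for name, (x, n) in classes.items()}
for name, (p_adj, low, high) in results.items():
    print(f"{name}: p' = {p_adj:.4f}, {low:.3f} < p < {high:.3f}")
```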

    (b) Contingency table for test of independence of TV watching and later violent record.

    Observed:
    class    no record   record   total
    <1 hr    83          5        88
    1-3 hr   299         87       386
    3+ hr    166         67       233
    total    548         159      707

    Expected:
    class    no record   record   total
    <1 hr    68.21       19.79    88
    1-3 hr   299.2       86.81    386
    3+ hr    180.6       52.4     233
    total    548         159      707

    (Observed - Expected)² / Expected:
    class    no record   record   total
    <1 hr    3.21        11.05
    1-3 hr   0.0001      0.0004
    3+ hr    1.18        4.068
    total                         19.5

    χ² = 19.5, for 2 df (since there are three rows and two columns, (3-1)(2-1) = 2 df). 19.5 > 13.82, the critical value for P = 0.001 for 2 df, so P < 0.001. There is a relationship between watching TV and future violence: those that watched less than one hour of TV were less likely to have a record, while those that watched three or more hours were more likely to have a record.

    (c) No, this does not prove that TV watching causes aggression in kids. This was an observational study, not experimental, so we do not know if kids watching more TV have other factors in common (e.g. lower parental supervision, etc.).

    7.

    Observed:
                 arrest   healthy   total
    abstainers   12       197       209
    drinkers     9        192       201
    total        21       389       410

    Expected:
                 arrest   healthy   total
    abstainers   10.70    198.30    209
    drinkers     10.30    190.70    201
    total        21       389       410

    (Observed - Expected)² / Expected:
                 arrest   healthy   total
    abstainers   0.16     0.01
    drinkers     0.16     0.01
    total                           0.34

  • χ² = 0.34, for 1 df
  • 0.05, so from 0.172 to 0.362. Taking the exponent of the confidence interval for the ln odds ratio, we find that the 95% confidence interval for the odds ratio is from 1.19 to 1.44.

    Chapter 10

    1. From statistical table B. (a) 0.09012 (b) 0.09012 (c) Pr[Z > -2.15] = 1 - Pr[Z < -2.15] = 1 - Pr[Z > 2.15] = 1 - 0.01578 = 0.9842. (d) Pr[Z < 1.2] = 1 - Pr[Z > 1.2] = 1 - 0.11507 = 0.8849 (e) Pr[0.52 [...] 1.57]) - Pr[Z > 0.32] = (1 - 0.05821) - 0.37448 = 0.56731

    2. (a) To determine the proportion of men excluded, we convert the height limit into standard normal deviates. (180.3 - 177.0) / 7.1 = 0.46. Pr[Z > 0.46] = 0.32276, so roughly one third of British men are excluded from applying. (b) (172.7 - 163.3) / 6.4 = 1.47. Pr[Z > 1.47] = 0.07078, which is the proportion of British women excluded. 1 - 0.0708 = 0.9292 is the proportion of British women acceptable to MI5. (c) (183.4 - 180.3) / 7.1 = 0.44 standard deviation units above the height limit.
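The table lookups in this problem can be reproduced numerically. The sketch below uses the complementary error function from the standard library in place of Statistical Table B; `upper_tail` is an illustrative name, and the Z values are rounded to two decimals to mimic a table lookup.

```python
import math


def upper_tail(z):
    """Pr[Z > z] for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# (a) Men: height limit 180.3 cm, mean 177.0 cm, SD 7.1 cm.
z_men = round((180.3 - 177.0) / 7.1, 2)
p_men_excluded = upper_tail(z_men)

# (b) Women: height limit 172.7 cm, mean 163.3 cm, SD 6.4 cm.
z_women = round((172.7 - 163.3) / 6.4, 2)
p_women_excluded = upper_tail(z_women)

print(f"men:   Z = {z_men}, Pr[excluded] = {p_men_excluded:.5f}")
print(f"women: Z = {z_women}, Pr[excluded] = {p_women_excluded:.5f}")
```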

    3. (a) B is most like the normal distribution. A is bimodal, while C is skewed. (b) All three would generate approximately normal distributions of sample means due to the Central Limit Theorem.

    4. (a) Pr[weight > 5 kg]? Transform to standard normal deviate: (5 - 3.339) / 0.573 = 2.90. Pr[2.90 < Z] = 0.00187.

    (b) Pr[3 < birth weight < 4]? Transform both to standard normal deviates, and subtract probabilities of the Z values. (3 - 3.339) / 0.573 = -0.59. (4 - 3.339) / 0.573 = 1.15. Pr[-0.59 < Z] = 1 - Pr[0.59 < Z] = 1 - 0.2776 = 0.7224. Pr[1.15 < Z] = 0.12507. 0.7224 - 0.12507 = 0.59733.

    (c) 0.06681 of babies are more than 1.5 standard deviations above the mean, with the same fraction below, so 0.13362 of babies are more than 1.5 standard deviations either way.

    (d) First, transform 1.5 kg into standard normal deviates: 1.5 / 0.573 = 2.62 standard deviations. Pr[2.62 < Z] = 0.0044. Since the distribution is symmetric, we multiply this by two to reflect the probability of being 2.62 standard deviations above or below the mean: 0.0044 × 2 = 0.0088.

    (e) The standard error is the same as the standard deviation of the mean. It is equal to the standard deviation divided by the square root of n, or 0.573 / √10 = 0.18 kg. To find the probability that the mean of a sample of 10 babies is greater than 3.5 kg, we transform this mean into a Z score: (3.5 - 3.339) / 0.18 = 0.89 standard deviations of the mean. Pr[0.89 < Z] = 0.18673.

    5. (a) The right-hand graph has the higher mean (ca. 20 vs. 10), the left-hand has the higher standard deviation (ca. 2 vs. 1). (b) The right-hand graph has the higher mean (ca. 15 vs. 10), and the higher standard deviation (ca. 5 vs. 2.5).

    6. The standard deviation is approximately 10, as the region within one standard deviation from the mean will contain roughly 2/3 of the data points.

    7. (a) In a normal distribution, the modal value occurs at the mean, so the mode is 35 mm. (b) A normal distribution is symmetric, so the middle data point is the mean, 35 mm. (c) Twenty percent of the distribution is less than 20 mm in size. (Why? Normal distributions are symmetric, so if 20% of the distribution is 15 mm or more above the mean, 20% must be 15 mm or more below the mean.)

    8. (a) The right-hand distribution (II) would have sample means with a more nearly normal distribution, because the initial distribution is closer to normal. Both distributions would converge to a normal distribution if the sample size were sufficiently large. (b) The distribution of the sums of samples from a distribution will be normally distributed, given a sufficiently large number of samples.

    9. (a) The probability of sampling haplotype A is p = 0.3. We expect a sample of 400 fish to have 0.3 × 400 = 120 A individuals on average. The standard deviation is approximately √(n p (1 - p)) = √(400 (0.3)(0.7)) = 9.17. To find the probability of sampling 130 or more of haplotype A, we convert this into a standard normal deviate: (130 - 120) / 9.17 = 1.09; Pr[1.09 [...]

    mean   SD   y      SE10    Z10     Pr(Y > y)   SE30   Z30     Pr(Y > y)
    14     5    15     1.58    0.63    0.26435     0.91   1.10    0.13567
    15     3    15.5   0.95    0.53    0.29806     0.54   0.91    0.18141
    -23    4    -22    1.26    0.79    0.21476     0.73   1.37    0.08534
    72     50   45     15.81   -1.71   0.95637     9.13   -2.96   0.99846

    Chapter 11

    1. The 99% confidence interval must be larger than the 95% confidence interval, in order to have a higher probability of capturing the true mean.

    2. (a) The sample mean is 10.32 cm and the standard error of the mean is 0.056 cm. If we sampled this wolf population repeatedly, 68% of the estimates of the mean would lie within 10.32 ± 0.056 cm. (b) The 95% confidence interval for the mean is 10.32 ± (0.056 × t0.05(2),34). t = 2.03, so the 95% confidence interval is: 10.21 < μ < 10.44 cm. (c) The variance is 0.11005. There are 35 individuals, so 34 df. The χ² critical values looked up in Statistical Table A for α = 0.01 / 2 and α = 1 - (0.01 / 2) are 58.96 and 16.50. We now calculate the lower bound of the confidence interval: 34 (0.11005) / 58.96 = 0.063. The upper bound is 34 (0.11005) / 16.5 = 0.227. 0.063 < σ² < 0.227. (d) The 99% confidence interval of the standard deviation is found by taking the square root of the bounds of the variance confidence interval: 0.252 < σ < 0.476, around the estimated standard deviation of 0.332.
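The variance and standard deviation intervals in parts (c) and (d) can be sketched as follows. The χ² critical values are passed in as given (58.96 and 16.50, as read from the table in the text) rather than computed, and the function name `variance_ci` is an illustrative choice.

```python
import math


def variance_ci(sample_variance, df, chi2_upper_crit, chi2_lower_crit):
    """Confidence interval for a variance from chi-square critical values.

    chi2_upper_crit: critical value for alpha / 2 (the larger value)
    chi2_lower_crit: critical value for 1 - alpha / 2 (the smaller value)
    """
    lower = df * sample_variance / chi2_upper_crit
    upper = df * sample_variance / chi2_lower_crit
    return lower, upper


# 99% interval: variance 0.11005, 34 df, critical values from the text.
var_low, var_high = variance_ci(0.11005, 34, 58.96, 16.50)

# The interval for the standard deviation is the square root of each bound.
sd_low, sd_high = math.sqrt(var_low), math.sqrt(var_high)

print(f"{var_low:.3f} < variance < {var_high:.3f}")
print(f"{sd_low:.3f} < SD < {sd_high:.3f}")
```

Note that the larger critical value produces the lower bound, because it appears in the denominator.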

  • 3. No, using n = 70 would assume that the right distance was independent of the left distance. This isn't likely, as most animals are symmetric, so we would expect similar distances on each side of the mouth.

    4. (a) To test whether the mean fitness changed, we will test whether the mean fitness change after 100 generations was significantly different from 0. For this, we will use a t-test. We need the sample mean, the sample standard error, and the sample size. Our hypothesized value from the null hypothesis of no change in fitness, μ0, is 0. The mean is 0.2724, the standard deviation is 0.600, and the sample size is 5. From this, we calculate the SE = s / √n = 0.6 / √5 = 0.268, and we calculate the t-statistic: t = (mean - μ0) / SE = (0.2724 - 0) / 0.268 = 1.02. This is a two-tailed test, so we look up the critical value for α(2) = 0.05 for 4 df, 2.78. t < tcrit = 2.78, so we do not reject the null hypothesis of no change in fitness. (b) The 95% confidence interval for the mean fitness change is 0.2724 ± 2.78 (0.268): -0.47 < μ < 1.02. (c) We are assuming that fitness is normally distributed and that the five lineages are random and independent.
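The one-sample t calculation above can be written out as a short script. This is a sketch only: the critical value 2.78 is taken from the text rather than computed, and `one_sample_t` is a hypothetical helper name.

```python
import math


def one_sample_t(mean, sd, n, mu0=0.0):
    """Standard error and t-statistic for a one-sample t-test."""
    se = sd / math.sqrt(n)
    t = (mean - mu0) / se
    return se, t


se, t = one_sample_t(mean=0.2724, sd=0.600, n=5, mu0=0.0)

# Two-tailed critical value for alpha = 0.05 with 4 df, as given in the text.
t_crit = 2.78
reject_null = abs(t) > t_crit

# 95% confidence interval: mean +/- t_crit * SE.
ci_low = 0.2724 - t_crit * se
ci_high = 0.2724 + t_crit * se

print(f"SE = {se:.3f}, t = {t:.2f}, reject H0: {reject_null}")
print(f"95% CI: {ci_low:.2f} < mu < {ci_high:.2f}")
```

The interval includes zero exactly when the two-tailed test at the same α fails to reject, which is why parts (a) and (b) agree.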

    5. (a) The mean testes area for monogamous lines is 0.848, while the mean area for polyandrous lines is 0.950. The standard deviation was 0.031 and 0.034 for the two sets. (b) The standard error of the mean is 0.015 for the monogamous lines and 0.017 for the polyandrous lines. (c) The 95% confidence interval for the mean in the polyandrous line is 0.95 ± 0.017 × t0.05(2),3 = 0.95 ± 3.18 (0.017), or 0.90 < μ < 1.00. (d) The 99% CI for the standard deviation of testes area among monogamous lines requires the sample variance (0.001), the degrees of freedom (3), and the critical values of the χ² distribution for the appropriate α and df. The χ² critical values looked up in Statistical Table A for α = 1 - (0.01 / 2) and α = 0.01 / 2 are 0.07 and 12.84. Then, the 99% CI for the variance runs from 3 (0.001) / 12.84 to 3 (0.001) / 0.07, or 0.0002 to 0.04. To get the 99% CI for the standard deviation, we take the square root of each of these, or 0.015 < σ < 0.20.

    6. (a) The 95% CI for the discontinuity score requires the estimated mean (0.183), the SE (calculated as s divided by √n, or 0.051), and the critical t value for α(2) = 0.05 for 6 df, 2.45. Then the CI is 0.183 ± (0.051 × 2.45), or 0.058 < μ < 0.308. (b) To test whether the sample mean discontinuity score of 0.183 is consistent with the predicted mean, μ0 = 0, we use the t-test. t = (mean - μ0) / SE = (0.183 - 0) / 0.051 = 3.6. The critical value for α(2) = 0.05 for 6 df is 2.45. t > tcrit, so we reject the null hypothesis that the mean discontinuity score is zero.

    7. In this case, we want to see if rats do better than the chance performance, so we are comparing their scores with μ0 = 0.5. The mean is 0.684, the standard deviation is 0.071. There were seven rats, so SE = 0.071 / √7 = 0.027. t = (0.684 - 0.5) / 0.027 = 6.82. For α(2) = 0.05, tcrit for six df = 2.45; t > tcrit so we reject the null hypothesis that rats were doing as expected by chance. (Moreover, we can show that P < 0.002.) (For this we assumed that the seven rats were random, independent samples and that their performance scores were normally distributed.)

    8. Mean coconut weight is 3.2 kg, upper 95% CI bound is 3.5 kg. The confidence interval for the mean is symmetric, so the lower bound must be 0.3 kg below the mean, or at 2.9 kg.

    9. The confidence interval of the variance is the square of the bounds for the standard deviation.

  • Standard deviation: 2.22 < σ < 4.78. Variance: 4.93 < σ²
  • treatment, the confidence interval is 4.8 ± (2.26 × 1.03): 2.47 < μ < 7.13. For the control, the confidence interval is 0.51 ± (2.26 × 0.28), or -0.13 < μ < 1.15. (b) The two-sample t-test cannot be used to test for differences in the means, since the standard deviations are more than three-fold different between the two groups. Instead, Welch's approximate t-test is appropriate.

    5. (a) Because these are not paired samples, we will analyze the difference of the means, not the mean of the differences. The monogamous flies had a mean testes size of 0.8475 mm², the polyandrous 0.95 mm², for a difference of 0.1025 mm². The 95% confidence interval for this difference requires finding SE[Ȳ1 - Ȳ2] = √(sp² (1/n1 + 1/n2)), where n1 = n2 = 4, and sp² = (df1 s1² + df2 s2²) / (df1 + df2), where df1 = df2 = 3, and s1² and s2² are 0.0010 and 0.0011 respectively. Then, SE[Ȳ1 - Ȳ2] = 0.023. The confidence interval is the difference plus or minus the standard error times t0.05(2),6 = 2.45, so 0.1025 ± 2.45 × 0.023, or 0.046 to 0.159. (b) The null hypothesis is that there is no difference between the monogamous and polyandrous treatments in testes size, so (μ1 - μ2)0 = 0. t = the difference in means, 0.1025, over the SE of the difference, 0.023. t = 4.48, for 6 df. tcrit = 3.71 for P = 0.01, so P < 0.01. The mean testes sizes are significantly different.
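The pooled-variance calculation can be sketched in a few lines. The function name `pooled_standard_error` is illustrative, the critical value 2.45 is taken from the text, and because the inputs are the rounded variances 0.0010 and 0.0011, the t-statistic may differ slightly in the second decimal from a value computed on unrounded data.

```python
import math


def pooled_standard_error(var1, n1, var2, n2):
    """Pooled sample variance, SE of the difference in means, and df."""
    df1, df2 = n1 - 1, n2 - 1
    pooled_var = (df1 * var1 + df2 * var2) / (df1 + df2)
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return pooled_var, se_diff, df1 + df2


difference = 0.95 - 0.8475  # polyandrous mean minus monogamous mean
pooled_var, se_diff, df = pooled_standard_error(0.0010, 4, 0.0011, 4)

t_crit = 2.45  # two-tailed critical value for alpha = 0.05 with 6 df (from the text)
ci_low = difference - t_crit * se_diff
ci_high = difference + t_crit * se_diff
t_stat = difference / se_diff

print(f"SE = {se_diff:.3f}, df = {df}, t = {t_stat:.2f}")
print(f"95% CI for the difference: {ci_low:.3f} to {ci_high:.3f}")
```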

    6. (a) On average, 33% more of the male bodies were covered if they emitted pheromones. sp = 26.5%; SE[Ȳ1 - Ȳ2] = 6.02%. df1 = 48 and df2 = 31, so df = 79. t0.05(2),79 = 1.99, so the confidence interval is: 21% < μ1 - μ2 < 45%. (b) Using a two-sample t-test, we will assume that the percent coverage is normally distributed, that each snake is independent, and that the standard deviations are not different (they are not more than threefold different). The null hypothesis is that there is no difference between the males emitting pheromones and those not, so (μ1 - μ2)0 = 0. t = 0.33 / 0.0602 = 5.47 > 3.9, the critical value for α(2) = 0.0002 for 79 df, so we can reject the null hypothesis, with P < 0.0002.

    7. (a) The PLFA levels between the control and addition plots before cicada death were not significantly different (two-sample t-test: t = -0.19, df = 42, P > 0.10). (b) After the cicada addition, the PLFA levels differed significantly: t = 2.89, df = 42, P < 0.01. (c) No: we are interested in the effect of the cicada addition, so we need to look at whether the change in PLFA levels differed between the treatments. The correct comparison is the mean change in the control plots compared to the mean change in the addition plots. It is possible to have a situation in which test b was significant and test a was not, but the effect of cicada addition was not significant. For instance, the control plots may have been non-significantly lower in PLFA prior to cicada addition, and significantly lower after cicada addition, but the difference might not be significant.

    8. As described, this test assumes that the eight "open water" samples were independent of the eight "near shore" samples, as it uses a two-sample t-test (and so would have 14 df). Differences in growth rate could be due to differences between lakes, so the two samples within each lake are not independent. The paired t-test would better reflect this (and would have 7 df).

    9. (a) Since we assume that the distributions are normal, we can use the F-test to compare the variances. The ratio of the variances (the larger variance is always in the numerator) is F = (0.1582)² / (0.0642)² = 0.025 / 0.004 = 6.07. The degrees of freedom are 32 and 19. F0.05(1),32,19 = 2.05 (between 2.07 for 30 df in the numerator and 2.03 for 40 df), so we reject the null hypothesis that the variances are equal. (b) The variances are not equal, but the difference in the standard deviations is not greater than threefold. Therefore a two-sample t-test could be used: t = 0.28, df = 51, P > 0.05. You could use Welch's approximate t-test rather than the two-sample t-test: t = 0.13, with 46 df, which is not significant.

    10. (a) Remember to multiply each value of flower length by the number of flowers in that category when calculating the mean and variance (the zeros are dropped in the equations below):

    $$\bar{Y} = \frac{4(55) + 10(58) + 41(61) + 75(64) + 40(67) + 3(70)}{4 + 10 + 41 + 75 + 40 + 3} = 63.5$$

    $$s^2 = \frac{4(55 - 63.5)^2 + 10(58 - 63.5)^2 + 41(61 - 63.5)^2 + 75(64 - 63.5)^2 + 40(67 - 63.5)^2 + 3(70 - 63.5)^2}{(4 + 10 + 41 + 75 + 40 + 3) - 1} = 8.6$$
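The frequency-weighted mean and variance can be computed directly from the category values and their counts. The sketch below uses the flower lengths and counts that appear in the equations; the variable names are illustrative.

```python
# Flower length categories and the number of flowers in each
# (categories with zero flowers are omitted).
lengths = [55, 58, 61, 64, 67, 70]
counts = [4, 10, 41, 75, 40, 3]

n = sum(counts)

# Weighted mean: each length counted once per flower in that category.
mean = sum(c * x for c, x in zip(counts, lengths)) / n

# Sample variance: weighted sum of squared deviations divided by n - 1.
variance = sum(c * (x - mean) ** 2 for c, x in zip(counts, lengths)) / (n - 1)

print(f"n = {n}, mean = {mean:.1f}, variance = {variance:.1f}")
```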

    (b) For the simplest test, we will assume that distributions are normal. Then we can use the F-test to compare the variances. 42.4 / 8.6 = 4.93; 443, 172 df. Looking this up (roughly) we find that the critical value is 1.21 (for 1000, 200) and 1.31 (for 200, 100) for α(1) = 0.05, so we conclude that the variance for the F2 is significantly greater than the variance for the F1.

    11. Paired t-test: mean difference is 22.3, which is significantly different from 0 (P

  • III. No, this plot is right skewed. It looks log-normal rather than normal.

    IV. Yes, this is a normal distribution.

    (b) I. The sign test would be best for these data, as they are unlikely to be transformed into anything resembling normal.

    II. The sign test would probably be best for this distribution as well. Bimodal distributions are tough to transform into anything else.

    III. These data could probably be tested by a one-sample t-test after transformation (probably a log transformation, as it is right-skewed).

    IV. These data could be tested by a one-sample t-test as they are.

    (c) I. B. (This explains why there is such a constant density of points in the quantile plots.)

    II. D. (The two peaks correspond to the two curves in the quantile plots; these are also the areas of the highest density in the quantile plots.)

    III. A. (The density is highest on the left, with a few points on the right scattered over a large range on the x-axis.)

    IV. C. (Sparse points at the extremes of the range; points falling along a straight line as expected for a normal distribution.)

    3. (a) mean 2.75; 95% CI: -3.28 < log[x] < 8.78. (b) mean 1.86; 95% CI: -0.02 < log[x] < 3.74. (c)Not possible: cannot use ln transformation on negative values. (d) mean 4.23; 95% CI: -2.04