32
1 MATHEMATICAL METHODS IN BIOLOGY PART 2 EXERCISES Eva Kisdi Department of Mathematics and Statistics University of Helsinki © Eva Kisdi. Any part of this material may be copied or re-used only with the explicit permission of the author.

MATHEMATICAL METHODS IN BIOLOGY...4 In Finland, the frequencies of blood groups are A 44.2% B 16.6% AB 8.1% 0 31.2% Calculate what would be the probability that a blood transfusion

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: MATHEMATICAL METHODS IN BIOLOGY...4 In Finland, the frequencies of blood groups are A 44.2% B 16.6% AB 8.1% 0 31.2% Calculate what would be the probability that a blood transfusion

1

MATHEMATICAL METHODS IN BIOLOGY

PART 2

EXERCISES

Eva KisdiDepartment of Mathematics and Statistics

University of Helsinki

© Eva Kisdi.Any part of this material may be copied or re-used only with the explicit permission of the author.

Page 2: MATHEMATICAL METHODS IN BIOLOGY...4 In Finland, the frequencies of blood groups are A 44.2% B 16.6% AB 8.1% 0 31.2% Calculate what would be the probability that a blood transfusion

2

EXERCISES 1-11: CALCULUS OF PROBABILITIES

1. Genetic risks. A couple is facing the risk that their children may suffer from agenetic disorder, because both the husband and the wife are known to be heterozygote(Aa) carriers of the harmful recessive allele a (the homozygote aa individuals areaffected by the disorder, all others are healthy). The couple plans to have twochildren, and wants to know the prospects for their health: What is the probability thatboth children will be healthy?

2. Marriage of relatives. Close relatives are not allowed to marry but if they do, theirchildren are very often affected by serious genetic problems. This is because most ofus carry 2-3 recessive lethal genes and some more that are not lethal but harmful. Inunrelated people, these harmful recessive alleles are most likely in different loci sothat their children are healthy heterozygotes. Descendants of a single person mayhowever carry harmful recessives in the same locus; and if they marry, their childrencan be recessive homozygotes.

The figure shows a pedigree with marriage between cousins. A and B are the sharedgrandparents; C and D are sisters, who marry unrelated persons; E and F are cousins;and G is their child. a denotes a harmful recessive allele present in A. Calculate theprobability that G inherits a from both parents and is therefore a recessivehomozygote aa exhibiting the symptoms of the disorder caused by a. Next, calculatethe probability that G is not a recessive homozygote for any of the 3 unlinked harmfulalleles that A carried and also not for another 3 unlinked alleles that were present inB. (In reality we cannot know that A and B carry exactly 3 harmful alleles each, butthis illustrates the probability of having a healthy child in a cousin marriage withroughly realistic numbers.)

G

E F

C D

A B

G

E F

C D

A B

Page 3: MATHEMATICAL METHODS IN BIOLOGY...4 In Finland, the frequencies of blood groups are A 44.2% B 16.6% AB 8.1% 0 31.2% Calculate what would be the probability that a blood transfusion

3

3. Did Mendel cheat? Mendel found that the seven traits of garden pea he studiedwere inherited independently, i.e., in modern terms as if all seven traits were onseparate chromosomes. The garden pea happens to have seven chromosomes.Assuming that the chromosomes are equally long, calculate the probability that sevenrandomly picked loci are all on different chromosomes.

Actually, this is not the case. The seven traits are seed shape (smooth/wrinkled); seed colour(yellow/green); seed coat colour (white/coloured); pod shape (smooth/constricted); pod colour(yellow/green); flower position (terminal/along stem); plant height (long/short). Of these, seedcolour and seed coat colour are on chromosome 1, pod shape, flower position and plant heightare on chromosome 4, and only the other traits are on separate chromosomes. The loci on thesame chromosome are however far apart, so that they are inherited independently, except podshape and plant height. Mendel did not perform all possible dihybrid crosses and so hehappened not to investigate the pod shape - plant height dihybrid cross. (How many pairs oftraits are there to test independent inheritance for?)

4. Population genetics of sex-linked traits. Drosophila males have a single Xchromosome whereas females have two. White eye is a recessive X-linked trait ofDrosophila. Suppose that we cross white-eyed females with wild (red-eyed) males.

Calculate the frequency of genotypes and the frequency of the white allele (p) in thefemales and in the males of the next generation. Next, suppose that we let the femalesand males born from the previous cross mate randomly among themselves. Plot thefrequency of the white allele in females and in males for several generations, andcalculate the equilibrium frequency of phenotypes.

5. Test of independence. An ecologist collects presence-absence data of two differentspecies of plants in sample quadrats, and wants to know whether the plants occurindependently. The data are

(a) Both plants: 25% of quadrats, plant A only: 25%, plant B only: 5%(b) Both plants: 15% of quadrats, plant A only: 35%, plant B only: 15%

6. Total probability. About 33% of African American people develop high bloodpressure during their lives, whereas in people of Caucasian origin, this occurs onlywith probability 25%. In a town where 40% of people are African Americans and therest are Caucasians, what percentage of people will need care for high blood pressure?

7. Blood transfusion. Current medical protocols of blood transfusion require matchingblood types of the AB0, Rh+/-, and also other minor blood groups, but before bloodgroups were discovered, transfusion was risky due to the blood groupincompatibilities: If the donor has an antigen (A or B in the AB0 system) that therecipient does not have, then the recipient's body produces an immune reaction that iseasily fatal. In the AB0 system, people with blood group 0 or A do not have antigen Band therefore may not receive B or AB blood; similarly people with blood group 0 orB may not receive A or AB blood. Blood of type 0 can be given to anyone and peopleof blood group AB may receive any blood.

Page 4: MATHEMATICAL METHODS IN BIOLOGY...4 In Finland, the frequencies of blood groups are A 44.2% B 16.6% AB 8.1% 0 31.2% Calculate what would be the probability that a blood transfusion

4

In Finland, the frequencies of blood groups are

A 44.2% B 16.6%AB 8.1% 0 31.2%

Calculate what would be the probability that a blood transfusion is fatal if the AB0blood groups were not known.

8. If a rare mutation is present with frequency q in a large population, how manyindividuals does one need to sample in order to be 99% sure that the sample containsat least one mutant for investigation?

9. Eigen's paradox. In prebiotic conditions, where polynucleotides replicated withoutenzymes, the probability of mutation per nucleotide could not be less than 10-2. Asequence must produce at least one mutation-free copy out of ca 5 copies if it is to bemaintained. What is the maximum length of a sequence (primitive "genome") that canbe copied faithfully enough?

The answer is a number much less than the length of the smallest genome.This is known as Eigen's paradox: The primitive genome is not long enoughto code for an enzymatic replication system, but without enzymaticreplication, the genome cannot be longer!

10. The Luria-Delbrück fluctuation test. This test is a simple method to estimatemutation rates in bacteria. Suppose we want to measure the rate of mutation that givesresistance against some toxic material. First, we inoculate 20 test tubes with a smallnumber of non-resistant (wild type) bacteria, and grow the cultures in normal mediumto 108 cells/ml. Then we take a 0.1 ml sample of each of the 20 cultures. The samplesare spread on plates that contain the toxic substance, hence only resistant bacteria cangrow. Out of the 20 samples, we find that 11 samples did not contain any mutant (noresistant bacteria found). Calculate the mutation rate.

Hints: figure out the number of bacteria in the sample. Because each tube started witha small number of bacteria but ended with a large number of them, almost everybacterium in the sample is the product of a cell division: hence the number ofdivisions is approximately the same as the number of cells. If m is the probability ofmutation in one division, you can calculate the probability that no mutation hasoccurred in any of the divisions leading to the cells in the sample; and this must matchthe fraction of samples found to be mutation-free.

11. The birthday problem. Calculate the probability that (at least) two persons sharetheir birth days in a group of (i) 5 people; (ii) 30 people. Hint: calculate first theprobability of the opposite, i.e., that no two persons have the same birthday. Theprobability that my birthday is different from yours is 364/365. Optional: find the size

Page 5: MATHEMATICAL METHODS IN BIOLOGY...4 In Finland, the frequencies of blood groups are A 44.2% B 16.6% AB 8.1% 0 31.2% Calculate what would be the probability that a blood transfusion

5

of the group such that with 99% probability, there are two persons with the samebirthday.

EXERCISES 12-14, A1: BAYES' THEOREM

12. Rare disease screening. A medical test picks out a disease in 100% of the caseswhen it really occurs, but also produces positive results in 5% of healthy people (falsepositives). The disease is known to affect 0.1% of the population. Your test comesback positive. What is the probability that you really have the disease?

13. Prior vs posterior probabilities. The frequency of identical (monozygotic) twinbirths among all human births is about 0.3% and is fairly constant over time andacross populations. The frequency of fraternal (dizygotic) twins was about 1.7% ageneration ago in the US.

(a) Calculate the prior probability that a pair of twins is monozygotic and the posteriorprobability that they are monozygotic if we know that they are of the same sex.

(b) The frequency of fraternal twins is increasing (the present value in the US isaround 3%). How does this modify the prior probability of twins being identical?How does the posterior probability change if the prior decreases?

14. Bayesian statistics. In this problem, we consider a simple coin-tossing experimentfor clarity. However, the same kind of statistics is now used widely fromphylogenetics to artificial intelligence. In principle, it can be used any time when wewant to estimate parameters from experimental data.

Let q be the probability that when tossing a coin, we get a tail, and let 1-q be theremaining probability of getting a head. The coin may be fair (which gives tails andheads equally often, i.e., q=0.5) but may also be loaded either in favour of tails(q>0.5) or in favour of heads (q<0.5). We want to estimate parameter q of a givencoin. To this end, we toss our coin 6 times. Suppose that we get 6 tails in a row. Thetraditional estimate would then be q=6/6=1, predicting that this coin will always givea tail. But we are not happy with this point estimate, because we had a pretty strongbelief that our coin is fair (q=0.5). The Bayesian way of formalizing our prior "belief"is to specify a prior probability of the coin being fair and also the probability ofdeviating from fairness to different degrees.

For simplicity, here we assume that there are only three types of possible coins:

fair coins (q=0.5)tail coins (q=1), andhead coins (q=0)

[This simplifying assumption is of course unrealistic for the coin experiment; we canrelax this assumption later.]

Page 6: MATHEMATICAL METHODS IN BIOLOGY...4 In Finland, the frequencies of blood groups are A 44.2% B 16.6% AB 8.1% 0 31.2% Calculate what would be the probability that a blood transfusion

6

(a) Our confidence in having a fair coin is formalized in the prior probabilities

P(fair coin) = 0.98P(tail coin) = 0.01P(head coin) = 0.01

Calculate the posterior probability of having a fair coin given that we have tossed 6tails out of 6 trials. Calculate the posterior probabilities of head and tail coins, too.

(b) Repeat (a) with the uniform distribution as prior,

P(fair coin) = 1/3P(tail coin) = 1/3P(head coin) = 1/3

Observe how the posterior probabilities change due to changing the prior when yougo from the uniform prior to the prior in (a).

The uniform prior is often called the "uninformative prior", which does not"bias" the resulting posterior probabilities, and is often used for this reason.This view is however questionable; knowing that all coin types are equallylikely is information just as knowing it otherwise. Typical coins in our pursesare much closer to fair than to "tail" or to "head", so in this particular example,the "uninformative" prior is not a very sensible choice. Sometimes the prior isevident from the context (for example, if the prior has to specify theprobability that a randomly chosen person is male or female, then 50-50% willbe taken as prior) whereas at other times the prior is fully ad hoc - but it doesmatter.

(c) What happens to the posterior probabilities if we exclude a possibility in the prior,e.g. we assume P(tail coin)=0? What happens if we have unshakeable faith in the coinbeing fair, i.e., we use the prior P(fair coin) = 1?

A1. Bayes factor. The bacterium Wolbachia is an intracellular parasite of insects,which transmits also via the eggs, and which has the ability to distort the sex ratio ofthe offspring such that Wolbachia-infected females have more daughters than sons.(This is a clever trick on the part of Wolbachia: the host's sons are dead ends forWolbachia because it cannot infect their sperm, but daughters of the host can transmitWolbachia to the next generation through their eggs.) Suppose that infection by aparticular strain of Wolbachia leads to an offspring sex ratio 80% daughters, 20%sons. We believe that Wolbachia is not common in our population, so thatwe give only 9% prior probability that a female would be infected. Additionally, wegive 1% probability that some other factor, like a sex-linked meiotic drive system,distorts the sex ratio to an unknown degree; and we give 90% probability that afemale produces equal sex ratio in her offspring. Hence by the prior probabilities, weare 10 times more sure that the sex ratio is unbiased relative to that the female isinfected with Wolbachia (90%/9%). Investigating only one female, we find that shehas 21 female and only 9 male offspring. How does this change our belief? And whatif the data were 26 female and 4 male offspring?

Page 7: MATHEMATICAL METHODS IN BIOLOGY...4 In Finland, the frequencies of blood groups are A 44.2% B 16.6% AB 8.1% 0 31.2% Calculate what would be the probability that a blood transfusion

7

EXERCISES 15-22: BINOMIAL AND POISSON DISTRIBUTIONS

15. Cohort survival. n = 5 birds are born in the same year. Each bird survives one yearwith probability 0.7, and they are independent of one another.

(a) Plot the distribution (P(k) against k) of the number of birds alive after 1 year; 3years; and 10 years.

(b) How long do we have to wait to be 95% sure that none of the birds is alive?

Hint: use Excel or similar software for the calculations in (a). In Excel, thefactorial n! is computed by FACT(n). If working with a calculator, computeP(k) after 1 year and outline how the rest could be done.

16. Offspring number of highly fecund organisms. A tree may produce tens tohundreds of thousands of seeds during its life, but in an expanding (!) population, onaverage only 1.1 of its seeds survives to become an established mature tree. Theprobability of survival is therefore very small, and the number of surviving seeds pertree follows a Poisson distribution with expectation 1=l .1.

Calculate the probabilities that a tree has 0, 1, 2, 3, ..., k surviving seeds. Plot the dataas a histogram. Take k high enough such that the probability of having more than kseeds is less than 1%.

17. Compare the binomial and Poisson distributions. Use Excel or similar software tomake a histogram of the binomial probabilities (i.e., plot P(k) against k) using theparameters n = 10 and p = 0.25. Then increase n and decrease p such that you keepthe expectation 5.2== npl constant. Prepare a series of histograms of the binomialdistribution with increasing n, and compare them to the histogram of the Poissondistribution with parameter 5.2=l .

18. Stochastic behaviour of membrane channels. A membrane channel, when open,goes closed at a rate a , and when closed, it goes open at a rate b . Consider first avery large number N of channels. Let x(t) be the probability that a channel is open attime t, such that the number of open channels is Nx(t).

(a) Verify that the number of open channels changes according to the differentialequation

NxxNdt

dNx )1( -+-= ba

and since N is a constant, the probability of being open changes according to

Page 8: MATHEMATICAL METHODS IN BIOLOGY...4 In Finland, the frequencies of blood groups are A 44.2% B 16.6% AB 8.1% 0 31.2% Calculate what would be the probability that a blood transfusion

8

)1( xxdtdx

-+-= ba

Find the equilibrium probability of the channel being open (this is the equilibriumfraction of open channels), and evaluate it assuming ab 2= (opening is twice as fastas closing).

(b) Suppose a cell has only n = 18 channels. Calculate the probability that atequilibrium, k = 12 of these are open.

19. Genetic mapping. To establish the distance between two genes on a chromosome,one performs a testcross of double heterozygote and recessive homozygote parents:

abab

abAB

´

The offspring obtained from this cross are partly non-recombinant (having either bothdominant alleles A and B or neither dominant allele) or recombinant (having only A oronly B). Recombinant offspring are produced when there is a crossover between theloci during the meiosis of the double heterozygote parent.

Crossover can occur at any base pair, and therefore the potential number of crossoversis very large. However, the probability of a crossover at any given base pair is small,so that the actual number of crossovers is only a few, and the number of crossoversfollows a Poisson distribution. The expectation of the Poisson distribution, l , isproportional to the physical distance between the two loci.

If the double heterozygote parent had no crossover at all between loci A and B, alloffspring will be non-recombinant. Somewhat surprisingly, any nonzero number ofcrossovers results in 50% recombinant offspring (see

http://www.ncbi.nlm.nih.gov/books/NBK21819/figure/A1116/for a figure demonstrating this fact).

(a) Derive the fraction of recombinant offspring, RF = # of recombinant offspring / #of all offspring x 100%, and plot it as a function of the physical distance measured byl .

(b) Traditionally, a map unit is defined such that 1 map unit corresponds to 1%recombinant offspring. Argue that this definition cannot be extended to large mapdistances associated with high percentages of recombinant offspring; in other words,finding 30% recombinant offspring does not imply that the two loci are 30 timesfarther apart than two loci that exhibit 1% recombinant offspring.

(c) Based on (a), calculate l from the measured fraction of recombinant offspring,RF. Show that a distance of 100)2/( ×l map units is consistent with the traditionaldefinition for small distances.

Page 9: MATHEMATICAL METHODS IN BIOLOGY...4 In Finland, the frequencies of blood groups are A 44.2% B 16.6% AB 8.1% 0 31.2% Calculate what would be the probability that a blood transfusion

9

100)2/( ×l is called the distance in corrected map units. Calculating l fromRF and using the formula 100)2/( ×l yields the true map distance also forlarge distances, and eliminates the problem that several crossovers can happenbetween genes far apart yet these do not increase the fraction of recombinantoffspring compared to a single crossover (cf the result in (a)).

For part (c), use the approximation xe x -»- 1 or xx -»- )1ln( , which holdsas long as x is small; you can prove this approximation by simpledifferentiation. Alternatively, you can demonstrate (c) just with numericalexamples: show that 100)2/( ×l gives roughly the % value of RF if RF waslow, but not if RF was high.

20. Generalizing the Skellam model to perennial plants. In the lecture, we constructedthe model

)exp(11 tt xx a--=+ (1)

for the fraction of living sites x occupied by an annual plant species with per capitaseed number a .

(a) Construct an analogous model for a perennial species, which matures at age 1 andsurvives each subsequent year with probability p (the model above is the special casep = 0). Adult plants are competitively superior to seedlings, i.e., a seed cannot survivein a site occupied by a surviving plant.

(b) Find the equilibrium fraction of occupied sites and using Excel, plot it as afunction of fecundity (a ) for several different values of p.

Hint: it is not possible to solve the equilibrium equation explicitly. Instead,solve the equation for a and plot a as a function of the equilibrium fractionof occupied sites; and then swap the axes to plot the equilibrium fraction ofoccupied sites as a fraction of a .

21. Coexistence by the competition-colonisation trade-off. Generalise the Skellammodel in equation (1) above to the case of two species. Both species are annual.Species 1 with fecundity a is competitively superior to species 2 with fecundity b ,i.e., if a site contains seed(s) of both species, it will be occupied by species 1.

(a) Construct the model equations for the fraction of sites occupied by the superiorand by the inferior species ( 1+tx and 1+ty ), respectively.

(b) As in the original Skellam model, the superior species has a positive equilibriumand therefore is said to be viable if 1>a . Find the condition for the inferior species tobe viable in the presence of the superior species, i.e., find the parameter region ),( bawhere the two species coexist.

Page 10: MATHEMATICAL METHODS IN BIOLOGY...4 In Finland, the frequencies of blood groups are A 44.2% B 16.6% AB 8.1% 0 31.2% Calculate what would be the probability that a blood transfusion

10

22. The Nicholson-Bailey model of host-parasitoid systems. For a parasitoid, a hostindividual is analogous to a living site for a seed in the Skellam model. Assume thateach parasitised host can support the development of B parasitoid larvae, andparasitoid mothers deposit at least B eggs such that each attacked host will indeedrelease B parasitoids. Parasitised hosts die without reproduction, whereas non-parasitised hosts produce F offspring each. Both hosts and parasitoids are annual.

(a) Calculate the probability that a host individual avoids attack assuming that boththe number of hosts ( tH ) and the number of parasitoids ( tP ) are large, and are of thesame order of magnitude (i.e., the ratio of tH and tP is neither very large nor verysmall).

(b) Construct the model of population dynamics, i.e., write down the equations for1+tH and 1+tP in terms of tH and tP .

No one has been able prove what is the long-term behaviour of this model: ithas an equilibrium which is always unstable, and periodic (or quasi-periodic)solutions are not known. You may want to investigate the model numerically,by iterating the host and parasitoid densities from year to year and plotting tPagainst tH .

EXERCISES A2-A5, 23-26: MEAN AND VARIANCE

A2. The table below contains the heights (in cm) and weights (in kg) of five persons.Calculate the average height, the average weight, the covariance between height andweight, and the correlation coefficient in this sample.

height (ξ) 169 175 182 179 177weight (h ) 65 80 76 87 82

Remark: this exercise is for illustration only. Here the sample size (5) is verysmall, and therefore the average height and weight is not necessarily a goodestimate of the true mean height and weight (as it would be in an infinitelylarge sample). This is a problem when we use the average in place of the truemean to calculate variances and covariances (the variance and covarianceformulas contain the true mean, but we have only the average of the sample).There is a way to correct for this problem, discussed in statistics courses.

A3. We cast a die infinitely many times. Calculate the average of the numbers shown.

A4. Suppose we are monitoring how a pair of birds feed their offspring in the nest. Byplacing a scale under the nest and measuring its weight continuously, we can detectwhen a parent arrives to feed the chicks (the weight of the nest increases by theweight of the parent plus food) and measure how much food s/he brought (when the

Page 11: MATHEMATICAL METHODS IN BIOLOGY...4 In Finland, the frequencies of blood groups are A 44.2% B 16.6% AB 8.1% 0 31.2% Calculate what would be the probability that a blood transfusion

11

parent leaves, the weight of the nest decreases by his/her weight, the food remains).The two parents have different weights, so we also know which parent brought howmuch food. From the data, we calculate that the mother brings on average 24 g foodper day, with variance 2.3 g2 (some days she brings more, some days less), whereasthe father brings on average 21 g food per day with variance 3.1 g2. We assume thatthe two parents are independent.

The food the chicks get is, of course, the sum of what the mother and what the fatherbrings, so if ξ1 and ξ2 denote respectively the amount of food what the mother and thefather brings in the next day, then the chicks get a total of ξ1 + ξ2.

(a) What is the mean amount of food the chicks get per day, and what is its variance?(b) If there are 9 chicks in the nest and each gets exactly the same amount of food,what is the mean and variance of the food one chick gets?(c) Do you see any reason why the amount of food brought by the mother and theamount brought by the father may not be independent? Try to judge whether thiswould increase or decrease the variance of the amount the chicks get.

A5. In this exercise, we prove that the binomial distribution with parameters n and phas mean np and variance np(1-p), and then we use this result to obtain also the meanand variance of the Poisson distribution. (The expectation np makes immediateintuitive sense: if we repeat n times and the outcome is "yes" in a fraction p of therepeats, then we expect to get np "yes" outcomes. The variance is perhaps lessobvious.)

(a) Let ξ denote a random variable that takes value 1 with probability p and 0 withprobability 1-p (as with flipping a coin, head is 1 and tail is 0; Bernoulli distribution).Show that E(ξ) = p and V(ξ) = p(1-p) (use the definition).

(b) Let ξ1, ξ2, ..., ξn denote n random variables, each taking value 1 with probability pand 0 with probability 1-p, and each independent of all the others. Convince yourselfthat the sum of these, nxxx +++ ...21 is binomially distributed with parameters n andp.

(c) Prove that the sum nxxx +++ ...21 , i.e., the binomial distribution, has theexpectation np and variance np(1-p). (Use the boxed text in A4 above.)

(d) Now imagine n becoming very large and p very small, but np = λ fixed, so that thebinomial distribution becomes the Poisson distribution. What will be the mean and thevariance?

If we add two random variables, ξ1 and ξ2, then the mean of this sum is (always)the sum of the means, E(ξ1 + ξ2) = E(ξ1) + E(ξ2). The variance of the sum ξ1 + ξ2 isthe sum of the variances, V(ξ1 + ξ2) = V(ξ1) + V(ξ2), but only if ξ1 and ξ2 areindependent. Similar rules apply also when we add more than two randomvariables.

Page 12: MATHEMATICAL METHODS IN BIOLOGY...4 In Finland, the frequencies of blood groups are A 44.2% B 16.6% AB 8.1% 0 31.2% Calculate what would be the probability that a blood transfusion

12

23. h is the average of 10 random numbers that we generate by casting a die. What isthe expectation and the variance of h ?

24. Let η1 and η2 denote the body weight we measure on a randomly chosen pair ofidentical twins. The body weight is the sum of a baseline weight, the effects of genes,and the effect of the environment. Identical twins have the same genes but have(partly) different environmental effects. Their phenotypes are therefore given by

22

11

exh

exh

++=

++=

c

c

where c is the constant baseline weight, ξ is the genetic value (same for both) and ε1,ε2 are the environmental deviations. ξ, ε1, and ε2 are independent, ε1 and ε2 areidentically distributed, and the mean of ε1 and ε2 have been scaled to zero (seelecture).

(a) Calculate the covariance and the correlation coefficient between η1 and η2.

(b) The quotient )(/)( hx VV is called the (broad-sense) heritability, which tells whatfraction of the observable phenotypic variance ( )(hV ) is due to genetic effects. Basedon (a), suggest a practical way to measure the heritability of body weight; andcomment on whether this measurement is correct for the entire (non-twin) population.

25. Variance vs the "accuracy" of measurement. Suppose we would like to measurethe frequency p of a certain trait (genotype, disorder, etc.) in a population. To this end,we take a sample of n individuals and count the number ξ of those in the sample whohave the trait.

(a) Describe the conditions under which ξ is a binomially distributed random variable.

(b) One has the intuitive feeling that with larger sample size n, variation shouldsomehow dampen and our measurement should become more accurate. Does thevariance of ξ become smaller as n increases? Or the standard deviation,

)()( xx VD = ? Or the coefficient of variation, )(/)( xx EDc = ? Or the variance ofthe estimated frequency, )/( nV x ?

26. Chemical reactions: how many molecules are "infinitely many"? Suppose thatthere are N enzyme molecules and a large number M inhibitor molecules present in awell-mixed system. The corresponding concentrations are x for the free enzyme, y forthe enzyme-inhibitor complex and z for the free inhibitor. The enzyme binds theinhibitor at rate a and the inhibitor dissociates from the enzyme-inhibitor complex atrate b .

If N is sufficiently large, we can model this system with the differential equations

Page 13: MATHEMATICAL METHODS IN BIOLOGY...4 In Finland, the frequencies of blood groups are A 44.2% B 16.6% AB 8.1% 0 31.2% Calculate what would be the probability that a blood transfusion

13

yzxdtdy

yzxdtdx

ba

ba

-=

+-=

We assume throughout that z is constant; if M is very much larger than N, then thechange in the number of free inhibitor molecules is negligible even if all N enzymemolecules happen to bind an inhibitor.

Consider now the (realistic) case that N is not very large, and denote the number offree enzyme molecules (a random variable) with x . The number of enzyme-inhibitorcomplexes is then x-N .

(a) Show that x is binomially distributed and calculate its mean and variance inequilibrium.

(b) Say that the variation in x and in x-N is negligible and the deterministic ODEmodel is applicable if the coefficient of variation of both x and of x-N are less than0.01. How large should N be to achieve this? (Recall that the coefficient of variationis )()()( xxx EVc = .)

EXERCISE 27: EXPONENTIAL DISTRIBUTION

27*. How do stupid animals forage optimally? (Based on Adler & Kotar (1999),Evolutionary Ecology Research 1:411-421.)

Assume that an animal arrives in a fresh patch of resource at time t=0, and thefunction ( ) 1 tg t e b-= - gives the amount of resource it has consumed by time t (thetotal resource content of a patch is scaled to 1). The animal leaves the patch at aconstant rate a , such that the time spent in the patch t is exponentially distributedwith probability density function ( ) tf t e aa -= . (It is of course stupid to leave thepatch too early when it still has a lot to consume; it is also stupid to stay too longwhen the patch is depleted. But a simple organism may be unable to judge time orresource level, and may leave at a constant rate.)

(a) Calculate the expected amount of resource eaten before leaving. (This is given byan integral that you can calculate explicitly.)

(b) Denote the expected amount of resources eaten (calculated above) by A. The long-term energy intake over many rounds of foraging in a patch and travelling to a newpatch is A / (S + T), where S is the expected time spent in the patch and T is theexpected travel time to a new patch. S depends on the leaving rate a ; as we discussedin Part 1 of the course, a/1=S . T is independent of what the animal does within thepatch and we assume it to be known.

Find the optimal value of a , i.e., the value for which A / (S + T) is the greatest.

Page 14: MATHEMATICAL METHODS IN BIOLOGY...4 In Finland, the frequencies of blood groups are A 44.2% B 16.6% AB 8.1% 0 31.2% Calculate what would be the probability that a blood transfusion

14

EXERCISES 28-32: NORMAL DISTRIBUTION

28. To be a pilot-astronaut, NASA requires that the candidate's height is between 163cm and 193 cm. The height of people is normally distributed, the mean height of menin the US is 175 cm with variance 35 cm2. What is the probability that a randomlychosen male US citizen meets NASA's height requirement?

29. A random variable x follows the standard normal distribution N(0,1). Find thenumber u such that x falls between -u and u with probability 95%.

30. Truncation selection in animal breeding. The distribution of body weight isnormal. An animal breeder wants to select the heaviest animals for reproduction in apopulation where the mean weight is 10 units and the variance is 5 unit2. This iscommonly done by truncation selection: All animals above a threshold T of weightare bred, whereas those below T are not allowed to reproduce (see figure below). Thebreeder, however, must reproduce a certain fraction of the population in order tomaintain the number of animals. This fraction depends on fecundity (if one animal hasa lot of offspring, then a few parents are enough to produce the next generation;otherwise more parents are needed to produce as many offspring as many animals thebreeder had in the initial population).

Calculate T if the breeder has to select for reproduction

(a) 5% of the animals(b) 60% of the animals

31. Confidence interval of an average. We measure a certain random variable x in asample and calculate the sample average x . Clearly, if we calculated the average inanother sample, we would obtain a somewhat different result for x , i.e., x itself is arandom variable. The question is, how reliable the sample average x is as an estimateof the true mean.

Denote the true mean with m and the variance of x with V (here we assume that V isknown). Let n denote the number of individuals in the sample. We assume that thesample is relatively large (n is not too small).

T trait

prob

abilit

yde

nsity

fractionreproduced

T trait

prob

abilit

yde

nsity

fractionreproduced

Page 15: MATHEMATICAL METHODS IN BIOLOGY...4 In Finland, the frequencies of blood groups are A 44.2% B 16.6% AB 8.1% 0 31.2% Calculate what would be the probability that a blood transfusion

15

(a) Show that the transformed random variablenV /mx - follows the standard normal

distribution N(0,1).

(b) Find the number u such that ( ) 0.95P u z u- £ £ = (use the result of exercise 26above).

(c) Find numbers c1 and c2 such that 1 2( ) 0.95P c cm£ £ = , if the sample average is164x = , the known variance of x is 62V = and the size of the sample is 30n = .

The interval [ ]1 2,c c is said to be the confidence interval for the unknown true meanm , with confidence level 95%.

There are two important points to mention here. First, can we say that "with95% probability, the true mean is in the confidence interval"? It is a factwhether m is in the calculated interval [ ]1 2,c c or not; only we do not knowthis fact. Strictly speaking, what we can say is this: if we repeated samplingand calculated [ ]1 2,c c from each sample, then 95% of these intervals willcontain the true mean. This is illustrated with the figure below (fromWikipedia). Each sampling yields a confidence interval (vertical lines), whichdepend on the sample average and are thus generally different. Most of themcover the true mean, but an expected 5% does not.

Second, in this example we assumed that the variance V is known, whereas inpractice, we must estimate V also from the sample we have. The estimatedvariance is however a random variable, not a constant (just the same way asthe sample average is a random variable). If we substitute the estimated

variance for V innV /mx - , then it is not normally distributed (because it

contains division by a random variable) but follows the so-called Student's t-distribution when x is normally distributed. The t-distribution is however verysimilar to the normal distribution if the sample size n is sufficiently large. Forsmall samples, the calculation of the confidence interval goes similarly to thisexercise but one has to use tables of the t-distribution (in part (b)) rather thanthe standard normal distribution.

32. Hypothesis testing. Suppose that in a large and well-known population, a normallydistributed quantitative trait has mean m = 143 and variance V = 441. One culturederived from this population, however, appears to be different, because its average

Page 16: MATHEMATICAL METHODS IN BIOLOGY...4 In Finland, the frequencies of blood groups are A 44.2% B 16.6% AB 8.1% 0 31.2% Calculate what would be the probability that a blood transfusion

16

trait value is only 134=x . This deviating average was calculated from measuring 25individuals. Could the difference between the sample average x and the knownpopulation mean m be due to statistical fluctuations, or is there reason to suspect thatsomething unusual happened to this culture?

Start with the null hypothesis that the 25 individuals represent a random sample fromthe population, and calculate the probability that the average trait value x of 25randomly selected individuals differs from m by 9134143 =-=- mx or more. Ifthis probability is small (e.g. less than 5%), then we reject the null hypothesis and saythat the difference between the measured average of the 25 individuals and the knownmean trait value of the population is statistically significant, so that the culture is(probably) not just a random sample from the population.

(a) Argue that x is a normally distributed random variable. (Recall that the average iscalculated as the sum of trait values divided with n = 25.)

(b) Determine the mean and the variance of x under the null hypothesis that the 25individuals are a random sample from the large population.

(c) Calculate the probability ( ) )152()134()9 ³+£=³- xxx PPmP under the nullhypothesis and decide if we should reject the null hypothesis.

Page 17: MATHEMATICAL METHODS IN BIOLOGY...4 In Finland, the frequencies of blood groups are A 44.2% B 16.6% AB 8.1% 0 31.2% Calculate what would be the probability that a blood transfusion

17

SOLUTIONS

1. The probability that a child is aa and therefore has the disorder is 1/4. One child ishealthy with probability 1-1/4 = 3/4; the two children are independent and thereforeboth are healthy with probability 3/4 x 3/4 = 9/16.

2. a is inherited from A to C with probability 1/2; if so, then it is inherited from C to Ewith probability 1/2; and if so, then G inherits it from E with probability 1/2. Theprobability that the paternally derived allele of G is a is therefore 1/8. The maternallyderived allele of G is a also with probability 1/8, and the maternal line is independentof the paternal line. G is thus aa with probability 1/64, and healthy, concerning onlythe harmful allele a, with probability 63/64. The same calculation applies to any of the6 harmful alleles in A and B, and the unlinked alleles are inherited independently. Gis not a homozygote for any of the 6 harmful recessives with probability (63/64)6 or90.98%. Cousins are the closest relatives allowed to marry in most Western societies,but even these marriages carry a fairly high risk of conceiving a disordered child.

3. To have 7 loci on 7 different chromosomes, the first of the 7 loci can be on anychromosome; the next locus can be on any of the remaining 6 chromosomes, whichhas probability 6/7; the next locus can be on 5 chromosomes, which has probability5/7; etc. The probability of having all 7 loci on different chromosomes is therefore

006.071

72

73

74

75

761 »××××××

which is a very small probability. This simple argument started the rumour thatMendel might have cheated. More detailed analysis (taking into account that thechromosomes are not equally long, that loci on the same chromosome but far apart areinherited ca independently; and that Mendel did not perform every single cross) hashowever showed that the real probability of getting Mendel's data is not this low, andthere is no ground of accusing Mendel of cheating.

4. Let pf and pm denote the frequency of the recessive "white" allele w in females andin males, respectively; at the beginning, pf = 1 and pm = 0. In the next generation,

fm pp =¢ because males inherit all their X chromosomes from females; and

2mf

f

ppp

+=¢ because females inherit half their X chromosomes from females and

half from males. Then the same story repeats in each generation: the next malefrequency is the present female frequency, and the next female frequency is theaverage of the present male and female frequencies. The allele frequencies oscillate asshown below.

Page 18: MATHEMATICAL METHODS IN BIOLOGY...4 In Finland, the frequencies of blood groups are A 44.2% B 16.6% AB 8.1% 0 31.2% Calculate what would be the probability that a blood transfusion

18

0

0.2

0.4

0.6

0.8

1

0 2 4 6 8 10

femalemale

The overall frequency of allele w is mf pp31

32

+ , because 2/3 of the X chromosomes

are in females and 1/3 are in males. It is easy to see that the overall frequency remainsconstant from generation to generation:

mffmf

mf ppppp

pp31

32

31

232

31

32

+=++

=¢+¢

Since mf pp31

32

+ remains the same in each generation, it is always the same as it

was in the beginning,320

311

32

=×+× . In equilibrium, mm pp =¢ (frequencies do not

change) and therefore fm pp = , i.e., females and males will have the same allele

frequency. Because we still have that the overall allele frequency is32 , both female

and male allele frequencies must converge to this value. In equilibrium, therefore, thephenotypic frequencies are

white males: 2/3 of malesred males: 1/3 of males

white females:94

32

32

=× of females

red females (all non-white): 5/9 of females

5. (a) The frequency of quadrats where plant A is present (either with B or alone) isP(A)=0.25+0.25=0.5; similarly, P(B)=0.25+0.05=0.3. If the two plants are present orabsent independently of each other, then P(A and B) should equal P(A) times P(B).But this is not the case: the product of P(A) and P(B) is 0.15, whereas P(A and

Page 19: MATHEMATICAL METHODS IN BIOLOGY...4 In Finland, the frequencies of blood groups are A 44.2% B 16.6% AB 8.1% 0 31.2% Calculate what would be the probability that a blood transfusion

19

B)=0.25. Therefore, the plants occur together more often than they would if theirspatial distribution is independent. (They might need the same type of environmentalconditions, e.g. may both occur in wet sites; or may depend on each other asmutualists.)

(b) The same calculation shows that with these numbers, the plants occurindependently of each other.

6. Denote the event of having high blood pressure with HB; African American with Aand Caucasian with C. Then the data are

6040250

330

., P(C).P(A).C)P(HB

.A)P(HB

==

=

=

The total probability of HB is 282.0)()()()( =+ CPCHBPAPAHBP .

7. If the recipient is of blood group A, than it receives the wrong blood withprobability P(B)+P(AB)=0.247. By the analogous calculation for each blood group,we obtain the conditional probabilities

688.0)0(1)()()()0(

0)(

523.0)()()(

247.0)()()(

=-=++=

=

=+=

=+=

PABPBPAPfatalP

ABfatalP

ABPAPBfatalP

ABPBPAfatalP

The total probability of a fatal transfusion is

4106.0312.0688.0081.00166.0523.0442.0247.0)0()0()()()()()()()(

=×+×+×+×

=+++= PfatalPABPABfatalPBPBfatalPAPAfatalPfatalP

8. ln 0.01 ln(1 ) 4.6 /q q- »

9. The probability of mutation per nucleotide is 0.01 (or more, which wouldmake life only more difficult), and at least 20% must be mutation-free. If the sequenceis n nucleotides long, then the probability that there is no mutation is 0.99n (0.99 foreach nucleotide, and there are n nucleotides). The higher n, the smaller 0.99n is. So themaximum length n is determined by the equation 0.99n =0.2, which solves ton=160.138, but of course n should be an integer, so n=160.

Page 20: MATHEMATICAL METHODS IN BIOLOGY...4 In Finland, the frequencies of blood groups are A 44.2% B 16.6% AB 8.1% 0 31.2% Calculate what would be the probability that a blood transfusion

20

11. (i) 0.027; (ii) 0.706. In a group of 57 people, with 99% probability there aretwo with the same birthday.

10. 85.98 10m -= ×

12. The data are (+ denotes a positive test):

001.0)(05.0)|(

1)|(

==+=+

diseasePhealthyPdiseaseP

and we want to know )|( +diseaseP , the probability of having the disease if we havea positive test. By Bayes' theorem,

)(001.01

)()()|()|(

=+

+=+

PPdiseasePdiseasePdiseaseP

We calculate P(+) from the law of total probability:

05095.0999.005.0001.01)()|()()|()(

=×+×==+++=+ healthyPhealthyPdiseasePdiseasePP

Substituting this, we finally obtain

0196.005095.0

001.0)(

001.01)|( ==+

×=+

PdiseaseP

i.e., the probability of having disease after the positive test is only 1.96%. This isbecause the disease is much rarer than a false positive test. Obviously, a ca 2% risk isa high risk when it comes to human life, so a positive test must absolutely be followedup to determine if the disease is present. But there is no need for panic at this point.

13. (a) The frequency of twin births is 0.3% + 1.7% = 2%, and the fraction ofmonozygotic twins among all twins is 0.3%/2% = 0.15. This is the prior probability,i.e., the probability of a pair of twins to be monozygotic when we have no furtherinformation e.g. about their sex. When we know that the two twins are of the samesex, then we use Bayes' theorem to include this information:

)()()|()|(

sexsamePcmonozygotiPcmonozygotisexsamePsexsamecmonozygotiP =

Monozygotic twins are always of the same sex, i.e., 1)|( =cmonozygotisexsameP .To calculate the probability that twins have the same sex, we use the law of totalprobability,

Page 21: MATHEMATICAL METHODS IN BIOLOGY...4 In Finland, the frequencies of blood groups are A 44.2% B 16.6% AB 8.1% 0 31.2% Calculate what would be the probability that a blood transfusion

21

)()|()()|()(

dizygoticPdizygoticsexsamePcmonozygotiPcmonozygotisexsamePsexsameP

++=

1)|( =cmonozygotisexsameP as above. Because dizygotic twins are geneticallysiblings, they have 50% chance of having the same sex just like any other pair of sibs,so that 5.0)( =sexgoticsameP . Substituting these and 15.0)( =cmonozygotiP ,

85.015.01)( =-=dizygoticP , we get 575.0)( =sexsameP (note that this is higherthan 0.5 because of the relatively high frequency of monozygotic twins among twins;in the general population twins are so rare that the effect of monozygotic twins isnegligible). Finally, the posterior probability of the twins being monozygotic is

2609.0575.0

15.01)(

)()|()|(

=

==sexsameP

cmonozygotiPcmonozygotisexsamePsexsamecmonozygotiP

(b) Repeating the above procedure with 3% dizygotic twins and therefore the priorprobability 0.3%/3% = 0.1 yields the posterior probability 0.1818. As the priorprobability decreases (now it is only 10% rather than 15%), the posterior probabilitydecreases, too.

14. (a) If the coin is a fair coin (q = 0.5), then the probability of having 6 tails out of 6flips is 015625.05.0)5.0|( 6 ===qdataP . If the coin is a tail coin, then it willcertainly give 6 tails ( 1)1|( ==qdataP ) and if it a head coin, then it cannot give sixtails ( 0)0|( ==qdataP ). Substituting these and the prior given in the exercise intothe law of total probability, the probability of the 6 tails is

0253125.001.0198.0015625.00)1()1|()5.0()5.0|()0()0|()(

=×+×+====+==+=== qPqdataPqPqdataPqPqdataPdataP

Then Bayes' theorem yields

39506.00253125.0

01.01)(

)1()1|()|1(

00253125.0

01.00)(

)0()0|()|0(

60494.00253125.0

98.0015625.0)(

)5.0()5.0|()|5.0(

===

==

===

==

===

==

dataPqPqdataPdataqP

dataPqPqdataPdataqP

dataPqPqdataPdataqP

These results shake our belief in the coin being a fair coin: before the data, weassigned 98% probability to a fair coin in the prior, which is now reduced to 60.5%.We still think that the coin is most likely a fair coin, but not with a large margin(60.5% vs 39.5% to the tail coin).

Page 22: MATHEMATICAL METHODS IN BIOLOGY...4 In Finland, the frequencies of blood groups are A 44.2% B 16.6% AB 8.1% 0 31.2% Calculate what would be the probability that a blood transfusion

22

This example is obviously a caricature of reality: Coins may have any level of bias,i.e., q can take any value between 0 and 1. We could however extend the aboveexample including many more values of q (for example, q = 0, 0.01, 0.02, ..., 0.99, 1instead of q = 0, 0.5, 1), only the computational burden would increase, the principleis the same. We then obtain a posterior probability for any possible value of q, whichwe can plot as a function of q. We conclude that q likely has the value around whichthe posterior probabilities are the highest. If the posterior probability clearly peaksaround some value of q, we can be fairly certain that the true value of q is there. If theposteriors are spread out more evenly, then we remain uncertain about what q mayreally be.

(b) Repeating the same with the uniform prior given in the exercise, we get

( )( )( )

0.5 0.015385

0 0

1 0.984615

P q data

P q data

P q data

= =

= =

= =

Now the tail coin is of the highest posterior probability. The data clearly favour thetail coin, and if we did not have particular confidence in the coin being fair, then wewould abandon the hypothesis of fairness. Yet the uniform prior itself is not veryplausible: I do believe that most coins in my purse are close to fair. The mostimportant message is, the prior matters!

(c) If we exclude a possibility in the prior, the corresponding posterior probability willalways be zero. For example, if we use P(q = 1) = 0 in the prior, then we get

0)(

0)1|()|1( =×=

==dataP

qdataPdataqP

If we assume P(q = 0.5) = 1 and hence P(q = 0) = 0 and P(q = 1) = 0, then allposterior probabilities other than the fair coin's probability will be zero.

A1. Let H0 denote the hypothesis that the female is normal, H1 that the female isinfected with Wolbachia, and H2 that there is some other sex ratio distortion.The prior distribution is P(H0)=0.9, P(H1)=0.09, and P(H2)=0.01.

With 21 females and 9 males, the Bayes factor is

1972.02.08.0

5.0

2.08.02630

5.05.02630

)|()|(

921

30

921

921

1

0 =×

=××÷÷

ø

öççè

æ

××÷÷ø

öççè

æ

=HdataPHdataP

Since we have 10)(/)( 10 =HPHP in the prior distribution, we have for theposterior

Page 23: MATHEMATICAL METHODS IN BIOLOGY...4 In Finland, the frequencies of blood groups are A 44.2% B 16.6% AB 8.1% 0 31.2% Calculate what would be the probability that a blood transfusion

23

972.1)()(

)|()|(

)|()|(

1

0

1

0

1

0 ==HPHP

HdataPHdataP

dataHPdataHP

With these data, we still believe the null hypothesis (normal offspringproduction) more than the Wolbachia hypothesis, but much less strongly thanbefore the data (only by factor ca 2 instead of factor 10).

With 26 female and 4 male offspring we have the Bayes factor

0001925.02.08.0

5.0

2.08.02630

5.05.02630

)|()|(

426

30

426

426

1

0 =×

=××÷÷

ø

öççè

æ

××÷÷ø

öççè

æ

=HdataPHdataP

so that

001925.0)()(

)|()|(

)|()|(

1

0

1

0

1

0 ==HPHP

HdataPHdataP

dataHPdataHP

With the second set of data, our belief changes dramatically: now it is muchmore likely that the female is infected with Wolbachia than that the female isnormal. The figure below shows the Bayes factor as a function of the numberof females out of 30 offspring, between 18 and 26. With 18 and 19 females,the Bayes factor is greater than 1 (and with fewer than 18 females, muchgreater than 1), so that the data sway our belief towards the null hypothesis ofnormal offspring production. 18 or 19 females out of 30 offspring is of coursemore than 50%, but not as far more as what we would expect from Wolbachia,which causes a 80:20 offspring sex ratio. With 20 or more female offspring,the data sway away from the null hypothesis and towards Wolbachia. This isonly between the null hypothesis and Wolbachia; we cannot say anythingabout other possible distortions (H2), because it was not specified how theywould affect the offspring sex ratio.

Page 24: MATHEMATICAL METHODS IN BIOLOGY...4 In Finland, the frequencies of blood groups are A 44.2% B 16.6% AB 8.1% 0 31.2% Calculate what would be the probability that a blood transfusion

24

15. (a) To obtain the probabilities that 0, 1, ..., 5 birds survive one year, substitute n =5 and p = 0.7 into the binomial formula and calculate its result for k = 0, 1, ..., 5. Theresults are shown in the table below, see the column for t=1.

The probability that a single bird survives 3 years is p = 0.73 = 0.343. To get theprobabilities that 0, 1, ..., 5 birds are alive after 3 years, repeat the above with p =0.343. For five years, use p = 0.75 = 0.16807. Note in the results that as p decreases, itis increasingly unlikely that many birds are alive, so that most probability goes to thecases of only a few birds alive (and eventually to the case of 0 bird alive).

(b) If we wait t years, the probability that a single bird is alive is 0.7t, and theprobability that a single bird is dead is 1-0.7t. The probability that all five birds aredead is therefore (1-0.7t)5. To be 95% sure that all five birds are dead means requiringthat (1-0.7t)5 = 0.95. Taking logarithms, we have ln(1-0.7t) = ln(0.95)/5 = -0.01026(this helped to get rid of the 5th power). Now taking exponentials again, 1-0.7t =e-0.01026 = 0.98979, so that 0.7t = 1 – 0.98979 = 0.01021, then t = ln(0.01021)/ln(0.7) =12.85. Since we monitor the population once a year, we shall see no birds with (atleast) 95% probability after 13 years.

16. To calculate the probability that, with λ=1.1, a tree will have no surviving seeds,

substitute k = 0 into the Poisson formula: 33287.0!0

)0( 1.10

=== -- eeP ll . Similarly,

substitute k = 1, 2, ... to get the probabilities of the tree having 1, 2, ... surviving seeds;the results are shown in the histogram below, with the exact values written above thecolumns. To know when k is high enough, we add up the probabilities calculated for0, 1, ..., k; when the sum gets above 99%, we know that the probability of havingmore than k offspring is less than 1%. The values shown in the histogram add up to0.9945, and therefore in this example, having more than 4 surviving offspring has aprobability 1-0.9945 < 0.01. (Note that k = 3 would not be enough.)

# alive t=1 t=3 t=100 0.00243 0.12241 0.866521 0.02835 0.31954 0.125942 0.1323 0.33365 0.007323 0.3087 0.17419 0.000214 0.36015 0.04547 3.09·10-6

5 0.16807 0.00475 1.8·10-8

Page 25: MATHEMATICAL METHODS IN BIOLOGY...4 In Finland, the frequencies of blood groups are A 44.2% B 16.6% AB 8.1% 0 31.2% Calculate what would be the probability that a blood transfusion

25

0.3328710840.366158192

0.201387006

0.073841902

0.020306523

0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0 1 2 3 4

17. The black columns of the following three charts show the histogram of thebinomial distribution for n = 10, 20 and 150, and p such that np = 2.5. Thewhite columns are the histogram of the Poisson distribution for 5.2=lfor comparison. For large n, the two distributions are very similar.

Page 26: MATHEMATICAL METHODS IN BIOLOGY...4 In Finland, the frequencies of blood groups are A 44.2% B 16.6% AB 8.1% 0 31.2% Calculate what would be the probability that a blood transfusion

26

18. (a)ba

b+

=x̂ ; with ab 2= , this yields32

22ˆ =+

=aa

ax . Notice that the

equilibrium does not depend on the absolute speed of opening and closing(i.e., on the values of b and a ), only on their ratio. When opening is twiceas fast as closing, the number of open channels is twice as high as the numberof closed channels (2/3 vs 1/3) so that the number of all closing events is thesame as the number of all opening events ( ba )3/1()3/2( = ).

(b) The channels are independent of each other and each channel is open withprobability 3/2ˆ =x . The number of open channels is therefore binomially

distributed. 19627.031

32

1218

)12(612

=÷øö

çèæ

÷øö

çèæ

÷÷ø

öççè

æ==kP

19. (a) The number of crossovers is Poisson distributed with parameter λ. Thereforethe probability that there is no crossover is e-λ, in which case there are no recombinantoffspring; and the probability that there is at least one crossover is 1–e-λ, in which casehalf the offspring are recombinant. Putting this together, we get )1(2

1 l--= eRF(multiply this with 100% if you want to express RF in %). RF is plotted as a functionof λ below.

0

0.1

0.2

0.3

0.4

0.5

0 1 2 3 4 5

(b) From the graph above, finding 30% recombinant offspring (RF=0.3 on the verticalaxis) corresponds to λ = 0.92, whereas RF=0.01 (1% recombinant offspring)corresponds to λ = 0.02. The difference in λ, and therefore in the physical distance, isclearly not 30-fold. This is because the graph is concave, saturating to the value 0.5.Very large physical distances correspond to 50% recombinant offspring, i.e., 50%recombinant offspring is not only 50 map units distance.

(c) Expressing λ from )1(21 l--= eRF , we get )21ln( RF--=l (where RF is the

measured fraction of recombinant offspring). If 2RF is small (such as a few % atmost), then )21ln( RF-- is approximately equal to 2RF, so that λ/2 is approximatelyequal to RF. The traditional definition is that 1 map unit corresponds to RF=1%. Tokeep this definition for short distances, we define 1 map unit as the distance thatcorresponds to λ = 0.02. In other words, if we measure any value of RF, we calculate

)21ln( RF--=l , and give the distance as λ/0.02 = (λ/2)×100 map units.

Page 27: MATHEMATICAL METHODS IN BIOLOGY...4 In Finland, the frequencies of blood groups are A 44.2% B 16.6% AB 8.1% 0 31.2% Calculate what would be the probability that a blood transfusion

27

20. (a) xt denotes the fraction of sites occupied by a plant in year t. In the next year(t+1), a fraction p of the plants are still alive, and they occupy a fraction pxt of allsites. Sites in the remaining fraction (1 – pxt) will be occupied if they have received atleast one seed, which happens with probability txea-1 (see lecture). Hence thefraction of occupied sites in year t+1 is )1)(1(1

txttt epxpxx a-

+ --+= .

(b) At equilibrium, tt xx =+1 (which we can now denote simply by x), i.e., we have)1)(1( xepxpxx a---+= as the equation that determines the equilibrium value of x.

Unfortunately, this equation cannot be solved (we cannot write an explicit expressionthat would give x in terms of the parameters p and α). What we can do in this case, wecan solve the equation for the parameter α, yielding

÷÷ø

öççè

æ--

-=pxx

x 11ln1

a

From this, we can calculate α for many different values of x (imagine a large table oftwo columns, which contain α and x values that belong to each other); and then wecan plot these data. The chart below shows x (vertical axis) as a function of a(horizontal axis) for p = 0.8, 0.5, 0.2, 0 (from left to right). The rightmost curve(p = 0) corresponds to the original Skellam model for annual plants.

Notice that we need a minimum value for α to obtain a positive x. This means that theplant must have a certain amount of seeds (α) to be viable, i.e., to maintain a positivepopulation size. We can calculate this minimum α by substituting a very small x intothe formula above (substituting 0 directly is not possible because then the logarithmof 1, which is 0, divided with 0, i.e., we get a meaningless expression). From thegraph, it appears that α = 1 is the minimum for p = 0 (rightmost curve). This makessense heuristically: an annual plant must make at least 1 seed to replace itself in thenext generation. (Note that here we count only seeds that manage to arrive at a livingsite; in reality, many seeds are lost because they fall outside any living site, so that theplant must make more seeds to compensate for this loss.) Similarly, if the plantsurvives with probability p, then it needs at least 1 – p seeds to replace itself in case itdies; this gives the minimum α for the perennial plants.

Page 28: MATHEMATICAL METHODS IN BIOLOGY...4 In Finland, the frequencies of blood groups are A 44.2% B 16.6% AB 8.1% 0 31.2% Calculate what would be the probability that a blood transfusion

28

21. (a)

( ))exp(1)exp()exp(1

1

1

ttt

tt

yxyxx

baa

---=--=

+

+

(b) The equilibrium equations

( ))exp(1)exp()exp(1

yxyxx

baa

---=--=

cannot be solved for x and y directly. Solving for the parameters ),( ba yields

÷øö

çèæ

---=

--=

xy

y

xx

11ln1

)1ln(

b

a

One can vary x and y between 0 and 1 and calculate the corresponding rangeof ),( ba numerically.

In particular, if y is close to zero, thenx

yx

y-

-»÷øö

çèæ

--

111ln and

x-»

11b so

thatx-

>1

1b is necessary for the inferior species to be viable. Vary x

between 0 and 1 and calculate the corresponding pairs of parameter valuesxx /)1ln( --=a and )1/(1 x-=b . Plot the resulting b 's against the a 's to

arrive at the curve in the figure below.

0123456789

10

0 1 2 3

alpha

beta

1<a implies that the superior species is not viable, and therefore the inferiorspecies is viable if and only if 1>b . When 1>a , the superior species ispresent and the inferior species is viable when it's fecundity is sufficientlylarge to compensate for the loss of seeds to unsuccessful competition with thesuperior species; i.e., the inferior species is present and the two species coexistwhen 1>a and b is above the curve.

Page 29: MATHEMATICAL METHODS IN BIOLOGY...4 In Finland, the frequencies of blood groups are A 44.2% B 16.6% AB 8.1% 0 31.2% Calculate what would be the probability that a blood transfusion

29

22. (a) The expected number of attacks on a given host is proportional to theparasitoid density, i.e., tPal = . The number of attacks is Poisson distributed,and therefore the probability of avoiding attack is tPe a- .

(b)

BPHPFPHH

ttt

ttt

))exp(1()exp(

1

1

aa

--=-=

+

+

A2. Average height: E(ξ) = 176.4 cm, average weight: E(η) = 78.0 kg, varianceof height: V(ξ) = 19.04 cm2, variance of weight: V(η) = 54.8 kg2, covariancebetween height and weight: COV(ξ, η) = 21.6 cm kg, correlation r = 0.6687.(The variances are uncorrected.) Use the definitions of the variance, thecovariance, and the correlation coefficient to obtain these results.

A3. 5.36615

614

613

612

611

616

1=×+×+×+×+×+×=å

=iii xp

A4. (a) Adding the means obtained from the mother and from the father, themean of the daily food is 45 g. Because the parents are independent, we canadd the variances, arriving at the variance 5.4 g2.

(b) Let η denote the total amount of daily food (η = ξ1 + ξ2); from (a), weknow that E(η) = 45 g and V(η) = 5.4 g2. The total is divided equally among 9chicks, so that each one gets η/9. The expectation of this is E(η/9) = (1/9)E(η)= 45/9 g = 5 g (recall that a constant, such as 1/9, can be factored out from theexpectation). The variance is V(η/9) = (1/9)2 V(η) = 5.4/81 = 0.0667 g2 (recallthat to factor out a constant from a variance, we have to square it).

(c) The amount of food brought by each parent may depend e.g. on theweather. If so, the "bad" days are bad for both parents, and the "good" days aregood for both, so that the amount food the mother brings in one day correlatespositively with the amount of food the father brings. This correlation impliesthat there will be "very bad" and "very good" days for the offspring, i.e., thevariance of the daily food will be higher than what we calculated above. Orelse it may be the case that when one parent does not work very hard, then thebegging chicks induce the other parent to put in extra effort. This yields anegative correlation between the amounts of food the two parents bring, andby buffering against the variation, this effect reduces the variance of the dailyfood.

A5. (a) pppE =×-+×= 0)1(1)(x ,[ ] )1(0)1(1)()()( 222222 pppppppEEV -=-=-×-+×=-= xxx

(b) The sum counts the number of "yes" outcomes.

Page 30: MATHEMATICAL METHODS IN BIOLOGY...4 In Finland, the frequencies of blood groups are A 44.2% B 16.6% AB 8.1% 0 31.2% Calculate what would be the probability that a blood transfusion

30

(c) Each ξ has mean p and variance p(1-p). Therefore the sum of n terms hasmean np and (since they are independent) variance np(1-p).(d) The mean remains λ = np. When p is small, (1-p) is ca 1, so that thevariance becomes np, which is the same as λ. The Poisson distribution hasonly one parameter, λ, which is its mean and also its variance.

23. 5.3)( =hE , 29167.0)( =hV

24. (a) )(),( 21 xhh VCOV = ,)()(

hx

VVr =

(b) The correlation coefficient can be measured from the twin data. From (a),this equals the heritability.

A problem with this measurement is that twins share more than their genes;part of the environmental effects (important childhood effects, for example)are also the same, which means that we underestimate the variance ofenvironmental effects. This can be corrected if we consider also fraternaltwins, who share the environment ca to the same extent as identical twins butshare less of their genes.

25. (b) As n increases,

)1()( pnpV -=x increases)1()( pnpD -=x increases

nppEDc -

==1)(/)( xx decreases (c is a measure of variability

relative to "typical" values, such as the mean; with increasingn, variation indeed decreases on the scale of typical values)

nppVnnV /)1()()/1()/( 2 -== xx decreases (with large n, we aregetting the frequency more precisely)

26. (a) In equilibrium, zxy ab = and therefore the fraction of free enzymes is

zxzxx

yxx

abb

ba +=

+=

+ )/(. The enzyme molecules are independent of

each other (as long as the number of inhibitor molecules is much higher thanthe number of enzyme molecules). Therefore ξ is binomial with parameters N

andz

pab

b+

= (note that z is considered constant).

Page 31: MATHEMATICAL METHODS IN BIOLOGY...4 In Finland, the frequencies of blood groups are A 44.2% B 16.6% AB 8.1% 0 31.2% Calculate what would be the probability that a blood transfusion

31

zNNpE

abbx+

==)( ;z

zNpNNEab

ax+

=-=- )1()(

2)()1()()(

zzNpNpNVV

abbaxx+

=-=-=

(b) N should be at leastb

az10000 to have the coefficient of variation of x to

be at most 0.01; and N should be at leastza

b10000 to have the same for x-N .

Hence ÷÷ø

öççè

æ³

zzN

ab

ba ,max10000 .

27. (a)0

( ) ( )A g t f t dt ba b

¥

= =+ò

(b)Tb

a =

28. Let h denote the transformed random variable35175-

=xh . h follows the

standard normal distribution, so that we can use the table of the standardnormal distribution to obtain the probability that h is less than a given value.

9988.0)04.3(35

175193)193( =<=÷÷ø

öççè

æ -<=< hhx PPP

)03.2(35

175163)163( -<=÷÷ø

öççè

æ -<=< hhx PPP

)03.2( -<hP is not listed in the table. Because the normal distribution issymmetric, )(1)()( zPzPzP <-=>=-< hhh . Look up

9788.0)03.2( =<hP from the table, and obtain )03.2( -<hP as0212.09788.01)03.2( =-=-<hP

9776.00212.09988.0)163()193()193163( =-=<-<=<< xxx PPP

29. 1.96

30. (a) T=13.6895, (b) T=9.4186 (your result may be somewhat different due todifferent precision of the calculation)

Page 32: MATHEMATICAL METHODS IN BIOLOGY...4 In Finland, the frequencies of blood groups are A 44.2% B 16.6% AB 8.1% 0 31.2% Calculate what would be the probability that a blood transfusion

32

31. (c) 1 1.96 / 161.18c V nx= - = , 2 1.96 / 166.82c V nx= + =

32. (a) The sum of 25 independent, identically distributed random variables isapproximately normal. (b) 143)( == mE x ;

64.1725441)()(1...

)( 21 ===××=÷

øö

çèæ ++

=n

VVnnn

VV n xx

xxx

(c) If x is indeed normally distributed with mean 143 and variance 17.64, then

64.17143-

=x

h follows the standard normal distribution.

( ) ( ) 0162.014.2114.264.17143134)134( =<-=-<=÷÷

ø

öççè

æ -<=< hhhx PPPP

0324.00162.02)152()134( =×=>+< xx PP , and because this is less than5%, we reject the null hypothesis.