Continuous Coalescent Model

Page 1: Continuous Coalescent Model

Continuous Coalescent Model

• The continuous coalescent lends itself to generative models

• Algorithm to construct a plausible genealogy for n genes

• Note that this model runs backwards: it begins from the current population and posits ancestry, in contrast to a forward algorithm like those used in the first lecture

04/21/23 Comp 790– Continuous-Time Coalescence 1

1. Start with k = n genes
2. Simulate the waiting time, $T_k^c$, to the next event, $T_k^c \sim \mathrm{Exp}\left(\binom{k}{2}\right)$
3. Choose a random pair (i, j) with 1 ≤ i < j ≤ k uniformly among the $\binom{k}{2}$ pairs
4. Merge i and j into one gene and decrease the sample size by one, k → k − 1
5. Repeat from step 2 while k > 1

Page 2: Continuous Coalescent Model

In Python

• A simulator in 12 lines


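    from random import expovariate, randint

    N = 10                                # sample size (example value, not from the slide)
    T = [[i, 0.0] for i in range(N)]      # gene id, time of merge
    k = N
    t = 0.0
    while k > 1:
        t += expovariate(0.5*k*(k-1))     # waiting time ~ Exp(k choose 2)
        i = randint(0, k-1)
        j = randint(0, k-1)
        while i == j:                     # redraw until the pair is distinct
            j = randint(0, k-1)
        T[i] = [T[i], T[j], t]            # merge lineages i and j at time t
        T.pop(j)
        k -= 1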
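The simulator can also be wrapped in a function so that its output is easier to inspect. The following is a minimal sketch, not part of the original slides: the names `simulate_genealogy` and `leaf_ids` are hypothetical, and `rng.sample` is swapped in for the redraw loop to pick a distinct pair. It assumes the same nested-list representation, where a leaf is `[gene id, 0.0]` and an internal node is `[left, right, merge time]`.

```python
import random

def simulate_genealogy(n, rng):
    """Return a nested [left, right, merge_time] genealogy for n genes."""
    lineages = [[i, 0.0] for i in range(n)]    # leaves: [gene id, time 0]
    t = 0.0
    k = n
    while k > 1:
        t += rng.expovariate(0.5 * k * (k - 1))  # waiting time ~ Exp(k choose 2)
        i, j = rng.sample(range(k), 2)           # uniform pair of distinct lineages
        lineages[i] = [lineages[i], lineages[j], t]
        lineages.pop(j)
        k -= 1
    return lineages[0]

def leaf_ids(node):
    """Collect the gene ids found below a node."""
    if len(node) == 2:                           # leaf: [gene id, 0.0]
        return [node[0]]
    left, right, _ = node
    return leaf_ids(left) + leaf_ids(right)

rng = random.Random(1)                           # arbitrary seed for repeatability
tree = simulate_genealogy(8, rng)
print("tree height:", tree[2])
print("genes under the root:", sorted(leaf_ids(tree)))
```

The merge time stored at the root is the height of the tree, since the root is the last coalescence event.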

Page 3: Continuous Coalescent Model

Properties of a Coalescent Tree

• The height, $H_n$, of the tree is the sum of the time epochs $T_j$ during which there are j = n, n−1, n−2, …, 2 ancestors.

$$E(H_n) = \sum_{j=2}^{n} E(T_j) = 2\sum_{j=2}^{n} \frac{1}{j(j-1)} = 2\left(1 - \frac{1}{n}\right)$$

As n → ∞, E(H_n) → 2, and, if n = 2, E(H_2) = 1.

Thus, the expected waiting time for n genes to find their common ancestor is less than twice the expected time for 2!

$$\mathrm{Var}(H_n) = \sum_{j=2}^{n} \mathrm{Var}(T_j) = 4\sum_{j=2}^{n} \frac{1}{j^2 (j-1)^2}$$

As n → ∞, Var(H_n) → 4(π² − 9)/3 ≈ 1.16, and, if n = 2, Var(H_2) = 1.
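These moments can be checked numerically. The sketch below is an illustration rather than code from the lecture: the function names, the seed, and the choice of n = 10 with 20,000 replicates are assumptions. It draws each epoch $T_j$ directly from Exp(j choose 2) instead of building the full tree, which is enough to obtain the height.

```python
import random
import statistics

def expected_height(n):
    """Closed form E(H_n) = 2(1 - 1/n), in coalescent time units."""
    return 2.0 * (1.0 - 1.0 / n)

def variance_height(n):
    """Var(H_n) = 4 * sum_{j=2}^{n} 1 / (j^2 (j-1)^2)."""
    return 4.0 * sum(1.0 / (j * j * (j - 1) * (j - 1)) for j in range(2, n + 1))

def sample_height(n, rng):
    """Draw one tree height as the sum of T_j ~ Exp(j choose 2), j = n, ..., 2."""
    return sum(rng.expovariate(0.5 * j * (j - 1)) for j in range(2, n + 1))

rng = random.Random(790)          # arbitrary seed
n, reps = 10, 20000               # illustrative values
heights = [sample_height(n, rng) for _ in range(reps)]

print("mean:     formula", expected_height(n), "simulated", statistics.mean(heights))
print("variance: formula", variance_height(n), "simulated", statistics.variance(heights))
```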

Page 4: Continuous Coalescent Model

Sampled Distribution

• N = 1000000


Page 5: Continuous Coalescent Model

Example Trees

• Observation: The contribution of $T_2$, where the last two ancestors converge to a common root, is disproportionately large


Page 6: Continuous Coalescent Model

Total Branch Length

• In contrast to $H_n$, the distribution of the total branch length, $L_n$, has a simple form:

$$P(L_n \le t) = \left(1 - e^{-t/2}\right)^{n-1}$$

• The mean of $L_n$ is found by weighting the coalescent times by the number of active lineages:

$$E(L_n) = \sum_{j=2}^{n} j\,E(T_j) = 2\sum_{j=1}^{n-1} \frac{1}{j}$$

• This sum does not converge for large n, but grows slowly. In fact, it grows like 2 log(n).
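As a rough check of both formulas, the sketch below compares E(L_n) with 2 log(n) and tests the stated distribution function against simulated branch lengths. The function names, seed, sample size, and replicate count are illustrative assumptions, not values from the slides.

```python
import math
import random

def expected_total_length(n):
    """E(L_n) = 2 * sum_{j=1}^{n-1} 1/j."""
    return 2.0 * sum(1.0 / j for j in range(1, n))

def cdf_total_length(t, n):
    """P(L_n <= t) = (1 - exp(-t/2))^(n-1)."""
    return (1.0 - math.exp(-t / 2.0)) ** (n - 1)

def sample_total_length(n, rng):
    """Draw L_n = sum_j j * T_j with T_j ~ Exp(j choose 2)."""
    return sum(j * rng.expovariate(0.5 * j * (j - 1)) for j in range(2, n + 1))

# Slow, logarithmic growth of the mean.
for size in (10, 100, 1000):
    print(size, expected_total_length(size), 2.0 * math.log(size))

# Empirical check of the distribution function at t = E(L_n).
rng = random.Random(790)          # arbitrary seed
n, reps = 10, 20000               # illustrative values
t0 = expected_total_length(n)
samples = [sample_total_length(n, rng) for _ in range(reps)]
empirical = sum(s <= t0 for s in samples) / reps
print("P(L_n <= E(L_n)): formula", cdf_total_length(t0, n), "simulated", empirical)
```

The gap between E(L_n) and 2 log(n) stays close to a constant as n grows, which is why the two columns track each other without coinciding.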

Page 7: Continuous Coalescent Model

Shared History

• $E(L_n)$ can be used to get a sense of how much history genes share.

• Genes would share the least history if they all arose from a common ancestor long ago and then propagated along distinct lineages.

• If the mean time to the common ancestor is E(H_n) = 2(1 – 1/n), and we assume the split was as early as possible (thus minimizing the shared history), then the total branch length would be nE(H_n) = 2(n-1).

• Comparing E(L_n) to this minimum shared-history case as a fraction gives:

$$\frac{E(L_n)}{n\,E(H_n)} = \frac{\sum_{j=1}^{n-1} \frac{1}{j}}{n-1} \approx \frac{\log n}{n-1}$$

Page 8: Continuous Coalescent Model

Plot of Shared History

• Even for small n, samples, on average, share considerable history
  – share(5) = 48%
  – share(10) = 69%
  – share(20) = 81%

• Sharing is the fraction of a genealogy that an average gene shares with two or more other extant genes

$$\mathrm{share}(n) = 1 - \frac{\sum_{j=1}^{n-1} \frac{1}{j}}{n-1}$$
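The quoted percentages follow directly from the share(n) formula. A minimal sketch, with the function name `share` taken from the slide and everything else chosen for illustration:

```python
def share(n):
    """share(n) = 1 - (sum_{j=1}^{n-1} 1/j) / (n - 1)."""
    harmonic = sum(1.0 / j for j in range(1, n))
    return 1.0 - harmonic / (n - 1)

for n in (5, 10, 20):
    print(f"share({n}) = {share(n):.0%}")
```

This prints 48%, 69%, and 81% for n = 5, 10, and 20.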

Page 9: Continuous Coalescent Model

Variance of Total Branch Length

• The variance in the total branch length is:

$$\mathrm{Var}(L_n) = \sum_{j=2}^{n} j^2\,\mathrm{Var}(T_j) = 4\sum_{j=1}^{n-1} \frac{1}{j^2}$$

which converges to 2π²/3 ≈ 6.58 as n → ∞.

• Since E(L_n) keeps growing while the variance stays bounded, this implies that for large n, L_n is narrowly centered around E(L_n) in relative terms. Likewise, sharing is also relatively consistent.
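To see how quickly the variance approaches its limit, and how the spread compares with the mean, one can tabulate a few values. This is a sketch under the formulas above; the function names and the chosen values of n are illustrative.

```python
import math

def expected_total_length(n):
    """E(L_n) = 2 * sum_{j=1}^{n-1} 1/j."""
    return 2.0 * sum(1.0 / j for j in range(1, n))

def variance_total_length(n):
    """Var(L_n) = 4 * sum_{j=1}^{n-1} 1/j^2."""
    return 4.0 * sum(1.0 / (j * j) for j in range(1, n))

limit = 2.0 * math.pi ** 2 / 3.0   # value of the variance as n grows without bound

for n in (2, 10, 100, 1000):
    var = variance_total_length(n)
    # Coefficient of variation: standard deviation relative to the mean.
    cv = math.sqrt(var) / expected_total_length(n)
    print(n, round(var, 4), round(limit - var, 4), round(cv, 3))
```

The last column shrinks as n grows because the standard deviation is bounded while the mean keeps increasing.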

Page 10: Continuous Coalescent Model

Implications on Sampling Paths

• Sampling multiple paths from extant genes along their ancestors is less effective than one might think.

• Most long branches are covered by relatively few samples

• Not surprising since E(H_40) = 1.95 and E(H_10) = 1.8 (a 4x increase in samples increases height by less than 10%).
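The diminishing return from extra samples can be made explicit: from E(H_n) = 2(1 − 1/n), adding one more gene raises the expected height by only 2/(n(n+1)). The sketch below works through the numbers quoted above; the helper names are hypothetical.

```python
def expected_height(n):
    """E(H_n) = 2(1 - 1/n), in coalescent time units."""
    return 2.0 * (1.0 - 1.0 / n)

def gain_from_one_more(n):
    """E(H_{n+1}) - E(H_n), which equals 2 / (n (n + 1))."""
    return expected_height(n + 1) - expected_height(n)

h10, h40 = expected_height(10), expected_height(40)
relative_increase = h40 / h10 - 1.0
print(f"E(H_10) = {h10:.2f}, E(H_40) = {h40:.2f}, increase = {relative_increase:.1%}")

for n in (10, 20, 40):
    print(n, gain_from_one_more(n))
```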


Page 11: Continuous Coalescent Model

Effective Population Size

• Real populations are not likely to satisfy the Wright-Fisher model.

• In particular, most real populations show some sort of reproductive structure, either due to geography or societal constraints

• It is also likely that the number of descendants in a generation depends on many factors (health, disease, etc.), as opposed to the implicit Poisson model

• Total population size is not fixed, but changes over time


Page 12: Continuous Coalescent Model

Sanity Check

• When the Wright-Fisher model, or the basic coalescent, is used to model a real population, the size of the population (2N) cannot be taken literally.

• For example, many human genes have an MRCA less than 200,000 years ago. If we consider one generation per 20 years, then N should be less than 200,000/(4*20) = 2500, which is too small. (Recall that the expected tree height for the entire population approaches 2 time units, and one time unit is 2N generations, so the height is about 2 * 2N * 20 = 4 * 20 * N years.)
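The arithmetic behind this bound can be written out as a short calculation. This is a sketch only: the function name `max_population_size` and its parameters are hypothetical, and it assumes a limiting expected tree height of 2 time units with one time unit equal to 2N generations.

```python
def max_population_size(mrca_years, generation_years, height_units=2.0):
    """Largest N consistent with an MRCA age, assuming height = height_units * 2N generations."""
    years_per_unit_of_n = height_units * 2 * generation_years   # 2 * 2 * 20 = 80 years per unit of N
    return mrca_years / years_per_unit_of_n

print(max_population_size(200_000, 20))   # 2500.0
```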
