Upload
others
View
4
Download
0
Embed Size (px)
Citation preview
The Coalescent Model
Florian Weber
23. 7. 2016
The Coalescent Model
coalescent = „zusammenwachsend“
Outline
I Population Genetics and the Wright-Fisher-modelI The CoalescentI Non-constant population-sizesI Further extensionsI Summary
Population Genetics
(Shamelessly stealing Alexis’ slides)I Study of polymorphisms in a population
I What are the processes that introduce polymorphisms in thepopulation?
I If a polymorphism exists in a population, will it be there forever?
I Is there some process that removes polymorphisms from thepopulation?
I Do the polymorphisms exhibit patterns?I . . .
Motivation
I The coalescent is basically the Wright-Fisher-model with a lotof analysis.
I It can easily do calculations about the past
I It is very fast to compute
I Is can easily be extended to represent a more complex reality
Motivation
I The coalescent is basically the Wright-Fisher-model with a lotof analysis.
I It can easily do calculations about the past
I It is very fast to compute
I Is can easily be extended to represent a more complex reality
Motivation
I The coalescent is basically the Wright-Fisher-model with a lotof analysis.
I It can easily do calculations about the past
I It is very fast to compute
I Is can easily be extended to represent a more complex reality
Motivation
I The coalescent is basically the Wright-Fisher-model with a lotof analysis.
I It can easily do calculations about the past
I It is very fast to compute
I Is can easily be extended to represent a more complex reality
Hardy-Weinberg
I Assuming an infinite population size, random mating, diploidpopulation, no selection. . .the allele-frequencies are constant
I Infinity is weird. . . 0.3×∞ =∞I . . . and unrealistic
Hardy-Weinberg
I Assuming an infinite population size, random mating, diploidpopulation, no selection. . .the allele-frequencies are constant
I Infinity is weird. . . 0.3×∞ =∞I . . . and unrealistic
Wright-Fisher
I Assuming a finite but constant population size, randommating, non-overlapping generations, no selection. . .all alleles except for one will disappear over time.
I The likelihood for an allele to prevail is equal to it’s initialfrequency
Wright-Fisher
I Assuming a finite but constant population size, randommating, non-overlapping generations, no selection. . .all alleles except for one will disappear over time.
I The likelihood for an allele to prevail is equal to it’s initialfrequency
Wright-Fisher
Figure 1: A simulation of three alleles under the model
Wright-Fisher (Individuals)past
present
time
Figure 2: An evolutionary history in the model
Wright-Fisher (Individuals)past
present
time
Figure 3: Extinct alleles removed
Wright-Fisher (Individuals)past
present
time
Figure 4: Surviving Tree
Wright-Fisher (MRCA)
MRCA
past
present
time
Figure 5: Most Recent Common Ancestor marked
Wright-Fisher (Coalescence-Events)
MRCA
Coalescence-Events
past
present
time
Figure 6: Coalescence-Events of the green individuals
The Coalescent Model
N
A B
P(B)P(A)
Figure 7: Two individuals and their parents
The Coalescent Model
I Likelihood for two nodes to coalesce in the previous generation:p(P(A) = P(B)) = 1
N
I In the previous two generations:1− (N−1
N · N−1N ) = 1−
(N−1
N
)2
I In the previous three generations:1−
((N−1
N
)2· N−1
N
)= 1−
(N−1
N
)3
I In the previous t generations1−
(N−1
N
)t
The Coalescent Model
I Likelihood for two nodes to coalesce in the previous generation:p(P(A) = P(B)) = 1
N
I In the previous two generations:1− (N−1
N · N−1N ) = 1−
(N−1
N
)2
I In the previous three generations:1−
((N−1
N
)2· N−1
N
)= 1−
(N−1
N
)3
I In the previous t generations1−
(N−1
N
)t
The Coalescent Model
I Likelihood for two nodes to coalesce in the previous generation:p(P(A) = P(B)) = 1
N
I In the previous two generations:1− (N−1
N · N−1N ) = 1−
(N−1
N
)2
I In the previous three generations:1−
((N−1
N
)2· N−1
N
)= 1−
(N−1
N
)3
I In the previous t generations1−
(N−1
N
)t
The Coalescent Model
I Likelihood for two nodes to coalesce in the previous generation:p(P(A) = P(B)) = 1
N
I In the previous two generations:1− (N−1
N · N−1N ) = 1−
(N−1
N
)2
I In the previous three generations:1−
((N−1
N
)2· N−1
N
)= 1−
(N−1
N
)3
I In the previous t generations1−
(N−1
N
)t
The Coalescent Model
Gen 0
Gen 1
Gen 2
Gen 3
1*
1*
1*
* without loss of generality
1/N
1/N
1/N
(N-1) / N
(N-1) / N
(N-1) / N
Figure 8: Likelihood of coalescence
The Coalescent Model
I Likelihood of coalescence in the previous t generations:
1−(N − 1
N
)t
I Likelihood for lineages to remain distinct for t generations:(N − 1N
)t
I Expected time for coalescence: E (t) = N
I Rescale: τ = tN : (N − 1
N
)dNτe−−−−→N→∞
e−τ
The Coalescent Model
I Likelihood of coalescence in the previous t generations:
1−(N − 1
N
)t
I Likelihood for lineages to remain distinct for t generations:(N − 1N
)t
I Expected time for coalescence: E (t) = N
I Rescale: τ = tN : (N − 1
N
)dNτe−−−−→N→∞
e−τ
The Coalescent Model
I Likelihood of coalescence in the previous t generations:
1−(N − 1
N
)t
I Likelihood for lineages to remain distinct for t generations:(N − 1N
)t
I Expected time for coalescence: E (t) = N
I Rescale: τ = tN : (N − 1
N
)dNτe−−−−→N→∞
e−τ
The Coalescent Model
I Likelihood of coalescence in the previous t generations:
1−(N − 1
N
)t
I Likelihood for lineages to remain distinct for t generations:(N − 1N
)t
I Expected time for coalescence: E (t) = N
I Rescale: τ = tN : (N − 1
N
)dNτe−−−−→N→∞
e−τ
The Coalescent Model
⇒ The likelihood for two lineages to stay distinct over timeis exponentially small!
1.0
0.0
0.50.25
1 2 time t
lineage 1
lineage 2
Moar Lineages!!
lineages
Figure 9: http://what-if.xkcd.com/13/
More Lineages
I Likelihood of no coalescence in one generation and threelineages:
N − 1N × N − 2
N
I One generation, k lineages:
N − 1N × N − 2
N × · · · × N − k + 1N =
k−1∏i=1
N − iN
More Lineages
I Likelihood of no coalescence in one generation and threelineages:
N − 1N × N − 2
N
I One generation, k lineages:
N − 1N × N − 2
N × · · · × N − k + 1N =
k−1∏i=1
N − iN
More Lineages
I For some reason this is equal to:
k−1∏i=1
N − iN = 1−
(k2)
N + O( 1N2
)≈ 1−
(k2)
N
I(k
2)is the binomial coefficient and equates to k·(k−1)
2
I There are(k
2)ways to pick two lineages from a set of k lineages.
I Therefore a coalescence-event is(k
2)-times as likely with k
lineages than with 2
I The number of coalescence-events grows quadraticallywith the number of lineages!
More Lineages
I For some reason this is equal to:
k−1∏i=1
N − iN = 1−
(k2)
N + O( 1N2
)≈ 1−
(k2)
N
I(k
2)is the binomial coefficient and equates to k·(k−1)
2
I There are(k
2)ways to pick two lineages from a set of k lineages.
I Therefore a coalescence-event is(k
2)-times as likely with k
lineages than with 2
I The number of coalescence-events grows quadraticallywith the number of lineages!
More Lineages
I For some reason this is equal to:
k−1∏i=1
N − iN = 1−
(k2)
N + O( 1N2
)≈ 1−
(k2)
N
I(k
2)is the binomial coefficient and equates to k·(k−1)
2
I There are(k
2)ways to pick two lineages from a set of k lineages.
I Therefore a coalescence-event is(k
2)-times as likely with k
lineages than with 2
I The number of coalescence-events grows quadraticallywith the number of lineages!
More Lineages
I For some reason this is equal to:
k−1∏i=1
N − iN = 1−
(k2)
N + O( 1N2
)≈ 1−
(k2)
N
I(k
2)is the binomial coefficient and equates to k·(k−1)
2
I There are(k
2)ways to pick two lineages from a set of k lineages.
I Therefore a coalescence-event is(k
2)-times as likely with k
lineages than with 2
I The number of coalescence-events grows quadraticallywith the number of lineages!
More Lineages
time
coalescence-rateEvents gettingexponentially rare
Figure 10: More lineages = faster coalscence
Properties
I Few deep furcations
I Likelihood: Everything is possible but maybe unlikely
I Calculation is backward in times (Wright-Fisher: forward)
I Efficient: no calculation per individual or for extinct lineages
Properties
I Few deep furcations
I Likelihood: Everything is possible but maybe unlikely
I Calculation is backward in times (Wright-Fisher: forward)
I Efficient: no calculation per individual or for extinct lineages
Properties
I Few deep furcations
I Likelihood: Everything is possible but maybe unlikely
I Calculation is backward in times (Wright-Fisher: forward)
I Efficient: no calculation per individual or for extinct lineages
Properties
I Few deep furcations
I Likelihood: Everything is possible but maybe unlikely
I Calculation is backward in times (Wright-Fisher: forward)
I Efficient: no calculation per individual or for extinct lineages
Non-constant population-sizesWorldpopulation,billions
0
1
2
3
4
5
6
7
10,000 BC 8000 6000 4000 2000 AD 1 1000 2000
Figure 11: Wordpopulation - not very constant [Wikimedia]
Non-constant population-sizes
I Non-constant, but known population-sizeI Coalescence is more likely in small populationsI ⇒ Coalescence-rate changes over time
I Simply rescale time.
Non-constant population-sizes
I Non-constant, but known population-sizeI Coalescence is more likely in small populationsI ⇒ Coalescence-rate changes over time
I Simply rescale time.
Rescaling Time
I Before: t Generations corresponded to t/N units ofcoalescence-time
I Now: t Generations correspond to
t∑i=1
1Ni
units of coalescence-timeI Note: for a constant population both formulas are equal
Rescaling Time - Example
I 5 Generations, with on average 5 individuals:
I For constant 5 individuals: τ = tN = 5
5 = 1 unit of coalescencetime
I For non-constant {4, 4, 5, 6, 6} individuals:
τ =t∑
i=1
1Ni
= 14 + 1
4 + 15 + 1
6 + 16 = 31
30
note the lesser influence of the larger generations
I A generation with twice the size, will get halve thecoalescence-time
Rescaling Time - Example
I 5 Generations, with on average 5 individuals:
I For constant 5 individuals: τ = tN = 5
5 = 1 unit of coalescencetime
I For non-constant {4, 4, 5, 6, 6} individuals:
τ =t∑
i=1
1Ni
= 14 + 1
4 + 15 + 1
6 + 16 = 31
30
note the lesser influence of the larger generations
I A generation with twice the size, will get halve thecoalescence-time
Rescaling Time - Example
I 5 Generations, with on average 5 individuals:
I For constant 5 individuals: τ = tN = 5
5 = 1 unit of coalescencetime
I For non-constant {4, 4, 5, 6, 6} individuals:
τ =t∑
i=1
1Ni
= 14 + 1
4 + 15 + 1
6 + 16 = 31
30
note the lesser influence of the larger generations
I A generation with twice the size, will get halve thecoalescence-time
Rescaling Time - Example
I 5 Generations, with on average 5 individuals:
I For constant 5 individuals: τ = tN = 5
5 = 1 unit of coalescencetime
I For non-constant {4, 4, 5, 6, 6} individuals:
τ =t∑
i=1
1Ni
= 14 + 1
4 + 15 + 1
6 + 16 = 31
30
note the lesser influence of the larger generations
I A generation with twice the size, will get halve thecoalescence-time
Rescaling Time - Exponential Growth
time
population-size
Generation
1 100
2 120
3 144
4 173
5 207
6 249
7 299
8 358
9 430
10 516
Figure 12: Exponentially growing population versus coalescence-time
Rescaling Time - Exponential Growth
2500 5000 7500 10000 12500 15000
5
10
15
20
constant size
exponential growth
scal
ed ti
me
2500 5000 7500 10000 12500 15000
0.5
1
1.5
2
2.5
3
3.5
real time (generations)
log
N(t)
Figure 13: Exponentially growing and constant opulations. Note thereverse time-scale! [Nordborg]
Rescaling Time - Applicability
I Approximation converges against theory for growing NI Close enough for most purposes
Further Extensions
I Separated PopulationsI Diploid PopulationsI Males and FemalesI SelectionI Multiple SpeciesI . . .
Wright-Fisher:Assuming a finite but constant population size, randommating, non-overlapping generations, no selection. . .
Coalescent:Assuming non-overlapping generations. . .
Further Extensions
I Separated PopulationsI Diploid PopulationsI Males and FemalesI SelectionI Multiple SpeciesI . . .
Wright-Fisher:Assuming a finite but constant population size, randommating, non-overlapping generations, no selection. . .
Coalescent:Assuming non-overlapping generations. . .
Further Extensions
I Separated PopulationsI Diploid PopulationsI Males and FemalesI SelectionI Multiple SpeciesI . . .
Wright-Fisher:Assuming a finite but constant population size, randommating, non-overlapping generations, no selection. . .
Coalescent:Assuming non-overlapping generations. . .
An actual example
Figure 14: Coalescent vs. Anthropological Estimates [Atkinson et al.]
Software
Software that uses the coalescent model1:BEAST, COAL, CoaSim, DIYABC, DendroPy, GeneRecon, genetree,GENOME, IBDSim, IMa, Lamarc, Migraine, Migrate, MaCS, ms &msHOT, msms, Recodon and NetRecodon, SARG, simcoal2,TreesimJ
1Source: https://en.wikipedia.org/wiki/Coalescent_theory
Summary
I The coalescent is the Wright-Fisher-model plus mathI Coalescent-events are, with exponential likelihood, relatively
recentI The more lineages there are, the more coalescence-events occurI Non-Constant populations can be simulated by rescaling timeI The simulated time for a generation is anti-proportional to it’s
size
References
ContentI Magnus Nordborg, “Coalescent Theory”, March 2000
Software-listI en.wikipedia.org/wiki/Coalescent_theory
ImagesI Fig. 09: Randal Munroe: what-if.xkcd.com/13/I Fig. 11: El T:
commons.wikimedia.org/wiki/File:Population_curve.svgI Fig. 13: Magnus Nordborg: “Coalescent Theory”, 2000I Fig. 14: Atkinson et al.: “mtDNA variation predicts population
size in humans and reveals a major Southern Asian chapter inhuman prehistory”, 2008