Mathematical models of population genetics

(I) Shishi Luo, (II) Anand Bhaskar, (III) Steven Evans

January 21, 2014

There and back again

- Part I: basic models, construction of the coalescent, incorporating mutation

- Part II: extensions of the coalescent to include recombination, demography

- Part III: forward-time perspective, Wright-Fisher diffusion, selection and mutation

Wright-Fisher model (1930s)

- discrete, non-overlapping generations

- constant population size, N

- individuals pick a parent uniformly at random from the previous generation

- no types, mutation, selection, or recombination (for now)
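The sampling rule above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `wright_fisher_parents` and `trace_ancestors`, the list-of-lists layout, and the `seed` parameter are assumptions made for this example, not part of the slides.

```python
import random

def wright_fisher_parents(N, generations, seed=None):
    """Sample parent choices for a neutral Wright-Fisher population.

    parents[g][i] is the index (in generation g) of the parent of
    individual i in generation g + 1. Each parent is chosen uniformly
    at random, independently for every individual.
    """
    rng = random.Random(seed)
    return [[rng.randrange(N) for _ in range(N)] for _ in range(generations)]

def trace_ancestors(parents, sample):
    """Follow a sample of present-day individuals back in time.

    Returns a list whose j-th entry holds the ancestors of the sample
    j generations before the present.
    """
    lineages = list(sample)
    history = [lineages]
    for generation in reversed(parents):
        lineages = [generation[i] for i in lineages]
        history.append(lineages)
    return history

parents = wright_fisher_parents(N=10, generations=20, seed=1)
history = trace_ancestors(parents, sample=[0, 1, 2, 3])
# Number of distinct ancestral lineages at each step back in time.
distinct = [len(set(step)) for step in history]
```

Looking at `distinct` shows the number of ancestral lineages shrinking as lineages find common parents, which is the backward-in-time view used in the rest of the talk.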

The ancestral process of the Wright-Fisher model

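$$P(\text{2 individuals have distinct parents}) = 1 - \frac{1}{N}$$

$$P(\text{2 have distinct ancestors for } k \text{ generations}) = \left(1 - \frac{1}{N}\right)^k$$

$$P(l \text{ have distinct ancestors for } k \text{ generations}) = \left(1 - \frac{1}{N}\right)^k \cdots \left(1 - \frac{l-1}{N}\right)^k \to e^{-\frac{l(l-1)}{2} t} \quad \text{as } N \to \infty,$$

where k generations corresponds to $t = k/N$.

Time to the first coalescence in a sample of l individuals $\sim \mathrm{Exp}\left(\frac{l(l-1)}{2}\right)$.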
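A quick numerical check of this limit, as a sketch: the helper names `prob_distinct` and `coalescent_limit` and the particular values of l and t are assumptions chosen for illustration.

```python
import math

def prob_distinct(l, N, k):
    """P(l sampled individuals have l distinct ancestors for k generations)
    under the Wright-Fisher model with population size N."""
    per_generation = 1.0
    for j in range(1, l):
        per_generation *= 1.0 - j / N
    return per_generation ** k

def coalescent_limit(l, t):
    """Limiting value exp(-l(l-1)t/2), with time measured in units of N generations."""
    return math.exp(-l * (l - 1) / 2 * t)

l, t = 4, 0.5
for N in (10, 100, 1000, 10000):
    k = round(t * N)  # k generations corresponds to t = k / N
    print(N, prob_distinct(l, N, k), coalescent_limit(l, t))
```

The printed finite-N probabilities should approach the exponential value as N grows.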

Kingman’s coalescent (1982)

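[Figure: coalescent tree on leaves 1, 2, 3, 4]

The 'n-coalescent' is a continuous-time stochastic process,

$$\Pi^{[n]} = \left(\Pi^{[n]}(t)\right)_{t \ge 0},$$

on the space $\mathcal{P}_{[n]}$ of partitions of $[n] := \{1, \ldots, n\}$.

Example path for n = 4:

{{1}, {2}, {3}, {4}} → {{1, 2}, {3}, {4}} → {{1, 2}, {3, 4}} → {{1, 2, 3, 4}}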

Mathematical description

Initial condition: $\Pi^{[n]}(0) = \{\{1\}, \ldots, \{n\}\}$

Transition rates: $\pi \to \pi'$ at rate 1 if $\pi'$ is obtained by merging exactly two blocks of $\pi$, and at rate 0 otherwise

Absorbing state: $\{\{1, \ldots, n\}\}$

Remarks:

- the number of blocks of $\Pi^{[n]}$ over time, $\#\Pi^{[n]}$, is a pure death process with death rate $\frac{l(l-1)}{2}$ at state l

- the sequence of partitions visited by $\Pi^{[n]}$ is independent of $\#\Pi^{[n]}$
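The description above translates directly into a simulation. The following is a minimal sketch, assuming a representation of partitions as lists of frozensets; the function name `simulate_n_coalescent` and the `seed` parameter are hypothetical choices for this example.

```python
import random

def simulate_n_coalescent(n, seed=None):
    """Simulate one path of the n-coalescent.

    Returns a list of (time, partition) pairs, starting from the
    partition into singletons and ending at the single block {1, ..., n}.
    """
    rng = random.Random(seed)
    blocks = [frozenset([i]) for i in range(1, n + 1)]
    t = 0.0
    path = [(t, list(blocks))]
    while len(blocks) > 1:
        l = len(blocks)
        # Each of the l(l-1)/2 pairs merges at rate 1, so the holding
        # time in a state with l blocks is Exp(l(l-1)/2).
        t += rng.expovariate(l * (l - 1) / 2)
        # The merging pair is uniform over all pairs of blocks.
        i, j = rng.sample(range(l), 2)
        merged = blocks[i] | blocks[j]
        blocks = [b for idx, b in enumerate(blocks) if idx not in (i, j)]
        blocks.append(merged)
        path.append((t, list(blocks)))
    return path

for time, partition in simulate_n_coalescent(4, seed=0):
    print(round(time, 3), [sorted(block) for block in partition])
```

The holding times and the choice of which pair merges are drawn separately, which mirrors the remark that the sequence of partitions is independent of the block-counting process.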

Some cute properties

- closed-form expression for $P\left(\Pi^{[n]}(t) = \pi\right)$, where $\pi \in \mathcal{P}_{[n]}$

- 'comes down from infinity'

- projection of the n-coalescent to [m], m < n, is the m-coalescent

Universality of Kingman’s coalescent

The ancestral processes of a broad class of population models converge to $\Pi^{[n]}$ in the large population limit.

Cannings model

- population model given by $(\nu_1, \ldots, \nu_N)$, where the $\nu_i$ are exchangeable integer-valued random variables with $\sum_i \nu_i = N$

- interpret $\nu_i$ as the number of offspring left by individual i from the previous generation

Möhle's lemma (2000)

If

$$\lim_{N \to \infty} \frac{\Phi_1(3)}{\Phi_1(2)} = 0$$

in a Cannings model, then the genealogy of a sample from the population converges to Kingman's coalescent.

Here,

$$\Phi_1(3) = \frac{E\left(\nu_1(\nu_1 - 1)(\nu_1 - 2)\right)}{(N-1)(N-2)}, \qquad \Phi_1(2) = \frac{E\left(\nu_1(\nu_1 - 1)\right)}{N-1}.$$
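As a worked check, the condition can be verified for the Wright-Fisher model itself. This is a sketch assuming the usual identification of Wright-Fisher as the Cannings model with multinomial offspring numbers, so that $\nu_1 \sim \mathrm{Binomial}(N, 1/N)$. The factorial moments of a binomial give

$$E\left(\nu_1(\nu_1 - 1)\right) = \frac{N(N-1)}{N^2}, \qquad E\left(\nu_1(\nu_1 - 1)(\nu_1 - 2)\right) = \frac{N(N-1)(N-2)}{N^3},$$

so that

$$\Phi_1(2) = \frac{1}{N}, \qquad \Phi_1(3) = \frac{1}{N^2}, \qquad \frac{\Phi_1(3)}{\Phi_1(2)} = \frac{1}{N} \to 0.$$

Here $\Phi_1(2) = 1/N$ is the probability that two individuals share a parent, which agrees with the pairwise calculation for the Wright-Fisher ancestral process.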

Mutation is a Poisson process on top of the coalescent

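[Figure: coalescent tree on leaves 1, 2, 3, 4, with mutations marked by x along the branches]

- Poisson rate of $\theta/2$ along each branch, where $\theta = 2N\mu$ and $\mu$ is the mutation rate per individual per generation

- interpretation depends on the mutation process assumed

  - infinite alleles model
  - infinite sites model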
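Placing mutations on a single branch can be sketched as follows. The function name `mutation_times_on_branch` and its arguments are assumptions for illustration; the branch length is taken in coalescent time units and θ is assumed positive.

```python
import random

def mutation_times_on_branch(length, theta, rng):
    """Positions of mutations along one branch of the given length.

    Mutations arrive as a Poisson process of rate theta / 2, simulated
    here through its independent exponential inter-arrival gaps.
    """
    rate = theta / 2
    times = []
    t = rng.expovariate(rate)
    while t < length:
        times.append(t)
        t += rng.expovariate(rate)
    return times

rng = random.Random(0)
marks = mutation_times_on_branch(length=2.0, theta=3.0, rng=rng)
```

The number of marks returned is Poisson with mean `theta / 2 * length`. How each mark is interpreted (a new allele or a new site) depends on the mutation model chosen.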

Infinite alleles

Each mutation leads to a distinct allele (type).

Before sequencing, data had the form

$a_1 = 18, \quad a_2 = 3, \quad a_4 = 1, \quad a_{32} = 1$

called allelic partitions ($a_2 = 3$ means there are 3 alleles which are found in exactly 2 individuals).

Ewens's sampling formula (1972)

Let $\{B_1, \ldots, B_k\}$ be an allelic partition induced by an n-coalescent and a Poisson($\theta/2$) mutation process. Then

$$P_{\theta,n}(\{B_1, \ldots, B_k\}) = \frac{\theta^k}{\theta(\theta + 1) \cdots (\theta + n - 1)} \prod_{i=1}^{k} (|B_i| - 1)!$$
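The formula is straightforward to evaluate. Below is a small sketch using exact rational arithmetic; the function name `ewens_partition_probability` and the convention of passing the block sizes $|B_1|, \ldots, |B_k|$ as a list are assumptions for this example.

```python
from fractions import Fraction
from math import factorial

def ewens_partition_probability(block_sizes, theta):
    """Probability of one particular partition {B_1, ..., B_k} of [n]
    under Ewens's sampling formula, given the block sizes |B_i|."""
    n = sum(block_sizes)
    k = len(block_sizes)
    theta = Fraction(theta)
    # Rising factorial theta (theta + 1) ... (theta + n - 1).
    rising = Fraction(1)
    for j in range(n):
        rising *= theta + j
    prob = theta ** k / rising
    for size in block_sizes:
        prob *= factorial(size - 1)
    return prob

# Sanity check for n = 3 and theta = 1: there is one partition with a
# single block, three with block sizes {2, 1}, and one into singletons.
total = (
    ewens_partition_probability([3], 1)
    + 3 * ewens_partition_probability([2, 1], 1)
    + ewens_partition_probability([1, 1, 1], 1)
)
```

Summing over all set partitions of [3] as above gives a total of exactly 1, which is a useful check that the formula defines a probability distribution on partitions.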

Infinite sites

Each mutation occurs on a distinct site (e.g., a nucleotide).

[Figure: coalescent tree on leaves 1, 2, 3, 4, with mutations u1, u2, u3, u4 marked by x on the branches]

Individual  u1  u2  u3  u4
1           0   0   1   0
2           0   0   1   0
3           1   1   0   0
4           1   1   0   1

Site frequency spectrum: for a sample of size n, $\xi_{n,i}$ is the number of sites at which exactly i individuals carry a mutation.
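For the four-individual example in the table, the site frequency spectrum can be computed as below. The function name `site_frequency_spectrum` and the row-per-individual matrix layout are assumptions for this sketch.

```python
def site_frequency_spectrum(haplotypes):
    """Return {i: xi_{n,i}} for i = 1, ..., n - 1.

    haplotypes[r][c] is 1 if individual r carries the mutation at site c
    and 0 otherwise.
    """
    n = len(haplotypes)
    # Number of individuals carrying the mutation at each site.
    carriers = [sum(column) for column in zip(*haplotypes)]
    return {i: carriers.count(i) for i in range(1, n)}

# Rows are individuals 1-4, columns are sites u1-u4.
haplotypes = [
    [0, 0, 1, 0],
    [0, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 1],
]
sfs = site_frequency_spectrum(haplotypes)
print(sfs)  # {1: 1, 2: 3, 3: 0}
```

Sites u1, u2 and u3 are each carried by two individuals and u4 by one, so $\xi_{4,1} = 1$, $\xi_{4,2} = 3$ and $\xi_{4,3} = 0$.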

Kingman’s coalescent is not appropriate for all systems

- population bottleneck

- selective sweep from beneficial mutations

- large variability in offspring distribution

The Λ-coalescent allows ≥ 2 blocks to coalesce at once. It is specified by $\lambda_{b,k}$, the rate at which a particular k-tuple of blocks (out of b blocks) merges.

Pitman (1999) showed that coalescent processes associated with $(\lambda_{b,k})_{2 \le k \le b}$ can be uniquely characterized by a finite measure $\Lambda$ on $[0, 1]$, where

$$\lambda_{b,k} = \int_0^1 x^{k-2}(1 - x)^{b-k}\,\Lambda(dx)$$
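Two concrete choices of Λ make the integral easy to evaluate, sketched below. Both choices are assumptions added for illustration: a unit point mass at 0, which recovers Kingman's coalescent, and Lebesgue measure on [0, 1], which gives the Bolthausen-Sznitman coalescent. The function names are hypothetical.

```python
from fractions import Fraction
from math import factorial

def lambda_kingman(b, k):
    """lambda_{b,k} when Lambda is a unit point mass at 0.

    The integrand x^(k-2) (1-x)^(b-k) at x = 0 equals 1 for k = 2 and 0
    for k > 2, so only pairwise mergers occur, each at rate 1.
    """
    return Fraction(1) if k == 2 else Fraction(0)

def lambda_uniform(b, k):
    """lambda_{b,k} when Lambda is Lebesgue measure on [0, 1].

    The integral of x^(k-2) (1-x)^(b-k) over [0, 1] is a Beta function,
    equal to (k-2)! (b-k)! / (b-1)!.
    """
    return Fraction(factorial(k - 2) * factorial(b - k), factorial(b - 1))

def is_consistent(rate, max_b=8):
    """Check lambda_{b,k} = lambda_{b+1,k} + lambda_{b+1,k+1} for small b."""
    return all(
        rate(b, k) == rate(b + 1, k) + rate(b + 1, k + 1)
        for b in range(2, max_b + 1)
        for k in range(2, b + 1)
    )

print(is_consistent(lambda_kingman), is_consistent(lambda_uniform))
```

The consistency relation checked here follows from the integral representation, since $x^{k-2}(1-x)^{b-k} = x^{k-2}(1-x)^{b+1-k} + x^{k-1}(1-x)^{b-k}$.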

Stay tuned

- Can the coalescent model handle recombination?

- What happens when we know the population size isn't constant?

- Is a forwards-in-time perspective ever advantageous?

