Mathematical models of population genetics (I) Shishi Luo, (II) Anand Bhaskar, (III) Steven Evans January 21, 2014


There and back again

• Part I: basic models, construction of the coalescent, incorporating mutation

• Part II: extensions of the coalescent to include recombination, demography

• Part III: forward-time perspective, Wright-Fisher diffusion, selection and mutation


Wright-Fisher model (1930s)

• discrete, non-overlapping generations

• constant population size, N

• each individual picks its parent uniformly at random from the previous generation

• no types, mutation, selection, or recombination (for now)
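The assumptions above translate directly into a small backward-in-time simulation. Below is a minimal sketch that assumes a haploid population. The helper name `wf_generations_to_mrca` is hypothetical. Parents are chosen uniformly, so only the number of distinct ancestral lineages needs to be tracked.

```python
import random


def wf_generations_to_mrca(sample_size, N, rng):
    """Generations until a sample's lineages reach a single common ancestor.

    Haploid Wright-Fisher model: each generation, every surviving ancestral
    lineage picks its parent uniformly at random among the N individuals of
    the previous generation, and lineages that pick the same parent merge.
    """
    lineages = sample_size
    generations = 0
    while lineages > 1:
        generations += 1
        parents = {rng.randrange(N) for _ in range(lineages)}
        lineages = len(parents)  # distinct parents = surviving lineages
    return generations


# Average time to the MRCA of 2 individuals, in units of N generations.
rng = random.Random(1)
N, reps = 50, 4000
mean_scaled = sum(wf_generations_to_mrca(2, N, rng) for _ in range(reps)) / (reps * N)
print(mean_scaled)
```

For two individuals the number of generations is geometric with success probability 1/N. The scaled mean should therefore be close to 1.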


The ancestral process of the Wright-Fisher model

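$P(\text{2 individuals have distinct parents}) = 1 - \frac{1}{N}$

$P(\text{2 individuals have distinct ancestors for } k \text{ generations}) = \left(1 - \frac{1}{N}\right)^k$

$P(l \text{ individuals have distinct ancestors for } k \text{ generations})$
$$= \left(1 - \frac{1}{N}\right)^k \cdots \left(1 - \frac{l-1}{N}\right)^k \to e^{-\frac{l(l-1)}{2} t} \quad \text{as } N \to \infty,$$

where k generations corresponds to $t = k/N$. The limit holds because $\left(1 - \frac{j}{N}\right)^{Nt} \to e^{-jt}$ and $\sum_{j=1}^{l-1} j = \frac{l(l-1)}{2}$.

Time to the first coalescence in a sample of l individuals $\sim \mathrm{Exp}\left(\frac{l(l-1)}{2}\right)$.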
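The limit above can be checked numerically by comparing the exact finite-N probability with its exponential limit. This is a small sketch. The function names `prob_distinct_ancestors` and `coalescent_limit` are illustrative, and k is taken as the nearest integer to Nt.

```python
import math


def prob_distinct_ancestors(l, N, k):
    """Exact P(l individuals have distinct ancestors for k generations).

    In one generation the l lineages choose distinct parents with
    probability (1 - 1/N)(1 - 2/N)...(1 - (l-1)/N). Generations are
    independent, so the k-generation probability is that product to the k.
    """
    one_generation = math.prod(1 - j / N for j in range(1, l))
    return one_generation ** k


def coalescent_limit(l, t):
    """The N -> infinity limit exp(-l(l-1)t/2), where k = N t generations."""
    return math.exp(-l * (l - 1) / 2 * t)


N, t, l = 10_000, 0.5, 4
k = round(N * t)
print(prob_distinct_ancestors(l, N, k), coalescent_limit(l, t))
```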


Kingman’s coalescent (1982)

The ‘n-coalescent’ is a continuous-time stochastic process
$$\Pi^{[n]} = \left(\Pi^{[n]}(t)\right)_{t \ge 0}$$
on the space $\mathcal{P}_{[n]}$ of partitions of $[n] := \{1, \dots, n\}$.

Example path for n = 4:

{{1}, {2}, {3}, {4}} → {{1, 2}, {3}, {4}} → {{1, 2}, {3, 4}} → {{1, 2, 3, 4}}
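A path like the example above can be generated by simulation. This is a minimal sketch with a hypothetical function `kingman_path`. It represents a partition as a frozenset of frozensets. With b blocks it waits an Exp(b(b−1)/2) time, then merges a uniformly chosen pair. This matches the usual description of the n-coalescent, in which each of the b(b−1)/2 pairs merges at rate 1.

```python
import random


def kingman_path(n, rng):
    """Simulate Kingman's n-coalescent started from {{1}, ..., {n}}.

    While there are b > 1 blocks, wait an Exp(b(b-1)/2) holding time, then
    merge a pair of blocks chosen uniformly among the b(b-1)/2 pairs.
    Returns a list of (time, partition) pairs, one per state visited.
    """
    blocks = [frozenset([i]) for i in range(1, n + 1)]
    t = 0.0
    path = [(t, frozenset(blocks))]
    while len(blocks) > 1:
        b = len(blocks)
        t += rng.expovariate(b * (b - 1) / 2)
        i, j = rng.sample(range(b), 2)  # uniformly random pair of blocks
        merged = blocks[i] | blocks[j]
        blocks = [blk for idx, blk in enumerate(blocks) if idx not in (i, j)]
        blocks.append(merged)
        path.append((t, frozenset(blocks)))
    return path


for time, partition in kingman_path(4, random.Random(2014)):
    print(round(time, 3), sorted(sorted(block) for block in partition))
```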


Mathematical description

Initial condition: $\Pi^{[n]}(0) = \{\{1\}, \dots, \{n\}\}$

Transition rates: $\pi \to \pi'$ at rate 1 if $\pi'$ is obtained by merging exactly two blocks of $\pi$, and at rate 0 otherwise

Absorbing state: $\{\{1, \dots, n\}\}$

Remarks:

• the number of blocks of $\Pi^{[n]}$ over time, $\#\Pi^{[n]}$, is a pure death process with death rate $\frac{l(l-1)}{2}$ in state l

• the sequence of partitions visited by $\Pi^{[n]}$ is independent of $\#\Pi^{[n]}$
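The pure death process in the first remark can be simulated on its own to estimate the time to the most recent common ancestor (TMRCA). This is a sketch with hypothetical function names. The exact mean used for comparison is the sum of the expected holding times 2/(l(l−1)), which telescopes to 2(1 − 1/n).

```python
import random


def block_count_jump_times(n, rng):
    """Jump times of the pure death process #Pi^[n].

    From state l the number of blocks drops to l - 1 at rate l(l-1)/2.
    The last jump time is the time to the most recent common ancestor.
    """
    t, times = 0.0, []
    for l in range(n, 1, -1):
        t += rng.expovariate(l * (l - 1) / 2)
        times.append(t)
    return times


def expected_tmrca(n):
    """Exact mean TMRCA: sum over l = 2..n of 2/(l(l-1)) = 2(1 - 1/n)."""
    return sum(2 / (l * (l - 1)) for l in range(2, n + 1))


rng = random.Random(42)
reps = 20_000
mc_mean = sum(block_count_jump_times(4, rng)[-1] for _ in range(reps)) / reps
print(mc_mean, expected_tmrca(4))
```

By the second remark, a full path of $\Pi^{[n]}$ can be built by drawing these jump times and, independently, a sequence of uniform pair merges.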


Some cute properties

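• closed-form expression for $P(\Pi^{[n]}(t) = \pi)$, where $\pi \in \mathcal{P}_{[n]}$

• ‘comes down from infinity’: started from infinitely many singleton blocks, the coalescent has only finitely many blocks at every time $t > 0$

• the projection (restriction) of the n-coalescent onto $[m]$, $m < n$, is the m-coalescent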
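The projection property can be checked empirically. Run an n-coalescent and record when its restriction to [m] first loses a block. By consistency, that time should be Exp(m(m−1)/2), whatever n ≥ m. This is a Monte Carlo sketch, and `restrict` and `time_until_restricted_merge` are hypothetical helpers.

```python
import random


def restrict(partition, m):
    """Project a partition of [n] onto [m] = {1, ..., m}, dropping empty blocks."""
    return frozenset(
        frozenset(x for x in block if x <= m)
        for block in partition
        if any(x <= m for x in block)
    )


def time_until_restricted_merge(n, m, rng):
    """Run an n-coalescent (n >= m >= 2) until its projection onto [m] merges.

    Returns the first time at which two of the labels 1, ..., m share a block.
    """
    blocks = [frozenset([i]) for i in range(1, n + 1)]
    t = 0.0
    while True:
        b = len(blocks)
        t += rng.expovariate(b * (b - 1) / 2)
        i, j = rng.sample(range(b), 2)
        merged = blocks[i] | blocks[j]
        blocks = [blk for k, blk in enumerate(blocks) if k not in (i, j)] + [merged]
        if len(restrict(blocks, m)) < m:
            return t


rng = random.Random(7)
reps = 20_000
# For m = 2 the restricted merge time should be Exp(1), with mean 1, for any n.
mean_n6_m2 = sum(time_until_restricted_merge(6, 2, rng) for _ in range(reps)) / reps
print(mean_n6_m2)
```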


Universality of Kingman’s coalescent

The ancestral processes of a broad class of population models converge to $\Pi^{[n]}$ in the large population limit.

Cannings model

• a population model given by $(\nu_1, \dots, \nu_N)$, where the $\nu_i$ are exchangeable nonnegative integer-valued random variables with $\sum_{i=1}^{N} \nu_i = N$

• interpret $\nu_i$ as the number of offspring left by individual i from the previous generation
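The Wright-Fisher model is one instance of a Cannings model. When every child picks a parent uniformly, the offspring vector is Multinomial(N; 1/N, ..., 1/N), which is exchangeable and sums to N. This is a short sketch that generates one such vector, using the hypothetical function `wright_fisher_offspring`.

```python
import random
from collections import Counter


def wright_fisher_offspring(N, rng):
    """Offspring vector (nu_1, ..., nu_N) for one Wright-Fisher generation.

    Each of the N children picks its parent uniformly at random; nu_i counts
    the children of individual i in the previous generation.
    """
    counts = Counter(rng.randrange(N) for _ in range(N))
    return [counts[i] for i in range(N)]


nu = wright_fisher_offspring(100, random.Random(3))
print(sum(nu), max(nu))
```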


Möhle’s lemma (2000)

If
$$\lim_{N \to \infty} \frac{\Phi_1(3)}{\Phi_1(2)} = 0$$
in a Cannings model, then the genealogy of a sample from the population converges to Kingman’s coalescent.

Here,
$$\Phi_1(3) = \frac{E\left(\nu_1(\nu_1 - 1)(\nu_1 - 2)\right)}{(N-1)(N-2)}, \qquad \Phi_1(2) = \frac{E\left(\nu_1(\nu_1 - 1)\right)}{N-1}.$$
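As a worked check of the condition, consider the Wright-Fisher model, where $\nu_1 \sim \mathrm{Binomial}(N, 1/N)$. The sketch below computes both moments exactly with rational arithmetic by summing over the binomial distribution. The function names are illustrative. It gives $\Phi_1(2) = 1/N$ and $\Phi_1(3) = 1/N^2$, so the ratio $1/N$ tends to 0.

```python
from fractions import Fraction
from math import comb


def falling_factorial_moment(N, r):
    """E[nu(nu-1)...(nu-r+1)] for nu ~ Binomial(N, 1/N), by direct summation."""
    p = Fraction(1, N)
    total = Fraction(0)
    for k in range(N + 1):
        falling = 1
        for j in range(r):
            falling *= k - j
        total += falling * comb(N, k) * p**k * (1 - p) ** (N - k)
    return total


def wright_fisher_phis(N):
    """Return (Phi_1(2), Phi_1(3)) for the Wright-Fisher model (requires N >= 3)."""
    phi2 = falling_factorial_moment(N, 2) / (N - 1)
    phi3 = falling_factorial_moment(N, 3) / ((N - 1) * (N - 2))
    return phi2, phi3


phi2, phi3 = wright_fisher_phis(10)
print(phi2, phi3, phi3 / phi2)
```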


Mutation is a Poisson process on top of the coalescent

• mutations fall on the branches of the coalescent tree as a Poisson process of rate $\frac{\theta}{2}$, where $\theta = 2N\mu$ and $\mu$ is the mutation rate per individual per generation

• the interpretation depends on the mutation process assumed:

  ◦ infinite alleles model

  ◦ infinite sites model
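Putting mutations on top of a simulated tree only requires the tree's total branch length. While l lineages remain, the tree gains length $l \cdot T_l$ with $T_l \sim \mathrm{Exp}(l(l-1)/2)$, and mutations then fall at rate θ/2 per unit length. Under the infinite sites model each mutation is a separate segregating site. Its mean count is the standard result $\theta \sum_{i=1}^{n-1} 1/i$. This is a sketch with hypothetical helper names. It uses a simple arrival-counting Poisson sampler so that it needs only the standard library.

```python
import random


def poisson(mean, rng):
    """Poisson(mean) sample: count arrivals of a rate-1 process in [0, mean]."""
    count, s = 0, rng.expovariate(1.0)
    while s <= mean:
        count += 1
        s += rng.expovariate(1.0)
    return count


def segregating_sites(n, theta, rng):
    """Number of mutations on one simulated n-coalescent tree."""
    total_length = sum(l * rng.expovariate(l * (l - 1) / 2) for l in range(2, n + 1))
    return poisson(theta / 2 * total_length, rng)


def expected_segregating_sites(n, theta):
    """(theta/2) * E[total length] = theta * sum_{i=1}^{n-1} 1/i."""
    return theta * sum(1 / i for i in range(1, n))


rng = random.Random(5)
reps = 20_000
mc_sites = sum(segregating_sites(5, 2.0, rng) for _ in range(reps)) / reps
print(mc_sites, expected_segregating_sites(5, 2.0))
```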


Infinite alleles

Each mutation leads to a distinct allele (type). Before sequencing, data had the form

$a_1 = 18, \quad a_2 = 3, \quad a_4 = 1, \quad a_{32} = 1,$

called allelic partitions ($a_2 = 3$ means there are 3 alleles that are each found in exactly 2 individuals).

Ewens’s sampling formula (1972): Let $\{B_1, \dots, B_k\}$ be an allelic partition induced by an n-coalescent and a Poisson($\frac{\theta}{2}$) mutation process. Then
$$P_{\theta,n}(\{B_1, \dots, B_k\}) = \frac{\theta^k}{\theta(\theta+1)\cdots(\theta+n-1)} \prod_{i=1}^{k} (|B_i| - 1)!$$
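Ewens’s formula gives the probability of a particular partition of the labelled sample [n]. A quick sanity check is that it sums to 1 over all partitions of [n]. The sketch below enumerates set partitions recursively and evaluates the formula in exact rational arithmetic. `ewens_probability` and `set_partitions` are illustrative names, not a library API.

```python
from fractions import Fraction
from math import factorial, prod


def ewens_probability(blocks, theta):
    """Ewens's sampling formula for a partition {B_1, ..., B_k} of [n]."""
    n = sum(len(b) for b in blocks)
    rising = prod(theta + i for i in range(n))  # theta (theta+1) ... (theta+n-1)
    return theta ** len(blocks) / rising * prod(factorial(len(b) - 1) for b in blocks)


def set_partitions(items):
    """Yield every partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        yield [[first]] + part  # `first` in a block of its own
        for i in range(len(part)):  # or `first` added to an existing block
            yield part[:i] + [[first] + part[i]] + part[i + 1:]


theta = Fraction(3, 2)
total = sum(ewens_probability(p, theta) for p in set_partitions([1, 2, 3, 4]))
print(total)
```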


Infinite sites

Each mutation occurs at a distinct site (e.g., a nucleotide)

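[Figure: coalescent tree of samples 1, 2, 3, 4 with mutations u1, u2, u3, u4 marked on its branches]

Resulting data (1 = the individual carries the mutation):

individual   u1   u2   u3   u4
    1         0    0    1    0
    2         0    0    1    0
    3         1    1    0    0
    4         1    1    0    1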

Site frequency spectrum: for a sample of size n, $\xi_{n,i}$ is the number of sites at which exactly i individuals carry a mutation.
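The site frequency spectrum can be read directly off a 0/1 data matrix such as the one above. This is a sketch with hypothetical function names. For comparison it includes the standard neutral expectation $E[\xi_{n,i}] = \theta / i$ under Kingman’s coalescent, with mutation rate θ/2 per unit branch length.

```python
def site_frequency_spectrum(haplotypes):
    """Return [xi_{n,1}, ..., xi_{n,n-1}] from an n x S matrix of 0/1 entries.

    Row r is individual r and column s is site s; an entry of 1 means the
    individual carries the mutation at that site (infinite sites model).
    """
    n = len(haplotypes)
    sfs = [0] * (n - 1)
    for site in zip(*haplotypes):
        i = sum(site)  # number of individuals carrying this mutation
        if 1 <= i <= n - 1:
            sfs[i - 1] += 1
    return sfs


def expected_sfs(n, theta):
    """Neutral expectation under Kingman's coalescent: E[xi_{n,i}] = theta / i."""
    return [theta / i for i in range(1, n)]


# Individuals 1-4 (rows) by sites u1-u4 (columns), as in the table above.
data = [
    [0, 0, 1, 0],
    [0, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 1],
]
print(site_frequency_spectrum(data))  # sites carried by 1, 2, 3 individuals
```

Here u1, u2 and u3 are each carried by two individuals and u4 by one.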


Kingman’s coalescent is not appropriate for all systems

• population bottlenecks

• selective sweeps from beneficial mutations

• large variability in the offspring distribution

The Λ-coalescent allows two or more blocks to coalesce at once. It is specified by $\lambda_{b,k}$, the rate at which a particular k-tuple of blocks (out of b blocks) merges.

Pitman (1999) showed that coalescent processes associated with rates $(\lambda_{b,k})_{2 \le k \le b}$ can be uniquely characterized by a finite measure Λ on [0, 1], where
$$\lambda_{b,k} = \int_0^1 x^{k-2}(1-x)^{b-k}\, \Lambda(dx).$$
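Two standard choices of Λ make the integral concrete. The point mass at 0 recovers Kingman’s coalescent, where only pairs merge, each at rate 1. The uniform measure on [0, 1] gives the Bolthausen-Sznitman coalescent, for which the integral is a Beta function. This is a sketch with illustrative function names. It also computes the total rate $\sum_{k} \binom{b}{k} \lambda_{b,k}$ of leaving a state with b blocks, which is $\binom{b}{2}$ for Kingman and $b - 1$ for Bolthausen-Sznitman.

```python
from fractions import Fraction
from math import comb, factorial


def kingman_rate(b, k):
    """lambda_{b,k} for Lambda = point mass at 0: pairs merge at rate 1, larger mergers never."""
    return 1 if k == 2 else 0


def bolthausen_sznitman_rate(b, k):
    """lambda_{b,k} for Lambda = Uniform[0, 1], with 2 <= k <= b.

    The integral of x^(k-2) (1-x)^(b-k) over [0, 1] is the Beta function
    B(k-1, b-k+1) = (k-2)! (b-k)! / (b-1)!.
    """
    return Fraction(factorial(k - 2) * factorial(b - k), factorial(b - 1))


def total_merger_rate(rate, b):
    """Total rate of leaving a state with b blocks: sum_k C(b, k) lambda_{b,k}."""
    return sum(comb(b, k) * rate(b, k) for k in range(2, b + 1))


for b in range(2, 6):
    print(b, total_merger_rate(kingman_rate, b), total_merger_rate(bolthausen_sznitman_rate, b))
```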


Stay tuned

• Can the coalescent model handle recombination?

• What happens when we know the population size isn’t constant?

• Is a forwards-in-time perspective ever advantageous?

