74
Week 9: Coalescents, part 2 Genome 562 March, 2013 Week 9: Coalescents, part 2 – p.1/74

Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

Week 9: Coalescents, part 2

Genome 562

March, 2013

Week 9: Coalescents, part 2 – p.1/74

Page 2: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

Gene copies in a population of 10 individuals

Time

A random−mating population

Week 9: Coalescents, part 2 – p.2/74

Page 3: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

Going back one generation

Time

A random−mating population

Week 9: Coalescents, part 2 – p.3/74

Page 4: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

... and one more

Time

A random−mating population

Week 9: Coalescents, part 2 – p.4/74

Page 5: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

... and one more

Time

A random−mating population

Week 9: Coalescents, part 2 – p.5/74

Page 6: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

... and one more

Time

A random−mating population

Week 9: Coalescents, part 2 – p.6/74

Page 7: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

... and one more

Time

A random−mating population

Week 9: Coalescents, part 2 – p.7/74

Page 8: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

... and one more

Time

A random−mating population

Week 9: Coalescents, part 2 – p.8/74

Page 9: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

... and one more

Time

A random−mating population

Week 9: Coalescents, part 2 – p.9/74

Page 10: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

... and one more

Time

A random−mating population

Week 9: Coalescents, part 2 – p.10/74

Page 11: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

... and one more

Time

A random−mating population

Week 9: Coalescents, part 2 – p.11/74

Page 12: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

... and one more

Time

A random−mating population

Week 9: Coalescents, part 2 – p.12/74

Page 13: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

... and one more

Time

A random−mating population

Week 9: Coalescents, part 2 – p.13/74

Page 14: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

The genealogy of gene copies is a tree

Time

Genealogy of gene copies, after reordering the copies

Week 9: Coalescents, part 2 – p.14/74

Page 15: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

Ancestry of a sample of 3 copies

Time

Genealogy of a small sample of genes from the population

Week 9: Coalescents, part 2 – p.15/74

Page 16: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

Here is that tree of 3 copies in the pedigree

Time

Week 9: Coalescents, part 2 – p.16/74

Page 17: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

Kingman’s coalescent

Random collision of lineages as go back in time (sans recombination)

Collision is faster the smaller the effective population size

u9

u7

u5

u3

u8

u6

u4

u2

Average time for n

Average time for

copies to coalesce to

4N

k(k−1) k−1 =

In a diploid population of

effective population size N,

copies to coalesce

= 4N (1 − 1n ( generations

k

Average time for

two copies to coalesce

= 2N generations

What’s misleading about this diagram: the lineages that coalesce arerandom pairs, not necessarily ones that are next to each other in a linearorder.

Week 9: Coalescents, part 2 – p.17/74

Page 18: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

The Wright-Fisher model

This is the canonical model of genetic drift in populations. It was inventedin 1930 and 1932 by Sewall Wright and R. A. Fisher.

In this model the next generation is produced by doing this:

Choose two individuals with replacement (including the possibility thatthey are the same individual) to be parents,

Each produces one gamete, these become a diploid individual,

Repeat these steps until N diploid individuals have been produced.

The effect of this is to have each locus in an individual in the nextgeneration consist of two genes sampled from the parents’ generation atrandom, with replacement.

Week 9: Coalescents, part 2 – p.18/74

Page 19: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

Sir John Kingman

J. F. C. Kingman in about 1983

Currently Emeritus Professor of Mathematics at Cambridge University,U.K., and former head of the Isaac Newton Institute of MathematicalSciences.

Week 9: Coalescents, part 2 – p.19/74

Page 20: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

The coalescent – a derivation

The probability that k lineages becomes k − 1 one generation earlierturns out to be (as each lineage “chooses” its ancestor independently):

k(k − 1)/2 × Prob (First two have same parent, rest are different)

(since there are(k2

)= k(k − 1)/2 different pairs of copies)

We add up terms, all the same, for the k(k − 1)/2 pairs that couldcoalesce; the sum is:

k(k − 1)/2 × 1 × 12N

×(1 − 1

2N

)

×(1 − 2

2N

)× · · · ×

(1 − k−2

2N

)

so that the total probability that a pair coalesces is

= k(k − 1)/4N + O(1/N2)

Week 9: Coalescents, part 2 – p.20/74

Page 21: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

Can probabilities of two or more lineages coalescing

Note that the total probability that some combination of lineagescoalesces is

1 − Prob (Probability all genes have separate ancestors)

= 1 −

[

1 ×

(

1 −1

2N

) (

1 −2

2N

)

. . .

(

1 −k − 1

2N

)]

= 1 −

[

1 −1 + 2 + 3 + · · · + (k − 1)

2N+ O(1/N2)

]

and since1 + 2 + 3 + . . . + (n − 1) = n(n − 1)/2

the quantity

= 1 −[

1 − k(k − 1)/4N + O(1/N2)]≃ k(k − 1)/4N + O(1/N2)

Week 9: Coalescents, part 2 – p.21/74

Page 22: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

Can calculate how many coalescences are of pairs

This shows, since the terms of order 1/N are the same, that the eventsinvolving 3 or more lineages simultaneously coalescing are in the terms oforder 1/N2 and thus become unimportant if N is large.

Here are the probabilities of 0, 1, or more coalescences with 10 lineagesin populations of different sizes:

N 0 1 > 1

100 0.79560747 0.18744678 0.016945751000 0.97771632 0.02209806 0.00018562

10000 0.99775217 0.00224595 0.00000187

Note that increasing the population size by a factor of 10 reduces thecoalescent rate for pairs by about 10-fold, but reduces the rate for triples(or more) by about 100-fold.

Week 9: Coalescents, part 2 – p.22/74

Page 23: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

The coalescentTo simulate a random genealogy, do the following:

1. Start with k lineages

2. Draw an exponential time interval with mean 4N/(k(k − 1))generations.

3. Combine two randomly chosen lineages.

4. Decrease k by 1.

5. If k = 1, then stop

6. Otherwise go back to step 2.

Week 9: Coalescents, part 2 – p.23/74

Page 24: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

An accurate analogy: Bugs In A Box

There is a box ...

Week 9: Coalescents, part 2 – p.24/74

Page 25: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

An accurate analogy: Bugs In A Box

with bugs that are ...

Week 9: Coalescents, part 2 – p.25/74

Page 26: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

An accurate analogy: Bugs In A Box

hyperactive, ...

Week 9: Coalescents, part 2 – p.26/74

Page 27: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

An accurate analogy: Bugs In A Box

indiscriminate, ...

Week 9: Coalescents, part 2 – p.27/74

Page 28: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

An accurate analogy: Bugs In A Box

voracious ...

Week 9: Coalescents, part 2 – p.28/74

Page 29: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

An accurate analogy: Bugs In A Box

(eats other bug) ...

Gulp!

Week 9: Coalescents, part 2 – p.29/74

Page 30: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

An accurate analogy: Bugs In A Box

and insatiable.

Week 9: Coalescents, part 2 – p.30/74

Page 31: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

Random coalescent trees with 16 lineages

O C S M L P K E J I T R H Q F B N D G A M J B F G C E R A S Q K N L H T I P D O B G T M L Q D O F K P E A I J S C H R N F R N L M D H B T C Q S O G P I A K J E

I Q C A J L S G P F O D H B M E T R K N R C L D K H O Q F M B G S I T P A J E N N M P R H L E S O F B G J D C I T K Q A N H M C R P G L T E D S O I K J Q F A B

Week 9: Coalescents, part 2 – p.31/74

Page 32: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

Coalescence is faster in small populations

Change of population size and coalescents

Ne

time

the changes in population size will produce waves of coalescence

time

Coalescence events

time

the tree

The parameters of the growth curve for Ne can be inferred bylikelihood methods as they affect the prior probabilities of those treesthat fit the data.

Week 9: Coalescents, part 2 – p.32/74

Page 33: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

Migration can be taken into account

Time

population #1 population #2Week 9: Coalescents, part 2 – p.33/74

Page 34: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

Recombination creates loops

Recomb.

Different markers have slightly different coalescent trees

Week 9: Coalescents, part 2 – p.34/74

Page 35: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

If we have a sample of 50 copies

50−gene sample in a coalescent tree

Week 9: Coalescents, part 2 – p.35/74

Page 36: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

The first 10 account for most of the branch length

10 genes sampled randomly out of a

50−gene sample in a coalescent tree

Week 9: Coalescents, part 2 – p.36/74

Page 37: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

... and when we add the other 40 they add less length

10 genes sampled randomly out of a

50−gene sample in a coalescent tree

(purple lines are the 10−gene tree)Week 9: Coalescents, part 2 – p.37/74

Page 38: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

Cann, Stoneking, and Wilson

Becky Cann Mark Stoneking the late Allan Wilson

Cann, R. L., M. Stoneking, and A. C. Wilson. 1987. Mitochondrial DNAand human evolution. Nature 325:a 31-36.

Week 9: Coalescents, part 2 – p.38/74

Page 39: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

Mitochondrial Eve

Week 9: Coalescents, part 2 – p.39/74

Page 40: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

We want to be able to analyze human evolution

Africa

Europe Asia

"Out of Africa" hypothesis

(vertical scale is not time or evolutionary change)

Week 9: Coalescents, part 2 – p.40/74

Page 41: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

coalescent and “gene trees” versus species trees

Consistency of gene tree with species tree

Week 9: Coalescents, part 2 – p.41/74

Page 42: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

coalescent and “gene trees” versus species trees

Consistency of gene tree with species tree

Week 9: Coalescents, part 2 – p.42/74

Page 43: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

coalescent and “gene trees” versus species trees

Consistency of gene tree with species tree

Week 9: Coalescents, part 2 – p.43/74

Page 44: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

coalescent and “gene trees” versus species trees

Consistency of gene tree with species tree

Week 9: Coalescents, part 2 – p.44/74

Page 45: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

coalescent and “gene trees” versus species trees

Consistency of gene tree with species tree

Week 9: Coalescents, part 2 – p.45/74

Page 46: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

coalescent and “gene trees” versus species trees

Consistency of gene tree with species tree

coalescence time

Week 9: Coalescents, part 2 – p.46/74

Page 47: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

If the branch is more than Ne generations long ...

t1

t2

N1

N2

N4

N3

N5

Gene tree and Species tree

Week 9: Coalescents, part 2 – p.47/74

Page 48: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

If the branch is more than Ne generations long ...

t1

t2

N1

N2

N4

N3

N5

Gene tree and Species tree

Week 9: Coalescents, part 2 – p.48/74

Page 49: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

If the branch is more than Ne generations long ...

t1

t2

N1

N2

N4

N3

N5

Gene tree and Species tree

Week 9: Coalescents, part 2 – p.49/74

Page 50: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

How do we compute a likelihood for a population sample?

CAGTTTTAGCGTCC

CAGTTTTAGCGTCC

CAGTTTTAGCGTCC

CAGTTTTAGCGTCC

CAGTTTTAGCGTCC

CAGTTTTAGCGTCC

CAGTTTTAGCGTCC

CAGTTTTAGCGTCC

CAGTTTTAGCGTCC

CAGTTTTAGCGTCC

CAGTTTTAGCGTCC

CAGTTTCAGCGTCC

CAGTTTCAGCGTCC

CAGTTTCAGCGTCCCAGTTTCAGCGTCC

CAGTTTCAGCGTCC

CAGTTTCAGCGTCC

CAGTTTCAGCGTCC

CAGTTTCAGCGTCC

CAGTTTTGGCGTCC

CAGTTTTGGCGTCCCAGTTTTGGCGTCC

CAGTTTTGGCGTCC

CAGTTTTGGCGTCC

CAGTTTCAGCGTAC

CAGTTTCAGCGTAC

CAGTTTCAGCGTAC

, CAGTTTCAGCGTCC CAGTTTCAGCGTCC ), ... L = Prob ( = ??

Week 9: Coalescents, part 2 – p.50/74

Page 51: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

If we have a tree for the sample sequences, we can

CAGTTTTAGCGTCC

CAGTTTTAGCGTCC

CAGTTTTAGCGTCC

CAGTTTTAGCGTCC

CAGTTTTAGCGTCC

CAGTTTTAGCGTCC

CAGTTTTAGCGTCC

CAGTTTTAGCGTCC

CAGTTTTAGCGTCC

CAGTTTCAGCGTCC

CAGTTTCAGCGTCC

CAGTTTCAGCGTCC

CAGTTTCAGCGTCC

CAGTTTTGGCGTCCCAGTTTTGGCGTCC

CAGTTTTGGCGTCC

CAGTTTTGGCGTCC

CAGTTTCAGCGTACCAGTTTCAGCGTAC

CAGTTTCAGCGTAC

CAGTTTCAGCGTCC

, CAGTTTCAGCGTCC CAGTTTCAGCGTCCProb( | Genealogy)

so we can compute

but how to computer the overall likelihood from this?

, ...

CAGTTTCAGCGTCC

CAGTTTTAGCGTCCCAGTTTTAGCGTCC

CAGTTTCAGCGTCCCAGTTTTGGCGTCC

CAGTTTCAGCGTCC

Week 9: Coalescents, part 2 – p.51/74

Page 52: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

The basic equation for coalescent likelihoods

In the case of a single population with parametersNe effective population sizeµ mutation rate per site

and assuming G′ stands for a coalescent genealogy and D for the

sequences,

L = Prob (D | Ne, µ)

=∑

G′

Prob (G′ | Ne) Prob (D | G′, µ)

︸ ︷︷ ︸ ︸ ︷︷ ︸

Kingman′s prior likelihood of tree

Week 9: Coalescents, part 2 – p.52/74

Page 53: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

Rescaling the branch lengths

Rescaling branch lengths of G′ so that branches are given in expectedmutations per site, G = µG′ , we get (if we let Θ = 4Neµ )

L =∑

G

Prob (G | Θ) Prob (D | G)

as the fundamental equation. For more complex population scenarios onesimply replaces Θ with a vector of parameters.

Week 9: Coalescents, part 2 – p.53/74

Page 54: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

The variability comes from two sources

Ne

Necan reduce variability by looking at

(i) more gene copies, or

(2) Randomness of coalescence of lineages

affected by the

can reduce variance of

branch by examining more sites

number of mutations per site per

mutation rate(1) Randomness of mutation

affected by effective population size

coalescence times allow estimation of

µ

(ii) more loci

Week 9: Coalescents, part 2 – p.54/74

Page 55: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

Computing the likelihood: averaging over coalescents

t

t

Like

lihoo

d of

t

Like

lihoo

d of

The product of the prior on t,

times the likelihood of that t from the data,

when integrated over all possible t’s, gives the

likelihood for the underlying parameter

The likelihood calculation in a sample of two gene copies

t

Θ

Prio

r P

rob

of t

Θ1

Θ

Θ

Week 9: Coalescents, part 2 – p.55/74

Page 56: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

Computing the likelihood: averaging over coalescents

t

t

Like

lihoo

d of

t

Like

lihoo

d of

The product of the prior on t,

times the likelihood of that t from the data,

when integrated over all possible t’s, gives the

likelihood for the underlying parameter

The likelihood calculation in a sample of two gene copies

t

2ΘΘ

Prio

r P

rob

of t

Θ

Θ

Week 9: Coalescents, part 2 – p.56/74

Page 57: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

Computing the likelihood: averaging over coalescents

t

t

Like

lihoo

d of

t

Like

lihoo

d of

The product of the prior on t,

times the likelihood of that t from the data,

when integrated over all possible t’s, gives the

likelihood for the underlying parameter

The likelihood calculation in a sample of two gene copies

t

Θ

Prio

r P

rob

of t

Θ

Θ

Week 9: Coalescents, part 2 – p.57/74

Page 58: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

Computing the likelihood: averaging over coalescents

t

t

Like

lihoo

d of

t

Like

lihoo

d of

The product of the prior on t,

times the likelihood of that t from the data,

when integrated over all possible t’s, gives the

likelihood for the underlying parameter

The likelihood calculation in a sample of two gene copies

t

1Θ2

Θ

Θ

Prio

r P

rob

of t

Θ1

Θ

Θ

Week 9: Coalescents, part 2 – p.58/74

Page 59: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

Labelled historiesLabelled Histories (Edwards, 1970; Harding, 1971)

Trees that differ in the time−ordering of their nodes

A B C D

A B C D

These two are the same:

A B C D

A B C D

These two are different:

Week 9: Coalescents, part 2 – p.59/74

Page 60: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

Sampling approaches to coalescent likelihood

Bob Griffiths Simon Tavaré Mary Kuhner and Jon Yamato

Week 9: Coalescents, part 2 – p.60/74

Page 61: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

Monte Carlo integration

To get the area under a curve, we can either evaluate the function (f(x)) ata series of grid points and add up heights × widths:

or we can sample at random the same number of points, add up height ×width:

Week 9: Coalescents, part 2 – p.61/74

Page 62: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

Importance sampling

Week 9: Coalescents, part 2 – p.62/74

Page 63: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

Importance sampling

The function we integrate

We sample from this density

f(x)

g(x)

Week 9: Coalescents, part 2 – p.63/74

Page 64: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

The math of importance sampling

∫f(x) dx =

∫ f(x)g(x) g(x) dx

= Eg

[f(x)g(x)

]

which is the expectation for points sampled from g(x) of the ratio f(x)g(x) .

This is approximated by sampling a lot (n) of points from g(x) and thecomputing the average:

L =1

n

n∑

i=1

f(xi)

g(xi)

Week 9: Coalescents, part 2 – p.64/74

Page 65: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

The importance function used in LAMARC

In Mary Kuhner and Jon Yamato’s program LAMARC they use as theimportance function the probability density of the tree given the data at aset of “driving values” θ0 of the parameters:

f(G) =Prob (D |G) Prob (G | θ0)

Prob (D | θ0)

The denominator is impossible to evaluate but as we will see, isn’t reallyneeded.

The resulting likelihood ratio is

L(Θ)

L(Θ0)=

1

n

n∑

i=1

Prob (Gi|Θ)

Prob (Gi|Θ0)

Week 9: Coalescents, part 2 – p.65/74

Page 66: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

Markov Chain Monte Carlo (MCMC) methods

To do the importance sampling, MCMC methods are employed (in allprograms that do full likelihood or Bayesian analyses).

To sample from f(G), start with a tree Gold and

1. Have a “proposal distribution” from which you sample a new treeGnew

2. Compute the function f(Gnew) (we have that also for the old tree)

3. Draw a random fraction R between 0 and 1

4. If R < f(Gnew)f(Gold)

, accept the new tree. (Note that in that ratio any

horrible, but shared, denominators cancel out).

repeat this vast numbers of times (the correct number of times is infinity).

Week 9: Coalescents, part 2 – p.66/74

Page 67: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

Rearrangement to sample points in tree space

A conditional coalescent rearrangement strategy

Week 9: Coalescents, part 2 – p.67/74

Page 68: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

Dissolving a branch and regrowing it backwards

First pick a random node (interior or tip) and remove its subtree

Week 9: Coalescents, part 2 – p.68/74

Page 69: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

We allow it coalesce with the other branches

Then allow this node to re−coalesce with the tree

Week 9: Coalescents, part 2 – p.69/74

Page 70: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

and this gives another coalescent

The resulting tree proposed by this process

Week 9: Coalescents, part 2 – p.70/74

Page 71: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

An example of an MCMC likelihood curve

0

−10

−20

−30

−40

−50

−60

−70

−80

0.001 0.002 0.005 0.01 0.02 0.05 0.1

Θ

ln L

0.00650776

Results of analysing a data set with 50 sequences of 500 baseswhich was simulated with a true value of Θ = 0.01

Week 9: Coalescents, part 2 – p.71/74

Page 72: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

Major MCMC likelihood or Bayesian programs

LAMARC by Mary Kuhner and Jon Yamato and others. Likelihoodinference with multiple populations, recombination, migration,population growth. No historical branching events or serialsampling, yet.

BEAST by Andrew Rambaut, Alexei Drummond and others.Bayesian inference with multiple populations related by a tree.Support for serial sampling (no migration or recombination yet).

genetree by Bob Griffiths and Melanie Bahlo. Likelihood inference ofmigration rates and changes in population size. No recombination orhistorical branching events.

migrate by Peter Beerli. Likelihood inference with multiplepopulations and migration rates. No recombination or historicalbranching events yet.

IM and IMa by Rasmus Nielsen and Jody Hey. Two or morepopulations allowing both historical splitting and migration after that.No recombination yet.

Week 9: Coalescents, part 2 – p.72/74

Page 73: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

Approximately Bayesian Computation (ABC) methods

These involve approximating the sampling by computing some “summarystatistics” from the data, then finding parameter values that, in a simulationof a tree and data, result in summary statistic values close to these.

They are faster, and very popular now.

But ... they are very dependent on getting the right summary statistics soas not to lose too much power compared to fully-powerful likelihood orBayesian MCMC methods.

Week 9: Coalescents, part 2 – p.73/74

Page 74: Week 9: Coalescents, part 2 - University of Washingtonevolution.gs.washington.edu/gs562/2013/week9.pdfWeek 9: Coalescents, part 2 – p.22/74. The coalescent To simulate a random genealogy,

Fixation probabilities from the diffusion approximation

100

10

10.1

−0.1−1

−10

−100

Probability of fixation of an allele with multiplicative fitnesses. Results fromthe diffusion approximation for various values of 4Ns and p are shown.The values of 4Ns are shown next to the nine curves, except for thediagonal, which has 4Ns = 0. Week 9: Coalescents, part 2 – p.74/74