109
Coalescents and Genealogies Jay Taylor School of Mathematical and Statistical Sciences Center for Evolutionary Medicine and Informatics Arizona State University Jay Taylor (ASU) Coalescents 1 / 66

Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

  • Upload
    others

  • View
    6

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Coalescents and Genealogies

Jay Taylor

School of Mathematical and Statistical SciencesCenter for Evolutionary Medicine and Informatics

Arizona State University

Jay Taylor (ASU) Coalescents 1 / 66

Page 2: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

mtDNA genealogy of humans, Neanderthals and Denisovans

(Krause et al., 2010)

Jay Taylor (ASU) Coalescents 2 / 66

Page 3: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Epidemiological processes shape pathogen genealogies

Grenfell et al. (2004): Unifying the Epidemiological and Evolutionary Dynamics of Pathogens

Jay Taylor (ASU) Coalescents 3 / 66

Page 4: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Pedigrees vs. Genealogies

Pedigrees show genetic relationships between individuals. The number of ancestorsincreases backwards in time (two parents, four grand parents, etc.).

Genealogies (gene genealogies) show how individual copies of the same gene/locusare evolutionarily related. The number of ancestors decreases backwards in time.

Jay Taylor (ASU) Coalescents 4 / 66

Page 5: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Kingman’s Coalescent: Derivation

The Wright-Fisher Model: Birth and Death in FinitePopulations

Assumptions:

1 Non-overlapping generations: adults reproduce and then die.

2 Constant adult population size: exactly N haploid individuals survive to adulthood.

3 Neutrality: all individuals have the same fitness.

4 No mutation (for now).

5 Each individual alive in generation t gives birth to a very large number ofoffspring, each of which is equally likely to survive to reproductive maturity ingeneration t + 1.

Equivalently: each individual alive in generation t + 1 independently chooses its parentfrom the preceding generation uniformly at random and with replacement.

Jay Taylor (ASU) Coalescents 5 / 66

Page 6: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Kingman’s Coalescent: Derivation

Simulation of the Wright-Fisher Model (N = 10)

gen 0 1gen 0 2gen 0 3gen 0 4gen 0 5gen 0 6gen 0 7gen 0 8gen 0 9gen 0 10

gen 1 1 2 3 4 5 6 7 8 9 10

gen 0 1

gen 1 1

gen 0 2

gen 1 2

gen 0 3

gen 1 3

gen 0 4

gen 1 4

gen 0 5

gen 1 5

gen 0 6

gen 1 6

gen 0 7

gen 1 7

gen 0 8

gen 1 8

gen 0 9

gen 1 9

gen 0 10

gen 1 10

gen 2 1gen 2 2gen 2 3gen 2 4gen 2 5gen 2 6gen 2 7gen 2 8gen 2 9gen 2 10

gen 3 1gen 3 2gen 3 3gen 3 4gen 3 5gen 3 6gen 3 7gen 3 8gen 3 9gen 3 10

gen 4 1gen 4 2gen 4 3gen 4 4gen 4 5gen 4 6gen 4 7gen 4 8gen 4 9gen 4 10

gen 5 1gen 5 2gen 5 3gen 5 4gen 5 5gen 5 6gen 5 7gen 5 8gen 5 9gen 5 10

gen 6 1gen 6 2gen 6 3gen 6 4gen 6 5gen 6 6gen 6 7gen 6 8gen 6 9gen 6 10

gen 7 1gen 7 2gen 7 3gen 7 4gen 7 5gen 7 6gen 7 7gen 7 8gen 7 9gen 7 102 5 7

3 4 7

3 7

3 6

3 5

5

8 9 10

9 10

9 10

7 9

6 7

7 8

7 8

8

Jay Taylor (ASU) Coalescents 6 / 66

Page 7: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Kingman’s Coalescent: Derivation

Simulation of the Wright-Fisher Model (N = 10)

gen 0 1gen 0 2gen 0 3gen 0 4gen 0 5gen 0 6gen 0 7gen 0 8gen 0 9gen 0 10

gen 1 1

2 3 4 5 6 7 8 9 10

gen 0 1

gen 1 1

gen 0 2

gen 1 2

gen 0 3

gen 1 3

gen 0 4

gen 1 4

gen 0 5

gen 1 5

gen 0 6

gen 1 6

gen 0 7

gen 1 7

gen 0 8

gen 1 8

gen 0 9

gen 1 9

gen 0 10

gen 1 10

gen 2 1gen 2 2gen 2 3gen 2 4gen 2 5gen 2 6gen 2 7gen 2 8gen 2 9gen 2 10

gen 3 1gen 3 2gen 3 3gen 3 4gen 3 5gen 3 6gen 3 7gen 3 8gen 3 9gen 3 10

gen 4 1gen 4 2gen 4 3gen 4 4gen 4 5gen 4 6gen 4 7gen 4 8gen 4 9gen 4 10

gen 5 1gen 5 2gen 5 3gen 5 4gen 5 5gen 5 6gen 5 7gen 5 8gen 5 9gen 5 10

gen 6 1gen 6 2gen 6 3gen 6 4gen 6 5gen 6 6gen 6 7gen 6 8gen 6 9gen 6 10

gen 7 1gen 7 2gen 7 3gen 7 4gen 7 5gen 7 6gen 7 7gen 7 8gen 7 9gen 7 102 5 7

3 4 7

3 7

3 6

3 5

5

8 9 10

9 10

9 10

7 9

6 7

7 8

7 8

8

Jay Taylor (ASU) Coalescents 6 / 66

Page 8: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Kingman’s Coalescent: Derivation

Simulation of the Wright-Fisher Model (N = 10)

gen 0 1gen 0 2gen 0 3gen 0 4gen 0 5gen 0 6gen 0 7gen 0 8gen 0 9gen 0 10

gen 1 1 2

3 4 5 6 7 8 9 10

gen 0 1

gen 1 1

gen 0 2

gen 1 2

gen 0 3

gen 1 3

gen 0 4

gen 1 4

gen 0 5

gen 1 5

gen 0 6

gen 1 6

gen 0 7

gen 1 7

gen 0 8

gen 1 8

gen 0 9

gen 1 9

gen 0 10

gen 1 10

gen 2 1gen 2 2gen 2 3gen 2 4gen 2 5gen 2 6gen 2 7gen 2 8gen 2 9gen 2 10

gen 3 1gen 3 2gen 3 3gen 3 4gen 3 5gen 3 6gen 3 7gen 3 8gen 3 9gen 3 10

gen 4 1gen 4 2gen 4 3gen 4 4gen 4 5gen 4 6gen 4 7gen 4 8gen 4 9gen 4 10

gen 5 1gen 5 2gen 5 3gen 5 4gen 5 5gen 5 6gen 5 7gen 5 8gen 5 9gen 5 10

gen 6 1gen 6 2gen 6 3gen 6 4gen 6 5gen 6 6gen 6 7gen 6 8gen 6 9gen 6 10

gen 7 1gen 7 2gen 7 3gen 7 4gen 7 5gen 7 6gen 7 7gen 7 8gen 7 9gen 7 102 5 7

3 4 7

3 7

3 6

3 5

5

8 9 10

9 10

9 10

7 9

6 7

7 8

7 8

8

Jay Taylor (ASU) Coalescents 6 / 66

Page 9: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Kingman’s Coalescent: Derivation

Simulation of the Wright-Fisher Model (N = 10)

gen 0 1gen 0 2gen 0 3gen 0 4gen 0 5gen 0 6gen 0 7gen 0 8gen 0 9gen 0 10

gen 1 1 2 3

4 5 6 7 8 9 10

gen 0 1

gen 1 1

gen 0 2

gen 1 2

gen 0 3

gen 1 3

gen 0 4

gen 1 4

gen 0 5

gen 1 5

gen 0 6

gen 1 6

gen 0 7

gen 1 7

gen 0 8

gen 1 8

gen 0 9

gen 1 9

gen 0 10

gen 1 10

gen 2 1gen 2 2gen 2 3gen 2 4gen 2 5gen 2 6gen 2 7gen 2 8gen 2 9gen 2 10

gen 3 1gen 3 2gen 3 3gen 3 4gen 3 5gen 3 6gen 3 7gen 3 8gen 3 9gen 3 10

gen 4 1gen 4 2gen 4 3gen 4 4gen 4 5gen 4 6gen 4 7gen 4 8gen 4 9gen 4 10

gen 5 1gen 5 2gen 5 3gen 5 4gen 5 5gen 5 6gen 5 7gen 5 8gen 5 9gen 5 10

gen 6 1gen 6 2gen 6 3gen 6 4gen 6 5gen 6 6gen 6 7gen 6 8gen 6 9gen 6 10

gen 7 1gen 7 2gen 7 3gen 7 4gen 7 5gen 7 6gen 7 7gen 7 8gen 7 9gen 7 102 5 7

3 4 7

3 7

3 6

3 5

5

8 9 10

9 10

9 10

7 9

6 7

7 8

7 8

8

Jay Taylor (ASU) Coalescents 6 / 66

Page 10: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Kingman’s Coalescent: Derivation

Simulation of the Wright-Fisher Model (N = 10)

gen 0 1gen 0 2gen 0 3gen 0 4gen 0 5gen 0 6gen 0 7gen 0 8gen 0 9gen 0 10

gen 1 1 2 3 4

5 6 7 8 9 10

gen 0 1

gen 1 1

gen 0 2

gen 1 2

gen 0 3

gen 1 3

gen 0 4

gen 1 4

gen 0 5

gen 1 5

gen 0 6

gen 1 6

gen 0 7

gen 1 7

gen 0 8

gen 1 8

gen 0 9

gen 1 9

gen 0 10

gen 1 10

gen 2 1gen 2 2gen 2 3gen 2 4gen 2 5gen 2 6gen 2 7gen 2 8gen 2 9gen 2 10

gen 3 1gen 3 2gen 3 3gen 3 4gen 3 5gen 3 6gen 3 7gen 3 8gen 3 9gen 3 10

gen 4 1gen 4 2gen 4 3gen 4 4gen 4 5gen 4 6gen 4 7gen 4 8gen 4 9gen 4 10

gen 5 1gen 5 2gen 5 3gen 5 4gen 5 5gen 5 6gen 5 7gen 5 8gen 5 9gen 5 10

gen 6 1gen 6 2gen 6 3gen 6 4gen 6 5gen 6 6gen 6 7gen 6 8gen 6 9gen 6 10

gen 7 1gen 7 2gen 7 3gen 7 4gen 7 5gen 7 6gen 7 7gen 7 8gen 7 9gen 7 102 5 7

3 4 7

3 7

3 6

3 5

5

8 9 10

9 10

9 10

7 9

6 7

7 8

7 8

8

Jay Taylor (ASU) Coalescents 6 / 66

Page 11: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Kingman’s Coalescent: Derivation

Simulation of the Wright-Fisher Model (N = 10)

gen 0 1gen 0 2gen 0 3gen 0 4gen 0 5gen 0 6gen 0 7gen 0 8gen 0 9gen 0 10

gen 1 1 2 3 4 5

6 7 8 9 10

gen 0 1

gen 1 1

gen 0 2

gen 1 2

gen 0 3

gen 1 3

gen 0 4

gen 1 4

gen 0 5

gen 1 5

gen 0 6

gen 1 6

gen 0 7

gen 1 7

gen 0 8

gen 1 8

gen 0 9

gen 1 9

gen 0 10

gen 1 10

gen 2 1gen 2 2gen 2 3gen 2 4gen 2 5gen 2 6gen 2 7gen 2 8gen 2 9gen 2 10

gen 3 1gen 3 2gen 3 3gen 3 4gen 3 5gen 3 6gen 3 7gen 3 8gen 3 9gen 3 10

gen 4 1gen 4 2gen 4 3gen 4 4gen 4 5gen 4 6gen 4 7gen 4 8gen 4 9gen 4 10

gen 5 1gen 5 2gen 5 3gen 5 4gen 5 5gen 5 6gen 5 7gen 5 8gen 5 9gen 5 10

gen 6 1gen 6 2gen 6 3gen 6 4gen 6 5gen 6 6gen 6 7gen 6 8gen 6 9gen 6 10

gen 7 1gen 7 2gen 7 3gen 7 4gen 7 5gen 7 6gen 7 7gen 7 8gen 7 9gen 7 102 5 7

3 4 7

3 7

3 6

3 5

5

8 9 10

9 10

9 10

7 9

6 7

7 8

7 8

8

Jay Taylor (ASU) Coalescents 6 / 66

Page 12: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Kingman’s Coalescent: Derivation

Simulation of the Wright-Fisher Model (N = 10)

gen 0 1gen 0 2gen 0 3gen 0 4gen 0 5gen 0 6gen 0 7gen 0 8gen 0 9gen 0 10

gen 1 1 2 3 4 5 6

7 8 9 10

gen 0 1

gen 1 1

gen 0 2

gen 1 2

gen 0 3

gen 1 3

gen 0 4

gen 1 4

gen 0 5

gen 1 5

gen 0 6

gen 1 6

gen 0 7

gen 1 7

gen 0 8

gen 1 8

gen 0 9

gen 1 9

gen 0 10

gen 1 10

gen 2 1gen 2 2gen 2 3gen 2 4gen 2 5gen 2 6gen 2 7gen 2 8gen 2 9gen 2 10

gen 3 1gen 3 2gen 3 3gen 3 4gen 3 5gen 3 6gen 3 7gen 3 8gen 3 9gen 3 10

gen 4 1gen 4 2gen 4 3gen 4 4gen 4 5gen 4 6gen 4 7gen 4 8gen 4 9gen 4 10

gen 5 1gen 5 2gen 5 3gen 5 4gen 5 5gen 5 6gen 5 7gen 5 8gen 5 9gen 5 10

gen 6 1gen 6 2gen 6 3gen 6 4gen 6 5gen 6 6gen 6 7gen 6 8gen 6 9gen 6 10

gen 7 1gen 7 2gen 7 3gen 7 4gen 7 5gen 7 6gen 7 7gen 7 8gen 7 9gen 7 102 5 7

3 4 7

3 7

3 6

3 5

5

8 9 10

9 10

9 10

7 9

6 7

7 8

7 8

8

Jay Taylor (ASU) Coalescents 6 / 66

Page 13: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Kingman’s Coalescent: Derivation

Simulation of the Wright-Fisher Model (N = 10)

gen 0 1gen 0 2gen 0 3gen 0 4gen 0 5gen 0 6gen 0 7gen 0 8gen 0 9gen 0 10

gen 1 1 2 3 4 5 6 7

8 9 10

gen 0 1

gen 1 1

gen 0 2

gen 1 2

gen 0 3

gen 1 3

gen 0 4

gen 1 4

gen 0 5

gen 1 5

gen 0 6

gen 1 6

gen 0 7

gen 1 7

gen 0 8

gen 1 8

gen 0 9

gen 1 9

gen 0 10

gen 1 10

gen 2 1gen 2 2gen 2 3gen 2 4gen 2 5gen 2 6gen 2 7gen 2 8gen 2 9gen 2 10

gen 3 1gen 3 2gen 3 3gen 3 4gen 3 5gen 3 6gen 3 7gen 3 8gen 3 9gen 3 10

gen 4 1gen 4 2gen 4 3gen 4 4gen 4 5gen 4 6gen 4 7gen 4 8gen 4 9gen 4 10

gen 5 1gen 5 2gen 5 3gen 5 4gen 5 5gen 5 6gen 5 7gen 5 8gen 5 9gen 5 10

gen 6 1gen 6 2gen 6 3gen 6 4gen 6 5gen 6 6gen 6 7gen 6 8gen 6 9gen 6 10

gen 7 1gen 7 2gen 7 3gen 7 4gen 7 5gen 7 6gen 7 7gen 7 8gen 7 9gen 7 102 5 7

3 4 7

3 7

3 6

3 5

5

8 9 10

9 10

9 10

7 9

6 7

7 8

7 8

8

Jay Taylor (ASU) Coalescents 6 / 66

Page 14: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Kingman’s Coalescent: Derivation

Simulation of the Wright-Fisher Model (N = 10)

gen 0 1gen 0 2gen 0 3gen 0 4gen 0 5gen 0 6gen 0 7gen 0 8gen 0 9gen 0 10

gen 1 1 2 3 4 5 6 7 8

9 10

gen 0 1

gen 1 1

gen 0 2

gen 1 2

gen 0 3

gen 1 3

gen 0 4

gen 1 4

gen 0 5

gen 1 5

gen 0 6

gen 1 6

gen 0 7

gen 1 7

gen 0 8

gen 1 8

gen 0 9

gen 1 9

gen 0 10

gen 1 10

gen 2 1gen 2 2gen 2 3gen 2 4gen 2 5gen 2 6gen 2 7gen 2 8gen 2 9gen 2 10

gen 3 1gen 3 2gen 3 3gen 3 4gen 3 5gen 3 6gen 3 7gen 3 8gen 3 9gen 3 10

gen 4 1gen 4 2gen 4 3gen 4 4gen 4 5gen 4 6gen 4 7gen 4 8gen 4 9gen 4 10

gen 5 1gen 5 2gen 5 3gen 5 4gen 5 5gen 5 6gen 5 7gen 5 8gen 5 9gen 5 10

gen 6 1gen 6 2gen 6 3gen 6 4gen 6 5gen 6 6gen 6 7gen 6 8gen 6 9gen 6 10

gen 7 1gen 7 2gen 7 3gen 7 4gen 7 5gen 7 6gen 7 7gen 7 8gen 7 9gen 7 102 5 7

3 4 7

3 7

3 6

3 5

5

8 9 10

9 10

9 10

7 9

6 7

7 8

7 8

8

Jay Taylor (ASU) Coalescents 6 / 66

Page 15: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Kingman’s Coalescent: Derivation

Simulation of the Wright-Fisher Model (N = 10)

gen 0 1gen 0 2gen 0 3gen 0 4gen 0 5gen 0 6gen 0 7gen 0 8gen 0 9gen 0 10

gen 1 1 2 3 4 5 6 7 8 9

10

gen 0 1

gen 1 1

gen 0 2

gen 1 2

gen 0 3

gen 1 3

gen 0 4

gen 1 4

gen 0 5

gen 1 5

gen 0 6

gen 1 6

gen 0 7

gen 1 7

gen 0 8

gen 1 8

gen 0 9

gen 1 9

gen 0 10

gen 1 10

gen 2 1gen 2 2gen 2 3gen 2 4gen 2 5gen 2 6gen 2 7gen 2 8gen 2 9gen 2 10

gen 3 1gen 3 2gen 3 3gen 3 4gen 3 5gen 3 6gen 3 7gen 3 8gen 3 9gen 3 10

gen 4 1gen 4 2gen 4 3gen 4 4gen 4 5gen 4 6gen 4 7gen 4 8gen 4 9gen 4 10

gen 5 1gen 5 2gen 5 3gen 5 4gen 5 5gen 5 6gen 5 7gen 5 8gen 5 9gen 5 10

gen 6 1gen 6 2gen 6 3gen 6 4gen 6 5gen 6 6gen 6 7gen 6 8gen 6 9gen 6 10

gen 7 1gen 7 2gen 7 3gen 7 4gen 7 5gen 7 6gen 7 7gen 7 8gen 7 9gen 7 102 5 7

3 4 7

3 7

3 6

3 5

5

8 9 10

9 10

9 10

7 9

6 7

7 8

7 8

8

Jay Taylor (ASU) Coalescents 6 / 66

Page 16: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Kingman’s Coalescent: Derivation

Simulation of the Wright-Fisher Model (N = 10)

gen 0 1gen 0 2gen 0 3gen 0 4gen 0 5gen 0 6gen 0 7gen 0 8gen 0 9gen 0 10

gen 1 1 2 3 4 5 6 7 8 9 10

gen 0 1

gen 1 1

gen 0 2

gen 1 2

gen 0 3

gen 1 3

gen 0 4

gen 1 4

gen 0 5

gen 1 5

gen 0 6

gen 1 6

gen 0 7

gen 1 7

gen 0 8

gen 1 8

gen 0 9

gen 1 9

gen 0 10

gen 1 10

gen 2 1gen 2 2gen 2 3gen 2 4gen 2 5gen 2 6gen 2 7gen 2 8gen 2 9gen 2 10

gen 3 1gen 3 2gen 3 3gen 3 4gen 3 5gen 3 6gen 3 7gen 3 8gen 3 9gen 3 10

gen 4 1gen 4 2gen 4 3gen 4 4gen 4 5gen 4 6gen 4 7gen 4 8gen 4 9gen 4 10

gen 5 1gen 5 2gen 5 3gen 5 4gen 5 5gen 5 6gen 5 7gen 5 8gen 5 9gen 5 10

gen 6 1gen 6 2gen 6 3gen 6 4gen 6 5gen 6 6gen 6 7gen 6 8gen 6 9gen 6 10

gen 7 1gen 7 2gen 7 3gen 7 4gen 7 5gen 7 6gen 7 7gen 7 8gen 7 9gen 7 102 5 7

3 4 7

3 7

3 6

3 5

5

8 9 10

9 10

9 10

7 9

6 7

7 8

7 8

8

Jay Taylor (ASU) Coalescents 6 / 66

Page 17: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Kingman’s Coalescent: Derivation

Simulation of the Wright-Fisher Model (N = 10)

gen 0 1gen 0 2gen 0 3gen 0 4gen 0 5gen 0 6gen 0 7gen 0 8gen 0 9gen 0 10

gen 1 1 2 3 4 5 6 7 8 9 10

gen 0 1

gen 1 1

gen 0 2

gen 1 2

gen 0 3

gen 1 3

gen 0 4

gen 1 4

gen 0 5

gen 1 5

gen 0 6

gen 1 6

gen 0 7

gen 1 7

gen 0 8

gen 1 8

gen 0 9

gen 1 9

gen 0 10

gen 1 10

gen 2 1gen 2 2gen 2 3gen 2 4gen 2 5gen 2 6gen 2 7gen 2 8gen 2 9gen 2 10

gen 3 1gen 3 2gen 3 3gen 3 4gen 3 5gen 3 6gen 3 7gen 3 8gen 3 9gen 3 10

gen 4 1gen 4 2gen 4 3gen 4 4gen 4 5gen 4 6gen 4 7gen 4 8gen 4 9gen 4 10

gen 5 1gen 5 2gen 5 3gen 5 4gen 5 5gen 5 6gen 5 7gen 5 8gen 5 9gen 5 10

gen 6 1gen 6 2gen 6 3gen 6 4gen 6 5gen 6 6gen 6 7gen 6 8gen 6 9gen 6 10

gen 7 1gen 7 2gen 7 3gen 7 4gen 7 5gen 7 6gen 7 7gen 7 8gen 7 9gen 7 102 5 7

3 4 7

3 7

3 6

3 5

5

8 9 10

9 10

9 10

7 9

6 7

7 8

7 8

8

Jay Taylor (ASU) Coalescents 6 / 66

Page 18: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Kingman’s Coalescent: Derivation

Simulation of the Wright-Fisher Model (N = 10)

gen 0 1gen 0 2gen 0 3gen 0 4gen 0 5gen 0 6gen 0 7gen 0 8gen 0 9gen 0 10

gen 1 1 2 3 4 5 6 7 8 9 10

gen 0 1

gen 1 1

gen 0 2

gen 1 2

gen 0 3

gen 1 3

gen 0 4

gen 1 4

gen 0 5

gen 1 5

gen 0 6

gen 1 6

gen 0 7

gen 1 7

gen 0 8

gen 1 8

gen 0 9

gen 1 9

gen 0 10

gen 1 10

gen 2 1gen 2 2gen 2 3gen 2 4gen 2 5gen 2 6gen 2 7gen 2 8gen 2 9gen 2 10

gen 3 1gen 3 2gen 3 3gen 3 4gen 3 5gen 3 6gen 3 7gen 3 8gen 3 9gen 3 10

gen 4 1gen 4 2gen 4 3gen 4 4gen 4 5gen 4 6gen 4 7gen 4 8gen 4 9gen 4 10

gen 5 1gen 5 2gen 5 3gen 5 4gen 5 5gen 5 6gen 5 7gen 5 8gen 5 9gen 5 10

gen 6 1gen 6 2gen 6 3gen 6 4gen 6 5gen 6 6gen 6 7gen 6 8gen 6 9gen 6 10

gen 7 1gen 7 2gen 7 3gen 7 4gen 7 5gen 7 6gen 7 7gen 7 8gen 7 9gen 7 102 5 7

3 4 7

3 7

3 6

3 5

5

8 9 10

9 10

9 10

7 9

6 7

7 8

7 8

8

Jay Taylor (ASU) Coalescents 6 / 66

Page 19: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Kingman’s Coalescent: Derivation

Simulation of the Wright-Fisher Model (N = 10)

gen 0 1gen 0 2gen 0 3gen 0 4gen 0 5gen 0 6gen 0 7gen 0 8gen 0 9gen 0 10

gen 1 1 2 3 4 5 6 7 8 9 10

gen 0 1

gen 1 1

gen 0 2

gen 1 2

gen 0 3

gen 1 3

gen 0 4

gen 1 4

gen 0 5

gen 1 5

gen 0 6

gen 1 6

gen 0 7

gen 1 7

gen 0 8

gen 1 8

gen 0 9

gen 1 9

gen 0 10

gen 1 10

gen 2 1gen 2 2gen 2 3gen 2 4gen 2 5gen 2 6gen 2 7gen 2 8gen 2 9gen 2 10

gen 3 1gen 3 2gen 3 3gen 3 4gen 3 5gen 3 6gen 3 7gen 3 8gen 3 9gen 3 10

gen 4 1gen 4 2gen 4 3gen 4 4gen 4 5gen 4 6gen 4 7gen 4 8gen 4 9gen 4 10

gen 5 1gen 5 2gen 5 3gen 5 4gen 5 5gen 5 6gen 5 7gen 5 8gen 5 9gen 5 10

gen 6 1gen 6 2gen 6 3gen 6 4gen 6 5gen 6 6gen 6 7gen 6 8gen 6 9gen 6 10

gen 7 1gen 7 2gen 7 3gen 7 4gen 7 5gen 7 6gen 7 7gen 7 8gen 7 9gen 7 10

2 5 7

3 4 7

3 7

3 6

3 5

5

8 9 10

9 10

9 10

7 9

6 7

7 8

7 8

8

Jay Taylor (ASU) Coalescents 6 / 66

Page 20: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Kingman’s Coalescent: Derivation

Simulation of the Wright-Fisher Model (N = 10)

gen 0 1gen 0 2gen 0 3gen 0 4gen 0 5gen 0 6gen 0 7gen 0 8gen 0 9gen 0 10

gen 1 1 2 3 4 5 6 7 8 9 10

gen 0 1

gen 1 1

gen 0 2

gen 1 2

gen 0 3

gen 1 3

gen 0 4

gen 1 4

gen 0 5

gen 1 5

gen 0 6

gen 1 6

gen 0 7

gen 1 7

gen 0 8

gen 1 8

gen 0 9

gen 1 9

gen 0 10

gen 1 10

gen 2 1gen 2 2gen 2 3gen 2 4gen 2 5gen 2 6gen 2 7gen 2 8gen 2 9gen 2 10

gen 3 1gen 3 2gen 3 3gen 3 4gen 3 5gen 3 6gen 3 7gen 3 8gen 3 9gen 3 10

gen 4 1gen 4 2gen 4 3gen 4 4gen 4 5gen 4 6gen 4 7gen 4 8gen 4 9gen 4 10

gen 5 1gen 5 2gen 5 3gen 5 4gen 5 5gen 5 6gen 5 7gen 5 8gen 5 9gen 5 10

gen 6 1gen 6 2gen 6 3gen 6 4gen 6 5gen 6 6gen 6 7gen 6 8gen 6 9gen 6 10

gen 7 1gen 7 2gen 7 3gen 7 4gen 7 5gen 7 6gen 7 7gen 7 8gen 7 9gen 7 102 5 7

3 4 7

3 7

3 6

3 5

5

8 9 10

9 10

9 10

7 9

6 7

7 8

7 8

8

Jay Taylor (ASU) Coalescents 6 / 66

Page 21: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Kingman’s Coalescent: Derivation

Simulation of the Wright-Fisher Model (N = 10)

gen 0 1gen 0 2gen 0 3gen 0 4gen 0 5gen 0 6gen 0 7gen 0 8gen 0 9gen 0 10

gen 1 1 2 3 4 5 6 7 8 9 10

gen 0 1

gen 1 1

gen 0 2

gen 1 2

gen 0 3

gen 1 3

gen 0 4

gen 1 4

gen 0 5

gen 1 5

gen 0 6

gen 1 6

gen 0 7

gen 1 7

gen 0 8

gen 1 8

gen 0 9

gen 1 9

gen 0 10

gen 1 10

gen 2 1gen 2 2gen 2 3gen 2 4gen 2 5gen 2 6gen 2 7gen 2 8gen 2 9gen 2 10

gen 3 1gen 3 2gen 3 3gen 3 4gen 3 5gen 3 6gen 3 7gen 3 8gen 3 9gen 3 10

gen 4 1gen 4 2gen 4 3gen 4 4gen 4 5gen 4 6gen 4 7gen 4 8gen 4 9gen 4 10

gen 5 1gen 5 2gen 5 3gen 5 4gen 5 5gen 5 6gen 5 7gen 5 8gen 5 9gen 5 10

gen 6 1gen 6 2gen 6 3gen 6 4gen 6 5gen 6 6gen 6 7gen 6 8gen 6 9gen 6 10

gen 7 1gen 7 2gen 7 3gen 7 4gen 7 5gen 7 6gen 7 7gen 7 8gen 7 9gen 7 102 5 7

3 4 7

3 7

3 6

3 5

5

8 9 10

9 10

9 10

7 9

6 7

7 8

7 8

8

Jay Taylor (ASU) Coalescents 6 / 66

Page 22: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Kingman’s Coalescent: Derivation

Simulation of the Wright-Fisher Model (N = 10)

gen 0 1gen 0 2gen 0 3gen 0 4gen 0 5gen 0 6gen 0 7gen 0 8gen 0 9gen 0 10

gen 1 1 2 3 4 5 6 7 8 9 10

gen 0 1

gen 1 1

gen 0 2

gen 1 2

gen 0 3

gen 1 3

gen 0 4

gen 1 4

gen 0 5

gen 1 5

gen 0 6

gen 1 6

gen 0 7

gen 1 7

gen 0 8

gen 1 8

gen 0 9

gen 1 9

gen 0 10

gen 1 10

gen 2 1gen 2 2gen 2 3gen 2 4gen 2 5gen 2 6gen 2 7gen 2 8gen 2 9gen 2 10

gen 3 1gen 3 2gen 3 3gen 3 4gen 3 5gen 3 6gen 3 7gen 3 8gen 3 9gen 3 10

gen 4 1gen 4 2gen 4 3gen 4 4gen 4 5gen 4 6gen 4 7gen 4 8gen 4 9gen 4 10

gen 5 1gen 5 2gen 5 3gen 5 4gen 5 5gen 5 6gen 5 7gen 5 8gen 5 9gen 5 10

gen 6 1gen 6 2gen 6 3gen 6 4gen 6 5gen 6 6gen 6 7gen 6 8gen 6 9gen 6 10

gen 7 1gen 7 2gen 7 3gen 7 4gen 7 5gen 7 6gen 7 7gen 7 8gen 7 9gen 7 102 5 7

3 4 7

3 7

3 6

3 5

5

8 9 10

9 10

9 10

7 9

6 7

7 8

7 8

8

Jay Taylor (ASU) Coalescents 6 / 66

Page 23: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Kingman’s Coalescent: Derivation

Pairwise Coalescent Times

Suppose that two individuals are sampled at random and let Tmrca be the number ofgenerations back to their most recent common ancestor. Our objective is todetermine the distribution of Tmrca.

We begin by calculating the probability that Tmrca = 1, i.e., that the two individualsshare the same parent. Since there are N possible parents, we have:

P(Tmrca = 1) =N∑i=1

P(they both have individual i as a parent)

=N∑i=1

1

N2=

1

N.

gen -1 1

gen 0 1

gen -1 2

gen 0 2

gen -1 3

gen 0 3

gen -1 4

gen 0 4

gen -1 5

gen 0 5

gen -1 6

gen 0 6

gen -1 7

gen 0 7

gen -1 8

gen 0 8

gen -1 9

gen 0 9

gen -1 10

gen 0 103 5

4

Jay Taylor (ASU) Coalescents 7 / 66

Page 24: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Kingman’s Coalescent: Derivation

To find the probability that Tmrca = 2, notice that two conditions must be satisfied:

1 The sampled individuals must have different parents: Tmrca > 1;

2 Those two parents must have the same parent.

Consequently, by conditioning on the first event, we obtain

P(Tmrca = 2) = P(Tmrca > 1)× P(Tmrca = 2|Tmrca > 1)

=

(1− 1

N

)× 1

N.

gen -2 1

gen -1 1

gen 0 1

gen -2 2

gen -1 2

gen 0 2

gen -2 3

gen -1 3

gen 0 3

gen -2 4

gen -1 4

gen 0 4

gen -2 5

gen -1 5

gen 0 5

gen -2 6

gen -1 6

gen 0 6

gen -2 7

gen -1 7

gen 0 7

gen -2 8

gen -1 8

gen 0 8

gen -2 9

gen -1 9

gen 0 9

gen -2 10

gen -1 10

gen 0 103 5

4 7

6

Jay Taylor (ASU) Coalescents 8 / 66

Page 25: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Kingman’s Coalescent: Derivation

By similar arguments, we can show that for any t ≥ 1, we have

Distribution of Tmrca

P(Tmrca = t) =

(1− 1

N

)t−11

N.

In other words, Tmrca is a geometrically-distributed random variable with parameter1/N. Furthermore, the mean and the variance of Tmrca are:

Moments of Tmrca

E [Tmrca] = N

Var (Tmrca) = N(N − 1)

Jay Taylor (ASU) Coalescents 9 / 66

Page 26: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Kingman’s Coalescent: Derivation

Distribution of Pairwise Coalescent Times

0 1000 2000 3000 40000

0.5

1

1.5

2

2.5

3

3.5

4

4.5

5x 10

−3

Tmrca

density

N =200N =500N =1000

0

500

1000

1500

2000

2500

3000

3500

4000

200 500 1000

N

Jay Taylor (ASU) Coalescents 10 / 66

Page 27: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Kingman’s Coalescent: Derivation

Genealogies of samples of arbitrary size

Now suppose that we have sampled k > 2 individuals at random from the population.In this case, our task is to describe the distribution of the random genealogy G of thesample. As in the previous section, it suffices to consider what can happen in a singlegeneration. There are three possibilities:

1 None of the k individuals share a common ancestor in the previous generation.

2 Exactly two of the k individuals share a common ancestor in the previousgeneration. In this case, we say that their lineages coalesce.

3 More than two of the individuals share a common ancestor in the previousgeneration.

Throughout we will assume that the population size N is large, say N > 100, and thatthe sample size k is small relative to N, say k < N/10.

Jay Taylor (ASU) Coalescents 11 / 66

Page 28: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Kingman’s Coalescent: Derivation

For none of the k individuals to share a common ancestor in the previous generation,each one must ‘choose’ a different parent. Since there are N choices for the firstindividual, N − 1 choices for the second individual, N − 2 choices for the third and soforth, the probability of this event is:

P(no common ancestors) =N

N

N − 1

N

N − 2

N· · · N − (k − 1)

N

= 1

(1− 1

N

)(1− 2

N

)· · ·(

1− k − 1

N

)= 1− k(k − 1)

2

1

N+ O

(1

N2

).

gen -1 1

gen 0 1

gen -1 2

gen 0 2

gen -1 3

gen 0 3

gen -1 4

gen 0 4

gen -1 5

gen 0 5

gen -1 6

gen 0 6

gen -1 7

gen 0 7

gen -1 8

gen 0 8

gen -1 9

gen 0 9

gen -1 10

gen 0 10

gen -1 11

gen 0 11

gen -1 12

gen 0 123 5 6 11

2 6 8 11

Jay Taylor (ASU) Coalescents 12 / 66

Page 29: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Kingman’s Coalescent: Derivation

Similarly, for exactly two of the k individuals to share a common ancestor in the previousgeneration, those two must choose the same parent, whilst each of the remaining k − 2individuals must choose a different parent. Since there are

(k2

)= k(k − 1)/2 distinct

pairs of sampled individuals, it follows that the probability of this event is:

P(one pair of sibs) =k(k − 1)

2

1

N

N − 1

N

N − 2

N· · · N − (k − 2)

N

=k(k − 1)

2

1

N+ O

(1

N2

).

Furthermore, each pair of individuals is equally likely to be the one with the commonancestor.

gen -1 1

gen 0 1

gen -1 2

gen 0 2

gen -1 3

gen 0 3

gen -1 4

gen 0 4

gen -1 5

gen 0 5

gen -1 6

gen 0 6

gen -1 7

gen 0 7

gen -1 8

gen 0 8

gen -1 9

gen 0 9

gen -1 10

gen 0 10

gen -1 11

gen 0 11

gen -1 12

gen 0 123 5 6 11

2 6 11

Jay Taylor (ASU) Coalescents 13 / 66

Page 30: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Kingman’s Coalescent: Derivation

For more than two individuals to share a common ancestor in the previous generation,at least one of the following events must occur:

Multiple merger: three or more individuals share the same common ancestor;

Simultaneous mergers: there are two or more distinct pairs of sibs.

However, it can be shown that the collective probability of all of these events is of orderO(1/N2). Since this is small relative to

(k2

)1N

when N is much larger than k, it can alsobe shown that the sample genealogy is unlikely to contain any complex mergers, i.e., Gis likely to be strictly bifurcating.

gen -1 1

gen 0 1

gen -1 2

gen 0 2

gen -1 3

gen 0 3

gen -1 4

gen 0 4

gen -1 5

gen 0 5

gen -1 6

gen 0 6

gen -1 7

gen 0 7

gen -1 8

gen 0 8

gen -1 9

gen 0 9

gen -1 10

gen 0 10

gen -1 11

gen 0 11

gen -1 12

gen 0 123 5 6 9 10 11 12

4 10 11

Jay Taylor (ASU) Coalescents 14 / 66

Page 31: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Kingman’s Coalescent: Derivation

Let us write τk for the random time until some pair of lineages ancestral to our samplefirst finds a common ancestor. In the jargon of the field, we say that the two lineagescoalesce. If we ignore complex mergers, then there are two possibilities in eachgeneration:

1 With probability(k2

)1N

, some pair of lineages coalesces. Furthermore, that

pair is chosen uniformly at random from among the(k2

)pairs.

2 Otherwise, with probability 1−(k2

)1N

, each ancestral lineage is extended onemore generation into the past.

Since the parent-child relationships are chosen independently in each generation, itfollows that τk is a geometric random variable with parameter

(k2

)1N

, i.e.,

Distribution of τn

P(τk = t) =

(1−

(k

2

)1

N

)t−1(k

2

)1

N.

Jay Taylor (ASU) Coalescents 15 / 66

Page 32: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Kingman’s Coalescent: Derivation

So far we have described the genealogy up until the first pairwise coalescent event.However, because of the inter-generational independence of the Wright-Fisher model,this actually characterizes the entire genealogy:

1 The first coalescent event occurs after time τk , at which point the number oflineages ancestral to the sample is reduced from k to k − 1.

2 The next coalescent event occurs after an additional time τk−1, which isindependent of τk and which has distribution

P(τk−1 = t) ≈

(1−

((k − 1)

2

)1

N

)t−1((k − 1)

2

)1

N.

3 In general, once the genealogy contains i ancestral lineages, the time until thenext pairwise coalescent event is τi , with distribution

P(τi = t) ≈

(1−

(i

2

)1

N

)t−1(i

2

)1

N

Furthermore, τi is independent of τk , τk−1, · · · , τi+1.

4 The process terminates when only one ancestral lineage remains.

Jay Taylor (ASU) Coalescents 16 / 66

Page 33: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Kingman’s Coalescent: Derivation

There is another approximation that can be made when N is large. In this case, thediscreteness of the coalescent time τk is not important and we can approximate itsdistribution by an exponential distribution:

Distribution of τk : exponential approximation

P(τk > t) ≈ exp

(−

(k

2

)t

N

).

In fact, if we let τ(N)k = 1

Nτk be the coalescent time measured in units of N generations,

then it can be shown that the variables τ(N)k converge in distribution to an exponential

variable:

limN→∞

P(τ(N)k > t) = exp

(−

(k

2

)t

).

Jay Taylor (ASU) Coalescents 17 / 66

Page 34: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Kingman’s Coalescent: Derivation

These observations lead to the following approximate description of the genealogy of arandom sample of k individuals from a haploid population of size N. As shown inKingman (1982a, b), this description is exact in the limit as N →∞.

Kingman’s CoalescentThe sample genealogy Gn can be realized by running the following continuous-timeMarkov chain until only one ancestral lineage remains.

1 Simulate a random time τk for the first coalescent event using the exponentialdistribution with rate

(k2

)/N:

P(τk > t) = exp

(−

(k

2

)t

N

)

2 Choose two of the individuals in the sample uniformly at random and draw the toppart of the tree so that these two share a MRCA at time τk in the past.

3 If k > 1, then replace k by k − 1 and go back to step 1. If k = 1, then the MRCAhas been reached and the process terminates.

Jay Taylor (ASU) Coalescents 18 / 66

Page 35: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Kingman’s Coalescent: Properties

Random genealogies for samples of 20 individuals

Jay Taylor (ASU) Coalescents 19 / 66

Page 36: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Kingman’s Coalescent: Properties

Although genealogies generated under Kingman’s coalescent are random, they tend tohave a characteristic shape in which recent branches are much shorter than ancientbranches. Here are two examples for a sample containing k = 100 individuals:

Jay Taylor (ASU) Coalescents 20 / 66

Page 37: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Kingman’s Coalescent: Properties

The shape of these trees is a consequence of the combinatorial properties of Kingman’scoalescent. In particular, the mean branch length is inversely proportional to the squareof the number of coexisting lineages.

Branch length statistics

E[τk ] =2N

k(k − 1)

Var(τk) =4N2

k2(k − 1)2

0 5 10 15 20 25 300

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

numberof lineages

E[τ

k]/N

Mean Branch Length in theCoalescent

Time in units of N generations

Jay Taylor (ASU) Coalescents 21 / 66

Page 38: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Kingman’s Coalescent: Properties

The quadratic decrease in the branch length has a profound influence on the distributionof the time to the most recent common ancestor, which is equal to the sum of thebranch lengths:

T (k)mrca = τk + τk−1 + · · ·+ τ2

In fact, it can be shown that the variables τk decrease so rapidly that both the meanand the variance of T

(k)mrca/N have finite positive limits as k →∞.

Mean and variance of T(k)mrca

E[T (k)mrca] = 2N

(1− 1

k

)k→∞−→ 2N

Var(T (k)mrca) = 4N2

k∑i=2

1

i2(i − 1)2

k→∞−→ 1.16N2.

Jay Taylor (ASU) Coalescents 22 / 66

Page 39: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Kingman’s Coalescent: Properties

The rapid coalescence of terminal branches also means that the variability of T(n)mrca is

not appreciably reduced once the sample size exceeds 20.

0 10 20 30 40 50 60 70 80 90 1000

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

2

samplesize

Tmrca

/Nas afunctionof samplesize

mean

variance

CV

2 5 10 20 50 1000

1

2

3

4

5

6

7

8

samplesize

DistributionofTmrca(n) /N

Jay Taylor (ASU) Coalescents 23 / 66

Page 40: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Kingman’s Coalescent: Properties

Kingman’s Coalescent: Summary of Properties

1 Kingman’s coalescent is a continuous-time Markov chain which characterizesthe genealogy of a random sample from a Wright-Fisher population.

2 The rate of coalescence is inversely proportional to the population size N.

3 Because the rate of coalescence is proportional to the square of the numberof branches in the genealogy, branches near the top of the tree tend to bemuch shorter than branches near the base.

4 The time to the most recent common ancestor only weakly depends on thesample size.

5 Because the mean and the standard deviation of T(n)mrca are of comparable

magnitude, it is not uncommon for unlinked loci to have very different timesto their most recent common ancestors.

Jay Taylor (ASU) Coalescents 24 / 66

Page 41: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Robustness and Effective Population Size

Robustness and the Scope of Kingman’s Coalescent

Although it was convenient to use the Wright-Fisher model in the derivation, theimportance of Kingman’s coalescent stems from the fact that it is relevant to a muchlarger class of population genetical models satisfying these conditions:

1 The population size is constant.

2 The population is panmictic.

3 Evolution is neutral.

4 The within-generation fecundity variance is not too large.

5 The sample size is small relative to the population size.

Under these conditions, it can be shown that by choosing an appropriate scaling oftime the distribution of the genealogy of a random sample can be approximated byKingman’s coalescent. What changes is the relationship between the pairwise coalescentrate and the population size.

Jay Taylor (ASU) Coalescents 25 / 66

Page 42: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Robustness and Effective Population Size

Galton-Watson Processes with Population Regulation

We first consider a class of models that combine the generality of the reproductivemechanism of a Galton-Watson branching process with the population regulation stepof the Wright-Fisher model.

1 Non-overlapping generations: adults reproduce and then die.

2 Constant adult population size: N haploid individuals survive to adulthood.

3 The number of offspring born to the i ’th individual alive in generation t is arandom variable νi,t . We will assume that these variables are independentand identically distributed with distribution ν.

4 N of the offspring are sampled uniformly at random and without replacementto form the next generation.

Jay Taylor (ASU) Coalescents 26 / 66

Page 43: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Robustness and Effective Population Size

Two Examples

ν(1) = 0.9, ν(2) = 0.1

gen 0 1 2 3 4 5

gen 1 1 2 3 4 5

gen 2 1 2 3 4 5

gen 3 1 2 3 4 5

gen 4 1 2 3 4 5

gen 5 1 2 3 4 52 3

2 3

2 3

2 3

3

ν(0) = 0.75, ν(5) = 0.25

gen 0 1 2 3 4 5

gen 1 1 2 3 4 5

gen 2 1 2 3 4 5

gen 3 1 2 3 4 5

gen 4 1 2 3 4 5

gen 5 1 2 3 4 52 3

5

Jay Taylor (ASU) Coalescents 27 / 66

Page 44: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Robustness and Effective Population Size

Two Examples

ν(1) = 0.9, ν(2) = 0.1

gen 0 1 2 3 4 5

gen 1 1 2 3 4 5

gen 2 1 2 3 4 5

gen 3 1 2 3 4 5

gen 4 1 2 3 4 5

gen 5 1 2 3 4 52 3

2 3

2 3

2 3

3

ν(0) = 0.75, ν(5) = 0.25

gen 0 1 2 3 4 5

gen 1 1 2 3 4 5

gen 2 1 2 3 4 5

gen 3 1 2 3 4 5

gen 4 1 2 3 4 5

gen 5 1 2 3 4 52 3

5

Jay Taylor (ASU) Coalescents 27 / 66

Page 45: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Robustness and Effective Population Size

Two Examples

ν(1) = 0.9, ν(2) = 0.1

gen 0 1 2 3 4 5

gen 1 1 2 3 4 5

gen 2 1 2 3 4 5

gen 3 1 2 3 4 5

gen 4 1 2 3 4 5

gen 5 1 2 3 4 52 3

2 3

2 3

2 3

3

ν(0) = 0.75, ν(5) = 0.25

gen 0 1 2 3 4 5

gen 1 1 2 3 4 5

gen 2 1 2 3 4 5

gen 3 1 2 3 4 5

gen 4 1 2 3 4 5

gen 5 1 2 3 4 52 3

5

Jay Taylor (ASU) Coalescents 27 / 66

Page 46: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Robustness and Effective Population Size

Two Examples

ν(1) = 0.9, ν(2) = 0.1

gen 0 1 2 3 4 5

gen 1 1 2 3 4 5

gen 2 1 2 3 4 5

gen 3 1 2 3 4 5

gen 4 1 2 3 4 5

gen 5 1 2 3 4 52 3

2 3

2 3

2 3

3

ν(0) = 0.75, ν(5) = 0.25

gen 0 1 2 3 4 5

gen 1 1 2 3 4 5

gen 2 1 2 3 4 5

gen 3 1 2 3 4 5

gen 4 1 2 3 4 5

gen 5 1 2 3 4 52 3

5

Jay Taylor (ASU) Coalescents 27 / 66

Page 47: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Robustness and Effective Population Size

Two Examples

ν(1) = 0.9, ν(2) = 0.1

gen 0 1 2 3 4 5

gen 1 1 2 3 4 5

gen 2 1 2 3 4 5

gen 3 1 2 3 4 5

gen 4 1 2 3 4 5

gen 5 1 2 3 4 52 3

2 3

2 3

2 3

3

ν(0) = 0.75, ν(5) = 0.25

gen 0 1 2 3 4 5

gen 1 1 2 3 4 5

gen 2 1 2 3 4 5

gen 3 1 2 3 4 5

gen 4 1 2 3 4 5

gen 5 1 2 3 4 52 3

5

Jay Taylor (ASU) Coalescents 27 / 66

Page 48: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Robustness and Effective Population Size

Two Examples

ν(1) = 0.9, ν(2) = 0.1

gen 0 1 2 3 4 5

gen 1 1 2 3 4 5

gen 2 1 2 3 4 5

gen 3 1 2 3 4 5

gen 4 1 2 3 4 5

gen 5 1 2 3 4 5

2 3

2 3

2 3

2 3

3

ν(0) = 0.75, ν(5) = 0.25

gen 0 1 2 3 4 5

gen 1 1 2 3 4 5

gen 2 1 2 3 4 5

gen 3 1 2 3 4 5

gen 4 1 2 3 4 5

gen 5 1 2 3 4 52 3

5

Jay Taylor (ASU) Coalescents 27 / 66

Page 49: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Robustness and Effective Population Size

Two Examples

ν(1) = 0.9, ν(2) = 0.1

gen 0 1 2 3 4 5

gen 1 1 2 3 4 5

gen 2 1 2 3 4 5

gen 3 1 2 3 4 5

gen 4 1 2 3 4 5

gen 5 1 2 3 4 52 3

2 3

2 3

2 3

3

ν(0) = 0.75, ν(5) = 0.25

gen 0 1 2 3 4 5

gen 1 1 2 3 4 5

gen 2 1 2 3 4 5

gen 3 1 2 3 4 5

gen 4 1 2 3 4 5

gen 5 1 2 3 4 52 3

5

Jay Taylor (ASU) Coalescents 27 / 66

Page 50: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Robustness and Effective Population Size

Two Examples

ν(1) = 0.9, ν(2) = 0.1

gen 0 1 2 3 4 5

gen 1 1 2 3 4 5

gen 2 1 2 3 4 5

gen 3 1 2 3 4 5

gen 4 1 2 3 4 5

gen 5 1 2 3 4 52 3

2 3

2 3

2 3

3

ν(0) = 0.75, ν(5) = 0.25

gen 0 1 2 3 4 5

gen 1 1 2 3 4 5

gen 2 1 2 3 4 5

gen 3 1 2 3 4 5

gen 4 1 2 3 4 5

gen 5 1 2 3 4 52 3

5

Jay Taylor (ASU) Coalescents 27 / 66

Page 51: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Robustness and Effective Population Size

Two Examples

ν(1) = 0.9, ν(2) = 0.1

gen 0 1 2 3 4 5

gen 1 1 2 3 4 5

gen 2 1 2 3 4 5

gen 3 1 2 3 4 5

gen 4 1 2 3 4 5

gen 5 1 2 3 4 52 3

2 3

2 3

2 3

3

ν(0) = 0.75, ν(5) = 0.25

gen 0 1 2 3 4 5

gen 1 1 2 3 4 5

gen 2 1 2 3 4 5

gen 3 1 2 3 4 5

gen 4 1 2 3 4 5

gen 5 1 2 3 4 52 3

5

Jay Taylor (ASU) Coalescents 27 / 66

Page 52: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Robustness and Effective Population Size

Two Examples

ν(1) = 0.9, ν(2) = 0.1

gen 0 1 2 3 4 5

gen 1 1 2 3 4 5

gen 2 1 2 3 4 5

gen 3 1 2 3 4 5

gen 4 1 2 3 4 5

gen 5 1 2 3 4 52 3

2 3

2 3

2 3

3

ν(0) = 0.75, ν(5) = 0.25

gen 0 1 2 3 4 5

gen 1 1 2 3 4 5

gen 2 1 2 3 4 5

gen 3 1 2 3 4 5

gen 4 1 2 3 4 5

gen 5 1 2 3 4 52 3

5

Jay Taylor (ASU) Coalescents 27 / 66

Page 53: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Robustness and Effective Population Size

Two Examples

ν(1) = 0.9, ν(2) = 0.1

gen 0 1 2 3 4 5

gen 1 1 2 3 4 5

gen 2 1 2 3 4 5

gen 3 1 2 3 4 5

gen 4 1 2 3 4 5

gen 5 1 2 3 4 52 3

2 3

2 3

2 3

3

ν(0) = 0.75, ν(5) = 0.25

gen 0 1 2 3 4 5

gen 1 1 2 3 4 5

gen 2 1 2 3 4 5

gen 3 1 2 3 4 5

gen 4 1 2 3 4 5

gen 5 1 2 3 4 52 3

5

Jay Taylor (ASU) Coalescents 27 / 66

Page 54: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Robustness and Effective Population Size

Two Examples

ν(1) = 0.9, ν(2) = 0.1

gen 0 1 2 3 4 5

gen 1 1 2 3 4 5

gen 2 1 2 3 4 5

gen 3 1 2 3 4 5

gen 4 1 2 3 4 5

gen 5 1 2 3 4 52 3

2 3

2 3

2 3

3

ν(0) = 0.75, ν(5) = 0.25

gen 0 1 2 3 4 5

gen 1 1 2 3 4 5

gen 2 1 2 3 4 5

gen 3 1 2 3 4 5

gen 4 1 2 3 4 5

gen 5 1 2 3 4 5

2 3

5

Jay Taylor (ASU) Coalescents 27 / 66

Page 55: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Robustness and Effective Population Size

Two Examples

ν(1) = 0.9, ν(2) = 0.1

gen 0 1 2 3 4 5

gen 1 1 2 3 4 5

gen 2 1 2 3 4 5

gen 3 1 2 3 4 5

gen 4 1 2 3 4 5

gen 5 1 2 3 4 52 3

2 3

2 3

2 3

3

ν(0) = 0.75, ν(5) = 0.25

gen 0 1 2 3 4 5

gen 1 1 2 3 4 5

gen 2 1 2 3 4 5

gen 3 1 2 3 4 5

gen 4 1 2 3 4 5

gen 5 1 2 3 4 52 3

5

Jay Taylor (ASU) Coalescents 27 / 66

Page 56: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Robustness and Effective Population Size

Two Examples

ν(1) = 0.9, ν(2) = 0.1

gen 0 1 2 3 4 5

gen 1 1 2 3 4 5

gen 2 1 2 3 4 5

gen 3 1 2 3 4 5

gen 4 1 2 3 4 5

gen 5 1 2 3 4 52 3

2 3

2 3

2 3

3

ν(0) = 0.75, ν(5) = 0.25

gen 0 1 2 3 4 5

gen 1 1 2 3 4 5

gen 2 1 2 3 4 5

gen 3 1 2 3 4 5

gen 4 1 2 3 4 5

gen 5 1 2 3 4 52 3

5

Jay Taylor (ASU) Coalescents 27 / 66

Page 57: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Robustness and Effective Population Size

Provided that the variance of ν is finite, it can be shown that for large N the genealogyof a random sample can be approximated by Kingman’s coalescent. However, thedistribution of the time until the next coalescent event when there are k lineages is

P(τk > t) ≈ exp

(−

(k

2

)t

Ne

)

where the effective population size Ne is

Effective population size of a regulated Galton-Watson model

Ne ≡N(

1− 1m

+ σ2

m

) with m = E[ν] and σ2 = Var(ν).

Thus, as the within-generation fecundity variance σ2 increases, the time betweencoalescent events will tend to decrease. Note, however, that the coalescent rate is stillinversely proportional to the census population size N.

Jay Taylor (ASU) Coalescents 28 / 66

Page 58: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Robustness and Effective Population Size

The Moran Model

The Moran model is a continuous-time model of birth and death in a finite populationwith overlapping generations. It makes the following assumptions:

1 The population contains N haploid individuals.

2 Each individual reproduces independently of the others at rate 1.

3 When an individual reproduces, they give birth to a single offspring. To keep thepopulation size constant, one the remaining N − 1 individuals is chosen uniformlyat random and removed from the population (i.e., dies).

For this model, the genealogy of a random sample is exactly described by Kingman’scoalescent, even when N is finite. However, in this case, the effective population size isNe = N/2:

P(τk > t) = exp

(−

(k

2

)2t

N

).

Jay Taylor (ASU) Coalescents 29 / 66

Page 59: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Robustness and Effective Population Size

Simulation of the Moran Model (N = 10)

past

present

1 2 3 4 5 6 7 8 9 10

Jay Taylor (ASU) Coalescents 30 / 66

Page 60: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Robustness and Effective Population Size

Simulation of the Moran Model (N = 10)

past

present

1 2 3 4 5 6 7 8 9 10

Jay Taylor (ASU) Coalescents 30 / 66

Page 61: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Robustness and Effective Population Size

Simulation of the Moran Model (N = 10)

past

present

1 2 3 4 5 6 7 8 9 10

Jay Taylor (ASU) Coalescents 30 / 66

Page 62: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Robustness and Effective Population Size

Simulation of the Moran Model (N = 10)

past

present

1 2 3 4 5 6 7 8 9 10

Jay Taylor (ASU) Coalescents 30 / 66

Page 63: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Robustness and Effective Population Size

Simulation of the Moran Model (N = 10)

past

present

1 2 3 4 5 6 7 8 9 10

Jay Taylor (ASU) Coalescents 30 / 66

Page 64: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Robustness and Effective Population Size

Simulation of the Moran Model (N = 10)

past

present

1 2 3 4 5 6 7 8 9 10

Jay Taylor (ASU) Coalescents 30 / 66

Page 65: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Robustness and Effective Population Size

Simulation of the Moran Model (N = 10)

past

present

1 2 3 4 5 6 7 8 9 10

Jay Taylor (ASU) Coalescents 30 / 66

Page 66: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Robustness and Effective Population Size

Simulation of the Moran Model (N = 10)

past

present

1 2 3 4 5 6 7 8 9 10

Jay Taylor (ASU) Coalescents 30 / 66

Page 67: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Robustness and Effective Population Size

Simulation of the Moran Model (N = 10)

past

present

1 2 3 4 5 6 7 8 9 10

Jay Taylor (ASU) Coalescents 30 / 66

Page 68: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Robustness and Effective Population Size

Simulation of the Moran Model (N = 10)

past

present

1 2 3 4 5 6 7 8 9 10

Jay Taylor (ASU) Coalescents 30 / 66

Page 69: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Robustness and Effective Population Size

Simulation of the Moran Model (N = 10)

past

present

1 2 3 4 5 6 7 8 9 10

Jay Taylor (ASU) Coalescents 30 / 66

Page 70: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Robustness and Effective Population Size

Simulation of the Moran Model (N = 10)

past

present

1 2 3 4 5 6 7 8 9 10

Jay Taylor (ASU) Coalescents 30 / 66

Page 71: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Robustness and Effective Population Size

Simulation of the Moran Model (N = 10)

past

present

1 2 3 4 5 6 7 8 9 10

Jay Taylor (ASU) Coalescents 30 / 66

Page 72: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Robustness and Effective Population Size

Simulation of the Moran Model (N = 10)

past

present

1 2 3 4 5 6 7 8 9 10

Jay Taylor (ASU) Coalescents 30 / 66

Page 73: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Robustness and Effective Population Size

Simulation of the Moran Model (N = 10)

past

present

1 2 3 4 5 6 7 8 9 10

Jay Taylor (ASU) Coalescents 30 / 66

Page 74: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Robustness and Effective Population Size

To understand why the effective population size of the Moran model is half the censuspopulation size, let ν be the random number of offspring produced by an individualover their lifetime and let T be this lifetime. Then

1 T is exponentially distributed with mean and variance equal to 1.

2 Conditional on T , ν is Poisson distributed with mean and varianceE[ν|T ] = Var(ν|T ) = T .

3 The average number of offspring produced by an individual over their lifetime canbe calculated using the law of total expectation:

m = E[ν] = E[E[ν|T ]] = E[T ] = 1.

4 Similarly, the variance of ν can be calculated using the law of total variance:

σ2 = Var(ν) = E[Var(ν|T )] + Var(E[ν|T ])

= E[T ] + Var(T )

= 2.

Jay Taylor (ASU) Coalescents 31 / 66

Page 75: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Robustness and Effective Population Size

The formula derived for the effective population size of a regulated Galton-Watsonprocess is also valid in this case. Since m = 1 and σ2 = 2, we have:

Effective population size of the Moran model

Ne =N(

1− 1m

+ σ2

m

) =N

(1− 1 + 2)= N/2.

In this case, the increased fecundity variance is a consequence of the variance of the lifespan: some individuals leave more offspring than others because they live longer.

In general, Ne depends on a multitude of life-history traits, including longevity,fecundity, age-structuring and sex ratio.

Jay Taylor (ASU) Coalescents 32 / 66

Page 76: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Robustness and Effective Population Size

Coalescents in Large Metapopulations

Under some conditions, Kingman’s coalescent can be used to model genealogies fromsubdivided populations (Wakeley & Aliacar 2001). Consider the following version ofWright’s island model:

1 The population is subdivided into D subpopulations, each containing N haploidindividuals.

2 Each individual reproduces at rate 1, independently of all others, and gives birthto a single offspring.

3 With probability (1−m), the offspring remains within the subpopulation whereit was born, in which case one of the N adults in that subpopulation is chosenat random and dies.

4 Otherwise, with probability m, the offspring migrates to another subpopulation,which is chosen uniformly at random, where it replaces a random chosen adult.

The quantity m can be thought of as the migration rate.

Jay Taylor (ASU) Coalescents 33 / 66

Page 77: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Robustness and Effective Population Size

A Metapopulation with 15 Demes

Jay Taylor (ASU) Coalescents 34 / 66

Page 78: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Robustness and Effective Population Size

Kingman’s coalescent will provide an accurate approximation to the genealogy of arandom sample from a metapopulation so long as the following conditions are satisfied:

1 Each lineage must be sampled from a separate deme;

2 The number of demes must be much greater than the (effective) population sizewithin any single deme (D � N).

Together, these two conditions lead to a simplification of the genealogical process forthis model because they result in a separation of time scales: whilst coalescenceoccurs rapidly between lineages that occupy the same deme (O(N) generations), theprocess gathering lineages into the same deme is much slower (O(D) generations).

Jay Taylor (ASU) Coalescents 35 / 66

Page 79: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Robustness and Effective Population Size

The rate of coalescence in the metapopulation is inversely proportional to:

Effective population size in the island model

Ne = D

(N +

1−m

m

)≥ DN.

The magnitude of Ne depends on whetherNm is less than or greater than 1.

When Nm� 1, the population isnearly panmictic and Ne ≈ DN.

When Nm� 1, the demes areisolated and Ne ≈ D/m.

10−3

10−2

10−1

100

104

105

106

migration rate(m)

Ne

EffectivePopulationSizeof the IslandModel (D=1000)

N =100

N =50

N =10

panmixis

isolation

Jay Taylor (ASU) Coalescents 36 / 66

Page 80: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Genealogical Signatures of Population Dynamics

Coalescents in Populations of Varying Size

Kingman’s coalescent can also be extended to models in which the effective populationsize changes over time. The main requirement is that the population size is never verysmall, say Ne(t) > 20.

In this case, the ancestral process is a time-inhomogeneous continuous timeMarkov chain.

The rate of coalescence of lineages at time t in the past is inversely proportionalto the population size Ne(t) at that time. In particular,

P(t ≤ τk < t + δt) =

(k

2

)δt

Ne(t).

Branches tend to be shorter during periods when the population size washistorically small.

Jay Taylor (ASU) Coalescents 37 / 66

Page 81: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Genealogical Signatures of Population Dynamics

Random genealogies for samples of 20 individuals when Ne(t) = N0e−2t

Jay Taylor (ASU) Coalescents 38 / 66

Page 82: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Genealogical Signatures of Population Dynamics

Random genealogies for samples of 100 individuals when Ne(t) = N0e−2t

Jay Taylor (ASU) Coalescents 39 / 66

Page 83: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Genealogical Signatures of Population Dynamics

Genealogical Inference of Ancestral Population Size

The relationship between ancestral effective population size and coalescent rate can beused to infer the demographic history of a population from genetic sequence data. Bothparametric and semi-parametric approaches are available.

Parametric approaches assume a parametric model for the demography (e.g.,linear, exponential or logistic population growth) and then jointly infer thegenealogy G and the unknown parameters θdem.

Semi-parametric approaches include the classical skyline plot (Pybus et al.,2005) and Bayesian skyline plots (Drummond et al. 2005), which allow foreither a fixed or variable number of changes of population size that coincidewith inferred coalescent events.

Semi-parametric approaches are more flexible and allow for non-monotonicchanges but can also be more computationallydemanding.

Jay Taylor (ASU) Coalescents 40 / 66

Page 84: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Genealogical Signatures of Population Dynamics

Skyline Plots

In its original conception, the skyline plot uses a method-of-moments approach toestimate ancestral population sizes:

E[τk ] = Nk

/(k2

)−→ N̂k =

(k

2

)τk .

Population sizes between coalescentevents can either be constant orlinearly interpolated.

In a classic skyline plot, the genealogyis inferred first and then used toestimate the genealogy.

In Bayesian skyline plots, both thegenealogy and the demography areinferred together.

Jay Taylor (ASU) Coalescents 41 / 66

Page 85: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Genealogical Signatures of Population Dynamics

Bayesian skyline plot for a global sample of P. vivax mtDNA genomes.

Taylor et al. (2013)

Jay Taylor (ASU) Coalescents 42 / 66

Page 86: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Genealogical Signatures of Population Dynamics

Limitations and the Need for Ecological Realism

Genealogical inference of demographic history is subject to several important limitations:

1 Estimates derived from a single locus are usually very imprecise.

2 The relationship between the effective population size inferred using genealogicalmethods and the census population size is usually unknown.

3 Estimates of the effective population size are also confounded by the generationtime, which might vary over time.

4 These issues are especially problematic for parasites, where the effective populationsize and generation time depend on the prevalence and the incidence of theinfection, respectively (Frost & Volz 2010, Volz 2012, Koelle & Rasmussen 2012).

5 The relative importance of selection and genetic drift is still unresolved andprobably varies greatly between specie with different ecologies.

Jay Taylor (ASU) Coalescents 43 / 66

Page 87: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Genealogical Signatures of Population Dynamics

Reconciling Phylodynamics with Epidemiology: The Case of Dengue Virus in SouthernVietnam. D. A. Rasmussen et al. (MBE, 2014)

Jay Taylor (ASU) Coalescents 44 / 66

Page 88: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Statistical Genetics

Bayesian Phylogenetics

Bayesian methods have proven to be especially useful in the analysis of genetic sequencedata. In these problems, the unknown parameters can often be divided into threecategories:

the unknown tree, Tthe parameters of the demographic model, Θdem

the parameters of the substitution model, Θsubst .

Then, given a sequence alignment D, the analytical problem is to calculate the posteriordistribution of all of the unknowns:

Bayes’ formula for phylogenetic inference, general case

p(T ,Θdem,Θsubst |D) = p(T ,Θdem,Θsubst) ·(p(D|T ,Θdem,Θsubst)

p(D)

)

Jay Taylor (ASU) Coalescents 45 / 66

Page 89: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Statistical Genetics

If the substitution process is assumed to be neutral, then the prior distribution and thelikelihood functions can be simplified:

Bayes’ formula for phylogenetic inference with neutral data

p(T ,Θsubst ,Θdem|D) = p(Θsubst) · p(Θdem) · p(T |Θdem) ·(p(D|T ,Θsubst)

p(D)

)

This is a consequence of the following assumptions:

Under the prior distribution, the parameters of the substitution processΘsubst are independent of the tree T and the demographic parameters Θdem.

The demographic parameters Θdem typically determine the conditionaldistribution of the genealogy T , e.g., through a coalescent model.

Conditional on T and Θsubst , the sequence data D is independent of Θdem.

Jay Taylor (ASU) Coalescents 46 / 66

Page 90: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Statistical Genetics

Example: Bayesian Inference of Effective Population SizeSuppose that our data D consists of n randomly sampled individuals that have beensequenced at a neutral locus and that our objective is to estimate the effectivepopulation size Ne . For simplicity, we will use the Jukes-Cantor model for thesubstitution process with a strict molecular clock and we will assume that thedemography can be described by the constant population size coalescent.

To carry out a Bayesian analysis, we need to specify p(µ), p(Ne) and p(T |Ne).

p(µ) should be chosen to reflect what we know about the mutation rate,e.g., we could use a lognormal distribution with mean u and variance σ2.

Since Ne is a scale parameter in the coalescent, it is common practice touse the Jeffreys prior p(Ne) ∝ 1/Ne .

p(T |Ne) is then determined by Kingman’s coalescent.

Jay Taylor (ASU) Coalescents 47 / 66

Page 91: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Statistical Genetics

Assuming that we are only interested in Ne , then µ and T are nuisance parameters andso we need to calculate the marginal posterior distribution of Ne by integrating over µand T :

p(Ne |D) =

∫ ∫p(T ,Ne , µ|D)dµdT

=p(Ne)

p(D)

∫ ∫p(D|T , µ)p(µ)p(T |NE )dµdT .

However, unless n is quite small, integration over T is not feasible. For example, if oursample contains 20 sequences, then there are approximately 8× 1021 possible trees to beconsidered. Even with the fastest computers available, this is an impossible calculation.

Jay Taylor (ASU) Coalescents 48 / 66

Page 92: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Statistical Genetics

Markov Chain Monte Carlo Methods (MCMC)

Until recently, Bayesian methods were regarded as impractical for many problemsbecause of the computational difficulty of evaluating the posterior distribution. Inparticular, to use Bayes’ formula,

p(θ|D) = p(θ) ·(p(D|θ)

p(D)

),

we need to evaluate the marginal probability of the data

p(D) =

∫p(D|θ)p(θ)dθ,

which requires integration of the likelihood function over the parameter space. Except inspecial cases, this integration must be performed numerically, but sometimes even this isvery difficult.

Jay Taylor (ASU) Coalescents 49 / 66

Page 93: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Statistical Genetics

An alternative is to use Monte Carlo methods to sample from the posterior distribution.Here the idea is to generate a random sample from the distribution and then use theempirical distribution of that sample to approximate p(θ|D):

p(θ|D)︸ ︷︷ ︸posterior

≈ 1

N

N∑i=1

δΘi︸ ︷︷ ︸empirical distribution

where Θ1, · · · ,ΘN︸ ︷︷ ︸sample

∼ p(θ|D)

For example, the figure shows two histogram estimators for the Beta(8, 4) densitygenerated using either 100 (left) or 1000 (right) independent samples:

0 0.2 0.4 0.6 0.8 10

0.5

1

1.5

2

2.5

3

3.5

4

θ

posteriordensity

sample: N =100Beta(8,4)

0 0.2 0.4 0.6 0.8 10

0.5

1

1.5

2

2.5

3

3.5

4

θ

posteriordensity

sample: N =1000Beta(8,4)

Jay Taylor (ASU) Coalescents 50 / 66

Page 94: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Statistical Genetics

In particular, the empirical distribution can be used to estimate probabilities andexpectations under p(θ|D):

P(θ ∈ A|D) ≈ 1

N

N∑i=1

1A(Θi )

E[f (θ)|D] ≈ 1

N

N∑i=1

f (Θi ).

What makes this approach difficult is the need to generate random samples fromdistributions that are only known up to a constant of proportionality (e.g., p(D)).This is where Markov chain Monte Carlo methods come in.

Jay Taylor (ASU) Coalescents 51 / 66

Page 95: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Statistical Genetics

Markov ChainsA discrete-time Markov chain is a stochastic process X0,X1, · · · with the property thatthe future behavior of the process only depends on its current state. More precisely, thismeans that for every set A and t, s ≥ 0,

P(Xt+s ∈ A︸ ︷︷ ︸future

| Xt︸︷︷︸present

,Xt−1, · · · ,X0︸ ︷︷ ︸past

) = P(Xt+s ∈ A|Xt).

0 20 40 60 80 100 120 140 160 180 2000

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

time

frequency

Simulations of theWright−FisherProcess

past future

present

N=500,µ=0.001

Markov chains have numerous applicationsin biology. Some familiar examples include:

random walks

branching processes

the Wright-Fisher model

chain-binomial models (Reed-Frost)

Jay Taylor (ASU) Coalescents 52 / 66

Page 96: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Statistical Genetics

One consequence of the Markov property is that many Markov chains have a tendencyto ‘forget’ their initial state as time progresses. More precisely,

Asymptotic behavior of Markov chainsMany Markov chains have the property that there is a probability distribution π, calledthe stationary distribution of the chain, such that for large t the distribution of Xt

approaches π, i.e., for all states x and sets A,

limt→∞

P(Xt ∈ A|X0 = x) = π(A).

Stationary behavior of theWright-Fisher process:(N = 100, µ = 0.02)

0 0.5 10

0.02

0.04

0.06

0.08

0.1

0.12

0.14

0.16

p

density

t=2

0 0.5 10

0.005

0.01

0.015

0.02

0.025

0.03

0.035

p

density

t=20

0 0.5 10

0.005

0.01

0.015

0.02

0.025

p

density

t=100

p = 0.01

p = 0.5

p = 0.9

Jay Taylor (ASU) Coalescents 53 / 66

Page 97: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Statistical Genetics

Some Markov chains satisfy an even stronger property, called ergodicity.

ErgodicityA Markov chain with stationary distribution π is said to be ergodic if for every initialstate X0 = x and every set A, we have

limT→∞

1

T

T∑t=1

1A(Xt) = π(A).

In other words, if we run the chain for a long time, then the proportion of time spentvisiting the set A is approximately π(A).

Ergodic behavior of theWright-Fisher process:(N = 100, µ = 0.02)

0 1 2 3 4 5 6 7 8 9 10

x104

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1Neutral Wright−Fishermodel

Generation

p

Jay Taylor (ASU) Coalescents 54 / 66

Page 98: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Statistical Genetics

Markov Chain Monte Carlo: General ApproachThe central idea in Markov Chain Monte Carlo is to use an ergodic Markov chain togenerate a random sample from the target distribution π.

1 The first step is to select an ergodic Markov chain which has π as itsstationary distribution. There are different methods for doing this, includingthe Metropolis-Hastings algorithm and the Gibbs Sampler.

2 We then need to simulate the Markov chain until the distribution is close tothe target distribution. This initial period is often called the burn-in period.

3 We continue simulating the chain, but because successive values are highlycorrelated, it is common practice to only collect a sample every T generations,so as to reduce the correlations between the sampled states (thinning).

4 We can then use these samples to approximate the target distribution:

1

N

N∑n=1

δXB+nT ≈ π.

Jay Taylor (ASU) Coalescents 55 / 66

Page 99: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Statistical Genetics

The Metropolis-Hastings AlgorithmGiven a probability distribution π, the Metropolis-Hastings algorithm can be used toexplicitly construct a Markov chain that has π as its stationary distribution.

Implementation requires the following elements:

the target distribution, π, known up to a constant of proportionality;

a family of proposal distributions, Q(y |x), and a way to efficiently samplefrom these distributions.

The MH algorithm is based on a more general idea known as rejection sampling.Instead of sampling directly from π, we propose values using a distribution Q(y |x)that we can easily sample but then we reject values that are unlikely under π.

Jay Taylor (ASU) Coalescents 56 / 66

Page 100: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Statistical Genetics

The Metropolis-Hastings algorithm consists of repeated application of the followingthree steps. Suppose that Xn = x is the current state of the Markov chain. Then thenext state is chosen as follows:

Step 1: We first propose a new value for the chain by sampling y with probabilityQ(y |x).

Step 2: We then calculate the acceptance probability of the new state:

α(x ; y) = min

{π(y)Q(x |y)

π(x)Q(y |x), 1

}

Step 3: With probability α(x ; y), set Xn+1 = y . Otherwise, set Xn+1 = x .

RemarkBecause π enters into α as a ratio π(y)/π(x), we only need to know π up to a constantof proportionality. This is why the MH algorithm is so well suited for Bayesian analysis.

Jay Taylor (ASU) Coalescents 57 / 66

Page 101: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Statistical Genetics

The Proposal DistributionThe choice of the proposal distribution Q can have a profound impact on theperformance of the MH algorithm. While there is no universal procedure for selectinga ‘good’ proposal distribution, the following considerations are important.

Q should be chosen so that the chain rapidly converges to its stationarydistribution.

Q should also be chosen so that it is easy to sample from.

There is usually a tradeoff between these two conditions and sometimes it isnecessary to try out different proposal distributions to identify one with goodproperties.

Many implementations of MH (e.g., BEAST, MIGRATE) offer the user some controlover the proposal distribution.

Jay Taylor (ASU) Coalescents 58 / 66

Page 102: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Statistical Genetics

MCMC: Convergence and MixingOne of the most challenging issues in MCMC is knowing for how long to run the chain.There are two related considerations.

1 We need to run the chain until its distribution is sufficiently close to thetarget distribution (convergence).

2 We then need to collect a large enough number of samples that we canestimate any quantities of interest (e.g., the mean TMRCA) sufficientlyaccurately (mixing).

Unfortunately, there is no universally-valid, fool-proof way to guarantee that eitherone of these conditions is satisfied. However, there are a number of convergencediagnostics that can indicate when there are problems.

Jay Taylor (ASU) Coalescents 59 / 66

Page 103: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Statistical Genetics

Convergence Diagnostics: Trace Plots

Trace plots show how the value of a parameter changes over the course of a simulation.In general, what we want to see is that the mean and the variance of the parameter arefairly constant over the duration of the trace plot, as in the two examples shown below.

P_

viva

x_g

sr_

ge

o1

.log

State

0 10000000 2E7 3E7 4E71600

1650

1700

1750

1800

1850

1900

1950

2000

P_

viva

x_g

sr_

ge

o1

.log

State

0 10000000 2E7 3E7 4E70

1

2

3

4

5

6

Jay Taylor (ASU) Coalescents 60 / 66

Page 104: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Statistical Genetics

Problems with convergence or mixing may be revealed by trends or sudden changes inthe behavior of the trace plot.

P_

viva

x_g

sr_

ge

o1

.log

State

0 10000000 2E7 3E7 4E7-750

-700

-650

-600

-550

-500

The increasing trend indicates that thechain has not yet converged.

P_

viva

x_g

sr_

ge

o1

.log

State

0 10000000 2E7 3E7 4E7-7450

-7425

-7400

-7375

-7350

-7325

-7300

The sudden changes in mean indicate thatthe chain is poorly mixing.

Jay Taylor (ASU) Coalescents 61 / 66

Page 105: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Statistical Genetics

Trace Plots: Some Guidelines

1 You should examine the trace plot of every parameter of interest, includingthe likelihood and the posterior probability. If any of the trace plots lookproblematic, then all of the results are suspect.

2 The fact that a trace plot appears to have converged is not conclusive proofthat it has. Especially in high-dimensional problems, a chain that appears tobe stationary for the first 500 million generations may well show a suddenchange in behavior in the next.

3 The program Tracer (http://tree.bio.ed.ac.uk/software/tracer/)can be used to display and analyze trace plots generated by BEAST, MrBayesand LAMARC.

Jay Taylor (ASU) Coalescents 62 / 66

Page 106: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Statistical Genetics

Convergence Diagnostics: Effective Sample Size

Because successive states visited by a Markov chain are correlated, an estimate derivedusing N values generated by such a chain will usually be less precise than an estimatederived using N independent samples. This motivates the following definition.

Effective Sample Size (ESS)

The effective sample size of a sample of N correlated random variables is equal to thenumber of independent samples that would estimate the mean with the same variance.

For a stationary Markov chain with autocorrelation coefficients ρk , the ESS of Nsuccessive samples is equal to

ESS =N

1 + 2∑∞

k=1 ρk.

Jay Taylor (ASU) Coalescents 63 / 66

Page 107: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Statistical Genetics

Effective Sample Size: Guidelines

1 Each parameter has its own ESS and these can differ between parametersby more than order of magnitude.

2 Parameters with small ESS’s indicate that a chain either has not convergedor is slowly mixing. As a rule of thumb, the ESS of every parameter shouldexceed 1000 and larger values are even better.

3 Thinning by itself will not increase the ESS. However, we can increase theESS by simultaneously thinning and increasing the duration of the chain,e.g., collecting a 1000 samples from a chain lasting 100000 generations isbetter than collecting a 1000 samples from a chain lasting 10000 generations.

4 The ESS of a parameter can usually only be estimated from its trace. Forthis reason, large ESS’s do not guarantee that the chain has converged.

Jay Taylor (ASU) Coalescents 64 / 66

Page 108: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Statistical Genetics

References I

A. J. Drummond, A. Rambaut, B. Shapiro, and O. G. Pybus, Bayesian CoalescentInference of Past Population Dynamics from Molecular Sequences, Mol. Biol. Evol.22 (2005), 1185–1192.

S. D. W. Frost and E. M. Volz, Viral phylodynamics and the search for an‘effective number of infections’, Phil. Trans. Roy. Soc. B 365 (2010), 1879–1890.

B. T. Grenfell, O. G. Pybus, J. R. Gog, James L. N. Wood, Janet M. Daly,Jenny A. Mumford, and Edward C. Holmes, Unifying the Epidemiological andEvolutionary Dynamics of Pathogens, Science 303 (2004), 327–332.

J. F. C. Kingman, The Coalescent, Stoch. Proc. Appl. 13 (1982), 235–248.

, On the Genealogy of Large Populations, J. Appl. Prob. 19A (1982),27–43.

K. Koelle and D. A. Rasmussen, Rates of coalescence for common epidemiologicalmodels at equilibrium, J. Roy Soc. Interface 9 (2012), 997–1007.

Jay Taylor (ASU) Coalescents 65 / 66

Page 109: Coalescents and Genealogies - Arizona State UniversityPedigrees vs. Genealogies Pedigrees show genetic relationships between individuals. The number of ancestors increases backwards

Statistical Genetics

References II

P. Lemey, M. Salemi, and A.-M. Vandamme (eds.), The Phylogenetic Handbook:A Practical Approach to Phylogenetic Analysis and Hypothesis Testing, second ed.,Cambridge University Press, 2009.

O. G. Pybus, A. Rambaut, and P. H. Harvey, An integrated framework for theinference of viral population history from reconstructed genealogies, Genetics 155(2000), 1429–1437.

E. M. Volz, Complex Population Dynamics and the Coalescent Under Neutrality,Genetics 190 (2012), 187–201.

J. Wakeley and N. Aliacar, Gene Genealogies in a Metapopulation, Genetics 159(2001), 893–905.

J. Wakeley, Coalescent Theory: An Introduction, Roberts & Company, 2009.

Jay Taylor (ASU) Coalescents 66 / 66