Monkey Business

DESCRIPTION

A not too technical explanation of coalescence HMMs and the divergence times between humans and apes

Text of Monkey Business

  • 1. Monkey Business Bioinformatics Research Center, University of Aarhus. Thomas Mailund. Joint work with Asger Hobolth, Ole F. Christiansen and Mikkel H. Schierup

2. Paper:

3. Dating speciation events Relationship between great apes: How do we know this is the relationship? How do we date the speciation events?

6. Fossil record Good fossil record for humans, less so for other apes... Common ancestor human-chimp? (Jobling et al. 2004) Enter: molecular genetics methods!

8. Outline of talk

  • Introduction to molecular evolution and population genetics
  • Mathematical modeling of the problem: coalescence hidden Markov models
  • Inference and results

9. Mutations and a molecular clock

  • Mutations enter DNA when
    • cells replicate
    • chromosomes are modified by the cell's chemical soup
  • Error correction and various proofreading mechanisms make mutations extremely rare
    • But there is a lot of DNA in each individual (2 x ~3 billion nucleotides)
    • Things that happen about once every thousand years happen a lot over millions of years

10. Mutations and a molecular clock

  • The number of mutations is linear in time and (germ line cell) generations
  • ...and for our purposes, generations are linear in time, so the number of mutations is proportional to the time elapsed
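
The clock argument, including the back-mutation correction mentioned on the slides that follow, can be sketched in a few lines. This is a minimal sketch that assumes the Jukes-Cantor (JC69) substitution model, which the talk does not name. The function names and the mutation rate used in the example are illustrative assumptions, not values from the talk.

```python
import math


def fraction_differing(seq_a, seq_b):
    """Observed fraction of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned, equal-length and non-empty")
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def jc69_distance(p_observed):
    """Expected substitutions per site under JC69 (an assumed model).

    Corrects the observed difference fraction for back mutations.
    """
    if not 0.0 <= p_observed < 0.75:
        raise ValueError("observed fraction must be in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p_observed / 3.0)


def divergence_time_years(p_observed, rate_per_site_per_year):
    """Years since the common ancestor.

    Mutations accumulate on both lineages, hence the factor 2.
    """
    return jc69_distance(p_observed) / (2.0 * rate_per_site_per_year)


if __name__ == "__main__":
    p = fraction_differing("aactg", "agctg")  # two of the example sequences
    # The rate below is a placeholder assumption, not a value from the talk.
    print(p, jc69_distance(p), divergence_time_years(p, 1e-9))
```

The corrected distance is always at least as large as the observed fraction, because some sites have mutated more than once. The time estimate scales inversely with the assumed rate, which is why the rate has to be calibrated externally.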

11. Mutations and a molecular clock HIV sequence: Accumulation of mutations over time

13. Mutations and a molecular clock Estimation using observed number of mutations

14. Mutations and a molecular clock Back mutations complicate matters slightly, but we can compensate for this by solving a simple differential equation...

16. Mutations and a molecular clock The time estimate is known up to a mutation-rate factor... ...that e.g. can be calibrated using fossil evidence.

17. Dating divergence... So we can estimate pairwise divergence in units of time... ...although this over-estimates the number of mutations (does not take shared mutations into account) ...possible to construct a tree and infer branch lengths from this... ...or take a statistical approach and deal with unobserved sequences in a mathematical model. [Figure labels: 14 My, 36 My; sequences aactg, agctg, aggtg, atatg, agctg, aactg]

21. Dating divergence... [Figure labels: sequences aactg, agctg, aggtg, atatg, agctg, aactg; time parameters $t_1$, $t_2$, $t_3$] The likelihood can be computed efficiently using a dynamic programming algorithm. Time parameters can then be estimated by maximizing the likelihood.

24. Mutations in a population

  • Wright-Fisher model
    • Discrete, non-overlapping generations
    • Constant population size
    • Each individual in one generation is a random copy of an individual from the previous generation
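
The three Wright-Fisher assumptions translate almost directly into code. The sketch below is a minimal haploid version. The function names are hypothetical, and the small population size in the example is chosen only to keep the run short.

```python
import random


def wright_fisher_generation(parents, rng):
    """Build the next generation: same size, each member copies a random parent."""
    return [rng.choice(parents) for _ in parents]


def generations_until_common_ancestor(population_size, rng):
    """Count generations until everyone descends from a single founder.

    Each individual carries the label of the founder it descends from.
    Labels are lost by chance until only one remains.
    """
    lineage = list(range(population_size))
    generations = 0
    while len(set(lineage)) > 1:
        lineage = wright_fisher_generation(lineage, rng)
        generations += 1
    return generations


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the run is repeatable
    runs = [generations_until_common_ancestor(20, rng) for _ in range(200)]
    print("mean generations:", sum(runs) / len(runs))
```

Non-overlapping generations show up as the full replacement of the list, and constant population size as the new list having the same length as the old one.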

$N_e$ individuals

27. Populations and species A funny thing happens if the population size is large relative to the splitting time... [Figure labels: Individuals, Populations; individuals join at the population join time, or join much later than the populations] ...for speciation, the time is too long... ...and no human is more closely related to chimps than any other human!

34. Populations and species

  • The time for $k$ lines to coalesce (into $k-1$ lines) is exponentially distributed, $\mathrm{Exp}\left(k(k-1)/2\right)$, in units of $2N_e$ generations.
    • Generation time ~25 years
    • $N_e$ for humans ~10000
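
The numbers on this slide are enough for a back-of-the-envelope calculation. The sketch below uses the mean of the exponential waiting time, 1 / (k(k-1)/2), and sums it from k lines down to 2. The function names are hypothetical, and the defaults are the slide's values.

```python
def expected_wait_units(k):
    """Mean time while k lines remain, in units of 2 * N_e generations."""
    if k < 2:
        raise ValueError("need at least two lines")
    return 2.0 / (k * (k - 1))


def expected_tmrca_years(sample_size, n_e=10_000, generation_years=25):
    """Expected time back to the common ancestor of the whole sample, in years."""
    units = sum(expected_wait_units(k) for k in range(2, sample_size + 1))
    return units * 2 * n_e * generation_years


if __name__ == "__main__":
    for n in (2, 10, 1000):
        print(n, "lines:", round(expected_tmrca_years(n)), "years")
```

With these values two lines take 500,000 years on average to find their common ancestor. The sum approaches but never reaches twice that, so the expected common ancestor of even a very large sample is less than one million years back.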

35. Populations and species

    • We will essentially all have coalesced before we meet the chimps

[Figure labels: Humans, Chimps]

36. Recombination

  • But there is recombination!
    • Breaks up DNA into segments with separate histories
    • A single extant sequence reaches an equilibrium back in time
    • We will have several segments meeting the chimp!

37. Recombination [Figure, adapted from Patterson et al. 2006: divergence time against physical position along genome; labels: genome, species]

38. Species and segment trees [Figure labels: Human, Chimp, Gorilla; $t_1$, $t_1 + t_2$; $N_{HC}$, $N_{HCG}$] Case A: Only Human and Chimp can coalesce here. Always consistent with species tree. Case B: All species can coalesce here. Only a third consistent with species tree. We can get the probability of A or B based on split times and population size

40. Species and segment trees We can express the probability of either of these trees in terms of splitting times and effective population sizes... ...or alternatively get the time parameters from the tree probabilities.

42. Species and segment trees We obtain the tree probabilities from an approximation to the coalescence process: a hidden Markov model

43. Markov models A Markov model is an automaton where transitions are probabilistic, i.e. each transition to the next state is taken with a certain probability. A run is a sequence of states generated by the model.

44. Hidden Markov models (HMMs) A hidden Markov model in addition has emission symbols and probabilities. In each state it emits symbols with certain probabilities. A run is a sequence of both states and emissions. We only observe the emissions.

45. A coalescence HMM

  • States correspond to the four trees
  • Emission probabilities are computed using the dynamic programming algorithm
    • Branch lengths are mean branch lengths for each tree type
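
The emission probability of one alignment column given a tree can be sketched as follows. This assumes the dynamic programming algorithm is Felsenstein's pruning algorithm and that substitutions follow the Jukes-Cantor (JC69) model. The talk names neither. The tree encoding, the function names and the branch lengths are illustrative assumptions.

```python
import math

NUCLEOTIDES = "acgt"


def jc69_transition(branch_length):
    """JC69 substitution probabilities for a branch (expected substitutions per site)."""
    decay = math.exp(-4.0 * branch_length / 3.0)
    same = 0.25 + 0.75 * decay
    diff = 0.25 - 0.25 * decay
    return {x: {y: (same if x == y else diff) for y in NUCLEOTIDES}
            for x in NUCLEOTIDES}


def partial_likelihood(node, column):
    """Probability of the leaves below `node`, for each nucleotide at `node`.

    A leaf is a species name. An internal node is a list of
    (child, branch_length) pairs.
    """
    if isinstance(node, str):
        return {x: 1.0 if x == column[node] else 0.0 for x in NUCLEOTIDES}
    result = {x: 1.0 for x in NUCLEOTIDES}
    for child, branch_length in node:
        below = partial_likelihood(child, column)
        p = jc69_transition(branch_length)
        for x in NUCLEOTIDES:
            result[x] *= sum(p[x][y] * below[y] for y in NUCLEOTIDES)
    return result


def column_probability(tree, column):
    """Emission probability of one column, with a uniform nucleotide at the root."""
    root = partial_likelihood(tree, column)
    return sum(0.25 * root[x] for x in NUCLEOTIDES)


# Human and chimp join first, then gorilla. Branch lengths are made up.
TREE_HC_G = [([("human", 0.005), ("chimp", 0.005)], 0.002), ("gorilla", 0.007)]

if __name__ == "__main__":
    column = {"human": "a", "chimp": "a", "gorilla": "c"}
    print(column_probability(TREE_HC_G, column))
```

In a coalescence HMM each state would use its own tree and mean branch lengths, so the same column gets a different emission probability in each state. That difference is what lets the model tell the trees apart.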

46. A coalescence HMM Example run, built up one column at a time:

States:     A   B1  B1  B1  B3  B3  B3  A   A   A
Emissions:  a   c   a   a   g   a   t   c   a   c
            a   c   t   a   c   t   c   t   a   c
            c   c   a   a   c   t   c   t   a   c

55. Inference in the CoalHMM

  • We use standard algorithms for estimating the HMM parameters
    • Transition probabilities give us:
    • State probabilities, which give us:
    • Coalescence process parameters, which give us:
    • Speciation times!
    • Speciation times!
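
The last two links of this chain, from state probabilities to a split time, can be illustrated with the standard coalescent result that two lines fail to coalesce during t generations with probability exp(-t / (2N)). This is a minimal sketch, not the estimation procedure used in the paper. The state name B2 is inferred from "four trees", and the function names and numeric values are illustrative assumptions. The tree probabilities alone only determine the ratio of $t_2$ to $N_{HC}$, so the sketch treats $N_{HC}$ as known.

```python
import math


def tree_probabilities(t2_years, n_hc, generation_years=25):
    """Probabilities of the four segment trees.

    Case A: human and chimp coalesce within the human-chimp ancestor.
    Case B: they do not, and the three possible trees are equally likely.
    """
    t2_generations = t2_years / generation_years
    no_coalescence = math.exp(-t2_generations / (2.0 * n_hc))
    return {
        "A": 1.0 - no_coalescence,
        "B1": no_coalescence / 3.0,
        "B2": no_coalescence / 3.0,
        "B3": no_coalescence / 3.0,
    }


def t2_years_from_case_a(p_a, n_hc, generation_years=25):
    """Invert the case A probability to get the time between the two splits."""
    if not 0.0 <= p_a < 1.0:
        raise ValueError("p_a must be in [0, 1)")
    return -2.0 * n_hc * generation_years * math.log(1.0 - p_a)


if __name__ == "__main__":
    # Illustrative inputs, not estimates from the talk.
    probs = tree_probabilities(t2_years=1_500_000, n_hc=50_000)
    print(probs)
    print(t2_years_from_case_a(probs["A"], n_hc=50_000))
```

The round trip recovers the input time, which is the sense in which the time parameters can be read off the tree probabilities.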

56. Results [Figure labels: $t_1$, $t_2$]

57. Annotation in the CoalHMM Posterior probabilities of each tree (probability of being in a particular tree given the sequence data) along a genomic region: [Figure]

58. Summary

  • Dating the speciation between humans, chimps, and gorillas
    • Expressed as a molecular evolution / population genetics problem
    • Approximated by a machine learning / statistical model (HMM)
    • Estimated HMM parameters give us speciation times
  • Puts the human-chimp speciation ~4 My ago and the (human-chimp)-gorilla speciation ~5-6 My ago

59. The end Thank you! http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1802818&rendertype=abstract