Intro to population genetics Shamil Sunyaev Broad Institute of M.I.T. and Harvard



  • Forces responsible for genetic change

    Mutation

    Selection (s)

    Drift (Ne)

    Population structure (FST)

  • Mutations

  • Mutation rate in humans and flies

    Point substitutions per nt per generation: 2.5×10⁻⁸ (Nachman & Crowell), 1.8×10⁻⁸ (Kondrashov); ~10² changes per genome

    NGS estimates: ~1.2×10⁻⁸ per nt

    Other events: indels (10⁻⁹)

    repeat extensions/contractions (10⁻⁵)

    large events (?)
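As a rough check of the "~10² changes per genome" figure, the per-nucleotide rates can be scaled up to a whole diploid genome. This is a minimal sketch that assumes a haploid genome length of about 3.1×10⁹ nt (an assumption, not stated on the slide) and treats the rate as uniform, ignoring the regional and context variation discussed next.

```python
# Rough conversion of per-nucleotide mutation rates into expected new point
# mutations per diploid genome per generation.
# ASSUMPTION: haploid genome length ~3.1e9 nt; rate treated as uniform.
HAPLOID_GENOME_NT = 3.1e9

RATES_PER_NT = {
    "Nachman & Crowell": 2.5e-8,
    "Kondrashov": 1.8e-8,
    "NGS": 1.2e-8,
}


def mutations_per_diploid_genome(rate_per_nt, haploid_len=HAPLOID_GENOME_NT):
    """Expected de novo point mutations in a diploid genome (two copies)."""
    return 2 * haploid_len * rate_per_nt


if __name__ == "__main__":
    for source, rate in RATES_PER_NT.items():
        print(f"{source}: ~{mutations_per_diploid_genome(rate):.0f} per genome")
```

Under these assumptions the three estimates give roughly 155, 112 and 74 new mutations per genome, all on the order of 10².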

  • Mutation rate is variable along the genome

    Regional variation of mutation rate

    Context dependence of mutation rate

    Contributing factors: replication fidelity, DNA damage, DNA repair, CpG deamination
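Context dependence can be made concrete by locating CpG dinucleotides, where deamination of methylated cytosine makes C→T transitions (seen as G→A on the opposite strand) especially frequent. This sketch only finds CpG positions in a sequence; the function names are hypothetical and it makes no attempt to estimate rates.

```python
def cpg_positions(seq):
    """Return 0-based indices of the C in every CpG dinucleotide of seq.

    The C at i and the G at i + 1 are the hypermutable positions
    (C->T on this strand, G->A reflecting C->T on the other strand).
    """
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i] == "C" and seq[i + 1] == "G"]


def cpg_fraction(seq):
    """Fraction of positions that belong to a CpG dinucleotide."""
    if not seq:
        return 0.0
    # CpGs cannot overlap (a position is either C or G), so each adds 2 sites.
    return 2 * len(cpg_positions(seq)) / len(seq)


if __name__ == "__main__":
    print(cpg_positions("ACGTTCGA"), cpg_fraction("ACGTTCGA"))
```

For example, `cpg_positions("ACGTTCGA")` returns `[1, 5]`, so half of that toy sequence sits in CpG context.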

  • Genetic drift

  • Drift is a random change of allele frequencies

  • Drift depends on population size
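A minimal Wright-Fisher sketch (an assumed model; the slides do not name one) shows why drift depends on population size: each generation, 2N gene copies are drawn binomially at the previous frequency p, so the variance of the one-generation change is p(1 − p)/(2N) and shrinks as N grows.

```python
import numpy as np


def one_generation_changes(p, n_diploid, reps, rng):
    """Frequency change after one Wright-Fisher generation, in `reps` replicates."""
    two_n = 2 * n_diploid
    new_p = rng.binomial(two_n, p, size=reps) / two_n  # resample 2N gene copies
    return new_p - p


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p = 0.3
    for n in (50, 500, 5000):
        changes = one_generation_changes(p, n, 100_000, rng)
        expected = p * (1 - p) / (2 * n)
        print(f"N={n}: observed var {changes.var():.2e}, expected {expected:.2e}")
```

A hundredfold larger population should give roughly a hundredfold smaller one-generation variance.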

  • Demographic history

  • Selection

  • Selective effect of new mutations

    A new mutation is functional or nonfunctional

    Selective effect of a mutation: neutral, deleterious, or advantageous

    Nonfunctional mutations are neutral; most functional mutations are deleterious

    Selection indicates functional mutations, whether or not the tested trait is under selection

  • Methods of mathematical population genetics

  • Dynamics of allelic substitution

    [Figure: allele frequency between 0 and 1 over time]

    Mathematically, allele frequency change in a population follows a one-dimensional random walk
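Because frequency is bounded by 0 and 1, a neutral walk eventually ends in loss or fixation. This sketch, again assuming Wright-Fisher binomial sampling (function name hypothetical), runs many replicate walks to absorption; for a neutral allele the fraction that fix should approach the starting frequency, a standard neutral result.

```python
import numpy as np


def run_to_absorption(p0, n_diploid, reps, rng, max_gen=1_000_000):
    """Run neutral Wright-Fisher walks until loss or fixation.

    Returns (fraction fixed, mean generations to absorption).
    """
    two_n = 2 * n_diploid
    counts = np.full(reps, round(p0 * two_n), dtype=np.int64)
    absorbed_at = np.zeros(reps, dtype=np.int64)
    active = (counts > 0) & (counts < two_n)
    gen = 0
    while active.any() and gen < max_gen:
        gen += 1
        # One random-walk step for every replicate still segregating.
        counts[active] = rng.binomial(two_n, counts[active] / two_n)
        newly_done = active & ((counts == 0) | (counts == two_n))
        absorbed_at[newly_done] = gen
        active &= ~newly_done
    return (counts == two_n).mean(), absorbed_at.mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    print(run_to_absorption(0.2, 50, 20_000, rng))
```

With p0 = 0.2 and N = 50, about 20% of walks should fix; the diffusion formula −4N[p ln p + (1 − p) ln(1 − p)] predicts a mean absorption time of roughly 100 generations for comparison.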

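  • Diffusion approximation

    A random walk that does not make long jumps can be approximated by a diffusion process:

    $$\frac{\partial f(x,p,t)}{\partial t} = -\frac{\partial}{\partial x}\left[M(x)\,f(x,p,t)\right] + \frac{1}{2}\frac{\partial^2}{\partial x^2}\left[V(x)\,f(x,p,t)\right]$$

    Here f(x,p,t) is the probability density of allele frequency x at time t for an allele that started at frequency p, and M(x) and V(x) are the mean and variance of the change in x per unit time.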
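For a concrete case, the standard diffusion limit of the Wright-Fisher model supplies the two coefficients. This assumes genic selection with coefficient s, effective size Ne, and time measured in generations:

$$M(x) = s\,x(1-x), \qquad V(x) = \frac{x(1-x)}{2N_e}$$

With s = 0 the first term vanishes and only drift remains. Rescaling time to units of 2Ne generations turns the pair into 2Ne s x(1 − x) and x(1 − x), so the dynamics depend on the product Ne s, which is why selection strengths are quoted as 2Ns.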

  • Coalescent theory

    Instead of modeling a population, we can model our sample

    Time goes backwards!
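The backward-in-time view can be sketched with Kingman's coalescent (an assumption; the slide does not name a specific model): while k lineages remain, the wait until the next merger is exponential with rate k(k − 1)/2 in units of 2N generations. Summing the waits gives the time to the most recent common ancestor (TMRCA) of the sample, whose expectation is 2(1 − 1/n).

```python
import numpy as np


def sample_tmrca(n, rng):
    """Simulate the TMRCA of n sampled gene copies under Kingman's coalescent.

    Time is measured in units of 2N generations.
    """
    t = 0.0
    for k in range(n, 1, -1):
        rate = k * (k - 1) / 2              # number of lineage pairs that could merge
        t += rng.exponential(1.0 / rate)    # numpy takes the scale (1 / rate)
    return t


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sims = [sample_tmrca(10, rng) for _ in range(20_000)]
    print(np.mean(sims), 2 * (1 - 1 / 10))
```

Only n − 1 exponential draws are needed per sample, regardless of population size; that efficiency is the practical appeal of modeling the sample instead of the population.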

  • Natural selection in protein coding regions

  • Effect of new missense mutations

  • Computer simulations

    Allele-frequency trajectories over time can be simulated directly or described by the diffusion equation; both take as input:

    Demographic history

    Natural selection
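The simulation side can be sketched as a forward Wright-Fisher loop that takes both inputs: a demographic history (here a hypothetical per-generation list of diploid population sizes) and natural selection (a single genic selection coefficient s, an assumption). Each generation applies selection deterministically and then drift by binomial sampling.

```python
import numpy as np


def simulate_trajectory(p0, sizes, s, rng):
    """Forward Wright-Fisher trajectory of one allele.

    p0    -- starting allele frequency
    sizes -- diploid population size for each generation (demographic history)
    s     -- genic selection coefficient (negative = deleterious)
    Returns the list of frequencies, starting with p0.
    """
    traj = [p0]
    p = p0
    for n in sizes:
        p_sel = p * (1 + s) / (1 + s * p)          # selection
        p = rng.binomial(2 * n, p_sel) / (2 * n)    # drift in a population of n
        traj.append(p)
    return traj


if __name__ == "__main__":
    rng = np.random.default_rng(3)
    # Hypothetical history: 200 generations at N = 1000, a 20-generation
    # bottleneck at N = 50, then recovery to N = 1000.
    history = [1000] * 200 + [50] * 20 + [1000] * 200
    print(simulate_trajectory(0.1, history, -0.005, rng)[-1])
```

Keeping demography as a plain list of sizes makes bottlenecks or expansions easy to swap in without touching the selection step.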

  • Can we find additional evidence in sequence data?

    Is there any information beyond frequency? Can we tell alleles under selection from neutral alleles if they are at the same frequency?

  • Maruyama effect (1974): at any given frequency, advantageous and deleterious alleles are younger than neutral alleles

    [Figure: allele trajectories rising over time from frequency 0% to the current frequency x]
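The Maruyama effect can be checked by simulation. This sketch assumes a Wright-Fisher model with genic selection, constant size, and a steady supply of new mutations that each start as one copy; the function name is hypothetical. It follows many new mutations forward and, every generation, records the age of each one whose frequency falls in a chosen window, so averaging those ages gives the mean age of alleles observed in that window.

```python
import numpy as np


def mean_age_in_window(s, two_n, lo, hi, reps, rng, max_gen=100_000):
    """Mean age (in generations) of alleles whose frequency lies in [lo, hi).

    Each of `reps` replicates is a new mutation that starts as one copy at
    age 0. Under a constant mutation supply, every generation a replicate
    spends in the window counts as one observed allele of that age.
    """
    counts = np.ones(reps, dtype=np.int64)
    age_sum = 0
    n_obs = 0
    age = 0
    while counts.size > 0 and age < max_gen:
        age += 1
        p = counts / two_n
        p_sel = p * (1 + s) / (1 + s * p)                 # genic selection
        counts = rng.binomial(two_n, p_sel)               # drift
        counts = counts[(counts > 0) & (counts < two_n)]  # drop lost or fixed alleles
        freq = counts / two_n
        hits = int(np.count_nonzero((freq >= lo) & (freq < hi)))
        age_sum += age * hits
        n_obs += hits
    return age_sum / n_obs if n_obs else float("nan")


if __name__ == "__main__":
    rng = np.random.default_rng(4)
    two_n = 200
    for label, s in [("neutral", 0.0), ("deleterious (2Ns = -10)", -0.05)]:
        age = mean_age_in_window(s, two_n, 0.02, 0.10, 20_000, rng)
        print(f"{label}: mean age {age / two_n:.3f} (2N generations)")
```

With 2N = 200, s = −0.05 corresponds to 2Ns = −10; the deleterious mean age should come out clearly below the neutral one. Running s = +0.05 should give an age close to the deleterious case, as Maruyama's symmetry predicts, though with more sampling noise.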

  • Intuition: shorter trajectories require fewer lucky jumps

    [Figure: two trajectories from frequency 0% to frequency x over time; the longer takes 6 jumps, the shorter 4]
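One way to read the jump-counting intuition, assuming a simple random walk whose steps go up with probability q and down with probability 1 − q: the 6-jump path differs from the 4-jump path by one extra up step and one extra down step, so its probability is smaller by a factor q(1 − q). That factor peaks at q = 1/2 (neutral) and is smaller for any q ≠ 1/2, whether selection pushes the walk up or down, so selected alleles are relatively less likely to take the long route.

```python
import math


def extra_detour_factor(q):
    """Probability ratio P(6-jump path) / P(4-jump path) for a simple random walk.

    The longer path adds one up step (probability q) and one down step
    (probability 1 - q), so the ratio is q * (1 - q) regardless of step order.
    """
    return q * (1 - q)


if __name__ == "__main__":
    # Hypothetical step probabilities, for illustration only.
    for label, q in [("neutral", 0.5), ("deleterious", 0.45), ("advantageous", 0.55)]:
        print(f"{label:12s} q={q:.2f} -> factor {extra_detour_factor(q):.4f}")
```

The symmetry of q(1 − q) around 1/2 mirrors the claim that advantageous and deleterious alleles are both younger than neutral ones.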

  • Neutral alleles: equal time at each frequency. Selected alleles: faster through higher frequencies

    [Figure: allele frequency over time]

    Diffusion theory: deleterious alleles pass quickly through higher frequencies

    Idea: low accumulation of mutations at linked sites indicates selection

  • [Figure 1, three panels. a: population frequency (%) vs time (generations before present, in 2N units) for a neutral and a deleterious allele. b: mean age (2N generations) vs selection coefficient 2Ns, at population frequencies 3%, 5% and 7%. c: mean sojourn time (2N generations) vs intermediate allele frequency (%), for 2Ns = 0 (neutral), −2 (weakly deleterious) and −10 (deleterious).]

    Figure 1. Simulation and theoretical results for allelic age and sojourn times. a. Example trajectories for a neutral and a deleterious allele with current population frequency 3% (indicated by an arrow). The shaded areas indicate sojourn times at frequencies above 5%. b. Mean ages for neutral and deleterious alleles at a given population frequency (lines show theoretical predictions, dots show simulation results with standard error bars). The graph shows that deleterious alleles at a given frequency are younger than neutral alleles, and that the effect is greater for more strongly selected alleles. c. Mean sojourn times for neutral and deleterious alleles. The vertical line denotes the current population frequency of the variant (3%). Mean sojourn times have been computed in bins of 1%. The line connects theoretical predictions for each frequency bin; dots show simulation results. The graph illustrates that deleterious alleles spend much less time than neutral alleles at higher population frequencies in the past, even if they have the same current frequency.

  • Neighborhood clock (fuzzy clock)

  • Neighborhood clock is consistent with Maruyama-effect expectations

    Data: pilot Genome of the Netherlands dataset