A brief introduction to population genetics


529053 Evolutionary Genomics Ari Löytynoja / [email protected]

Population genetics

Definition
- studies distributions and changes of allele frequencies in populations over time
- effects considered:
  - natural selection, genetic drift, mutation and gene flow
  - recombination, population subdivision and population structure
- allows inferring past events as well as predicting future ones

History
- fundamental work by Haldane, Wright and Fisher in the first half of the 20th century
- more recent development: coalescent theory by Kingman in the 1980s
  - suitable for SNP data
  - computationally highly efficient


Population genetics basics: Allele

Allele
- one of the alternative forms of a gene or of the same genetic locus
- used to be a visible gene product (e.g. blond vs. red hair)
- now typically a SNP (e.g. rs1805007(C) vs. rs1805007(T))

[Figure: rs1805007 = C vs. rs1805007 = T]


Alleles in a “genetic locus” do not need to be functional

- in many studies we are interested in neutral variation

- rs1805007 is associated with e.g. ‘Skin sensitivity to sun’, ‘Hair color’, ‘Non-melanoma skin cancer’ and ‘Freckles’

The genome provides millions of variable loci, the majority of them neutral

Inferring the presence of function for a locus or allele is of special interest


Population genetics basics: Population model

Population model

Theoretical models assume a simplified population model

Most commonly used model is Wright-Fisher model. It assumes:

- haploid population

- no sex

- constant population size

Wright-Fisher model (WFM) can be generalised:

- diploid population

- panmictic, random mating

- variable population size

WFM gives a good approximation for more complex populations
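The basic haploid, constant-size model can be sketched in a few lines of Python. This is a minimal illustration rather than code from the course: the function names `next_generation` and `simulate`, the two allele labels and the population of ten individuals are all assumptions made for the example.

```python
import random

def next_generation(population, rng):
    """Each offspring copies one parent drawn uniformly at random (no sex, constant size)."""
    return [rng.choice(population) for _ in population]

def simulate(population, generations, seed=1):
    """Return the count of allele 'A' in each generation, starting with the initial one."""
    rng = random.Random(seed)
    counts = [population.count("A")]
    for _ in range(generations):
        population = next_generation(population, rng)
        counts.append(population.count("A"))
    return counts

# Ten haploid individuals with two alleles at equal frequency (illustrative values)
start = ["A"] * 5 + ["a"] * 5
print(simulate(start, generations=10))
```

Changing the seed gives a different trajectory each time, which is the point: allele counts wander from generation to generation through random sampling of parents alone.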


Population genetics basics: Wright-Fisher model

Wright-Fisher model

[Figure sequence: evolution of an idealised population, shown at generations 1, 2, 3 and 10]


Population genetics basics: Ne

Population size

One central parameter in population genetics is population size

Abbreviated as N

Population size defines

- how quickly variation is lost (forwards)

- how much frequencies change per generation (now)

- how quickly sample coalesces to MRCA (backwards)

Population size is measured in ‘units’ of a WFM population

- known as effective population size, Ne

- can be very different from census population size

- some violations of WFM can be corrected for
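How quickly variation is lost can be made concrete with the standard expectation for heterozygosity under drift, H_t = H_0 [1 − 1/(2Ne)]^t. The sketch below assumes a diploid population with 2Ne gene copies; the function name `expected_heterozygosity` and the chosen values of Ne are illustrative only.

```python
def expected_heterozygosity(h0, n_e, generations):
    """Expected heterozygosity after drift in a population with 2 * n_e gene copies."""
    return h0 * (1.0 - 1.0 / (2.0 * n_e)) ** generations

# Compare the loss of variation over 100 generations for different Ne (assumed values)
for n_e in (10, 100, 1000):
    h = expected_heterozygosity(0.5, n_e, 100)
    print(f"Ne = {n_e:5d}: H after 100 generations = {h:.3f}")
```

The smaller Ne is, the larger the fraction of variation lost per generation.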


Population genetics basics: Ne and Drift

Random sampling of parents in a finite population leads to

- loss of variation

- change of allele frequencies

This is known as genetic drift


Genetic drift

At every locus, variation is eventually lost and one allele becomes fixed
- at non-neutral loci, selection affects the chances of fixation
- variation is lost much more rapidly in small populations
- in small populations genetic drift prevails over selection, and even harmful alleles may become fixed

Variation once lost is lost forever
- a population bottleneck reduces variation, and population recovery cannot bring it back
- new variation is created by mutations
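The claim that small populations lose variation faster can be checked with a small neutral simulation. This is a sketch under stated assumptions: a single locus with two alleles, no mutation or selection, and binomial sampling of gene copies each generation. The names `generations_until_fixed` and `mean_time`, the population sizes and the number of replicates are all hypothetical choices.

```python
import random

def generations_until_fixed(n_copies, start_count, rng):
    """Run neutral drift until the allele is lost (0) or fixed (n_copies)."""
    count, generations = start_count, 0
    while 0 < count < n_copies:
        freq = count / n_copies
        # Binomial sampling of the next generation, one gene copy at a time
        count = sum(1 for _ in range(n_copies) if rng.random() < freq)
        generations += 1
    return generations, count

def mean_time(n_copies, replicates=50, seed=1):
    """Average number of generations until variation is lost, starting from frequency 0.5."""
    rng = random.Random(seed)
    times = [generations_until_fixed(n_copies, n_copies // 2, rng)[0]
             for _ in range(replicates)]
    return sum(times) / len(times)

for n_copies in (10, 100):
    print(n_copies, mean_time(n_copies))
```

Without mutation the loop always terminates at 0 or `n_copies`, mirroring the statement that variation once lost does not come back.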


Population genetics basics: Coalescent

Coalescence time

- small populations coalesce faster, more recent MRCA

- conversely: Ne can be defined by coalescence time

[Figure: genealogies traced back to the MRCA for N = 20 and N = 100]


Coalescence of two lineages

The probability that two lineages coalesce exactly r generations before the present follows a geometric distribution:

$P(r) = \left[1 - \frac{1}{2N}\right]^{r-1} \left[\frac{1}{2N}\right]$

- its mean is 1/p = 1/[1/(2N)] = 2N

- for 2N = 20, the expected time for two random lineages to coalesce is 20 generations

For large N, the coalescence time approximately follows an exponential distribution
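The mean of the geometric distribution can be verified numerically. The sketch below is only an illustration; the function name `coalescence_probability` and the truncation point of the sum are assumptions of the example.

```python
def coalescence_probability(r, two_n):
    """P(two lineages first share an ancestor exactly r generations ago)."""
    p = 1.0 / two_n
    return (1.0 - p) ** (r - 1) * p

two_n = 20
# Truncate the infinite sums far into the tail, where the remaining mass is negligible
total = sum(coalescence_probability(r, two_n) for r in range(1, 5000))
mean = sum(r * coalescence_probability(r, two_n) for r in range(1, 5000))
print(f"total probability = {total:.6f}, mean time = {mean:.3f} generations")
```

The probabilities sum to one and the mean comes out at 2N generations, as the formula for the mean of a geometric distribution predicts.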


Coalescence of many lineages

For n lineages, the coalescence rate is [n(n − 1)/2][1/(2N)]
- for 2N = 20, the rate and the expected time to the next event are:

lineages   coalescent rate   generations
5          0.5               2
4          0.3               3.3
3          0.15              6.7
2          0.05              20
total                        32

- when n is large, coalescent events happen quickly
- the last event (n = 2) is expected to take at least half of the total time
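The rates and waiting times for 2N = 20 follow directly from the rate formula, taking the expected waiting time as the reciprocal of the rate. A short sketch, with the function name `coalescent_table` chosen for the example:

```python
def coalescent_table(n_lineages, two_n):
    """Rate and expected waiting time for each step from n_lineages down to 2."""
    rows = []
    for k in range(n_lineages, 1, -1):
        rate = k * (k - 1) / 2 / two_n   # number of pairs times 1/(2N)
        rows.append((k, rate, 1.0 / rate))
    return rows

rows = coalescent_table(5, 20)
for k, rate, time in rows:
    print(f"{k:>8} {rate:>6.2f} {time:>6.1f}")
total = sum(time for _, _, time in rows)
print(f"{'total':>8} {total:>13.1f}")
print(f"share of last event: {rows[-1][2] / total:.3f}")
```

The final line shows the share of the total expected time spent waiting for the last two lineages to coalesce.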


Coalescence of many lineages

Coalescence times vary widely around their expectations, so tree shapes differ

- expected genetic diversity is affected by tree structure


Coalescence and Site frequency spectrum

Bamshad and Wooding, NRG, 2003

Vertical branches are evolutionary time, mutations random

- tree shapes have expected distribution of allele frequencies

- for many loci this is called the site frequency spectrum

- with an outgroup, the ancestral and derived alleles can be inferred


Site frequency spectrum, DAF and MAF

SFS: also known as allele frequency spectrum

- if the ancestral state is known, derived allele frequency (DAF)

- if not, minor allele frequency (MAF) or folded SFS

One of the most widely used summary statistics

- at neutral sites, reflects population history

- at non-neutral sites, reflects selection pressure
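As a sketch of how the two forms of the spectrum relate, the following Python computes an unfolded SFS from derived allele counts and then folds it by minor allele count. The function names and the toy data are hypothetical and are not taken from the course material.

```python
from collections import Counter

def site_frequency_spectrum(derived_counts, n_samples):
    """Unfolded SFS: entry i - 1 is the number of sites with i derived copies (i = 1..n-1)."""
    tally = Counter(derived_counts)
    return [tally.get(i, 0) for i in range(1, n_samples)]

def fold(sfs):
    """Folded SFS: merge classes i and n - i, indexed by minor allele count."""
    n = len(sfs) + 1
    folded = []
    for i in range(1, n // 2 + 1):
        j = n - i
        folded.append(sfs[i - 1] if i == j else sfs[i - 1] + sfs[j - 1])
    return folded

# Hypothetical derived allele counts at 8 segregating sites in 6 sampled chromosomes
counts = [1, 1, 1, 2, 2, 3, 5, 5]
sfs = site_frequency_spectrum(counts, n_samples=6)
print(sfs)        # [3, 2, 1, 0, 2]
print(fold(sfs))  # [5, 2, 1]
```

Folding discards the ancestral/derived distinction, which is why it is the fallback when no outgroup is available.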


Coalescence with non-constant population size

[Figure panels: increasing population size; decreasing population size]

- few coalescent events in the large phase

- many coalescent events in the small phase


Coalescence with non-constant population size

Population increase and decrease affect the expected tree shape

- mutations are random so tree shape affects expected SFS

Nielsen and Slatkin, 2013
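The effect of a changing population size on where coalescent events fall can be sketched by letting the per-generation coalescence probability 1/(2N) vary over time. This is an illustration under assumed values: the function name `prob_coalesced` and both size histories are hypothetical.

```python
def prob_coalesced(sizes_back_in_time):
    """Probability that two lineages have coalesced within the listed generations.

    sizes_back_in_time[t] is the number of gene copies (2N) t + 1 generations ago.
    """
    not_coalesced = 1.0
    for two_n in sizes_back_in_time:
        not_coalesced *= 1.0 - 1.0 / two_n
    return 1.0 - not_coalesced

# Illustrative histories over the last 50 generations, listed from present to past
growing = [1000] * 25 + [20] * 25     # large now, small in the past
shrinking = [20] * 25 + [1000] * 25   # small now, large in the past

# Probability of coalescence within the most recent 25 generations
print(prob_coalesced(growing[:25]), prob_coalesced(shrinking[:25]))
```

In the growing population few lineage pairs coalesce during the recent, large phase, whereas in the shrinking population most coalescence happens in the recent, small phase.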


Site frequency spectrum and Demography

Evidence for bottleneck in human EUR and ASN populations

- quick coalescent & deep branches → deficit of low frequencies

Keinan et al, NG, 2007


Site frequency spectrum and Selection

Selection affects fitness and thus AF

[Figure: allele frequency distributions by site class (syn sites, nonsyn sites, 3’UTR, cons 3’UTR, cons miRNA), annotated for positive and negative selection]

Increase in low-frequency alleles due to negative selection

Chen and Rajewsky, NG, 2006


Derived allele frequencies and annotation liftover


Derived allele frequencies

We will look at DAF of

1. different populations

2. different genomic regions

The latter requires annotation that we lift-over from threespined