A brief introduction to population genetics
529053 Evolutionary Genomics Ari Löytynoja / [email protected]
Population genetics
Definition
- studies distributions & changes of allele frequencies in populations over time
- effects considered:
  - natural selection, genetic drift, mutation and gene flow
  - recombination, population subdivision and population structure
- allows inferring past events as well as predicting future
History
- fundamental work by Haldane, Wright and Fisher in the first half of the 20th century
- recent development: coalescent theory by Kingman in the 1980s
  - suitable for SNP data
  - computationally highly efficient
Population genetics basics: Allele
Allele
- one of the alternative forms of a gene or of the same genetic locus
- used to be a visible gene product (e.g. blond vs. red hair)
- now typically a SNP (e.g. rs1805007(C) vs. rs1805007(T))
(Figure panels: rs1805007 = C, rs1805007 = T)
Alleles in a “genetic locus” do not need to be functional
- in many studies we are interested in neutral variation
- rs1805007 associated e.g. with ‘Skin sensitivity to sun’, ‘Hair color’, ‘Non-melanoma skin cancer’, ‘Freckles’
Genome provides millions of variable loci, majority of those neutral
Inferring presence of function for a locus/allele is of special interest
Population genetics basics: Allele
Population model
Theoretical models assume a simplified population model
The most commonly used model is the Wright-Fisher model. It assumes:
- haploid population
- no sex
- constant population size
Wright-Fisher model (WFM) can be generalised:
- diploid population
- panmictic, random mating
- variable population size
WFM gives a good approximation for more complex populations
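The basic model above can be sketched in a few lines of Python. This is only an illustrative sketch, assuming a haploid population of constant size in which every offspring copies one parent drawn uniformly at random; the function names and the default values (10 individuals, 10 generations, seed 1) are made up for the example.

```python
import random

def wright_fisher_step(population, rng):
    """Form the next generation: each offspring copies a parent chosen
    uniformly at random (with replacement) from the current generation."""
    return [rng.choice(population) for _ in population]

def simulate(n_individuals=10, n_generations=10, seed=1):
    """Return the number of founder lineages still present in each generation."""
    rng = random.Random(seed)
    population = list(range(n_individuals))  # one distinct label per founder
    surviving = [len(set(population))]
    for _ in range(n_generations):
        population = wright_fisher_step(population, rng)
        surviving.append(len(set(population)))
    return surviving

lineages_per_generation = simulate()
print(lineages_per_generation)
```

The population size stays constant while the number of distinct founder lineages can only stay the same or decrease from one generation to the next.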
Population genetics basics: Wright-Fisher model
Wright-Fisher model
Evolution of an idealised population (figure sequence: generations 1, 2, 3 and 10)
Population genetics basics: Ne
Population size
One central parameter in population genetics is population size
- abbreviated as N
Population size defines
- how quickly variation is lost (forwards)
- how much frequencies change per generation (now)
- how quickly sample coalesces to MRCA (backwards)
Population size is measured in ’units’ of WFM population
- known as effective population size, Ne
- can be very different from census population size
- some violations of WFM can be corrected for
Population genetics basics: Ne and Drift
- loss of variation
- change of allele frequencies
Known as genetic drift
Population genetics basics: Ne and Drift
Genetic drift
At every locus, variation is eventually lost and one allele becomes fixed
- in non-neutral loci, selection affects chances of fixation
- variation is lost much more rapidly in small populations
- in small populations genetic drift prevails over selection and even harmful alleles may get fixed
Variation once lost is lost forever
- a population bottleneck reduces both population size and variation
- recovery cannot bring it back
- new variation is created by mutations
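The effect of population size on the loss of variation can be illustrated with a small simulation. The sketch below assumes a neutral locus with two alleles, no mutation, and binomial resampling of a fixed number of gene copies each generation; the function names, the starting frequency of 0.5, and the sizes of 20 and 100 gene copies are arbitrary choices for illustration.

```python
import random

def generations_to_fixation(n_copies, start_freq, rng):
    """Neutral two-allele Wright-Fisher locus: return the number of
    generations until one allele is fixed (count hits 0 or n_copies)."""
    count = round(start_freq * n_copies)
    generations = 0
    while 0 < count < n_copies:
        p = count / n_copies
        # Binomial sampling of the next generation, one gene copy at a time.
        count = sum(rng.random() < p for _ in range(n_copies))
        generations += 1
    return generations

def mean_fixation_time(n_copies, replicates=200, seed=42):
    """Average time to loss of variation over independent replicate loci."""
    rng = random.Random(seed)
    times = [generations_to_fixation(n_copies, 0.5, rng) for _ in range(replicates)]
    return sum(times) / len(times)

small = mean_fixation_time(n_copies=20)
large = mean_fixation_time(n_copies=100)
print(f"20 copies: {small:.1f} generations, 100 copies: {large:.1f} generations")
```

In this setting the smaller population loses its variation in far fewer generations on average than the larger one.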
Population genetics basics: Coalescent
Coalescence time
- small populations coalesce faster, more recent MRCA
- conversely: Ne can be defined by coalescence time
(Figure: genealogies traced back to the MRCA for N = 20 and N = 100)
Coalescence of two lineages
Probability of coalescence in generation r before present follows a
geometric distribution: $P(r) = [1 - 1/(2N)]^{r-1} [1/(2N)]$
- its mean is 1/p = 1/[1/(2N)] = 2N
- for 2N = 20, expected time for two random lineages to coalesce is 20 generations
For large N, coalescence process follows exponential distribution
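The waiting time for two lineages can be checked by simulation. The following sketch assumes that, going back one generation at a time, the two lineages pick the same parental gene copy with probability 1/(2N); the function name, the seed, and the number of replicates are illustrative choices.

```python
import random

def coalescence_time(two_n, rng):
    """Step back generation by generation; the two lineages coalesce in a
    given generation with probability 1/(2N)."""
    generations = 1
    while rng.random() >= 1.0 / two_n:
        generations += 1
    return generations

rng = random.Random(7)
two_n = 20
times = [coalescence_time(two_n, rng) for _ in range(20000)]
mean_time = sum(times) / len(times)
print(f"simulated mean: {mean_time:.2f}, theoretical mean 2N = {two_n}")
```

The simulated mean should land close to 2N, while individual waiting times vary widely around it.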
Coalescence of many lineages
For n lineages, coalescence rate is [n(n − 1)/2][1/(2N)]
- when n is large, coalescent events happen quickly
- last event (n = 2) is expected to take at least half of total time
- for 2N = 20, rate and expected time to next event are:

lineages   coalescent rate   generations
5          0.5               2
4          0.3               3.3
3          0.15              6.7
2          0.05              20
total                        32
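The figures for 2N = 20 follow directly from the rate formula [n(n − 1)/2][1/(2N)], with the expected waiting time taken as the reciprocal of the rate. A minimal sketch that recomputes them (the function name `coalescent_table` is made up for this example):

```python
def coalescent_table(n_lineages, two_n):
    """For k = n, n-1, ..., 2 lineages, return (k, rate, expected generations),
    where rate = [k(k-1)/2] * [1/(2N)] and expected time = 1/rate."""
    rows = []
    for k in range(n_lineages, 1, -1):
        rate = k * (k - 1) / 2 / two_n
        rows.append((k, rate, 1 / rate))
    return rows

rows = coalescent_table(n_lineages=5, two_n=20)
total = sum(time for _, _, time in rows)
for k, rate, time in rows:
    print(f"{k}\t{rate:.2f}\t{time:.1f}")
print(f"total\t\t{total:.1f}")
```

The last row (two lineages) accounts for 20 of the 32 expected generations, which is where the "at least half of total time" observation comes from.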
Coalescence of many lineages
Expected coalescence times have large variance, tree shapes differ
- expected genetic diversity is affected by tree structure
Coalescence and Site frequency spectrum
Bamshad and Wooding, NRG, 2003
Vertical branches are evolutionary time, mutations random
- tree shapes have expected distribution of allele frequencies
- for many loci this is called site frequency spectrum
- with outgroup, ancestral and derived allele can be inferred
Site frequency spectrum, DAF and MAF
SFS: also known as allele frequency spectrum
- if ancestral state known, derived allele frequency (DAF)
- if not, minor allele frequency (MAF) or folded SFS
One of the most widely used summary statistics
- at neutral sites, reflects population history
- at non-neutral sites, reflects selection pressure
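To make the difference between the unfolded (DAF-based) and folded (MAF-based) spectrum concrete, here is a small sketch that tabulates both from per-site allele counts. The function names and the toy data are hypothetical; real analyses would read the counts from genotype data.

```python
from collections import Counter

def unfolded_sfs(derived_counts, n_chromosomes):
    """Number of sites at each derived allele count 1..n-1
    (requires a known ancestral state)."""
    tally = Counter(c for c in derived_counts if 0 < c < n_chromosomes)
    return [tally[i] for i in range(1, n_chromosomes)]

def folded_sfs(allele_counts, n_chromosomes):
    """Number of sites at each minor allele count 1..n//2
    (usable when the ancestral state is unknown)."""
    tally = Counter(min(c, n_chromosomes - c)
                    for c in allele_counts if 0 < c < n_chromosomes)
    return [tally[i] for i in range(1, n_chromosomes // 2 + 1)]

# Hypothetical toy data: allele counts at 8 segregating sites in 6 chromosomes.
counts = [1, 1, 1, 2, 1, 5, 3, 2]
print(unfolded_sfs(counts, 6))  # [4, 2, 1, 0, 1]
print(folded_sfs(counts, 6))    # [5, 2, 1]
```

Folding merges count i with count n − i, so the site with 5 derived copies out of 6 is counted as a minor allele count of 1.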
Coalescence with non-constant population size
Increasing population size: few coalescent events in large phase
Decreasing population size: many coalescent events in small phase
Coalescence with non-constant population size
Population increase and decrease affect the expected tree shape
- mutations are random so tree shape affects expected SFS
Nielsen and Slatkin, 2013
Site frequency spectrum and Demography
Evidence for bottleneck in human EUR and ASN populations
- quick coalescent & deep branches → deficit of low frequencies
Keinan et al, NG, 2007
Site frequency spectrum and Selection
Selection affects fitness and thus AF
(Figure: allele frequency spectra under positive and negative selection; site classes shown: syn sites, nonsyn sites, 3’UTR, cons 3’UTR, cons miRNA)
Increase in low-frequency alleles due to negative selection
Chen and Rajewsky, NG, 2006
Derived allele frequencies and annotation liftover
Derived allele frequencies
We will look at DAF of
1. different populations
2. different genomic regions
The latter requires annotation that we lift-over from threespined