Inference under the Wright-Fisher model using an accurate beta approximation
Paula Tataru, Thomas Bataillon, Asger Hobolth
AARHUS UNIVERSITY, Bioinformatics Research Centre
CSHL, April 15th 2015
An accurate Beta approximation
Paula Tataru [email protected]
AARHUS
UNIVERSITY
Bioinformatics
Research Centre
Theoretical population genetics
›Mathematical models formalize the evolution of genetic variation within and between populations
›Provide a framework for inferring evolutionary paths from observed data to
Inference problems
› Inference of population history from DNA data
› (Variable) population size
› Migration / admixture
› Divergence times
› Selection coefficients
Inference problems: population size
PSMC: H. Li and R. Durbin. Inference of human population history from individual whole-genome sequences. Nature, 475:493–496, 2011.
Inference problems: population divergence
KimTree: M. Gautier and R. Vitalis. Inferring population histories using genome-wide allele frequency data. Molecular Biology and Evolution, 30(3):654–668, 2013.
Inference problems: population admixture
TreeMix: J. K. Pickrell and J. K. Pritchard. Inference of population splits and mixtures from genome-wide allele frequency data. PLOS Genetics, 8(11):e1002967, 2012.
Inference problems: population admixture
G-PhoCS: I. Gronau, M. J. Hubisz, B. Gulko, C. G. Danko and A. Siepel. Bayesian inference of ancient human demography from individual genome sequences. Nature Genetics, 43(10):1031–1034, 2011.
Inference problems: loci under selection
spectralHMM: M. Steinrücken, A. Bhaskar and Y. S. Song. A novel spectral method for inferring general selection from time series genetic data. The Annals of Applied Statistics, 8(4):2203–2222, 2014.
Population genetics: the Wright-Fisher model
› Evolution of a population forward in time
› Follow one locus (region in the DNA)
› Different variants at the locus are called alleles
[Figure: individuals over generations (time)]
Population genetics: the Wright-Fisher model
› Basic model: only two alleles per locus
› Follow the frequency of one of the alleles
[Figure: individuals over generations (time); allele count in successive generations: 3, 2, 3, 3, 4, 5, 5]
Allele frequency distribution
Population genetics: the coalescent model
› Trace the genealogy of sampled individuals backward in time
› The coalescent process terminates when it reaches the MRCA (most recent common ancestor)
[Figure: genealogy of the sampled individuals traced backward over generations (time) to their MRCA]
Two dual models
›The Wright-Fisher model
› Forward in time
› Follows allele frequencies
› Accommodates selection
› Scalability: a larger sample decreases uncertainty
›The coalescent
› Backward in time
› Follows the genealogy
› Accommodates recombination
› Scalability: a larger sample increases computational complexity
Approximations to the Wright-Fisher model
›Diffusion
› Large population size
› Infinitesimal change per generation
› No closed-form solution
› Cumbersome to evaluate
›Moment-based
› Convenient distributions
› Normal distribution
› Beta distribution
› Closed analytical forms
› Fast to evaluate
› Problematic at the boundaries
Behavior at the boundaries
›Normal distribution
› Support: real line
› Requires truncation, which leads to an incorrect variance
› Accurate at intermediate frequencies
›Beta distribution
› Support: [0, 1]
› Accurate at intermediate frequencies
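To make the truncation problem concrete, the sketch below computes how much probability a normal distribution with the drift-only Wright-Fisher mean and variance places outside [0, 1]. It assumes the standard pure-drift result that the mean stays at x0 and the variance is x0(1 − x0)(1 − (1 − 1/2N)^t); the function names and parameter values are illustrative choices, not taken from the talk.

```python
import math


def normal_cdf(z: float, mean: float, sd: float) -> float:
    """CDF of a normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((z - mean) / (sd * math.sqrt(2.0))))


def drift_variance(x0: float, two_n: int, t: int) -> float:
    """Variance of X_t under pure drift from frequency x0 (standard result, assumed here)."""
    return x0 * (1.0 - x0) * (1.0 - (1.0 - 1.0 / two_n) ** t)


def normal_mass_outside_unit_interval(x0: float, two_n: int, t: int) -> float:
    """Probability that a moment-matched normal puts below 0 or above 1."""
    sd = math.sqrt(drift_variance(x0, two_n, t))  # needs t >= 1 so that sd > 0
    return normal_cdf(0.0, x0, sd) + (1.0 - normal_cdf(1.0, x0, sd))


for x0 in (0.5, 0.1, 0.02):
    print(x0, round(normal_mass_outside_unit_interval(x0, two_n=200, t=50), 4))
```

For a starting frequency near the boundary a large share of the normal's mass falls outside the support, and removing it by truncation changes the variance that the approximation was built to match.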
The Beta with spikes
›Uses the Wright-Fisher model
› Scalable
›Uses moments
› Simple mathematical calculations
› Improves behavior at the boundaries
› Preserves mean and variance
The Wright-Fisher model
›Zt: allele count in generation t
›Xt = Zt/(2N): allele frequency
›Zt+1 follows a binomial distribution: Zt+1 | Xt ~ Binomial(2N, g(Xt))
›g encodes the evolutionary pressures
[Figure: individuals over generations (time); allele count in successive generations: 3, 2, 3, 3, 4, 5, 5]
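A minimal sketch of the model on this slide: each generation draws Zt+1 from a binomial distribution with 2N trials and success probability g(Xt). The helper names (`binomial`, `simulate_wf`) and the default g(x) = x, which corresponds to drift only, are illustrative assumptions rather than the authors' implementation.

```python
import random
from typing import Callable, List


def binomial(rng: random.Random, n: int, p: float) -> int:
    """Draw from Binomial(n, p) as a sum of n Bernoulli trials (fine for small n)."""
    return sum(1 for _ in range(n) if rng.random() < p)


def simulate_wf(z0: int, two_n: int, generations: int,
                g: Callable[[float], float] = lambda x: x,
                seed: int = 1) -> List[int]:
    """Simulate allele counts Z_0, ..., Z_T under the Wright-Fisher model.

    two_n is 2N, the number of gene copies. g maps the current frequency
    X_t = Z_t / 2N to the sampling probability for the next generation.
    Assumption: g defaults to the identity, i.e. drift only.
    """
    rng = random.Random(seed)
    counts = [z0]
    for _ in range(generations):
        x = counts[-1] / two_n                      # current allele frequency X_t
        counts.append(binomial(rng, two_n, g(x)))   # Z_{t+1} ~ Bin(2N, g(X_t))
    return counts


# Drift only, 2N = 6 gene copies, starting from 3 copies of the allele
print(simulate_wf(z0=3, two_n=6, generations=6))
```

Passing a different `g` changes only the sampling probability; the binomial step itself stays the same.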
The Wright-Fisher model: Drift only
[Figure: individuals over generations (time); allele count in successive generations: 3, 2, 3, 3, 4, 5, 5]
The Wright-Fisher model: Mutations
[Figure: individuals over generations (time), with mutation rates u and v; allele count in successive generations: 3, 2, 4, 5, 4, 3, 2]
The Wright-Fisher model: Migration
[Figure: individuals over generations (time), with migration rates m1 and m2; allele count in successive generations: 3, 2, 3, 5, 4, 2, 3]
The Wright-Fisher model: Linear forces
›Mutations
›Migration
›Mutations and migration
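Under these forces g is an affine function of the frequency, g(x) = a + b·x, and composing two such forces gives another affine function. The sketch below assumes u is the mutation rate away from the focal allele and v the rate towards it, and models migration as a fraction m of gene copies drawn each generation from a source population with allele frequency x_source; these conventions and all names are assumptions for illustration, since the slides do not spell out the formulas in text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Affine:
    """g(x) = a + b * x, the form taken by linear evolutionary forces."""
    a: float
    b: float

    def __call__(self, x: float) -> float:
        return self.a + self.b * x

    def then(self, other: "Affine") -> "Affine":
        """Apply self first, then other: other(self(x)) is again affine."""
        return Affine(other.a + other.b * self.a, other.b * self.b)


def drift_only() -> Affine:
    return Affine(0.0, 1.0)


def mutation(u: float, v: float) -> Affine:
    # Assumed convention: x(1 - u) + (1 - x)v, i.e. focal copies that do not
    # mutate away plus other copies that mutate into the focal allele
    return Affine(v, 1.0 - u - v)


def migration(m: float, x_source: float) -> Affine:
    # Assumed model: a fraction m of gene copies comes from a source
    # population with allele frequency x_source
    return Affine(m * x_source, 1.0 - m)


g = mutation(u=1e-3, v=2e-3).then(migration(m=0.01, x_source=0.8))
print(g.a, g.b, g(0.5))
```

Keeping g in the form a + b·x is what allows the mean and variance of Xt to be propagated in closed form.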
The Beta approximation: Main idea
›The density of Xt is approximated by a beta distribution
›Use a recursive approach to calculate
› Mean and variance
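A minimal sketch of the recursive idea for a linear force g(x) = a + b·x. The law of total variance gives closed-form updates for the mean and variance of Xt, and a beta distribution is then fitted by matching those two moments. The derivation and the function names are our own reconstruction for illustration, not code from the talk.

```python
from typing import List, Tuple


def moment_recursion(x0: float, two_n: int, t: int,
                     a: float = 0.0, b: float = 1.0) -> List[Tuple[float, float]]:
    """Mean and variance of X_0..X_t when Z_{t+1} | X_t ~ Bin(2N, a + b * X_t)."""
    mean, var = x0, 0.0
    out = [(mean, var)]
    for _ in range(t):
        m = a + b * mean            # E[g(X_t)]
        var_g = b * b * var         # Var[g(X_t)]
        # Law of total variance: E[Var(X_{t+1} | X_t)] + Var(E[X_{t+1} | X_t]),
        # where E[g(1 - g)] = m(1 - m) - Var[g]
        var = (m * (1.0 - m) - var_g) / two_n + var_g
        mean = m
        out.append((mean, var))
    return out


def beta_from_moments(mean: float, var: float) -> Tuple[float, float]:
    """Beta(alpha, beta) with the given mean and variance (needs 0 < var < mean(1 - mean))."""
    k = mean * (1.0 - mean) / var - 1.0
    return mean * k, (1.0 - mean) * k


mean, var = moment_recursion(x0=0.3, two_n=100, t=20)[-1]
print(mean, var, beta_from_moments(mean, var))
```

Only the mean and variance are carried from one generation to the next, which is what keeps each step cheap.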
The Beta approximation: Drift only
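For drift only (g(x) = x) the variance recursion has a standard closed-form solution, and the moment-matched beta parameters follow directly. The derivation is a worked check on the recursive approach, not reproduced from the slides:

```latex
\begin{aligned}
\mathbb{E}[X_{t+1}] &= \mathbb{E}[X_t] = x_0,\\
\operatorname{Var}[X_{t+1}] &= \frac{x_0(1-x_0) - \operatorname{Var}[X_t]}{2N} + \operatorname{Var}[X_t]
  = \frac{x_0(1-x_0)}{2N} + \Bigl(1-\frac{1}{2N}\Bigr)\operatorname{Var}[X_t],\\
\operatorname{Var}[X_t] &= x_0(1-x_0)\Bigl[1-\Bigl(1-\frac{1}{2N}\Bigr)^{t}\Bigr],\\
\alpha_t + \beta_t &= \frac{x_0(1-x_0)}{\operatorname{Var}[X_t]} - 1
  = \frac{(1-1/2N)^{t}}{1-(1-1/2N)^{t}},
\qquad \alpha_t = x_0(\alpha_t+\beta_t), \quad \beta_t = (1-x_0)(\alpha_t+\beta_t).
\end{aligned}
```

As t grows, αt + βt tends to 0, so the fitted beta becomes U-shaped and piles its mass near 0 and 1, where loss and fixation actually produce point masses.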
The Beta with spikes: Main idea
›The density of Xt is approximated by a beta distribution with spikes (point masses) at 0 and 1
›Use a recursive approach to calculate
› Loss and fixation probabilities
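A sketch of the object described on this slide, assuming the density has the form P0·δ0 + P1·δ1 + (1 − P0 − P1)·Beta(α, β). Given the loss and fixation probabilities, the beta part is chosen so that the full distribution keeps the required mean and variance. The class, its fields, and `from_moments` are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


def beta_from_moments(mean: float, var: float) -> Tuple[float, float]:
    """Beta(alpha, beta) with the given mean and variance."""
    k = mean * (1.0 - mean) / var - 1.0
    return mean * k, (1.0 - mean) * k


@dataclass
class BetaWithSpikes:
    p0: float      # P(X_t = 0): probability of loss
    p1: float      # P(X_t = 1): probability of fixation
    alpha: float   # parameters of the beta part on (0, 1)
    beta: float

    def mean(self) -> float:
        w = 1.0 - self.p0 - self.p1
        return self.p1 + w * self.alpha / (self.alpha + self.beta)

    def variance(self) -> float:
        w = 1.0 - self.p0 - self.p1
        s = self.alpha + self.beta
        mb = self.alpha / s                          # mean of the beta part
        vb = self.alpha * self.beta / (s * s * (s + 1.0))
        second = self.p1 + w * (vb + mb * mb)        # E[X^2]
        return second - self.mean() ** 2

    @classmethod
    def from_moments(cls, mean: float, var: float, p0: float, p1: float) -> "BetaWithSpikes":
        """Fit the beta part so the whole distribution keeps the given mean and variance."""
        w = 1.0 - p0 - p1
        mb = (mean - p1) / w                         # conditional mean on (0, 1)
        vb = (var + mean * mean - p1) / w - mb * mb  # conditional variance on (0, 1)
        a, b = beta_from_moments(mb, vb)
        return cls(p0, p1, a, b)


d = BetaWithSpikes.from_moments(mean=0.3, var=0.05, p0=0.2, p1=0.05)
print(d.alpha, d.beta, d.mean(), d.variance())
```

The spikes take the mass at the boundaries, so the beta part only has to describe the frequencies that are still segregating.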
The Beta with spikes: loss probability
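Assuming the density of Xt is written as P0·δ0 + P1·δ1 + (1 − P0 − P1)·Beta(αt, βt), the loss probability can be propagated by conditioning on where Xt sits: an allele at frequency x is lost in one generation with probability (1 − g(x))^{2N}. The equations below are a hedged reconstruction of that idea, not the slide's own formula; Y ∼ Beta(αt, βt) denotes the beta part and B is the beta function.

```latex
P_0^{(t+1)} = P_0^{(t)}\bigl(1-g(0)\bigr)^{2N} + P_1^{(t)}\bigl(1-g(1)\bigr)^{2N}
  + \bigl(1-P_0^{(t)}-P_1^{(t)}\bigr)\,\mathbb{E}\bigl[(1-g(Y))^{2N}\bigr]
```

```latex
\text{Drift only } (g(0)=0,\ g(1)=1):\qquad
P_0^{(t+1)} = P_0^{(t)} + \bigl(1-P_0^{(t)}-P_1^{(t)}\bigr)\,
  \frac{B(\alpha_t,\ \beta_t+2N)}{B(\alpha_t,\ \beta_t)}
```

The ratio of beta functions is the closed form of E[(1 − Y)^{2N}] for a beta-distributed Y, which is what keeps each step of the recursion cheap.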
The Beta with spikes: fixation probability
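For drift only, a hypothetical end-to-end sketch: the mean and variance follow the exact drift recursion, the spikes gain mass from the beta part through E[(1 − Y)^{2N}] = B(α, β + 2N)/B(α, β) for loss and E[Y^{2N}] = B(α + 2N, β)/B(α, β) for fixation, and the beta part is refitted each generation so the whole distribution keeps the exact mean and variance. Names are illustrative, and the talk's own recursion (shown only in figures) may differ in detail.

```python
import math
from typing import List, Tuple


def log_beta(a: float, b: float) -> float:
    """log B(a, b) via log-gamma, stable for large arguments."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_with_spikes_drift(x0: float, two_n: int, t: int) -> List[Tuple[float, float, float, float]]:
    """(P0, P1, alpha, beta) for generations 1..t under drift only (hypothetical sketch)."""
    mean, var = x0, 0.0          # drift keeps the mean at x0
    p0, p1 = 0.0, 0.0
    alpha, beta = 1.0, 1.0       # placeholder: not used in the first step
    out = []
    for step in range(t):
        w = 1.0 - p0 - p1        # mass currently in the beta part
        if step == 0:
            # X_0 = x0 is a point mass, so one-step loss/fixation are exact binomial terms
            lose, fix = (1.0 - x0) ** two_n, x0 ** two_n
        else:
            lb = log_beta(alpha, beta)
            lose = math.exp(log_beta(alpha, beta + two_n) - lb)   # E[(1 - Y)^{2N}]
            fix = math.exp(log_beta(alpha + two_n, beta) - lb)    # E[Y^{2N}]
        p0, p1 = p0 + w * lose, p1 + w * fix
        var = x0 * (1.0 - x0) / two_n + (1.0 - 1.0 / two_n) * var  # exact drift variance
        # Refit the beta part so the whole distribution keeps the exact mean and variance
        w = 1.0 - p0 - p1
        mb = (mean - p1) / w
        vb = (var + mean * mean - p1) / w - mb * mb
        if not 0.0 < vb < mb * (1.0 - mb):
            raise ValueError("no valid beta part for these moments")
        k = mb * (1.0 - mb) / vb - 1.0
        alpha, beta = mb * k, (1.0 - mb) * k
        out.append((p0, p1, alpha, beta))
    return out


print(beta_with_spikes_drift(x0=0.3, two_n=50, t=20)[-1])
```

Working with log-beta values avoids overflow in B(α, β + 2N) for realistic population sizes.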
The Beta with spikes: Drift only
Numerical accuracy: Drift only
[Figure: accuracy of the Beta and the Beta with spikes approximations]
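An approximation's accuracy is naturally judged against the exact Wright-Fisher distribution. For small 2N this can be obtained by propagating the probability vector of Zt through the binomial transition matrix. The sketch below does this for drift only. The text does not state how accuracy was measured in the talk, so treating this as the reference computation is an assumption.

```python
from math import comb
from typing import List


def wf_exact_distribution(z0: int, two_n: int, t: int) -> List[float]:
    """P(Z_t = j) for j = 0..2N under drift only, starting from Z_0 = z0."""
    probs = [0.0] * (two_n + 1)
    probs[z0] = 1.0
    # Transition matrix: P(Z_{t+1} = j | Z_t = i) = C(2N, j) x^j (1 - x)^(2N - j), x = i / 2N
    trans = [[comb(two_n, j) * (i / two_n) ** j * (1 - i / two_n) ** (two_n - j)
              for j in range(two_n + 1)] for i in range(two_n + 1)]
    for _ in range(t):
        probs = [sum(probs[i] * trans[i][j] for i in range(two_n + 1))
                 for j in range(two_n + 1)]
    return probs


dist = wf_exact_distribution(z0=6, two_n=20, t=10)
print("loss:", dist[0], "fixation:", dist[-1])
```

Each step costs O((2N)^2), which is why exact computation is only practical for small populations and why cheap approximations are attractive.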
Inference of divergence times: Drift only
›Simulated data
› 5000 independent loci
› 100 samples in each population
› 50 data sets (replicates)
›The allele frequency distribution is used to calculate the likelihood of the data
› The likelihood is numerically optimized
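The likelihood step can be sketched as follows. The sketch assumes each locus contributes the probability of observing k copies of the allele in a sample of n, averaged over a Beta-with-spikes allele frequency distribution. Under that assumption, the spikes give point masses at k = 0 and k = n, and the beta part gives a beta-binomial term. The function names are hypothetical, and the divergence-time model that maps parameters to (P0, P1, α, β) for each population is not reproduced here; the resulting log-likelihood would be handed to a numerical optimizer.

```python
import math
from typing import Iterable


def log_beta(a: float, b: float) -> float:
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def sample_prob(k: int, n: int, p0: float, p1: float, alpha: float, beta: float) -> float:
    """P(k of n sampled copies carry the allele) under a Beta with spikes."""
    w = 1.0 - p0 - p1
    # Beta-binomial term: C(n, k) B(k + alpha, n - k + beta) / B(alpha, beta)
    beta_binomial = math.comb(n, k) * math.exp(
        log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta))
    return (p0 if k == 0 else 0.0) + (p1 if k == n else 0.0) + w * beta_binomial


def log_likelihood(counts: Iterable[int], n: int, p0: float, p1: float,
                   alpha: float, beta: float) -> float:
    """Sum of log sample probabilities over independent loci."""
    return sum(math.log(sample_prob(k, n, p0, p1, alpha, beta)) for k in counts)


# Hypothetical counts from 100 sampled copies at five loci
print(log_likelihood([0, 3, 10, 57, 100], n=100, p0=0.1, p1=0.05, alpha=0.8, beta=1.2))
```

Because every term is in closed form, evaluating the likelihood over thousands of loci remains fast enough to sit inside an optimization loop.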
Conclusions
›Beta with spikes
› An extension built on the beta approximation
› Improves the quality of the approximation
› Simple mathematical formulation
› Works under linear evolutionary forces
› Comparable to state-of-the-art methods for inference of divergence times
› Recursive formulation enables incorporation of variable population size
Future work
› Incorporate selection
› Non-linear evolutionary force
› Positive selection increases the probability of fixation
› Mean and variance are no longer available in closed form
› Extend the approximation for loss/fixation probabilities to mean and variance
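The sketch below shows why selection breaks the closed-form moment recursion. It uses one common parametrisation of genic selection, g(x) = (1 + s)x / (1 + s·x); this form is an assumption, since the slides give none. Because g is non-linear, E[g(Xt)] depends on more than the mean of Xt, so two distributions with the same mean give different expected sampling probabilities. For an affine g they would agree.

```python
from typing import Callable, Sequence, Tuple


def genic_selection(s: float) -> Callable[[float], float]:
    # Assumed form: the focal allele has relative fitness 1 + s
    return lambda x: (1.0 + s) * x / (1.0 + s * x)


def expected(g: Callable[[float], float], dist: Sequence[Tuple[float, float]]) -> float:
    """E[g(X)] for a discrete distribution given as (value, probability) pairs."""
    return sum(p * g(x) for x, p in dist)


g = genic_selection(s=0.5)
point = [(0.5, 1.0)]                 # mean 0.5, variance 0
spread = [(0.2, 0.5), (0.8, 0.5)]    # mean 0.5, variance 0.09
print(expected(g, point), expected(g, spread))
```

The next generation's mean therefore depends on the whole distribution of Xt, not just its first two moments, which is the difficulty the future work has to address.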