Probabilistic Approaches to Phylogeny
Wouter Van Gool & Thomas Jellema
Contents
• Introduction/Overview (Wouter)
• Probabilistic Models of Evolution (Wouter)
• Calculating the Likelihood (Wouter)
• Pause
• Evolution Demo (Thomas)
• Using the likelihood for inference (Thomas)
• Phylogeny Demo (Thomas)
• Summary/Conclusion (Thomas)
• Questions
8.1 Introduction
Goal:
• Formulate probabilistic models for phylogeny
• Infer trees from sets of sequences
Aim of probability-based phylogeny: rank trees according to
- likelihood P(data | tree)
- posterior probability P(tree | data)
Compute the probability of a set of data given a tree: P(x* | T, t*)
x*: set of n sequences xj (j = 1…n)
T: tree with n leaves, with sequence j at leaf j
t*: edge lengths of the tree
8.2 Probabilistic Models of Evolution
Given the sequences at the leaves x1…xn:
1. Pick a model of evolution: P(x | y, t), P(x)
2. Enumerate all possible tree topologies T with n leaves
3. For each T, maximize P(x* | T, t*) over all possible edge lengths t*
4. Pick the T and t* that have the largest probability
Simplifying assumptions:
1. Single-base substitutions only, so ungapped alignments only
2. Each base evolves independently, with the same model of evolution based on a substitution matrix
Substitution matrices for phylogeny
Many important families of substitution matrices are multiplicative: S(t)S(s) = S(t+s)
Substitution matrices used in phylogeny:
• Jukes & Cantor model [1969]
• Kimura DNA model [1980]
• PAM matrix [1978]
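The multiplicative property can be checked numerically. The sketch below assumes the standard closed form of the Jukes & Cantor matrix with a rate parameter alpha; the helper names (jukes_cantor, matmul, max_abs_diff) and the chosen times are illustrative and do not come from the slides.

```python
import math

def jukes_cantor(t, alpha=1.0):
    """Return the 4x4 Jukes-Cantor matrix S(t); entry [a][b] is P(b | a, t).

    alpha is an assumed substitution rate parameter.
    """
    e = math.exp(-4.0 * alpha * t)
    same = 0.25 * (1.0 + 3.0 * e)  # probability that the base is unchanged
    diff = 0.25 * (1.0 - e)        # probability of each specific other base
    return [[same if a == b else diff for b in range(4)] for a in range(4)]

def matmul(m1, m2):
    """Plain 4x4 matrix product."""
    return [[sum(m1[i][k] * m2[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def max_abs_diff(m1, m2):
    """Largest absolute entry-wise difference between two 4x4 matrices."""
    return max(abs(m1[i][j] - m2[i][j]) for i in range(4) for j in range(4))

# S(0.3) S(0.5) should equal S(0.8) up to rounding error.
product = matmul(jukes_cantor(0.3), jukes_cantor(0.5))
direct = jukes_cantor(0.8)
multiplicativity_gap = max_abs_diff(product, direct)
print(multiplicativity_gap)
```

Evolving for time 0.3 and then for time 0.5 gives the same matrix as evolving for time 0.8 in one step, which is what multiplicative means here.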
8.3 Calculating the likelihood for ungapped alignments
Example: The likelihood of two nucleotide sequences
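A minimal sketch of the two-sequence case, assuming a Jukes & Cantor model, a uniform root distribution and a root joined to the two sequences by edges of length t1 and t2. The function names, the edge lengths and the choice of AATC and AATT as input are illustrative assumptions.

```python
import math

BASES = "ACGT"
Q = {b: 0.25 for b in BASES}  # assumed uniform distribution P(x) at the root

def p_sub(b, a, t, alpha=1.0):
    """Jukes-Cantor substitution probability P(b | a, t)."""
    e = math.exp(-4.0 * alpha * t)
    return 0.25 * (1.0 + 3.0 * e) if a == b else 0.25 * (1.0 - e)

def two_sequence_likelihood(x1, x2, t1, t2):
    """P(x1, x2 | T, t1, t2) for an ungapped pair of sequences.

    Sites are independent, so the likelihood is a product over sites u
    of a sum over the unknown root base a.
    """
    if len(x1) != len(x2):
        raise ValueError("sequences must have the same length")
    likelihood = 1.0
    for u in range(len(x1)):
        likelihood *= sum(
            Q[a] * p_sub(x1[u], a, t1) * p_sub(x2[u], a, t2) for a in BASES
        )
    return likelihood

lik = two_sequence_likelihood("AATC", "AATT", 0.1, 0.2)
print(lik)
```

Because each site contributes one factor, likelihoods of long alignments underflow quickly; a practical version would sum log-likelihoods instead.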
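Likelihood for the general case (one site):
$$P(x^{1},\dots,x^{n} \mid T, t^{*}) = \sum_{x^{n+1},\dots,x^{2n-1}} q_{x^{2n-1}} \prod_{i=1}^{2n-2} P(x^{i} \mid x^{\alpha(i)}, t_{i})$$
where node α(i) is the ancestor of node i, the leaves are nodes 1…n, the root is node 2n-1, and q_a is the probability P(a) of base a at the root.
A fixed set of values t1…t2n-2 and a topology T is required.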
Felsenstein's recursive algorithm
Define a table of probabilities Fk,a for each site u, all tree nodes k and all input characters a:
Fk,a = probability, at site u, of the leaves in the subtree below node k, assuming the character at site u of node k is a
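The recursion over the table Fk,a can be sketched as follows. This is an illustration under stated assumptions, not the presenters' implementation: a Jukes & Cantor model, a uniform root distribution, and a hypothetical tree encoding in which a leaf is a sequence name and an internal node is a tuple (left, t_left, right, t_right).

```python
import math

BASES = "ACGT"
Q = {b: 0.25 for b in BASES}  # assumed uniform root distribution

def p_sub(b, a, t, alpha=1.0):
    """Jukes-Cantor substitution probability P(b | a, t)."""
    e = math.exp(-4.0 * alpha * t)
    return 0.25 * (1.0 + 3.0 * e) if a == b else 0.25 * (1.0 - e)

def prune(node, site_chars):
    """Return {a: F[k][a]} for node k at one site."""
    if isinstance(node, str):
        # Leaf: probability 1 for the observed character, 0 otherwise.
        return {a: 1.0 if site_chars[node] == a else 0.0 for a in BASES}
    left, t_left, right, t_right = node
    f_left = prune(left, site_chars)
    f_right = prune(right, site_chars)
    # Internal node: combine the two daughter subtrees independently.
    return {
        a: sum(p_sub(b, a, t_left) * f_left[b] for b in BASES)
        * sum(p_sub(c, a, t_right) * f_right[c] for c in BASES)
        for a in BASES
    }

def tree_likelihood(tree, seqs):
    """Multiply the per-site likelihoods over all sites u."""
    length = len(next(iter(seqs.values())))
    likelihood = 1.0
    for u in range(length):
        site_chars = {name: s[u] for name, s in seqs.items()}
        f_root = prune(tree, site_chars)
        likelihood *= sum(Q[a] * f_root[a] for a in BASES)
    return likelihood

seqs = {"x1": "AATC", "x2": "AATT", "x3": "TTAA"}
tree = (("x1", 0.1, "x2", 0.1), 0.2, "x3", 0.3)  # illustrative edge lengths
lik = tree_likelihood(tree, seqs)
print(lik)
```

Each node is visited once per site, so the cost grows linearly with the number of nodes rather than exponentially with the number of unknown ancestral characters.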
Likelihood for the general case
Overall algorithm:
• Enumerate each tree topology T
• Enumerate sets of edge lengths t* (using some n-dimensional optimisation technique)
• Run Felsenstein's recursive algorithm for each site u and multiply the likelihoods
• Return the best T and t*
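To get a feel for the cost of the enumeration step, the sketch below counts binary tree topologies using the standard double-factorial formulas, (2n-3)!! rooted and (2n-5)!! unrooted topologies for n labelled leaves. The function names are illustrative.

```python
def double_factorial(m):
    """m!! = m * (m - 2) * (m - 4) * ... down to 1 or 2."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result

def rooted_topologies(n):
    """Number of rooted binary trees with n labelled leaves: (2n-3)!!."""
    return 1 if n < 2 else double_factorial(2 * n - 3)

def unrooted_topologies(n):
    """Number of unrooted binary trees with n labelled leaves: (2n-5)!!."""
    return 1 if n < 3 else double_factorial(2 * n - 5)

for n in (4, 10, 20):
    print(n, rooted_topologies(n), unrooted_topologies(n))
```

Ten sequences already give tens of millions of rooted topologies, each of which needs its own edge-length optimisation, so exhaustive enumeration is only practical for very small n.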
Reversibility & independence of root position
The score of the optimal tree is independent of the root position if and only if:
- the substitution matrix is multiplicative
- the substitution matrix is reversible
A substitution matrix is reversible if for all a, b and t:
$$P(b \mid a, t)\, q_{a} = P(a \mid b, t)\, q_{b}$$
where q_a is the probability P(a) of character a.
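Both conditions can be checked numerically for a simple case. The sketch assumes a Jukes & Cantor model with uniform q and a single site on a two-leaf tree; the root is moved along the path between the leaves while the total path length stays fixed. All names and numbers are illustrative.

```python
import math

BASES = "ACGT"
Q = {b: 0.25 for b in BASES}  # assumed uniform q

def p_sub(b, a, t, alpha=1.0):
    """Jukes-Cantor substitution probability P(b | a, t)."""
    e = math.exp(-4.0 * alpha * t)
    return 0.25 * (1.0 + 3.0 * e) if a == b else 0.25 * (1.0 - e)

def is_reversible(t, tol=1e-12):
    """Check P(b | a, t) q_a == P(a | b, t) q_b for every pair a, b."""
    return all(
        abs(p_sub(b, a, t) * Q[a] - p_sub(a, b, t) * Q[b]) <= tol
        for a in BASES for b in BASES
    )

def site_likelihood(x, y, t1, t2):
    """Likelihood of characters x and y with the root t1 from x and t2 from y."""
    return sum(Q[a] * p_sub(x, a, t1) * p_sub(y, a, t2) for a in BASES)

# Same total path length (0.3), two different root positions.
root_near_x = site_likelihood("C", "T", 0.05, 0.25)
root_near_y = site_likelihood("C", "T", 0.25, 0.05)
print(is_reversible(0.4), root_near_x, root_near_y)
```

The two likelihoods agree, so for this model only the total distance between the two leaves matters, not where the root is placed on the path.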
8.4 Using the likelihood for inference
Maximum likelihood:
• The best tree "could be" the tree that maximises the likelihood
• Computationally demanding
Sampling from the posterior distribution:
• We use Bayes' rule to compute the posterior probability
• This is the probability of a model given the data
Example
Model name   Prior chance of model (%)   Data
Model 1      10                          100% A
Model 2      40                          50% A, 50% B
Model 3      50                          100% B
For the example, the posterior probability of Model 1 after observing A follows from Bayes' rule:
$$P(\text{Model 1} \mid A) = \frac{P(A \mid \text{Model 1})\, P(\text{Model 1})}{P(A)} = \frac{100\% \times 10\%}{30\%} \approx 33\%$$
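The three-model example can be worked through in a few lines. The sketch assumes that the prior column is in percent and that the observed data is a single A; the function name posterior is illustrative.

```python
# Priors from the example table, read as percentages (an assumption).
priors = {"Model 1": 0.10, "Model 2": 0.40, "Model 3": 0.50}
# Probability that each model produces the observed data A.
likelihood_of_a = {"Model 1": 1.0, "Model 2": 0.5, "Model 3": 0.0}

def posterior(priors, likelihoods):
    """Bayes' rule: P(model | data) = P(data | model) P(model) / P(data)."""
    evidence = sum(priors[m] * likelihoods[m] for m in priors)  # P(data)
    return {m: priors[m] * likelihoods[m] / evidence for m in priors}

post = posterior(priors, likelihood_of_a)
for model, prob in post.items():
    print(model, round(prob, 3))
```

Observing A rules out Model 3 and shares the remaining probability between Model 1 and Model 2 in the ratio 10 : 20.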
Metropolis algorithm
• It samples trees with probabilities given by their posterior distribution.
• It is a sampling procedure that generates a sequence of trees, each from the previous one.
[Figure: a proposal distribution. Tree with nodes numbered 1 to 8 in order of traversal; vertical axis: time from root.]
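A toy version of the Metropolis procedure is sketched below. It makes strong simplifying assumptions: the state space is three hypothetical tree labels (T1, T2, T3) with made-up unnormalised posterior weights, and the proposal picks one of the other states uniformly, which makes it symmetric. In a real analysis the states would be trees with edge lengths and the weights would be likelihood times prior.

```python
import random
from collections import Counter

def metropolis(weights, start, steps, rng):
    """Sample states with probability proportional to weights[state].

    Assumes all weights are positive and the proposal is symmetric.
    """
    states = list(weights)
    current = start
    samples = []
    for _ in range(steps):
        # Propose a new state from the current one (uniform over the others).
        proposal = rng.choice([s for s in states if s != current])
        ratio = weights[proposal] / weights[current]
        # Accept with probability min(1, ratio); otherwise stay put.
        if ratio >= 1.0 or rng.random() < ratio:
            current = proposal
        samples.append(current)
    return samples

weights = {"T1": 1.0, "T2": 2.0, "T3": 5.0}  # hypothetical posterior weights
samples = metropolis(weights, "T1", 50000, random.Random(0))
freq = {s: c / len(samples) for s, c in Counter(samples).items()}
print(freq)
```

Only ratios of weights are used, so the normalising constant P(data) never has to be computed; the visit frequencies approach 1/8, 2/8 and 5/8.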
Other phylogenetic uses of sampling
[Figure, built up over several slides: the sequences AATC, AATT, TTAA, TCAA and AAAA]
Inferring the history of populations
Probability density of a coalescence in time = [equation not recovered]
Probability of a coalescence between any pair = [equation not recovered]
When n is large and p is close to 0, the binomial distribution with parameters n and p can be approximated by a Poisson distribution with mean n*p.
n*p = [expression not recovered] and x = 1
The probability of a coalescence at the end of the period tk: [equation not recovered]
The total probability of the tree: [equation not recovered]
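The Poisson approximation to the binomial can be checked directly for x = 1. The values of n and p below are arbitrary illustrative choices (large n, small p), not figures from the slides.

```python
import math

def binomial_pmf(x, n, p):
    """Exact binomial probability of x successes in n trials."""
    return math.comb(n, x) * p**x * (1.0 - p) ** (n - x)

def poisson_pmf(x, mean):
    """Poisson probability of x events for the given mean."""
    return math.exp(-mean) * mean**x / math.factorial(x)

n, p = 10000, 0.0002  # illustrative values only
exact = binomial_pmf(1, n, p)
approx = poisson_pmf(1, n * p)
print(exact, approx)
```

The two values agree to many decimal places, which is why the per-generation binomial count of coalescences can be replaced by a Poisson one.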
The bootstrap
• The bootstrap can give an approximation to the posterior.
• It takes too much labour, so it is an unattractive alternative to sampling.
• The bootstrap is probably more useful for non-probabilistic tree-building methods.
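The resampling step of the bootstrap can be sketched as follows: columns of the alignment are drawn with replacement to build replicate alignments of the same length. The function name and the small alignment are illustrative, and the tree-building step that would be run on each replicate is left out.

```python
import random

def bootstrap_alignment(seqs, rng):
    """Resample alignment columns with replacement, keeping rows aligned."""
    length = len(seqs[0])
    columns = [rng.randrange(length) for _ in range(length)]
    return ["".join(s[c] for c in columns) for s in seqs]

alignment = ["AATC", "AATT", "TTAA"]
replicates = [
    bootstrap_alignment(alignment, random.Random(seed)) for seed in range(3)
]
for rep in replicates:
    print(rep)
```

A tree is then built from every replicate, and the fraction of replicate trees that contain a given feature is taken as a measure of support for it. Rebuilding a tree for each replicate is what makes the method laborious.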
Conclusion
• Today's methods can be used to find the most probable tree.
• Most of the methods are computationally demanding.
• More realistic evolutionary models will be explained on Thursday.