Probabilistic Approaches to Phylogeny
Wouter Van Gool & Thomas Jellema


Contents
• Introduction/Overview (Wouter)
• Probabilistic Models of Evolution (Wouter)
• Calculating the Likelihood (Wouter)
• Pause
• Evolution Demo (Thomas)
• Using the likelihood for inference (Thomas)
• Phylogeny Demo (Thomas)
• Summary/Conclusion (Thomas)
• Questions

8.1 Introduction
Goal:
• Formulate probabilistic models for phylogeny
• Infer trees from sets of sequences

Aim of probability-based phylogeny: rank trees according to
- the likelihood P(data | tree)
- the posterior probability P(tree | data)


Compute the probability of a set of data given a tree:

$P(x^* \mid T, t^*)$

$x^*$: the set of $n$ sequences $x^j$ ($j = 1 \dots n$)
$T$: a tree with $n$ leaves, with sequence $j$ at leaf $j$
$t^*$: the edge lengths of the tree


Example


8.2 Probabilistic Models of Evolution

Given the sequences $x^1 \dots x^n$ at the leaves:
1. Pick a model of evolution: $P(x \mid y, t)$ and $P(x)$
2. Enumerate all possible tree topologies with $n$ leaves
3. For each $T$, maximize $P(x^* \mid T, t^*)$ over all possible edge lengths $t^*$
4. Pick the $T$ and $t^*$ that have the largest probability
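Step 2 is the main reason this exhaustive search is expensive. A minimal sketch of how quickly the number of candidate topologies grows, using the standard double-factorial counts for bifurcating trees with labelled leaves (a well-known combinatorial result, not stated on the slides); the function names are our own:

```python
def double_factorial(m):
    """m * (m - 2) * (m - 4) * ... down to 1 or 2; 1 for m <= 1."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result


def count_unrooted_topologies(n):
    """Unrooted bifurcating topologies with n labelled leaves: (2n - 5)!!."""
    if n < 3:
        raise ValueError("need at least 3 leaves")
    return double_factorial(2 * n - 5)


def count_rooted_topologies(n):
    """Rooted bifurcating topologies with n labelled leaves: (2n - 3)!!."""
    if n < 2:
        raise ValueError("need at least 2 leaves")
    return double_factorial(2 * n - 3)


for n in (4, 5, 10, 20):
    print(n, count_unrooted_topologies(n), count_rooted_topologies(n))
```

Already at 10 leaves there are about two million unrooted topologies, and each one needs its own optimisation over the edge lengths in step 3.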


Simplifying assumptions:
1. Single base substitutions only: ungapped alignments only
2. Each base (site) evolves independently, with the same model of evolution based on a substitution matrix


Substitution matrices for phylogeny
Many important families of substitution matrices are multiplicative: $S(t)S(s) = S(t+s)$
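A quick numerical check of multiplicativity, sketched with the Jukes-Cantor matrix in its standard closed form (the rate parameter `alpha` and the helper names are illustrative assumptions):

```python
import math


def jukes_cantor(t, alpha=1.0):
    """4x4 Jukes-Cantor matrix S(t); entry [a][b] is P(b | a, t)."""
    decay = math.exp(-4.0 * alpha * t)
    r = 0.25 * (1.0 + 3.0 * decay)  # probability of ending in the same base
    s = 0.25 * (1.0 - decay)        # probability of ending in one particular other base
    return [[r if a == b else s for b in range(4)] for a in range(4)]


def matmul(A, B):
    """Plain 4x4 matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def max_abs_diff(A, B):
    return max(abs(A[i][j] - B[i][j]) for i in range(4) for j in range(4))


t, s = 0.3, 0.7
lhs = matmul(jukes_cantor(t), jukes_cantor(s))  # S(t) S(s)
rhs = jukes_cantor(t + s)                       # S(t + s)
print(max_abs_diff(lhs, rhs))  # only floating-point rounding error remains
```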

Substitution matrices used in phylogeny:
• Jukes & Cantor model [1969]
• Kimura DNA model [1980]
• PAM matrix [1978]


Jukes-Cantor Model
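For reference, the standard closed form of the Jukes-Cantor matrix, assuming a single substitution rate α per unit time with which each base is replaced by each of the other three (this notation is our assumption):

```latex
S(t) =
\begin{pmatrix}
r_t & s_t & s_t & s_t \\
s_t & r_t & s_t & s_t \\
s_t & s_t & r_t & s_t \\
s_t & s_t & s_t & r_t
\end{pmatrix},
\qquad
r_t = \tfrac{1}{4}\left(1 + 3e^{-4\alpha t}\right),
\qquad
s_t = \tfrac{1}{4}\left(1 - e^{-4\alpha t}\right)
```

As $t \to \infty$ every entry tends to 1/4, the uniform equilibrium distribution.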


Kimura DNA model
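A sketch of the Kimura model, assuming its standard two-parameter form: transitions (A↔G, C↔T) occur at rate α and each of the two transversions at rate β. The base ordering, parameter names and function name are our choices:

```python
import math

# Base order A, C, G, T: A and G are purines, C and T are pyrimidines.
PURINES = {0, 2}


def kimura(t, alpha, beta):
    """4x4 Kimura two-parameter matrix S(t); entry [a][b] is P(b | a, t).

    alpha: rate of the transition (A<->G or C<->T) from each base
    beta:  rate of each of the two transversions from each base
    """
    e_beta = math.exp(-4.0 * beta * t)
    e_both = math.exp(-2.0 * (alpha + beta) * t)
    s = 0.25 * (1.0 - e_beta)                 # each transversion
    u = 0.25 * (1.0 + e_beta - 2.0 * e_both)  # the transition
    r = 1.0 - 2.0 * s - u                     # no change
    matrix = []
    for a in range(4):
        row = []
        for b in range(4):
            if a == b:
                row.append(r)
            elif (a in PURINES) == (b in PURINES):
                row.append(u)  # purine->purine or pyrimidine->pyrimidine
            else:
                row.append(s)
        matrix.append(row)
    return matrix


for row in kimura(0.3, alpha=2.0, beta=0.5):
    print(["%.4f" % p for p in row])
```

With α = β the transition and transversion entries coincide and the matrix reduces to a Jukes-Cantor matrix.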


PAM matrix model
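PAM matrices are built as powers of a one-step PAM1 transition matrix, so the family is multiplicative by construction: PAM(n) PAM(m) = PAM(n + m). A sketch with a hypothetical two-letter alphabet; the toy values below are illustrative, not real PAM data:

```python
# Hypothetical two-letter alphabet; these numbers are illustrative, not real PAM data.
PAM1_TOY = [[0.99, 0.01],
            [0.02, 0.98]]


def matmul(A, B):
    size = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def pam(n, base=PAM1_TOY):
    """PAM-n style matrix: the n-th power of the one-step matrix `base`."""
    size = len(base)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = matmul(result, base)
    return result


# Multiplicativity: PAM(2) PAM(3) equals PAM(5) up to rounding.
print(pam(5))
print(matmul(pam(2), pam(3)))
```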


8.3 Calculating the likelihood for ungapped alignments

Example: The likelihood of two nucleotide sequences
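A minimal sketch of this example, assuming the two sequences x1 and x2 descend from an unknown common ancestor along edges of lengths t1 and t2, under the Jukes-Cantor model with uniform base frequencies $q_a = 1/4$. Over the sites, it multiplies the sum over the ancestral base a of $q_a P(x^1_u \mid a, t_1) P(x^2_u \mid a, t_2)$. The function names are illustrative:

```python
import math

BASES = "ACGT"
Q = {b: 0.25 for b in BASES}  # Jukes-Cantor equilibrium frequencies q_a


def p_jc(b, a, t, alpha=1.0):
    """P(b | a, t) under the Jukes-Cantor model."""
    decay = math.exp(-4.0 * alpha * t)
    return 0.25 * (1.0 + 3.0 * decay) if a == b else 0.25 * (1.0 - decay)


def two_sequence_likelihood(x1, x2, t1, t2):
    """P(x1, x2 | T, t1, t2) for two ungapped, aligned sequences."""
    if len(x1) != len(x2):
        raise ValueError("ungapped alignment: sequences must have equal length")
    likelihood = 1.0
    for c1, c2 in zip(x1, x2):  # sites evolve independently, so multiply
        # sum over the unknown ancestral base a at this site
        likelihood *= sum(Q[a] * p_jc(c1, a, t1) * p_jc(c2, a, t2) for a in BASES)
    return likelihood


print(two_sequence_likelihood("ACGT", "ACGA", 0.1, 0.2))
```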


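Likelihood for general case

$P(x^1, \dots, x^{2n-1} \mid T, t^*) = q_{x^{2n-1}} \prod_{i=1}^{2n-2} P(x^i \mid x^{\alpha(i)}, t_i)$

where node α(i) is the ancestor of node i and node 2n-1 is the root; the likelihood of the leaf sequences is obtained by summing over all possible sequences at the internal nodes $n+1, \dots, 2n-1$.

A fixed set of edge lengths $t_1 \dots t_{2n-2}$ and a fixed topology T are required.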
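To make the general case concrete, a brute-force sketch for a single site on a hypothetical rooted tree with three leaves: the joint probability (root frequency times one transition probability per edge) is summed over every combination of bases at the two internal nodes. The tree, the edge lengths and the Jukes-Cantor model are assumptions chosen for illustration:

```python
import itertools
import math

BASES = "ACGT"


def p_jc(b, a, t, alpha=1.0):
    """P(b | a, t) under the Jukes-Cantor model."""
    decay = math.exp(-4.0 * alpha * t)
    return 0.25 * (1.0 + 3.0 * decay) if a == b else 0.25 * (1.0 - decay)


# Hypothetical rooted tree with n = 3 leaves (nodes 1-3). Node 4 joins leaves
# 1 and 2; node 5 = 2n - 1 is the root, joining node 4 and leaf 3.
PARENT = {1: 4, 2: 4, 3: 5, 4: 5}         # alpha(i) for i = 1 .. 2n - 2
EDGE = {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.15}  # t_i, the edge above node i
ROOT = 5


def joint_probability(chars):
    """q at the root times P(x^i | x^alpha(i), t_i) over all edges, one site."""
    prob = 0.25  # q_a = 1/4 for every base under Jukes-Cantor
    for i, parent in PARENT.items():
        prob *= p_jc(chars[i], chars[parent], EDGE[i])
    return prob


def site_likelihood_brute_force(leaf_chars):
    """Sum the joint probability over all bases at internal nodes 4 and 5."""
    total = 0.0
    for a4, a5 in itertools.product(BASES, repeat=2):
        chars = dict(leaf_chars)
        chars.update({4: a4, ROOT: a5})
        total += joint_probability(chars)
    return total


print(site_likelihood_brute_force({1: "A", 2: "A", 3: "G"}))
```

The sum has 4^(n-1) terms per site, one for each assignment of bases to the n-1 internal nodes, so this direct approach only works for very small trees.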


Felsenstein’s recursive algorithm
Define a table of probabilities $F_{k,a}$ for each site $u$, all tree nodes $k$ and all characters $a$:

$F_{k,a}$ = the probability of the leaf characters at site $u$ in the subtree below node $k$, assuming that the character at site $u$ of node $k$ is $a$


Felsenstein’s recursive algorithm
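A minimal recursive sketch of Felsenstein's algorithm for one site, using the notation $F_{k,a}$: a leaf gets 1 for its observed character and 0 otherwise, an internal node k with children i and j gets $F_{k,a} = \left(\sum_b P(b \mid a, t_i) F_{i,b}\right)\left(\sum_c P(c \mid a, t_j) F_{j,c}\right)$, and the site likelihood is $\sum_a q_a F_{\text{root},a}$. The tree, the edge lengths and the Jukes-Cantor model are hypothetical choices:

```python
import math

BASES = "ACGT"


def p_jc(b, a, t, alpha=1.0):
    """P(b | a, t) under the Jukes-Cantor model."""
    decay = math.exp(-4.0 * alpha * t)
    return 0.25 * (1.0 + 3.0 * decay) if a == b else 0.25 * (1.0 - decay)


# Hypothetical tree: node 4 has children 1 and 2, root 5 has children 4 and 3.
CHILDREN = {4: (1, 2), 5: (4, 3)}
EDGE = {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.15}  # length of the edge above each node
ROOT = 5


def felsenstein(k, leaf_chars):
    """Return {a: F_{k,a}} for one site: probability of the leaf characters
    below node k, given that node k carries base a."""
    if k not in CHILDREN:  # leaf: 1 for the observed base, 0 otherwise
        return {a: 1.0 if a == leaf_chars[k] else 0.0 for a in BASES}
    i, j = CHILDREN[k]
    F_i = felsenstein(i, leaf_chars)  # children first (post-order)
    F_j = felsenstein(j, leaf_chars)
    return {
        a: sum(p_jc(b, a, EDGE[i]) * F_i[b] for b in BASES)
        * sum(p_jc(c, a, EDGE[j]) * F_j[c] for c in BASES)
        for a in BASES
    }


def site_likelihood(leaf_chars):
    """Termination: sum over root bases, weighted by q_a = 1/4."""
    F_root = felsenstein(ROOT, leaf_chars)
    return sum(0.25 * F_root[a] for a in BASES)


print(site_likelihood({1: "A", 2: "A", 3: "G"}))
```

Each node's table costs a fixed number of operations (16 products per child for DNA), so the work grows linearly with the number of nodes instead of exponentially.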


Likelihood for general case

Overall algorithm:
• Enumerate each tree topology $T$
• Enumerate sets of edge lengths $t^*$ (using some multidimensional optimisation technique)
• Run Felsenstein’s recursive algorithm for each site $u$ and multiply the site likelihoods
• Return the best $T$ and $t^*$
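The overall loop is easiest to see in its smallest case, sketched here under stated assumptions: two sequences give a single topology, under a reversible model only the total edge length t = t1 + t2 matters, and a simple grid search stands in for the multidimensional optimiser. The Jukes-Cantor model, the example sequences and the function names are illustrative choices:

```python
import math


def log_likelihood_jc(x1, x2, t, alpha=1.0):
    """log P(x1, x2 | t) under Jukes-Cantor, where t = t1 + t2."""
    decay = math.exp(-4.0 * alpha * t)
    same = 0.25 * 0.25 * (1.0 + 3.0 * decay)  # q_a * P(a | a, t)
    diff = 0.25 * 0.25 * (1.0 - decay)        # q_a * P(b | a, t) for b != a
    n_diff = sum(c1 != c2 for c1, c2 in zip(x1, x2))
    n_same = len(x1) - n_diff
    return n_same * math.log(same) + n_diff * math.log(diff)


def best_edge_length(x1, x2, grid):
    """Stand-in for the optimiser: try every t in the grid, keep the best."""
    return max(grid, key=lambda t: log_likelihood_jc(x1, x2, t))


x1 = "ACGTACGTACGTACGTACGT"
x2 = "ACGTACGAACGTTCGTACGA"  # 3 of 20 sites differ
grid = [i * 0.001 for i in range(1, 3001)]  # t in (0, 3], step 0.001
t_hat = best_edge_length(x1, x2, grid)
print(t_hat)
```

For Jukes-Cantor this maximum is also available in closed form, $t = -\ln(1 - 4p/3)/(4\alpha)$ with p the fraction of differing sites, which gives a simple check on the grid search.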


Reversibility & independence of root position
The likelihood, and hence the score of the optimal tree, is independent of the root position if:
- the substitution matrix is multiplicative
- the substitution matrix is reversible

A substitution matrix is reversible if for all a, b and t:

$q_a P(b \mid a, t) = q_b P(a \mid b, t)$
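Why these two conditions are enough can be seen on a two-leaf tree: by reversibility $\sum_a q_a P(x \mid a, t_1) P(y \mid a, t_2) = \sum_a q_x P(a \mid x, t_1) P(y \mid a, t_2)$, and by multiplicativity this equals $q_x P(y \mid x, t_1 + t_2)$, whatever the split between t1 and t2. A numerical sketch that slides the root along the edge, using the Jukes-Cantor model as an illustrative choice (function names are our own):

```python
import math

BASES = "ACGT"
Q = {b: 0.25 for b in BASES}  # Jukes-Cantor equilibrium frequencies


def p_jc(b, a, t, alpha=1.0):
    """P(b | a, t) under the Jukes-Cantor model."""
    decay = math.exp(-4.0 * alpha * t)
    return 0.25 * (1.0 + 3.0 * decay) if a == b else 0.25 * (1.0 - decay)


def is_reversible(p, q, t):
    """Check q_a P(b | a, t) == q_b P(a | b, t) for all pairs of bases."""
    return all(math.isclose(q[a] * p(b, a, t), q[b] * p(a, b, t))
               for a in BASES for b in BASES)


def two_leaf_likelihood(x, y, t, f):
    """One site, leaves x and y joined by an edge of length t, root placed
    a fraction f of the way from x: sum_a q_a P(x | a, f t) P(y | a, (1 - f) t)."""
    return sum(Q[a] * p_jc(x, a, f * t) * p_jc(y, a, (1 - f) * t) for a in BASES)


print(is_reversible(p_jc, Q, 0.4))
for f in (0.1, 0.5, 0.9):
    print(f, two_leaf_likelihood("A", "G", 0.6, f))  # same value for every f
```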


Evolution Demo


8.4 Using the likelihood for inference

Maximum likelihood: one choice for the best tree is the tree that maximises the likelihood. Finding it is computationally demanding.


Sampling from the posterior distribution: we use Bayes’ rule to compute the posterior probability, i.e. the probability of a model given the data:

$P(T, t^* \mid x^*) = \dfrac{P(x^* \mid T, t^*)\,P(T, t^*)}{P(x^*)}$


Example

Model | Prior probability (%) | Data generated by the model
Model 1 | 10 | 100% A
Model 2 | 40 | 50% A, 50% B
Model 3 | 50 | 100% B

For observed data A, Bayes’ rule gives

$P(\text{Model 1} \mid A) = \dfrac{P(A \mid \text{Model 1})\,P(\text{Model 1})}{P(A)} = \dfrac{100\% \times 10}{30} \approx 33\%$

where $P(A) = 100\% \times 10 + 50\% \times 40 + 0\% \times 50 = 30$ (in %).
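The same calculation for every model and both possible observations, as a small sketch (the dictionaries simply transcribe the example table; priors stay in %, as in the table):

```python
# Transcription of the example table: priors in %, and P(data | model).
PRIOR = {"Model 1": 10, "Model 2": 40, "Model 3": 50}
LIKELIHOOD = {
    "Model 1": {"A": 1.0, "B": 0.0},
    "Model 2": {"A": 0.5, "B": 0.5},
    "Model 3": {"A": 0.0, "B": 1.0},
}


def posterior(data):
    """Bayes' rule: P(model | data) = P(data | model) P(model) / P(data)."""
    joint = {m: LIKELIHOOD[m][data] * PRIOR[m] for m in PRIOR}
    evidence = sum(joint.values())  # P(data), in the same % units as the prior
    return {m: joint[m] / evidence for m in joint}


print(posterior("A"))  # Model 1: 1/3, Model 2: 2/3, Model 3: 0
print(posterior("B"))  # Model 1: 0, Model 2: 2/7, Model 3: 5/7
```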


Metropolis algorithm
The Metropolis algorithm samples trees with probabilities given by their posterior distribution. It is a sampling procedure that generates a sequence of trees, each derived from the previous one.
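A minimal sketch of the Metropolis idea, applied to the three-model example instead of trees so that the target is easy to check: a new state is proposed symmetrically and accepted with probability min(1, posterior ratio). Only prior × likelihood is needed, because P(data) cancels in the ratio. The observed data is assumed to be A, and the seed, chain length and names are arbitrary choices:

```python
import random

# Unnormalised posterior, prior (%) * P(A | model), for observed data A.
TARGET = {"Model 1": 10 * 1.0, "Model 2": 40 * 0.5, "Model 3": 50 * 0.0}
MODELS = list(TARGET)


def metropolis(n_steps, start="Model 1", seed=0):
    """Return a chain of n_steps models, each generated from the previous one."""
    rng = random.Random(seed)
    current = start
    samples = []
    for _ in range(n_steps):
        # Symmetric proposal: one of the other models, uniformly at random.
        proposal = rng.choice([m for m in MODELS if m != current])
        ratio = TARGET[proposal] / TARGET[current]  # P(data) cancels here
        if rng.random() < ratio:  # accept with probability min(1, ratio)
            current = proposal
        samples.append(current)
    return samples


samples = metropolis(100_000)
for m in MODELS:
    print(m, samples.count(m) / len(samples))
```

The fraction of samples spent in Model 1 should approach the exact posterior of 1/3, and Model 3, which cannot produce A, is never visited.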

[Figure: a proposal distribution for the Metropolis algorithm. The tree is drawn against time from the root, with its nodes numbered 1 to 8 in order of traversal.]


Other phylogenetic uses of sampling
[Figure: sampled sequences such as AATC, AATT, TCAA and TTAA traced back through ancestral sequences such as AAAA.]

Other phylogenetic uses of sampling: inferring the history of populations

Probability density of a coalescence in time:

Probability of a coalescence between any pair = (number of pairs) × (probability that one particular pair coalesces)
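A sketch of the pairwise approximation, assuming a haploid population of constant size N in which each of k lineages picks its parent uniformly at random in the previous generation (this population model and the function names are our assumptions). The exact probability that at least one pair shares a parent is compared with (number of pairs) × 1/N:

```python
def exact_coalescence_probability(k, N):
    """P(at least two of k lineages share a parent) when each lineage picks
    one of N parents uniformly at random (assumed population model)."""
    p_all_distinct = 1.0
    for i in range(1, k):
        p_all_distinct *= 1.0 - i / N  # lineage i+1 avoids the i parents already taken
    return 1.0 - p_all_distinct


def pairwise_approximation(k, N):
    """(number of pairs) * (probability that one particular pair coalesces)."""
    return (k * (k - 1) / 2) * (1.0 / N)


for k in (2, 5, 10):
    print(k, exact_coalescence_probability(k, 10_000), pairwise_approximation(k, 10_000))
```

The approximation is an upper bound (it counts the rare cases with two or more coalescences in one generation several times) and becomes accurate when k is small relative to N.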


Inferring the history of populations
When n is large and p is close to 0, the binomial distribution with parameters n and p can be approximated by a Poisson distribution with mean n*p.

n*p = = and x = 1

The probability of a coalescence at the end of the period $t_k$

The total probability of the tree
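A sketch of how these pieces could fit together, assuming a standard coalescent setting: constant population size N, per-generation coalescence probability $p_k = k(k-1)/(2N)$ while k lineages remain, and $t_k$ generations spent with k lineages. By the Poisson approximation (n = $t_k$ generations, zero events), the probability of no coalescence during the period is about $e^{-p_k t_k}$; multiplying by $p_k$ for the coalescence at its end and taking the product over k gives the probability of the tree. All names are hypothetical:

```python
import math


def coalescence_rate(k, N):
    """Assumed per-generation coalescence probability with k lineages."""
    return k * (k - 1) / (2 * N)


def no_coalescence_binomial(t, p):
    """Exact probability of zero events in t independent generations."""
    return (1 - p) ** t


def no_coalescence_poisson(t, p):
    """Poisson approximation with mean n*p = t*p, zero events."""
    return math.exp(-t * p)


def tree_probability(waiting_times, N):
    """waiting_times[k] = t_k, generations spent with k lineages (k = n .. 2).
    Each period: no coalescence for t_k generations, then one at its end."""
    prob = 1.0
    for k, t_k in waiting_times.items():
        p_k = coalescence_rate(k, N)
        prob *= no_coalescence_poisson(t_k, p_k) * p_k
    return prob


print(no_coalescence_binomial(100, 0.003), no_coalescence_poisson(100, 0.003))
print(tree_probability({3: 100, 2: 500}, N=1000))
```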


The bootstrap
The bootstrap can give an approximation to the posterior. However, it requires too much labour, which makes it an unattractive alternative to sampling. The bootstrap is probably more useful for non-probabilistic tree-building methods.
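A minimal sketch of the resampling step behind the bootstrap: alignment columns are drawn with replacement to build each replicate. In a full analysis every replicate would then be passed to a tree-building method and the frequency of each clade recorded, which is where the labour comes from. The toy alignment, seed and function name are illustrative:

```python
import random


def bootstrap_alignment(sequences, rng):
    """One bootstrap replicate: draw alignment columns with replacement."""
    length = len(sequences[0])
    columns = [rng.randrange(length) for _ in range(length)]
    return ["".join(seq[c] for c in columns) for seq in sequences]


alignment = ["AATC", "AATT", "TCAA", "TTAA"]  # toy alignment, illustrative only
rng = random.Random(1)
replicates = [bootstrap_alignment(alignment, rng) for _ in range(100)]
print(replicates[0])
```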


Phylogeny Demo


Conclusion
• The methods presented today can be used to find the most probable tree.
• Most of the methods are computationally demanding.
• More realistic evolutionary models will be explained on Thursday.


Questions?
