
Molecular Systematics
• Maximum likelihood (ML) approaches are time consuming
• Bayesian approaches are conceptually similar but typically more rapid
– ML attempts to find the tree that maximizes the probability of the data given the tree and a model
– Bayesian analysis attempts to find the tree that maximizes the probability of the tree given the data and a model
• Bayesian analysis was impractical until recently; advances in computational methods (MCMC) and computing speed made it feasible
• It is based on Bayes’ Theorem, which tells us how to update or revise beliefs (a posteriori) in light of new evidence


Molecular Systematics
• A simple example
– Imagine a box of 100 dice
– 90% are true, 10% are biased
– You pick a die at random and are asked to determine whether it is true or biased
– With no other information, you must conclude that the probability of it being biased is 0.1
– What if you had additional information?


Molecular Systematics
– Roll the die twice
– P[results | true die] = (1/6)^2 = 1/36 = 0.0278
– P[results | biased die] = 4/21 x 6/21 = 0.0544
– The observed results are more probable under a biased die than under a true die
– Bayes’ Theorem:

$$P[\text{biased die} \mid \text{results}] = \frac{P[\text{results} \mid \text{biased die}] \times P[\text{biased die}]}{P[\text{results} \mid \text{biased die}] \times P[\text{biased die}] + P[\text{results} \mid \text{true die}] \times P[\text{true die}]}$$

$$P[\text{biased die} \mid \text{results}] = \frac{0.0544 \times 0.1}{0.0544 \times 0.1 + 0.0278 \times 0.9} = 0.18$$

– P[biased die | results] = 0.18, an increase from the original 0.1
– P[true die | results] = 0.82, a decrease from the original 0.9
– These are the posterior probabilities that the die you chose is biased or true
– You have more information and are able to make a more informed decision
– In Bayesian phylogenetics we replace the dice with trees and attempt to maximize the posterior probability of our final tree given random perturbations to a starting tree

$$P[\text{tree} \mid \text{data}] = \frac{P[\text{data} \mid \text{tree}] \times P[\text{tree}]}{P[\text{data}]}$$
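The dice calculation can be checked with a few lines of Python. This is only a sketch: the priors and likelihoods are the values given on the slide, and the names `prior`, `likelihood`, and `posterior` are illustrative choices, not part of any phylogenetics package.

```python
from fractions import Fraction

# Priors from the box of dice: 90% true, 10% biased.
prior = {"true": Fraction(9, 10), "biased": Fraction(1, 10)}

# Likelihood of the two observed rolls under each hypothesis (slide values).
likelihood = {
    "true": Fraction(1, 6) ** 2,                  # 1/36
    "biased": Fraction(4, 21) * Fraction(6, 21),  # 24/441
}

def posterior(prior, likelihood):
    """Return P[hypothesis | results] for each hypothesis via Bayes' Theorem."""
    evidence = sum(likelihood[h] * prior[h] for h in prior)  # P[results]
    return {h: likelihood[h] * prior[h] / evidence for h in prior}

post = posterior(prior, likelihood)
# Rounded to two decimals: biased = 0.18, true = 0.82
print({h: round(float(p), 2) for h, p in post.items()})
```

The denominator is the same for both hypotheses, which is why the two posteriors sum to 1.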


Molecular Systematics
• The development that made Bayesian phylogenetic estimation possible is the Markov chain Monte Carlo (MCMC) method
• MCMC works by taking a series of steps that form a conceptual chain
• At each step, a new location in parameter space is proposed via random perturbation (usually a very small change)
• The relative posterior-probability density of the new location is calculated
– If the new location has a higher posterior-probability density than the present location of the chain, the move is accepted: the proposed location becomes the next link in the chain and the cycle is repeated
– If the proposed location has a lower posterior-probability density, the move is accepted only a proportion (p) of the time; in the basic Metropolis algorithm, p is the ratio of the proposed density to the present density (small steps downward are accepted often, whereas big leaps down are discouraged)


Molecular Systematics
• If the proposed location is rejected, the present location is added as the next link in the chain
• By repeating this procedure millions of times, a long chain of locations in parameter space is created
• The proportion of the time that any tree (location) is visited along the course of the chain is an approximation of its posterior probability
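The accept/reject cycle described above can be sketched as a minimal Metropolis sampler. This is a toy under stated assumptions: three hypothetical "trees" with made-up unnormalized posterior densities stand in for tree space, and the proposal simply picks a tree uniformly at random (a symmetric proposal). Real phylogenetic software proposes small changes to topology, branch lengths, and model parameters instead.

```python
import random
from collections import Counter

def metropolis(unnorm_post, n_steps, seed=0):
    """Run a Metropolis chain over discrete states; return visit frequencies."""
    rng = random.Random(seed)
    states = list(unnorm_post)
    current = rng.choice(states)
    visits = Counter()
    for _ in range(n_steps):
        proposed = rng.choice(states)  # symmetric proposal (toy assumption)
        ratio = unnorm_post[proposed] / unnorm_post[current]
        # Uphill moves are always accepted; downhill moves with probability `ratio`.
        if ratio >= 1 or rng.random() < ratio:
            current = proposed
        # A rejected proposal leaves `current` unchanged, and it is counted again.
        visits[current] += 1
    return {s: visits[s] / n_steps for s in states}

# Hypothetical unnormalized posterior densities; normalized they are 0.6, 0.3, 0.1.
toy = {"tree_A": 6.0, "tree_B": 3.0, "tree_C": 1.0}
freqs = metropolis(toy, 100_000)
print(freqs)
```

Note that the sampler never needs the normalizing constant P[data]: only ratios of densities enter the acceptance step, yet the visit frequencies approach the normalized posterior probabilities.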


Molecular Systematics
• This method suffers from the same local optimum problem as most other hill-climbing methods
• Bayesian analyses mitigate this by running several chains simultaneously, usually 4
• These four chains are coupled rather than fully independent: they occasionally swap states in an effort to avoid getting trapped on less-than-optimal hills


Molecular Systematics
• A MrBayes analysis, a Metropolis-coupled MCMC (MCMCMC, or MC³):
– Begins by proposing eight random trees (two independent sets of four chains each)
– Within each set, all four chains randomly perturb their trees and recalculate the posterior probabilities
– One chain in each set is considered ‘cold’. This is the chain from which trees are actually sampled to estimate posterior probabilities


Molecular Systematics
– The other three chains are ‘hot’: they explore the same tree space, but over a flattened version of the posterior surface
– The difference is in the magnitude of the peak heights: peaks are lower and valleys are shallower on the heated chains
– Because ‘drops’ are not as large on the heated chains, they are freer to explore the tree space and less likely to become trapped
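The effect of heating can be illustrated numerically. The sketch below assumes the common incremental heating scheme in which chain i raises the posterior density to the power 1 / (1 + temp x i), with chain 0 as the cold chain; the temperature value 0.1 and the function names `heat`, `accept_prob`, and `swap_prob` are assumptions for illustration, not MrBayes code.

```python
def heat(chain_index, temp=0.1):
    """Power applied to the posterior density on chain i (chain 0 is cold)."""
    return 1.0 / (1.0 + temp * chain_index)

def accept_prob(current_density, proposed_density, beta):
    """Metropolis acceptance probability on a chain heated with power beta."""
    return min(1.0, (proposed_density / current_density) ** beta)

def swap_prob(dens_i, dens_j, beta_i, beta_j):
    """Probability of swapping the states of chains i and j (Metropolis coupling)."""
    return min(1.0, (dens_j / dens_i) ** (beta_i - beta_j))

# Acceptance probability for the same 1000-fold drop in density on each chain.
drop = [accept_prob(1.0, 0.001, heat(i)) for i in range(4)]
print(drop)  # the probability rises from the cold chain to the hottest chain

# A hot chain sitting on a higher peak than the cold chain always swaps with it.
print(swap_prob(1.0, 2.0, heat(0), heat(3)))
```

The same drop is easier to accept on hotter chains, which is what lets them cross valleys; the swap step then hands a good state found by a hot chain to the cold chain.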


Molecular Systematics
– All four chains continue, with occasional switching between them to avoid getting caught on particular hills
– Eventually each set of runs will begin to plateau and run out of changes (even chain switches) that can improve the tree (convergence)
– How do we know when this has been reached?
– That’s where the second set of chains comes in
– Each set should converge on approximately the same tree
– The average standard deviation of split frequencies measures how similar the trees sampled by the two sets of chains are; values approaching zero indicate convergence
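The convergence diagnostic can be sketched as follows. This is an illustration under several assumptions: each sampled tree is represented simply as a set of split labels, the toy samples are invented, splits below a minimum frequency in every run are ignored, and the sample standard deviation is used; the exact conventions of any particular program may differ.

```python
from collections import Counter
from statistics import mean, stdev

def split_frequencies(trees):
    """Fraction of sampled trees in which each split (clade) appears."""
    counts = Counter(split for tree in trees for split in tree)
    return {s: c / len(trees) for s, c in counts.items()}

def asdsf(runs, min_freq=0.1):
    """Average, over splits, of the std. dev. of split frequencies across runs."""
    freqs = [split_frequencies(run) for run in runs]
    # Keep only splits that reach min_freq in at least one run.
    splits = {s for f in freqs for s, v in f.items() if v >= min_freq}
    return mean([stdev([f.get(s, 0.0) for f in freqs]) for s in splits])

# Hypothetical post-burn-in samples from two independent sets of chains.
run1 = [{"AB", "ABC"}] * 8 + [{"AC", "ABC"}] * 2
run2 = [{"AB", "ABC"}] * 7 + [{"AC", "ABC"}] * 3

value = asdsf([run1, run2])
print(round(value, 4))  # small but non-zero: the runs agree only approximately
```

Two runs that sample every split at identical frequencies give a value of exactly zero, which is why a value shrinking toward zero is read as evidence that both sets have found the same region of tree space.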


Molecular Systematics
– Once convergence has occurred, we need to generate a consensus tree
– It is important that we don’t include any of the initial (essentially random) trees, only the ones that were obtained after the analysis reached a plateau
– The burn-in is the set of early sampled trees that we discard in favor of the later (likely more accurate) trees
– The consensus tree is generated from the trees sampled after the burn-in
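Discarding the burn-in and summarizing the remaining trees can be sketched like this. The 25% burn-in fraction, the majority-rule threshold of 0.5, the set-of-splits tree representation, and the toy samples are all assumptions made for illustration.

```python
from collections import Counter

def post_burnin(samples, burnin_fraction=0.25):
    """Drop the first burnin_fraction of the sampled trees."""
    start = int(len(samples) * burnin_fraction)
    return samples[start:]

def majority_rule_splits(trees, threshold=0.5):
    """Splits found in more than `threshold` of the trees, with their frequency."""
    counts = Counter(split for tree in trees for split in tree)
    return {s: c / len(trees) for s, c in counts.items() if c / len(trees) > threshold}

# Hypothetical chain: 5 early, essentially random trees, then 15 after the plateau.
samples = [{"AC"}] * 5 + [{"AB"}] * 15

kept = post_burnin(samples)
print(majority_rule_splits(samples))  # support for AB diluted by the early trees
print(majority_rule_splits(kept))     # support estimated from post-burn-in trees only
```

The split frequencies that survive this step are the posterior probabilities usually reported as clade support on the consensus tree.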
