Week 7: Bayesian inference, Testing trees, Bootstraps
Genome 570
February, 2012
Bayes’ Theorem
Conditional probability of hypothesis given data is:

$$\mathrm{Prob}(H \mid D) = \frac{\mathrm{Prob}(H \,\&\, D)}{\mathrm{Prob}(D)}$$

Since Prob(H & D) = Prob(H) Prob(D | H), substituting this in:

$$\mathrm{Prob}(H \mid D) = \frac{\mathrm{Prob}(H)\,\mathrm{Prob}(D \mid H)}{\mathrm{Prob}(D)}$$

The denominator Prob(D) is the sum of the numerators over all possible hypotheses H:

$$\mathrm{Prob}(H \mid D) = \frac{\mathrm{Prob}(H)\,\mathrm{Prob}(D \mid H)}{\sum_H \mathrm{Prob}(H)\,\mathrm{Prob}(D \mid H)}$$
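A minimal numeric sketch of this normalization, with three made-up hypotheses and likelihoods (the numbers are invented for illustration):

```python
# Bayes' Theorem over a discrete set of hypotheses.
priors = {"H1": 0.5, "H2": 0.3, "H3": 0.2}
likelihoods = {"H1": 0.10, "H2": 0.40, "H3": 0.25}   # Prob(D | H)

numerators = {h: priors[h] * likelihoods[h] for h in priors}
prob_D = sum(numerators.values())                    # Prob(D), the denominator
posterior = {h: numerators[h] / prob_D for h in priors}

print(posterior)   # the posteriors sum to 1
```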
A visual example of Bayes’ Theorem
[Figure: bars for the prior probabilities P(H1), P(H2), P(H3) are each scaled by the corresponding likelihood P(D | H1), P(D | H2), P(D | H3); the posterior probability, e.g. P(H2 | D), is that term divided by the sum of all three.]
A dramatic, if not real, example
Example: “Space probe photos show no Little Green Men on Mars!”
[Figure: prior, likelihood, and posterior bars for the hypotheses "Little Green Men: yes" and "no", shown both for an optimist's prior and for a pessimist's prior.]
Calculations for that example
Using the Odds Ratio form of Bayes' Theorem:

$$\underbrace{\frac{\mathrm{Prob}(H_1 \mid D)}{\mathrm{Prob}(H_2 \mid D)}}_{\text{posterior odds ratio}} \;=\; \underbrace{\frac{\mathrm{Prob}(D \mid H_1)}{\mathrm{Prob}(D \mid H_2)}}_{\text{likelihood ratio}} \;\times\; \underbrace{\frac{\mathrm{Prob}(H_1)}{\mathrm{Prob}(H_2)}}_{\text{prior odds ratio}}$$

For the odds favoring their existence, the calculation is, for the optimist about Little Green Men (prior odds 4 : 1):

$$\frac{4}{1} \times \frac{1/3}{1} \;=\; \frac{4/3}{1} \;=\; 4 : 3$$

While for the pessimist (prior odds 1 : 4) it is

$$\frac{1}{4} \times \frac{1/3}{1} \;=\; \frac{1/12}{1} \;=\; 1 : 12$$
With repeated observation the prior matters less
If we send 5 space probes, and all fail to see LGMs, then since the probability of this observation is (1/3)^5 if there are LGMs, and 1 if there aren't, we get for the optimist about Little Green Men:

$$\frac{4}{1} \times \frac{(1/3)^5}{1} \;=\; \frac{4/243}{1} \;=\; 4 : 243$$

while for the pessimist about Little Green Men:

$$\frac{1}{4} \times \frac{(1/3)^5}{1} \;=\; \frac{1/972}{1} \;=\; 1 : 972$$
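Both slides' arithmetic can be checked with a small sketch of the odds-update rule:

```python
from fractions import Fraction

def posterior_odds(prior_odds, likelihood_ratio, n_observations=1):
    """Odds-ratio form of Bayes' Theorem, for repeated independent observations."""
    return prior_odds * likelihood_ratio ** n_observations

# Likelihood ratio for "a probe sees no LGMs": (1/3 if they exist) / (1 if not).
lr = Fraction(1, 3)
print(posterior_odds(Fraction(4, 1), lr))      # optimist, one probe: 4/3, i.e. 4 : 3
print(posterior_odds(Fraction(1, 4), lr))      # pessimist, one probe: 1/12
print(posterior_odds(Fraction(4, 1), lr, 5))   # optimist, five probes: 4/243
print(posterior_odds(Fraction(1, 4), lr, 5))   # pessimist, five probes: 1/972
```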
A coin tossing example
[Figure: the prior density on p (flat on 0 ≤ p ≤ 1), the likelihood function, and the resulting posterior density, shown side by side for 11 tosses with 5 heads and for 44 tosses with 20 heads.]
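A sketch of the corresponding computation: with a flat prior, the posterior for p after h heads in n tosses is the Beta(h + 1, n − h + 1) distribution (this Beta form is a standard fact, not stated on the slide):

```python
from scipy.stats import beta

# Posterior for a coin's heads probability p under a flat (Beta(1, 1)) prior.
for n, heads in [(11, 5), (44, 20)]:
    post = beta(heads + 1, n - heads + 1)
    lo, hi = post.interval(0.95)
    print(f"{n} tosses, {heads} heads: posterior mean {post.mean():.3f}, "
          f"95% interval ({lo:.3f}, {hi:.3f})")   # the interval narrows with more data
```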
Markov Chain Monte Carlo sampling
To draw trees from a distribution whose probabilities are proportional to f(t), we can use the Metropolis algorithm:

1. Start at some tree. Call this Ti.
2. Pick a tree that is a neighbor of this tree in the graph of trees. Call this the proposal Tj.
3. Compute the ratio of the probabilities (or probability density functions) of the proposed new tree and the old tree: R = f(Tj) / f(Ti)
4. If R ≥ 1, accept the new tree as the current tree.
5. If R < 1, draw a uniform random number (a random fraction between 0 and 1). If it is less than R, accept the new tree as the current tree.
6. Otherwise reject the new tree and continue with tree Ti as the current tree.
7. Return to step 2.
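A generic sketch of steps 1-7, with a symmetric `neighbors` function standing in for the graph of trees (here a toy ring of 11 states, not trees):

```python
import random

def metropolis(f, neighbors, state, n_steps):
    """Sample states with probability proportional to f (steps 1-7 above).
    `neighbors(state)` must propose symmetrically for plain Metropolis."""
    samples = []
    for _ in range(n_steps):
        proposal = random.choice(neighbors(state))     # step 2
        R = f(proposal) / f(state)                     # step 3
        if R >= 1 or random.random() < R:              # steps 4 and 5
            state = proposal                           # accept
        samples.append(state)                          # step 6: else keep state
    return samples

# Toy usage: states 0..10 on a ring, target probabilities proportional to i + 1.
chain = metropolis(lambda i: i + 1,
                   lambda i: [(i - 1) % 11, (i + 1) % 11],
                   state=5, n_steps=100_000)
# The long-run frequency of state i approaches (i + 1) / 66.
```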
Reversibility (again)
[Figure: two states i and j, with equilibrium probabilities πi and πj and transition probabilities Pij and Pji; reversibility is the condition πi Pij = πj Pji.]
Does it achieve the desired equilibrium distribution?
If f(Ti) > f(Tj), then Prob(Ti | Tj) = 1 and Prob(Tj | Ti) = f(Tj)/f(Ti), so that

$$\frac{\mathrm{Prob}(T_j \mid T_i)}{\mathrm{Prob}(T_i \mid T_j)} = \frac{f(T_j)}{f(T_i)}$$

(the same formula can be shown to hold when f(Ti) < f(Tj)). Then in both cases

$$f(T_i)\,\mathrm{Prob}(T_j \mid T_i) \;=\; \mathrm{Prob}(T_i \mid T_j)\,f(T_j)$$

Summing over all Ti, the right-hand side sums up to f(Tj), so

$$\sum_{T_i} f(T_i)\,\mathrm{Prob}(T_j \mid T_i) \;=\; f(T_j)$$

This shows that, applying this algorithm, if we start in the desired distribution we stay in it. It is also true under most conditions that if you start in any other distribution, you converge to this one.
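This stationarity can be checked numerically on a toy chain (a three-state target with a symmetric proposal; the numbers are arbitrary):

```python
import numpy as np

f = np.array([1.0, 2.0, 3.0])        # unnormalized target
n = len(f)
P = np.zeros((n, n))                 # Metropolis transition matrix
for i in range(n):
    for j in range(n):
        if i != j:
            # propose j with probability 1/(n-1), accept with min(1, f_j/f_i)
            P[i, j] = (1.0 / (n - 1)) * min(1.0, f[j] / f[i])
    P[i, i] = 1.0 - P[i].sum()       # rejected proposals stay put

pi = f / f.sum()
print(np.allclose(pi @ P, pi))       # True: pi is preserved, as the sum shows
```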
Bayesian MCMC
We try to achieve the posterior

$$\mathrm{Prob}(T \mid D) \;=\; \mathrm{Prob}(T)\,\mathrm{Prob}(D \mid T)\,/\,(\text{denominator})$$

and this turns out to need the acceptance ratio

$$R \;=\; \frac{\mathrm{Prob}(T_{\text{new}})\,\mathrm{Prob}(D \mid T_{\text{new}})}{\mathrm{Prob}(T_{\text{old}})\,\mathrm{Prob}(D \mid T_{\text{old}})}$$

(The denominators are the same and cancel out. This is a great convenience, as we often cannot evaluate the denominator, but we can usually evaluate the numerators.)

Note that we could also have a prior on model parameters too, and as we move through tree space we could also be moving through parameter space.
Mau and Newton’s proposal mechanism
[Figure: six redrawings of a clocklike tree on species A through F, illustrating Mau and Newton's proposal mechanism.]
Using MrBayes on the primates data
[Figure: majority-rule consensus tree from MrBayes for the 14-species primate mitochondrial DNA data (Bovine, Mouse, Lemur, Tarsier, Squirrel Monkey, the four macaques, Gibbon, Orang, Gorilla, Chimp, Human), with posterior clade probabilities from 0.416 to 1.00 on the branches.]

Frequencies of partitions (posterior clade probabilities).
Issues to think about with Bayesian inference
Where do you get your prior from?

Are you assuming each branch has a length drawn independently from a distribution? How wide a distribution?

Or is the tree drawn from a birth-death process? If so, what are the rates of birth and death? Or is there also a stage where the species studied are selected from all extant species of the group? How do you model that? Are you modelling biologists' decision-making processes?

Is your prior the same as your reader's prior? If not, what do you do? Use several priors? Just give the reader the likelihood surface and let them provide their own prior?
An example
Suppose we have two species with a Jukes-Cantor model, so that the estimation of the (unrooted) tree is simply the estimation of the branch length between the two species.

We can express the result either as branch length t, or as the net probability of base change

$$p \;=\; \frac{3}{4}\left(1 - e^{-\frac{4}{3}t}\right)$$
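A small sketch of the two parameterizations, with the inverse transform (the inverse and the Jacobian below are derived from the formula, not stated on the slide):

```python
import math

def p_from_t(t):
    """Net probability of base change for branch length t (Jukes-Cantor)."""
    return 0.75 * (1.0 - math.exp(-4.0 * t / 3.0))

def t_from_p(p):
    """Inverse: branch length from the net probability of change (p < 3/4)."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# A flat prior on p corresponds to a density on t proportional to the
# Jacobian dp/dt = exp(-4 t / 3), which is why the next slides' two
# "flat" priors disagree.
print(p_from_t(0.383112))   # ~0.3
print(t_from_p(0.3))        # ~0.383112
```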
A flat prior on p
[Figure: a flat prior density on p, on the interval from 0 to 3/4.]
The corresponding prior on t
[Figure: the corresponding prior density on t, over t from 0 to 5; it is proportional to exp(−4t/3) and so decreases with t.]

So which is the "flat prior"?
Flat prior for t between 0 and 5
[Figure: a flat prior density on t, on the interval from 0 to 5.]
The corresponding prior on p
[Figure: the corresponding prior density on p, which rises steeply as p approaches 3/4.]
The invariance of the ML estimator to scale change
[Figure: likelihood curve for t when 3 sites differ out of 10 (ln L from −14 to −6, t from 0 to 5); the peak is at t̂ = 0.383112, corresponding to p̂ = 0.3.]
The invariance of the ML estimator to scale change
[Figure: likelihood curve for p when 3 sites differ out of 10 (ln L from −14 to −6, p from 0 to 0.8); the peak is at p̂ = 0.3, corresponding to t̂ = 0.383112.]
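A short check of this invariance for the binomial likelihood sketched above (3 differing sites out of 10):

```python
import math
from scipy.optimize import minimize_scalar

def negloglik_p(p, k=3, n=10):
    """Binomial negative log-likelihood for p when k of n sites differ."""
    return -(k * math.log(p) + (n - k) * math.log(1.0 - p))

res = minimize_scalar(negloglik_p, bounds=(1e-9, 0.75 - 1e-9), method="bounded")
p_hat = res.x                                        # ~0.3
t_hat = -0.75 * math.log(1.0 - 4.0 * p_hat / 3.0)    # ~0.383112
print(p_hat, t_hat)   # maximizing over p or over t gives the same peak
```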
When t has a wide flat prior
[Figure: the 95% two-tailed credible interval for t, and the MLE, as the truncation point T of a flat prior on (0, T) is varied from 1 to 1000 (log scale); as T grows, the interval moves away from the MLE.]
What is going on in that case is ...
[Figure: the posterior of t under a flat prior truncated at T; T is so large that the region near the likelihood peak contains less than 2.5% of the area.]
The likelihood curve is nearly a normal distribution for large amounts of data

[Figure: likelihood curves for θ estimated from t(x), the "sufficient statistic", with the value for our data set marked.]

If we have large amounts of data, the values of parameters we need to try are all very similar, and the shape of the distribution (which is nearly normal) will not be too different for these values.
Curvatures and covariances of ML estimates
ML estimates have covariances computable from curvatures of the expected log-likelihood:

$$\mathrm{Var}\left[\hat\theta\right] \;\simeq\; -1 \Big/ \left(\frac{d^2\, E[\ln L]}{d\theta^2}\right)$$

The same is true when there are multiple parameters:

$$\mathrm{Var}\left[\hat{\boldsymbol\theta}\right] \;\simeq\; V \;\simeq\; -\,C^{-1}$$

where

$$C_{ij} \;=\; E\left(\frac{\partial^2 \ln L}{\partial\theta_i\,\partial\theta_j}\right)$$
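A numeric sketch for the binomial example above, with a finite-difference second derivative standing in for the curvature:

```python
import math

k, n = 3, 10                    # 3 of 10 sites differ
p_hat = k / n

def loglik(p):
    return k * math.log(p) + (n - k) * math.log(1.0 - p)

h = 1e-5                        # central second difference approximates d2 lnL / dp2
curv = (loglik(p_hat + h) - 2 * loglik(p_hat) + loglik(p_hat - h)) / h**2
print(-1.0 / curv)              # ~0.021
print(p_hat * (1 - p_hat) / n)  # the analytic value p(1-p)/n, for comparison
```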
With large amounts of data, asymptotically
When the true value of θ is θ0,

$$\frac{\hat\theta - \theta_0}{\sqrt{v}} \;\sim\; N(0, 1)$$

Since 1/v is the negative of the curvature of the log-likelihood:

$$\ln L(\theta_0) \;=\; \ln L(\hat\theta) \;-\; \frac{1}{2}\,\frac{(\theta_0 - \hat\theta)^2}{v}$$

so that twice the difference of log-likelihoods is the square of a normal:

$$2\left(\ln L(\hat\theta) - \ln L(\theta_0)\right) \;\sim\; \chi^2_1$$
Corresponding results for multiple parameters
$$\ln L(\theta_0) \;\simeq\; \ln L(\hat\theta) \;-\; \frac{1}{2}(\theta_0 - \hat\theta)^T\, C\, (\theta_0 - \hat\theta)$$

$$(\hat\theta - \theta_0)^T\, C\, (\hat\theta - \theta_0) \;\sim\; \chi^2_p$$

so that the log-likelihood difference is:

$$2\left(\ln L(\hat\theta) - \ln L(\theta_0)\right) \;\sim\; \chi^2_p$$

When q of the p parameters are constrained:

$$2\left(\ln L(\hat\theta) - \ln L(\theta_0)\right) \;\sim\; \chi^2_q$$
Likelihood ratio interval for a parameter
[Figure: ln L (from −2640 to −2620) plotted against the transition/transversion ratio, on a log scale from 5 to 200.]

Inferring the transition/transversion ratio for an F84 model with the 14-species primate mitochondria data set.
LRT of a molecular clock – how many parameters?
[Figure: a five-species tree for A, B, C, D, E with branch lengths v1 through v8.]

Constraints for a clock:

$$v_1 = v_2 \qquad v_4 = v_5 \qquad v_3 + v_7 = v_4 + v_8 \qquad v_1 + v_6 = v_3$$

How does each equation constrain the branch lengths in the unrooted tree? What about the red equation?
Likelihood Ratio Test for a molecular clock
Using the 7-species mitochondrial DNA data set (the great apes plus Bovine and Mouse), we get with Ts/Tn = 30 and an F84 model:

Tree         ln L
No clock     −1372.77620
Clock        −1414.45053
Difference      41.67473

Chi-square statistic: 2 × 41.675 = 83.35, with n − 2 = 5 degrees of freedom: highly significant.
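The significance computation, as a short check with scipy:

```python
from scipy.stats import chi2

lnL_noclock, lnL_clock = -1372.77620, -1414.45053
stat = 2.0 * (lnL_noclock - lnL_clock)   # 83.35
df = 7 - 2                               # n - 2 degrees of freedom for n = 7 species
print(stat, chi2.sf(stat, df))           # p is about 2e-16: highly significant
```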
Model selection using the LRT
[Figure: a lattice of models (Jukes-Cantor; K2P with T = 2; K2P with T estimated; F81; F84 with T = 2; F84 with T estimated), arranged by their numbers of parameters, 25 through 29, with lines showing the possible likelihood ratio tests between nested models.]

The problem with using likelihood ratio tests is the multiplicity of tests and the multiple routes to the same hypotheses.
The Akaike Information Criterion
Compare between hypotheses using −2 ln L + 2p (the same as reducing the log-likelihood by the number of parameters):

Model             ln L           Number of parameters   AIC
Jukes-Cantor      −3068.29186    25                     6186.58
K2P, R = 2.0      −2953.15830    25                     5956.32
K2P, R̂ = 1.889    −2952.94264    26                     5957.89
F81               −2935.25430    28                     5926.51
F84, R = 2.0      −2680.32982    28                     5416.66
F84, R̂ = 28.95    −2616.39810    29                     5290.80
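The same comparison as a short computation:

```python
# AIC = -2 ln L + 2 p for each model in the table.
models = {
    "Jukes-Cantor":   (-3068.29186, 25),
    "K2P, R = 2.0":   (-2953.15830, 25),
    "K2P, R = 1.889": (-2952.94264, 26),
    "F81":            (-2935.25430, 28),
    "F84, R = 2.0":   (-2680.32982, 28),
    "F84, R = 28.95": (-2616.39810, 29),
}
for name, (lnL, p) in models.items():
    print(f"{name:15s} AIC = {-2 * lnL + 2 * p:8.2f}")
# F84 with R estimated has the smallest AIC (5290.80) and is preferred.
```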
Can we test trees using the LRT?
If so, how many degrees of freedom for the comparison of the two peaks?

[Figure: profile log-likelihood curves (ln L from −200 to −192) for the two alternative three-species clocklike trees, each plotted against its interior branch length t.]

These are three-species clocklike trees (shown here plotted in a "profile log-likelihood plot", plotting the highest likelihood for each value of the interior branch length).
The bootstrap

[Figure: the (unknown) true value of θ and the (unknown) true distribution; a sample of 150 data points gives an estimate of θ and the empirical distribution of the sample; bootstrap replicates (each 150 draws from the empirical distribution) give the distribution of estimates of parameters.]

An example with mixed normal distributions. Draw from the empirical distribution 150 times if there are 150 data points. With replacement!
The bootstrap for phylogenies
[Figure: the original data matrix (sequences × sites) yields an estimate of the tree; bootstrap samples #1, #2, and so on are formed by sampling the same number of sites, with replacement, and each yields its own bootstrap estimate of the tree.]

Drawing columns of the data matrix, with replacement.
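A generic sketch of the column resampling, with a hypothetical `estimate_tree` function standing in for the tree-inference step:

```python
import random

def bootstrap_trees(alignment, estimate_tree, n_replicates=100):
    """Bootstrap an alignment (a list of equal-length sequence strings).
    `estimate_tree` is a hypothetical function from an alignment to a tree."""
    n_sites = len(alignment[0])
    replicates = []
    for _ in range(n_replicates):
        # Sample site indices with replacement: columns of the data matrix.
        cols = [random.randrange(n_sites) for _ in range(n_sites)]
        resampled = ["".join(seq[c] for c in cols) for seq in alignment]
        replicates.append(estimate_tree(resampled))
    return replicates
```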
A partition defined by a branch in the first tree
[Figure: the five bootstrap trees on species A through F; the branch of the first tree defining the partition ACEF | BD is highlighted, and its count in the table of partitions is set to 1.]
Another partition from the first tree
[Figure: a second branch of the first tree is highlighted, and its partition is tallied.]
The third partition from that tree
[Figure: the third branch of the first tree is tallied; each of the first tree's three partitions now has a count of 1.]
Partitions from the second tree
[Figure: the three partitions of the second tree (AC | BDEF, ACE | BDF, ABCE | DF) are added to the tally; ACE | BDF has now been seen twice.]
Partitions from the third tree
[Figure: the three partitions of the third tree (AE | BCDF, AEF | BCD, ADEF | BC) are added to the tally.]
Partitions from the fourth tree
[Figure: the three partitions of the fourth tree (AE | BCDF, ADEF | BC, ABCE | DF) are added; AE | BCDF has now been seen three times.]
Partitions from the fifth tree
[Figure: the three partitions of the fifth tree (ACE | BDF, ABDF | EC, ABCE | DF) complete the tally.]
The table of partitions from all trees
[Figure: the five bootstrap trees on species A through F.]

How many times each partition of species is found:

AE | BCDF     3
ACE | BDF     3
ACEF | BD     1
AC | BDEF     1
AEF | BCD     1
ADEF | BC     2
ABDF | EC     1
ABCE | DF     3
The majority-rule consensus tree
[Figure: the five bootstrap trees and the table of partition counts above; the majority-rule consensus tree contains the partitions found in more than half of the trees (AE | BCDF, ACE | BDF, and ABCE | DF, each in 3 of the 5 trees), with 60 (percent) marked on each of those branches.]
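A sketch of the tally and the majority rule; each tree is encoded by the three partitions read off the running tallies on the preceding slides (a partition is named by the species on one side of the branch):

```python
from collections import Counter

SPECIES = set("ABCDEF")

trees = [
    [{"A","E"}, {"A","C","E"}, {"A","C","E","F"}],            # tree 1
    [{"A","C"}, {"A","C","E"}, {"A","B","C","E"}],            # tree 2
    [{"A","E"}, {"A","E","F"}, {"A","D","E","F"}],            # tree 3
    [{"A","E"}, {"A","D","E","F"}, {"A","B","C","E"}],        # tree 4
    [{"A","C","E"}, {"A","B","D","F"}, {"A","B","C","E"}],    # tree 5
]

counts = Counter()
for tree in trees:
    for side in tree:
        name = "".join(sorted(side)) + " | " + "".join(sorted(SPECIES - side))
        counts[name] += 1

for partition, k in counts.most_common():
    majority = "<- in the majority-rule tree" if k > len(trees) / 2 else ""
    print(partition, k, majority)
```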
How do we know that the MR consensus tree will be a tree?
Suppose that for each partition in a tree we construct a (fake) morphological character with 0 for one set in the partition, 1 for the other.

Such a character is compatible with a tree if (and only if) the tree contains that partition.

If two of these characters both occur in more than 50% of the trees, they must co-occur in at least one tree.

Thus the set of these "characters" that occur in more than 50% of the trees are all pairwise compatible.

By the Pairwise Compatibility Theorem (remember that?) they must then be jointly compatible.

So there must be a tree that contains them all.
The MR tree with 14-species primate mtDNA data
[Figure: the majority-rule consensus tree of the bootstrap estimates for the 14-species primate mtDNA data, with the percentage of bootstrap samples containing each branch (values from 35 to 100) marked on it.]
Potential problems with the bootstrap
1. Sites may not evolve independently.
2. Sites may not come from a common distribution (but we can consider them sampled from a mixture of possible distributions).
3. If we do not know at the outset which branch is of interest, a "multiple-tests" problem means P values are overstated.
4. P values are biased (too conservative).
5. Bootstrapping does not correct biases in phylogeny methods.
Other resampling methods
Delete-half jackknife: sample a random 50% of the sites, without replacement.

Delete-1/e jackknife (Farris et al. 1996) (too little deletion from a statistical viewpoint).

Reweighting characters by choosing weights from an exponential distribution.

In fact, reweighting them by any exchangeable weights having coefficient of variation of 1.

Parametric bootstrap: simulate data sets of this size assuming the estimate of the tree is the truth.

Block-bootstrapping: sample n/b blocks of b adjacent sites (to correct for correlation among adjacent sites) (Künsch, 1989).
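The delete-half jackknife resampling step, in the same style as the bootstrap sketch above:

```python
import random

def delete_half_jackknife(alignment):
    """Keep a random 50% of the sites, sampled without replacement."""
    n_sites = len(alignment[0])
    keep = sorted(random.sample(range(n_sites), n_sites // 2))
    return ["".join(seq[c] for c in keep) for seq in alignment]
```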
With the delete-half jackknife
[Figure: the corresponding majority-rule consensus tree for the 14-species primate mtDNA data under the delete-half jackknife, with support values from 32 to 100 on the branches.]
Bootstrap versus jackknife in a simple case
Exact computation of the effects of deletion fraction for the jackknife.

[Figure: n characters, of which n1 support group 1 and n2 support group 2 (two conflicting groups), are resampled to give m1 and m2 supporting characters; for the jackknife, n(1 − δ) characters are kept.]

We can compute for various n's the probabilities of getting more evidence for group 1 than for group 2. A typical result is for n1 = 10, n2 = 8, n = 100:

                                   Bootstrap   Jackknife, δ = 1/2   Jackknife, δ = 1/e
Prob(m1 > m2)                      0.6384      0.5923               0.6441
Prob(m1 ≥ m2)                      0.7230      0.7587               0.8040
Prob(m1 > m2) + ½ Prob(m1 = m2)    0.6807      0.6755               0.7240
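A quick simulation that approximates the bootstrap column (the slide's values are exact computations; this is only a Monte Carlo check):

```python
import random

def one_bootstrap(n=100, n1=10, n2=8):
    """Resample n characters with replacement; the first n1 support group 1,
    the next n2 support group 2. Returns (m1, m2)."""
    draws = [random.randrange(n) for _ in range(n)]
    m1 = sum(d < n1 for d in draws)
    m2 = sum(n1 <= d < n1 + n2 for d in draws)
    return m1, m2

reps, gt, eq = 200_000, 0, 0
for _ in range(reps):
    m1, m2 = one_bootstrap()
    gt += m1 > m2
    eq += m1 == m2
print(gt / reps)                    # approximately 0.6384
print(gt / reps + 0.5 * eq / reps)  # approximately 0.6807
```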
Probability of a character being omitted from a bootstrap
N    (1 − 1/N)^N      N    (1 − 1/N)^N      N    (1 − 1/N)^N      N     (1 − 1/N)^N
1    0                11   0.35049          25   0.36040          100   0.36603
2    0.25             12   0.35200          30   0.36166          150   0.36665
3    0.29630          13   0.35326          35   0.36256          200   0.36696
4    0.31641          14   0.35434          40   0.36323          250   0.36714
5    0.32768          15   0.35526          45   0.36375          300   0.36727
6    0.33490          16   0.35607          50   0.36417          500   0.36751
7    0.33992          17   0.35679          60   0.36479          1000  0.36770
8    0.34361          18   0.35742          70   0.36524          ∞     0.36788
9    0.34644          19   0.35798          80   0.36557
10   0.34868          20   0.35849          90   0.36583
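The entries and their limit of 1/e, as a quick check:

```python
import math

# Probability that a given site is never drawn in a bootstrap sample of N sites.
for N in (2, 10, 100, 1000):
    print(N, (1 - 1 / N) ** N)
print("limit 1/e =", math.exp(-1))   # 0.36788
```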
A toy example to examine bias of P values
[Figure: the distribution of individual values of x, the true distribution of sample means, and estimated distributions of sample means; the true value of the mean lies above 0, and the regions below and above 0 are labelled "Topology" II and "Topology" I.]

Assuming a normal distribution, trying to infer whether the mean is above 0, when the mean is unknown and the variance is known to be 1.
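A simulation sketch of this toy example; the details below (estimating P as the probability that the mean is above 0, under a normal centered at the sample mean) are our reading of the figure:

```python
import math
import random

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, n, reps = 0.1, 100, 20_000
true_P = phi(mu * math.sqrt(n))   # Prob(sample mean > 0) under the truth

# Each replicate observes a sample mean, then estimates P the same way.
est = [phi(random.gauss(mu, 1 / math.sqrt(n)) * math.sqrt(n)) for _ in range(reps)]
print(true_P, sum(est) / reps)    # the average estimated P is less extreme
```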
Bias in the P values
[Figure: distributions of the estimate of the "phylogeny", split at 0 into topology II and topology I, with the true mean marked; P is the tail area beyond 0. Note that the true P is more extreme than the average of the P's.]
How much bias in the P values?
[Figure: average P plotted against true P, both from 0 to 1; the curve is pulled toward 1/2, so extreme true P's correspond to less extreme average P's.]
Bias in the P values with different priors
[Figure: probability of the correct topology plotted against the P for the expectation of μ, for priors with n σ² = 0.1, 1.0, and 2.0.]
The parametric bootstrap
[Figure: the original data give an estimate of the tree; computer simulation on that estimate produces data sets #1, #2, #3, ..., #100, and estimation of the tree from each yields trees T1, T2, T3, ..., T100.]
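A sketch of that loop, with hypothetical `simulate_data` and `estimate_tree` functions standing in for the simulation and inference steps:

```python
def parametric_bootstrap(data, estimate_tree, simulate_data, n_replicates=100):
    """Simulate data sets of the same size under the estimated tree (taken
    as the truth), then re-estimate a tree from each. `estimate_tree` and
    `simulate_data` are hypothetical stand-ins for real implementations."""
    tree_hat = estimate_tree(data)
    n_sites = len(data[0])
    return [estimate_tree(simulate_data(tree_hat, n_sites))
            for _ in range(n_replicates)]
```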
The parametric bootstrap with the primates data
[Figure: the tree for the 14-species primate mtDNA data with parametric bootstrap support values from 78 to 98 on the branches.]
Goldman’s test using simulation
(related to the "parametric bootstrap")

[Figure: from the data we estimate a clocklike tree Tc with log-likelihood ℓc and a nonclocklike tree T with log-likelihood ℓ, and compute 2(ℓ − ℓc); simulating data sets under the clocklike estimate, and estimating clocklike and nonclocklike trees from each simulated data set, gives the null distribution of 2(ℓ − ℓc).]
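A sketch of the whole test, again with hypothetical fitting and simulation functions:

```python
def goldman_test(data, fit_clock, fit_noclock, simulate_data, n_replicates=100):
    """Simulation-based likelihood ratio test of the clock, as sketched above.
    fit_clock / fit_noclock return (tree, log-likelihood); simulate_data
    generates a data set of a given size under a tree. All are hypothetical."""
    tree_c, lnL_c = fit_clock(data)
    _, lnL = fit_noclock(data)
    observed = 2.0 * (lnL - lnL_c)
    null = []
    for _ in range(n_replicates):
        sim = simulate_data(tree_c, len(data[0]))
        _, l_c = fit_clock(sim)
        _, l = fit_noclock(sim)
        null.append(2.0 * (l - l_c))
    # P value: fraction of simulated statistics at least as large as observed.
    return observed, sum(s >= observed for s in null) / n_replicates
```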