Parsimony, Likelihood, Common Causes, and Phylogenetic Inference
Elliott Sober
Philosophy Department
University of Wisconsin, Madison
2 suggested uses of O’s razor
• [Pre-test] O’s razor should be used to constrain the order in which hypotheses are to be tested.
• [Post-test] O’s razor should be used to interpret the acceptability/support of hypotheses that have already been tested.

Pluralism about Ockham’s razor? These two suggestions can be compatible, but:
• If pre-test O’s razor is “rejectionist,” then post-test O’s razor won’t have a point.
• If the pre-test idea involves testing hypotheses one at a time, then it views testing as noncontrastive.
within the post-test category of support/plausibility ...
• Bayesianism – compute posterior probabilities.
• Likelihoodism – compare likelihoods.
• Frequentist model selection criteria like AIC – estimate predictive accuracy.

I am a pluralist about these broad philosophies. Not that each is okay as a global thesis about all scientific inference, but I do think that each has its place.
Ockham’s Razors*
Different uses of O’s razor have different
justifications and some have none at all.
* “Let’s Razor Ockham’s Razor,” in D. Knowles (ed.), Explanation and Its Limits, Cambridge University Press, 1990, 73-94.
Parsimony and Likelihood
In model selection criteria like AIC and BIC, likelihood and parsimony are conflicting desiderata:

AIC(M) = log Pr[Data │ L(M)] − k,

where L(M) is the likeliest member of model M and k is the number of adjustable parameters in M.
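The fit-versus-simplicity trade-off in the AIC formula can be sketched numerically. A minimal illustration with made-up data and a toy pair of Gaussian models (none of this is from the talk): the log-likelihood term rewards fit, and the parameter count k penalizes complexity.

```python
# Hedged sketch of the AIC trade-off with hypothetical data:
# the log-likelihood term rewards fit, k penalizes complexity.
import math

data = [1.1, 0.9, 1.3, 0.8, 1.0, 1.2]  # made-up observations

def gaussian_loglik(xs, mu, sigma):
    """Log-likelihood of the data under a Normal(mu, sigma) model."""
    return sum(-0.5 * math.log(2 * math.pi * sigma ** 2)
               - (x - mu) ** 2 / (2 * sigma ** 2) for x in xs)

def sigma_hat(xs, mu):
    """Maximum-likelihood estimate of sigma when the mean is fixed at mu."""
    return math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs))

# Model 1 fixes the mean at 0 (k = 1 free parameter: sigma).
# Model 2 also estimates the mean (k = 2: mu and sigma).
mu_hat = sum(data) / len(data)
aic1 = gaussian_loglik(data, 0.0, sigma_hat(data, 0.0)) - 1
aic2 = gaussian_loglik(data, mu_hat, sigma_hat(data, mu_hat)) - 2

# The extra parameter must buy enough likelihood to pay for itself.
print(aic1, aic2)
```

Here the data lie far from 0, so the extra parameter of Model 2 earns back its penalty; with data centered near 0, the simpler model would score higher.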
In other settings, parsimony has a likelihood justification.
a Reichenbachian idea: Salmon’s example of plagiarism
[Figure: two matching student papers, the effects E1 and E2, traced either to a single cause C (Common Cause) or to distinct causes C1 and C2 (Separate Causes). The Common Cause hypothesis is the more parsimonious.]
Reichenbach’s argument
IF (i) a cause screens off its effects from each other,
(ii) all probabilities are non-extreme (≠ 0, 1),
(iii) a particular parameterization of the CC and SC models holds, and
(iv) cause/effect relationships are “homogeneous” across branches,
THEN Pr[Data │ Common Cause] > Pr[Data │ Separate Causes].
The more parsimonious hypothesis has the higher likelihood: under these conditions, parsimony and likelihood are ordinally equivalent.
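Reichenbach’s conclusion can be checked numerically. A minimal sketch with hypothetical parameter values, building in screening-off, non-extreme probabilities, and the same (“homogeneous”) parameters on every branch:

```python
# Numerical sketch of Reichenbach's argument. Given the state of its
# cause, each effect is independent of the other (screening-off); all
# probabilities are non-extreme; both branches share one parameterization.
# The specific numbers are made up.

p_cause = 0.5                    # Pr(a cause is in state 1), under both hypotheses
p_eff_given = {1: 0.9, 0: 0.1}   # Pr(effect present | state of its cause)

# Common Cause: one cause C feeds both effects E1 and E2.
pr_cc = sum((p_cause if c else 1 - p_cause) * p_eff_given[c] ** 2
            for c in (0, 1))

# Separate Causes: independent causes C1, C2 with the same parameters.
pr_one = sum((p_cause if c else 1 - p_cause) * p_eff_given[c]
             for c in (0, 1))
pr_sc = pr_one ** 2

# Data: both effects occur (the two papers match).
print(pr_cc, pr_sc)   # 0.41 vs 0.25: Common Cause has the higher likelihood
assert pr_cc > pr_sc
```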
Some differences with Reichenbach
• I am comparing two hypotheses.
• I’m not using R’s Principle of the Common Cause.
• I take the evidence to be the matching of the students’ papers, not their “correlation.”
empirical foundations for likelihood ≈ parsimony
(i) A cause screens off its effects from each other.
(ii) All probabilities are non-extreme.
(iii) A particular parameterization of the CC and SC models holds.
(iv) Cause/effect relationships are “homogeneous” across branches.

By adopting different assumptions, you can arrange for CC to be less likely than SC. Then likelihood and parsimony conflict!

Note: the Reichenbach argument shows that these conditions are sufficient for likelihood ≈ parsimony, not that they are necessary.
Parsimony in Phylogenetic Inference
Two sources:
─ Willi Hennig
─ Luigi Cavalli-Sforza and Anthony Edwards
Two types of inference problem:
─ find the best tree “topology”
─ estimate character states of ancestors
1. Which tree topology is better?
[Figure: two tree topologies over the taxa H, C, and G: (HC)G groups H with C; H(CG) groups C with G.]
MP: (HC)G is better supported than H(CG) by data D if and only if (HC)G is a more parsimonious explanation of D than H(CG) is.
2. What is the best estimate of the character states of ancestors in an assumed tree?
[Figure: the three leaves H, C, and G each exhibit state 1; the state A of their most recent common ancestor is unknown.]
MP says that the best estimate is that A = 1.
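The parsimony calculation behind this estimate is just a count of required changes. A minimal sketch on an assumed star tree in which each leaf descends directly from the ancestor (an illustrative simplification; the helper name is made up):

```python
# Minimal count behind the MP estimate of an ancestral state, on an
# assumed star tree: H, C and G each descend directly from ancestor A.

def parsimony_changes(ancestor_state, leaf_states):
    """Number of character changes required if A has the given state."""
    return sum(leaf != ancestor_state for leaf in leaf_states)

leaves = [1, 1, 1]    # H, C, G all show state 1
scores = {a: parsimony_changes(a, leaves) for a in (0, 1)}
best = min(scores, key=scores.get)
print(scores, best)   # {0: 3, 1: 0} -> A = 1 requires no changes
```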
Maximum Likelihood
[Figure: the same two topologies, (HC)G and H(CG).]
ML: (HC)G is better supported than H(CG) by data D if and only if PrM[D│(HC)G] > PrM[D│H(CG)]. ML is “model dependent”: the likelihoods are computed relative to an assumed model M of the evolutionary process.
the present situation in evolutionary biology
• MP and ML sometimes disagree.
• The standard criticism of MP is that it assumes that evolution proceeds parsimoniously.
• The standard criticism of ML is that you need to choose a model of the evolutionary process.
When do parsimony and likelihood agree?
• (Ordinal Equivalence) For any data set D and any pair of phylogenetic hypotheses H1 and H2, PrM(D│H1) > PrM(D│H2) iff H1 is a more parsimonious explanation of D than H2 is.
• Whether likelihood agrees with parsimony depends on the probabilistic model of evolution used.
• Felsenstein (1973) showed that the postulate of very low rates of evolution suffices for ordinal equivalence.
Does this mean that parsimony assumes that rates are low?
• NO: the assumptions of a method are the propositions that must be true if the method correctly judges support.
• Felsenstein showed that the postulate of low rates suffices for ordinal equivalence, not that it is necessary for ordinal equivalence.
Tuffley and Steel (1997)
• T&S showed that the postulate of “no-common-mechanism” also suffices for ordinal equivalence.
• “No-common-mechanism” means that each character on each branch is subject to its own drift process.

the two probability models of evolution
Felsenstein:
• Rates of change are low, but not necessarily equal.
• Drift not assumed: Pr(i → j) and Pr(j → i) may differ.
Tuffley and Steel:
• Rates of change can be high.
• Drift is assumed: Pr(i → j) = Pr(j → i).
How to use likelihood to define what it means for parsimony to assume something
• The assumptions of parsimony = the propositions that must be true if parsimony correctly judges support.
• For a likelihoodist, parsimony correctly judges support if and only if parsimony is ordinally equivalent with likelihood.
• Hence, for a likelihoodist, parsimony assumes any proposition that follows from ordinal equivalence.
A Test for what Parsimony does not assume
Model M ⇒ ordinal equivalence ⇒ A, where A = what parsimony assumes.
• If model M entails ordinal equivalence, and M entails proposition X, X may or may not be an assumption of parsimony.
• If model M entails ordinal equivalence, and M does not entail proposition X, then X is not an assumption of parsimony.
applications of the negative test
• T&S’s model does not entail that rates of change are low; hence parsimony does not assume that rates are low.
• F’s model does not assume neutral evolution; hence parsimony does not assume neutrality.
How to figure out what parsimony does assume?
• Find a model that forces parsimony and likelihood to disagree about some example.
• Then, if parsimony is right in what it says about the example, the model must be false.
Example #1
Task: infer the character state of the MRCA of species that all exhibit the same state of a quantitative character.
[Figure: leaves 10, 10, …, 10, 10; ancestral state A = ?]
The MP estimate is A = 10. When is A = 10 the ML estimate? And when is it not?
Answer:
• ML says that A = 10 is the best estimate (and thus agrees with MP) if there is neutral evolution or selection is pushing each lineage towards a trait value of 10.
• ML says that A = 10 is not the best estimate (and thus disagrees with MP) if (*) selection is pushing all lineages towards a single trait value different from 10.
So: parsimony assumes, in this problem, that (*) is false.
Example #2
Task: infer the character state of the MRCA of two species that exhibit different states of a dichotomous character.
[Figure: leaves in states 1 and 0; ancestral state A = ?]
A = 0 and A = 1 are equally parsimonious. When are they equally likely? And when are they unequally likely?
Answer:
• ML agrees with MP that A = 0 and A = 1 are equally good estimates if the same neutral process occurs in the two lineages.
• ML disagrees with MP if (*) the same selection process occurs in both lineages.
So: parsimony assumes, in this problem, that (*) is false.
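The contrast can be made concrete. A hedged sketch in which each branch is summarized by a single change probability; the asymmetric values standing in for “selection” are made up:

```python
# Sketch of Example #2: two leaves in states 1 and 0, ancestor A unknown.
# A symmetric ("drift") process ties the two estimates; an asymmetric
# process (toy stand-in for selection) breaks the tie.

def likelihood(ancestor, leaves, p01, p10):
    """p01 = Pr(0 -> 1) along a branch, p10 = Pr(1 -> 0)."""
    out = 1.0
    for leaf in leaves:
        if ancestor == 0:
            out *= p01 if leaf == 1 else 1 - p01
        else:
            out *= p10 if leaf == 0 else 1 - p10
    return out

leaves = [1, 0]          # the two species disagree

# Symmetric ("drift") process: Pr(0 -> 1) = Pr(1 -> 0).
sym0 = likelihood(0, leaves, 0.2, 0.2)
sym1 = likelihood(1, leaves, 0.2, 0.2)
assert sym0 == sym1      # ML ties, agreeing with MP

# Asymmetric process: change toward state 1 is more probable.
asym0 = likelihood(0, leaves, 0.4, 0.1)
asym1 = likelihood(1, leaves, 0.4, 0.1)
print(asym0, asym1)      # 0.24 vs 0.09: ML now prefers one estimate
```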
Conclusions about phylogenetic parsimony ≈ likelihood
• The assumptions of parsimony are the propositions that must be true if parsimony correctly judges support.
• To find out what parsimony does not assume, use the test described [M ⇒ ordinal equivalence ⇒ A].
• To find out what parsimony does assume, look for examples in which parsimony and likelihood disagree, not for models that ensure that they agree.
• Maybe parsimony’s assumptions vary from problem to problem.
broader conclusions
• underdetermination: O’s razor often comes up when the data don’t settle truth/falsehood or acceptance/rejection.
• reductionism: when O’s razor has authority, it does so because it reflects some other, more fundamental, desideratum. [But there isn’t a single global justification.]
• two questions: when parsimony has a precise meaning, we can investigate: What are its presuppositions? What suffices to justify it?
A curiosity: in the Reichenbach argument, to get a difference in likelihood, the hypotheses should not specify the states of the causes.
[Figure: the same two causal structures as before, with probabilities p1 and p2 attached to the effects E1 and E2 under both the Common Cause and the Separate Causes hypotheses.]
Example #0
Task: infer the character state of the MRCA of species that all exhibit the same state of a dichotomous character.
[Figure: leaves 1, 1, …, 1, 1; ancestral state A = ?]
The MP inference is that A = 1. When is A = 1 the ML inference?
Answer: when lineages have finite duration and the process is Markovian. It doesn’t matter whether selection or drift is the process at work.
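A minimal numerical sketch of a symmetric special case of this Markov setup, on an assumed star tree (the rate value is made up; any per-branch change probability below 1/2 gives the same ordering):

```python
# Sketch of Example #0 on an assumed star tree: each lineage flips state
# with probability p_change over its finite duration. The value is made up.

p_change = 0.2   # Pr(a lineage ends in a different state than it started)

def likelihood(ancestor, leaves):
    out = 1.0
    for leaf in leaves:              # lineages evolve independently given A
        out *= (1 - p_change) if leaf == ancestor else p_change
    return out

leaves = [1, 1, 1, 1]                # all descendants show state 1
l1, l0 = likelihood(1, leaves), likelihood(0, leaves)
print(l1, l0)    # 0.4096 vs 0.0016: ML agrees with MP that A = 1
```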