Parsimony, Likelihood, Common Causes, and Phylogenetic Inference

Elliott Sober
Philosophy Department
University of Wisconsin, Madison


Two suggested uses of O’s razor

• O’s razor should be used to constrain the order in which hypotheses are to be tested.

• O’s razor should be used to interpret the acceptability/support of hypotheses that have already been tested.


Pluralism about Ockham’s razor?

• [Pre-test] O’s razor should be used to constrain the order in which hypotheses are to be tested.

• [Post-test] O’s razor should be used to interpret the acceptability/support of hypotheses that have already been tested.


these can be compatible, but…

If pre-test O’s razor is “rejectionist,” then post-test O’s razor won’t have a point.

And if the pre-test idea involves testing hypotheses one at a time, then it views testing as noncontrastive.

within the post-test category of support/plausibility ...

• Bayesianism – compute posterior probs.

• Likelihoodism – compare likelihoods.

• Frequentist model selection criteria like AIC – estimate predictive accuracy.


I am a pluralist about these broad philosophies …

Not that each is okay as a global thesis about all scientific inference; but I do think that each has its place.

Ockham’s Razors*

Different uses of O’s razor have different justifications, and some have none at all.

* “Let’s Razor Ockham’s Razor,” in D. Knowles (ed.), Explanation and Its Limits, Cambridge University Press, 1990, 73–94.

Parsimony and Likelihood

In model selection criteria like AIC and BIC, likelihood and parsimony are conflicting desiderata.

AIC(M) = log[Pr(Data│L(M))] − k, where L(M) is the likeliest member of model M and k is the number of adjustable parameters in M.

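To see the two desiderata pull apart, here is a minimal numeric sketch (my own toy example, not from the slides) scoring two models of 100 coin flips with the slide’s form of AIC: log-likelihood minus the number of adjustable parameters.

```python
import math

# Toy illustration of AIC(M) = log Pr(Data | L(M)) - k, where L(M) is the
# best-fitting (maximum-likelihood) member of model M and k is the number
# of adjustable parameters M contains.  The data and models are
# assumptions made for this sketch.

def log_lik_binomial(heads, n, p):
    """Log-likelihood of observing `heads` successes in n Bernoulli(p) trials."""
    return heads * math.log(p) + (n - heads) * math.log(1 - p)

heads, n = 62, 100

# Model FAIR fixes p = 0.5: no adjustable parameters (k = 0).
aic_fair = log_lik_binomial(heads, n, 0.5) - 0

# Model FREE lets p vary: one adjustable parameter (k = 1), MLE p = heads/n.
aic_free = log_lik_binomial(heads, n, heads / n) - 1

# FREE fits better (higher likelihood) but pays a parsimony penalty k;
# AIC treats likelihood and parsimony as conflicting desiderata.
print(aic_fair, aic_free)
```

With these data the free model’s better fit outweighs its penalty (about −67.4 versus −69.3); with data closer to 50 heads, the parsimony penalty would tip the comparison the other way.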

In other settings, parsimony has a likelihood justification.

the Law of Likelihood

Observation O favors H1 over H2

iff

Pr(O│H1) > Pr(O│H2)


a Reichenbachian idea

Salmon’s example of plagiarism: two students hand in matching papers, E1 and E2.

[Common Cause]: a single source C produces both papers (C → E1 and C → E2).
[Separate Causes]: independent sources, C1 → E1 and C2 → E2.

The Common Cause hypothesis is the more parsimonious of the two.

Reichenbach’s argument

IF (i) a cause screens-off its effects from each other,
(ii) all probabilities are non-extreme (≠ 0, 1),
(iii) a particular parameterization of the CC and SC models holds, and
(iv) cause/effect relationships are “homogeneous” across branches,

THEN Pr[Data │ Common Cause] > Pr[Data │ Separate Causes].

parameters and homogeneity

The branch parameters are the same in the two models: p1 attaches to the branch producing E1 and p2 to the branch producing E2, whether those branches descend from the single cause C or from the separate causes C1 and C2. This is what assumption (iv), homogeneity, requires.
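Reichenbach’s inequality can be checked numerically. Below is a minimal sketch assuming binary cause(s) and effects; the parameterization and the variable names q, g1, b1, g2, b2 are mine, chosen for illustration, not the slides’ notation.

```python
# Screening off (i): given its cause's state, each effect is independent of
# the other.  Non-extreme probabilities (ii) and homogeneity (iv): the same
# q, g_i, b_i appear in both models.
q = 0.3            # Pr(a cause is in state 1)
g1, b1 = 0.9, 0.2  # Pr(E1 = 1 | its cause = 1), Pr(E1 = 1 | its cause = 0)
g2, b2 = 0.8, 0.1  # likewise for E2

# Common Cause: one cause C feeds both effects.
pr_cc = q * g1 * g2 + (1 - q) * b1 * b2

# Separate Causes: independent causes C1 and C2.
pr_sc = (q * g1 + (1 - q) * b1) * (q * g2 + (1 - q) * b2)

# Algebra gives pr_cc - pr_sc = q(1-q)(g1-b1)(g2-b2), which is positive
# whenever the probabilities are non-extreme and both effects respond to
# their causes in the same direction: the more parsimonious Common Cause
# hypothesis has the higher likelihood.
print(pr_cc, pr_sc, pr_cc - pr_sc)
```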

Given (i)–(iv), then, the more parsimonious hypothesis (Common Cause) has the higher likelihood: in Reichenbach’s argument, parsimony and likelihood are ordinally equivalent.

Some differences with Reichenbach

• I am comparing two hypotheses.

• I’m not using R’s Principle of the Common Cause.

• I take the evidence to be the matching of the students’ papers, not their “correlation.”


empirical foundations for likelihood ≈ parsimony

Assumptions (i)–(iv) are empirical. By adopting different assumptions, you can arrange for CC to be less likely than SC. Now likelihood and parsimony conflict!

Note: the Reichenbach argument shows that (i)–(iv) are sufficient for likelihood ≈ parsimony, not that they are necessary.

Parsimony in Phylogenetic Inference

Two sources:
─ Willi Hennig
─ Luigi Cavalli-Sforza and Anthony Edwards

Two types of inference problem:
─ find the best tree “topology”
─ estimate character states of ancestors

1. Which tree topology is better?

(HC)G groups H with C, to the exclusion of G; H(CG) groups C with G, to the exclusion of H.

MP: (HC)G is better supported than H(CG) by data D if and only if (HC)G is a more parsimonious explanation of D than H(CG) is.

An Example of a Parsimony Calculation

Data: H = 1, C = 1, G = 0, with the root of each tree assigned state 0.

On (HC)G, a single 0→1 change on the branch leading to the HC ancestor explains the data; on H(CG), at least two changes are required. So (HC)G is the more parsimonious explanation of this character.
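The step counts can be checked by brute force. A sketch (my own encoding of the two rooted trees; the node names and the (child, parent) edge representation are assumptions made for illustration):

```python
from itertools import product

def min_changes(edges, tips, root_state, states=(0, 1)):
    """Brute-force small parsimony: minimum number of state changes on a
    rooted tree (given as (child, parent) edges) needed to explain the tip
    data, with the root's state held fixed."""
    internal = sorted({parent for _, parent in edges} - set(tips) - {"root"})
    best = None
    for combo in product(states, repeat=len(internal)):
        assign = dict(tips)                 # observed tip states
        assign.update(zip(internal, combo)) # one guess for the internal nodes
        assign["root"] = root_state
        changes = sum(assign[child] != assign[parent] for child, parent in edges)
        best = changes if best is None else min(best, changes)
    return best

tips = {"H": 1, "C": 1, "G": 0}

# (HC)G: H and C share ancestor "hc"; "hc" and G attach to the root.
hc_g = [("H", "hc"), ("C", "hc"), ("hc", "root"), ("G", "root")]
# H(CG): C and G share ancestor "cg"; "cg" and H attach to the root.
h_cg = [("C", "cg"), ("G", "cg"), ("cg", "root"), ("H", "root")]

print(min_changes(hc_g, tips, 0))  # 1 change suffices on (HC)G
print(min_changes(h_cg, tips, 0))  # H(CG) needs at least 2
```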

2. What is the best estimate of the character states of ancestors in an assumed tree?

Data: H = 1, C = 1, G = 1; ancestor A = ?

MP says that the best estimate is that A=1.

Maximum Likelihood

ML: (HC)G is better supported than H(CG) by data D if and only if PrM[D│(HC)G] > PrM[D│H(CG)].

ML is “model dependent”: the likelihoods are computed relative to a probability model M of the evolutionary process.

the present situation in evolutionary biology

• MP and ML sometimes disagree.

• The standard criticism of MP is that it assumes that evolution proceeds parsimoniously.

• The standard criticism of ML is that you need to choose a model of the evolutionary process.


When do parsimony and likelihood agree?

• (Ordinal Equivalence) For any data set D and any pair of phylogenetic hypotheses H1 and H2, PrM(D│H1) > PrM(D│H2) iff H1 is a more parsimonious explanation of D than H2 is.

• Whether likelihood agrees with parsimony depends on the probabilistic model M of evolution used.

• Felsenstein (1973) showed that the postulate of very low rates of evolution suffices for ordinal equivalence.

Does this mean that parsimony assumes that rates are low?

• NO: the assumptions of a method are the propositions that must be true if the method correctly judges support.

• Felsenstein showed that the postulate of low rates suffices for ordinal equivalence, not that it is necessary for ordinal equivalence.

Tuffley and Steel (1997)

• T&S showed that the postulate of “no-common-mechanism” also suffices for ordinal equivalence.

• “no-common-mechanism” means that each character on each branch is subject to its own drift process.

the two probability models of evolution

Felsenstein:
• Rates of change are low, but not necessarily equal.
• Drift not assumed: Pr(i→j) and Pr(j→i) may differ.

Tuffley and Steel:
• Rates of change can be high.
• Drift is assumed: Pr(i→j) = Pr(j→i).

How to use likelihood to define what it means for parsimony to assume something

• The assumptions of parsimony = the propositions that must be true if parsimony correctly judges support.

• For a likelihoodist, parsimony correctly judges support if and only if parsimony is ordinally equivalent with likelihood.

• Hence, for a likelihoodist, parsimony assumes any proposition that follows from ordinal equivalence.


A Test for what Parsimony does not assume

Model M ⇒ ordinal equivalence ⇒ A, where A = what parsimony assumes.

• If model M entails ordinal equivalence, and M entails proposition X, X may or may not be an assumption of parsimony.

• If model M entails ordinal equivalence, and M does not entail proposition X, then X is not an assumption of parsimony.

applications of the negative test

• T&S’s model does not entail that rates of change are low; hence parsimony does not assume that rates are low.

• F’s model does not assume neutral evolution; hence parsimony does not assume neutrality.


How to figure out what parsimony does assume?

• Find a model that forces parsimony and likelihood to disagree about some example.

• Then, if parsimony is right in what it says about the example, the model must be false.


Example #1

Task: Infer the character state of the MRCA of species that all exhibit the same state of a quantitative character.

Data: 10 10 … 10 10; ancestor A = ?

The MP estimate is A=10. When is A=10 the ML estimate? And when is it not?

Answer

ML says that A=10 is the best estimate (and thus agrees with MP) if there is neutral evolution or selection is pushing each lineage towards a trait value of 10.

ML says that A=10 is not the best estimate (and thus disagrees with MP) if (*) selection is pushing all lineages towards a single trait value different from 10.

So: Parsimony assumes, in this problem, that (*) is false.

Example #2

Task: Infer the character state of the MRCA of two species that exhibit different states of a dichotomous character.

Data: 1 0; ancestor A = ?

A=0 and A=1 are equally parsimonious. When are they equally likely? And when are they unequally likely?

Answer

ML agrees with MP that A=0 and A=1 are equally good estimates if the same neutral process occurs in the two lineages.

ML disagrees with MP if (*) the same selection process occurs in both lineages.

So: Parsimony assumes, in this problem, that (*) is false.
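A small numeric sketch of this answer. The per-branch transition probabilities are assumptions I chose for illustration; the same process produces each descendant’s state from the ancestral state A, independently in the two lineages.

```python
def lik(A, p01, p10):
    """Pr(descendants show states 1 and 0 | ancestor A), with per-branch
    transition probabilities p01 = Pr(0 -> 1) and p10 = Pr(1 -> 0),
    the same process operating independently in the two lineages."""
    trans = {(0, 0): 1 - p01, (0, 1): p01, (1, 0): p10, (1, 1): 1 - p10}
    return trans[(A, 1)] * trans[(A, 0)]

# Same neutral (drift) process in both lineages: symmetric changes,
# Pr(0 -> 1) = Pr(1 -> 0).  The two estimates tie, as parsimony says.
print(lik(0, 0.3, 0.3), lik(1, 0.3, 0.3))

# Same selection process in both lineages, pushing toward state 1:
# Pr(0 -> 1) > Pr(1 -> 0).  The tie breaks (here A=0 comes out more
# likely, since explaining the 0-descendant from A=1 requires a change
# that fights selection), so ML and MP now disagree.
print(lik(0, 0.4, 0.1), lik(1, 0.4, 0.1))
```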

Conclusions about phylogenetic parsimony ≈ likelihood

• The assumptions of parsimony are the propositions that must be true if parsimony correctly judges support.

• To find out what parsimony does not assume, use the test described [M ⇒ ordinal equivalence ⇒ A].

• To find out what parsimony does assume, look for examples in which parsimony and likelihood disagree, not for models that ensure that they agree.

• Maybe parsimony’s assumptions vary from problem to problem.

broader conclusions

• underdetermination: O’s razor often comes up when the data don’t settle truth/falsehood or acceptance/rejection.

• reductionism: when O’s razor has authority, it does so because it reflects some other, more fundamental, desideratum. [But there isn’t a single global justification.]

• two questions: When parsimony has a precise meaning, we can investigate: What are its presuppositions? What suffices to justify it?

A curiosity: in the Reichenbach argument, to get a difference in likelihood, the hypotheses should not specify the states of the causes. If, for example, both hypotheses put their causes in state 1, each confers probability p1·p2 on the effects, and the likelihoods tie.

Example #0

Task: Infer the character state of the MRCA of species that all exhibit the same state of a dichotomous character.

Data: 1 1 … 1 1; ancestor A = ?

The MP inference is that A=1. When is A=1 the ML inference?

Answer: when lineages have finite duration and the process is Markovian. It doesn’t matter whether selection or drift is the process at work.
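This closing claim can be illustrated with a two-state continuous-time Markov sketch. The model form and the rate values below are my assumptions; the point is that the comparison comes out the same way for any finite rates and durations.

```python
import math

def p_end1(start, a, b, t):
    """Pr(lineage ends in state 1 after duration t | it starts in `start`),
    for a two-state Markov process with rates a: 0 -> 1 and b: 1 -> 0."""
    s = a + b
    eq = a / s                # equilibrium frequency of state 1
    decay = math.exp(-s * t)  # strictly positive for any finite duration t
    return eq + (1 - eq) * decay if start == 1 else eq * (1 - decay)

# n independent lineages of duration t, every tip observed in state 1.
n, t = 5, 2.0
for a, b in [(0.1, 0.1),    # drift: symmetric rates
             (0.5, 0.05),   # selection toward state 1
             (0.05, 0.9)]:  # selection toward state 0
    lik_A1 = p_end1(1, a, b, t) ** n
    lik_A0 = p_end1(0, a, b, t) ** n
    # Pr(1 -> 1) - Pr(0 -> 1) = exp(-(a+b)t) > 0 for finite t, so A=1 has
    # the higher likelihood whichever process is at work.
    print(lik_A1 > lik_A0)
```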