Tree Searching Methods Exhaustive search (exact) Branch-and-bound search (exact) Heuristic search methods (approximate) – Stepwise addition – Branch swapping – Star decomposition

Tree Searching Methods •Exhaustive search (exact) …predrag/classes/2004falli400/swafford.pdfTree Searching Methods •Exhaustive search (exact) •Branch-and-bound search (exact)


Tree Searching Methods

• Exhaustive search (exact)

• Branch-and-bound search (exact)

• Heuristic search methods (approximate)

– Stepwise addition

– Branch swapping

– Star decomposition


Exhaustive Search

[Figure: exhaustive search over the candidate trees, each labeled with a number from 11 to 13]

Searching for trees

• Generation of all possible trees

1. Generate all 3 trees for first 4 taxa:

Searching for trees

2. Generate all 15 trees for first 5 taxa:

(likewise for each of the other two 4-taxon trees)
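The counts on these slides (3 trees for 4 taxa, 15 for 5) follow from attaching each new taxon to every branch of every smaller tree: a k-taxon unrooted binary tree has 2k - 3 branches. A minimal Python sketch of that arithmetic; the function name is illustrative and does not come from the slides.

```python
def count_unrooted_trees(n_taxa):
    """Number of unrooted, fully bifurcating trees for n_taxa >= 3 labeled taxa."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1  # exactly one tree exists for 3 taxa
    for k in range(3, n_taxa):
        # A k-taxon unrooted binary tree has 2k - 3 branches,
        # and the next taxon can be attached to any one of them.
        count *= 2 * k - 3
    return count


for n in (4, 5, 6, 10):
    print(n, count_unrooted_trees(n))
```

The rapid growth of this count is why exhaustive search is only practical for small numbers of taxa.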


Searching for trees

3. Full search tree:


Searching for trees

Branch and bound algorithm:

The search tree is the same as for exhaustive search, with tree lengths for a hypothetical dataset shown in boldface type. If a tree lying at a node of this search tree has a length that exceeds the current upper bound on the optimal tree length, this path of the search tree is terminated (indicated by a cross-bar), and the algorithm backtracks and takes the next available path. When a tip of the search tree is reached (i.e., when we arrive at a tree containing the full set of taxa), the tree is either optimal (and hence retained) or suboptimal (and rejected). When all paths leading from the initial 3-taxon tree have been explored, the algorithm terminates, and all most-parsimonious trees will have been identified. Asterisks indicate points at which the current upper bound is reduced. Circled numbers represent the order in which phylogenetic trees are visited in the search tree.
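The procedure in this caption can be sketched in Python. This is an illustration under stated assumptions, not the PAUP* implementation: tree length is computed with unordered (Fitch) parsimony, trees are nested tuples rooted next to taxon 0, and the names `insertions`, `tree_length`, and `branch_and_bound` and the example sequences are all hypothetical. The length of the best full tree found so far serves as the bound; pruning is valid because adding a taxon can never shorten a tree.

```python
def insertions(subtree, taxon):
    """Yield every tree obtained by attaching `taxon` to one branch of `subtree`."""
    yield (subtree, taxon)  # attach to the branch leading into this subtree
    if isinstance(subtree, tuple):
        left, right = subtree
        for new_left in insertions(left, taxon):
            yield (new_left, right)
        for new_right in insertions(right, taxon):
            yield (left, new_right)


def site_states(node, seqs, site):
    """Fitch pass for one site: return (state set, number of changes)."""
    if not isinstance(node, tuple):
        return {seqs[node][site]}, 0
    left_set, left_steps = site_states(node[0], seqs, site)
    right_set, right_steps = site_states(node[1], seqs, site)
    shared = left_set & right_set
    if shared:
        return shared, left_steps + right_steps
    return left_set | right_set, left_steps + right_steps + 1


def tree_length(tree, seqs):
    """Parsimony length of a (possibly partial) tree, summed over sites."""
    return sum(site_states(tree, seqs, j)[1] for j in range(len(seqs[0])))


def branch_and_bound(seqs):
    """Return (optimal length, list of all optimal trees) for at least 3 taxa."""
    n = len(seqs)
    best = {"length": float("inf"), "trees": []}

    def extend(tree, next_taxon):
        length = tree_length(tree, seqs)
        if length > best["length"]:
            return  # terminate this path: it cannot lead to an optimal tree
        if next_taxon == n:  # tip of the search tree: all taxa are placed
            if length < best["length"]:
                best["length"], best["trees"] = length, []  # bound is reduced
            best["trees"].append(tree)
            return
        outgroup, rest = tree
        for new_rest in insertions(rest, next_taxon):
            extend((outgroup, new_rest), next_taxon + 1)

    extend((0, (1, 2)), 3)  # the single 3-taxon tree
    return best["length"], best["trees"]


# Hypothetical aligned sequences, one string per taxon.
example = ["AACGT", "AACGA", "CGCGT", "CGTTA", "CGTTT", "AGTTA"]
print(branch_and_bound(example))
```

Keeping taxon 0 as a direct child of the root means each unrooted topology is generated exactly once, so the search tree matches the one used for exhaustive search.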


Stepwise Addition (in a nutshell)

[Figure: a fourth taxon joined to each of the three branches of the tree for taxa 1, 2, and 3]

Searching for trees

Stepwise addition

A greedy stepwise-addition search applied to the example used for branch-and-bound. The best 4-taxon tree is determined by evaluating the lengths of the three trees obtained by joining taxon D to tree 1 containing only the first three taxa. Taxa E and F are then connected to the five and seven possible locations, respectively, on trees 4 and 9, with only the shortest trees found during each step being used for the next step. In this example, the 233-step tree obtained is not a global optimum. Circled numbers indicate the order in which phylogenetic trees are evaluated in the stepwise-addition search.
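A greedy search of this kind can be sketched in a few lines of Python. The sketch assumes unordered (Fitch) parsimony as the length measure, adds taxa in the order given ("as is"), and keeps a single shortest tree per step with ties broken arbitrarily; all function names and the example data are hypothetical.

```python
def insertions(subtree, taxon):
    """Yield every tree obtained by attaching `taxon` to one branch of `subtree`."""
    yield (subtree, taxon)
    if isinstance(subtree, tuple):
        left, right = subtree
        for new_left in insertions(left, taxon):
            yield (new_left, right)
        for new_right in insertions(right, taxon):
            yield (left, new_right)


def site_states(node, seqs, site):
    """Fitch pass for one site: return (state set, number of changes)."""
    if not isinstance(node, tuple):
        return {seqs[node][site]}, 0
    left_set, left_steps = site_states(node[0], seqs, site)
    right_set, right_steps = site_states(node[1], seqs, site)
    shared = left_set & right_set
    if shared:
        return shared, left_steps + right_steps
    return left_set | right_set, left_steps + right_steps + 1


def tree_length(tree, seqs):
    return sum(site_states(tree, seqs, j)[1] for j in range(len(seqs[0])))


def stepwise_addition(seqs):
    """Greedy search: place each taxon where it gives the shortest tree so far."""
    tree = (0, (1, 2))  # the single 3-taxon tree
    for taxon in range(3, len(seqs)):
        outgroup, rest = tree
        candidates = [(outgroup, r) for r in insertions(rest, taxon)]
        # 3, 5, 7, ... candidate positions at successive steps
        tree = min(candidates, key=lambda t: tree_length(t, seqs))
    return tree, tree_length(tree, seqs)


# Hypothetical aligned sequences, one string per taxon.
example = ["AACGT", "AACGA", "CGCGT", "CGTTA", "CGTTT", "AGTTA"]
print(stepwise_addition(example))
```

Because earlier placements are never revisited, the returned length is only an upper bound on the optimal length, which is why the result is usually refined by branch swapping.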


Stepwise Addition Variants

• As Is

– add in the order found in the matrix

• Closest

– add the unplaced taxon that requires the smallest increase in tree length

• Furthest

– add the unplaced taxon that requires the largest increase in tree length

• Simple

– Farris’s (1970) “simple algorithm” uses a set of pairwise reference distances

• Random

– a random permutation of the taxa is used to select the order

Branch swapping: Nearest Neighbor Interchange (NNI)

[Figure: a five-taxon tree (A to E) and the rearrangements obtained from it by nearest-neighbor interchange]

Branch swapping: Subtree Pruning and Regrafting (SPR)

[Figure: a seven-taxon tree (A to G) from which a subtree is pruned and then regrafted elsewhere on the tree]

Branch swapping: Tree Bisection and Reconnection (TBR)

[Figure: a seven-taxon tree (A to G) bisected into two subtrees, followed by several ways of reconnecting them]

Reconnection limits in TBR

[Figure: six-taxon example (taxa 1 to 6, branches labeled r to z) showing TBR rearrangements with reconnection distances of 0, 1, and 2]

Reconnection limits in TBR

[Figure: second six-taxon example showing TBR rearrangements with reconnection distances of 0 and 1]

In PAUP*, use “ReconLim” to set the maximum reconnection distance

Star-decomposition search


Overview of maximum likelihood as used in phylogenetics

• Overall goal: Find a tree topology (and associated parameter estimates) that maximizes the probability of obtaining the observed data, given a model of evolution

Likelihood(hypothesis) ∝ Prob(data|hypothesis)

Likelihood(tree,model) = k Prob(observed sequences|tree,model)

[not Prob(tree|data,model)]


Computing the likelihood of a single tree

     1         j         N
(1)  C…GGACA…C…GTTTA…C
(2)  C…AGACA…C…CTCTA…C
(3)  C…GGATA…A…GTTAA…C
(4)  C…GGATA…G…CCTAG…C

[Figure: tree with tips (1) to (4) carrying the states C, C, A, G observed at site j, joined through internal nodes (5) and (6)]

Computing the likelihood of a single tree

Likelihood at site j =

Prob(tips C, C, A, G with internal states A, A) + Prob(tips C, C, A, G with internal states A, C) + … + Prob(tips C, C, A, G with internal states T, T)

But use Felsenstein (1981) pruning algorithm
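The pruning algorithm avoids enumerating every combination of internal states by passing conditional likelihoods from the tips toward the root. The Python sketch below is a minimal illustration under several assumptions that are not on the slide: the Jukes-Cantor model, a tree in which tips 1 and 2 share one internal node and tips 3 and 4 share the other, a root placed on the internal branch, and made-up branch lengths. The names `jc_prob`, `conditional_likelihoods`, and `site_likelihood` are hypothetical.

```python
import math

BASES = "ACGT"


def jc_prob(i, j, v):
    """Jukes-Cantor probability of base i becoming base j over branch length v."""
    e = math.exp(-4.0 * v / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e


def conditional_likelihoods(node, site, seqs, lengths):
    """Likelihood of the data below `node`, for each possible state at `node`."""
    if isinstance(node, str):  # a tip: only the observed base is possible
        return {x: 1.0 if seqs[node][site] == x else 0.0 for x in BASES}
    result = {x: 1.0 for x in BASES}
    for child in node:
        below = conditional_likelihoods(child, site, seqs, lengths)
        v = lengths[child]  # length of the branch leading to this child
        for x in BASES:
            result[x] *= sum(jc_prob(x, y, v) * below[y] for y in BASES)
    return result


def site_likelihood(tree, site, seqs, lengths):
    """Sum over root states, weighted by equal base frequencies (JC)."""
    root = conditional_likelihoods(tree, site, seqs, lengths)
    return sum(0.25 * root[x] for x in BASES)


# Site j from the slide (C, C, A, G); topology and branch lengths are assumed.
left, right = ("t1", "t2"), ("t3", "t4")
tree = (left, right)
seqs = {"t1": "C", "t2": "C", "t3": "A", "t4": "G"}
lengths = {"t1": 0.1, "t2": 0.2, "t3": 0.1, "t4": 0.3, left: 0.05, right: 0.05}
print(site_likelihood(tree, 0, seqs, lengths))
```

Each internal node is visited once, so the work grows linearly with the number of taxa rather than exponentially with the number of internal nodes.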


Computing the likelihood of a single tree

\[ L = L_1 L_2 \cdots L_N = \prod_{j=1}^{N} L_j \]

\[ \ln L = \ln L_1 + \ln L_2 + \cdots + \ln L_N = \sum_{j=1}^{N} \ln L_j \]

Note: PAUP* reports -ln L, so lower -ln L implies higher likelihood

Finding the maximum-likelihood tree (in principle)

• Evaluate the likelihood of each possible tree for a given collection of taxa.

• Choose the tree topology which maximizes the likelihood over all possible trees.

Probability calculations require…

• An explicit model of substitution that specifies change probabilities for a given branch length

“Instantaneous rate matrix” models: Jukes-Cantor; Kimura 2-parameter; Hasegawa-Kishino-Yano (HKY); Felsenstein 1981, 1984; General time-reversible

\[
Q = \begin{pmatrix}
\pi_A r_{AA} & \pi_C r_{AC} & \pi_G r_{AG} & \pi_T r_{AT} \\
\pi_A r_{CA} & \pi_C r_{CC} & \pi_G r_{CG} & \pi_T r_{CT} \\
\pi_A r_{GA} & \pi_C r_{GC} & \pi_G r_{GG} & \pi_T r_{GT} \\
\pi_A r_{TA} & \pi_C r_{TC} & \pi_G r_{TG} & \pi_T r_{TT}
\end{pmatrix}
\]

\[ P(\nu) = e^{Q\nu} \]

• An estimate of optimal branch lengths in units of expected amount of change (ν = rate × time)
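To make the step from a rate matrix to change probabilities concrete, here is a small pure-Python sketch. It rests on assumptions beyond the slide: off-diagonal entries are π_j r_ij, each diagonal entry is set to the negative sum of the rest of its row (the usual convention), Q is rescaled so that branch length is the expected number of substitutions per site, and the exponential is computed by a truncated Taylor series with scaling and squaring. The function names are hypothetical.

```python
import math


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(m, terms=12, squarings=10):
    """exp(m) by Taylor series on m / 2**squarings, then repeated squaring."""
    n = len(m)
    scaled = [[x / 2 ** squarings for x in row] for row in m]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, scaled)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def rate_matrix(freqs, rates):
    """Q with off-diagonals pi_j * r_ij and rows summing to zero, scaled so that
    the expected rate of change at equilibrium is one per unit branch length."""
    n = len(freqs)
    q = [[freqs[j] * rates[i][j] if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    for i in range(n):
        q[i][i] = -sum(q[i])
    scale = -sum(freqs[i] * q[i][i] for i in range(n))
    return [[x / scale for x in row] for row in q]


def transition_probs(q, v):
    """P(v) = exp(Q v) for branch length v."""
    return mat_exp([[x * v for x in row] for row in q])


# Jukes-Cantor as a special case: equal frequencies, equal rates.
jc = rate_matrix([0.25] * 4, [[1.0] * 4 for _ in range(4)])
p = transition_probs(jc, 0.3)
print(p[0][0], 0.25 + 0.75 * math.exp(-0.4))  # numerical vs. closed form
```

For Jukes-Cantor the result can be compared with the closed-form probabilities, which gives a quick sanity check on the numerical exponential.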


For example:

Jukes-Cantor (1969)

\[
Q = \begin{pmatrix}
- & \alpha & \alpha & \alpha \\
\alpha & - & \alpha & \alpha \\
\alpha & \alpha & - & \alpha \\
\alpha & \alpha & \alpha & -
\end{pmatrix}
\]

Kimura (1980) “2-parameter”

\[
Q = \begin{pmatrix}
- & \beta & \alpha & \beta \\
\beta & - & \beta & \alpha \\
\alpha & \beta & - & \beta \\
\beta & \alpha & \beta & -
\end{pmatrix}
\]

Hasegawa-Kishino-Yano (1985)

\[
Q = \begin{pmatrix}
- & \pi_C \beta & \pi_G \alpha & \pi_T \beta \\
\pi_A \beta & - & \pi_G \beta & \pi_T \alpha \\
\pi_A \alpha & \pi_C \beta & - & \pi_T \beta \\
\pi_A \beta & \pi_C \alpha & \pi_G \beta & -
\end{pmatrix}
\]

General Time-Reversible

\[
Q = \begin{pmatrix}
\pi_A r_{AA} & \pi_C r_{AC} & \pi_G r_{AG} & \pi_T r_{AT} \\
\pi_A r_{CA} & \pi_C r_{CC} & \pi_G r_{CG} & \pi_T r_{CT} \\
\pi_A r_{GA} & \pi_C r_{GC} & \pi_G r_{GG} & \pi_T r_{GT} \\
\pi_A r_{TA} & \pi_C r_{TC} & \pi_G r_{TG} & \pi_T r_{TT}
\end{pmatrix}
\]

E.g., transition probabilities for HKY and F84:

\[
P_{ij}(t) =
\begin{cases}
\pi_j + \pi_j \left( \dfrac{1}{\Pi_j} - 1 \right) e^{-\mu\nu} + \left( \dfrac{\Pi_j - \pi_j}{\Pi_j} \right) e^{-\mu\nu A} & (i = j) \\[2ex]
\pi_j + \pi_j \left( \dfrac{1}{\Pi_j} - 1 \right) e^{-\mu\nu} - \left( \dfrac{\pi_j}{\Pi_j} \right) e^{-\mu\nu A} & (i \ne j,\ \text{transition}) \\[2ex]
\pi_j \left( 1 - e^{-\mu\nu} \right) & (i \ne j,\ \text{transversion})
\end{cases}
\]

A Family of Reversible Substitution Models

[Figure: hierarchy of reversible substitution models, from the most general (GTR) to the simplest (JC), with each step labeled by the restriction applied]

Models shown:

• GTR (general time-reversible)

• SYM

• TrN (Tamura-Nei)

• HKY85 (Hasegawa-Kishino-Yano)

• F84 (Felsenstein)

• F81 (Felsenstein)

• K3ST (Kimura 3-subst. type)

• K2P (Kimura 2-parameter)

• JC (Jukes-Cantor)

Restrictions labeling the steps between models:

• Equal base frequencies

• 3 substitution types (transversions, 2 transition classes)

• 3 substitution types (transitions, 2 transversion classes)

• 2 substitution types (transitions vs. transversions)

• Single substitution type

The Relevance of Branch Lengths

[Figure: the site pattern C C A A A A A A A A shown on two trees]

When does maximum likelihood work better than parsimony?

• When you’re in the “Felsenstein Zone”

[Figure: four-taxon tree (A, B, C, D)]

(Felsenstein, 1978)

In the Felsenstein Zone

Substitution rates:

     A   C   G   T
A    -   5   6   2
C    5   -   3   8
G    6   3   -   1
T    2   8   1   -

Base frequencies: A=0.1 C=0.2 G=0.3 T=0.4

[Figure: four-taxon tree (A, B, C, D) with two branches of length 0.8 and three branches of length 0.1]

In the Felsenstein Zone

[Figure: proportion correct (0 to 1) versus sequence length (0 to 10,000) for parsimony and ML-GTR]

The long-branch attraction (LBA) problem

The true phylogeny of 1, 2, 3 and 4

[Figure: the true four-taxon tree, drawn with taxa 1 and 4 at the top and taxa 2 and 3 at the bottom]

Pattern type (states at taxa 1, 2, 3, 4):

I = Uninformative (constant): A, A, A, A (zero changes required on any tree)

II = Uninformative: A, A, A, G (one change required on any tree)

III = Uninformative: C, A, A, G (two changes required on any tree)

IV = Misinformative: G, A, A, G (two changes required on true tree)

The long-branch attraction (LBA) problem

[Figure: alternative tree that groups taxa 1 and 4 (both G) apart from taxa 2 and 3 (both A)]

… but this tree needs only one step
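How often misinformative patterns arise depends on branch lengths, and this can be computed directly. The Python sketch below is a hedged illustration, assuming the Jukes-Cantor model, the true tree ((1,2),(3,4)), and taxa 1 and 4 on long branches; the branch-length values and function names are made up. It compares the total probability of patterns that favor the true tree (1 and 2 alike, 3 and 4 alike) with that of patterns that favor the wrong grouping of 1 with 4.

```python
import math

BASES = "ACGT"


def jc_prob(i, j, v):
    """Jukes-Cantor probability of base i becoming base j over branch length v."""
    e = math.exp(-4.0 * v / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e


def pattern_probability(states, long_branch, short_branch):
    """Probability of tip states (s1, s2, s3, s4) on the true tree ((1,2),(3,4)).

    Taxa 1 and 4 are on long branches; taxa 2 and 3 and the internal
    branch are short (an assumed layout).
    """
    s1, s2, s3, s4 = states
    total = 0.0
    for x in BASES:      # state at the node joining taxa 1 and 2
        for y in BASES:  # state at the node joining taxa 3 and 4
            total += (0.25 * jc_prob(x, y, short_branch)
                      * jc_prob(x, s1, long_branch) * jc_prob(x, s2, short_branch)
                      * jc_prob(y, s3, short_branch) * jc_prob(y, s4, long_branch))
    return total


def support(long_branch, short_branch):
    """Return (P of patterns favoring the true tree, P of patterns favoring 1+4)."""
    true_tree = wrong_tree = 0.0
    for a in BASES:
        for b in BASES:
            if a != b:
                true_tree += pattern_probability((a, a, b, b), long_branch, short_branch)
                wrong_tree += pattern_probability((a, b, b, a), long_branch, short_branch)
    return true_tree, wrong_tree


print(support(0.1, 0.1))   # all branches equal
print(support(5.0, 0.01))  # two very long branches, everything else short
```

When the second number exceeds the first, adding more sites only makes parsimony more certain of the wrong tree, which is the inconsistency the slides describe.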

Concerns about statistical properties and suitability of models (assumptions)

Consistency

If an estimator converges to the true value of a parameter as the amount of data increases toward infinity, the estimator is consistent.

When do both methods fail?

• When there is insufficient phylogenetic signal...

[Figure: four-taxon tree (taxa 1 to 4)]

When does parsimony work “better” than maximum likelihood?

• When you’re in the Inverse-Felsenstein (“Farris”) zone

[Figure: four-taxon tree (A, B, C, D)]

(Siddall, 1998)

Siddall (1998) parameter space

[Figure: four-taxon tree with branches labeled a, a, b, b, b, and a parameter space with axes pa and pb, each running from 0 to 0.75, divided into regions labeled “Both methods do poorly”, “Parsimony has higher accuracy than likelihood”, and “Both methods do well”]

Parsimony vs. likelihood in the Inverse-Felsenstein Zone

[Figure: accuracy (0 to 1) versus sequence length (20 to 100,000) for parsimony and ML/JC; the tree has branches of 15% and 67.5% (expected differences/site)]

Why does parsimony do so well in the Inverse-Felsenstein zone?

[Figure: four-taxon trees with tip states A, A, C, C and different reconstructed changes]

True synapomorphy

Apparent synapomorphies actually due to misinterpreted homoplasy

Parsimony vs. likelihood in the Felsenstein Zone

[Figure: accuracy (0 to 1) versus sequence length (20 to 100,000) for parsimony and ML/JC; the tree has branches of 15% and 67.5% (expected differences/site)]

From the Farris Zone to the Felsenstein Zone

[Figure: series of four-taxon trees (A, B, C, D) running from the Farris zone to the Felsenstein zone]

External branches = 0.5 or 0.05 substitutions/site, Jukes-Cantor model of nucleotide substitution

Simulation results:

[Figure: two panels, Parsimony and Likelihood (ML/JC), plotting accuracy (0 to 1.0) against the length of the internal branch (d), from 0.05 in the Farris zone through 0 to 0.05 in the Felsenstein zone, for 100, 1,000, and 10,000 sites]

Maximum likelihood models are oversimplifications of reality. If I assume the wrong model, won’t my results be meaningless?

• Not necessarily (maximum likelihood is pretty robust)

Model used for simulation...

Substitution rates:

     A   C   G   T
A    -   5   6   2
C    5   -   3   8
G    6   3   -   1
T    2   8   1   -

Base frequencies: A=0.1 C=0.2 G=0.3 T=0.4

[Figure: four-taxon tree (A, B, C, D) with two branches of length 0.8 and three branches of length 0.1]

Performance of ML when its model is violated (one example)

[Figure: values from 0 to 1 plotted against sequence length (100 to 10,000) for parsimony, ML-JC, ML-K2P, ML-HKY, and ML-GTR]

Among site rate heterogeneity

• Proportion of invariable sites

– Some sites don’t change due to strong functional or structural constraint (Hasegawa et al., 1985)

• Site-specific rates

– Different relative rates assumed for pre-assigned subsets of sites

• Gamma-distributed rates

– Rate variation assumed to follow a gamma distribution with shape parameter α
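The effect of the shape parameter can be seen by evaluating the gamma density directly. The sketch below assumes the common convention that the distribution is scaled to have mean 1 (scale = 1/α), which the slide does not state; the function name is hypothetical.

```python
import math


def gamma_rate_density(rate, alpha):
    """Density at `rate` of a gamma distribution with shape alpha and mean 1."""
    if rate <= 0:
        return 0.0
    # Work in logs so that large alpha values do not overflow.
    log_density = (alpha * math.log(alpha) + (alpha - 1.0) * math.log(rate)
                   - alpha * rate - math.lgamma(alpha))
    return math.exp(log_density)


for alpha in (0.5, 2, 50, 200):
    values = [gamma_rate_density(r, alpha) for r in (0.1, 0.5, 1.0, 1.5)]
    print(alpha, [round(v, 4) for v in values])
```

Small shape values put most sites at very low rates with a few fast sites, while large values concentrate all sites near the mean rate, approaching the equal-rates case.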

Lemur  AAGCTTCATAG TTGCATCATCCA …TTACATCATCCA
Homo   AAGCTTCACCG TTGCATCATCCA …TTACATCCTCAT
Pan    AAGCTTCACCG TTACGCCATCCA …TTACATCCTCAT
Goril  AAGCTTCACCG TTACGCCATCCA …CCCACGGACTTA
Pongo  AAGCTTCACCG TTACGCCATCCT …GCAACCACCCTC
Hylo   AAGCTTTACAG TTACATTATCCG …TGCAACCGTCCT
Maca   AAGCTTTTCCG TTACATTATCCG …CGCAACCATCCT

equal rates?

Modeling among-site rate variation with a gamma distribution...

[Figure: frequency versus rate (0 to 2) for gamma distributions with α = 0.5, α = 2, α = 50, and α = 200]

…can also estimate a proportion of “invariable” sites (pinv)

Performance of ML when its model is violated (another example)

[Figure: panels plotting proportion correct (0 to 1) against sequence length (100 to 100,000) for GTRig, HKYig, GTRg, HKYg, GTRi, HKYi, GTRer, HKYer, and parsimony. Panel labels: α = 0.5, pinv = 0.5; α = 1.0, pinv = 0.5; α = 1.0, pinv = 0.2]

“MODERATE”–Felsenstein zone

α = 1.0, pinv = 0.5

[Figure: accuracy plot (vertical axis 0 to 1, horizontal axis 100 to 100,000) comparing JCer, JC+G, JC+I, JC+I+G, GTRer, GTR+G, GTR+I, GTR+I+G, and parsimony]

“MODERATE”–Inverse-Felsenstein zone

[Figure: accuracy plot (vertical axis 0 to 1, horizontal axis 100 to 100,000) comparing JCer, JC+G, JC+I, JC+I+G, GTRer, GTR+G, GTR+I, GTR+I+G, and parsimony]

Bayesian Inference in Phylogenetics

• Uses Bayes formula:

\[ \Pr(\theta \mid D) = \frac{\Pr(D \mid \theta)\,\Pr(\theta)}{\Pr(D)} \propto \Pr(D \mid \theta)\,\Pr(\theta) \propto L(\theta)\,\Pr(\theta) \]

(θ = tree topology, branch lengths, and substitution-model parameters)

• Calculation involves integrating over all tree topologies and model-parameter values, subject to assumed prior distribution on parameters

Bayesian Inference in Phylogenetics

• To approximate this posterior density (complicated multidimensional integral) we use Markov chain Monte Carlo (MCMC)

– Simulated Markov chain in which transition probabilities are assigned such that the stationary distribution of the chain is the posterior density of interest

– E.g., Metropolis-Hastings algorithm: Accept a proposed move from one state θ to another state θ* with probability min(r,1) where

\[ r = \frac{\Pr(\theta^{*} \mid D)\,\Pr(\theta \mid \theta^{*})}{\Pr(\theta \mid D)\,\Pr(\theta^{*} \mid \theta)} \]

– Sample chain at regular intervals to approximate posterior distribution

• MrBayes (by John Huelsenbeck and Fredrik Ronquist) is the most popular Bayesian inference program

A brief intro to Markov chain Monte Carlo (MCMC)

1. Initialize the chain, e.g., by picking a random state X0 (topology, branch lengths, substitution-model parameters) from the assumed prior distribution

2. For each time t, sample a new candidate state Y from some proposal distribution q(.|Xt) (e.g., change branch lengths or topology plus branch lengths)

Calculate acceptance probability

\[ \alpha(X,Y) = \min\left(1, \frac{\Pr(Y \mid D)\, q(X \mid Y)}{\Pr(X \mid D)\, q(Y \mid X)}\right) = \min\left(1, \frac{\pi(Y)}{\pi(X)} \times \frac{\Pr(D \mid Y)}{\Pr(D \mid X)} \times \frac{q(X \mid Y)}{q(Y \mid X)}\right) \]

3. If Y is accepted, let Xt+1 = Y; otherwise let Xt+1 = Xt

If the chain is run “long enough”, the stationary distribution of states in the chain will represent a good approximation to the target distribution (in this case, the Bayesian posterior)

[Figure: likelihood plotted against iterations, with the initial “burn in” period marked, as the chain moves among the four-taxon topologies AB|CD, BC|AD, and AC|BD]
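The three steps can be sketched on a toy problem in Python. Everything specific here is an assumption for illustration: the state space is just the three four-taxon topologies, the unnormalized posterior values are invented, and the proposal picks one of the other two topologies uniformly, so it is symmetric and the proposal ratio cancels. The function name is hypothetical, and real programs such as MrBayes also update branch lengths and model parameters.

```python
import random


def metropolis_hastings(posterior, start, n_steps, seed=0):
    """Sample states in proportion to `posterior` (dict: state -> unnormalized density)."""
    rng = random.Random(seed)
    states = list(posterior)
    current = start
    chain = []
    for _ in range(n_steps):
        # Step 2: propose a candidate; the proposal is symmetric, q(X|Y) = q(Y|X).
        candidate = rng.choice([s for s in states if s != current])
        ratio = posterior[candidate] / posterior[current]
        # Step 3: accept with probability min(1, ratio), else stay put.
        if rng.random() < min(1.0, ratio):
            current = candidate
        chain.append(current)
    return chain


# Invented unnormalized posterior over the three topologies.
target = {"AB|CD": 6.0, "AC|BD": 3.0, "BC|AD": 1.0}
chain = metropolis_hastings(target, start="BC|AD", n_steps=60000, seed=1)
kept = chain[10000:]  # discard the burn-in portion
for topology in target:
    print(topology, round(kept.count(topology) / len(kept), 3))
```

The sampled frequencies approximate the normalized posterior (0.6, 0.3, 0.1 for these invented values) without the normalizing constant Pr(D) ever being computed.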


Model-based distances

• Can also calculate pairwise distances based on these models

• These distances estimate the number of substitutions per site that have accumulated since the two sequences shared a common ancestor, allowing for superimposed substitutions (“multiple hits”)

• E.g.:

– Jukes-Cantor distance

– Kimura 2-parameter distance

– General maximum-likelihood distances available for other models
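The first two distances have closed forms, sketched below in Python. The sketch assumes equal-length, ungapped sequences containing only A, C, G, and T, and uses the standard formulas d = -(3/4) ln(1 - 4p/3) for Jukes-Cantor and d = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)] for Kimura 2-parameter, where p is the proportion of differing sites and P and Q are the proportions of transitions and transversions. The function names are hypothetical; the example sequences are the first block of Lemur and Homo from the alignment shown earlier.

```python
import math

PURINES = {"A", "G"}


def jukes_cantor_distance(seq1, seq2):
    """Substitutions per site, corrected for multiple hits under Jukes-Cantor."""
    p = sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def kimura_2p_distance(seq1, seq2):
    """Substitutions per site under the Kimura 2-parameter model."""
    n = len(seq1)
    transitions = sum(a != b and (a in PURINES) == (b in PURINES)
                      for a, b in zip(seq1, seq2))
    transversions = sum((a in PURINES) != (b in PURINES)
                        for a, b in zip(seq1, seq2))
    P, Q = transitions / n, transversions / n
    return -0.5 * math.log((1.0 - 2.0 * P - Q) * math.sqrt(1.0 - 2.0 * Q))


lemur, homo = "AAGCTTCATAG", "AAGCTTCACCG"
print(2 / 11, jukes_cantor_distance(lemur, homo), kimura_2p_distance(lemur, homo))
```

Both corrected distances exceed the raw proportion of differences, reflecting the substitutions hidden by multiple hits; the formulas fail (logarithm of a non-positive number) when sequences are too divergent.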


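Distance-based optimality criteria: “Additive trees”

[Figure: four-taxon tree with taxa 1 and 2 on one side and taxa 3 and 4 on the other; branches a, b, d, and e lead to taxa 1, 2, 3, and 4, and c is the internal branch]

Observed distances:

     1    2    3    4
1    -
2    d12  -
3    d13  d23  -
4    d14  d24  d34  -

Path-length distances:

p12 = a+b
p13 = a+c+d
p14 = a+c+e
p23 = b+c+d
p24 = b+c+e
p34 = d+e

pij = dij for all i and j if the tree topology is correct and distances are additive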
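The path-length equations translate directly into code. The Python sketch below computes the six path lengths from branch lengths a to e and then applies the four-point condition, a standard property of additive distances that is not on the slide: of the three pairwise sums, the two largest are equal and exceed the smallest by twice the internal branch. The function names and branch-length values are illustrative.

```python
def path_distances(a, b, c, d, e):
    """Path lengths on the tree with taxa 1, 2 on one side and 3, 4 on the other.

    a, b, d, e are the branches leading to taxa 1, 2, 3, 4; c is the internal branch.
    """
    return {
        (1, 2): a + b,
        (1, 3): a + c + d,
        (1, 4): a + c + e,
        (2, 3): b + c + d,
        (2, 4): b + c + e,
        (3, 4): d + e,
    }


def four_point_sums(dist):
    """The three ways of pairing four taxa, as sums of two distances each."""
    return (dist[1, 2] + dist[3, 4],
            dist[1, 3] + dist[2, 4],
            dist[1, 4] + dist[2, 3])


p = path_distances(a=0.1, b=0.2, c=0.05, d=0.3, e=0.15)
print(p)
print(four_point_sums(p))  # smallest sum identifies the split 12|34
```

With real data the observed distances rarely satisfy this condition exactly, which is why the slides turn next to optimality criteria.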


Distances in general will not be additive, so choose optimal tree according to one of the following criteria (objective functions):

“Goodness-of-fit”: minimize

\[ \sum_{i<j} w_{ij} \, \lvert p_{ij} - d_{ij} \rvert^{r} \]

Typically, r = 2 (least-squares) and w_ij = 1/d_ij^2 (“Fitch-Margoliash” method)

“Minimum-evolution”: minimize

\[ \sum_{k=1}^{\#\text{branches}} v_k \quad \text{or} \quad \sum_{k=1}^{\#\text{branches}} \lvert v_k \rvert \]

Distance-based optimality criteria: Minimum evolution and least-squares

[Figure: tree for Pongo, Lemur catta, Pan, Homo sapiens, and Gorilla with least-squares (LS) branch lengths 0.044, 0.085, 0.286, 0.015, 0.050, 0.045, and 0.050]

Least-Squares

pij      dij      SS
0.39646  0.39021  0.000039
0.39838  0.39602  0.000006
0.09506  0.09507  0.000000
0.37222  0.38084  0.000074
0.11172  0.11011  0.000003
0.11431  0.11592  0.000003
0.37096  0.37096  0.000000
0.18107  0.18894  0.000062
0.19399  0.19475  0.000001
0.18820  0.17958  0.000074
Total             0.000261

Minimum evolution (ME)

LS branch lengths
0.28611
0.04436
0.01511
0.04463
0.05044
0.05038
0.08485
Sum  0.57588
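Both scores on this slide can be reproduced from the listed numbers. The Python sketch below assumes the unweighted case (w_ij = 1, r = 2) for the sum of squares, since that is what matches the slide's total; the taxon pair belonging to each row is not recoverable from the slide, so the rows are kept in their listed order. The function name is hypothetical.

```python
# (p_ij, d_ij) pairs in the order listed on the slide.
pairs = [
    (0.39646, 0.39021), (0.39838, 0.39602), (0.09506, 0.09507),
    (0.37222, 0.38084), (0.11172, 0.11011), (0.11431, 0.11592),
    (0.37096, 0.37096), (0.18107, 0.18894), (0.19399, 0.19475),
    (0.18820, 0.17958),
]
# Least-squares branch lengths listed on the slide.
branch_lengths = [0.28611, 0.04436, 0.01511, 0.04463, 0.05044, 0.05038, 0.08485]


def goodness_of_fit(pairs, weight=lambda d: 1.0, power=2):
    """Sum over pairs of w * |p - d| ** power."""
    return sum(weight(d) * abs(p - d) ** power for p, d in pairs)


least_squares = goodness_of_fit(pairs)                       # unweighted
fitch_margoliash = goodness_of_fit(pairs, lambda d: 1.0 / d ** 2)
minimum_evolution = sum(branch_lengths)                      # sum of branch lengths

print(round(least_squares, 6), round(minimum_evolution, 5))
print(fitch_margoliash)
```

The same function covers the Fitch-Margoliash weighting by passing a different weight, which shows how the criteria differ only in how each pair's misfit is scaled.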