Ultrametric phylogenies
By Sivan Yogev
Based on Chapter 11 of “Inferring Phylogenies” by J. Felsenstein
9/1/2005


Introduction – additive trees

In the last lecture we saw the concept of distance based phylogenetic trees

d(i,j) is the distance between the objects indexed i and j

In particular, we discussed additive sets, in which:

For each i: d(i,i) = 0, and for each j ≠ i: d(i,j) > 0
For each i,j: d(i,j) = d(j,i)
For each i,j,k: d(i,k) ≤ d(i,j) + d(j,k) [triangle inequality]
Any subset of four objects can be labelled i,j,k,l such that d(i,j) + d(k,l) ≤ d(i,l) + d(j,k) = d(i,k) + d(j,l) [four points condition]

An additive set defines a tree. Every tree defines an additive distance matrix between its leaves
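The four points condition can be tested directly by brute force. The following is a minimal sketch, not an optimized algorithm: the function name `is_additive`, the tolerance `tol` and the small example matrix `tree_d` are assumptions made for illustration. It checks only the four points condition, not the other metric conditions.

```python
from itertools import combinations

def is_additive(d, tol=1e-9):
    """Check the four points condition on a symmetric distance matrix d.

    For every four objects, the two largest of the three pairwise sums
    must be equal (up to the tolerance tol).
    """
    n = len(d)
    for i, j, k, l in combinations(range(n), 4):
        sums = sorted([d[i][j] + d[k][l],
                       d[i][k] + d[j][l],
                       d[i][l] + d[j][k]])
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True

# Hypothetical tree: leaves a, b hang from one internal node with branch
# lengths 1 and 2; leaves c, d hang from another with lengths 4 and 5;
# the internal edge has length 3.
tree_d = [
    [0, 3, 8, 9],
    [3, 0, 9, 10],
    [8, 9, 0, 9],
    [9, 10, 9, 0],
]

# A matrix that violates the condition: the two largest sums are 7 and 6.
not_additive = [
    [0, 1, 2, 3],
    [1, 0, 3, 3],
    [2, 3, 0, 3],
    [3, 3, 3, 0],
]

print(is_additive(tree_d), is_additive(not_additive))
```

This enumerates all quadruples, so it runs in O(n^4) time and is meant only to make the definition concrete.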


Molecular clocks

Let us assume that “stable” mutations in the genome occur uniformly over long time periods

This defines a “molecular clock” – each mutation stands for a constant period of time

We can therefore approximate the time since any two taxa diverged from their last common ancestor by the number of differences between the genomes in conserved regions
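As a rough illustration of counting differences, the sketch below compares two aligned sequences position by position. The function name `count_differences` and the two toy sequences are hypothetical; real data would first need alignment and restriction to conserved regions, which this sketch does not attempt.

```python
def count_differences(seq_a, seq_b):
    """Count positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))

# Toy sequences, invented for illustration only.
taxon_1 = "ACGTACGT"
taxon_2 = "ACGTTCGA"

print(count_differences(taxon_1, taxon_2))
```

Under the molecular clock assumption, a count like this would be taken as proportional to the time since the two taxa diverged.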


Ultrametric trees

Given a group of taxa with distances, if we assume the “molecular clock” model and wish to find the evolutionary tree, the number of mutations from the last common ancestor to every taxon should be similar

This means that the distance from the root of the evolutionary tree to each leaf is the same

Such a tree is called an Ultrametric tree


Ultrametric trees (cont.)

If we have a set of objects with a distance between them, we want to know if this set is ultrametric

For ultrametric sets, these conditions hold:

For each i: d(i,i) = 0, and for each j ≠ i: d(i,j) > 0
For each i,j: d(i,j) = d(j,i)
For each i,j,k: d(i,k) ≤ max{d(i,j), d(j,k)} [ultrametric condition]

The last condition can be replaced by this one:

Any subset of three objects can be labelled i,j,k such that d(i,j) ≤ d(j,k) = d(i,k)
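A short derivation of why the two forms of the condition agree, written as a LaTeX sketch. The symbols a, b, c are introduced here only for illustration and stand for the three pairwise distances of a triplet, sorted by size.

```latex
Let $a \le b \le c$ be the three pairwise distances among $i, j, k$.

% Ultrametric condition implies the three-object condition:
% apply it to the largest distance c, whose other two sides are a and b.
\[
  c \le \max\{a, b\} = b \quad\Longrightarrow\quad b = c .
\]

% Three-object condition implies the ultrametric condition:
% if a \le b = c, each distance is at most the maximum of the other two.
\[
  a \le \max\{b, c\}, \qquad
  b \le \max\{a, c\} = c, \qquad
  c \le \max\{a, b\} = b .
\]
```

In words: in every triplet the two largest distances are equal.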


Ultrametric trees (cont.)

An ultrametric set is also additive

The converse is not always true

[Figure: nested sets. Ultrametric matrices lie inside additive matrices, which lie inside distance matrices]


Ultrametric decision

Given a set of n objects with distances, we want to determine if the set is ultrametric

The naïve approach: go over all triplets, and check if the ultrametric condition holds

Complexity: O(n^3)

More efficient algorithms exist (Gusfield gives a simple O(n^2 log n) algorithm and a more sophisticated O(n^2) algorithm, with partial proofs)
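The naïve triplet check can be sketched as follows. This is the O(n^3) approach only, not either of Gusfield's faster algorithms; the function name `is_ultrametric`, the tolerance `tol` and the two 3x3 example matrices are assumptions made for illustration.

```python
from itertools import combinations

def is_ultrametric(d, tol=1e-9):
    """Naive check: in every triplet the two largest distances are equal."""
    n = len(d)
    for i, j, k in combinations(range(n), 3):
        smallest, middle, largest = sorted([d[i][j], d[i][k], d[j][k]])
        if abs(largest - middle) > tol:
            return False
    return True

# Hypothetical examples: in the first matrix the two largest distances
# of the only triplet are both 6; in the second they are 5 and 6.
ultra = [[0, 2, 6],
         [2, 0, 6],
         [6, 6, 0]]
not_ultra = [[0, 2, 5],
             [2, 0, 6],
             [5, 6, 0]]

print(is_ultrametric(ultra), is_ultrametric(not_ultra))
```

The function assumes the matrix is already symmetric with a zero diagonal; those conditions would need a separate O(n^2) pass.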


Approximations

However, for most biological data there is no accurate “ultrametric solution”

This means that some heuristic is needed

The most popular method is UPGMA, which stands for Unweighted Pair Group Method using Arithmetic mean

Introduced by Sokal and Michener (1958)


UPGMA

Input: A set of n objects, with a distance between every two objects

Output: an ultrametric tree with the given objects as leaves

The main data structures used by the algorithm are a graph G=(V,E) which contains trees with the objects as leaves, and a distance matrix between each two roots of trees in the graph


UPGMA (cont.)

Initialization: Each object in a separate tree, distance by input

We will use an example of 5 mammal species:

         Bear  Raccoon  Weasel  Seal  Dog
Bear        0       26      34    29   32
Raccoon    26        0      42    44   48
Weasel     34       42       0    44   51
Seal       29       44      44     0   50
Dog        32       48      51    50    0

[Figure: the initial forest, with Bear, Raccoon, Weasel, Seal and Dog each as a separate single-leaf tree]


UPGMA (cont.)

We iterate until there is only one tree

At each iteration we perform:

Find the two trees x and y with minimal distance d(x,y)

Add a new node, and connect the roots of x and y to this node. The result is a new tree z. The height of the root of z is d(x,y)/2

Compute the distance between z and the other remaining trees (without x and y)
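The three steps of an iteration can be sketched in Python as follows. This is a minimal illustration of the simple variant that rescans all distances in every iteration, not a reference implementation. The function name `upgma`, the nested-tuple tree representation and the dictionary keyed by pairs of root ids are all assumptions of this sketch; the distance to the new tree is computed as the average weighted by leaf counts.

```python
def upgma(names, dist):
    """Build an ultrametric tree from a symmetric distance matrix.

    Leaves are name strings; an internal node is a tuple
    (left_subtree, right_subtree, height).
    """
    n = len(names)
    # The forest: root id -> (subtree, number of leaves)
    forest = {i: (names[i], 1) for i in range(n)}
    # Distances between current roots, keyed by an unordered pair of ids
    D = {frozenset((i, j)): float(dist[i][j])
         for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(forest) > 1:
        # Step 1: find the two trees x and y with minimal distance
        pair = min(D, key=D.get)
        x, y = tuple(pair)
        d_xy = D.pop(pair)
        tree_x, n_x = forest.pop(x)
        tree_y, n_y = forest.pop(y)
        # Step 3 (done before inserting z): distance from the new tree z
        # to every remaining tree t, weighted by the leaf counts
        z = next_id
        next_id += 1
        for t in forest:
            d_xt = D.pop(frozenset((x, t)))
            d_yt = D.pop(frozenset((y, t)))
            D[frozenset((z, t))] = (n_x * d_xt + n_y * d_yt) / (n_x + n_y)
        # Step 2: the new root z joins x and y at height d(x,y)/2
        forest[z] = ((tree_x, tree_y, d_xy / 2), n_x + n_y)
    tree, _ = next(iter(forest.values()))
    return tree

names = ["Bear", "Raccoon", "Weasel", "Seal", "Dog"]
dist = [
    [0, 26, 34, 29, 32],
    [26, 0, 42, 44, 48],
    [34, 42, 0, 44, 51],
    [29, 44, 44, 0, 50],
    [32, 48, 51, 50, 0],
]
tree = upgma(names, dist)
print(tree)
```

Ties for the minimal distance are broken arbitrarily here; the mammal example has no ties at any iteration.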


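UPGMA (cont.)

First iteration:

The minimal distance in the initial matrix is d(Bear, Raccoon) = 26, so Bear and Raccoon are joined under a new node BR at height 26/2 = 13

[Figure: the forest after the first iteration. Leaves: Bear, Raccoon, Weasel, Seal, Dog. Node BR joins Bear and Raccoon, each by a branch of length 13]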


UPGMA (cont.)

Update computation: denote the number of leaves in the tree x by n_x (so n_z = n_x + n_y), then for each t ≠ x,y we set:

$$D(z,t) = \frac{n_x}{n_z} D(x,t) + \frac{n_y}{n_z} D(y,t)$$

Joining Bear and Raccoon into BR turns the initial matrix into:

          BR  Weasel  Seal  Dog
BR         0      38  36.5   40
Weasel    38       0    44   51
Seal    36.5      44     0   50
Dog       40      51    50    0
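As a worked check of the BR row, the following LaTeX block applies the size-weighted average with Bear and Raccoon each contributing one leaf, so both weights are 1/2. The input values are the Bear and Raccoon rows of the initial matrix.

```latex
\begin{align*}
D(\mathrm{BR}, \mathrm{Weasel}) &= \tfrac{1}{2}\cdot 34 + \tfrac{1}{2}\cdot 42 = 38 \\
D(\mathrm{BR}, \mathrm{Seal})   &= \tfrac{1}{2}\cdot 29 + \tfrac{1}{2}\cdot 44 = 36.5 \\
D(\mathrm{BR}, \mathrm{Dog})    &= \tfrac{1}{2}\cdot 32 + \tfrac{1}{2}\cdot 48 = 40
\end{align*}
```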


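UPGMA (cont.)

Second iteration:

The minimal distance in the matrix over BR, Weasel, Seal and Dog is d(BR, Seal) = 36.5, so BR and Seal are joined under a new node BRS at height 36.5/2 = 18.25

[Figure: the forest after the second iteration. Node BRS joins BR and Seal. The Seal branch has length 18.25 and the branch from BR to BRS has length 18.25-13=5.25]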


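UPGMA (cont.)

Third iteration:

         BRS  Weasel   Dog
BRS        0      40  43.3
Weasel    40       0    51
Dog     43.3      51     0

The minimal distance is d(BRS, Weasel) = 40, so BRS and Weasel are joined under a new node BRSW at height 40/2 = 20

[Figure: the forest after the third iteration. Node BRSW joins BRS and Weasel. The Weasel branch has length 20 and the branch from BRS to BRSW has length 20-18.25=1.75]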
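The BRS row of this matrix is where the leaf counts first matter: BR has two leaves and Seal has one, so the weights are 2/3 and 1/3. A worked check in LaTeX, using the distances from BR and from Seal to Weasel and Dog:

```latex
\begin{align*}
D(\mathrm{BRS}, \mathrm{Weasel}) &= \tfrac{2}{3}\cdot 38 + \tfrac{1}{3}\cdot 44 = \tfrac{120}{3} = 40 \\
D(\mathrm{BRS}, \mathrm{Dog})    &= \tfrac{2}{3}\cdot 40 + \tfrac{1}{3}\cdot 50 = \tfrac{130}{3} \approx 43.3
\end{align*}
```

Because of these weights, each entry equals the plain arithmetic mean of the original distances between the leaves of the two trees, for example (34 + 42 + 44)/3 = 40 for BRS and Weasel.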


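UPGMA (cont.)

Fourth (and last) iteration:

        BRSW    Dog
BRSW       0  45.25
Dog    45.25      0

The only remaining distance is d(BRSW, Dog) = 45.25, so BRSW and Dog are joined under the root BRSWD at height 45.25/2 = 22.625

[Figure: the final tree. Root BRSWD joins BRSW and Dog. The Dog branch has length 22.625 and the branch from BRSW to BRSWD has length 22.625-20=2.625]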
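Two worked checks in LaTeX: the last matrix entry, obtained with weights 3/4 and 1/4 because BRS has three leaves and Weasel has one, and the sum of branch lengths from the root to each leaf, which should be equal if the tree is ultrametric.

```latex
\begin{align*}
D(\mathrm{BRSW}, \mathrm{Dog}) &= \tfrac{3}{4}\cdot\tfrac{130}{3} + \tfrac{1}{4}\cdot 51 = \tfrac{181}{4} = 45.25 \\[4pt]
\text{Bear, Raccoon:} \quad & 13 + 5.25 + 1.75 + 2.625 = 22.625 \\
\text{Seal:} \quad & 18.25 + 1.75 + 2.625 = 22.625 \\
\text{Weasel:} \quad & 20 + 2.625 = 22.625 \\
\text{Dog:} \quad & 22.625
\end{align*}
```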


UPGMA - complexity

A simple implementation takes n-1 iterations, where in each iteration we find the minimal distance in O(n^2), with total complexity of O(n^3)

We can keep a list of the smallest distance in each row. This way it takes O(n) to find the minimal distance, while updating the list is also O(n) at each iteration. Therefore, the total complexity is O(n^2).


Ultrametric evaluation

UPGMA gives us an ultrametric tree

Is this tree the best possible?

Depends on how we measure the quality of an approximated tree for a given matrix

Let U(i,j) be the distance in the ultrametric tree U between the objects indexed i and j

The L∞ norm is defined by:

$$L_\infty = \max_{i,j} |d(i,j) - U(i,j)|$$
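To make the definition concrete, the sketch below evaluates the L∞ norm for the mammal example. The function name `linf_norm` is an assumption. The matrix `U` was filled in by hand from the UPGMA tree, taking the tree distance between two leaves as twice the height of their last common ancestor (13, 18.25, 20 or 22.625).

```python
def linf_norm(d, U):
    """Largest absolute difference between input and tree distances."""
    n = len(d)
    return max(abs(d[i][j] - U[i][j])
               for i in range(n) for j in range(i + 1, n))

# Input distances, in the order Bear, Raccoon, Weasel, Seal, Dog
d = [
    [0, 26, 34, 29, 32],
    [26, 0, 42, 44, 48],
    [34, 42, 0, 44, 51],
    [29, 44, 44, 0, 50],
    [32, 48, 51, 50, 0],
]

# Distances between the same leaves in the UPGMA tree
U = [
    [0, 26, 40, 36.5, 45.25],
    [26, 0, 40, 36.5, 45.25],
    [40, 40, 0, 40, 45.25],
    [36.5, 36.5, 40, 0, 45.25],
    [45.25, 45.25, 45.25, 45.25, 0],
]

print(linf_norm(d, U))  # 13.25, from the Bear-Dog pair: |32 - 45.25|
```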


Ultrametric evaluation (cont.)

There is an O(n^2) algorithm for finding the ultrametric tree U with minimal L∞ norm (Farach, Kannan and Warnow, 1995)

Is this tree the best possible?

It would be better to include all distances

The L1 norm is defined by:

$$L_1 = \sum_{i,j} |d(i,j) - U(i,j)|$$

Finding U with minimal L1 norm is NP-hard! (Day, 1987)
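Evaluating the L1 norm of a given tree is easy even though minimizing it is hard. The sketch below computes it for the UPGMA tree of the mammal example. The function name `l1_norm` is an assumption, as is the choice to count each unordered pair once; summing over ordered pairs would double the value. The matrix `U` was filled in by hand from the UPGMA merge heights.

```python
def l1_norm(d, U):
    """Sum of absolute differences over all unordered pairs i < j."""
    n = len(d)
    return sum(abs(d[i][j] - U[i][j])
               for i in range(n) for j in range(i + 1, n))

# Input distances, in the order Bear, Raccoon, Weasel, Seal, Dog
d = [
    [0, 26, 34, 29, 32],
    [26, 0, 42, 44, 48],
    [34, 42, 0, 44, 51],
    [29, 44, 44, 0, 50],
    [32, 48, 51, 50, 0],
]

# Distances in the UPGMA tree: twice the height of the last common ancestor
U = [
    [0, 26, 40, 36.5, 45.25],
    [26, 0, 40, 36.5, 45.25],
    [40, 40, 0, 40, 45.25],
    [36.5, 36.5, 40, 0, 45.25],
    [45.25, 45.25, 45.25, 45.25, 0],
]

print(l1_norm(d, U))  # 53.5
```

This only scores one particular tree; it says nothing about whether another ultrametric tree would score lower.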