60
. Intro to Phylogenetic Trees Computational Genomics Lecture 4b Sections 7.1, 7.2, in Durbin et al. Chapter 17 in Gusfield Slides by Shlomo Moran and Ido Wexler. Slight modifications by Benny Chor

Intro to Phylogenetic Trees Computational Genomics Lecture 4b Sections 7.1, 7.2, in Durbin et al. Chapter 17 in Gusfield Slides by Shlomo Moran and Ido

Embed Size (px)

Citation preview

Page 1: Intro to Phylogenetic Trees Computational Genomics Lecture 4b Sections 7.1, 7.2, in Durbin et al. Chapter 17 in Gusfield Slides by Shlomo Moran and Ido

.

Intro to Phylogenetic TreesComputational Genomics

Lecture 4b

Sections 7.1, 7.2, in Durbin et al.

Chapter 17 in Gusfield

Slides by Shlomo Moran and Ido Wexler.

Slight modifications by Benny Chor

Page 2: Intro to Phylogenetic Trees Computational Genomics Lecture 4b Sections 7.1, 7.2, in Durbin et al. Chapter 17 in Gusfield Slides by Shlomo Moran and Ido

2

Evolution

Evolution of new organisms is driven by

Diversity Different individuals

carry different variants of the same basic blue print

Mutations The DNA sequence

can be changed due to single base changes, deletion/insertion of DNA segments, etc.

Selection bias

Page 3: Intro to Phylogenetic Trees Computational Genomics Lecture 4b Sections 7.1, 7.2, in Durbin et al. Chapter 17 in Gusfield Slides by Shlomo Moran and Ido

3

The Tree of Life

Sou

rce:

Alb

erts

et

al

Page 4: Intro to Phylogenetic Trees Computational Genomics Lecture 4b Sections 7.1, 7.2, in Durbin et al. Chapter 17 in Gusfield Slides by Shlomo Moran and Ido

4

D’après Ernst Haeckel, 1891

Tree of life- a better picture

Page 5: Intro to Phylogenetic Trees Computational Genomics Lecture 4b Sections 7.1, 7.2, in Durbin et al. Chapter 17 in Gusfield Slides by Shlomo Moran and Ido

5

Primate evolution

A phylogeny is a tree that describes the sequence of speciation events that lead to the forming of a set of current day species; also called a phylogenetic tree.

Page 6: Intro to Phylogenetic Trees Computational Genomics Lecture 4b Sections 7.1, 7.2, in Durbin et al. Chapter 17 in Gusfield Slides by Shlomo Moran and Ido

6

Historical Note Until mid 1950’s phylogenies were constructed by

experts based on their opinion (subjective criteria)

Since then, focus on objective criteria for constructing phylogenetic trees

Thousands of articles in the last decades

Important for many aspects of biology Classification Understanding biological mechanisms

Page 7: Intro to Phylogenetic Trees Computational Genomics Lecture 4b Sections 7.1, 7.2, in Durbin et al. Chapter 17 in Gusfield Slides by Shlomo Moran and Ido

7

Morphological vs. Molecular

Classical phylogenetic analysis: morphological features: number of legs, lengths of legs, etc.

Modern biological methods allow to use molecular features

Gene sequences Protein sequences

Analysis based on homologous sequences (e.g., globins) in different species

Page 8: Intro to Phylogenetic Trees Computational Genomics Lecture 4b Sections 7.1, 7.2, in Durbin et al. Chapter 17 in Gusfield Slides by Shlomo Moran and Ido

8

Topology based on Morphology

BonoboChimpanzeeManGorillaSumatran orangutanBornean orangutanCommon gibbonBarbary apeBaboonWhite-fronted capuchinSlow lorisTree shrewJapanese pipistrelleLong-tailed batJamaican fruit-eating batHorseshoe bat

Little red flying foxRyukyu flying foxMouseRatVoleCane-ratGuinea pigSquirrelDormouseRabbitPikaPigHippopotamusSheepCowAlpacaBlue whaleFin whaleSperm whaleDonkeyHorseIndian rhinoWhite rhinoElephantAardvarkGrey sealHarbor sealDogCatAsiatic shrewLong-clawed shrewSmall Madagascar hedgehogHedgehogGymnureMoleArmadilloBandicootWallarooOpossumPlatypus

Archonta

Glires

Ungulata

Carnivora

Insectivora

Xenarthra

(Based on Mc Kenna and Bell, 1997)

Page 9: Intro to Phylogenetic Trees Computational Genomics Lecture 4b Sections 7.1, 7.2, in Durbin et al. Chapter 17 in Gusfield Slides by Shlomo Moran and Ido

9

Rat QEPGGLVVPPTDA

Rabbit QEPGGMVVPPTDA

Gorilla QEPGGLVVPPTDA

Cat REPGGLVVPPTEG

From Sequences to a Phylogenetic Tree

Different genes/proteins may leadto different phylogenetic trees

Page 10: Intro to Phylogenetic Trees Computational Genomics Lecture 4b Sections 7.1, 7.2, in Durbin et al. Chapter 17 in Gusfield Slides by Shlomo Moran and Ido

10

Rat QEPGGLVVPPTDA

Rabbit QEPGGMVVPPTDA

Gorilla QEPGGLVVPPTDA

Cat REPGGLVVPPTEG

From sequences to a phylogenetic tree

There are many possible types of sequences to use (e.g. mitochondrial vs. nuclear proteins).

Page 11: Intro to Phylogenetic Trees Computational Genomics Lecture 4b Sections 7.1, 7.2, in Durbin et al. Chapter 17 in Gusfield Slides by Shlomo Moran and Ido

11

DonkeyHorseIndian rhinoWhite rhinoGrey sealHarbor sealDogCatBlue whaleFin whaleSperm whaleHippopotamusSheepCowAlpacaPigLittle red flying foxRyukyu flying foxHorseshoe batJapanese pipistrelleLong-tailed batJamaican fruit-eating bat

Asiatic shrewLong-clawed shrew

MoleSmall Madagascar hedgehogAardvarkElephantArmadilloRabbitPikaTree shrewBonoboChimpanzeeManGorillaSumatran orangutanBornean orangutanCommon gibbonBarbary apeBaboon

White-fronted capuchinSlow lorisSquirrelDormouseCane-ratGuinea pigMouseRatVoleHedgehogGymnureBandicootWallarooOpossumPlatypus

Perissodactyla

Carnivora

Cetartiodactyla

Rodentia 1

HedgehogsRodentia 2

Primates

ChiropteraMoles+ShrewsAfrotheria

XenarthraLagomorpha

+ Scandentia

Topology1 , based on Mitochondrial DNA(Based on Pupko et al.,)

Page 12: Intro to Phylogenetic Trees Computational Genomics Lecture 4b Sections 7.1, 7.2, in Durbin et al. Chapter 17 in Gusfield Slides by Shlomo Moran and Ido

12

Round Eared Bat

Flying Fox

Hedgehog

Mole

Pangolin

Whale

Hippo

Cow

Pig

Cat

Dog

Horse

Rhino

Rat

Capybara

Rabbit

Flying Lemur

Tree Shrew

Human

Galago

Sloth

Hyrax

Dugong

Elephant

Aardvark

Elephant Shrew

Opossum

Kangaroo

1

2

3

4

Cetartiodactyla

Afrotheria

Chiroptera

Eulipotyphla

Glires

Xenarthra

CarnivoraPerissodactyla

Scandentia+Dermoptera

Pholidota

Primate

(tree by Madsenl)

(Based on Pupko et al. slide)Topology2 ,based on Nuclear DNA

Page 13: Intro to Phylogenetic Trees Computational Genomics Lecture 4b Sections 7.1, 7.2, in Durbin et al. Chapter 17 in Gusfield Slides by Shlomo Moran and Ido

13

Theory of Evolution

Basic idea speciation events lead to creation of different

species. Speciation caused by physical separation into

groups where different genetic variants become dominant

Any two species share a (possibly distant) common ancestor

Page 14: Intro to Phylogenetic Trees Computational Genomics Lecture 4b Sections 7.1, 7.2, in Durbin et al. Chapter 17 in Gusfield Slides by Shlomo Moran and Ido

14

Phylogenenetic trees

Leafs - current day species Nodes - hypothetical most recent common ancestors Edges length - “time” from one speciation to the next

Aardvark Bison Chimp Dog Elephant

Page 15: Intro to Phylogenetic Trees Computational Genomics Lecture 4b Sections 7.1, 7.2, in Durbin et al. Chapter 17 in Gusfield Slides by Shlomo Moran and Ido

15

Types of Trees

A natural model to consider is that of rooted trees

CommonAncestor

Page 16: Intro to Phylogenetic Trees Computational Genomics Lecture 4b Sections 7.1, 7.2, in Durbin et al. Chapter 17 in Gusfield Slides by Shlomo Moran and Ido

16

Types of treesUnrooted tree represents the same phylogeny without

the root node

Depending on the model, data from current day species often does not distinguish between different placements of the root.

Page 17: Intro to Phylogenetic Trees Computational Genomics Lecture 4b Sections 7.1, 7.2, in Durbin et al. Chapter 17 in Gusfield Slides by Shlomo Moran and Ido

17

Rooted versus unrooted treesTree a

ab

Tree b

c

Tree c

Represents all three rooted trees

Page 18: Intro to Phylogenetic Trees Computational Genomics Lecture 4b Sections 7.1, 7.2, in Durbin et al. Chapter 17 in Gusfield Slides by Shlomo Moran and Ido

18

Positioning Roots in Unrooted Trees

We can estimate the position of the root by introducing an outgroup:

a set of species that are definitely distant from all the species of interest

Aardvark Bison Chimp Dog Elephant

Falcon

Proposed root

Page 19: Intro to Phylogenetic Trees Computational Genomics Lecture 4b Sections 7.1, 7.2, in Durbin et al. Chapter 17 in Gusfield Slides by Shlomo Moran and Ido

20

Types of Data Distance-based

Input is a matrix of distances between species. Can be fraction of residue they disagree on, or

alignment score between them, etc. Character-based

Input is a multiple sequence alignment. Sequences consist of characters (e.g., residues) that are examined separately.

Genome/Proteome –based Input is whole genome or proteome sequences. No MSA or obvious distance definition.

Page 20: Intro to Phylogenetic Trees Computational Genomics Lecture 4b Sections 7.1, 7.2, in Durbin et al. Chapter 17 in Gusfield Slides by Shlomo Moran and Ido

21

Tree Construction: Two Popular Methods

Distance Based- A weighted tree that realizes the distances between the objects (or gets close to it).

Character Based – A tree that optimizes an objective function based on all characters in input sequences (major methods are parsimony and likelihood).

We start with distance based methods, considering the following question:Given a set of species (leaves in a supposed tree), and distances between them – construct a phylogeny which best “fits” the distances.

USER
לפני הבניה יש להכניס את משפט 4 הנקודות (מקובץ נפרד), שיחליף את ההוכחה הקודמת שלו בהרצאה 12. כמו כן ייתכן שכדאי לוותר על UPGMA. הערה זו משפיעה כמובן גם על הרצאה 12.שלמה 12.3.03
Page 21: Intro to Phylogenetic Trees Computational Genomics Lecture 4b Sections 7.1, 7.2, in Durbin et al. Chapter 17 in Gusfield Slides by Shlomo Moran and Ido

22

Exact solution: Additive sets

Given a set M of L objects with an L×L distance matrix:d(i,i)=0, and for i≠j, d(i,j)>0d(i,j)=d(j,i). For all i,j,k it holds that d(i,k) ≤ d(i,j)+d(j,k).

Can we construct a weighted tree which realizes these distances?

Page 22: Intro to Phylogenetic Trees Computational Genomics Lecture 4b Sections 7.1, 7.2, in Durbin et al. Chapter 17 in Gusfield Slides by Shlomo Moran and Ido

23

Additive sets (cont)

We say that the set M with L objects is additive if there is a tree T, L of its nodes correspond to the L objects, with positive weights on the edges, such that for all i,j, d(i,j) = dT(i,j), the length of the path from i to j in T.

Note: Sometimes the tree is required to be binary, and then the edge weights are required to be non-negative.

Page 23: Intro to Phylogenetic Trees Computational Genomics Lecture 4b Sections 7.1, 7.2, in Durbin et al. Chapter 17 in Gusfield Slides by Shlomo Moran and Ido

24

Three objects sets always additive:

For L=3: There is always a (unique) tree with one internal node.

( , )( , )( , )

d i j a bd i k a cd j k b c

ab

c

i

j

k

m

Thus0

2

1 )],(),(),([),( jidkjdkidmkdc

Page 24: Intro to Phylogenetic Trees Computational Genomics Lecture 4b Sections 7.1, 7.2, in Durbin et al. Chapter 17 in Gusfield Slides by Shlomo Moran and Ido

25

How about four objects?

L=4: Not all sets with 4 objects are additive:

e.g., there is no tree which realizes the distances below.

i j k l

i 0 2 2 2

j 0 2 2

k 0 3

l 0

Page 25: Intro to Phylogenetic Trees Computational Genomics Lecture 4b Sections 7.1, 7.2, in Durbin et al. Chapter 17 in Gusfield Slides by Shlomo Moran and Ido

26

The Four Points Condition

Theorem: A set M of L objects is additive iff any subset of four objects can be labeled i,j,k,l so that:

d(i,k) + d(j,l) = d(i,l) +d(k,j) ≥ d(i,j) + d(k,l) We call {{i,j},{k,l}} the “split” of {i,j,k,l}.

ik

lj

Proof:Additivity 4 Points Condition: By the figure...

Page 26: Intro to Phylogenetic Trees Computational Genomics Lecture 4b Sections 7.1, 7.2, in Durbin et al. Chapter 17 in Gusfield Slides by Shlomo Moran and Ido

27

4P Condition Additivity:Induction on the number of objects, L.For L ≤ 3 the condition is empty and tree exists. Consider L=4. B = d(i,k) +d(j,l) = d(i,l) +d(j,k) ≥ d(i,j) + d(k,l) = A

Let y = (B – A)/2 ≥ 0. Then the tree should look as follows:We have to find the distances a,b, c and f.

a b

i j

k

m

c

y

l

n

f

Page 27: Intro to Phylogenetic Trees Computational Genomics Lecture 4b Sections 7.1, 7.2, in Durbin et al. Chapter 17 in Gusfield Slides by Shlomo Moran and Ido

28

Tree construction for L=4

a

b

i

j

k

m

c

y

l

n

f

Construct the tree by the given distances as follows:1. Construct a tree for {i, j,k}, with internal vertex m2. Add vertex n ,d(m,n) = y3. Add edge (n,l), c+f=d(k,l)

n

f

n

f

n

fRemains to prove:

d(i,l) = dT(i,l)d(j,l) = dT(j,l)

Page 28: Intro to Phylogenetic Trees Computational Genomics Lecture 4b Sections 7.1, 7.2, in Durbin et al. Chapter 17 in Gusfield Slides by Shlomo Moran and Ido

29

Proof for L=4

a

b

i

j

k

m

c

y

l

n

f

By the 4 points condition and the definition of y:d(i,l) = d(i,j) + d(k,l) +2y - d(k,j) = a + y + f = dT(i,l) (the middle equality holds since d(i,j), d(k,l) and d(k,j) are realized by the tree)d(j,l) = dT(j,l) is proved similarly.

Page 29: Intro to Phylogenetic Trees Computational Genomics Lecture 4b Sections 7.1, 7.2, in Durbin et al. Chapter 17 in Gusfield Slides by Shlomo Moran and Ido

30

Induction step for L>4: Remove Object L from the set By induction, there is a tree, T’, for {1,2,…,L-1}. For each pair of labeled nodes (i,j) in T’, let aij, bij, cij be

defined by the following figure:

aij

bij

cij

i

j

L

mij

1[ ( , ) ( , ) ( , )]

2ijc d i L d j L d i j

Page 30: Intro to Phylogenetic Trees Computational Genomics Lecture 4b Sections 7.1, 7.2, in Durbin et al. Chapter 17 in Gusfield Slides by Shlomo Moran and Ido

31

Induction step: Pick i and j that minimize cij.

T is constructed by adding L (and possibly mij) to T’, as in the figure. Then d(i,L) = dT(i,L) and d(j,L) = dT(j,L)

Remains to prove: For each k ≠ i,j: d(k,L) = dT(k,L).

aij

bij

cij

i

j

L

mij

T’

Page 31: Intro to Phylogenetic Trees Computational Genomics Lecture 4b Sections 7.1, 7.2, in Durbin et al. Chapter 17 in Gusfield Slides by Shlomo Moran and Ido

32

Induction step (cont.)Let k ≠i,j be an arbitrary node in T’, and let n be the branching point of k in the path from i to j.

By the minimality of cij , {{i,j},{k,L}} is not a “split” of {i,j,k,L}. So assume WLOG that {{i,L},{j,k}} is a

“split” of {i,j, k,L}.

aij

bij

cij

i

j

L

mij

T’

k

n

Page 32: Intro to Phylogenetic Trees Computational Genomics Lecture 4b Sections 7.1, 7.2, in Durbin et al. Chapter 17 in Gusfield Slides by Shlomo Moran and Ido

33

Induction step (end)

Since {{i,L},{j,k}} is a split, by the 4 points condition

d(L,k) = d(i,k) + d(L,j) - d(i,j)

d(i,k) = dT(i,k) and d(i,j) = dT(i,j) by induction, and

d(L,j) = dT(L,j) by the construction.

Hence d(L,k) = dT(L,k).

QED

aij

bij

cij

i

j

L

mij

T’

k

n

Page 33: Intro to Phylogenetic Trees Computational Genomics Lecture 4b Sections 7.1, 7.2, in Durbin et al. Chapter 17 in Gusfield Slides by Shlomo Moran and Ido

34

From Additive Distance to a Tree

By following the proof, the four point condition can be used to construct a tree from a distance matrix, or to decide that there is no such tree (namely that the distance is not additive).

But this algorithm will go over all quartets, resulting in O(L4) many steps for L species (too sllllllllllllow).

The most popular method for constructing trees for additive sets uses the neighbor joining approach.

Page 34: Intro to Phylogenetic Trees Computational Genomics Lecture 4b Sections 7.1, 7.2, in Durbin et al. Chapter 17 in Gusfield Slides by Shlomo Moran and Ido

35

Constructing additive trees:The neighbor joining problem

• Let i, j be sisters (neighboring leaves) in a tree, let k be

their father, and let m be any other vertex.• Using eq.

we can compute the distances from k to all other leaves.

This suggest the following method to construct tree from an

additive distance matrix:

1. Find sisters i,j in the tree,

2. Replace i,j by their father, k, and recursively construct a

tree T for the smaller set.

3. Add i,j as children of k in T.

[ ( , ) ( , )( , ) ( , )]/ 2d i m dd j m d i jk m

Page 35: Intro to Phylogenetic Trees Computational Genomics Lecture 4b Sections 7.1, 7.2, in Durbin et al. Chapter 17 in Gusfield Slides by Shlomo Moran and Ido

36

Neighbor Finding

How can we find from distances alone a pair of sisters (neighboring leaves)?

Closest nodes are not necessarily neighboring leaves.

AB

CD

Next, we show a way to find neighbors from distances.

Page 36: Intro to Phylogenetic Trees Computational Genomics Lecture 4b Sections 7.1, 7.2, in Durbin et al. Chapter 17 in Gusfield Slides by Shlomo Moran and Ido

37

Neighbor Finding: Seitou & Nei method

Theorem (Saitou&Nei) Assume d is additive, with all tree edge weights positive. If D(i,j) is minimal (among all pairs of leaves), then i and j are sister

taxa in the tree.

ij

kl

m

T1T2

is a leaf

For a leaf , le ( , )t . im

i r d i m , ).: Let be two leaves (out of leaves in Definition

divergenc ( ,Then their is e ( , ) ( ) /( 2) ) i j

i j L TD i j d i j r r L

The proof is rather involved, and will be skipped (no tears pls).

Page 37: Intro to Phylogenetic Trees Computational Genomics Lecture 4b Sections 7.1, 7.2, in Durbin et al. Chapter 17 in Gusfield Slides by Shlomo Moran and Ido

39

A simpler neighbor finding method:Select an arbitrary (fixed) node r. For each pair of labeled nodes (i,j) let C(i,j) be defined

by the following expression (also see figure):

C(i,j)

i

j

r

Claim: Let i, j be such that C(i,j) is maximized.Then i and j are neighboring leaves.

)],(),(),([),( jidrjdridjiC 2

1

Page 38: Intro to Phylogenetic Trees Computational Genomics Lecture 4b Sections 7.1, 7.2, in Durbin et al. Chapter 17 in Gusfield Slides by Shlomo Moran and Ido

40

Sisters Identification: Example

AB

CD

5 4 6

2025

)],(),(),([),( jidrjdridjiC 2

1

Select arbitrarily r=A.

C(B,C)=(15+25-30)/2=5

C(B,D)=(15+34-31)/2=8

C(C,D)=(25+34-49)/2=5

Claim: Let i, j be such that C(i,j) is maximized.Then i and j are neighboring leaves.

Page 39: Intro to Phylogenetic Trees Computational Genomics Lecture 4b Sections 7.1, 7.2, in Durbin et al. Chapter 17 in Gusfield Slides by Shlomo Moran and Ido

41

Neighbor Joining Algorithm Set M to contain all leaves, and select a root r. |M|=L If L =2, return a tree of two vertices

Iteration: Choose i,j such that C(i,j) is maximal Create a new vertex k, and update distances

remove i,j, and add k to M Recursively construct a tree on the smaller set. When done, add i,j as children on k, at distances d(i,k) and d(j,k).

ij

k

m

[ ( , ) ( , ) ( , )] / 2

( , ) ( , )

1for each other node ,

( , )

(

[ ( , ) ( , ) (

, )

( , , )]2

)

d i j d i r d j r

d i j

d

d

i k

d j k

d

i k

m d i m d j m d jm ik

Page 40: Intro to Phylogenetic Trees Computational Genomics Lecture 4b Sections 7.1, 7.2, in Durbin et al. Chapter 17 in Gusfield Slides by Shlomo Moran and Ido

42

Complexity of Neighbor Joining Algorithm

Naive Implementation:

Initialization: θ(L2) to compute the C(i,j)’s.

Each Iteration: O(L) to update {C(i,k):i L} for the new node k. O(L2) to find the maximal C(i,j).

Total of O(L3).i

j

k

m

Page 41: Intro to Phylogenetic Trees Computational Genomics Lecture 4b Sections 7.1, 7.2, in Durbin et al. Chapter 17 in Gusfield Slides by Shlomo Moran and Ido

43

Complexity of Neighbor Joining Algorithm

Using a Heap to store the C(i,j)’s:

Initialization: θ(L2) to compute and heapify the C(i,j)’s.

Each Iteration: O(1) to find the maximal C(i,j). O(L log L) to delete {C(m,i), C(m,j)} and add C(m,k) for

all vertices m.

Total of O(L2 log L).

(implementation details are omitted)

Page 42: Intro to Phylogenetic Trees Computational Genomics Lecture 4b Sections 7.1, 7.2, in Durbin et al. Chapter 17 in Gusfield Slides by Shlomo Moran and Ido

44

Reconstructing Trees from Additive Matrices

A B C D E

A 0 2 7 4 7

B 0 7 4 7

C 0 7 6

D 0 7

E 0

A

C

1

B

1

1

2

2D

E

3

3

Given a distance matrix constituting an additive metric, the topology of the corresponding additive tree is unique.

Q: Do we have to test additivity before running NJ?

A: This would be bad news, as this takes O(L4) time!

Page 43: Intro to Phylogenetic Trees Computational Genomics Lecture 4b Sections 7.1, 7.2, in Durbin et al. Chapter 17 in Gusfield Slides by Shlomo Moran and Ido

45

Reconstructing Trees from Additive Matrices

A B C D E

A 0 2 7 4 7

B 0 7 4 7

C 0 7 6

D 0 7

E 0

A

C

1

B

1

1

2

2D

E

3

3

Q: Do we have to test additivity before running NJ?

A: By Seito-Nei, if matrix is additive, NJ will construct the correct tree. Algorithm does not care about awareness and need not know anything about the matrix!

Page 44: Intro to Phylogenetic Trees Computational Genomics Lecture 4b Sections 7.1, 7.2, in Durbin et al. Chapter 17 in Gusfield Slides by Shlomo Moran and Ido

46

NJ Algorithm: Example

1

( , )n

ij

r d i j

• Identify i,j as neighbours if their divergence is minimal.

• Combine i,j into a new node u.

• update the distance matrix.

• If only 3 nodes are left – finish.

Let ri be the sum of distances

from i to every other node

Here, we use the divergence,

( , ) ( ) /( 2, ) )( i jD d i j r ri Lj

i m

j n

0.1 0.1 0.1

0.40.4

k l

Page 45: Intro to Phylogenetic Trees Computational Genomics Lecture 4b Sections 7.1, 7.2, in Durbin et al. Chapter 17 in Gusfield Slides by Shlomo Moran and Ido

47

Distance Matrix

A B C D

A 0 2 3 6

B 2 0 3 5

C 3 3 0 6

D 5 6 6 0

17111011 DCBA rrrr

( , ) 8.5

( , ) 8

( , ) 8

( , ) 7.5

( , ) 8.5

( , ) 8

D A B

D A C

D A D

D B C

D B D

D C D

U

BA

Page 46: Intro to Phylogenetic Trees Computational Genomics Lecture 4b Sections 7.1, 7.2, in Durbin et al. Chapter 17 in Gusfield Slides by Shlomo Moran and Ido

48

Distance Matrix

U C D

U 0 3 5.5

C 3 0 6

D 5.5 6 0

5.1195.8 DCU rrr( , ) 5.75

( , ) 4.5

( , ) 4.25

X U C

X U D

X C D

U

BA

Y

C

Page 47: Intro to Phylogenetic Trees Computational Genomics Lecture 4b Sections 7.1, 7.2, in Durbin et al. Chapter 17 in Gusfield Slides by Shlomo Moran and Ido

49

Distance Matrix

Y D

Y 0 5.6

D 5.6 0

U

BA

Y

C

D

Z

Page 48: Intro to Phylogenetic Trees Computational Genomics Lecture 4b Sections 7.1, 7.2, in Durbin et al. Chapter 17 in Gusfield Slides by Shlomo Moran and Ido

50

Reconstructing Trees from non Additive Matrices

Q: What if the distance matrix is not additive?

A: We could still run NJ!

Q: But can anything be said about the resulting tree?

A: Not really. Resulting tree topology could even vary according to way ties are resolved on the way.

Page 49: Intro to Phylogenetic Trees Computational Genomics Lecture 4b Sections 7.1, 7.2, in Durbin et al. Chapter 17 in Gusfield Slides by Shlomo Moran and Ido

51

Almost Additive Matrix

A distance matrix d’ is “almost additive” if there exists an additive matrix d such that

, ,,

' '| | min{| |} mi(

n2

)i j i j

i j ed d d d

l e

Atteson: If d’ is almost additive with respect to a tree T, then the output of NJ is a tree T’ with the same topology as T

Page 50: Intro to Phylogenetic Trees Computational Genomics Lecture 4b Sections 7.1, 7.2, in Durbin et al. Chapter 17 in Gusfield Slides by Shlomo Moran and Ido

52

Distance Matrix

Page 51: Intro to Phylogenetic Trees Computational Genomics Lecture 4b Sections 7.1, 7.2, in Durbin et al. Chapter 17 in Gusfield Slides by Shlomo Moran and Ido

53

Unrooted Tree - NJ

Root

Page 52: Intro to Phylogenetic Trees Computational Genomics Lecture 4b Sections 7.1, 7.2, in Durbin et al. Chapter 17 in Gusfield Slides by Shlomo Moran and Ido

54

Output - NJ

Branch lengthis proportional

to distance

Page 53: Intro to Phylogenetic Trees Computational Genomics Lecture 4b Sections 7.1, 7.2, in Durbin et al. Chapter 17 in Gusfield Slides by Shlomo Moran and Ido

55

N-J Method produces an Unrooted, Additive tree

Page 54: Intro to Phylogenetic Trees Computational Genomics Lecture 4b Sections 7.1, 7.2, in Durbin et al. Chapter 17 in Gusfield Slides by Shlomo Moran and Ido

56

PAM Spinach Rice Mosquito Monkey HumanSpinach 0.0 84.9 105.6 90.8 86.3Rice 84.9 0.0 117.8 122.4 122.6Mosquito 105.6 117.8 0.0 84.7 80.8Monkey 90.8 122.4 84.7 0.0 3.3Human 86.3 122.6 80.8 3.3 0.0

What is required for the Neighbour joining method?

Distance matrix0. Distance Matrix

Neighbor-Joining MethodAn Example

Page 55: Intro to Phylogenetic Trees Computational Genomics Lecture 4b Sections 7.1, 7.2, in Durbin et al. Chapter 17 in Gusfield Slides by Shlomo Moran and Ido

57

PAM distance 3.3 (Human - Monkey) is the minimum. So we'll join Human and Monkey to MonHum and we'll calculate the new distances.

Mon-Hum

MonkeyHumanSpinachMosquito Rice

1. First Step

Page 56: Intro to Phylogenetic Trees Computational Genomics Lecture 4b Sections 7.1, 7.2, in Durbin et al. Chapter 17 in Gusfield Slides by Shlomo Moran and Ido

58

After we have joined two species in a subtree we have to compute the distances from every other node to the new subtree. We do this with a simple average of distances:Dist[Spinach, MonHum]

= (Dist[Spinach, Monkey] + Dist[Spinach, Human])/2 = (90.8 + 86.3)/2 = 88.55

Mon-Hum

MonkeyHumanSpinach

2. Calculation of New Distances

Page 57: Intro to Phylogenetic Trees Computational Genomics Lecture 4b Sections 7.1, 7.2, in Durbin et al. Chapter 17 in Gusfield Slides by Shlomo Moran and Ido

59

PAM Spinach Rice Mosquito MonHumSpinach 0.0 84.9 105.6 88.6Rice 84.9 0.0 117.8 122.5Mosquito 105.6 117.8 0.0 82.8MonHum 88.6 122.5 82.8 0.0

HumanMosquito

Mon-Hum

MonkeySpinachRice

Mos-(Mon-Hum)

3. Next Cycle

Page 58: Intro to Phylogenetic Trees Computational Genomics Lecture 4b Sections 7.1, 7.2, in Durbin et al. Chapter 17 in Gusfield Slides by Shlomo Moran and Ido

60

PAM Spinach Rice MosMonHumSpinach 0.0 84.9 97.1Rice 84.9 0.0 120.2MosMonHum 97.1 120.2 0.0

HumanMosquito

Mon-Hum

MonkeySpinachRice

Mos-(Mon-Hum)

Spin-Rice

4. Penultimate Cycle

Page 59: Intro to Phylogenetic Trees Computational Genomics Lecture 4b Sections 7.1, 7.2, in Durbin et al. Chapter 17 in Gusfield Slides by Shlomo Moran and Ido

61

PAM SpinRice MosMonHumSpinach 0.0 108.7MosMonHum 108.7 0.0

HumanMosquito

Mon-Hum

MonkeySpinachRice

Mos-(Mon-Hum)

Spin-Rice

(Spin-Rice)-(Mos-(Mon-Hum))

5. Last Joining

Page 60: Intro to Phylogenetic Trees Computational Genomics Lecture 4b Sections 7.1, 7.2, in Durbin et al. Chapter 17 in Gusfield Slides by Shlomo Moran and Ido

62

Human

Monkey

MosquitoRice

Spinach

The result:Unrooted Neighbor-Joining Tree