Intro to Phylogenetic Trees

Lecture 5

Sections 7.1, 7.2 in Durbin et al.

Chapter 17 in Gusfield

Slides by Shlomo Moran. Slight modifications by Benny Chor


1.6.04: The lecture ended about 10 minutes early, even though I did not rush. Is there room to add material?

2

Evolution

Evolution of new organisms is driven by:

Diversity: different individuals carry different variants of the same basic blueprint.

Mutations: the DNA sequence can be changed by single-base changes, deletion/insertion of DNA segments, etc.

Selection bias

3

The Tree of Life

Source: Alberts et al.

4

D’après Ernst Haeckel, 1891

Tree of life: a better picture

5

Primate evolution

A phylogeny is a tree that describes the sequence of speciation events that led to the formation of a set of present-day species; it is also called a phylogenetic tree.

6

Historical Note

Until the mid-1950s, phylogenies were constructed by experts based on their opinion (subjective criteria).

Since then, the focus has been on objective criteria for constructing phylogenetic trees.

Thousands of articles have appeared in the last decades.

Phylogenies are important for many aspects of biology: classification, and understanding biological mechanisms.

7

Morphological vs. Molecular

Classical phylogenetic analysis uses morphological features: number of legs, lengths of legs, etc.

Modern biological methods allow the use of molecular features: gene sequences and protein sequences.

The analysis is based on homologous sequences (e.g., globins) in different species.

8

Morphological topology

[Figure: tree of mammalian species grouped into Archonta, Glires, Ungulata, Carnivora, Insectivora and Xenarthra]

(Based on McKenna and Bell, 1997)

9

From sequences to a phylogenetic tree

Rat QEPGGLVVPPTDA

Rabbit QEPGGMVVPPTDA

Gorilla QEPGGLVVPPTDA

Cat REPGGLVVPPTEG

There are many possible types of sequences to use (e.g., mitochondrial vs. nuclear proteins).

10

[Figure: tree of mammalian species grouped into Perissodactyla, Carnivora, Cetartiodactyla, Rodentia 1, Hedgehogs, Rodentia 2, Primates, Chiroptera, Moles+Shrews, Afrotheria, Xenarthra, and Lagomorpha + Scandentia]

Mitochondrial topology (based on Pupko et al.)

11

Nuclear topology

[Figure: tree of mammalian species grouped into Chiroptera, Eulipotyphla, Pholidota, Cetartiodactyla, Carnivora, Perissodactyla, Glires, Scandentia+Dermoptera, Primates, Xenarthra and Afrotheria, with groups labeled 1 to 4]

(Tree by Madsen; based on a slide by Pupko et al.)

12

Theory of Evolution

Basic idea: speciation events lead to the creation of different species. Speciation is caused by physical separation into groups in which different genetic variants become dominant.

Any two species share a (possibly distant) common ancestor.

13

Phylogenetic trees

Leaves: current-day species

Internal nodes: hypothetical most recent common ancestors

Edge lengths: “time” from one speciation to the next

[Figure: example tree with leaves Aardvark, Bison, Chimp, Dog, Elephant]

14

Types of Trees

A natural model to consider is that of rooted trees

[Figure: rooted tree with the common ancestor at the root]

15

Types of trees

An unrooted tree represents the same phylogeny without the root node.

Depending on the model, data from current-day species does not distinguish between different placements of the root.

16

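Rooted versus unrooted trees

[Figure: three rooted trees (Tree a, Tree b, Tree c) on leaves a, b, c, and one unrooted tree on the same leaves]

The unrooted tree represents all three rooted trees.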
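The claim can be made concrete: rooting an unrooted tree means placing a new root node in the middle of one of its edges, so each edge yields one rooted tree, and the three-leaf star (three edges) yields the three rooted trees. A minimal Python sketch, assuming the unrooted tree is a list of edges between string-named nodes and that no node is already called "root"; `rootings` and `newick` are hypothetical helper names, and the parenthesized output only loosely follows Newick notation.

```python
from collections import defaultdict

def rootings(edges):
    """Yield one rooted tree per edge of an unrooted tree.

    Each rooted tree is a dict mapping every node to the sorted list of its
    children, obtained by subdividing the chosen edge with a new node "root".
    """
    for u, v in edges:
        adj = defaultdict(list)
        for x, y in edges:
            if (x, y) != (u, v):          # keep every edge except the chosen one
                adj[x].append(y)
                adj[y].append(x)
        adj["root"] = [u, v]              # the chosen edge becomes root-u and root-v
        adj[u].append("root")
        adj[v].append("root")
        # Orient every edge away from the root with an iterative DFS.
        children, stack = {}, [("root", None)]
        while stack:
            node, parent = stack.pop()
            children[node] = sorted(n for n in adj[node] if n != parent)
            stack.extend((k, node) for k in children[node])
        yield children

def newick(children, node="root"):
    """Parenthesized string of the subtree below `node` (leaf names only)."""
    kids = children[node]
    if not kids:
        return node
    return "(" + ",".join(newick(children, k) for k in kids) + ")"

if __name__ == "__main__":
    star = [("x", "a"), ("x", "b"), ("x", "c")]
    print([newick(t) for t in rootings(star)])
```

For the star with center x this gives `['(a,(b,c))', '(b,(a,c))', '(c,(a,b))']`, one rooted tree per edge.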

17

Positioning Roots in Unrooted Trees

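We can estimate the position of the root by introducing an outgroup: a set of species that are definitely distant from all the species of interest.

[Figure: unrooted tree on Aardvark, Bison, Chimp, Dog, Elephant with the outgroup Falcon; the proposed root is placed where Falcon joins the rest of the tree]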
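Outgroup rooting can be sketched the same way: the root is inserted on the edge that joins the outgroup to the rest of the tree, and the ingroup then hangs below it as a single rooted subtree. This Python sketch assumes a single-leaf outgroup and a tree given as a list of edges; the function names and the example leaves `a`, `b`, `c`, `d`, `og` and internal nodes `p`, `q`, `r` are hypothetical.

```python
from collections import defaultdict

def root_with_outgroup(edges, outgroup):
    """Root an unrooted tree on the edge leading to the outgroup leaf.

    Returns a dict mapping each node to its sorted children; a new node
    "root" subdivides the edge between the outgroup and its neighbor.
    """
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    if len(adj[outgroup]) != 1:
        raise ValueError("this sketch assumes the outgroup is a single leaf")
    attach = adj[outgroup][0]
    # Replace the edge (outgroup, attach) by root-outgroup and root-attach.
    adj[attach] = [n for n in adj[attach] if n != outgroup] + ["root"]
    adj[outgroup] = ["root"]
    adj["root"] = [outgroup, attach]
    # Orient every edge away from the root.
    children, stack = {}, [("root", None)]
    while stack:
        node, parent = stack.pop()
        children[node] = sorted(n for n in adj[node] if n != parent)
        stack.extend((k, node) for k in children[node])
    return children

def newick(children, node="root"):
    """Parenthesized string of the subtree below `node` (leaf names only)."""
    kids = children[node]
    return node if not kids else "(" + ",".join(newick(children, k) for k in kids) + ")"

if __name__ == "__main__":
    edges = [("p", "a"), ("p", "b"), ("p", "q"), ("q", "og"),
             ("q", "r"), ("r", "c"), ("r", "d")]
    print(newick(root_with_outgroup(edges, "og")))
```

Here the result is `(og,((a,b),(c,d)))`: the root separates the outgroup from the ingroup, whose own rooted topology is `((a,b),(c,d))`.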

18

Types of Data

Distance-based: the input is a matrix of distances between species. A distance can be the fraction of residues on which the species disagree, or the alignment score between them, or …

Character-based: examine each character (e.g., residue) separately.
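As an illustration of the first distance option, the fraction of aligned residues on which two sequences disagree (often called the p-distance) can be computed for the four sequences from the "From sequences to a phylogenetic tree" slide. A minimal Python sketch, assuming the sequences are already aligned to equal length; the function names are ours.

```python
from itertools import combinations

# Aligned sequences from the "From sequences to a phylogenetic tree" slide.
SEQS = {
    "Rat":     "QEPGGLVVPPTDA",
    "Rabbit":  "QEPGGMVVPPTDA",
    "Gorilla": "QEPGGLVVPPTDA",
    "Cat":     "REPGGLVVPPTEG",
}

def p_distance(s, t):
    """Fraction of aligned positions at which two equal-length sequences differ."""
    if len(s) != len(t):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(s, t)) / len(s)

def distance_matrix(seqs):
    """Symmetric dict-of-dicts matrix of pairwise p-distances."""
    names = list(seqs)
    d = {x: {x: 0.0} for x in names}
    for x, y in combinations(names, 2):
        d[x][y] = d[y][x] = p_distance(seqs[x], seqs[y])
    return d

if __name__ == "__main__":
    d = distance_matrix(SEQS)
    for x, y in combinations(SEQS, 2):
        print(f"{x:8s}{y:8s}{d[x][y]:.3f}")
```

On this 13-residue fragment Rat and Rabbit differ at 1 position, Rabbit and Cat at 4, while Rat and Gorilla are identical (distance 0), which already shows why such a short fragment is a weak basis for a tree.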

19

Two Methods of Tree Construction

Distance-based: a weighted tree that realizes the distances between the objects.

Character-based: a tree that optimizes an objective function based on all characters in the input sequences (the major methods are parsimony and likelihood).

We start with distance-based methods, considering the following question: given a set of species (leaves in a supposed tree) and the distances between them, construct a phylogeny that best “fits” the distances.

Before the construction, insert the four points theorem (from a separate file); it will replace its previous proof in Lecture 12. It may also be worth dropping UPGMA. This note of course also affects Lecture 12. Shlomo, 12.3.03

20

Exact solution: Additive sets

Given a set M of L objects with an L×L distance matrix such that:

d(i,i) = 0, and for i ≠ j, d(i,j) > 0

d(i,j) = d(j,i)

For all i, j, k: d(i,k) ≤ d(i,j) + d(j,k)

Can we construct a weighted tree which realizes these distances?
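The three conditions above are easy to verify mechanically. A minimal Python sketch, assuming the matrix is a list of lists indexed 0..L-1 and using a small tolerance for floating-point input; the name `is_distance_matrix` is ours.

```python
from itertools import permutations

def is_distance_matrix(d, tol=1e-9):
    """Check the three conditions above for a square matrix d (list of lists)."""
    n = len(d)
    for i in range(n):
        if abs(d[i][i]) > tol:                       # d(i,i) = 0
            return False
        for j in range(n):
            if i != j and d[i][j] <= 0:              # d(i,j) > 0 for i != j
                return False
            if abs(d[i][j] - d[j][i]) > tol:         # d(i,j) = d(j,i)
                return False
    # Triangle inequality; triples with a repeated index hold automatically
    # once the first two conditions are met, so distinct triples suffice.
    return all(d[i][k] <= d[i][j] + d[j][k] + tol
               for i, j, k in permutations(range(n), 3))

if __name__ == "__main__":
    four = [[0, 2, 2, 2], [2, 0, 2, 2], [2, 2, 0, 3], [2, 2, 3, 0]]
    print(is_distance_matrix(four))
```

The four-object example on the "How about four objects?" slide passes all three checks, yet no tree realizes it, so these conditions are necessary but not sufficient for a tree to exist.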

21

Additive sets (cont)

We say that the set M with L objects is additive if there is a tree T, L of whose nodes correspond to the L objects, with positive weights on the edges, such that for all i, j, d(i,j) = dT(i,j), the length of the path from i to j in T.

Note: Sometimes the tree is required to be binary, and then the edge weights are required to be non-negative.
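The definition can be tested for a candidate tree: compute dT(i,j) for every pair and compare it with d(i,j). A minimal Python sketch, assuming the tree is a list of (u, v, weight) edges whose nodes include all the objects and the distances are a dict of dicts; `tree_distances` and `realizes` are illustrative names, not from the slides.

```python
from collections import defaultdict

def tree_distances(edges):
    """All-pairs path lengths dT(u, v) in a weighted tree given as (u, v, w) triples.

    A tree has exactly one path between two nodes, so a single traversal from
    each source accumulates the path lengths to every other node.
    """
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    dist = {}
    for s in adj:
        seen, stack = {s: 0}, [s]
        while stack:
            x = stack.pop()
            for y, w in adj[x]:
                if y not in seen:
                    seen[y] = seen[x] + w
                    stack.append(y)
        dist[s] = seen
    return dist

def realizes(edges, d, tol=1e-9):
    """True if all edge weights are positive and d[i][j] == dT(i, j) for
    every pair in d (a dict of dicts whose keys are nodes of the tree)."""
    if any(w <= 0 for _, _, w in edges):
        return False
    dt = tree_distances(edges)
    return all(abs(d[i][j] - dt[i][j]) <= tol for i in d for j in d[i])
```

For a hypothetical tree with edges i-m (1), j-m (2), m-n (3), k-n (4), l-n (5), `tree_distances(...)["i"]["l"]` is 9, and `realizes` accepts exactly the matrix of those path lengths.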

22

Three objects sets always additive:

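For L = 3: there is always a (unique) tree with one internal node m. With edge lengths a, b, c from m to i, j, k:

$$d(i,j) = a + b, \qquad d(i,k) = a + c, \qquad d(j,k) = b + c$$

[Figure: star tree with center m and leaves i, j, k]

Thus

$$c = d(k,m) = \tfrac{1}{2}\left[d(i,k) + d(j,k) - d(i,j)\right] \ge 0$$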
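Solving the three equations for a, b and c gives the formula for c and its two symmetric counterparts for a and b. A small Python sketch (the name `three_point` is ours):

```python
def three_point(dij, dik, djk):
    """Edge lengths (a, b, c) from the center m to i, j, k of the star tree
    realizing three distances. Each value is >= 0 exactly when the
    corresponding triangle inequality holds."""
    a = (dij + dik - djk) / 2
    b = (dij + djk - dik) / 2
    c = (dik + djk - dij) / 2
    return a, b, c
```

For example, `three_point(3, 8, 9)` returns `(1.0, 2.0, 7.0)`. A zero entry means that object lies on the path between the other two, so it coincides with m; the definition of additivity allows objects to sit at internal nodes.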

23

How about four objects?

L = 4: not all sets with 4 objects are additive. For example, there is no tree that realizes the following distances:

d(i,j) = d(i,k) = d(i,l) = d(j,k) = d(j,l) = 2, d(k,l) = 3

24

The Four Points Condition

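Theorem: A set M of L objects is additive iff every subset of four objects can be labeled i, j, k, l so that:

d(i,k) + d(j,l) = d(i,l) + d(k,j) ≥ d(i,j) + d(k,l)

We call {{i,j},{k,l}} the “split” of {i,j,k,l}.

[Figure: tree on i, j, k, l with split {{i,j},{k,l}}]

Proof (Additivity ⇒ 4 Points Condition): By the figure. Each of the sums d(i,k) + d(j,l) and d(i,l) + d(k,j) covers the four pendant paths plus twice the path joining the branching point of i, j to the branching point of k, l, while d(i,j) + d(k,l) covers only the pendant paths.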
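The condition can be checked directly: for four objects, form the three ways of pairing them up and sum the two within-pair distances; the condition says the two largest sums coincide, and the pairing with the smallest sum is the split. A Python sketch, assuming distances are a dict of dicts keyed by object name; the function names are hypothetical.

```python
from itertools import combinations

def four_point_split(d, quad, tol=1e-9):
    """Return the split ((i, j), (k, l)) of four objects, or None if the
    four points condition fails for them. d is a dict of dicts."""
    i, j, k, l = quad
    pairings = [((i, j), (k, l)), ((i, k), (j, l)), ((i, l), (j, k))]
    # Sum of the two within-pair distances for each of the three pairings.
    scored = sorted(((d[p][q] + d[r][s], ((p, q), (r, s)))
                     for (p, q), (r, s) in pairings), key=lambda t: t[0])
    (_, split), (middle, _), (largest, _) = scored
    # The condition: the two largest sums coincide; the smallest one is the split.
    return split if abs(largest - middle) <= tol else None

def satisfies_four_points(d, tol=1e-9):
    """Check the condition for every 4-subset of the objects (keys of d)."""
    return all(four_point_split(d, q, tol) is not None
               for q in combinations(list(d), 4))
```

On the four-object example of the previous slide the three sums are 2 + 3 = 5, 2 + 2 = 4 and 2 + 2 = 4, so the check fails, consistent with the claim that no tree realizes those distances. Checking every 4-subset costs O(L^4) lookups.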

25

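Proof (4 Points Condition ⇒ Additivity): induction on the number of objects, L. For L ≤ 3 the condition is empty and a tree exists. Consider L = 4:

B = d(i,k) + d(j,l) = d(i,l) + d(j,k) ≥ d(i,j) + d(k,l) = A

Let y = (B - A)/2 ≥ 0. Then the tree should look as follows, and we have to find the distances a, b, c and f.

[Figure: i and j joined to an internal node m by edges a and b; m joined to a second internal node n by an edge y; k and l joined to n by edges c and f]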

26

Tree construction for L=4

[Figure: edges a (i to m), b (j to m), y (m to n), c (n to k), f (n to l)]

Construct the tree from the given distances as follows:

1. Construct a tree for {i, j, k}, with internal vertex m.

2. Add vertex n on the path from m to k, with d(m,n) = y.

3. Add edge (n,l), with c + f = d(k,l).

It remains to prove:

d(i,l) = dT(i,l) and d(j,l) = dT(j,l)
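The three construction steps translate directly into arithmetic on the distances. A minimal Python sketch, assuming {{i,j},{k,l}} is already known to be the split and that no object is named "m" or "n" (the internal node names are our choice):

```python
def build_quartet_tree(d, i, j, k, l, tol=1e-9):
    """Edges (u, v, weight) of the tree for {i, j, k, l}, built in the three steps
    above, assuming {{i, j}, {k, l}} is the split. Internal nodes are "m" and "n"."""
    A = d[i][j] + d[k][l]
    B = d[i][k] + d[j][l]
    if abs(B - (d[i][l] + d[j][k])) > tol or B < A - tol:
        raise ValueError("{{i,j},{k,l}} is not a split of these distances")
    y = (B - A) / 2
    # Step 1: three-point tree for {i, j, k} with internal vertex m.
    a = (d[i][j] + d[i][k] - d[j][k]) / 2
    b = (d[i][j] + d[j][k] - d[i][k]) / 2
    d_mk = (d[i][k] + d[j][k] - d[i][j]) / 2
    # Step 2: n sits on the m-k path at distance y from m, leaving
    # c = d_mk - y = (d(j,k) + d(k,l) - d(j,l)) / 2 >= 0 (triangle inequality).
    c = d_mk - y
    # Step 3: attach l to n so that c + f = d(k, l).
    f = d[k][l] - c
    return [(i, "m", a), (j, "m", b), ("m", "n", y), (k, "n", c), (l, "n", f)]
```

For made-up distances d(i,j)=3, d(i,k)=8, d(i,l)=9, d(j,k)=9, d(j,l)=10, d(k,l)=9, the sketch returns a=1, b=2, y=3, c=4, f=5, and indeed a + y + f = 9 = d(i,l).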

27

Proof for L=4

[Figure: the tree constructed for L = 4]

By the 4 points condition and the definition of y (d(i,l) + d(k,j) = B = A + 2y):

d(i,l) = d(i,j) + d(k,l) + 2y - d(k,j) = a + y + f = dT(i,l)

The middle equality holds since d(i,j), d(k,l) and d(k,j) are realized by the tree: (a + b) + (c + f) + 2y - (b + y + c) = a + y + f.

d(j,l) = dT(j,l) is proved similarly.

28

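Induction step for L > 4: remove object L from the set. By induction, there is a tree, T’, for {1, 2, …, L-1}. For each pair of labeled nodes (i,j) in T’, let a_ij, b_ij, c_ij be defined by the following figure:

[Figure: the path from i to j in T’ passes through a point m_ij at distance a_ij from i and b_ij from j; L is attached to m_ij by an edge of length c_ij]

$$c_{ij} = \tfrac{1}{2}\left[d(i,L) + d(j,L) - d(i,j)\right]$$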
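Choosing the pair for the induction step amounts to a scan over all placed pairs. A minimal Python sketch, assuming distances are a dict of dicts; the name `closest_pair` is ours, and ties in c_ij are broken by object name (the proof allows any minimizer).

```python
from itertools import combinations

def closest_pair(d, placed, new):
    """Return (i, j, a_ij, b_ij, c_ij) for the pair of already placed objects
    minimizing c_ij = (d(i,new) + d(j,new) - d(i,j)) / 2."""
    if len(placed) < 2:
        raise ValueError("need at least two placed objects")
    c, i, j = min(((d[x][new] + d[y][new] - d[x][y]) / 2, x, y)
                  for x, y in combinations(placed, 2))
    # a_ij, b_ij: distances from i and j to the attachment point m_ij.
    a = d[i][new] - c      # = (d(i,j) + d(i,new) - d(j,new)) / 2
    b = d[j][new] - c      # = (d(i,j) + d(j,new) - d(i,new)) / 2
    return i, j, a, b, c
```

With the made-up distances d(i,j)=3, d(i,k)=8, d(i,l)=9, d(j,k)=9, d(j,l)=10, d(k,l)=9 and new object l, the values are c_ij = 8, c_ik = 5, c_jk = 5, so the sketch picks (i, k) with a = b = 4 and c = 5.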

29

Induction step: pick i and j that minimize c_ij.

T is constructed by adding L (and possibly m_ij) to T’, as in the figure. Then d(i,L) = dT(i,L) and d(j,L) = dT(j,L).

It remains to prove: for each k ≠ i, j, d(k,L) = dT(k,L).

[Figure: T’ with L attached by an edge c_ij at m_ij on the path from i to j]

30

Induction step (cont.)

Let k ≠ i, j be an arbitrary node in T’, and let n be the branching point of k on the path from i to j.

By the minimality of c_ij, {{i,j},{k,L}} is not a “split” of {i,j,k,L}. So assume WLOG that {{i,L},{j,k}} is a “split” of {i,j,k,L}.

[Figure: T’ with L attached at m_ij, and k branching off the path from i to j at n]

31

Induction step (end)

Since {{i,L},{j,k}} is a split, by the 4 points condition

d(L,k) = d(i,k) + d(L,j) - d(i,j)

d(i,k) = dT(i,k) and d(i,j) = dT(i,j) by induction, and

d(L,j) = dT(L,j) by the construction.

Hence d(L,k) = dT(L,k).

QED

[Figure: T’ with L attached at m_ij, and k branching off the path from i to j at n]
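Because the proof is constructive, it can be read as an algorithm: start from the three-point tree and insert the remaining objects one by one, each attached at m_ij for the pair minimizing c_ij. The Python sketch below follows that outline under stated assumptions: the input is additive (this is not checked, so on other input the tree will not realize d), there are at least three objects, internal nodes get the hypothetical names ("m", 0), ("m", 1), …, and ties in c_ij are broken by object name. Where a triangle inequality is tight, zero-length edges can appear; they could be contracted afterwards.

```python
from itertools import combinations

def additive_tree(d, names, tol=1e-9):
    """Build a tree {node: {neighbor: weight}} realizing additive distances d
    (a dict of dicts), inserting the objects in `names` one at a time."""
    tree = {}
    fresh = (("m", t) for t in range(len(names)))  # one new node per insertion suffices

    def add_edge(u, v, w):
        tree.setdefault(u, {})[v] = w
        tree.setdefault(v, {})[u] = w

    def path(u, v):
        # The unique u-v path, recovered from DFS parent pointers.
        parent, stack = {u: None}, [u]
        while stack:
            x = stack.pop()
            for y in tree[x]:
                if y not in parent:
                    parent[y] = x
                    stack.append(y)
        out = [v]
        while out[-1] != u:
            out.append(parent[out[-1]])
        return out[::-1]

    def point_on_path(i, j, a):
        # Node m_ij at distance a from i on the i-j path, splitting an edge if needed.
        p = path(i, j)
        for x, y in zip(p, p[1:]):
            w = tree[x][y]
            if a <= tol:
                return x
            if a < w - tol:
                m = next(fresh)
                del tree[x][y], tree[y][x]
                add_edge(x, m, a)
                add_edge(m, y, w - a)
                return m
            a -= w
        return j

    # Base case: the three-point star for the first three objects.
    x0, x1, x2 = names[:3]
    center = next(fresh)
    for x, y, z in ((x0, x1, x2), (x1, x0, x2), (x2, x0, x1)):
        add_edge(x, center, (d[x][y] + d[x][z] - d[y][z]) / 2)

    # Induction step: attach each new object L at m_ij for the pair minimizing c_ij.
    for t, L in enumerate(names[3:], start=3):
        c, i, j = min(((d[x][L] + d[y][L] - d[x][y]) / 2, x, y)
                      for x, y in combinations(names[:t], 2))
        add_edge(point_on_path(i, j, d[i][L] - c), L, c)
    return tree

def tree_dist(tree, u, v):
    """Length of the unique path between nodes u and v."""
    dist, stack = {u: 0.0}, [u]
    while stack:
        x = stack.pop()
        for y, w in tree[x].items():
            if y not in dist:
                dist[y] = dist[x] + w
                stack.append(y)
    return dist[v]
```

On distances generated from a five-leaf tree, the result has three internal nodes and `tree_dist` reproduces every input distance. Each insertion scans all pairs of placed objects and walks one path, mirroring the induction step rather than aiming for speed.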

32

Dangers of Paralogs

[Figure: gene tree with a gene duplication followed by speciation events (S); leaves 1A, 2A, 3A, 3B, 2B, 1B]

If we happen to consider genes 1A, 2B and 3A of species 1, 2, 3, we get a wrong tree that does not represent the phylogeny of the host species of the given sequences, because duplication does not create new species.

In what follows, we assume all given sequences are orthologs.