Phylogenetic Trees Lecture 11 Sections 7.1, 7.2, in Durbin et al.


Phylogenetic Trees, Lecture 11

Sections 7.1, 7.2, in Durbin et al.

[Lecturer's notes. 1.6.04: The lecture ended about 10 minutes early, even though I did not rush; is there room to add material? 1.05: I added the Saitou and Nei theorem (without the proof) and still finished 5 minutes early, at a relaxed pace. In the first hour I reached 24 (the definition of additive trees). 1.06: Minor changes (removed a picture, elaborated examples and proofs). The lecture ended on time, at a slow pace and with many questions. There is room to add a slide for the construction in the case n=4.
A simpler proof of the 4 points condition was inserted after class.]

Evolution

Evolution of new organisms is driven by

Diversity: Different individuals carry different variants of the same basic blueprint.

Mutations: The DNA sequence can be changed due to single base changes, deletion/insertion of DNA segments, etc.

Selection bias


The Tree of Life

Source: Alberts et al.


Primate evolution

A phylogeny, also called a phylogenetic tree, is a tree that describes the sequence of speciation events that led to the formation of a set of present-day species.


Historical Note

Until the mid-1950s, phylogenies were constructed by experts based on their opinion (subjective criteria).

Since then, the focus has been on objective criteria for constructing phylogenetic trees.

Thousands of articles in the last decades.

Important for many aspects of biology: classification, understanding biological mechanisms.


Morphological vs. Molecular

Classical phylogenetic analysis: morphological features (number of legs, lengths of legs, etc.).

Modern biological methods allow the use of molecular features: gene sequences, protein sequences.

Analysis is based on homologous sequences (e.g., globins) in different species.


Morphological topology

[Figure: morphological topology of mammalian species, from Bonobo to Platypus, grouped into Archonta, Glires, Ungulata, Carnivora, Insectivora, and Xenarthra.]

(Based on McKenna and Bell, 1997)


From sequences to a phylogenetic tree

Rat      QEPGGLVVPPTDA
Rabbit   QEPGGMVVPPTDA
Gorilla  QEPGGLVVPPTDA
Cat      REPGGLVVPPTEG

There are many possible types of sequences to use (e.g., mitochondrial vs. nuclear proteins).


Mitochondrial topology (based on Pupko et al.)

[Figure: mitochondrial topology of mammalian species, with the groups Perissodactyla, Carnivora, Cetartiodactyla, Rodentia 1, Hedgehogs, Rodentia 2, Primates, Chiroptera, Moles+Shrews, Afrotheria, Xenarthra, and Lagomorpha + Scandentia.]


Nuclear topology

[Figure: nuclear topology of mammalian species (tree by Madsen), with clades numbered 1 to 4 and the groups Cetartiodactyla, Afrotheria, Chiroptera, Eulipotyphla, Glires, Xenarthra, Carnivora, Perissodactyla, Scandentia+Dermoptera, Pholidota, and Primates.]

(Based on Pupko et al. slide)


Theory of Evolution

Basic idea: speciation events lead to the creation of different species. Speciation is caused by physical separation into groups where different genetic variants become dominant.

Any two species share a (possibly distant) common ancestor.


Phylogenetic trees

Leaves: current-day species (or taxa, plural of taxon)
Internal vertices: hypothetical common ancestors
Edge lengths: “time” from one speciation to the next

[Figure: tree with leaves Aardvark, Bison, Chimp, Dog, Elephant.]


Dangers in Molecular Phylogenies

We have to emphasize that gene/protein sequences can be homologous for several different reasons:

Orthologs -- sequences diverged after a speciation event

Paralogs -- sequences diverged after a duplication event

Xenologs -- sequences diverged after a horizontal transfer (e.g., by virus)

[Note, Esti Yeger Lotem, 31.12.02: Paralogs are genes that were duplicated within the same organism and acquired different properties. Comparing such genes (for example hemoglobin alpha and beta, present in both mouse and human: comparing human alpha with mouse beta) gives misleading data on the evolutionary distance.]

Dangers of Paralogs

Consider the evolutionary tree of three taxa:

[Figure: tree with leaves 1, 2, 3, with a gene duplication marked.]

…and assume that at some point in the past a gene duplication event occurred.


Dangers of Paralogs

[Figure: gene tree with leaves 1A, 2A, 3A, 3B, 2B, 1B, with the gene duplication and the speciation events marked.]

The gene evolution is described by this tree (A, B are the copies of the same gene).


Dangers of Paralogs

[Figure: gene tree with leaves 1A, 2A, 3A, 3B, 2B, 1B, with the gene duplication and the speciation events marked.]

If we happen to consider genes 1A, 2B, and 3A of species 1, 2, 3, we get a wrong tree that does not represent the phylogeny of the host species of the given sequences, because duplication does not create new species.

In the sequel we assume all given sequences are orthologs, created from a common ancestor by speciation events.


Types of Trees

A natural model to consider is that of rooted trees

[Figure: rooted tree with the common ancestor at the root.]


Types of trees

An unrooted tree represents the same phylogeny without the root node.

Depending on the model, data from current-day species does not distinguish between different placements of the root.


Rooted versus unrooted trees

[Figure: rooted trees labeled Tree a, Tree b, and Tree c, and an unrooted tree with labels a, b, c.]

The unrooted tree represents the three rooted trees.


Positioning Roots in Unrooted Trees

We can estimate the position of the root by introducing an outgroup: a set of species that are definitely distant from all the species of interest.

[Figure: tree with leaves Aardvark, Bison, Chimp, Dog, Elephant and the outgroup Falcon; the proposed root is marked.]


Type of Data

Distance-based
Input is a matrix of distances between species.
A distance can be the fraction of residues on which two species disagree, or the alignment score between them, or …

Character-based
Examine each character (e.g., residue) separately.
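As a rough illustration of the distance-based input, the sketch below computes the "fraction of residues they disagree on" for the four sequences shown on the earlier slide. The function names `p_distance` and `distance_matrix` are hypothetical, and the sketch assumes the sequences are already aligned to equal length.

```python
from itertools import combinations

# Sequences from the "From sequences to a phylogenetic tree" slide.
SEQS = {
    "Rat": "QEPGGLVVPPTDA",
    "Rabbit": "QEPGGMVVPPTDA",
    "Gorilla": "QEPGGLVVPPTDA",
    "Cat": "REPGGLVVPPTEG",
}


def p_distance(s, t):
    """Fraction of aligned positions at which s and t differ."""
    if len(s) != len(t):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(s, t)) / len(s)


def distance_matrix(seqs):
    """Symmetric distance matrix as a dict keyed by (name, name) pairs."""
    names = list(seqs)
    d = {(x, x): 0.0 for x in names}
    for x, y in combinations(names, 2):
        d[x, y] = d[y, x] = p_distance(seqs[x], seqs[y])
    return d


D = distance_matrix(SEQS)
for x, y in combinations(SEQS, 2):
    print(f"{x:8s}{y:8s}{D[x, y]:.3f}")
```

Note that Rat and Gorilla get distance 0 here, so this short fragment alone would violate the requirement d(i,j) > 0 for i ≠ j; longer sequences are needed in practice.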


Two Methods of Tree Construction

Distance: a weighted tree that realizes the distances between the objects.

Parsimony: a tree with a minimum total number of character changes between nodes.

We start with distance-based methods, considering the following question: Given a set of species (leaves in a supposed tree), and distances between them, construct a phylogeny which best “fits” the distances.

[Note, Shlomo, 12.3.03: Before the construction, the four points theorem should be inserted (from a separate file), replacing its earlier proof in Lecture 12. It may also be worth dropping UPGMA. This note of course affects Lecture 12 as well.]

Exact solution: Additive sets

Given a set M of L objects with an L×L distance matrix:
d(i,i) = 0, and for i ≠ j, d(i,j) > 0;
d(i,j) = d(j,i);
for all i,j,k it holds that d(i,k) ≤ d(i,j) + d(j,k).

Can we construct a weighted tree which realizes these distances?
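The three requirements on the distance matrix can be checked mechanically. The following is a minimal sketch, assuming the matrix is given as a list of lists; the name `is_metric` and the tolerance `tol` are illustrative choices, not part of the lecture.

```python
def is_metric(d, tol=1e-9):
    """Check d(i,i)=0, d(i,j)>0 for i != j, symmetry, and the triangle inequality."""
    n = len(d)
    for i in range(n):
        if abs(d[i][i]) > tol:
            return False
        for j in range(n):
            if i != j and d[i][j] <= 0:
                return False
            if abs(d[i][j] - d[j][i]) > tol:
                return False
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if d[i][k] > d[i][j] + d[j][k] + tol:
                    return False
    return True


print(is_metric([[0, 2, 2], [2, 0, 2], [2, 2, 0]]))  # all conditions hold
print(is_metric([[0, 1, 5], [1, 0, 1], [5, 1, 0]]))  # 5 > 1 + 1 breaks the triangle inequality
```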


Additive sets (cont)

We say that the set M with L objects is additive if there is a tree T in which L of the nodes correspond to the L objects, with positive weights on the edges, such that for all i,j, d(i,j) = dT(i,j), the length of the path from i to j in T.

Note: Sometimes the tree is required to be binary, and then the edge weights are required to be non-negative.


Three objects sets are additive:

For L=3: There is always a (unique) tree with one internal node.

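\[ d(i,j) = a + b, \qquad d(i,k) = a + c, \qquad d(j,k) = b + c \]

[Figure: star tree with internal node m and leaves i, j, k at distances a, b, c from m.]

For instance,

\[ c = d(k,m) = \tfrac{1}{2}\,[\, d(i,k) + d(j,k) - d(i,j) \,] \ge 0 \]

      i    j    k
 i    0   a+b  a+c
 j         0   b+c
 k              0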
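The three edge weights follow from solving the three equations above for a, b, c. A small sketch (the function name `three_point_tree` is hypothetical):

```python
def three_point_tree(d_ij, d_ik, d_jk):
    """Edge weights (a, b, c) from the internal node m to the leaves i, j, k."""
    a = (d_ij + d_ik - d_jk) / 2  # d(i, m)
    b = (d_ij + d_jk - d_ik) / 2  # d(j, m)
    c = (d_ik + d_jk - d_ij) / 2  # d(k, m)
    return a, b, c


a, b, c = three_point_tree(5, 6, 7)
# The tree realizes the input: a + b, a + c, b + c reproduce the distances.
print(a, b, c, a + b, a + c, b + c)
```

The triangle inequality is what keeps each of a, b, c non-negative.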


How about four objects?

L=4: Not all sets with 4 objects are additive:

e.g., there is no tree which realizes the distances below.

      i  j  k  l
 i    0  2  2  2
 j       0  2  2
 k          0  3
 l             0


The Four Points Condition

A necessary condition for a set of four objects to be additive: its objects can be labeled i,j,k,l so that:

d(i,k) + d(j,l) = d(i,l) +d(k,j) ≥ d(i,j) + d(k,l)

Proof: By the figure...

{{i,j},{k,l}} is a “split” of {i,j,k,l}.

[Figure: tree with leaves i, j, k, l illustrating the split {{i,j},{k,l}}.]


The Four Points Condition

Definition: A set M of L objects satisfies the four points condition iff any subset of four objects can be labeled i,j,k,l so that:

d(i,k) + d(j,l) = d(i,l) +d(k,j) ≥ d(i,j) + d(k,l)

[Figure: tree with leaves i, j, k, l illustrating the split {{i,j},{k,l}}.]
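The definition can be tested directly: a quartet can be labeled as required exactly when the two largest of its three pairwise sums are equal. The sketch below assumes a list-of-lists matrix; the names `quartet_ok` and `satisfies_four_points` and the tolerance are illustrative. The first example is the non-additive 4-object matrix shown earlier; the second is read off a tree with leaf edges 1, 2, 3, 4 and an internal edge of weight 5.

```python
from itertools import combinations


def quartet_ok(d, i, j, k, l, tol=1e-9):
    """True if the two largest of the three pairwise sums are equal."""
    sums = sorted([d[i][j] + d[k][l], d[i][k] + d[j][l], d[i][l] + d[j][k]])
    return abs(sums[2] - sums[1]) <= tol


def satisfies_four_points(d, tol=1e-9):
    """Check the four points condition on every quartet of objects."""
    return all(quartet_ok(d, *q, tol=tol) for q in combinations(range(len(d)), 4))


NOT_ADDITIVE = [[0, 2, 2, 2], [2, 0, 2, 2], [2, 2, 0, 3], [2, 2, 3, 0]]
ADDITIVE = [[0, 3, 9, 10], [3, 0, 10, 11], [9, 10, 0, 7], [10, 11, 7, 0]]

print(satisfies_four_points(NOT_ADDITIVE))  # sums are 5, 4, 4: the largest is unmatched
print(satisfies_four_points(ADDITIVE))      # sums are 10, 20, 20
```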


The Four Points Condition

Theorem: The following 3 conditions are equivalent for a distance matrix D on a set M of L objects:

1. D is additive.

2. D satisfies the four points condition for all quartets in M.

3. There is an object r in M, s.t. D satisfies the 4 points condition for all quartets that include r.

[Figure: tree with leaves i, j, k, l illustrating the split {{i,j},{k,l}}.]


The Four Points Condition

Proof: we’ll show that 1 ⇒ 2 ⇒ 3 ⇒ 1.

1 ⇒ 2: Additivity ⇒ the 4P condition is satisfied by all quartets: by the figure...

[Figure: tree with leaves i, j, k, l illustrating the split {{i,j},{k,l}}.]

2 ⇒ 3: trivial.


Proof that 3 ⇒ 1

Induction on the number of objects, L. For L ≤ 3 the condition is trivially true and a tree exists.

For L=4: Consider 4 points which satisfy d(i,k) + d(j,l) = d(i,l) + d(j,k) ≥ d(i,j) + d(k,l).

We will construct a tree T with 4 leaves, s.t. dT(x,y) = d(x,y) for each pair x,y in {i,j,k,l}.

[Figure: tree with leaves i, j, k, l, internal vertices m, n, and labeled edge weights.]


Tree construction for L=4

[Figure: tree with leaves i, j, k, l and internal vertices m, n.]

Assume the split {{i,j},{k,l}}: d(i,j) + d(k,l) ≤ d(j,k) + d(i,l).

1. Construct a tree for {i,j,k}, with internal vertex m.
2. Construct a tree for {i,k,l}, by adding the vertex n and the edge (n,l).

The construction guarantees that dT(x,y) = d(x,y) for all (x,y) except (j,l).

[Lecturer's note, 1.06: there may be room to add a slide proving the existence and uniqueness of the construction.]

Tree construction for L=4

[Figure: tree with leaves i, j, k, l and internal vertices m, n.]

dT(x,y) = d(x,y) for all (x,y) except (j,l).

Thus, since dT(i,j) + dT(k,l) ≤ dT(j,k) + dT(i,l), {{i,j},{k,l}} is a split of the tree T.

By the proof that 1 ⇒ 2, we have for the tree T: d(j,l) = d(i,l) + d(j,k) − d(i,k) = dT(i,l) + dT(j,k) − dT(i,k) = dT(j,l).

And hence dT(x,y) = d(x,y) for all x,y.


Corollary from the construction

[Figure: tree with leaves i, j, k, l.]

Corollary F: If d(i,k) + d(j,l) = d(i,l) + d(j,k) ≥ d(i,j) + d(k,l), then there is a unique tree which realizes all the distances except d(j,l), and this tree also realizes the distance d(j,l).*

*(j,l) can be replaced by any pair in {i,j}×{k,l}.
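The L=4 construction can be sketched as follows, assuming the quartet is already labeled so that {{i,j},{k,l}} is the split. The function name `quartet_tree` and the returned dictionary keys are hypothetical. The code uses every distance except d(j,l), and the final check illustrates Corollary F: the tree realizes d(j,l) as well.

```python
def quartet_tree(d, i, j, k, l):
    """Edge weights of the tree with split {{i,j},{k,l}}, built without using d(j,l)."""
    # Step 1: tree for {i, j, k} with internal vertex m.
    a = (d[i][j] + d[i][k] - d[j][k]) / 2      # edge (i, m)
    b = (d[i][j] + d[j][k] - d[i][k]) / 2      # edge (j, m)
    # Step 2: tree for {i, k, l}; vertex n lies on the path from i to k.
    i_to_n = (d[i][k] + d[i][l] - d[k][l]) / 2
    c = d[i][k] - i_to_n                       # edge (k, n)
    e = d[i][l] - i_to_n                       # edge (l, n)
    mid = i_to_n - a                           # internal edge (m, n)
    return {"i": a, "j": b, "k": c, "l": e, "mid": mid}


# Distances read off a tree with leaf edges 1, 2, 3, 4 and internal edge 5.
D4 = [[0, 3, 9, 10], [3, 0, 10, 11], [9, 10, 0, 7], [10, 11, 7, 0]]
w = quartet_tree(D4, 0, 1, 2, 3)
print(w)
print(w["j"] + w["mid"] + w["l"], D4[1][3])  # path length j..l versus d(j,l)
```

The internal edge equals ½[d(i,l) + d(j,k) − d(i,j) − d(k,l)], which is non-negative exactly because of the inequality in the four points condition.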


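Induction step for L>4: For each pair of labeled nodes (i,j) in T’, let c_ij be defined by the following figure:

[Figure: leaves i, j, and r joined at the internal vertex m_ij; c_ij is the length of the path from r to m_ij.]

\[ c_{ij} = \tfrac{1}{2}\,[\, d(i,r) + d(j,r) - d(i,j) \,] \]

Pick i and j that maximize c_ij.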
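Choosing the pair can be sketched in a few lines. This assumes a list-of-lists matrix and a fixed reference object r; the name `pick_pair` is hypothetical, and ties are broken arbitrarily by tuple ordering. The example matrix comes from a tree in which leaves 0 and 1 hang off one internal vertex (edges 1 and 4), leaves 2 and 3 off another (edges 1 and 4), and the internal edge has weight 1.

```python
from itertools import combinations


def pick_pair(d, r):
    """Return (c_ij, i, j) maximizing c_ij = (d(i,r) + d(j,r) - d(i,j)) / 2 over i, j != r."""
    others = [x for x in range(len(d)) if x != r]
    return max(
        ((d[i][r] + d[j][r] - d[i][j]) / 2, i, j)
        for i, j in combinations(others, 2)
    )


TREE_D = [[0, 5, 3, 6], [5, 0, 6, 9], [3, 6, 0, 5], [6, 9, 5, 0]]
best = pick_pair(TREE_D, r=3)
print(best)  # the pair whose junction m_ij is farthest from r
```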


Induction step: Construct (by induction) T’ on M \ {i}. Add i (and possibly m_ij) to T’, as in the figure. Then d(i,r) = dT(i,r) and d(j,r) = dT(j,r).

Remains to prove: for each k ∉ {r,j} it holds that d(i,k) = dT(i,k).

[Figure: the tree T’ containing j and r, with i attached at m_ij, at distance c_ij from r.]


Induction step (cont.)

Let k ∉ {j,r} be an arbitrary node in T’. The maximality of c_ij means that {{r,k},{i,j}} is a split of {i,j,k,r}.

Thus, by Corollary F, since d(x,y) = dT(x,y) for each x,y in {i,j,k,r} except d(k,i), we also have d(k,i) = dT(k,i).

[Figure: the tree T’ containing j, r, and k, with i attached at m_ij, at distance c_ij from r.]


Constructing additive trees: The neighbor joining problem

Let i, j be neighboring leaves in a tree, let k be their parent, and let m be any other vertex.

The formula

\[ d(k,m) = \tfrac{1}{2}\,[\, d(i,m) + d(j,m) - d(i,j) \,] \]

shows that we can compute the distances of k to all other leaves.

This suggests the following method to construct a tree from a distance matrix:

1. Find neighboring leaves i,j in the tree.

2. Replace i,j by their parent k and recursively construct a tree T for the smaller set.

3. Add i,j as children of k in T.
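Step 2 of the method, replacing two neighboring leaves by their parent, can be sketched as below. The matrix is assumed to be a dict of dicts keyed by node label; the function name `join_neighbors` and the choice of the tuple (i, j) as the parent's label are illustrative assumptions. The example distances come from a tree in which A and B are neighbors (edges 1 and 4), C and D are neighbors (edges 1 and 4), and the internal edge has weight 1.

```python
def join_neighbors(d, i, j):
    """Replace neighboring leaves i, j by their parent; return (new matrix, parent label)."""
    parent = (i, j)
    others = [m for m in d if m not in (i, j)]
    new = {m: {n: d[m][n] for n in others} for m in others}
    new[parent] = {parent: 0.0}
    for m in others:
        # d(k, m) = (d(i, m) + d(j, m) - d(i, j)) / 2
        dist = (d[i][m] + d[j][m] - d[i][j]) / 2
        new[parent][m] = new[m][parent] = dist
    return new, parent


LEAVES = "ABCD"
ROWS = [[0, 5, 3, 6], [5, 0, 6, 9], [3, 6, 0, 5], [6, 9, 5, 0]]
DIST = {x: dict(zip(LEAVES, row)) for x, row in zip(LEAVES, ROWS)}

reduced, parent = join_neighbors(DIST, "A", "B")
print(reduced[parent])  # distances from the parent of A and B to C and D
```

Applying the same step repeatedly shrinks the matrix by one object per round, which is the recursion described in the list above.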


Neighbor Finding

How can we find from distances alone a pair of nodes which are neighboring leaves?

Closest nodes aren’t necessarily neighboring leaves.

[Figure: tree with leaves A, B, C, D.]

Next we show one way to find neighbors from distances.


Neighbor Finding: Saitou & Nei method

Theorem (Saitou&Nei) Assume all edge weights are positive. If D(i,j) is minimal (among all pairs of leaves), then i and j are neighboring leaves in the tree.

Definitions: For a leaf i, let

\[ r_i = \sum_{u \text{ is a leaf}} d(i,u). \]

For leaves i, j:

\[ D(i,j) = (L-2)\, d(i,j) - (r_i + r_j). \]
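The criterion D(i,j) can be evaluated directly from the distance matrix. The sketch below assumes a list-of-lists matrix over the leaves; the names `nj_criterion` and `pick_neighbors` are hypothetical. The example tree has leaves 0 and 1 as neighbors (edges 1 and 4), leaves 2 and 3 as neighbors (edges 1 and 4), and an internal edge of weight 1, so the closest pair (0, 2) is not a pair of neighbors.

```python
from itertools import combinations


def nj_criterion(d):
    """D(i,j) = (L - 2) d(i,j) - (r_i + r_j), with r_i the sum of row i."""
    L = len(d)
    r = [sum(row) for row in d]
    return {
        (i, j): (L - 2) * d[i][j] - (r[i] + r[j])
        for i, j in combinations(range(L), 2)
    }


def pick_neighbors(d):
    """A pair of leaves minimizing D(i,j)."""
    crit = nj_criterion(d)
    return min(crit, key=crit.get)


NJ_D = [[0, 5, 3, 6], [5, 0, 6, 9], [3, 6, 0, 5], [6, 9, 5, 0]]
closest = min(combinations(range(4), 2), key=lambda p: NJ_D[p[0]][p[1]])
print(closest)               # closest pair by raw distance
print(pick_neighbors(NJ_D))  # pair selected by the D(i,j) criterion
```

In this example the pairs (0, 1) and (2, 3) tie for the minimum of D, and both are genuine neighbor pairs, while the raw-distance minimum is not.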