Molecular Phylogeny in a context of possible Lateral Gene Transfers Eric Bapteste W.F. Doolittle Lab

The reason(s) why we doubt a strict tree-like representation should be used

• Biological processes favoring lateral exchanges of DNA... are powerful

• Phylogenetic evidence for a unique Tree of Life is weak

• Molecular phylogenies might even suggest that LGT happens, at least in some lineages

Biological Processes contribute to lateral exchanges of DNA

Internal source of variation

Mutator phenotype

Baseline replication errors (point mutations)

Intragenomic recombination (legitimate and illegitimate)

Hypervariable loci

Genome of the Organism

Deletion of genetic material (gene loss)

Gene duplication

Vertical inheritance

Genome of the Descendent

External source of variation

DNA viruses, lytic RNA viruses, retroviruses

Conjugative plasmids and transposons

DNA from divergent lineage

Transduction

Transformation

Conjugation

Horizontal inheritance

Cell fusions

Membrane vesicle transfer

Phylogenetic evidence for a unique Tree of Life is weak

“The general lack of conflict observed among the 203 remaining families was not due to the absence of phylogenetic signal in the gene alignments because most genes did conflict with several other topologies (see Figure 3). We interpreted this congruence as a reflection of shared history and a lack of LGT. Therefore, we chose these genes as the basis for inferring the true organismal phylogeny for these 13 species.”

Gamma-proteobacteria: an apparent agreement on a tree

Lerat E et al., PLoS Biol. 2003 Oct;1(1):E19.

[Figure: number of alignments (y-axis, 0 to 200) for each topology (x-axis, 1 to 105), under the AU test and under the SH test. Blue: not different from the ML tree (5%). Red: different from the ML tree.]

Phylogenetic evidence for a unique Tree of Life is weak

Testing the congruence/conflict between markers

[Table: one row per topology, one column per gene]

Genes:   1     2     3     4     5     6     7
         0     0.1   4.1   0.6   0.1   0.4   0.3
         4     5.4   7.8   4.3   4.7   3.1   9.7
         13    13    0.4   17    9     22    47
         42    27    0     33    37    29    41
         0.2   1.7   3.3   0.1   0     0.4   0.5
         …..

R² = 0.9

R² = 0.05
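As an illustration of how an R² value between two markers could be obtained, here is a minimal Python sketch. It assumes that each gene is summarised by its profile of test p-values over the same set of topologies, and that R² is the squared Pearson correlation between two such profiles. The function name `r_squared` and the three profiles are hypothetical and are not taken from the slides.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equally long profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Cross-product and sums of squares of the centred values.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant profile has no defined correlation")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical p-values (in %) of three genes over the same five topologies.
gene_a = [0.1, 5.0, 13.0, 40.0, 1.0]
gene_b = [0.2, 6.0, 15.0, 38.0, 2.0]   # prefers the same topologies as gene_a
gene_c = [45.0, 3.0, 0.5, 1.0, 30.0]   # prefers different topologies

print("gene_a vs gene_b:", round(r_squared(gene_a, gene_b), 2))
print("gene_a vs gene_c:", round(r_squared(gene_a, gene_c), 2))
```

Under these assumptions, a high R² means that two genes accept and reject the same topologies, while a low R² points to conflicting signal between them.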

Principal Component Analysis of p-values for each gene and topology

[Figure: Principal Component Analysis of 205 genes of gamma-proteobacteria and simulated markers with transfers. Both axes run from about -2 to 1.5. Successive panels add to the genes the simulated markers with 1 LGT event, then 2 LGT events, then 3 LGT events, then random markers.]
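The idea of projecting genes according to their p-value profiles can be sketched as follows. This is only an illustration under stated assumptions: each row is a gene, each column a topology, and only the first principal component is computed, by power iteration on the covariance matrix. The data and all function names are hypothetical; the slides do not say which software or settings were used.

```python
import math


def center_columns(rows):
    """Subtract the column mean from every value."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def covariance(centered):
    """Sample covariance matrix of already centred rows."""
    n = len(centered)
    dims = len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1)
             for j in range(dims)] for i in range(dims)]


def first_component(cov, iterations=200):
    """Leading eigenvector of a covariance matrix, by power iteration."""
    dims = len(cov)
    vec = [1.0 / (i + 1) for i in range(dims)]  # arbitrary starting vector
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(dims)) for i in range(dims)]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0:
            raise ValueError("covariance matrix has no variance to explain")
        vec = [v / norm for v in nxt]
    return vec


def pc1_scores(rows):
    """Coordinate of every row (gene) on the first principal component."""
    centered = center_columns(rows)
    axis = first_component(covariance(centered))
    return [sum(v * w for v, w in zip(row, axis)) for row in centered]


# Hypothetical p-values: three genes favouring topology 1, followed by
# two simulated markers (with a transfer) favouring topology 3.
profiles = [
    [0.90, 0.10, 0.05],
    [0.80, 0.20, 0.10],
    [0.85, 0.15, 0.05],
    [0.05, 0.10, 0.90],
    [0.10, 0.20, 0.80],
]
scores = pc1_scores(profiles)
for label, score in zip(["gene", "gene", "gene", "LGT", "LGT"], scores):
    print(label, round(score, 2))
```

In this toy case the two groups fall on opposite sides of the first axis, which is the kind of separation the plots are meant to reveal. The sign of the axis itself is arbitrary.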

[Figure: matrix of p-values, with one entry for gene number i and topology number i. Genes are grouped into clusters of genes, and topologies into clusters of plausible topologies. Blue: genes with LGT. Red: genes.]

[Figure: a spectrum running from genes clearly showing lateral transfer, through genes showing nothing clearly, to genes clearly showing vertical descent. Enthusiastic lateralists and committed verticalists stand at the opposite ends.]

INCONGRUENCE OF ORTHOLOGOUS GENES: HOW MUCH IS NOISE, HOW MUCH IS TRANSFER (ORTHOLOGOUS REPLACEMENT)? TRUTH IS, NO ONE REALLY KNOWS

What we propose to do

A synthesis

Vertical part Horizontal part

Principles to make a synthesis

[Figure: a reference phylogeny of taxa A to F is compared with the phylogeny of gene 1 and the phylogeny of gene 2 (branch supports of 99), and the three are combined into a synthesis.]

From a tree …

ML trees, BV > 50, strict consensus

… to a synthesis
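One possible way to sketch the step from gene trees to a synthesis is to compare the supported clades of each gene tree with the clades of the reference phylogeny. The Python below is a hedged illustration and not the method used for the slides. It assumes rooted trees written as nested tuples, assumes that weakly supported clades (BV of 50 or less) have already been collapsed, and uses hypothetical labels ("vertical", "lateral candidate", "unresolved") and hypothetical example trees.

```python
def clades(tree):
    """Clades (frozensets of leaf names) of a rooted tree given as nested
    tuples, e.g. (("A", "B"), "C"). The root clade is left out."""
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    found.discard(leaves(tree))  # the clade of all taxa is uninformative
    return found


def compatible(a, b):
    """Two clades can coexist in one tree if nested or disjoint."""
    return a <= b or b <= a or not (a & b)


def classify(gene_tree, reference_tree):
    """Label every clade of the gene tree against the reference tree."""
    ref = clades(reference_tree)
    labels = {}
    for clade in clades(gene_tree):
        if clade in ref:
            labels[clade] = "vertical"            # also in the reference
        elif all(compatible(clade, r) for r in ref):
            labels[clade] = "unresolved"          # absent, but no conflict
        else:
            labels[clade] = "lateral candidate"   # conflicts with reference
    return labels


# Hypothetical reference phylogeny and gene phylogeny on taxa A to F.
reference = ((("A", "B"), "C"), (("D", "E"), "F"))
gene = ((("A", "F"), "B"), "C", ("D", "E"))

for clade, label in sorted(classify(gene, reference).items(),
                           key=lambda item: sorted(item[0])):
    print("".join(sorted(clade)), "->", label)
```

A synthesis in this spirit would draw the "vertical" clades along the reference tree and add the "lateral candidate" clades as connections across it, with a thickness reflecting how many genes support each one.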

Conclusions

We need better trees to build better syntheses

LGT should be accounted for when reconstructing the evolutionary history

Many interesting biological and epistemological avenues to explore in the near future

Many thanks to The Doolittle and Roger labs

Celine Brochier, Yan Boucher, Dave MacLeod, Robert Charlebois, Jessica Leigh, Ed Susko, Ford Doolittle, David Walsh

I respect (and more) Vincent Daubin.

The reason why my interpretation of the dataset is different:

- I believe that most of these genes do not contain enough phylogenetic signal to tell the whole history of gamma-proteobacteria on their own.

This is the very motive for concatenation: genes are too weak alone.

- However, based on biological evidence, transfer could have happened, so we should not prejudge that these genes with an unknown history have been transmitted only vertically. In a context of LGT, concatenation is not safe a priori.

In other words, in the possible presence of LGT, “when we do not know, we do not know!”

- Test concatenations of markers from entirely simulated data, full of transfers, also give robust phylogenies (Douady and Doolittle, unpublished).

So even good support for a tree coming from a concatenation is no guarantee that the true history has been recovered. Careful analyses of each marker are required.

- During these analyses, if we also see some conflict, we should show it, and then build a synthesis instead of a tree.

The phylogenetic signal is not robust over the whole Synthesis: basal branches are poorly supported.

[Figure: Distribution of the phylogenetic signal along the synthesis. x-axis: distance from the root (1 to 8); y-axis: 0 to 1. Series: total phylogenetic signal; longest consecutive vertical path supported.]

More precisely, many inner nodes are supported by only a minority of the genes (in purple). There are always genes (in dark green) whose phylogenetic history we do not know.

[Figure: bar chart (y-axis, 0 to 200) over the nodes 1A, 1B, 2A, 2B, 3A, 4A, 4B, 5A, 5B, 6A, 6B and the taxa Baphi, Ecoli, Hinfl, Paer, Pmult, Styphi, Vchol, Wigg, Xaxo, Xcamp, Xfast, YpesCO92, YpesKIM. Legend: horizontal and vertical inheritance; mode of transmission unknown.]

A brief view of the differences between the 16 plausible topologies (AU test, 5%)

[Figure: topologies relating Xfast, Xaxo, Xcamp, Paer, Wigg, Baphi, Vchol, Pmult, Hinfl, YpesCO92, YpesKIM, Styphi and Ecoli.]

What are the main evolutionary routes?

The road of relationships

Are there main routes? Unique routes? Side-issues?

Are the genes involved in LGT especially mobile ones?

[Figure: Average p-value (AU test) for each topology over all the genes (y-axis, 0 to 0.5).]

The average p-value of the best tree for each gene is: 0.83

The concatenated tree is a good “average”, but for most genes it is not the best tree

Concatenate tree
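The contrast between a good "average" tree and the best tree of each gene can be made concrete with a small Python sketch. It assumes a matrix of AU-test p-values with one row per gene and one column per topology. The four genes, the three topologies and their values are hypothetical and only illustrate the bookkeeping.

```python
# Hypothetical AU-test p-values: one row per gene, one column per topology.
p_values = {
    "gene1": [0.90, 0.10, 0.05],
    "gene2": [0.60, 0.85, 0.02],
    "gene3": [0.55, 0.03, 0.80],
    "gene4": [0.50, 0.04, 0.95],
}


def average_per_topology(table):
    """Mean p-value of every topology over all the genes."""
    rows = list(table.values())
    return [sum(col) / len(rows) for col in zip(*rows)]


def best_topology_per_gene(table):
    """Index of the topology with the highest p-value, for every gene."""
    return {gene: max(range(len(row)), key=row.__getitem__)
            for gene, row in table.items()}


averages = average_per_topology(p_values)
best_average = max(range(len(averages)), key=averages.__getitem__)
best_by_gene = best_topology_per_gene(p_values)

# Genes whose own best topology is also the best "average" topology.
agreeing = [g for g, t in best_by_gene.items() if t == best_average]
# Mean, over the genes, of the p-value of each gene's own best topology.
mean_best = sum(max(row) for row in p_values.values()) / len(p_values)

print("average p-value per topology:", [round(a, 3) for a in averages])
print("topology with the best average:", best_average)
print("genes for which it is also the best tree:", agreeing)
print("average p-value of the best tree for each gene:", round(mean_best, 3))
```

In this toy table the first topology has the best average yet is the best tree for only one gene out of four, which mirrors the situation described for the concatenated tree.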

Electric network, Railway network, Maritime network, Crystal Web

NEW QUESTIONS: Optimisation, functionality, economy, shorter paths ?

70 Bacteria

Genes which were laterally transferred: 0rpl4_1.puz_bip.out, 0efg_1.puz_bip.out, 0rpl18_1.puz_bip.out, 0fmt_1.puz_bip.out

Archaea 70

Euka 70

Chloro 70

Strict consensus, BV > 50 %

Genes which were laterally transferred: gp25boocon.txt.out, gp46boocon.txt.out

These two transfer events support two phylogenetic relationships: the last common ancestor of (133, rb69 and T4) would have given two genes to the last common ancestor of 25, 31 and 44RR

“A radical departure from conventional thinking” W. Martin/M. Embley

Me crazy, but on the shoulders of many philosophers:

Leibniz, Whitehead, Deleuze, Parrochia, etc.

“A radical departure from thinking?”

Rivera and Lake, Nature, 2004

ROOT OF THE RING

[Figure: five topologies relating B, H, M, E, P, Y1 and Y2, with frequencies of 60.5%, 16.8%, 10%, 7.2% and 1.8%, followed by two syntheses that include unknown descendants. Values shown on the syntheses: 96.3, 94.5, 89.1, 79.1, 77.7, 16.8, 10, 7.2 and 1.8.]

[Figure: cluster of genes against cluster of plausible topologies.]

We can question:

- the choice of how evolution is drawn

- whether a non-tree-like null hypothesis should be considered when building evolutionary scenarios

Heuristic of the synthesis...

There are 26 vertical branches and 11 lateral branches.

The total vertical thickness is about 13 times greater than the total horizontal thickness.

Yet 18 genes were laterally transferred.

8 lateral branches are mostly compatible with the reference tree.

3 lateral branches are mostly incompatible with the reference tree.

Thus, 72.7% of lateral branches (8 of 11) are mostly compatible with the reference tree.
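The percentage follows directly from the branch counts given above, as this short Python check shows. The variable names are illustrative only.

```python
# Branch counts as stated for the synthesis.
lateral_branches = 11
compatible_lateral = 8      # mostly compatible with the reference tree
incompatible_lateral = 3    # mostly incompatible with the reference tree

# The two categories should account for every lateral branch.
assert compatible_lateral + incompatible_lateral == lateral_branches

share = 100 * compatible_lateral / lateral_branches
print(f"{share:.1f}% of lateral branches are mostly compatible")  # 72.7%
```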