
Bastien Boussau

LBBE, CNRS, Université de Lyon

Genome-scale phylogenomics

Collaborators

• Lyon collaborators:

• Adrián Arellano Davín

• Gergely Szöllősi (Budapest),

• Eric Tannier,

• Vincent Daubin,

• Thomas Bigot,

• Magali Semeria,

• Manolo Gouy,

• Laurent Duret

• Austin collaborators:

• Siavash Mirarab

• Md. Shamsuzzoha Bayzid

• Tandy Warnow

• RevBayes collaborators:

• Sebastian Hoehna

• Michael Landis

• Tracy Heath

• Fredrik Ronquist

• Brian Moore

• John Huelsenbeck

• …

To study genome evolution:

1. One species tree

2. Thousands of gene trees

[Figure: tree of species A, B, C, D along a TIME axis, with a discrete character (a, a, b, a) and a continuous character (0.1, 0.2, 0.2, 0.4) at the tips]

Why our current pipeline can be improved

[Figure: pipeline diagram; its labels are not recoverable from the extraction]

• Gene alignments:
  • Error-prone
  • Short
  • Point estimates

• Gene trees:
  • Based on alignments
  • Point estimates

• Species trees:
  • Based on gene trees

[Figure: gene trees drawn inside the tree of species A, B, C, D along a TIME axis, built up step by step to show the processes that make gene trees differ from the species tree]

• D: duplication

• DL: duplication and loss (Boussau et al., Genome Research 2013)

• LGT: lateral gene transfer; DL+T (Szöllősi et al., PNAS 2013)

• ILS: incomplete lineage sorting (Mirarab et al., Science 2014)

PHYLDOG

Joint reconstruction of the species tree, gene trees, and numbers of duplications and losses

• Input: all gene families (thousands of alignments)

• Output: rooted species tree, numbers of duplications and losses, rooted gene trees

• Probabilistic models:
  • sequence evolution
  • gene family evolution

[Figure: species tree whose branches carry duplication parameters D1 to D6 and loss parameters L1 to L6]

Boussau et al., Genome Research 2013
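To make "numbers of duplications" on a rooted gene tree concrete, here is a minimal sketch. It is not PHYLDOG's method: PHYLDOG relies on probabilistic models, whereas this sketch swaps in the simpler parsimony-style LCA reconciliation. The nested-tuple tree encoding and the function names are assumptions made for illustration; gene tree leaves are labelled with the name of the species they come from.

```python
def leaf_set(tree):
    """Leaves below a node; trees are nested tuples, leaves are strings."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def clades(tree):
    """All clades (as leaf sets) of a rooted tree, including single leaves."""
    found = [leaf_set(tree)]
    if not isinstance(tree, str):
        for child in tree:
            found.extend(clades(child))
    return found


def count_duplications(gene_tree, species_tree):
    """Count gene tree nodes that LCA reconciliation labels as duplications."""
    species_clades = clades(species_tree)

    def lca(species):
        # Smallest species tree clade that contains all the given species.
        return min((c for c in species_clades if species <= c), key=len)

    duplications = 0

    def visit(node):
        nonlocal duplications
        if isinstance(node, str):
            return lca(frozenset([node]))
        child_maps = [visit(child) for child in node]
        here = lca(frozenset().union(*child_maps))
        # A node mapped to the same species tree node as one of its
        # children cannot be a speciation, so it is counted as a duplication.
        if here in child_maps:
            duplications += 1
        return here

    visit(gene_tree)
    return duplications


species_tree = (("A", "B"), ("C", "D"))
congruent = (("A", "B"), ("C", "D"))
two_copies = ((("A", "B"), ("A", "B")), ("C", "D"))
print(count_duplications(congruent, species_tree))   # 0
print(count_duplications(two_copies, species_tree))  # 1
```

The sketch only counts duplications; inferring losses as well, and weighing both against the sequence data, is what the joint probabilistic approach is for.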

PHYLDOG: better trees for better ancestral genomes

[Figure: mammalian species tree with clades labelled Laurasiatheria, Afrotheria, Xenarthra, Marsupials, Primates and Glires, and small plots comparing PHYLDOG, TreeBeST and PhyML; the plotted values are not recoverable from the extraction. Tip labels: Sus scrofa, Felis catus, Ornithorhynchus anatinus, Oryctolagus cuniculus, Loxodonta africana, Mus musculus, Gorilla gorilla, Dipodomys ordii, Monodelphis domestica, Vicugna pacos, Macaca mulatta, Tupaia belangeri, Procavia capensis, Spermophilus tridecemlineatus, Pongo pygmaeus, Tursiops truncatus, Microcebus murinus, Callithrix jacchus, Equus caballus, Erinaceus europaeus, Tarsius syrichta, Choloepus hoffmanni, Ochotona princeps, Cavia porcellus, Pan troglodytes, Bos taurus, Rattus norvegicus, Homo sapiens, Otolemur garnettii, Dasypus novemcinctus, Echinops telfairi, Pteropus vampyrus, Macropus eugenii, Canis familiaris, Sorex araneus, Myotis lucifugus]

An example gene family

[Figure: the same gene family reconstructed with TreeBeST and with PHYLDOG; tips are labelled with mammalian species names (e.g. Ornithorhynchus anatinus, Mus musculus, Homo sapiens); scale bars 0.1 and 0.3]

Boussau et al., Genome Research 2013

Gene transfers and the quixotic pursuit of the TOL

Doolittle WF, Science 1999

“The monistic concept of a single universal tree appears […] increasingly obsolete. […] [It is] no longer the most scientifically productive position to hold […] [It] accounts for only a minority of observations from genomes.”

Bapteste, O’Malley, Beiko, Ereshefsky, Gogarten, Franklin-Hall, Lapointe, Dupré, Dagan, Boucher, Martin, Biology Direct 2009.

Using transfers to date clades

[Figure: species tree drawn along a TIME axis, with a "?" label; built up over several slides]

Because we can identify gene transfers, we have information for ordering the nodes of a species tree
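The reasoning can be illustrated with a small sketch. It rests on a simplifying assumption that is not stated on the slide: a transfer goes directly between two sampled branches, so donor and recipient branches must overlap in time, which makes the node at the top of each branch older than the node at the bottom of the other. Transfers passing through extinct or unsampled lineages would weaken this constraint. The tree encoding and the function name are hypothetical, and this is not the algorithm of any of the tools named in this talk.

```python
from graphlib import TopologicalSorter, CycleError  # Python 3.9+


def order_internal_nodes(parent, transfers):
    """Return internal nodes from oldest to youngest, or None if the
    transfers contradict each other.

    parent    : dict mapping each node to its parent (the root has no entry)
    transfers : list of (donor, recipient) branches; a branch is named
                after the node at its lower (younger) end
    """
    internal = set(parent.values())
    sorter = TopologicalSorter()
    for node in internal:
        sorter.add(node)
    # Tree constraint: every parent is older than its child.
    for child, par in parent.items():
        if child in internal:
            sorter.add(child, par)
    # Transfer constraint: the two branches coexist, so the top of each
    # branch is older than the bottom of the other. Leaves sit at the
    # present, so constraints ending on a leaf carry no information.
    for donor, recipient in transfers:
        for upper_branch, bottom in ((donor, recipient), (recipient, donor)):
            older = parent.get(upper_branch)
            if older is not None and bottom in internal:
                sorter.add(bottom, older)
    try:
        return list(sorter.static_order())
    except CycleError:
        return None


# Species tree ((A,B)n1,(C,D)n2)root: the tree alone does not order n1 and n2.
parent = {"A": "n1", "B": "n1", "C": "n2", "D": "n2",
          "n1": "root", "n2": "root"}
# A transfer from the branch leading to A into the branch leading to n2
# implies that n1 is older than n2.
print(order_internal_nodes(parent, [("A", "n2")]))  # ['root', 'n1', 'n2']
```

With many gene families, each detected transfer adds one such constraint, which is why transfers carry information about the relative order of nodes.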

Bayesian species tree inference accounting for DTL events

• STRALE:
  • A Bayesian probabilistic method that can interpret thousands of gene trees in terms of:
    • speciation events
    • duplication events (D)
    • transfer events (T)
    • loss events (L)
  • A method able to estimate the DTL rates
  • A method able to reconstruct the species tree
  • A method able to order the nodes of the species tree

Simulation to test the species tree reconstruction

• 20 species

• 200 gene families

[Figure: two panels, "Simulated" and "Inferred", each showing a tree of the 20 species (tips numbered 0 to 19); axis scales run from 0.0 to 0.7 and from 0.0 to 1.25]

Better gene trees, fewer transfers

[Figure: two panels comparing the usual approach with ALE+DTL; y-axes: "RF distance to real tree" and "Transfer events per family"]

Szöllősi et al., Syst. Biol. 2013
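For readers unfamiliar with the RF (Robinson-Foulds) distance used to compare a reconstructed gene tree with the real tree, here is a minimal sketch. It assumes both trees have the same leaf set and are encoded as nested tuples with string leaves; the encoding and function names are illustrative choices, not taken from the cited paper. The distance counts the bipartitions of the leaves that are found in one tree but not in the other, ignoring the position of the root.

```python
def bipartitions(tree):
    """Non-trivial leaf bipartitions of a tree given as nested tuples."""

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        return frozenset().union(*(leaves(child) for child in node))

    all_leaves = leaves(tree)
    reference = min(all_leaves)  # fixed leaf used to pick a canonical side
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        # Store each split by the side that does not contain the reference
        # leaf, so that a split and its complement are recorded only once.
        side = all_leaves - below if reference in below else below
        # Keep only splits with at least two leaves on each side.
        if 1 < len(side) < len(all_leaves) - 1:
            splits.add(side)
        return below

    visit(tree)
    return splits


def rf_distance(tree1, tree2):
    """Number of bipartitions present in exactly one of the two trees."""
    return len(bipartitions(tree1) ^ bipartitions(tree2))


real = (("A", "B"), ("C", "D"))
rerooted = ("A", ("B", ("C", "D")))  # same unrooted topology as `real`
wrong = (("A", "C"), ("B", "D"))
print(rf_distance(real, rerooted))  # 0
print(rf_distance(real, wrong))     # 2
```

A lower RF distance to the real tree means that more of its branches were recovered.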

Better ancestral genomes:

Go see Adrián Arellano Davín’s poster on reconstructing ancestral genomes across the tree of life!

Statistical binning

[Figures: statistical binning diagrams; the only recoverable label is "MP-EST"]

Statistical binning improves species tree inference

Mirarab et al., Science 2014

Statistical binning and birds

Jarvis et al., Science 2014

RevBayes

• Collaborative effort

• Model-based phylogenetics

• Many models of sequence evolution

• Models for dating

• Models for phylogeography

• Models for continuous traits

• Models for gene tree/species tree inference

• http://revbayes.net

• Sebastian Hoehna

• Michael Landis

• Tracy Heath

• Fredrik Ronquist

• Nicolas Lartillot

• Brian Moore

• John Huelsenbeck

• …

Conclusions

• We develop methods for gene tree and species tree inference

• Improvement of gene trees and species trees in the presence of:

• duplications and losses,

• transfers,

• incomplete lineage sorting

• Parallel algorithms applicable to genome-scale data

Thanks!

• Lyon collaborators:

• Adrián Arellano Davín

• Gergely Szöllősi (Budapest),

• Eric Tannier,

• Vincent Daubin,

• Thomas Bigot,

• Magali Semeria,

• Manolo Gouy,

• Laurent Duret

• Austin collaborators:

• Siavash Mirarab

• Md. Shamsuzzoha Bayzid

• Tandy Warnow

Go see Adrián Arellano Davín’s poster on reconstructing ancestral genomes across the tree of life!