25
Deciphering Reticulate Evolution Using Phylogenetic Reconciliation Mukul S. Bansal Department of Computer Science and Engineering, University of Connecticut, USA The Phylogenetic Networks Workshop July 27, 2015

Deciphering Reticulate Evolution Using Phylogenetic ... · Deciphering Reticulate Evolution Using Phylogenetic Reconciliation Mukul S. Bansal Department of Computer Science and Engineering,

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Deciphering Reticulate Evolution Using Phylogenetic ... · Deciphering Reticulate Evolution Using Phylogenetic Reconciliation Mukul S. Bansal Department of Computer Science and Engineering,

Deciphering Reticulate Evolution UsingPhylogenetic Reconciliation

Mukul S. Bansal

Department of Computer Science and Engineering,University of Connecticut,

USA

The Phylogenetic Networks WorkshopJuly 27, 2015

Page 2: Deciphering Reticulate Evolution Using Phylogenetic ... · Deciphering Reticulate Evolution Using Phylogenetic Reconciliation Mukul S. Bansal Department of Computer Science and Engineering,

Supertrees or Supernetworks?

Page 3: Deciphering Reticulate Evolution Using Phylogenetic ... · Deciphering Reticulate Evolution Using Phylogenetic Reconciliation Mukul S. Bansal Department of Computer Science and Engineering,

Gene Family Evolution

ProblemHow did any given gene family evolve?

I Gene families evolve inside species trees.I Affected by evolutionary events such as gene duplication,

horizontal gene transfer, and gene loss.

A B C D A B C D

y

x

y

z

a1 c1 b1 d1a2 c2

Species Tree S

x

z

Evolution of Gene Family G

Duplication

Loss

Transfer

Gene Tree on G

Page 4: Deciphering Reticulate Evolution Using Phylogenetic ... · Deciphering Reticulate Evolution Using Phylogenetic Reconciliation Mukul S. Bansal Department of Computer Science and Engineering,

Definition: DTL Reconciliation

A B C D

y

Species Tree S

x

z

a1 c1 b1 d1a2 c2

Gene Tree on G

a1 c1 b1 d1a2 c2

G with evolutionary history

x,Σ

y,Δ

y,Σ z,Σ D,Θ

A B C D

x

y

z

Duplication

Loss

Transfer

Page 5: Deciphering Reticulate Evolution Using Phylogenetic ... · Deciphering Reticulate Evolution Using Phylogenetic Reconciliation Mukul S. Bansal Department of Computer Science and Engineering,

Definition: DTL Reconciliation

A B C D

y

Species Tree S

x

z

a1 c1 b1 d1a2 c2

Gene Tree on G

a1 c1 b1 d1a2 c2

G with evolutionary history

x,Σ

y,Δ

y,Σ z,Σ D,Θ

Input: A gene tree for that gene family, and a trusted rootedspecies tree.Output: An evolutionary history of that gene family showinghorizontal gene transfers, gene duplications, losses, and speciationevents.

Page 6: Deciphering Reticulate Evolution Using Phylogenetic ... · Deciphering Reticulate Evolution Using Phylogenetic Reconciliation Mukul S. Bansal Department of Computer Science and Engineering,

DTL Reconciliation Problem Formulation

Parsimony formulation:

I Costs are assigned to duplications, transfers, and losses.

I Goal: Find the reconciliation that minimizes the total cost.

I Easy to compute cost for a given reconciliation.

a1 c1 b1 d1a2 c2

x,Σ

y,Δ

y,Σ z,Σ D,Θ

Costs: L=1, Δ=2, Θ=3

2 + 3 + 2x1 = 7

1Δ, 1Θ, 2L

Different reconciliation could have different cost.

Page 7: Deciphering Reticulate Evolution Using Phylogenetic ... · Deciphering Reticulate Evolution Using Phylogenetic Reconciliation Mukul S. Bansal Department of Computer Science and Engineering,

Applications of DTL Reconciliation

A B C D

y

Species Tree S

x

z

a1 c1 b1 d1a2 c2

Gene Tree on G

a1 c1 b1 d1a2 c2

G with evolutionary history

x,Σ

y,Δ

y,Σ z,Σ D,Θ

I Understanding how gene families evolve.

I Dating gene birth.

I Inferring orthologs/paralogs/xenologs.

I Gene tree error-correction.

I Whole genome species tree construction.

I Constructing species phylogenetic networks.

Page 8: Deciphering Reticulate Evolution Using Phylogenetic ... · Deciphering Reticulate Evolution Using Phylogenetic Reconciliation Mukul S. Bansal Department of Computer Science and Engineering,

Applications of DTL Reconciliation

A B C D

y

Species Tree S

x

z

a1 c1 b1 d1a2 c2

Gene Tree on G

a1 c1 b1 d1a2 c2

G with evolutionary history

x,Σ

y,Δ

y,Σ z,Σ D,Θ

I Understanding how gene families evolve.

I Dating gene birth.

I Inferring orthologs/paralogs/xenologs.

I Gene tree error-correction.

I Whole genome species tree construction.

I Constructing species phylogenetic networks.

Page 9: Deciphering Reticulate Evolution Using Phylogenetic ... · Deciphering Reticulate Evolution Using Phylogenetic Reconciliation Mukul S. Bansal Department of Computer Science and Engineering,

Background: Computing Optimal DTL Reconciliations

For undated species tree:I Best time-consistent reconciliation: NP-hard (Tofigh et al.,

2011; Ovadia et al., 2011)I If time-consistency not enforced: O(mn)-time algorithm

(Bansal et al., 2012)

For dated species trees:I Best time-consistent reconciliation: O(mn2)-time algorithm

(Doyon et al., 2010)

DTLI model:I Considers ILS at unresolved species tree nodes (Stolzer et al.,

2012)

A B C D

Species Tree S

a1 c1 b1 d1a2 c2

Gene Tree G

g sΣ

Page 10: Deciphering Reticulate Evolution Using Phylogenetic ... · Deciphering Reticulate Evolution Using Phylogenetic Reconciliation Mukul S. Bansal Department of Computer Science and Engineering,

Background: Handling Multiple Optima

I There can be many optimal DTL reconciliations.

110

1001000

10000100000

100000010000000

1000000001E+091E+101E+111E+121E+131E+141E+151E+16

0 20 40 60 80 100 120 140 160 180 200

Num

ber o

f Sol

utio

ns

Gene Tree Size

Number of Optimal Solutions by Gene Tree Size

I Enumeration: Exponential in input size (Tofigh. et al., 2011; Chenet al., 2012)

I Uniform random sampling and aggregation: O(mn2)-time (Bansalet al., 2013)

I Compact representation in reconciliation graph: O(mn3)-time(Scornavacca et al., 2013)

I Median reconciliation (Nguyen et al., 2013)

Page 11: Deciphering Reticulate Evolution Using Phylogenetic ... · Deciphering Reticulate Evolution Using Phylogenetic Reconciliation Mukul S. Bansal Department of Computer Science and Engineering,

Background: Gene Tree Error-Correction

Gene tree construction is highly error-prone. Garbage in, garbageout.

I First-attempts: Mowgli-NNI (Nguyen et al., 2012), AnGST(David and Alm, 2011)

I New methods: TreeFix-DTL (Bansal et al., 2015), ALE(szollosi et al. 2013) ), TERA (Scornavacca et al., 2015))

0

norm

alize

d RF

dist

ance

0.02

0.04

0.06

0.08

0.10

0.12

lowererror

0.14 B

rate 1 rate 3 rate 5 rate 10

RAxMLNOTUNG

Treefix

MowgliNNIAnGSTTreeFix-DTL

varying substitution rate

low-DTL medium-DTL high-DTL

A varying DTL rate

RAxMLNOTUNG

Treefix

MowgliNNIAnGSTTreeFix-DTL

sequence length 173 sequence length 333

RAxMLNOTUNGTreefix

MowgliNNIAnGSTTreeFix-DTL

C varying sequence length

Page 12: Deciphering Reticulate Evolution Using Phylogenetic ... · Deciphering Reticulate Evolution Using Phylogenetic Reconciliation Mukul S. Bansal Department of Computer Science and Engineering,

Background: Event Cost Assignment

Fundamental questions:

I What are the “best” event costs to use?

I How do reconciliations vary as we change event costs?

Algorithm to partition event cost space into equivalence regionsbased on pareto-optimality of event counts: O(m5n logm)-time(Libeskind-Hadas et al., 2014)

Tra

nsfe

r co

st r

elat

ive

to d

uplic

atio

n

5

4

3

2

1

54321

<6, 1, 2, 3> Count = 3<6, 0, 3, 1> Count = 2<4, 0, 5, 0> Count = 8<4, 1, 4, 0> Count = 2<6, 2, 1, 5> Count = 1<5, 4, 0, 10> Count = 1

Loss cost relative to duplicationT

rans

fer

cost

rel

ativ

e to

dup

licat

ion

Loss cost relative to duplication1 2 3 4 5

1

2

3

4

5(a) (b) (c)

Page 13: Deciphering Reticulate Evolution Using Phylogenetic ... · Deciphering Reticulate Evolution Using Phylogenetic Reconciliation Mukul S. Bansal Department of Computer Science and Engineering,

Constructing Phylogenetic Networks

I Microbial phylogenetic networks are distinct fromhybridization networks.

Problem formulationInput: A collection of gene families (sequence alignments) and areference species tree.Output: Species tree augmented with horizontal edges(representing transfer events), and labels for each vertical andhorizontal edge specifying the genes that traveled through it.

Gene Trees

Species Tree

Genes ...

Genes ...

Genes ... Genes ...

Genes ...

Page 14: Deciphering Reticulate Evolution Using Phylogenetic ... · Deciphering Reticulate Evolution Using Phylogenetic Reconciliation Mukul S. Bansal Department of Computer Science and Engineering,

Constructing Phylogenetic Networks

Advantages of using a reference species tree:

I Improved complexity and scalability.

I Inferred network does not depend on reference tree.

I Customizable and easily interpretable network view.

Basic implementation:

1. Infer gene trees.

2. Use DTL reconciliation to reconcile individual gene trees withreference tree.

3. Aggregate transfers inferred for gene trees onto species tree;e.g., NOTUNG.

Page 15: Deciphering Reticulate Evolution Using Phylogenetic ... · Deciphering Reticulate Evolution Using Phylogenetic Reconciliation Mukul S. Bansal Department of Computer Science and Engineering,

Challenges due to reconciliation uncertainty

I Multiple optima, gene tree error, and event cost assignmentconfound reconciliation accuracy.

73.15

0.90 1.29

19.10

5.55

0

10

20

30

40

50

60

70

80

90

100

100 [90, 99] [80, 89] [50, 79] [0, 49]

Frac

tion

of n

odes

(%)

Consistency (%)

Page 16: Deciphering Reticulate Evolution Using Phylogenetic ... · Deciphering Reticulate Evolution Using Phylogenetic Reconciliation Mukul S. Bansal Department of Computer Science and Engineering,

Challenges due to reconciliation uncertainty

I Multiple optima, gene tree error, and event cost assignmentconfound reconciliation accuracy.

% p

reci

sion

020

6010

0

80 81 8194

% s

ensi

tivity

020

6010

0

8189 90

98

% p

reci

sion

020

6010

0

3871 76

96

% s

ensi

tivity

020

6010

0

68 68 73

92

RAxML AnGST TreeFix-DTL True

Duplications Transfers Losses

% p

reci

sion

020

6010

0

31

73 7685

% s

ensi

tivity

020

6010

0

7687 87 92

Page 17: Deciphering Reticulate Evolution Using Phylogenetic ... · Deciphering Reticulate Evolution Using Phylogenetic Reconciliation Mukul S. Bansal Department of Computer Science and Engineering,

Possible Solution

Proposal: Distinguish between highly-supported andweakly-supported events across multiple optima, multiple eventcosts, and gene tree topologies.

I Easy to do for multiple optima and event costs based ondeveloped algorithms.

I No such algorithms for gene tree error.

Goal: Develop algorithms to generate alternative gene treetopologies and study variability of reconciliation across them.

Page 18: Deciphering Reticulate Evolution Using Phylogenetic ... · Deciphering Reticulate Evolution Using Phylogenetic Reconciliation Mukul S. Bansal Department of Computer Science and Engineering,

Optimal Gene Tree Resolution (OGTR)

Input: A non-binary gene tree GN , a species tree S , and eventcosts.Output: Find a binary resolution GB of GN such that, the mostparsimonious DTL reconciliation of GB and S has smallestreconciliation cost.

I Provides rigorous framework for uniformly sampling from allcandidate gene trees.

I Non-binary gene tree obtained by collapsing weak edges.I Fundamental question for DTL reconciliation.

A B C D

Species Tree SGene Tree

A AC B C D A B C D

y

Species Tree S

x

z

Σ

Θ

Gene Tree

A AC B C D

Σ

Σ

Θ

(a) (b)

GN GB

(b)

Page 19: Deciphering Reticulate Evolution Using Phylogenetic ... · Deciphering Reticulate Evolution Using Phylogenetic Reconciliation Mukul S. Bansal Department of Computer Science and Engineering,

OGTR is NP-hard

I OGTR is NP-hard for both undated and dated species trees(Kordi and Bansal, 2015).

I Surprising since problem is linear-time solvable underduplication-loss model (Zheng and Zhang, 2014).

Key Ideas:

I Reduction from minimum 3 set cover problem.

I Subtrees in species tree correspond to sets

I Subtrees in unresolved gene tree correspond to elements ofthe universe.

I Structure of gadget forces root of each subtree (element) tomap to a “set” in which that element occurs.

I Using more sets for the mapping results in higherreconciliation cost.

Page 20: Deciphering Reticulate Evolution Using Phylogenetic ... · Deciphering Reticulate Evolution Using Phylogenetic Reconciliation Mukul S. Bansal Department of Computer Science and Engineering,

OGTR is NP-Hard

Page 21: Deciphering Reticulate Evolution Using Phylogenetic ... · Deciphering Reticulate Evolution Using Phylogenetic Reconciliation Mukul S. Bansal Department of Computer Science and Engineering,

A Fixed-Parameter Algorithm for OGTR

I DP algorithms for binary gene trees can be extended to workwith non-binary gene trees.

I Consider all possible resolutions at each non-binary gene treenode and fill DP-table for subproblem with best score over allresolutions.

I Gives O(2k · k! · ln + mn) and O(2k · k! · ln + mn2)-timealgorithms for dated and undated species trees, respectively

A B C D

Species Tree S

a1 c1 b1 d1a2 c2

Gene Tree G

g s15

Page 22: Deciphering Reticulate Evolution Using Phylogenetic ... · Deciphering Reticulate Evolution Using Phylogenetic Reconciliation Mukul S. Bansal Department of Computer Science and Engineering,

A Fixed-Parameter Algorithm for OGTR

(a) (b)

(c) (d)

20 40 60 80 100 12 140 0 500

1000 1500

2000 2500

30003500

40004500

5000

2 5 10 15 20 25 30 35 40

3 4 5 6 7 83-8 -6 -4 -2 0

2 4 6 8

10

3 4 5 6 7 8

6

8

10

12

14

16

18

20

(

1122

3344

5

Maximum out-degree

4500 5000

4000 3500 3000

2500 2000 1500

1000 500

0 0

(b

Percentage of non-binary nodes

50% 80%

30 35

50% 80%

Maximum out-degree Maximum out-degree

50

050505

105

45

4332211

Page 23: Deciphering Reticulate Evolution Using Phylogenetic ... · Deciphering Reticulate Evolution Using Phylogenetic Reconciliation Mukul S. Bansal Department of Computer Science and Engineering,

Uniform Sampling of Optimal Resolutions

I DP framework allows for sampling optimal resolutionsuniformly at random.

I Samples can be combined with event cost and multipleoptima analyses to assess robustness of individual events andmappings.

I Enables identification of highly- and weakly-supported eventsand mappings for network construction.

Page 24: Deciphering Reticulate Evolution Using Phylogenetic ... · Deciphering Reticulate Evolution Using Phylogenetic Reconciliation Mukul S. Bansal Department of Computer Science and Engineering,

Future Directions: Assigning Weakly-Supported Events

1. Leverage high-support events to improve assignment of allother events:

I Use highly-supported events to infer highways of transfers.I Improve assignment of other events based on inferred

highways.

2. Use gene tree and species tree branch lengths to choosebetween alternative scenarios for low-support events.

Page 25: Deciphering Reticulate Evolution Using Phylogenetic ... · Deciphering Reticulate Evolution Using Phylogenetic Reconciliation Mukul S. Bansal Department of Computer Science and Engineering,

Thank You!

Questions!