Conducting efficient tree searches - Harvard...


Conducting efficient tree searches

• Algorithms

  – Hill climbing “traditional” algorithms

    • SPR, TBR

    • Ratchet

    • Optimizing branches in ML analyses

  – Genetic algorithms

  – Divide and conquer algorithms

  – Simulated annealing algorithms

• Strategies

  – Tree buffers

  – Constrained searches

  – Previous searches: “pre-processed searches” or “jumpstarting phylogenetics”

  – SATF (sensitivity analysis tree fusing)

Parsimony ratchet (island hopper)

1. Generate a starting tree (e.g., a Wagner tree followed by some level of branch swapping).

2. Re-weight a randomly selected subset of characters (e.g., give a weight of 2 to 50% of the characters and 1 to the remaining 50%).

3. Search on the current tree, holding only one tree. Any kind of swapping strategy may be used.

4. Re-weight all the characters back to their original weights, and swap on the tree found in step 3.

5. Return to step 2 to begin another iteration, starting with the tree found in step 4. Continue this cycle for N iterations (e.g., 20, 50, 100…).
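The five steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not Nixon's implementation or that of any program: trees are rooted binary trees written as nested tuples and scored with weighted Fitch parsimony, branch swapping is a plain SPR hill climb holding a single tree, the caller supplies the starting tree instead of a Wagner tree, and the original weights are assumed to be all 1. The function names (`fitch_length`, `hill_climb`, `ratchet`, etc.) and the five-taxon matrix are hypothetical.

```python
import random

# Sketch only: rooted binary trees as nested tuples of unique taxon names,
# e.g. (("A", "B"), ("C", ("D", "E"))).

def fitch_length(tree, data, weights):
    """Weighted Fitch length; `data` maps taxon -> string of character states."""
    def post(node):
        if isinstance(node, str):
            return [{state} for state in data[node]], 0
        (lsets, lcost), (rsets, rcost) = post(node[0]), post(node[1])
        sets, cost = [], lcost + rcost
        for i, weight in enumerate(weights):
            common = lsets[i] & rsets[i]
            if common:
                sets.append(common)
            else:  # empty intersection: take the union and pay one weighted step
                sets.append(lsets[i] | rsets[i])
                cost += weight
        return sets, cost
    return post(tree)[1]

def subtrees(tree):
    yield tree
    if not isinstance(tree, str):
        yield from subtrees(tree[0])
        yield from subtrees(tree[1])

def prune(tree, clade):
    """Remove `clade` and collapse its parent node."""
    if isinstance(tree, str):
        return tree
    left, right = tree
    if left == clade:
        return right
    if right == clade:
        return left
    return (prune(left, clade), prune(right, clade))

def regraft(tree, target, clade):
    """Attach `clade` as the sister group of the subtree `target`."""
    if tree == target:
        return (tree, clade)
    if isinstance(tree, str):
        return tree
    return (regraft(tree[0], target, clade), regraft(tree[1], target, clade))

def spr_neighbors(tree):
    for clade in subtrees(tree):
        if clade == tree:
            continue
        rest = prune(tree, clade)
        for target in subtrees(rest):
            yield regraft(rest, target, clade)

def hill_climb(tree, data, weights):
    """SPR hill climbing that holds a single tree (first improvement)."""
    best = fitch_length(tree, data, weights)
    improved = True
    while improved:
        improved = False
        for neighbor in spr_neighbors(tree):
            length = fitch_length(neighbor, data, weights)
            if length < best:
                tree, best, improved = neighbor, length, True
                break
    return tree, best

def ratchet(start, data, iterations=20, fraction=0.5, seed=0):
    rng = random.Random(seed)
    nchar = len(next(iter(data.values())))
    original = [1] * nchar                                # assumed equal weights
    tree, length = hill_climb(start, data, original)      # step 1: swap on start tree
    best_tree, best_length = tree, length
    for _ in range(iterations):                           # step 5: N iterations
        boosted = set(rng.sample(range(nchar), int(fraction * nchar)))
        perturbed = [2 if i in boosted else 1 for i in range(nchar)]  # step 2
        tree, _ = hill_climb(tree, data, perturbed)       # step 3
        tree, length = hill_climb(tree, data, original)   # step 4
        if length < best_length:
            best_tree, best_length = tree, length
    return best_tree, best_length

data = {"A": "0000", "B": "0010", "C": "1001", "D": "1101", "E": "1101"}
start = (((("A", "D"), "B"), "E"), "C")
tree, length = ratchet(start, data, iterations=10)
print(length, tree)  # 4 is the minimum length for this matrix
```

Keeping the best tree separately from the current one is a choice of this sketch; the ratchet itself only carries the single tree from step 4 into the next iteration.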

Nixon, K. C. 1999. The Parsimony Ratchet, a new method for rapid parsimony analysis. Cladistics 15: 407-414.

Vos, R. A. 2003. Accelerated likelihood surface exploration: the likelihood ratchet. Systematic Biology 52: 368-373.

• Tree-Fusing (TF)

  • Exchanges subgroups (e.g., of 5 taxa) between different trees. The subgroups must be of identical composition.

1. Obtain several trees via some sort of tree search

2. Randomly select a tree (the “target” tree)

3. Randomly select one of the remaining trees (the “source” tree)

4. Evaluate the results of moving each clade of the source tree into the target tree, in place of the target clade of identical composition

5. Repeat several times (“rounds”), e.g., 3 to 5

• 10 RAS (random addition sequences) + TBR + TF
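A minimal sketch of the fusing steps above, assuming rooted trees that share the same root (outgroup), so that groups of identical composition can be compared as clades. `parsimony`, `fuse_pair`, `tree_fusing`, and the eight-taxon matrix are hypothetical illustrations, not the TNT, POY, or MetaPIGA implementation; an exchange is kept only when it shortens the target tree.

```python
import random

# Sketch: rooted nested-tuple trees that all share the same root, so groups
# of identical composition can be matched as clades.

def parsimony(tree, data):
    """Unweighted Fitch length; `data` maps taxon -> string of states."""
    def post(node):
        if isinstance(node, str):
            return [{s} for s in data[node]], 0
        (lsets, lcost), (rsets, rcost) = post(node[0]), post(node[1])
        sets, cost = [], lcost + rcost
        for a, b in zip(lsets, rsets):
            common = a & b
            sets.append(common if common else a | b)
            cost += 0 if common else 1
        return sets, cost
    return post(tree)[1]

def leafset(tree):
    if isinstance(tree, str):
        return frozenset([tree])
    return leafset(tree[0]) | leafset(tree[1])

def clades(tree):
    """Internal nodes in preorder."""
    if isinstance(tree, str):
        return []
    return [tree] + clades(tree[0]) + clades(tree[1])

def replace(tree, old, new):
    if tree == old:
        return new
    if isinstance(tree, str):
        return tree
    return (replace(tree[0], old, new), replace(tree[1], old, new))

def fuse_pair(target, source, score):
    """Step 4: try each source clade in place of the target clade with the same taxa."""
    best = score(target)
    for clade in clades(source):
        taxa = leafset(clade)
        match = next((c for c in clades(target) if leafset(c) == taxa), None)
        if match is None or match == clade:
            continue  # no group of identical composition, or nothing to exchange
        candidate = replace(target, match, clade)
        length = score(candidate)
        if length < best:
            target, best = candidate, length
    return target

def tree_fusing(trees, score, rounds=3, seed=0):
    rng = random.Random(seed)
    trees = list(trees)                                  # step 1: trees from earlier searches
    for _ in range(rounds):                              # step 5: rounds
        t, s = rng.sample(range(len(trees)), 2)          # steps 2-3: target, source
        trees[t] = fuse_pair(trees[t], trees[s], score)  # step 4
    return min(trees, key=score)

data = {"A": "1000", "B": "1000", "C": "0100", "D": "0100",
        "E": "0010", "F": "0010", "G": "0001", "H": "0001"}

def score(tree):
    return parsimony(tree, data)

t1 = ((("A", "B"), ("C", "D")), (("E", "G"), ("F", "H")))
t2 = ((("A", "C"), ("B", "D")), (("E", "F"), ("G", "H")))
best = tree_fusing([t1, t2], score)
print(score(t1), score(t2), score(best))  # 6 6 4
```

Each input tree is right in one half and wrong in the other; exchanging the {E, F, G, H} group (or the {A, B, C, D} group, depending on which tree is drawn as the target) combines the two correct halves.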

Goloboff, P. A. 1999. Analyzing large data sets in reasonable times: solutions for composite optima. Cladistics 15: 415-428.

Moilanen, A. 2001. Simulated evolutionary optimization and local search: Introduction and application to tree search. Cladistics 17: S12-S25.

Goloboff, P. A. 2002. Techniques for analyzing large data sets. In R. DeSalle, G. Giribet and W. Wheeler (eds), Techniques in Molecular Systematics and Evolution. Birkhäuser Verlag, Basel, pp. 70-79.

Lemmon, A. R. & Milinkovitch, M. C. 2002. The metapopulation genetic algorithm: An efficient solution for the problem of large phylogeny estimation. Proc. Natl. Acad. Sci. USA 99: 10516-10521.

• Sectorial Searches (SS)

• Start from an existing tree and reanalyze sectors of it separately. Sectors can be selected randomly or based on a consensus:

– Random Sectorial Searches (RSS)

– Consensus-Based Sectorial Searches (CSS)

– Mixed Sectorial Searches (MSS)

• RAS + TBR + SS
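The sketch below illustrates a random sectorial search under simplifying assumptions: a sector is a randomly chosen clade whose size lies in a given range, it is reanalyzed by trying every rooted topology of its taxa (feasible only for very small sectors), and the best resolution is put back into the tree. Real implementations analyze a reduced data set for the sector with heuristic searches rather than rescoring the whole tree; all names and the example data are hypothetical.

```python
import random

# Sketch: rooted nested-tuple trees with unique taxon names. Simplification:
# each candidate sector topology is rescored on the whole tree instead of on
# a reduced data set for the sector.

def parsimony(tree, data):
    """Unweighted Fitch length; `data` maps taxon -> string of states."""
    def post(node):
        if isinstance(node, str):
            return [{s} for s in data[node]], 0
        (lsets, lcost), (rsets, rcost) = post(node[0]), post(node[1])
        sets, cost = [], lcost + rcost
        for a, b in zip(lsets, rsets):
            common = a & b
            sets.append(common if common else a | b)
            cost += 0 if common else 1
        return sets, cost
    return post(tree)[1]

def leafset(tree):
    if isinstance(tree, str):
        return frozenset([tree])
    return leafset(tree[0]) | leafset(tree[1])

def clades(tree):
    if isinstance(tree, str):
        return []
    return [tree] + clades(tree[0]) + clades(tree[1])

def subtrees(tree):
    yield tree
    if not isinstance(tree, str):
        yield from subtrees(tree[0])
        yield from subtrees(tree[1])

def replace(tree, old, new):
    if tree == old:
        return new
    if isinstance(tree, str):
        return tree
    return (replace(tree[0], old, new), replace(tree[1], old, new))

def rooted_topologies(taxa):
    """Every rooted binary tree on `taxa`: add one taxon at a time on every branch."""
    if len(taxa) == 1:
        yield taxa[0]
        return
    for partial in rooted_topologies(taxa[:-1]):
        for node in subtrees(partial):
            yield replace(partial, node, (node, taxa[-1]))

def sectorial_search(tree, score, min_size=3, max_size=5, iterations=10, seed=0):
    rng = random.Random(seed)
    best = score(tree)
    for _ in range(iterations):
        sectors = [c for c in clades(tree) if min_size <= len(leafset(c)) <= max_size]
        if not sectors:
            break
        sector = rng.choice(sectors)                  # random sector (RSS-style)
        best_sector = sector
        for candidate in rooted_topologies(sorted(leafset(sector))):
            length = score(replace(tree, sector, candidate))
            if length < best:
                best, best_sector = length, candidate
        tree = replace(tree, sector, best_sector)     # put the sector back
    return tree, best

data = {"A": "1000", "B": "1000", "C": "0100", "D": "0100",
        "E": "0010", "F": "0010", "G": "0001", "H": "0001"}

def score(tree):
    return parsimony(tree, data)

start = (("A", "B"), (("C", "D"), (("E", "G"), ("F", "H"))))
tree, length = sectorial_search(start, score)
print(score(start), length)  # 6 4
```

In this example only the {E, F, G, H} clade falls in the 3-5 taxon range, so it is the sector that gets resolved; a consensus-based variant would instead pick sectors from groups that are unresolved in a consensus of previous trees.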

• Tree-Drifting (DFT)

• Accepts suboptimal solutions with a certain probability (akin to simulated annealing)

• Combined strategies: RAS + TBR + SS + DFT + TF
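As an illustration of accepting suboptimal solutions with some probability, here is the generic Metropolis acceptance rule used in simulated annealing. It is a swapped-in textbook rule, not the acceptance criterion of tree drifting in TNT, which is based on its own measures of fit difference (Goloboff 1999); `accept_move` and its temperature parameter are hypothetical.

```python
import math
import random

def accept_move(delta, temperature, rng):
    """Metropolis rule: `delta` is new score minus current score (lower is better)."""
    if delta <= 0:
        return True  # equal or better rearrangements are always accepted
    # worse rearrangements are accepted with probability exp(-delta / T)
    return rng.random() < math.exp(-delta / temperature)

rng = random.Random(1)
print(accept_move(-2, 1.0, rng))   # True
print(accept_move(3, 1e-6, rng))   # False: near-zero temperature acts like hill climbing
trials = 100_000
rate = sum(accept_move(1, 1.0, rng) for _ in range(trials)) / trials
print(rate)  # roughly exp(-1), about 0.37
```

A drift or annealing loop would call this once per candidate rearrangement; as the temperature approaches zero the rule reduces to ordinary hill climbing.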

Goloboff, P. A. 1999. Analyzing large data sets in reasonable times: solutions for composite optima. Cladistics 15: 415-428.

Goloboff, P. A. 2002. Techniques for analyzing large data sets. In R. DeSalle, G. Giribet and W. Wheeler (eds), Techniques in Molecular Systematics and Evolution. Birkhäuser Verlag, Basel, pp. 70-79.

Roshan, U., T. Warnow, B. M. E. Moret and T. L. Williams. 2004. Rec-I-DCM3: a fast algorithmic technique for reconstructing large phylogenetic trees. In Proceedings of the 2004 IEEE Computational Systems Bioinformatics Conference (CSB 2004): 12.

• Tree buffers

• Constrained searches

• “Pre-processed searches”

• Sensitivity Analysis output + TF

– Generate a diversity of cladograms under different parameters/models (these need not be full searches)

– Collect all trees in a file

– Submit the trees to tree fusing and other refining algorithms

• Other strategies: bootstrapping or jackknifing trees
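A small sketch of the “collect all trees in a file” step of SATF, assuming the trees found under each parameter set are already available as rooted nested tuples with a common root. It drops duplicates that differ only in the order of children and writes one Newick string per tree as input for tree fusing; check which tree format your program actually reads. The parameter-set labels, `collect_trees`, and `write_trees` are hypothetical.

```python
import io

def to_newick(tree):
    """Newick string (without the final ';') for a nested-tuple tree."""
    if isinstance(tree, str):
        return tree
    return "(" + ",".join(to_newick(child) for child in tree) + ")"

def canonical(tree):
    """Sort children recursively so trees differing only in child order compare equal."""
    if isinstance(tree, str):
        return tree
    return tuple(sorted((canonical(child) for child in tree), key=to_newick))

def collect_trees(runs):
    """`runs` maps a parameter-set label to the trees found under it."""
    pool, seen = [], set()
    for trees in runs.values():
        for tree in trees:
            key = canonical(tree)
            if key not in seen:
                seen.add(key)
                pool.append(key)
    return pool

def write_trees(pool, fh):
    for tree in pool:
        fh.write(to_newick(tree) + ";\n")

runs = {  # hypothetical parameter-set labels
    "gap_cost_1": [(("A", "B"), "C")],
    "gap_cost_2": [("C", ("B", "A")), (("A", "C"), "B")],
}
pool = collect_trees(runs)
buffer = io.StringIO()  # stand-in for the tree file handed to tree fusing
write_trees(pool, buffer)
print(buffer.getvalue())
```

The first tree from `gap_cost_2` is dropped as a duplicate of the `gap_cost_1` tree, leaving two trees in the pool. Trees rooted differently would not be recognized as duplicates by this sketch.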

Strategies

Multiple trees or multiple hits?

• Driven searches

– Minimum number of hits to optimal trees

– Achieving a stable consensus

– Consensus techniques
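The two stopping rules above can be sketched as one loop: keep running independent replicates until the best length has been hit a minimum number of times and, optionally, the strict consensus of the trees at that length has stopped changing. `replicate` stands in for one independent search (e.g., one RAS + TBR run) and, like the other names and the canned results, is hypothetical; this is not how TNT or POY implement driven searches.

```python
import math

def clades(tree):
    """Taxon sets of all internal nodes of a rooted nested-tuple tree."""
    found = set()
    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        taxa = walk(node[0]) | walk(node[1])
        found.add(taxa)
        return taxa
    walk(tree)
    return found

def strict_consensus(trees):
    """Groups present in every tree."""
    return set.intersection(*(clades(t) for t in trees))

def driven_search(replicate, min_hits=3, stable_for=0, max_reps=1000):
    best, pool = math.inf, []
    consensus, unchanged = None, 0
    for rep in range(1, max_reps + 1):
        tree, length = replicate(rep)
        if length > best:
            continue                          # worse replicate: not a hit
        if length < best:                     # shorter trees found: restart counts
            best, pool, consensus, unchanged = length, [], None, 0
        pool.append(tree)                     # one more hit to the best length
        new_consensus = strict_consensus(pool)
        unchanged = unchanged + 1 if new_consensus == consensus else 0
        consensus = new_consensus
        if len(pool) >= min_hits and unchanged >= stable_for:
            break
    return best, pool, consensus, rep

# Hypothetical canned results standing in for real replicates.
ab_cd = (("A", "B"), ("C", "D"))
ab_cd_again = (("B", "A"), ("D", "C"))  # same groups, different child order
abc_d = (("A", ("B", "C")), "D")
results = [(abc_d, 10), (ab_cd, 8), (abc_d, 9), (ab_cd_again, 8), (abc_d, 8)]

def replicate(rep):
    return results[rep - 1]

best, pool, consensus, used = driven_search(replicate, min_hits=3)
print(best, len(pool), used)                          # 8 3 5
print(sorted(sorted(group) for group in consensus))   # [['A', 'B', 'C', 'D']]
```

Counting hits answers “multiple hits”, while the consensus check addresses “multiple trees”: the third hit here is a different topology of the same length, which collapses the strict consensus.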

Software implementations

• Ratchet: WinClada, TNT, POY, PAUP scripts: PRAP or PAUPRat

• Tree Fusing: TNT, POY, MetaPIGA

• Tree Drifting: TNT

• Sectorial Searches/DCM: TNT, POY, Rec-I-DCM3

• Constrained searches: most software packages

• Driven searches:

  – Hits to optimal trees: TNT, POY

  – Stabilize consensus: TNT
