CIPRES Software architecture/development

Focus Leader: Mark Holder (FSU)

Architecture: Wayne Maddison (UBC), Mark Holder (FSU), David Swofford (FSU)

Implementation:

Project manager: Mark Miller (SDSC)

Chief programmers: Mark Holder (FSU), Terri (Liebowitz) Schwartz (SDSC)

Other programmers: Alex Borchers (SDSC), Zhijie Guan (SDSC), Tim McPhillips (SCSC)

Other contributors: Paul Lewis (UConn), Yu Fan (UConn), David Maddison (Arizona), Peter Midford (UBC), Rutger Vos (UBC), Tandy Warnow (UT), Bernard Moret (UNM)

The Grand Goal: Phylogeny of all Life

www.tolweb.org

The Individual Goal: Phylogeny of My Group

The Grand Goal will be achieved by combining efforts

Combining data: Supermatrices

Combining results: Supertrees

Combining programming: Supertools
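As an illustration of "combining data", a supermatrix can be sketched as the concatenation of per-gene alignments, with a missing-data symbol filling in for taxa absent from a gene. This is a minimal sketch only; the function name `build_supermatrix`, the dictionary representation, and the `?` padding symbol are assumptions made for the example, not part of CIPRES.

```python
from typing import Dict, List

# Hypothetical representation: taxon name -> aligned sequence for one gene.
Alignment = Dict[str, str]


def build_supermatrix(alignments: List[Alignment], missing: str = "?") -> Alignment:
    """Concatenate per-gene alignments, padding absent taxa with `missing`."""
    # Every taxon seen in any alignment gets a row in the supermatrix.
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    supermatrix = {taxon: "" for taxon in taxa}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one alignment must have equal length")
        width = lengths.pop()
        for taxon in taxa:
            # A taxon missing from this gene contributes a block of missing data.
            supermatrix[taxon] += aln.get(taxon, missing * width)
    return supermatrix


if __name__ == "__main__":
    gene_a = {"Habronattus": "ACGT", "Pellenes": "ACGA"}
    gene_b = {"Pellenes": "TTG", "Bianor": "TAG"}
    for taxon, row in build_supermatrix([gene_a, gene_b]).items():
        print(f"{taxon:12s} {row}")
```

Only taxa sampled for every gene end up with complete rows, which is why supermatrices built from many studies tend to contain large blocks of missing data.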

As data grow, computational demand grows

Problem 1: How to make analytical tools that can handle trees of 100,000 taxa × 100,000 characters?
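To give a rough sense of that scale, the sketch below computes the number of cells in such a matrix and the size of tree space. It relies on the standard count of unrooted binary trees on n labeled taxa, (2n-5)!!; the function names are illustrative only.

```python
import math


def matrix_cells(n_taxa: int, n_chars: int) -> int:
    """Number of cells in a taxa-by-characters data matrix."""
    return n_taxa * n_chars


def unrooted_tree_count(n_taxa: int) -> int:
    """Exact number of unrooted binary trees on n labeled taxa: (2n-5)!!."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    for k in range(3, 2 * n_taxa - 4, 2):  # odd factors 3, 5, ..., 2n-5
        count *= k
    return count


def log10_unrooted_tree_count(n_taxa: int) -> float:
    """log10 of (2n-5)!!, summed in log space so large n stays cheap."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    return sum(math.log10(k) for k in range(3, 2 * n_taxa - 4, 2))


if __name__ == "__main__":
    print("matrix cells:", matrix_cells(100_000, 100_000))
    print("trees on 5 taxa:", unrooted_tree_count(5))
    print("log10(trees on 100,000 taxa):", round(log10_unrooted_tree_count(100_000)))
```

The matrix alone has 10^10 cells, and the number of candidate trees grows far faster than the matrix does, so exhaustive search is out of the question at this size.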

Solution?: Hire 25 programmers to write the Monster App For Phylogenetics

No: that is a short-term strategy. Better to build a foundation that will enable a worldwide community of programmers to contribute

Solution?: Change our mode of software development

Current analytical tools:

PAUP*, MrBayes, PHYLIP, TNT: ≤ 3 authors

Future analytical tools:

Need community programming, with shared effort and rapid incorporation of new algorithmic ideas

The workflow of a phylogenetic study involves many data, many analyses, many results

Problem 2: How to manage information?

[Workflow diagram; labels recovered from the slide]

Taxa: Habronattus, Pellenes, Bianor, Plexippus, ...

Steps and products: Samples; Preservation; tissues; Sequencing; sequences; Alignment; aligned sequences; Phenotypic observation; Coding; data matrix; Tree inference (choice of method, model inference, search strategy); trees; Implications of trees (character evolution, speciation/extinction, coevolution); Publish!

Storage: Data storage (e.g., NEXUS, database); Tree storage (e.g., NEXUS, database)

Programs: Clustal, etc.; Sequencher, etc.; PAUP*, MrBayes, PHYLIP, TNT, etc.; Discrete, MacClade, Mesquite, etc.

Workflow for phylogenetic analysis

[Workflow diagram repeated, with databases added: Genbank, Specimen database, TreeBASE]

[Workflow diagram repeated, with the label: Information transfer]

Footnote: Programs serve so many functions that CIPRES cannot possibly make major improvements for all of these software needs

Solutions for information management

A communications infrastructure can mediate information transfer between programs

Databases can store data, methods, results

CIPRES architecture, first goal: to build a communications infrastructure

Modules (programs) communicate data, commands and results

Modules depend on each other for services

Modules can be on different machines, in different languages
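The idea of modules exchanging data, commands and results, and calling on each other for services, can be sketched as follows. This is an in-process toy, not the CIPRES infrastructure: the JSON wire format, the `ModuleRegistry` class, and the module and command names are all assumptions chosen for illustration. JSON is used only because a text encoding could, in principle, cross machine and language boundaries.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical handler signature: (command, data) -> result dictionary.
Handler = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def make_request(module: str, command: str, data: Dict[str, Any]) -> str:
    """Encode a request as text so it does not depend on the caller's language."""
    return json.dumps({"module": module, "command": command, "data": data})


class ModuleRegistry:
    """Routes encoded requests to registered modules and encodes their replies."""

    def __init__(self) -> None:
        self._modules: Dict[str, Handler] = {}

    def register(self, name: str, handler: Handler) -> None:
        self._modules[name] = handler

    def request(self, wire_message: str) -> str:
        message = json.loads(wire_message)
        handler = self._modules.get(message["module"])
        if handler is None:
            return json.dumps({"ok": False, "error": "unknown module"})
        try:
            result = handler(message["command"], message.get("data", {}))
        except Exception as exc:  # report failures as results, not crashes
            return json.dumps({"ok": False, "error": str(exc)})
        return json.dumps({"ok": True, "result": result})


def build_demo_registry() -> ModuleRegistry:
    registry = ModuleRegistry()

    def matrix_module(command: str, data: Dict[str, Any]) -> Dict[str, Any]:
        if command == "count_taxa":
            return {"n_taxa": len(data["matrix"])}
        raise ValueError(f"unsupported command: {command}")

    def summary_module(command: str, data: Dict[str, Any]) -> Dict[str, Any]:
        # This module depends on the matrix module for part of its answer.
        reply = json.loads(registry.request(make_request("matrix", "count_taxa", data)))
        n_chars = len(next(iter(data["matrix"].values())))
        return {"n_taxa": reply["result"]["n_taxa"], "n_chars": n_chars}

    registry.register("matrix", matrix_module)
    registry.register("summary", summary_module)
    return registry


if __name__ == "__main__":
    demo = build_demo_registry()
    matrix = {"Bianor": "ACGT", "Pellenes": "ACGA"}
    print(demo.request(make_request("summary", "describe", {"matrix": matrix})))
```

Because callers only ever see encoded requests and replies, a module could be replaced or moved without changing the code that uses it.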

CIPRES modular architecture

Tree improver

Tree evaluator

Coordinator

Tree decomposer

Tree Merger

Front End

DCM (highly simplified)
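The division of labor among these modules can be sketched as a coordinator loop that decomposes a problem, improves each piece, merges the pieces, and keeps the merged result only if the evaluator scores it higher. Everything below is a stand-in: a "tree" is just a list of taxon names and the scoring rule is arbitrary, so the sketch shows only the control flow between modules, not a real decomposition or search algorithm.

```python
from typing import List, Sequence

# Stand-in type: a real module would exchange an actual tree structure.
Tree = List[str]


def decompose(tree: Tree, size: int) -> List[Tree]:
    """Tree decomposer stand-in: cut the taxon list into consecutive chunks."""
    return [tree[i:i + size] for i in range(0, len(tree), size)]


def improve(subproblem: Tree) -> Tree:
    """Tree improver stand-in: sorting plays the role of a local search."""
    return sorted(subproblem)


def merge(subtrees: Sequence[Tree]) -> Tree:
    """Tree merger stand-in: concatenate the subproblem results."""
    return [taxon for sub in subtrees for taxon in sub]


def evaluate(tree: Tree) -> int:
    """Tree evaluator stand-in: count adjacent pairs in alphabetical order."""
    return sum(a <= b for a, b in zip(tree, tree[1:]))


def coordinate(start: Tree, size: int, rounds: int) -> Tree:
    """Coordinator: delegate to the other modules and keep the best result."""
    best = start
    for _ in range(rounds):
        candidate = merge([improve(sub) for sub in decompose(best, size)])
        if evaluate(candidate) > evaluate(best):
            best = candidate
    return best


if __name__ == "__main__":
    start = ["Plexippus", "Bianor", "Pellenes", "Habronattus"]
    print(coordinate(start, size=2, rounds=3))
```

Since the coordinator touches the other modules only through these four calls, any one of them could be swapped for a different implementation without changing the loop.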

Why start by building a modular system (i.e., a communications infrastructure)?

— handles information transfer among modules

— provides flexibility of analysis

— facilitates quick implementation of new algorithms

— facilitates distributed processing

— via API, defines a common language of phylogenetic communication

— engages/creates broad community of programmers
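One way to picture how a shared API supports flexibility and quick implementation of new algorithms is an abstract interface with interchangeable implementations. The sketch below is hypothetical throughout: the `TreeEvaluator` interface, its `score` method, and the two toy scoring rules are assumptions for illustration and are not the CIPRES API. Trees are passed as simple Newick strings without quoted labels.

```python
from abc import ABC, abstractmethod
from typing import List


class TreeEvaluator(ABC):
    """Hypothetical shared interface: any evaluator scores a Newick string."""

    @abstractmethod
    def score(self, newick: str) -> float:
        ...


class LeafCountEvaluator(TreeEvaluator):
    """Toy rule: more leaves scores higher (leaves = commas + 1)."""

    def score(self, newick: str) -> float:
        return float(newick.count(",") + 1)


class NestingDepthEvaluator(TreeEvaluator):
    """Toy rule: deeper parenthesis nesting scores higher."""

    def score(self, newick: str) -> float:
        depth = deepest = 0
        for ch in newick:
            if ch == "(":
                depth += 1
                deepest = max(deepest, depth)
            elif ch == ")":
                depth -= 1
        return float(deepest)


def pick_best(trees: List[str], evaluator: TreeEvaluator) -> str:
    """Client code depends only on the interface, not on a specific evaluator."""
    return max(trees, key=evaluator.score)


if __name__ == "__main__":
    candidates = ["(A,B,C,D);", "((A,B),(C,D));"]
    print(pick_best(candidates, NestingDepthEvaluator()))
    print(pick_best(["(A,B);", "((A,B),C);"], LeafCountEvaluator()))
```

A programmer with a new scoring idea would only need to write another subclass; `pick_best` and any other client code would work with it unchanged.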

CIPRES Communication

Tree improver

Tree evaluator

Coordinator

Tree decomposer

Tree Merger

Front End

multi-platform

multi-language

GUI <-> command translation

snapshotting

error handling

logging
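The front-end responsibilities listed here (GUI-to-command translation, snapshotting, error handling, logging) can be sketched together in a few lines. This is an assumed design for illustration only: the `FrontEnd` class, the action and command names, and the `send` callback standing in for the rest of the system are all hypothetical.

```python
import copy
import logging
from typing import Any, Callable, Dict, List

logger = logging.getLogger("frontend_sketch")

# Hypothetical mapping from GUI actions to module commands.
ACTION_TO_COMMAND = {
    "click_run_search": "improve_tree",
    "click_score": "evaluate_tree",
}

# Hypothetical transport: (command, state) -> dictionary of state updates.
Sender = Callable[[str, Dict[str, Any]], Dict[str, Any]]


class FrontEnd:
    def __init__(self, send: Sender) -> None:
        self._send = send
        self.state: Dict[str, Any] = {}
        self._snapshots: List[Dict[str, Any]] = []

    def snapshot(self) -> int:
        """Save a deep copy of the current state and return its index."""
        self._snapshots.append(copy.deepcopy(self.state))
        return len(self._snapshots) - 1

    def restore(self, index: int) -> None:
        """Roll the state back to a saved snapshot."""
        self.state = copy.deepcopy(self._snapshots[index])

    def handle(self, action: str) -> bool:
        """Translate a GUI action into a command; return True on success."""
        command = ACTION_TO_COMMAND.get(action)
        if command is None:
            logger.error("unknown GUI action: %s", action)
            return False
        checkpoint = self.snapshot()
        logger.info("action %s -> command %s", action, command)
        try:
            self.state.update(self._send(command, self.state))
            return True
        except Exception:
            # Log the failure and undo any partial changes.
            logger.exception("command %s failed; restoring snapshot %d", command, checkpoint)
            self.restore(checkpoint)
            return False
```

Taking a snapshot before each command means a failed or interrupted analysis leaves the user's session as it was, with the failure recorded in the log.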

And how far have we come, exactly?