The iPlant Tree of Life Project and Toolkit: Building a Cyberinfrastructure for Plant Science Research
Naim Matasci, The iPlant Collaborative
Evolution 2011, Jun 17-21, 2011


Given at Evolution 2011. An overview of the iPlant and iPToL project.


Page 1: The iPlant Tree of Life Project and Toolkit

The iPlant Tree of Life Project and Toolkit: Building a Cyberinfrastructure for Plant Science Research

Naim Matasci
The iPlant Collaborative

Evolution 2011

Jun 17-21, 2011

Page 2: The iPlant Tree of Life Project and Toolkit

What is iPlant?

Page 3: The iPlant Tree of Life Project and Toolkit

Discovery Environment

NEW RELEASE COMING SOON!

http://www.iplantcollaborative.org/discovery-environment-preview-access

Page 4: The iPlant Tree of Life Project and Toolkit

Page 5: The iPlant Tree of Life Project and Toolkit

Physical Infrastructure

Computation
• 63K-core cluster
• 20K-core cluster
• 1 TB RAM

Storage
• 2 PB
• 20 PB archive

Page 6: The iPlant Tree of Life Project and Toolkit

Cloud Storage

• Store, access and share large datasets

• Multiple points of entry: web interface, mounted FS, API

• Free and secure

AVAILABLE NOW!

http://www.iplantcollaborative.org/about/policies/data-set-hosting

Page 7: The iPlant Tree of Life Project and Toolkit

Cloud Computing

• Virtual Machines
– Up to 4 cores, 32 GB RAM, 100 GB dedicated disk
– Run any x86-compatible OS (even Windows)
– Persistent or on-demand
– Log in via SSH or secure VNC

• Use Cases
– Internet-enabled servers
– Database management appliances
– Virtual desktops
– …The sky is the limit!

AVAILABLE NOW!

http://www.iplantcollaborative.org/atmosphere-preview

Page 8: The iPlant Tree of Life Project and Toolkit

Consumer Applications

iPlant's CI

Page 9: The iPlant Tree of Life Project and Toolkit

iPlant Tree of Life Grand Challenge

Large Phylogenetic Inference
Building a tree of life for up to 500,000 green plants

Tree Visualization
Scalable visualization for small to large trees

Data Assembly and Integration
Acquisition, organization, and processing of the data

Taxonomic Intelligence
Sorting out different names for the same species

Tree Reconciliation
Resolving discordant gene and species trees

Trait Evolution
Using trees to understand how traits evolved

Page 10: The iPlant Tree of Life Project and Toolkit

Big Trees
To optimize existing methods to construct phylogenetic trees on the order of 500K taxa.

Page 11: The iPlant Tree of Life Project and Toolkit

Big Trees

NINJA/WINDJAMMER (Travis Wheeler)
Neighbor-Joining implementation that can analyze > 200K species

Six-day run time reduced 32-fold to 4.5 hours for the 220K-species data set

Two- to three-day run time reduced 1,800-fold to 2 minutes for distance matrix calculation on the 220K set
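The reported speedups can be checked with simple arithmetic. The sketch below is only a sanity check on the figures quoted on this slide; the helper name `speedup` is ours and is not part of NINJA or WINDJAMMER.

```python
# Sanity check of the speedup figures quoted above.
# `speedup` is an illustrative helper, not part of any iPlant tool.
HOURS_PER_DAY = 24
MINUTES_PER_DAY = 24 * 60


def speedup(before: float, after: float) -> float:
    """Return how many times faster the new run time is."""
    return before / after


# Neighbor-joining run: six days down to 4.5 hours.
nj_speedup = speedup(6 * HOURS_PER_DAY, 4.5)

# Distance matrix: two to three days down to 2 minutes.
dm_low = speedup(2 * MINUTES_PER_DAY, 2)
dm_high = speedup(3 * MINUTES_PER_DAY, 2)

print(f"NJ speedup: {nj_speedup:.0f}-fold")
print(f"Distance matrix speedup: {dm_low:.0f}- to {dm_high:.0f}-fold")
```

The quoted 1,800-fold figure sits in the middle of the computed range, which corresponds to an original run time of about two and a half days.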

RAxML-Light (Alexandros Stamatakis)

Large Scale Maximum Likelihood implementation

55K-taxon tree published (Stephen A. Smith et al., “Understanding angiosperm diversification using small and large phylogenetic trees,” American Journal of Botany 98, no. 3 (2011): 404-414)

AVAILABLE NOW!

Page 12: The iPlant Tree of Life Project and Toolkit

Tree Visualization
To develop an application for viewing, analyzing and exploring large phylogenetic trees.

Page 13: The iPlant Tree of Life Project and Toolkit

Tree Visualization

• > 500K taxa
• Fast
• Web-based, platform-independent
• Semantic zooming
• Metadata-driven display of information

Page 14: The iPlant Tree of Life Project and Toolkit

iPlant Tree Viewer Prototype

AVAILABLE NOW!

http://portnoy.iplantcollaborative.org/

Page 15: The iPlant Tree of Life Project and Toolkit

1KP
Collaboration (1KP) - To support the data analysis of the Thousand Plant Transcriptomes Project

Page 16: The iPlant Tree of Life Project and Toolkit

1KP

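[Figure: number of genes, N(genes), plotted against number of species, N(species). Completed genomes cover dozens of species; PCR studies cover dozens of genes in 10^4 species; the region in between is labeled "unexplored territory".]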

Page 17: The iPlant Tree of Life Project and Toolkit

Broad phylogenetic coverage

algae non-flowering flowering (angiosperm)

on role of polyploidy in Darwin’s “abominable mystery”

Phylogenomics of 1000 species across plant taxa

Page 18: The iPlant Tree of Life Project and Toolkit

Tree Reconciliation
To reconcile the evolutionary history of genes and species.

Page 19: The iPlant Tree of Life Project and Toolkit

Tree Reconciliation

Gene family data courtesy John Bowers

Page 20: The iPlant Tree of Life Project and Toolkit
Page 21: The iPlant Tree of Life Project and Toolkit

Taxonomic Name Resolution
Collaboration (BIEN) - To unify and resolve synonymous, erroneous, or other conflicting taxonomic names.

Page 22: The iPlant Tree of Life Project and Toolkit

Taxonomic uncertainty

1. Non-existent names
• Misspellings
• Contamination
• Annotations
• Morphospecies
• Digitization issues (frame shifts, character encoding)
• Lexical variants (digitization conventions)

2. Synonymy
• Nomenclatural synonyms
• Taxonomic synonyms / concepts

3. Misidentifications, incomplete identifications

Page 23: The iPlant Tree of Life Project and Toolkit
Page 24: The iPlant Tree of Life Project and Toolkit

AS SEEN IN NATURE!

AVAILABLE NOW!

Page 25: The iPlant Tree of Life Project and Toolkit
Page 26: The iPlant Tree of Life Project and Toolkit

Taxonomic Name Resolution Service

• Computer-assisted standardization of plant names

• Corrects spelling errors and alternative spellings against a standard list of names

• Converts out-of-date names to currently accepted names
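The two steps listed above (correcting spellings against a standard list, then mapping out-of-date names to accepted ones) can be sketched as follows. This is a minimal illustration using Python's standard `difflib`, not the actual TNRS implementation or API; the name lists, the `resolve_name` function, and the 0.85 similarity cutoff are all assumptions made for the example.

```python
import difflib
from typing import Optional

# Hypothetical reference data for illustration only; a real service
# would draw on a curated taxonomic database.
ACCEPTED_NAMES = ["Quercus alba", "Quercus rubra", "Acer saccharum"]
SYNONYMS = {"Acer saccharophorum": "Acer saccharum"}


def resolve_name(raw: str, cutoff: float = 0.85) -> Optional[str]:
    """Return the accepted name for `raw`, or None if nothing is close."""
    # Normalize whitespace and capitalization (genus capitalized only).
    cleaned = " ".join(raw.split())
    if cleaned:
        cleaned = cleaned[0].upper() + cleaned[1:].lower()

    # Step 1: fuzzy-match against every known name, accepted or not.
    candidates = ACCEPTED_NAMES + list(SYNONYMS)
    matches = difflib.get_close_matches(cleaned, candidates, n=1, cutoff=cutoff)
    if not matches:
        return None

    # Step 2: convert an out-of-date name to its accepted name.
    best = matches[0]
    return SYNONYMS.get(best, best)


print(resolve_name("quercus albba"))        # misspelling
print(resolve_name("Acer saccharophorum"))  # synonym
print(resolve_name("Zea mays"))             # not in the toy list
```

Matching against synonyms as well as accepted names means a misspelled synonym can still be resolved in a single pass; the cutoff trades recall against the risk of false matches.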

Page 27: The iPlant Tree of Life Project and Toolkit

Trait Evolution
To develop an infrastructure for downstream analysis of large trees.

Page 28: The iPlant Tree of Life Project and Toolkit

Trait Evolution

• Toolkit to study the evolution of traits of interest on very large phylogenies
– Diversification
– Biogeographic patterns
– Adaptation
– Co-evolution
– …

Page 29: The iPlant Tree of Life Project and Toolkit

Current analyses (Proof of concept)

• Phylogenetically Independent Contrasts (Felsenstein 1985)

• Continuous Ancestral Character Estimation (Schluter et al. 1997, Paradis 2004)

• Discrete Ancestral Character Estimation (Pagel 1994, Paradis 2004)
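As an illustration of the first analysis listed above, the sketch below computes standardized phylogenetically independent contrasts on a fully bifurcating tree, following the pruning procedure described by Felsenstein (1985). It is not the iPlant toolkit code; the `Node` class, the function name, and the three-taxon example tree are assumptions made for the example.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    """Illustrative tree node; tips carry a trait value, internal nodes do not."""
    branch_length: float = 0.0
    value: Optional[float] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def independent_contrasts(root: Node) -> List[float]:
    """Return one standardized contrast per internal node (post-order)."""
    contrasts: List[float] = []

    def visit(node: Node) -> Tuple[float, float]:
        # Returns (trait estimate, adjusted branch length) for this node.
        if node.left is None or node.right is None:
            return node.value, node.branch_length
        x1, v1 = visit(node.left)
        x2, v2 = visit(node.right)
        # Contrast between the two daughters, scaled by its expected s.d.
        contrasts.append((x1 - x2) / math.sqrt(v1 + v2))
        # Ancestral estimate: mean weighted by inverse branch length.
        estimate = (x1 / v1 + x2 / v2) / (1 / v1 + 1 / v2)
        # Lengthen the branch to account for uncertainty in the estimate.
        return estimate, node.branch_length + v1 * v2 / (v1 + v2)

    visit(root)
    return contrasts


# Hypothetical tree ((A:1, B:1):1, C:2) with trait values A=2, B=4, C=7.
tree = Node(
    left=Node(branch_length=1.0, left=Node(1.0, 2.0), right=Node(1.0, 4.0)),
    right=Node(2.0, 7.0),
)
contrasts = independent_contrasts(tree)
print(contrasts)
```

A tree with n tips yields n - 1 contrasts. The recursive form keeps the sketch short; for trees at the scale discussed in this talk, an iterative traversal would be needed to avoid Python's recursion limit.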

Page 30: The iPlant Tree of Life Project and Toolkit

Community Integrated (2½-Day Workshop)

Page 31: The iPlant Tree of Life Project and Toolkit

My-Plant.org
To easily share information and research, collaborate, and stay on top of the latest news in the field.

Page 32: The iPlant Tree of Life Project and Toolkit

Collaborative Tool

AVAILABLE NOW!

NEW AND IMPROVED!

http://my-plant.org/

Page 33: The iPlant Tree of Life Project and Toolkit
Page 34: The iPlant Tree of Life Project and Toolkit

http://www.iplantcollaborative.org