www.iplantcollaborative.org sgoff@iplantcollaborative.org The iPlant Collaborative IBP Annual Meeting – June 1st 2011 Steve Goff iPlant Collaborative, BIO5 Institute School of Plant Science University of Arizona

Page 1

www.iplantcollaborative.org sgoff@iplantcollaborative.org

The iPlant Collaborative

IBP Annual Meeting – June 1st 2011

Steve Goff

iPlant Collaborative, BIO5 Institute

School of Plant Science

University of Arizona

Page 2

What is iPlant?

• iPlant’s mission is to build the cyberinfrastructure (CI) to support solutions to plant biology’s Grand Challenges

• Phase I – Community Input

• Phase II – Building the CI Foundation

• Next Phase – Enabling Plant Science Discovery

Now need to integrate workflows and test theories

Will support tool integration and synthesis activities

Page 3

NSF Cyberinfrastructure Vision

• High Performance Computing

• Data and Data Analysis

• Virtual Organizations

• Learning and Workforce

Ref: “Cyberinfrastructure Vision for 21st Century Discovery”, NSF Cyberinfrastructure Council, March 2007.

Page 4

CI for Plant Science: Observations

• Investment in data creation is high

• Sources of data are disparate

• Investment in existing tools is significant

• Tools shouldn’t be discarded

• Tools shouldn’t be reproduced, but lack:

– Interoperability w/other tools

– Data standards

– Scalability

– Consistency of interface access & use

– Experimental reproducibility

Page 5

iPlant is a process and a platform

(or set of platforms, depending on your point of view).

Page 6

Computational & Storage Capability

– Compute: Ranger, Lonestar, Stampede (UT/TeraGrid); Saguaro, Sonora (ASU); Marin, Ice (UA)

• ~700 Teraflops

– Storage: Corral, Ranch (UT), Ocotillo (ASU)

• > 10 Petabytes of storage available for the project

– Visualization: Spur, Stallion (UT), Matinee (ASU), UA-Cave

• Among the world’s largest visualization systems

– Virtualized/Cloud Services: iPlant, TeraGrid, vendor clouds

• Cloud tech to deliver persistent gateways and user services

Thanks to large-scale NSF investments, iPlant has excellent CI access

Page 7

iPlant Cyberinfrastructure

[Diagram labels: Bench Biologists, Computational Biologists, APIs, Data, Algorithms, Discovery Environment, Data Store, Atmosphere, Semantic Web Layer]

Page 8

Overview of Components

• iPlant Discovery Environment – Core Software

• iRODS Integration – Core Services

• Atmosphere Cloud – Core Services

• Semantic Web Tech – SSWAP Team

• iPlant Tool/Workflow API – Core Software & Engagement Teams

Page 9

[Diagram: iPlant software stack. Labels, in the order they appear on the slide:]

Discovery Environment, DNA Subway, 3rd Party Science Gateways, User Scripts & Applications

Public APIs

Low-Level Services: Event, I/O, Data, Apps, Job, Profile, Auth

Condor, PBS, SGE, LSF, LL; iRODS; MySQL; LDAP; Eucalyptus; Action Folders; Shibboleth; Globus/Unicore; GPIR; MyProxy; XSEDE

iPlant Hardware Resources: High Perf Computing, Databases, Storage, Cloud Systems

Semantic Web

Page 10

iRODS: Integrated Rule-Oriented Data System

www.irods.org

• Why iRODS?

– Large data storage in simple format

– Sharing of large data among iPlant CI Resources

– Sharing of large data with colleagues and collaborators

– Processing large data with TACC resources

• General information on iRODS: www.irods.org

• Access iPlant’s iRODS: irodsweb.iplantcollaborative.org

• Documentation: https://pods.iplantcollaborative.org/wiki/display/systems/iRODS

Page 11

Atmosphere: iPlant’s Cloud Computing Resources

http://atmosphere.iplantcollaborative.org

• Tutorial: https://pods.iplantcollaborative.org/wiki/display/atmosphere/Demo+with+picture+walkthrough

• Why Atmosphere?

–Use a virtual machine (VM) with preinstalled software

–Create a VM to install complex software

–Create and share an image of a VM (VMI)

–Mount data from iPlant iRODS for use by your VM

Page 12

Semantic Web

http://www.iplantcollaborative.org/communities/developers/semanticweb

• Why Semantic Web Technology?

– Provides a means for web services to communicate and be aware of one another

[Diagram: consumers and services connected through the Semantic Web. Labels: iPlant Consumer, Remote Service, User-Created Service in Atmosphere, iPlant’s Discovery Environment, iPlant Service, Remote Consumer]

Page 13

iPG2P: From Genotype to Phenotype

• Visual Analytics

– R. Grene and G. Abram: Information visualization tools capable of displaying diverse types of data from laboratory, field, in silico analyses and simulations

• Data Integration

– D. Ware and C. Jordan: Methods for describing and unifying data sets into systems that support iPG2P activities

• Statistical Inference

– D. Kliebenstein and E. Buckler: Platform for using advanced computational approaches to statistically link genotype to phenotype

• Modeling Tools

– J. White, C. Myers, S. Welch: Framework for the construction, simulation and analysis of computational models of plants

• Ultra High Throughput Sequencing

– T. Brutnell and M. Vaughn: HPC resources and applications to process large-volume sequence data

Page 14

Genome Services

Ultra High-Throughput Sequencing

Scalable computing

Data

• NCBI SRA

• Desktop

• Amazon S3

• FTP

• HTTP

Data Wrangling

• Quality Control

• Preprocessing

• Rescaling

• Barcoding

Alignments

• BWA

• TopHat

Cufflinks

SAMTools

SAM Alignments

Expression Levels (RPKM)

Genome Variants (VCF 3.3)

Community Use Cases

• Expression studies

• Forward genetic screens

• Association studies
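The pipeline above reports expression levels as RPKM (reads per kilobase of transcript per million mapped reads). A minimal sketch of that normalization is shown here; the function name and the example counts are illustrative assumptions, not code from iPlant or Cufflinks.

```python
def rpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads Per Kilobase of transcript per Million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


if __name__ == "__main__":
    # Hypothetical numbers: 500 reads on a 2,000 bp gene, 10 million mapped reads.
    print(rpkm(500, 2_000, 10_000_000))
```

Dividing by both gene length and library size is what lets expression values be compared across genes of different lengths and across sequencing runs of different depths.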

Page 15

High Throughput Image Analysis

Scope: Enable image-based plant sciences research by incorporating image processing algorithms, grid computing, and databasing into an analysis pipeline

Objectives

1. Integrate Phytomorph and BISQUE as PhytoBisque

2. Broaden access to algorithms that benefit the community

3. Automate workflows so that plant biologists need not be computer scientists

Storage

Authentication

APIs

Compute cluster

E. Spalding @ U of Wisconsin; B.S. Manjunath and K. Kvilekval @ UCSB

Page 16

PhytoBisque: Example Use Case

Given a flatbed scanner image of Arabidopsis seeds, measure the length, width, and area of each seed and produce a population estimate for each trait

Seed trait QTL can be mapped when applied to mapped populations like Ler x CVI
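The measurement step in this use case can be sketched as follows. This is a toy illustration, not PhytoBisque's algorithm: it assumes the scan has already been thresholded into a binary mask (`#` marks seed pixels), approximates length and width by the bounding box, and uses the mean as the population estimate. All function names and the tiny `MASK` are hypothetical.

```python
from statistics import mean


def label_seeds(mask):
    """Group foreground pixels ('#') into 4-connected components, one per seed."""
    rows, cols = len(mask), len(mask[0])
    seen = set()
    seeds = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != "#" or (r, c) in seen:
                continue
            stack, pixels = [(r, c)], []
            seen.add((r, c))
            while stack:  # iterative flood fill from the first unseen seed pixel
                y, x = stack.pop()
                pixels.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] == "#" and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        stack.append((ny, nx))
            seeds.append(pixels)
    return seeds


def seed_traits(pixels):
    """Area is the pixel count; length and width are the bounding-box sides."""
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    height = max(ys) - min(ys) + 1
    width = max(xs) - min(xs) + 1
    return {"length": max(height, width),
            "width": min(height, width),
            "area": len(pixels)}


def population_estimates(mask):
    """Mean of each trait over all seeds found in the mask (units: pixels)."""
    traits = [seed_traits(p) for p in label_seeds(mask)]
    return {name: mean(t[name] for t in traits)
            for name in ("length", "width", "area")}


# Hypothetical 5 x 10 mask containing two "seeds".
MASK = [
    "..........",
    ".####..##.",
    ".####..##.",
    ".......##.",
    "..........",
]

if __name__ == "__main__":
    print(population_estimates(MASK))
```

A real pipeline would convert pixels to physical units using the scanner resolution and would handle touching or rotated seeds, which this sketch ignores.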

Page 17

A Strategy for Association Studies

Basic QTL/GWAS analysis

• R/qtl, QTL Cartographer, et al.

• Community can integrate these into the CI

Iterative analyses

• iPlant workflow management simplifies automation

• Compare methods!

Exploratory methods

• Hand-built R, Python, SAS, C codes

• Easy integration into iPlant CI via API

• Adopt common data model

Scalability Challenges: High-density markers, large populations, combinatorial analyses

• iPlant-authored parallel GLM (etc.) implementations

• Common data model

• Utilize workflow framework

Page 18

Statistical Inference: Scalable GLM

• Simplest case*: a few minutes using GLM on desktop TASSEL

• 1000-replicate bootstrap: 75-150 hours / trait

• Runtimes only get larger (days to years) for more complex analyses

* One trait x 40 million markers with no bootstrapping or epistasis testing

[Diagram: Genotype x Phenotype ANOVA. Labels: 6 traits of interest, 40 million markers in maize NAM, 1000 replicate analyses, Epistasis testing]
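To make the cost on this slide concrete, here is a minimal sketch of a single-marker one-way ANOVA with bootstrap resampling, plus the serial runtime implied by the slide's own numbers. It is not TASSEL's GLM; the function names and the toy genotype and phenotype lists are assumptions for illustration only.

```python
import random
from statistics import mean


def anova_f(genotypes, phenotypes):
    """One-way ANOVA F statistic for one marker: phenotype grouped by genotype class."""
    groups = {}
    for g, y in zip(genotypes, phenotypes):
        groups.setdefault(g, []).append(y)
    n, k = len(phenotypes), len(groups)
    if k < 2 or n <= k:
        return 0.0  # not enough classes or individuals to test
    grand = mean(phenotypes)
    ss_between = 0.0
    ss_within = 0.0
    for ys in groups.values():
        m = mean(ys)
        ss_between += len(ys) * (m - grand) ** 2
        ss_within += sum((y - m) ** 2 for y in ys)
    if ss_within == 0:
        return float("inf")
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def bootstrap_f(genotypes, phenotypes, replicates, seed=0):
    """Recompute the F statistic on individuals resampled with replacement."""
    rng = random.Random(seed)
    n = len(phenotypes)
    stats = []
    for _ in range(replicates):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(anova_f([genotypes[i] for i in idx],
                             [phenotypes[i] for i in idx]))
    return stats


def serial_hours(traits, hours_per_trait):
    """Total wall-clock hours if the traits are analysed one after another."""
    return traits * hours_per_trait


if __name__ == "__main__":
    geno = ["A", "A", "A", "B", "B", "B"]   # toy marker, two genotype classes
    pheno = [1, 2, 3, 4, 5, 6]              # toy trait values
    print(anova_f(geno, pheno))
    print(len(bootstrap_f(geno, pheno, replicates=1000)))
    # Slide figures: 6 traits at 75-150 hours each for a 1000-replicate bootstrap.
    print(serial_hours(6, 75), serial_hours(6, 150))
```

Run serially, the six traits on the slide would take roughly 450 to 900 hours at the quoted per-trait cost, before any epistasis testing is added.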

Page 19

GPU-based QTL Mapping

• Aspects of the problem are highly parallel

• Re-architect data flow and mapping algorithms for GPU architecture

• Interface for C and GPU implementations will be identical

Ali Akoglu and Dave Lowenthal, UArizona

Alignment-based protein searches sped up 6-10x
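One reason the problem parallelizes well is that each marker can be scored independently of the others. The sketch below shows a serial and a parallel scan that share the same call signature, which is the design goal stated on the slide. It is only an illustration: a thread pool stands in for the GPU back end, the scoring function is a placeholder rather than a real QTL statistic, and every name is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def score_marker(genotypes, phenotypes):
    """Placeholder score: absolute difference in mean phenotype between alleles 0 and 1."""
    zero = [y for g, y in zip(genotypes, phenotypes) if g == 0]
    one = [y for g, y in zip(genotypes, phenotypes) if g == 1]
    if not zero or not one:
        return 0.0  # marker is monomorphic in this sample
    return abs(mean(one) - mean(zero))


def scan_serial(markers, phenotypes):
    """Score every marker in turn."""
    return [score_marker(m, phenotypes) for m in markers]


def scan_parallel(markers, phenotypes, workers=4):
    """Same inputs and outputs as scan_serial, with markers spread across workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with the markers.
        return list(pool.map(lambda m: score_marker(m, phenotypes), markers))


if __name__ == "__main__":
    markers = [[0, 0, 1, 1], [0, 1, 0, 1], [1, 1, 1, 1]]  # toy genotype calls
    phenotypes = [1.0, 2.0, 3.0, 4.0]                      # toy trait values
    print(scan_serial(markers, phenotypes))
    print(scan_parallel(markers, phenotypes))
```

Because callers see the same function shape either way, a faster back end can be swapped in without changing the analysis code. In CPython the thread pool mainly demonstrates the interface; it is not expected to speed up this pure-Python scoring function.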

Page 20

iPlant Tree of Life (iPToL)

Large phylogenetic inference: Building a tree of life for up to 500,000 green plants

Tree Visualization: Scalable visualization for small to large trees

Data Assembly and Integration: Acquiring, organizing and processing the data

Taxonomic Intelligence: Sorting out different names for the same species

Tree Reconciliation: Resolving discordant gene and species trees

Trait Evolution: Using the tree to understand how traits evolved

Page 21

Phyloviewer: visualization of large phylogenetic trees

Page 22

My-Plant

• Social networking for plant biologists

• Organized by clade

• Used to organize the data collection for the “big tree”

Page 23

Taxonomic Name Resolution Service
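The slide names the service without describing how it works. As a rough illustration of the underlying idea (mapping synonyms and misspellings to one accepted name), here is a minimal sketch using a lookup table and fuzzy string matching. It is not the TNRS algorithm or API; the tiny name tables, the function name, and the similarity cutoff are assumptions chosen for the example.

```python
from difflib import get_close_matches

# Tiny illustrative tables; a real service would draw on curated taxonomic sources.
ACCEPTED = {"Arabidopsis thaliana", "Solanum lycopersicum", "Zea mays"}
SYNONYMS = {"Lycopersicon esculentum": "Solanum lycopersicum"}


def resolve_name(name, cutoff=0.85):
    """Return the accepted name for a submitted name, or None if nothing is close."""
    cleaned = " ".join(name.split())  # collapse stray whitespace
    if cleaned in ACCEPTED:
        return cleaned
    if cleaned in SYNONYMS:
        return SYNONYMS[cleaned]
    # Fall back to fuzzy matching against every known name to catch misspellings.
    candidates = sorted(ACCEPTED | set(SYNONYMS))
    close = get_close_matches(cleaned, candidates, n=1, cutoff=cutoff)
    if not close:
        return None
    return SYNONYMS.get(close[0], close[0])


if __name__ == "__main__":
    for submitted in ("Arabidopsis thalliana", "Lycopersicon  esculentum", "Quercus alba"):
        print(submitted, "->", resolve_name(submitted))
```

Returning `None` for an unmatched name, rather than the nearest guess, keeps uncertain matches visible to the person curating the data.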

Page 24

Integration of New Tools w/o Programming

This part is done!!!

This part is coming soon!

Page 25

Related Activities

• Integrated Breeding Platform

– Social networking portal for plant breeders

– R analysis packages

– Breeders fieldbook

• 1kp (1,000 plant transcriptomes)

• DOE’s Knowledgebase (Kbase)

• Seed projects

• Elixir

• CoGe

Page 26

Future Workshop Activities

• Small tool/workflow integration meetings

– 2-3 days each, 10-20 local participants

– 4-5 meetings starting in June 2011

• Addressing specific biological questions

– With appropriate test data and available software

• Building on iPlant’s cyberinfrastructure

– Complementary tools and additional data access

– Preference for broad use, high impact tools & workflows

– Can be kept private until published

– Positive results will stimulate additional support

Page 27

iPlant’s Building Blocks


Metadata Data Tools Workflows Viz

Executive Team: Steve Goff, Dan Stanzione

Faculty Advisors: Greg Andrews, Kobus Barnard, Susan Brown, Vicki Chandler, John Hartman, Nirav Merchant, Sudha Ram, Ann Stapleton, Lincoln Stein, Doreen Ware, Sue Wessler, Ramin Yadegari

Staff: Greg Abram, Victoria Bryan, Rion Dooley, Andy Edmonds, Juan Antonio Raygoza Garay, Karla Gendler, Damian Gessler, Cornel Ghiban, Michael Gonzales, Hariolf Häfele, Matthew Helmke, Natalie Henriques, Uwe Hilgert, Nicole Hopkins, Lisa Howells, Kathleen Kennedy, Mohammed Khalfan, Seung-jin Kim, Adam Kubach, Sangeeta Kuchimanchi, Tina Lee, Andrew Lenards, Sonya Lowry, Jerry Lu, Eric Lyons, Naim Matasci, Sheldon McKay, Dave Micklos, Andy Muir, Martha Narro, Christos Noutos, Dennis Roberts, Bernice Rogowitz, Jerry Schneider, Bruce Schumaker, Edwin Skidmore, Sriram Srinivasan, Mary Margaret Sprinkle, Matthew Vaughn, Liya Wang, Sharon Wei, Jason Williams, Frank Willmore, John Wregglesworth, Weijia Xu

Students: Storme Briscoe, Steven Gregory, Monica Lent, Bansri Poduval, Pavithra Ravi, Shannon Wermes, Jill Yarmchuk