The iPlant Collaborative Community Cyberinfrastructure for Life Science

Arthropod Genomics Research in ARS Workshop  

Jason Williams / @JasonWilliamsNY

Cold Spring Harbor Laboratory, iPlant

Goals for today’s talk

• Begin the process of adopting/adapting iPlant to build your own community capacity

• Learn about what you hope iPlant may be able to offer

• Highlight existing capabilities of the platform

• Explain some of the context and rationale behind iPlant

Vision

Enable life science researchers and educators to use and extend iPlant's foundational cyberinfrastructure to understand and ultimately predict the complexity of biological systems and their dynamic nature under various environmental conditions.

What is Cyberinfrastructure?

• Platforms, tools, datasets

• Storage and compute

• Training and support

What problems can iPlant Solve?

• Crops and model plant systems

• Animal and livestock

• Agronomic microbes, insects…

What problems can iPlant Solve?

iPlant is built for Data

How was iPlant built?

Landscape of community-identified priorities

Genomic data and analysis:

• Reference guided assembly

• De novo assembly

• RNA-Seq (expression; gene/isoform discovery)

• Variant calling

• Genome/Transcriptome annotation

• ChIP-Seq/Integration of epigenetic information

• Multiple sequencing platforms

• New and evolving technologies

Landscape of community-identified priorities

Diagram categories: Genotypic, Environmental, Phenotypic

Diagram areas: Comparative Genomics; Sequencing & Assembly; Annotation; Environmental datasets; Climate model products; Image-based Phenotyping; Molecular Phenotyping; Trait Data; Evolutionary Models; Ecological Models; Association Studies; Pathway Analysis

Status legend: In planning, In progress, Foundation in place

iPlant is a collaborative virtual organization

Who makes up iPlant?

How is iPlant funded?

Funded by NSF

• First funding ($50 million) in 2008

• Renewal funding ($50.3 million) in 2013

o Scientific Advisory Board

o Focus on genotype-phenotype science

o NSF recommended expansion of scope beyond plants

Ultracentrifuge - Electrophoresis

Cycle sequencing – HTS

~20 years

Technology… Transition… Enablement…

What a unified platform gets you

• Ability to access and manage data

• Software to analyze data

• Computing resources

• Skills and help to use software and interpret results

Get Science Done

What a unified platform gets you

• Metadata management

• Ability to share data and workflows

• Open source sustainable tools

Reproducibility

What a unified platform gets you

• High-performance and scalable computing

• Ability to automate and collaborate

• Funding spent on science, not software or hardware

Productivity

Support for a diverse user base

Bioinformatics Users:

• Easy-to-use tools/interfaces (little or no command-line)

• Generous data storage, end-to-end workflows

• Access to training and support

Support for a diverse user base

Bioinformaticians:

• (More) access to HPC

• Make tools and algorithms more accessible to users

• Better ways to manage large-project metadata

Support for a diverse user base

Bioinformatics Engineers (community/core support):

• Ways to scale support for community or institutional users

• Optimization of software

• Shared data storage and user portals

Products

What do you get with your account?

Products

• We strive to be the CI Lego blocks

• Danish 'leg godt' - 'play well'

• Also translates as 'I put together' in Latin

• If a solution is not available, you can craft your own using iPlant CI components

iPlant Data Store

Initial 100 GB allocation – TB allocations available

Automatic data backup

Easy upload/download and sharing

The resources you need to share and manage data with your lab, colleagues and community

Discovery Environment

Hundreds of bioinformatics Apps in an easy-to-use interface

A platform that can run almost any bioinformatics application

Seamlessly integrated with data and high performance computing

User extensible – add your own applications

Atmosphere

Cloud computing for the life sciences

Simple: One-click access to more than 200 virtual machine images

Flexible: Fully customize your software setup

Powerful: Integrated with iPlant computing and data resources

Science APIs

Fully customize iPlant resources

Science-as-a-service platform

Define your own compute and storage resources (local and iPlant)

Build your own app store of scientific codes and workflows

DNA Subway

Educational workflows for Genomes, DNA Barcoding, RNA-Seq

Commonly used bioinformatics tools in streamlined workflows

Teach important concepts in biology and bioinformatics

Inquiry-based experiments for novel discovery and publication of data

Bisque

Image analysis, management, and metadata

Secure image storage, analysis, and data management

Integrate existing applications or create new ones

Custom visualization and image handling routines and APIs

Genome Assembly and Annotation

Annotation of the Loblolly Pine Mega genome (Jill Wegrzyn)

• 20.15 Gb assembly

• Split into 40 jobs

• 216 CPU/job (8640 CPU total)

• 17 hours

TACC Lonestar Supercomputer: 22,656 CPU cores on 1,888 nodes

Genome Assembly                  Size (Mb)   CPU    Run Time
Arabidopsis thaliana TAIR10      120         600    2:44
Arabidopsis thaliana TAIR10      120         1500   1:27
Zea mays RefGen_v2               2067        2172   2:53

Campbell et al. Plant Physiology. December 4, 2013, DOI:10.1104/pp.113.230144
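As a rough check on the figures on this slide, the sketch below recomputes the CPU totals and estimates how well the Arabidopsis annotation scaled from 600 to 1500 CPUs. It is only an illustration: the names `Run`, `to_minutes`, and `scaling` are made up for this example, the run times are assumed to be in H:MM, and the pine CPU-hour figure assumes all 40 jobs ran concurrently for the full 17 hours.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One row of the run-time table (illustrative type, not an iPlant API)."""
    assembly: str
    cpus: int
    run_time: str  # copied from the slide; assumed to be H:MM


def to_minutes(run_time: str) -> int:
    """Convert an 'H:MM' string to minutes (the unit is an assumption)."""
    hours, minutes = run_time.split(":")
    return int(hours) * 60 + int(minutes)


def scaling(base: Run, scaled: Run) -> tuple[float, float]:
    """Return (speedup, parallel efficiency) of `scaled` relative to `base`."""
    speedup = to_minutes(base.run_time) / to_minutes(scaled.run_time)
    cpu_ratio = scaled.cpus / base.cpus
    return speedup, speedup / cpu_ratio


# Loblolly pine annotation: 40 jobs at 216 CPUs each.
pine_total_cpus = 40 * 216
# Assumes all 40 jobs ran concurrently for the full 17 wall-clock hours.
pine_cpu_hours = pine_total_cpus * 17

# Lonestar: 22,656 cores across 1,888 nodes.
cores_per_node = 22_656 // 1_888

ath_600 = Run("Arabidopsis thaliana TAIR10", 600, "2:44")
ath_1500 = Run("Arabidopsis thaliana TAIR10", 1500, "1:27")
speedup, efficiency = scaling(ath_600, ath_1500)

print(f"Pine: {pine_total_cpus} CPUs, about {pine_cpu_hours} CPU-hours")
print(f"Lonestar: {cores_per_node} cores per node")
print(f"Arabidopsis 600 -> 1500 CPUs: speedup {speedup:.2f}x, "
      f"efficiency {efficiency:.0%}")
```

Under these assumptions the Arabidopsis run gets roughly 1.9 times faster on 2.5 times the CPUs, or about 75% parallel efficiency. The ratio does not depend on whether the times are read as H:MM or M:SS.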

An Evolving Data Commons

Data life cycle (diagram): project creation, specimen collection, analysis, publication, data discovery and re-use

Challenge: Transform existing datasets to do custom queries

Leveraging iPlant Data Store and iRODS

Collaborating with us

• “Powered by iPlant” supports a variety of ways of using the iPlant infrastructure underneath another application that communicates with users, usually one outside the iPlant project.

• Other major projects have adopted the iPlant CI as their underlying infrastructure (some completely, some in limited ways – more on this later).

Example “Powered by iPlant” Impact

CoGe usage and user count after federation and interoperability with iPlant

Extended Support

Make bioinformatics tools better

• We find example after example of codes that get well below 0.01% of peak on a single core

• By the end of the year, it will be difficult to get a server below 20 cores.

• There is little sympathy for data/computing challenges when the software is willing to ignore at least 95-99.99% of available performance

D. Stanzione, Director, TACC

Getting tools out there

GenSel installed by developers, made available through the DE

For whole-genome predictions, widely used in breeding

Dorian Garrick, Iowa State University

Solving problems faster

iAnimal genotyping pipeline developed for 1000 Bulls processes two terabytes (TB) of raw sequence data to DNA variants in less than 8 hours

James Koltes, Iowa State University

Where to go from here:

iPlant Learning Center

• Get Started Guide

• Tutorials and Videos

• Documentation

Upcoming Events

• Workshops

• Webinars

iPlant can come to you…

Tools & Services Workshops

• Targeted to researchers

• Hands-on learning modules

• Individual consultations

Genomics in Education Workshops

• Targeted to educators

• Pair bioinformatics with classroom labs

• Help for generating lesson plans

Webinars

• Pairs with asynchronous learning

• Reach broader audiences

• Follow up with workshop learners

Where to go from here:

• If iPlant can, we’ll help show you how…

• If iPlant can’t, we’ll find the path that gets you what you need

Don’t hesitate to ask “Can iPlant do this?”

Keep asking at ask.iplantcollaborative.org

Who makes up iPlant?

Executive Team: Steve Goff (UA), Matthew Vaughn (TACC), Nirav Merchant (UA), Eric Lyons (UA), Doreen Ware (CSHL)

Current and Former:

Staff: Greg Abram, Sonali Aditya, Ritu Arora, Roger Barthelson, Rob Bovill, Brad Boyle, Gordon Burleigh, John Cazes, Mike Conway, Victor Cordero, Rion Dooley, Aaron Dubrow, Andy Edmonds, Dmitry Fedorov, Melyssa Fratkin, Michael Gatto, Utkarsh Gaur, Cornel Ghiban, Steve Gregory, Matthew Hanlon, Natalie Henriques, Uwe Hilgert, Nicole Hopkins, EunSook Jeong, Logan Johnson, Chris Jordan, Kathleen Kennedy, Mohammed Khalfan, David Knapp, Lars Koersterk, Sangeeta Kuchimanchi, Kristian Kvilekval, Sue Lauter, Tina Lee, Zhenyuan Lu, Aaron Marcuse-Kubitz, Robert McLay, Nathan Miller, Steve Mock, Martha Narro, Benoit Parmentier, Jmatt Peterson, Dennis Roberts, Paul Sarando, Jerry Schneider, Bruce Schumaker, Edwin Skidmore, Brandon Smith, Mary Margaret Sprinkle, Sriram Srinivasan, Josh Stein, Lisa Stillwell, Jonathan Strootman, Peter Van Buren, Hans Vasquez-Gross, Rebeka Villarreal, Ramona Walls, Liya Wang, Anton Westveld, Jason Williams, John Wregglesworth, Weijia Xu

Faculty Advisors & Collaborators: Ali Akoglu, Kobus Barnard, Timothy Clausner, Brian Enquist, Damian Gessler, Ruth Grene, John Hartman, Matthew Hudson, David Lowenthal, B.S. Manjunath, David Neale, Brian O’Meara, Sudha Ram, David Salt, Mark Schildhauer, Doug Soltis, Pam Soltis, Edgar Spalding, Alexis Stamatakis, Steve Welch

Students: Peter Bailey, Jeremy Beaulieu, Devi Bhattacharya, Storme Briscoe, YaDi Chen, David Choi, Barbara Dobrin, John Donoghue, Yekatarina Khartianova, Chris La Rose, Amgad Madkour, Aniruddha Marathe, Andre Mercer, Kurt Michaels, Zack Pierce, Andrew Predoehl, Sathee Ravindranath, Kyle Simek, Gregory Striemer, Jason Vandeventer, Nicholas Woodward, Kuan Yang

Postdocs: Barbara Banbury, Christos Noutsos, Solon Pissis, Brad Ruhfel

Download these slides…

www.iplantc.org/arswiki1

@JasonWilliamsNY

@iPlantCollab

Jason Williams – Williams@cshl.edu
