Getting started with WDL & Cromwell
Bioinformatics workflows at any scale

Ruchi Munshi
Data Sciences Platform
Broad Institute

Getting started with WDL & Cromwell - Blue Waters...Getting started with WDL & Cromwell Bioinformatics workflows at any scale Ruchi Munshi Data Sciences Platform Broad Institute



Page 2

The backdrop: data generation set to explode

[Figure: Quarterly output (in TBases) of the Genomics Platform. Annotation: "Story begins here"]

Page 3

Meet Cromwell & WDL

• Execution engine that can
  – Run on any platform (HPC and on Cloud)
  – Seamlessly scale based on workflow needs
  – Provide maximal flexibility for all use cases
  – https://github.com/broadinstitute/cromwell

• Workflow language that humans can read/write
  – Methods developers and biomedical scientists at large
  – https://github.com/openwdl/wdl/

Page 4

Use containers for portability & reproducibility

A container encapsulates all the software dependencies associated with running a program.

Takes the guesswork out of running workflows on different platforms!

[Figure, modified from https://www.docker.com/what-container. Labels: GATK 2.8, Java 7, R 2.5.0; GATK 3.8, Java 8, R 3.0.1; BWA; Picard]

Page 5

Use a workflow execution engine that runs anywhere

[Diagram: Cromwell and its backends. Labels: Local, HPC, TES (Funnel), Google, AWS, Alicloud]

https://github.com/broadinstitute/cromwell

Page 6

Run using HPC and Cloud resources!

[Diagram labels: Persistent Cromwell server; Local; HPC; Cloud; Compute environment; Data buckets]

Page 7

Two main ways to run Cromwell

Command Line
• Simple, self-contained command
• Appropriate for independent analysts
• Call caching

Server
• API endpoints
• More scalable, appropriate for production environments
• Call caching

Command-line example:

    java -jar cromwell.jar \
      run hello.wdl \
      --inputs hello_inputs.json
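The command above names hello.wdl but the slides do not show its contents. As a minimal sketch, assuming WDL version 1.0, such a file could look like the following. The task name, workflow name, and input name here are all hypothetical.

```wdl
version 1.0

# Hypothetical contents for hello.wdl; not taken from the slides.
task hello {
  input {
    String name
  }
  # Print a greeting to stdout.
  command <<<
    echo "Hello, ~{name}!"
  >>>
  output {
    # read_string() returns the contents of stdout as a String.
    String greeting = read_string(stdout())
  }
}

workflow hello_world {
  input {
    String name
  }
  call hello { input: name = name }
  output {
    String greeting = hello.greeting
  }
}
```

Under these assumptions, hello_inputs.json would be a JSON object with a single key, "hello_world.name", whose value is the name to greet.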

Page 8

HPC Backend Configuration

Managing data
- Root directory
- Data handling strategies
- Support for object stores

https://cromwell.readthedocs.io/en/stable/tutorials/HPCIntro/

Page 9

HPC Backend Configuration

Managing resources
- CPU
- Memory
- Custom attributes

https://cromwell.readthedocs.io/en/stable/tutorials/HPCIntro/

Page 10

HPC Backend Configuration

Run command
- Built-in variables
- Full flexibility

https://cromwell.readthedocs.io/en/stable/tutorials/HPCIntro/

Page 11

Plenty of workflow solutions to go around

[Comic: Randall Munroe, XKCD, https://www.xkcd.com/927/]

So of course we decided to create a new one.

Page 12

Workflow Description Language (WDL)

Page 13

WDL runtime parameters

[Slide annotations: resourcing; cost savings!; containers]
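The slide's three annotations can be illustrated with a task's runtime block. This is a sketch under stated assumptions, not the slide's own code. The task name, image tag, and values are hypothetical, and reading "cost savings!" as Cromwell's preemptible attribute (honored on the Google backend) is an assumption rather than something the slide states.

```wdl
version 1.0

# Hypothetical task; only the runtime block is the point of this sketch.
task count_reads {
  input {
    File reads
    Int cpu = 4
  }
  # Count lines in the input file; tr strips padding some wc versions add.
  command <<<
    wc -l < ~{reads} | tr -d ' '
  >>>
  runtime {
    docker: "ubuntu:20.04"   # containers: image the command runs in
    cpu: cpu                 # resourcing: number of cores requested
    memory: "8 GB"           # resourcing: memory requested
    preemptible: 3           # assumed meaning of "cost savings!": try preemptible VMs up to 3 times
  }
  output {
    Int line_count = read_int(stdout())
  }
}
```

Which runtime attributes take effect depends on the backend Cromwell is configured to use.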

Page 14

Basic WDL plumbing options

LINEAR CHAINING

    call stepA
    call stepB { input: in=stepA.out }
    call stepC { input: in=stepB.out }

MULTI-IN/OUT

    call stepC { input: in1=stepB.out1, in2=stepB.out2 }

SCATTER-GATHER

    Array[File] inputFiles

    scatter (oneFile in inputFiles) {
      call stepA { input: in=oneFile }
    }

    call stepB { input: files=stepA.out }
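The scatter-gather fragment above can be expanded into a complete workflow. The following is a minimal sketch assuming WDL version 1.0. The task names and the line-counting logic are hypothetical stand-ins for stepA and stepB.

```wdl
version 1.0

# Hypothetical stand-in for stepA: count the lines in one file.
task count_lines {
  input {
    File in_file
  }
  command <<<
    wc -l < ~{in_file} | tr -d ' '
  >>>
  output {
    Int n = read_int(stdout())
  }
}

# Hypothetical stand-in for stepB: add up the per-file counts.
task sum_counts {
  input {
    Array[Int] counts
  }
  # sep joins the array elements into one shell arithmetic expression.
  command <<<
    echo $(( ~{sep=" + " counts} ))
  >>>
  output {
    Int total = read_int(stdout())
  }
}

workflow scatter_gather {
  input {
    Array[File] inputFiles
  }
  # Scatter: one count_lines call per input file, free to run in parallel.
  scatter (oneFile in inputFiles) {
    call count_lines { input: in_file = oneFile }
  }
  # Gather: outside the scatter block, count_lines.n is an Array[Int].
  call sum_counts { input: counts = count_lines.n }
  output {
    Int total = sum_counts.total
  }
}
```

The gather step needs no explicit syntax: referring to a scattered call's output from outside the scatter block yields an array with one element per iteration.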

Page 15

But what about CWL?

[Comic: Randall Munroe, XKCD, https://www.xkcd.com/1739/]

Thanks to our Workflow Object Model (WOM), Cromwell now supports multiple versions of WDL as well as CWL 1.0!

Page 16

Cromwell has been busy

Cromwell in production at Broad:

Processed 47.5 million jobs over the last two years

And this is just the tip of the iceberg!

Page 17

Want to discuss further?

My Email: [email protected]

More Information:
Docs: http://cromwell.readthedocs.io/en/develop/
GitHub: https://www.github.com/broadinstitute/cromwell
WDL: http://www.openwdl.org

Example Pipelines: https://github.com/gatk-workflows