Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm

Genomic Computation at Scale with Serverless, StackStorm, and Docker
SC17, 14 Nov 2017
Dmitri Zimine, Fellow @ Extreme Networks, @dzimine

Image by Miki Yoshihito, Creative Commons license

Genomic Sequencing and Annotation

ACGTGACCGGTACTGGTAACGTACACCTACGTGACCGGTACTGGTAACGTACGCCTACGTGACCGGTACTGGTAACGTATACACGTGACCGGTACTGGTAACGTACACCTACGTGACCGGTACTGCTGGTAACGTATACCTCT...

Sequencer

Sequenced Genome

DNA Sample

Annotated Sequence

Compute in silico

So that…

Source: http://www.yourgenome.org

Victor Solovyev, Partner

Leading scientist in computational biology

Victor Solovyev is a leading scientist in computational biology. His experience combines academic positions, including Professor at Royal Holloway and KAUST, with various industry roles. His research on bioinformatics and genomic computation is published in Nature, Science, and Genome Research and is highly cited.

As Chief Scientific Officer at Softberry, he leads software development for biomedical data analysis and research in computational biology. Softberry software products were used in over 2000 research publications in 2016 alone. The Fgenesh program has been cited in ~3200 scientific publications, the Bprom program in ~800, and the Fgenesb pipeline in ~500.

fgenesb pipeline: some [prev] results

PROPERTIES:

Challenges:
• Offer annotation pipelines online
• Use cloud, for large elastic capacity
• Handle scale: spiky workload
• Economically

GAaaS – Genomic Annotation as a Service

Agenda

• Problem & Solution: domain demands, technology selection & serverless, toolchain, solution overview
• Show & Tell: demo
• Discussion: lessons learned, what to keep & what to refactor, the path forward

Typical genomic annotation pipeline

• Prediction of genes and proteins: fgenesb
• Search for similar proteins in databases: Blast (NR), KOALA (KEGG)
• Compilation and presentation of results: GCView

Databases: NR, KEGG
Data sizes shown: 50-100Gb, 1Mb-3Gb
Highly parallelizable

Annotation Pipelines

A basic exome pipeline delivering called variants from raw sequence could consist of as few as 12 steps, most of which can be run in parallel, but a real analysis will typically involve several additional downstream steps and complex report generation.

Source: Brief Bioinform bbw020. DOI: https://doi.org/10.1093/bib/bbw020

PROPERTIES:

• Steps:
  • Jobs/functions
  • Run times may be hours or days
  • Diverse (a.k.a. "don't run on the same box")
• Workflow orchestration:
  • Logical patterns: splits, parallels, joins
  • Data flow: upstream results -> downstream inputs
• Scale dimensions: spiky load
  • Low volume of requests
  • Very high compute demand per request
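The split, parallel, and join patterns listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `split`, `gc_count`, and `run_pipeline` are hypothetical names, and counting G/C bases is a toy stand-in for a real annotation step that would run for hours in its own container.

```python
from concurrent.futures import ThreadPoolExecutor


def split(sequence, chunk_size):
    """Split: cut the input into independent chunks."""
    return [sequence[i:i + chunk_size] for i in range(0, len(sequence), chunk_size)]


def gc_count(chunk):
    """Toy stand-in for one pipeline step (hypothetical): count G and C bases."""
    return sum(1 for base in chunk if base in "GC")


def run_pipeline(sequence, chunk_size=8, max_workers=4):
    """Split the sequence, process chunks in parallel, then join the results."""
    chunks = split(sequence, chunk_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Parallel: each chunk is processed independently of the others.
        partial_results = list(pool.map(gc_count, chunks))
    # Join: upstream results become the input of the downstream step.
    return sum(partial_results)


if __name__ == "__main__":
    print(run_pipeline("ACGTGACCGGTACTGGTAACGTACACCT"))
```

In the real pipeline each step would be a separate container rather than a thread, but the control flow has the same shape.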

What is Serverless?

Authoritative: Mike Roberts on martinfowler.com: https://goo.gl/bTfgfU

My summary:
• Function, not service: "down when done"
• Scale: elastic, infinite, transparent for developer
• Pay-per-use consumption model

Serverless fits!

*) BYOC: Bring Your Own Code (see the serverless compute manifesto, https://goo.gl/q9HsXB)

Typical Serverless requirements:

• “Functions”, not “servers”, down when done

• Elastic scale: handle spiky workload pattern

• BYOC*: package algorithms into containers

• Launch on a variety of events

Additional requirements:

• Long running times: hours

• Pipeline orchestration: execution logic and data passing

• Local Dev environment, consistent and convenient

Serverless fits, but…

Typical Serverless requirements:

• Elastic scale: handle spiky workload pattern

• “Functions”, not “servers”, down when done

• BYOC*: package programs into containers, run everywhere

• Launch on a variety of events

Why not <…>

AWS Lambda? 5-minute limit, but jobs run for hours and days.

Azure? No native support for Functions in Docker containers. *

OpenWhisk? Lacks a powerful workflow to orchestrate pipelines (only sequences).

*) At the time of selection. I will cover "what has changed" in Discussion.

DIY

Tool Chain

• Terraform provisions infra on AWS (WIP); Vagrant for local dev infra.
• Ansible deploys & configures software on infra.
• Docker to containerize functions and push to local Docker Registry.
• StackStorm orchestrates pipeline executions, invokes Swarm to run functions, dynamically scales Swarm on load.
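As an illustration of "invokes Swarm to run functions", the sketch below builds a `docker service create` command line for a run-once function. It is an assumption, not the talk's actual implementation: the helper name `swarm_job_command`, the registry address `registry.local:5000`, and the mount paths are hypothetical. The flags used (`--name`, `--restart-condition none`, `--mount` with `readonly`) are standard Docker CLI options.

```python
import shlex


def swarm_job_command(name, image, args, share="/share", data="/data"):
    """Build the argv for a run-once Swarm service (hypothetical helper)."""
    return [
        "docker", "service", "create",
        "--name", name,
        # "Down when done": do not restart the container after it exits.
        "--restart-condition", "none",
        # Read-write share for results, read-only mount for reference data.
        "--mount", f"type=bind,source={share},target=/share",
        "--mount", f"type=bind,source={data},target=/data,readonly",
        image,
        *args,
    ]


if __name__ == "__main__":
    # Image name and arguments are placeholders for illustration only.
    cmd = swarm_job_command(
        "blast-1", "registry.local:5000/blast", ["--input", "/share/job1/in.fasta"]
    )
    print(shlex.join(cmd))
```

Returning a list rather than a shell string avoids quoting problems if the command is later passed to `subprocess.run`.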

StackStorm, in 1 minute

Sensors, Rules, Workflows, Actions

IT Domains: Config mgmt, Storage, Networking, Containers, Cloud Infra, Monitoring, Ops Support

Triggers, Calls

©2017 Extreme Networks, Inc. All rights reserved

StackStorm is like …

Sensors, Rules, Workflows, Actions

Step Functions, AWS Lambda

Open source, for DIY Serverless

Three Sides to Serverless Story

• DevOps: provides infrastructure.
• Developer: packs algorithms in containers, defines pipelines.
• End User: submits sequence, gets results, fast and cheap.

1. DevOps: deploys serverless solution

[Diagram: DevOps deploys StackStorm, Registry, and other infra; a Controller and three Workers, each running functions f(x); volumes /share (:rw) and /data (:ro); labels: $ function, Scale]

2. Developer: creates functions, defines pipeline

(1) Create functions (BYOC), pack into Docker image, push to local Registry.
(2) Define pipelines as StackStorm workflows.

[Diagram: Developer, Registry, StackStorm, functions f(x)]

3. User submits data, System runs pipeline & produces results

(1) User sends sequence data.
(2) StackStorm runs workflow, schedules functions as jobs on Swarm.
(3) Swarm schedules services.
(4) Docker pulls function's images.
(5) Functions run in containers, produce data.
(6) StackStorm sends results back to user.

[Diagram: End User, StackStorm, Registry, Swarm controller, Swarm Worker running functions f(x)]

Genomic annotation pipeline with StackStorm, Docker, and Docker Swarm

Show & Tell, PART 1

Scale: dynamically, on load

[Diagram: StackStorm, Registry, other infra; Controller and Workers running functions f(x); volumes share (:rw) and data (:ro); label: Scale]

Show & Tell, PART 2

Dynamically scaling Swarm cluster on AWS, on workload
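The scaling decision behind such a demo can be reduced to a small function. The sketch below is an assumption about one reasonable policy, not the rule used in the talk: the name `desired_workers` and the parameters `jobs_per_worker`, `min_workers`, and `max_workers` are hypothetical.

```python
import math


def desired_workers(pending_jobs, jobs_per_worker=2, min_workers=1, max_workers=10):
    """Return how many workers the cluster should have for the current load."""
    if jobs_per_worker <= 0:
        raise ValueError("jobs_per_worker must be positive")
    # Enough workers to hold every pending job at the given density.
    needed = math.ceil(pending_jobs / jobs_per_worker)
    # Clamp so the cluster never drops below the floor or exceeds the budget.
    return max(min_workers, min(max_workers, needed))


if __name__ == "__main__":
    for load in (0, 5, 100):
        print(load, "pending jobs ->", desired_workers(load), "workers")
```

An orchestrator would compare this number with the current worker count and add or remove nodes accordingly; the upper bound keeps a spike from running up unbounded cost.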

Serverless hype accelerates

25+ frameworks … but no turn-key fit yet

Kubernetes Won the Container Arms Race

now with built-in AWS autoscaler.

Azure Introduced Container Instances

no messing with VMs, per-second billing.

We are outpaced by technology

So What?

Path Forward: Options

Option 1: Kubernetes

• Use Kubernetes pack from StackStorm Exchange
• Utilize k8s "run to completion" jobs
• Deploy on AWS, minikube for local development
• Leverage AWS autoscaler for elastic capacity

StackStorm handles pipeline workflow, calls k8s Jobs. Same app developer experience.
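For reference, a "run to completion" job in Kubernetes is a `batch/v1` Job whose pod is not restarted after it exits. The sketch below builds such a manifest as a plain Python dictionary. The helper name `job_manifest`, the image name, and the arguments are hypothetical, and how the manifest would be submitted (for example through the StackStorm Kubernetes pack) is left out.

```python
import json


def job_manifest(name, image, args):
    """Build a minimal Kubernetes Job manifest (hypothetical helper)."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            # Do not retry failed pods; leave retries to the workflow engine.
            "backoffLimit": 0,
            "template": {
                "spec": {
                    # Run to completion: never restart the container.
                    "restartPolicy": "Never",
                    "containers": [
                        {"name": name, "image": image, "args": list(args)}
                    ],
                }
            },
        },
    }


if __name__ == "__main__":
    # Image name and arguments are placeholders for illustration only.
    manifest = job_manifest(
        "fgenesb-1", "registry.local:5000/fgenesb", ["/share/job1"]
    )
    print(json.dumps(manifest, indent=2))
```

Setting `backoffLimit` to 0 is a design choice made for this sketch, so that retry logic stays in one place (the workflow) rather than being split between the workflow and the cluster.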

Path Forward: Options

Option 2: Azure

• Use Azure's "Self-orchestration" option with StackStorm
• Azure provides containers on demand (no VMs!)
• Per-container, per-second billing

StackStorm handles pipeline workflow, calls Azure containers. App developer experience stays the same.

Path forward: Change to Azure Container Instances

(1) User sends sequence data.
(2) StackStorm runs workflow, schedules functions as containers on Azure.
(3) Azure schedules container instances.
(4) Docker pulls function's images from Registry.
(5) Functions run in containers, produce data.
(6) StackStorm sends results back to user.

[Diagram: End User, StackStorm, Registry, Azure Container Service, Azure Container Instance running functions f(x)]

STACKSTORM EVENT-DRIVEN AUTOMATION ALLOWS YOU TO GET YOUR SOLUTION UP AND RUNNING QUICKLY SO YOU CAN DELIVER BUSINESS FAST, EXPERIMENT AND INNOVATE. ONCE YOU HAVE IT JUST RIGHT, YOU CAN BUILD A MORE PERMANENT VERSION WITH MICROSERVICES

StackStorm is an innovation platform where we can build solutions, experiment, and learn while delivering business value, before moving the implementation to dedicated services.

StackStorm Open Source Platform

Brocade Workflow Composer (StackStorm Enterprise Edition)

Network Automation

StackStorm Exchange Community

Security, Assisted Networking

Come and see! SC17 Exhibition, Booth #519

Dmitri Zimine, Extreme Networks, @dzimine
http://github.com/dzimine/serverless-swarm

@Stack_Storm
http://github.com/StackStorm/st2 (2,317 GitHub stars)

Thank You!
