26
Running Large Workflows in the Cloud Paul Watson School of Computing Science & Digital Institute Newcastle University, UK [email protected] The team: Jacek Cala, Hugo Hiden, Simon Woodman, David Leahy With thanks to: -Vlad Sykora, Martyn Taylor, Christophe Poulain, Savas Parastatidis -Microsoft External Research & the EU (Venus-C) for their financial support

Running Large Workflows in the Cloud · 2018-01-29 · Running Large Workflows in the Cloud Paul Watson School of Computing Science & Digital Institute Newcastle University, UK [email protected]

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Running Large Workflows in the Cloud · 2018-01-29 · Running Large Workflows in the Cloud Paul Watson School of Computing Science & Digital Institute Newcastle University, UK Paul.Watson@ncl.ac.uk

Running Large Workflows in the Cloud

Paul Watson

School of Computing Science & Digital Institute

Newcastle University, UK

[email protected]

The team: Jacek Cala, Hugo Hiden, Simon Woodman, David Leahy

With thanks to:

-Vlad Sykora, Martyn Taylor, Christophe Poulain, Savas Parastatidis

-Microsoft External Research & the EU (Venus-C)

for their financial support

Page 2: Running Large Workflows in the Cloud · 2018-01-29 · Running Large Workflows in the Cloud Paul Watson School of Computing Science & Digital Institute Newcastle University, UK Paul.Watson@ncl.ac.uk

The Promise of Clouds

• Reduced capital expenditure

• Reduced Energy

• Scalability

• “On-demand resources”

Page 3: Running Large Workflows in the Cloud · 2018-01-29 · Running Large Workflows in the Cloud Paul Watson School of Computing Science & Digital Institute Newcastle University, UK Paul.Watson@ncl.ac.uk

The Problem with Clouds

Cloud Infrastructure: Storage & Compute

Ap

p 1

....

Ap

p n

Building scalable,

reliable, secure systems on cloud

infrastructure is still hard

• deep IT skills

• bespoke

• on-going management costs

• lock-in

Realising cloud advantages is

beyond most who could benefit

Page 4: Running Large Workflows in the Cloud · 2018-01-29 · Running Large Workflows in the Cloud Paul Watson School of Computing Science & Digital Institute Newcastle University, UK Paul.Watson@ncl.ac.uk

Cloud Options

Cloud Infrastructure: Storage & Compute

Cloud Platform

App 1 .... App n

Users

Cloud Infrastructure: Storage & Compute

Ap

p 1

....

Ap

p n

Users

Page 5: Running Large Workflows in the Cloud · 2018-01-29 · Running Large Workflows in the Cloud Paul Watson School of Computing Science & Digital Institute Newcastle University, UK Paul.Watson@ncl.ac.uk

Cloud Infrastructure: Storage & Compute

Cloud Platform

App 1 .... App n

Users

e-Science Central

Cloud Platform

for developers

Science as a Service

for users & programs

Portable: run on

- Azure

- Amazon

- Private Cloud

Page 6: Running Large Workflows in the Cloud · 2018-01-29 · Running Large Workflows in the Cloud Paul Watson School of Computing Science & Digital Institute Newcastle University, UK Paul.Watson@ncl.ac.uk

Cloud

Platform

App ....

Workflow Enactment

App API

Social Networking

Security

Processing

e-Science

Central

Storage

App

Analysis Services

Cloud

Infrastructure

Provenance

Metadata

Files Databases

Page 7: Running Large Workflows in the Cloud · 2018-01-29 · Running Large Workflows in the Cloud Paul Watson School of Computing Science & Digital Institute Newcastle University, UK Paul.Watson@ncl.ac.uk
Page 8: Running Large Workflows in the Cloud · 2018-01-29 · Running Large Workflows in the Cloud Paul Watson School of Computing Science & Digital Institute Newcastle University, UK Paul.Watson@ncl.ac.uk
Page 9: Running Large Workflows in the Cloud · 2018-01-29 · Running Large Workflows in the Cloud Paul Watson School of Computing Science & Digital Institute Newcastle University, UK Paul.Watson@ncl.ac.uk
Page 10: Running Large Workflows in the Cloud · 2018-01-29 · Running Large Workflows in the Cloud Paul Watson School of Computing Science & Digital Institute Newcastle University, UK Paul.Watson@ncl.ac.uk
Page 11: Running Large Workflows in the Cloud · 2018-01-29 · Running Large Workflows in the Cloud Paul Watson School of Computing Science & Digital Institute Newcastle University, UK Paul.Watson@ncl.ac.uk

Used for variety of

research:

- activity recognition

- spectroscopy

- driving analysis

- chemistry…

Page 12: Running Large Workflows in the Cloud · 2018-01-29 · Running Large Workflows in the Cloud Paul Watson School of Computing Science & Digital Institute Newcastle University, UK Paul.Watson@ncl.ac.uk

Chemists want to know:

Q1. What are the properties of this molecule?

Toxicity

Solubility

Biological Activity

Q2. What molecule would have aqueous solubility of 0.1 μg/mL?

Page 13: Running Large Workflows in the Cloud · 2018-01-29 · Running Large Workflows in the Cloud Paul Watson School of Computing Science & Digital Institute Newcastle University, UK Paul.Watson@ncl.ac.uk

Answering the Question by performing experiments

..... time consuming, expensive, ethical Issues

Page 14: Running Large Workflows in the Cloud · 2018-01-29 · Running Large Workflows in the Cloud Paul Watson School of Computing Science & Digital Institute Newcastle University, UK Paul.Watson@ncl.ac.uk

Quantitative Structure Activity Relationship

- predict properties based on similar molecules

An alternative to experimentation: QSAR

Activity ≈ f( ) quantifiable structural attributes, e.g.

#atoms

logp

shape

.....

Page 15: Running Large Workflows in the Cloud · 2018-01-29 · Running Large Workflows in the Cloud Paul Watson School of Computing Science & Digital Institute Newcastle University, UK Paul.Watson@ncl.ac.uk

New/ Improved Models

New Data

or

Model-Builders

Data Model-

Builders

Model Generation

Models

Generating the models -

Discovery Bus (Leahy et al)

www.openqsar.com

Page 16: Running Large Workflows in the Cloud · 2018-01-29 · Running Large Workflows in the Cloud Paul Watson School of Computing Science & Digital Institute Newcastle University, UK Paul.Watson@ncl.ac.uk

Increasing amounts of data for model building...

CHEMBL : data on 622,824 compounds,

collected from 33,956 publications

WOMBAT-PK: data on 1230 compounds,

for over 13,000 clinical measurements

WOMBAT : data on 251,560 structures,

for over 1,966 targets

More models

Better models

est. 5 years to process new

datasets on existing server

Page 17: Running Large Workflows in the Cloud · 2018-01-29 · Running Large Workflows in the Cloud Paul Watson School of Computing Science & Digital Institute Newcastle University, UK Paul.Watson@ncl.ac.uk

JUNIOR Project Aim

Use Azure to generate models in weeks not years

Page 18: Running Large Workflows in the Cloud · 2018-01-29 · Running Large Workflows in the Cloud Paul Watson School of Computing Science & Digital Institute Newcastle University, UK Paul.Watson@ncl.ac.uk

Selected Descriptors + Responses

Separate Training & Test Data

Chemical Structures & their Activities

Calculate Descriptors from Structures Training Data

Combine Descriptors Descriptors + Responses

Filter Descriptors Combined Descriptors + Responses

Multiple Linear

Regression

Neural Network

Partial Least Squares

Classification Trees .....

Build & Test Models Independently

Select Best Models

Add to Model Database

Test Data

Potential for parallelism...

Page 19: Running Large Workflows in the Cloud · 2018-01-29 · Running Large Workflows in the Cloud Paul Watson School of Computing Science & Digital Institute Newcastle University, UK Paul.Watson@ncl.ac.uk

Workflow Enactment

App API

Social Networking

Security

Processing

e-Science

Central

Storage

Analysis Services

Azure

Provenance

Metadata

Discovery Bus

Co-ordinator

Amazon Approach #1 Minimal change to

Discovery Bus

Page 20: Running Large Workflows in the Cloud · 2018-01-29 · Running Large Workflows in the Cloud Paul Watson School of Computing Science & Digital Institute Newcastle University, UK Paul.Watson@ncl.ac.uk

0

5

10

15

20

25

30

35

0 20 40 60 80 100

Throughput (Requests/min)

Azure Nodes

Succeeded in reducing time from (est.) 5 years to 3 weeks, but…

Page 21: Running Large Workflows in the Cloud · 2018-01-29 · Running Large Workflows in the Cloud Paul Watson School of Computing Science & Digital Institute Newcastle University, UK Paul.Watson@ncl.ac.uk

Approach #2

• run entirely within Azure

– through e-Science Central on Azure

Page 22: Running Large Workflows in the Cloud · 2018-01-29 · Running Large Workflows in the Cloud Paul Watson School of Computing Science & Digital Institute Newcastle University, UK Paul.Watson@ncl.ac.uk

7 Workflows with 67 blocks

Big run:

- 460K workflow invocations

- 4.4 million service invocations

Page 23: Running Large Workflows in the Cloud · 2018-01-29 · Running Large Workflows in the Cloud Paul Watson School of Computing Science & Digital Institute Newcastle University, UK Paul.Watson@ncl.ac.uk

web browser

Azure Cloud Drive

DB data store

Azure Cloud Drive

e-ScienceCentral data store

Azure Cloud Drive

AppServer data store

e-Science Central

Azu

re w

ork

er

po

ol

<<worker role>>

e-SC workflow engine

<<worker role>>

e-SC workflow engine

<<worker role>>

e-SC workflow engine

Work Queue

REST API

rich client

web UI

Approach #2

Page 24: Running Large Workflows in the Cloud · 2018-01-29 · Running Large Workflows in the Cloud Paul Watson School of Computing Science & Digital Institute Newcastle University, UK Paul.Watson@ncl.ac.uk

Early Results

• Reduces time to 9 days

– 5yrs 22 days 9 days

– but room for improvement….

Page 25: Running Large Workflows in the Cloud · 2018-01-29 · Running Large Workflows in the Cloud Paul Watson School of Computing Science & Digital Institute Newcastle University, UK Paul.Watson@ncl.ac.uk

Scalability

0.0

2.0

4.0

6.0

8.0

10.0

12.0

14.0

16.0

0 5 10 15 20 25

Execu

tio

n t

ime [

10

3s]

Number of workflow engines

Ideal

Actual

Page 26: Running Large Workflows in the Cloud · 2018-01-29 · Running Large Workflows in the Cloud Paul Watson School of Computing Science & Digital Institute Newcastle University, UK Paul.Watson@ncl.ac.uk

Summary

• Discovery Bus exemplifies a good Cloud pattern

– large, variable, bursty requirements

• e-Science Central is a scalable, secure, portable cloud

platform for Azure (and Amazon, and Private Clouds)

• next steps

– optimize large workflow scheduling

– automatically adapt #workers

Time

Com

pute