29
Autonomic Cloud Provisioning For Scientific Workflows Ryan Chard Victoria University of Wellington

Autonomic Cloud Provisioning For Scientific Workflows · PDF fileAutonomic Cloud Provisioning For Scientific Workflows ... HTCondor for job scheduling and execution ... Autonomic Cloud

  • Upload
    buikiet

  • View
    223

  • Download
    1

Embed Size (px)

Citation preview

Page 1: Autonomic Cloud Provisioning For Scientific Workflows · PDF fileAutonomic Cloud Provisioning For Scientific Workflows ... HTCondor for job scheduling and execution ... Autonomic Cloud

Autonomic Cloud

Provisioning For Scientific

WorkflowsRyan Chard

Victoria University of Wellington

Page 2: Autonomic Cloud Provisioning For Scientific Workflows · PDF fileAutonomic Cloud Provisioning For Scientific Workflows ... HTCondor for job scheduling and execution ... Autonomic Cloud

Science in the Cloud

Page 3: Autonomic Cloud Provisioning For Scientific Workflows · PDF fileAutonomic Cloud Provisioning For Scientific Workflows ... HTCondor for job scheduling and execution ... Autonomic Cloud

Science in the Cloud

Taverna

Page 4: Autonomic Cloud Provisioning For Scientific Workflows · PDF fileAutonomic Cloud Provisioning For Scientific Workflows ... HTCondor for job scheduling and execution ... Autonomic Cloud

The Globus Galaxies Platform

A platform for creating cloud-hosted Science as a

Service gateways:

Galaxy to create, manage, execute, share workflows

Globus for data and identity management

HTCondor for job scheduling and execution

AWS for executing analyses

Spot instances

Elastic provisioner

20 gateways, 300+ researchers

Page 5: Autonomic Cloud Provisioning For Scientific Workflows · PDF fileAutonomic Cloud Provisioning For Scientific Workflows ... HTCondor for job scheduling and execution ... Autonomic Cloud

Globus Genomics Usage

urces

Page 6: Autonomic Cloud Provisioning For Scientific Workflows · PDF fileAutonomic Cloud Provisioning For Scientific Workflows ... HTCondor for job scheduling and execution ... Autonomic Cloud

Cloud Provisioning Challenges:

Naïve use can be expensive and limit performance

38 instance types, multiple regions and availability zones

Compute/memory/network/GPU optimised

Different pricing models (On-demand, spot, reserved)

Differing tool requirements

Technical challenges

Instance setup, tools, dependencies, termination

Autoscaling/resource management

Page 7: Autonomic Cloud Provisioning For Scientific Workflows · PDF fileAutonomic Cloud Provisioning For Scientific Workflows ... HTCondor for job scheduling and execution ... Autonomic Cloud

A Globus Galaxies Gateway

Worker

Worker

Instance Provisioning

Galaxy

Cost-aware

provisionerGlobus

Provisioning

HistoryTool Profiles

NFS

HTCondorQueue

Worker

Page 8: Autonomic Cloud Provisioning For Scientific Workflows · PDF fileAutonomic Cloud Provisioning For Scientific Workflows ... HTCondor for job scheduling and execution ... Autonomic Cloud

Provisioning System

Worker

Worker

Instance Provisioning

Galaxy

Globus

Provisioning

History

NFS

HTCondorQueue

Worker

Cost-aware

provisioner

Tool ProfilesProvisioning

History

Page 9: Autonomic Cloud Provisioning For Scientific Workflows · PDF fileAutonomic Cloud Provisioning For Scientific Workflows ... HTCondor for job scheduling and execution ... Autonomic Cloud

Improving the Cloud Provisioner

Simplify cloud usage

Make it cheaper and faster

Monitor arbitrary queues to acquire resources and fulfil workloads

Applications

Platform

ProvisioningService

Cost Analysis

Tool Profiling

Page 10: Autonomic Cloud Provisioning For Scientific Workflows · PDF fileAutonomic Cloud Provisioning For Scientific Workflows ... HTCondor for job scheduling and execution ... Autonomic Cloud

Smarter Cloud Use

Broaden search scopes

Use many instance types and all AZs

Use profiles to match workloads to

instances

Encode memory, CPU, and disk

requirements in ClassAds etc.

Evaluate spot market prices

All availability zones (AZs) and all suitable

instance types

Self termination

Over-provision requests

Request more resources than needed

Repurpose excess spot

requests

Migrate requests to idle jobs

Revert to on-demand

Cost Throughput

Platform

ProvisioningService

Cost Analysis

Page 11: Autonomic Cloud Provisioning For Scientific Workflows · PDF fileAutonomic Cloud Provisioning For Scientific Workflows ... HTCondor for job scheduling and execution ... Autonomic Cloud

Cost-aware Provisioning Approach

1. Filter instance types with profiles

2. Determine price for each instance type across all AZs

3. Rank potential requests

4. Make requests and monitor

5. Cancel or repurpose excess active requests once one is fulfilled

$$$

???

Page 12: Autonomic Cloud Provisioning For Scientific Workflows · PDF fileAutonomic Cloud Provisioning For Scientific Workflows ... HTCondor for job scheduling and execution ... Autonomic Cloud

Evaluation

Six production GG gateways

Galaxy/Condor logs + spot price

history

145 day period

Evaluation by mapping jobs to AWS spot

price history

Single Instance, Single AZ (SI-SAZ)

Single Instance, Multi AZ (SI-MAZ)

Multi Instance, Single AZ (MI-SAZ)

Multi Instance, Multi AZ (MI-MAZ)

Four provisioning scopes

Page 13: Autonomic Cloud Provisioning For Scientific Workflows · PDF fileAutonomic Cloud Provisioning For Scientific Workflows ... HTCondor for job scheduling and execution ... Autonomic Cloud

Cost-aware Results

Cost/Day

Page 14: Autonomic Cloud Provisioning For Scientific Workflows · PDF fileAutonomic Cloud Provisioning For Scientific Workflows ... HTCondor for job scheduling and execution ... Autonomic Cloud

Cost-aware Results

Increasing the search scope to include multiple instance types and availability zones results in cost savings between 43% and 95% per gateway

Overall reduction in cost of 92.1%

The difference between MI-SAZ and MI-MAZ less than 1%

$27,686.99 $23,321.29 $2,259.36 $2,200.24

SI-SAZ SI-MAZ MI-SAZ MI-MAZ

Page 15: Autonomic Cloud Provisioning For Scientific Workflows · PDF fileAutonomic Cloud Provisioning For Scientific Workflows ... HTCondor for job scheduling and execution ... Autonomic Cloud

Profiling Application

Executions

Predict execution time and cost

Enable co-allocation of jobs

Minimise monetary cost of execution

Maximise performance

Applications

Provisioning

Service

Tool Profiling

Matching tools to resources

AWS is flexible

38 instance types

We need a way to determine resource

usage

Identify suitable instance types

Trial and error is tedious

What Why

How

A profiling service!

Page 16: Autonomic Cloud Provisioning For Scientific Workflows · PDF fileAutonomic Cloud Provisioning For Scientific Workflows ... HTCondor for job scheduling and execution ... Autonomic Cloud

Profiler

1. Submit

profiling request 5. Return profiles

Worker

Worker web service

PCP HTCondor

2. Provision

workers

3. Start/monitor profiling

Worker

W orker w eb service

PCP HTCondor

Worker

W orker w eb service

PCP HTCondor

Worker

W orker w eb service

PCP HTCondor

4. Parse PCP

log and store

profiles

A Cloud Tool Profiling Service

1. Describe profile requests in

JSON

2. Provision resources and apply

a profiling Web Service

3. Use Performance Co-pilot

(PCP) to capture usage

4. Capture and process PCP logs

5. Return profiles as JSON (or logs

via s3)

Page 17: Autonomic Cloud Provisioning For Scientific Workflows · PDF fileAutonomic Cloud Provisioning For Scientific Workflows ... HTCondor for job scheduling and execution ... Autonomic Cloud

Evaluation

10 instance types

Instance storage (not EBS)

Five Globus Genomics tools

Range from 20-90 minute exec time

Account for 17.7% of total compute across four gateways

Capture:

Execution time, CPU utilisation, memory utilisation, disk, and network

Assess impact on cost

c3.2xlarge

c3.4xlarge

c3.8xlarge

g2.2xlarge

g2.8xlarge

r3.xlarge

r3.2xlarge

r3.4xlarge

r3.8xlarge

m3.2xlarge

BWA ALN

FASTQC

Bowtie

MarkDups

BWA MEM

BWA ALN

FASTQC

Bowtie

MarkDups

BWA MEM

Tools Instances Profiles

Page 18: Autonomic Cloud Provisioning For Scientific Workflows · PDF fileAutonomic Cloud Provisioning For Scientific Workflows ... HTCondor for job scheduling and execution ... Autonomic Cloud

Execution Time

BWA MEM BWA ALN

Page 19: Autonomic Cloud Provisioning For Scientific Workflows · PDF fileAutonomic Cloud Provisioning For Scientific Workflows ... HTCondor for job scheduling and execution ... Autonomic Cloud

Utilisation (Bowtie)

CPU Memory

Page 20: Autonomic Cloud Provisioning For Scientific Workflows · PDF fileAutonomic Cloud Provisioning For Scientific Workflows ... HTCondor for job scheduling and execution ... Autonomic Cloud

Spot Instance Price

A dataset of 2000 production jobs

from GG gateways

Use submission time and AWS spot

price history to determine price per

instance type

Adaptive fastest, slowest, cheapest,

costliest = quickest, slowest,

cheapest, most expensive instance

types

Globus Genomics = naïve instance/zone selection

Page 21: Autonomic Cloud Provisioning For Scientific Workflows · PDF fileAutonomic Cloud Provisioning For Scientific Workflows ... HTCondor for job scheduling and execution ... Autonomic Cloud

Provisioning as a Service

A service to acquire and manage cloud resources

Gateways subscribe to the service

Combines profiles and cost-aware techniques

Monitor a queue (HTCondor/Spark) and provision resources on

demand

Applications

Platform

ProvisioningService

Cost Analysis

Tool Profiling

Page 22: Autonomic Cloud Provisioning For Scientific Workflows · PDF fileAutonomic Cloud Provisioning For Scientific Workflows ... HTCondor for job scheduling and execution ... Autonomic Cloud

The Globus Galaxies Platform

Worker

Worker

Instance Provisioning

Galaxy

Cost-aware

provisionerGlobus

Provisioning

HistoryTool Profiles

NFS

HTCondorQueue

Worker

Page 23: Autonomic Cloud Provisioning For Scientific Workflows · PDF fileAutonomic Cloud Provisioning For Scientific Workflows ... HTCondor for job scheduling and execution ... Autonomic Cloud

Instance Provisioning

RDS

Provisioning History

Tool

Profiles

Galaxy

Globus

NFS

HTCondor

Queue

Worker

Worker

Worker

Cost-aware

provisionerHTCondor

Queue

A Provisioning Service

Page 24: Autonomic Cloud Provisioning For Scientific Workflows · PDF fileAutonomic Cloud Provisioning For Scientific Workflows ... HTCondor for job scheduling and execution ... Autonomic Cloud

Cost-aware

provisionerHTCondor

Queue

Cost-aware

provisionerHTCondor

Queue

Instance Provisioning

RDS

Provisioning

History

Tool Profiles

Cost-aware

provisionerHTCondor

Queue

Galaxy

Globus

NFS

HTCondor

Queue

Worker

Worker

Worker

Galaxy

Globus

NFS

HTCondor

Queue

Worker

Worker

Worker

Galaxy

Globus

NFS

HTCondor

Queue

Worker

Worker

Worker

Provisioning Architecture

Page 25: Autonomic Cloud Provisioning For Scientific Workflows · PDF fileAutonomic Cloud Provisioning For Scientific Workflows ... HTCondor for job scheduling and execution ... Autonomic Cloud

Evaluation

Tune configurable parameters and see how it

changes performance

Run rate, idle requirement, repurpose

requests, resource restrictions

A reproducible dataset of production GG workloads (busiest days from 5 gateways)

1000 jobs (total ~8500 in 24 hours)

~3.5 hours of jobs

~8 hours execution duration

Transform exec time (sleep) by instance type

Monitor number of requests, fulfilled

instances, and cost to fulfil workload

Page 26: Autonomic Cloud Provisioning For Scientific Workflows · PDF fileAutonomic Cloud Provisioning For Scientific Workflows ... HTCondor for job scheduling and execution ... Autonomic Cloud

Average Wait Per Job

Dashboard like configuration

AZ=1: single availability zone

IR: Idle requirement of jobs before

resources are provisioned

Re: Whether or not requests are

repurposed once orphaned

RR: Run rate, or interval between provisioning rounds

Base: base case (Re=T, RR=5,

IR=120,AZ=All)

Page 27: Autonomic Cloud Provisioning For Scientific Workflows · PDF fileAutonomic Cloud Provisioning For Scientific Workflows ... HTCondor for job scheduling and execution ... Autonomic Cloud

Instances and Cost

Page 28: Autonomic Cloud Provisioning For Scientific Workflows · PDF fileAutonomic Cloud Provisioning For Scientific Workflows ... HTCondor for job scheduling and execution ... Autonomic Cloud

Summary

Using the cloud naïvely can increase costs and decrease

performance

There are simple techniques that can substantially reduce

the cost of using the cloud

Matching tools to resources is essential for effective cloud

use

The provisioning service and profiler are available on github

Page 29: Autonomic Cloud Provisioning For Scientific Workflows · PDF fileAutonomic Cloud Provisioning For Scientific Workflows ... HTCondor for job scheduling and execution ... Autonomic Cloud

Thanks!

Questions?