Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications Piyush Shivam, Shivnath Babu, Jeffrey Chase Duke University


Page 1: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications

Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications

Piyush Shivam, Shivnath Babu, Jeffrey Chase

Duke University

Page 2: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications

Networked Computing Utility

[Figure: a task scheduler places a task workflow on clusters C1, C2, C3 at Site A, Site B, Site C]

• A network of clusters or grid sites

• Each site is a pool of heterogeneous resources

• Jobs are task workflows

• Challenge: choose good resource assignments for the jobs

Page 3: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications

Example: Assigning Resources to Run Tasks

[Figure: clusters C1, C2, C3 at Site A, Site B, Site C, a home file server, and plans P1, P2, P3]

• A workflow with a single task

• Task input data at Site A

• Execution plan ≡ Resource assignment

Plan   CPU      Storage
P1     Site A   Site A
P2     Site B   Site A
P3     Site B   Site B

Page 4: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications

Plan Selection Problem

[Figure: task workflow, plan enumeration, cost (T1, T2), choose best plan]

Plans  CPU      Storage
P1     Site A   Site A
P2     Site B   Site A
…      …        …

Cost: plan execution time

Challenge: Need cost models to estimate plan execution time
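The plan selection step can be illustrated with a minimal sketch: enumerate candidate (CPU site, storage site) plans, score each with a cost model, and keep the cheapest. The function names, the site timings, and the remote-I/O penalty below are hypothetical placeholders chosen for illustration; they are not NIMO's cost model.

```python
from itertools import product

SITES = ["Site A", "Site B", "Site C"]

def enumerate_plans(sites):
    """Every (CPU site, storage site) pair is a candidate plan."""
    return list(product(sites, repeat=2))

def toy_cost(plan):
    """Hypothetical cost model: made-up compute times plus a remote-I/O penalty."""
    cpu_site, storage_site = plan
    cpu_time = {"Site A": 100.0, "Site B": 60.0, "Site C": 80.0}[cpu_site]
    remote_penalty = 0.0 if cpu_site == storage_site else 50.0
    return cpu_time + remote_penalty

def choose_best_plan(plans, cost_model):
    """Pick the plan with the lowest estimated execution time."""
    return min(plans, key=cost_model)

plans = enumerate_plans(SITES)
best = choose_best_plan(plans, toy_cost)
print(best, toy_cost(best))
```

Any callable that maps a plan to an estimated execution time can be passed in place of `toy_cost`, which is where a learned cost model would plug in.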

Page 5: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications

Generating Cost Models is Hard

• Non-declarative

– Scientific workflow tasks are usually scripts (MATLAB, Perl)

– Such tasks are not database operators like join or select

– Hence: task is a black box with no prior knowledge

• Heterogeneous resources

– Computational grid setting

– Performance varies a lot across resource assignments

• Data dependency

– Performance can vary significantly based on properties of input data & parameters to scripts

Page 6: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications

Problem Setting

• Scientific workflows at DSCR (Duke Shared Cluster Resource)

• Important scientific workflows are run repeatedly

– Opportunity to observe & learn task behavior

– Better plan selection for subsequent runs

• Sequential scientific workflows

– Each task runs on a single node

– >90% of workflows at DSCR are sequential

Page 7: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications

NIMO System: NonInvasive Modeling for Optimization

NIMO learns cost models for task workflows

– End-to-end cost models

• Incorporate properties of tasks, resources, & data

– Non-invasive

• No changes to tasks

– Automated and active

• Automatically collects training data for learning cost models

[Figure: NIMO alongside the scheduler, with clusters C1, C2, C3 at Site A, Site B, Site C]

Page 8: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications

NIMO Fills a Gap

• Workflow Management Systems (WFMSs)

– WFMSs use database technology for managing all aspects of scientific workflows [Liu ’04, Shankar ’05]

• Batch scheduling systems

– Knowledge of plan execution time is assumed for optimizing resource assignments [Casanova ’00, Phan ’05, Kelly ’03]

NIMO generates cost models for these systems

Page 9: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications

Roadmap

• Cost models

• NIMO: active learning of cost models

• Experimental evaluation

• Related work

• Conclusions

• Future work

Page 10: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications

Cost Model

[Figure: the cost model for a task takes the resource assignment and input data and yields the execution time; shown for a task in a task workflow]

Total workflow execution time can be derived using the cost models for individual tasks

Page 11: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications

Task Cost Model

T = D * (Oa + On + Od)

– T: execution time

– D: total data

– Oa (compute occupancy): compute phase (compute resource busy)

– Os (stall occupancy): stall phase (compute resource stalled on I/O), made up of On (network occupancy) and Od (storage occupancy)

occupancy: average time spent per unit of data
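As a worked instance of the formula T = D * (Oa + On + Od), the sketch below plugs in made-up numbers. The data size and occupancy values are assumptions picked only to make the arithmetic easy to follow.

```python
def execution_time(total_data, o_a, o_n, o_d):
    """T = D * (Oa + On + Od); each occupancy is time per unit of data."""
    return total_data * (o_a + o_n + o_d)

# Hypothetical task: 1000 units of data, 0.5 s/unit of compute,
# 0.25 s/unit stalled on the network, 0.25 s/unit stalled on storage.
t = execution_time(total_data=1000, o_a=0.5, o_n=0.25, o_d=0.25)
stall_occupancy = 0.25 + 0.25  # On + Od
print(t, stall_occupancy)
```

Halving the network occupancy in this example (for instance by moving the task closer to its data) would cut the estimate from 1000 to 875 seconds, which is the kind of comparison a scheduler needs across plans.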

Page 12: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications

Cost Model

[Figure: the cost model for a task takes the resource assignment and input data and yields the execution time; labels: resource profile, data profile, task profile]

T = D * (Oa + On + Od)

Page 13: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications

Learning Cost Models

Learning the cost model = Learning profiles + Learning predictors

Page 14: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications

Statistical Learning of Predictors

[Figure: independent variables (resource profile, data profile) and dependent variables]

Ex: Learn each predictor as a regression model from the training data
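A minimal sketch of learning one predictor as a regression model, assuming a single profile attribute (network latency) and a linear relationship to one occupancy. The training samples are invented for illustration, and the single-attribute linear form is an assumption rather than the form NIMO necessarily uses.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical training data: latency in ms -> observed network occupancy.
latencies_ms = [0, 5, 10, 15]
occupancies = [1.0, 2.0, 3.0, 4.0]

slope, intercept = fit_line(latencies_ms, occupancies)
predicted_at_20ms = slope * 20 + intercept
print(slope, intercept, predicted_at_20ms)
```

With several profile attributes the same idea extends to multiple regression, with one coefficient per attribute.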

Page 15: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications

Challenges in Learning

• Cost of sample acquisition

• Coverage of system operating range

• Curse of dimensionality

– Suppose: 10 profile attributes × 10 values per attribute, and 5 minutes for a task run (sample)

– Sampling even 1% of this space to build a cost model would take 951 years!

[Figure: accuracy of current best model vs. elapsed time, for passive learning and for active & accelerated learning, with the best accuracy possible marked]
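The 951-year figure follows directly from the numbers on this slide. The short calculation below reproduces it, assuming runs execute one after another and a 365-day year.

```python
attributes = 10
values_per_attribute = 10
minutes_per_run = 5
sample_percent = 1

total_points = values_per_attribute ** attributes  # 10**10 combinations
samples = total_points * sample_percent // 100     # 1% of the space
total_minutes = samples * minutes_per_run
years = total_minutes / (60 * 24 * 365)            # sequential runs, 365-day years
print(samples, round(years))
```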

Page 16: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications

Active (and Accelerated) Learning

• Which predictors are important?

• Which profile attributes should each predictor have?

• What values to consider for each profile attribute during training?

Resource profile Data profile

Page 17: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications

NIMO System

[Figure: NIMO architecture; labels: NIMO workbench, WAN emulator (nistnet), task profiler, resource profiler, data profiler, run standard benchmarks, training set database, active & accelerated learning, scheduler, clusters C1, C2, C3 at Site A, Site B, Site C]

Page 18: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications

Active Learning Algorithm

Initialization

While( ) {

}

Page 19: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications

Active Learning Algorithm

Initialization
While( ) {
    Pick a new assignment
    Run task on chosen assignment
    Relearn predictors
}

Relearn Predictors

• Relearn predictors with the new set of training samples

• Compute current prediction error of each predictor

– Fixed test set

– Cross-validation

[Figure: example training samples; legible values: 10ms, 256M, 1GHz, 1G, 512MB]
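The loop on this slide (pick an assignment, run the task, relearn, check the error) can be sketched end to end as follows. Everything concrete here is an assumption made for illustration: `run_task` is a stand-in that simulates a task, the predictor is a one-attribute line, candidates are tried in list order, and the loop stops on an error target or a run budget because the slide leaves the loop condition unspecified.

```python
def run_task(latency_ms):
    """Stand-in for a real task run (hypothetical): returns an observed occupancy."""
    return 1.0 + 0.2 * latency_ms

def fit_line(samples):
    """Least-squares line through (x, y) samples; a constant if x never varies."""
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        return 0.0, mean_y
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def prediction_error(model, test_set):
    """Mean absolute percentage error on a fixed test set."""
    slope, intercept = model
    total = sum(abs(y - (slope * x + intercept)) / abs(y) for x, y in test_set)
    return 100.0 * total / len(test_set)

def active_learning(candidates, test_set, target_error=5.0, max_runs=10):
    remaining = list(candidates)
    # Initialization: a single run on the first candidate assignment.
    first = remaining.pop(0)
    samples = [(first, run_task(first))]
    model = fit_line(samples)
    error = prediction_error(model, test_set)
    # Assumed stopping rule: error target reached or run budget exhausted.
    while error > target_error and remaining and len(samples) < max_runs:
        x = remaining.pop(0)              # pick a new assignment
        samples.append((x, run_task(x)))  # run task on chosen assignment
        model = fit_line(samples)         # relearn predictors
        error = prediction_error(model, test_set)
    return model, error, len(samples)

candidates = [0, 5, 10, 15, 20]
test_set = [(x, run_task(x)) for x in (2, 8, 18)]
model, error, runs = active_learning(candidates, test_set)
print(model, error, runs)
```

Because the simulated task is exactly linear, the loop stops after the second run; a real task would typically need more runs, and the order in which assignments are picked is what the later slides refine.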

Page 20: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications

Active Learning Algorithm

Initialization
While( ) {
    Choose a predictor to refine
    Choose attributes for the predictor
    Choose attribute values for the run
    Run task on chosen assignment
    Relearn predictors
}

Predictor Choice

• Predictors – fa, fn, fd, fD

• Order predictors + Traverse this order

– Ex: relevance-based order (Plackett-Burman)

– Ex: choose predictor with current max. error
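The second example, choosing the predictor with the current maximum error, reduces to an argmax over per-predictor errors. In the sketch below the error values are invented, and the dictionary keyed by predictor name is just one convenient representation.

```python
def choose_predictor(current_errors):
    """Return the name of the predictor whose current prediction error is largest."""
    return max(current_errors, key=current_errors.get)

# Hypothetical current errors (in percent) for the four predictors.
errors = {"fa": 4.0, "fn": 22.5, "fd": 9.0, "fD": 1.5}
to_refine = choose_predictor(errors)
print(to_refine)
```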

Page 21: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications

Active Learning Algorithm


Attribute Choice

• Each predictor takes profile attributes as input

• Not all attributes are equally relevant

• Order attributes + Traverse this order

Page 22: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications

Active Learning Algorithm


Value Choice

• Cover the operating range of attributes

• Expose main interactions with other attributes

Page 23: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications

Experimental Results

• Biomedical workflows (from DSCR)

– BLAST, fMRI, NAMD, CardioWave

– Single task workflows

• Plan space in the heterogeneous networked utility

– 5 CPU speeds, 6 Network latencies, 5 Memory sizes

– 5 X 6 X 5 = 150 resource plans

• Goal: Converge quickly to a fairly-accurate cost model

– We use regression models for the predictors

– Model validation details in previous work (ICAC 2005)
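The 5 X 6 X 5 = 150 plan space can be enumerated as a Cartesian product of the three attribute ranges. The specific CPU speeds, latencies, and memory sizes below are hypothetical, since the slide gives only the counts.

```python
from itertools import product

# Hypothetical attribute values; only the counts (5, 6, 5) come from the slide.
cpu_speeds_ghz = [0.5, 1.0, 1.5, 2.0, 2.5]
network_latencies_ms = [0, 2, 5, 10, 15, 20]
memory_sizes_mb = [128, 256, 512, 1024, 2048]

resource_plans = list(product(cpu_speeds_ghz, network_latencies_ms, memory_sizes_mb))
print(len(resource_plans))
```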

Page 24: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications

Performance Summary

• Error: Mean absolute % error in predicted execution time

• A separate test set for evaluating the error
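For reference, mean absolute percentage error over a test set can be computed as in the sketch below. The actual and predicted execution times are made-up values, not results from the talk.

```python
def mean_absolute_percentage_error(actual, predicted):
    """Average of |actual - predicted| / |actual|, expressed in percent."""
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)

# Hypothetical execution times in seconds on a separate test set.
actual_times = [100.0, 200.0, 400.0]
predicted_times = [110.0, 180.0, 400.0]
mape = mean_absolute_percentage_error(actual_times, predicted_times)
print(mape)
```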

Page 25: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications

BLAST Application: Predictor Choice

Page 26: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications

BLAST Application: Attribute Choice

Page 27: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications

Related Work

• Workflow Management Systems (WFMSs)

– [Shankar ’05, Liu ’04 etc.]

• Performance prediction in scientific applications

– [Carrington ’05, Rosti ’02, etc.]

• Learning cost models using statistical techniques

– [Zhang ’05, Zhu ’96, etc.]

• NIMO is end-to-end, noninvasive, and active (acquires model learning data automatically)

Page 28: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications

Conclusions

• NIMO:

– Learns cost models for scientific workflows

– Noninvasive and end-to-end

– Active and accelerated learning: Learns accurate cost models quickly

– Fills a gap in Workflow Management Systems

Page 29: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications

Future Work

• NIMO + SHIRAKO

– A policy-based resource-leasing system that can slice-and-dice virtualized resources

• NIMO + Fa

– Processing system-management queries (e.g., root-cause diagnosis, forecasting performance problems, capacity-planning)

[Figure: NIMO alongside the scheduler, with clusters C1, C2, C3 at Site A, Site B, Site C]

Page 30: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications

Backup Slides for Explanation

Page 31: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications

See Paper for Details of Steps

• Each algorithm step has sub-algorithms

• Example: Choosing the predictor to refine in current step

– Goal: learn most relevant predictors first

– Static vs. dynamic ordering

• Static:

– Define total order: a priori or using estimates of influence (Plackett-Burman)

– Traverse the order: round-robin vs. improvement-threshold-based

• Dynamic: choose the predictor with maximum current prediction error
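The two ways of traversing a static order can be sketched as below. The reading of "improvement-threshold-based" used here (stay on the current predictor while the last refinement improved its error by at least a threshold, otherwise move to the next one) is an assumption; the paper defines the actual rule. The predictor order and threshold values are likewise hypothetical.

```python
from itertools import cycle

def round_robin(order):
    """Visit predictors in the fixed order, wrapping around indefinitely."""
    return cycle(order)

def next_by_threshold(order, index, last_improvement, threshold):
    """Stay on the current predictor while refinements still pay off; else advance."""
    if last_improvement >= threshold:
        return index
    return (index + 1) % len(order)

order = ["fa", "fn", "fd", "fD"]  # hypothetical static order

rr = round_robin(order)
first_five = [next(rr) for _ in range(5)]

stay = next_by_threshold(order, 0, last_improvement=3.0, threshold=1.0)
advance = next_by_threshold(order, 0, last_improvement=0.2, threshold=1.0)
print(first_five, order[stay], order[advance])
```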

Page 32: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications

Active and Accelerated Learning

Page 33: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications

Latency hiding

Page 34: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications

Saturation