Cost-effective Cloud HPC Resource Provisioning by Building Semi-Elastic Virtual Clusters

Shuangcheng Niu1, Jidong Zhai1, Xiaosong Ma2,3, Xiongchao Tang1, Wenguang Chen1

THU1 & NCSU2 & ORNL3



Is “HPC in Cloud” a Trend?

HPC in cloud
◦ On-demand
◦ Elastic
◦ No upfront cost
◦ Saves management fees
◦ …

More and more engineers are starting to use HPC in the cloud


Is the “On-demand Model” Effective?

Reserved instance pricing model
◦ 6 reserved instance classes in Amazon EC2 CCI
◦ Discounted charge rate with an upfront fee

[Figure: Amazon’s EC2 cc2.8xlarge Pricing Model. Total instance cost ($) over time (day, 0 to 1000) for On-Demand, 3Y-Light, and 3Y-Medium; annotations: 6.8%, 38.3%.]
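A reserved class trades an upfront fee for a lower hourly rate, so whether it pays off depends on how many hours the instance is actually used. The minimal sketch below, assuming a class charges only for hours actually run, computes cumulative cost and the break-even point against on-demand. All prices and the class names `reserved-light`/`reserved-medium` are hypothetical placeholders, not real EC2 rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PricingClass:
    name: str
    upfront: float  # one-time reservation fee in $ (0 for on-demand)
    hourly: float   # $ per hour the instance actually runs (assumption)


def total_cost(c: PricingClass, days: float, utilization: float = 1.0) -> float:
    """Cumulative cost of one instance after `days`, busy `utilization` of the time."""
    return c.upfront + c.hourly * days * 24 * utilization


def breakeven_days(on_demand: PricingClass, reserved: PricingClass,
                   utilization: float = 1.0) -> float:
    """Days after which `reserved` becomes cheaper than `on_demand`."""
    saving_per_day = (on_demand.hourly - reserved.hourly) * 24 * utilization
    return (reserved.upfront - on_demand.upfront) / saving_per_day


# Illustrative prices only; these are NOT actual EC2 cc2.8xlarge rates.
ON_DEMAND = PricingClass("on-demand", upfront=0.0, hourly=2.00)
RESERVED_LIGHT = PricingClass("reserved-light (hypothetical)", upfront=2000.0, hourly=1.00)
RESERVED_MEDIUM = PricingClass("reserved-medium (hypothetical)", upfront=5000.0, hourly=0.40)

for util in (1.0, 0.1):
    for rsv in (RESERVED_LIGHT, RESERVED_MEDIUM):
        days = breakeven_days(ON_DEMAND, rsv, util)
        print(f"{rsv.name}: breaks even after {days:.0f} days at {util:.0%} utilization")
```

With these made-up numbers, the light class breaks even after about 83 days at full utilization but about 833 days at 10% utilization, which hints at why a single lightly loaded user rarely benefits from a reservation.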


“Reserved Model” Is Underutilized!

Reserved instance pricing model
◦ Difficult for individuals to utilize

SDSC Data Star system trace
◦ 391 days
◦ 460 users
◦ 1 user, 1 3Y-Light instance

Instance type used
◦ 3Y-Medium: 0%
◦ 3Y-Light: 0.15%
◦ On-demand: 99.85%


Short Jobs

Hourly-charging granularity

Several minutes of delay at startup

Maybe I should pack my short jobs to lower my rental cost.

70%
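Hourly billing rounds every rental up to a whole hour, so many short jobs each renting their own instance pay for far more time than they use. A minimal sketch, assuming a hypothetical 5-minute startup delay that is billed, and hypothetical job runtimes, compares renting one instance per job with packing the jobs back-to-back on a single instance.

```python
import math

HOUR = 60  # billing granularity in minutes (hourly charging)


def billed_hours(busy_minutes: int) -> int:
    """Whole hours charged for one rental of `busy_minutes` (at least one hour)."""
    return max(1, math.ceil(busy_minutes / HOUR))


def cost_individual(runtimes, startup=5):
    # Each short job rents its own instance and pays the startup delay itself.
    return sum(billed_hours(startup + r) for r in runtimes)


def cost_packed(runtimes, startup=5):
    # All jobs run back-to-back on one instance, paying the startup delay once.
    return billed_hours(startup + sum(runtimes))


jobs = [10, 15, 20, 5]  # hypothetical short-job runtimes in minutes
print(cost_individual(jobs), cost_packed(jobs))
```

Packing cuts the bill from four instance-hours to one here, at the price of later jobs waiting for earlier ones; that cost/wait-time tradeoff is what the rest of the talk addresses.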


Our Proposal

Semi-Elastic Cluster (SEC) computing model
◦ Organization-owned
◦ Cloud-based virtual cluster
◦ Dynamic capacity
◦ Resources shared between users


SEC Architecture


SEC Model

[Figure: job timeline for a traditional local cluster, showing jobs A (0, 1.5), C (1, 0.75), and D (1.75, 1.5)]

Traditional local cluster: wait time 15 min; utilization 56.7%


SEC Model

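[Figure: job timelines for a traditional local cluster and a pure on-demand cloud, showing jobs A (0, 1.5), C (1, 0.75), and D (1.75, 1.5)]

Traditional local cluster: wait time 15 min; utilization 56.7%
Pure on-demand cloud: wait time 0 min; utilization 70.8%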


SEC Model

[Figure: job timelines for a traditional local cluster, a pure on-demand cloud, and a semi-elastic cluster, showing jobs A (0, 1.5), C (1, 0.75), and D (1.75, 1.5)]

Traditional local cluster: wait time 15 min; utilization 56.7%
Pure on-demand cloud: wait time 0 min; utilization 70.8%
Semi-elastic cluster: wait time 0 min; utilization 77.3%
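One way a shared, semi-elastic pool can raise utilization under hourly billing is by letting a new job reuse an instance whose current hour is already paid for. The sketch below is a hypothetical illustration, not the authors' scheduler: it assumes single-instance jobs, billing from launch in whole hours, no startup delay, and a pool that releases an instance at the first hour boundary at which it is idle. The job list is made up and does not reproduce the slide's numbers.

```python
import math

HOUR = 60  # billing granularity in minutes


def billed_minutes(start: int, busy_until: int) -> int:
    """Charge for an instance started at `start` and released at the first
    hour boundary at which it is idle (at least one full hour)."""
    return max(1, math.ceil((busy_until - start) / HOUR)) * HOUR


def cost_one_instance_per_job(jobs):
    # Pure on-demand: every job launches and releases its own instance.
    return sum(billed_minutes(t, t + r) for t, r in jobs)


def cost_shared_pool(jobs):
    # Shared pool: a new job reuses any idle instance whose paid hour is not over.
    pool = []  # each entry is [start, busy_until]
    for t, r in sorted(jobs):
        for inst in pool:
            start, busy = inst
            release = start + billed_minutes(start, busy)
            if busy <= t < release:   # idle and still inside a paid hour
                inst[1] = t + r
                break
        else:
            pool.append([t, t + r])   # no reusable instance: acquire a new one
    return sum(billed_minutes(s, b) for s, b in pool)


# Hypothetical single-instance jobs: (submit minute, runtime in minutes).
jobs = [(0, 30), (36, 18), (72, 30)]
used = sum(r for _, r in jobs)
for name, cost in (("on-demand", cost_one_instance_per_job(jobs)),
                   ("shared pool", cost_shared_pool(jobs))):
    print(f"{name}: {cost} billed minutes, utilization {used / cost:.1%}")
```

Here the second job fits into the first instance's paid hour, so the pool bills 120 minutes instead of 180 while no job waits.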


Aggregated Workloads

[Figure: SEC trace slices with SDSC Data Star workload; number of instances (used size vs. allocated) over time (day).]

Instance types used: 3Y-Medium 73.66%, 3Y-Light 15.75%, On-Demand 10.59%


SEC Challenges

Fine-tuned capacity
◦ Capacity controlled intelligently based on the job queue and submission history
◦ Tradeoff between responsiveness and lower cost

Aggregated workloads
◦ Predict long-term resource requirements
◦ Automatic resource provisioning

Evaluation without real traces


Job Scheduling & Cluster Size Scaling

Problem definition
◦ Configurable wait time constraint
◦ Minimize total cost

Batch scheduling
◦ Extended backfilling algorithms
◦ Dynamic resource provisioning

Resource provisioning strategies
◦ Wait-time bounded instance acquisition
◦ Expanding capacity according to the job queue

Job placement policies
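Wait-time bounded instance acquisition means the cluster grows only when a queued job would otherwise wait longer than a configurable threshold. The sketch below is a hypothetical simplification under strict first-come-first-served order: it ignores the extended backfilling the slide mentions, and the `Job` type and `instances_to_acquire` function are illustrative names, not part of the SEC prototype.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    submit: float  # submission time in minutes
    nodes: int     # number of instances the job needs


def instances_to_acquire(queue, idle, now, wait_threshold):
    """Hypothetical wait-time-bounded expansion under strict FCFS.

    Start queued jobs on idle instances in submission order; when the head job
    does not fit and has waited at least `wait_threshold`, acquire the shortfall.
    """
    acquire = 0
    free = idle
    for job in sorted(queue, key=lambda j: j.submit):
        if job.nodes <= free:
            free -= job.nodes               # starts on existing capacity
        elif now - job.submit >= wait_threshold:
            acquire += job.nodes - free     # expand the cluster for this job
            free = 0
        else:
            break  # later jobs have waited even less, so stop here


    return acquire


queue = [Job("a", 0, 4), Job("b", 5, 8), Job("c", 9, 2)]
print(instances_to_acquire(queue, idle=6, now=20, wait_threshold=15))
```

A larger threshold makes the cluster expand less often, lowering cost at the expense of longer waits, which is the responsiveness/cost tradeoff listed under SEC Challenges.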


Experimental Setup

Workload
◦ 391-day trace from SDSC’s Data Star system

Cloud platform
◦ Amazon's EC2 Cluster Compute Instances (CCIs)
◦ Eight Extra Large Instances (cc2.8xlarge)
◦ 16 cores (2 × eight-core Intel Xeon E5-2670)
◦ 60.5 GB memory
◦ 4 × 850 GB instance storage


SEC vs. On-demand Model
◦ Individual
◦ NoWait
◦ SEC-On-Demand
◦ SEC-Hybrid

[Figure: average cost rate ($/hour) vs. average wait time (sec) for Individual, NoWait, SEC-On-Demand, and SEC-Hybrid; trace: SDSC DS; annotations: 61.0%, 13.3%.]


SEC vs. Local Cluster
◦ Traditional local cluster
◦ SEC-Hybrid

[Figure: average cost rate ($/hour) vs. average wait time (sec) for Local-1.5X, Local-1.75X, Local-2X, and SEC-Hybrid; trace: SDSC DS.]


Offline Reserved Instance Configuration

Offline configuration problem
◦ Input: utilization matrix $U_{n \times m}$ (from a given cluster capacity trace) and pricing classes $\{C_0, C_1, C_2, \ldots, C_h\}$
◦ Solution: purchased instance matrix $R_{n \times m}$, where $R_{i,k} \ge 0$
◦ Optimization: minimize the total rental cost

A hard problem!


Offline Forward Greedy Algorithm

Choose a larger time interval, e.g., a week
◦ Reduces computation by coarsening the granularity

Running: at the beginning of each time interval

Steps:
1) Calculate each instance's utilization level based on the given future demands
2) Identify the first economical class for each instance
3) Summarize the provisioning plan
4) Compare the provisioning plan with the current inventory and decide how many instances to purchase
5) Adjust the active reserved instances
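Steps 1 to 4 can be sketched as follows. This is a hypothetical reading, not the authors' code: it assumes every reservation covers the whole planning horizon, demand is constant within a week, "first economical class" means the cheapest class for a slot's utilization level, and step 5 (retiring or adjusting existing reservations) is left out. Class names and prices are placeholders.

```python
from collections import Counter
from dataclasses import dataclass

HOURS_PER_WEEK = 168


@dataclass(frozen=True)
class PricingClass:
    name: str
    upfront: float  # $ per reserved instance (0 for on-demand)
    hourly: float   # $ per hour of actual use


def plan_purchases(future_demand, classes, inventory):
    """One greedy planning step (hypothetical sketch).

    future_demand: instances needed in each future week of the horizon
    inventory:     active reserved instances per class name
    """
    by_name = {c.name: c for c in classes}
    plan = Counter()
    # Step 1: instance slot i is busy in every week whose demand exceeds i.
    for slot in range(max(future_demand, default=0)):
        hours = sum(1 for d in future_demand if d > slot) * HOURS_PER_WEEK
        # Step 2: choose the cheapest class for this slot's utilization level.
        best = min(classes, key=lambda c: c.upfront + c.hourly * hours)
        plan[best.name] += 1  # Step 3: accumulate the provisioning plan
    # Step 4: buy only the reserved instances the current inventory lacks.
    purchases = {name: max(0, n - inventory.get(name, 0))
                 for name, n in plan.items() if by_name[name].upfront > 0}
    return dict(plan), purchases


classes = [PricingClass("on-demand", 0.0, 2.0), PricingClass("reserved", 600.0, 0.5)]
plan, buy = plan_purchases([3, 3, 1, 0], classes, inventory={})
print(plan, buy)
```

Sorting instance slots by how often they are busy is what lets a greedy pass assign heavily used slots to reserved classes and bursty slots to on-demand.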



Offline Optimal-Competitive Algorithm

Transform the original pricing classes into new classes

$\mathrm{TotalCost}(C_k) \ge \mathrm{TotalCost}(C_k') =$



Online Reserved Instance Configuration

Use weekly time intervals
◦ Reduce computation complexity
◦ Reduce short-term variance
◦ Less impact on long-term reservation decisions

Evolution model
◦ Assumes a quadratic polynomial model
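Under a quadratic evolution model, weekly demand is treated as $d(t) \approx a + bt + ct^2$. A minimal sketch, assuming an ordinary least-squares fit over the weekly history (the slide does not say how the model is fitted), solves the 3×3 normal equations with plain Python and extrapolates. The function names and the demand series are hypothetical.

```python
def fit_quadratic(values):
    """Least-squares fit of values[t] ~ a + b*t + c*t**2 for t = 0, 1, 2, ..."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three weekly observations")
    # Normal equations (X^T X) [a, b, c]^T = X^T y, where row t of X is [1, t, t^2].
    s = [sum(t ** k for t in range(n)) for k in range(5)]
    A = [[float(s[i + j]) for j in range(3)] for i in range(3)]
    rhs = [float(sum(y * t ** i for t, y in enumerate(values))) for i in range(3)]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for r in range(col + 1, 3):
            factor = A[r][col] / A[col][col]
            for k in range(col, 3):
                A[r][k] -= factor * A[col][k]
            rhs[r] -= factor * rhs[col]
    # Back substitution.
    coef = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        tail = sum(A[i][k] * coef[k] for k in range(i + 1, 3))
        coef[i] = (rhs[i] - tail) / A[i][i]
    return coef  # [a, b, c]


def predict(coef, week):
    a, b, c = coef
    return a + b * week + c * week * week


weekly_demand = [40, 42, 47, 55, 62, 73, 85]  # hypothetical instances per week
coef = fit_quadratic(weekly_demand)
print([round(predict(coef, w)) for w in range(7, 10)])
```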


Long-Term Demand Prediction

Classical Exponential Smoothing (ES) method
◦ Relatively simple
◦ Quite robust to non-stationary noise
◦ Widely used

Our prediction method
◦ Extended Holt's double-parameter ES method
◦ Automatically adjusted smoothing factors
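Holt's double-parameter method smooths both a level and a trend: $\ell_t = \alpha y_t + (1-\alpha)(\ell_{t-1} + b_{t-1})$ and $b_t = \beta(\ell_t - \ell_{t-1}) + (1-\beta) b_{t-1}$, forecasting $\ell_t + h\,b_t$ for $h$ steps ahead. The slide does not describe how its extension adjusts the factors, so the sketch below substitutes a plain grid search that picks $\alpha$ and $\beta$ minimizing the one-step-ahead squared error on the history; treat that choice, the initialization, and the function names as assumptions.

```python
def holt(series, alpha, beta, horizon=1):
    """Holt's double exponential smoothing.

    Returns (forecast `horizon` steps past the end,
             sum of squared one-step-ahead errors over the series).
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    level, trend = series[0], series[1] - series[0]  # simple initialization
    sse = 0.0
    for y in series[1:]:
        sse += (y - (level + trend)) ** 2            # one-step-ahead error
        prev_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level + horizon * trend, sse


def auto_holt(series, horizon=1, steps=20):
    """Pick (alpha, beta) on a grid by minimizing the one-step SSE (assumption)."""
    grid = [i / steps for i in range(1, steps + 1)]
    best = min(((a, b) for a in grid for b in grid),
               key=lambda p: holt(series, p[0], p[1])[1])
    forecast, _ = holt(series, best[0], best[1], horizon)
    return forecast, best


forecast, (alpha, beta) = auto_holt([120, 126, 131, 140, 152, 158], horizon=4)
print(round(forecast, 1), alpha, beta)
```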


Verifying Workloads

Validation workloads
◦ HPC cluster: bounded by fixed machine size; 6 real traces
◦ SEC: semi-elastic machine size
◦ SNS: not bounded; 6 SNS traces


SNS-based Synthetic Workloads

[Figure: Synthetic workload generation. Inputs: SNS search traffic and HPC trace slices; quantities: SearchTraffic, ActiveUsers_SNS, ActiveUsers, ResourceDemand_HPC; output: synthetic workload.]
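A hypothetical sketch of this kind of generation: assume the number of active users is proportional to SNS search traffic, scale it to a chosen user base, and give each active user a weekly demand drawn from per-user node-hours observed in HPC trace slices. The proportionality, the sampling scheme, `base_users`, and the function name are all assumptions made for illustration; the slide does not specify the exact mapping.

```python
import random


def synthesize_weekly_demand(search_traffic, base_users, per_user_node_hours, seed=0):
    """Hypothetical sketch: scale SNS search traffic into an active-user curve,
    then draw each active user's weekly node-hours from HPC trace slices."""
    peak = max(search_traffic)
    if peak <= 0:
        raise ValueError("search traffic must have a positive peak")
    rng = random.Random(seed)  # fixed seed for reproducible synthetic traces
    demand = []
    for traffic in search_traffic:
        users = round(base_users * traffic / peak)  # assume users ∝ search traffic
        demand.append(sum(rng.choice(per_user_node_hours) for _ in range(users)))
    return demand


traffic = [20, 35, 60, 100, 80]            # hypothetical weekly search-traffic index
node_hours = [4, 16, 16, 64, 128]          # hypothetical per-user weekly node-hours
print(synthesize_weekly_demand(traffic, base_users=50, per_user_node_hours=node_hours))
```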


Reserved Instance Configuration Analysis

HPC trace

[Figure: average charge rate ($/hour) on the SDSC DS, HPC2N, and Sandia Ross traces for Optimal-Competitive, Offline-Greedy, Online-SEC, Online-3Y-Only, Online-1Y-Only, and Online-OD-Only.]


Reserved Instance Configuration Analysis

Synthetic workloads using SNS trace

[Figure: average charge rate ($/hour) on the Facebook, MySpace, Flickr, and Renren synthetic workloads for Optimal-Competitive, Offline-Greedy, Online-SEC, Online-3Y-Only, Online-1Y-Only, and Online-OD-Only.]


Overhead Analysis with SEC Prototype

Overhead for data protection with instance reuse
◦ Reformatting EC2 ephemeral 4×845GB disks
◦ 3.4 seconds

Configuration overhead when requesting new instances
◦ Configuring host names, hosts file, file system, etc.
◦ About 8.0 seconds

Configuration overhead when releasing instances
◦ About 5.0 seconds


Conclusion

SEC: a new execution model for HPC
◦ Organization-owned, dynamic, cloud-based clusters
◦ Reduced costs through workload aggregation
◦ Better responsiveness through instance reuse
◦ Higher utilization by efficiently using residual resources

SEC can potentially become a viable alternative to organizations owning and managing physical clusters


Related Work

[1] Parallel Workloads Archive. http://www.cs.huji.ac.il/labs/parallel/workload/, 2012.

[2] SLURM: A Highly Scalable Resource Manager. https://computing.llnl.gov/linux/slurm/, 2012.

[3] StarCluster. http://web.mit.edu/star/cluster/, 2012.

[4] Google Trends. http://www.google.com/trends/, 2013.

[5] E. S. Gardner Jr. Exponential smoothing: The state of the art. Journal of Forecasting, 1985.

[6] W. Voorsluys, S. Garg, and R. Buyya. Provisioning spot market cloud resources to create cost-effective virtual clusters. Algorithms and Architectures for Parallel Processing, 2011.

[7] H. Zhao, M. Pan, X. Liu, X. Li, and Y. Fang. Optimal resource rental planning for elastic applications in cloud market. In Parallel & Distributed Processing Symposium (IPDPS), IEEE, 2012.


Acknowledgments

We would like to thank:
◦ The Parallel Workloads Archive
◦ The anonymous reviewers and our shepherd
◦ Research grants from the Chinese 863 project, NSF grants, a joint faculty appointment between ORNL and NCSU, and a senior visiting scholarship at Tsinghua University


Thanks!


Classical HPC traces

SDSC’s Data Star (SDSC DS), SDSC's Blue Horizon (SDSC Blue), SDSC's IBM SP2 (SDSC SP2), Cornell Theory Center IBM SP2 (CTC SP2), High Performance Computing Center North (HPC2N), Sandia Ross cluster (Sandia Ross).

Variance in node-hours per active user


Synthetic workloads

SNS search trace from Google Trends


Cost-responsiveness analysis

Local cluster expense items


Impact of scheduling parameters


Impact of scheduling parameters

[Figure: average wait time under different expanding strategies and wait time thresholds]


Impact of scheduling parameters

[Figure: average charge rate under different expanding strategies and wait time thresholds]


Overhead Analysis with SEC Prototype

Overhead for data protection with instance reuse
◦ Reformatting EC2 ephemeral 4×845GB disks
◦ 3.4 seconds

Configuration overhead when requesting new instances
◦ Configuring host names, hosts file, and the file system
◦ Setting up user accounts and adding nodes to the SLURM partition

Configuration overhead when releasing instances