Summary of the work performed by Dr. Srikumar Venugopal (UNSW) and his team on various aspects of cloud elasticity.
Towards a Unified View of Elasticity
Srikumar Venugopal & Team
School of Computer Science and Engineering, University of New South Wales, Sydney, Australia
Acknowledgements
• Basem Suleiman
• Han Li
• Reza Nouri
• Freddie Sunarso
• Richard Gow
Agenda
• Introduction to elasticity and its challenges
• Performance Modeling of Elasticity Rules
• Autonomic Decentralised Elasticity Management of Cloud Applications
• Efficient Bootstrapping for Decentralised Shared-nothing Key-value Stores
Simple Service Deployment on Cloud
Elasticity
The ability of a system to change its capacity in direct response to the workload demand
Different Views of Elasticity
• Performance View
  – When to scale, and by how much?
• Application View
  – Does the architecture accommodate scaling?
  – How is state managed?
• Configuration View
  – Does scaling require configuration changes?
Elastic Deployment Architecture
Elasticising Application Layer
Trigger – Controller – Action
• Trigger: a threshold breach
• Controller: the intelligence/logic
• Action: add or remove capacity
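To make the loop concrete, here is a minimal sketch in Python; the monitoring and actuation hooks (`get_cpu_utilization`, `add_server`, `remove_server`) are hypothetical stubs, not a real provider API.

```python
import time

def get_cpu_utilization():
    """Stub: return the current average CPU utilisation in [0, 1].
    A real platform would query the provider's monitoring service."""
    return 0.5

def add_server():
    print("scale out: adding one server")

def remove_server():
    print("scale in: removing one server")

def trigger(cpu, high=0.85, low=0.30):
    """Trigger: detect a threshold breach in either direction."""
    if cpu >= high:
        return "scale_out"
    if cpu <= low:
        return "scale_in"
    return None

def controller(event):
    """Controller: map a trigger event to a capacity action."""
    return {"scale_out": add_server, "scale_in": remove_server}.get(event)

def control_loop(interval_seconds=60):
    while True:
        action = controller(trigger(get_cpu_utilization()))
        if action:
            action()                   # Action: add or remove capacity
        time.sleep(interval_seconds)   # one measuring interval
```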
State-of-the-art in Auto-scaling
Product/Project       | Trigger                      | Controller                | Actions
----------------------|------------------------------|---------------------------|-----------------------------
Amazon Auto Scaling   | CloudWatch metrics/threshold | Rule-based/schedule-based | Add/remove capacity
WASABi                | Azure Diagnostics/threshold  | Rule-based                | Add/remove capacity, custom
RightScale/Scalr      | Load monitoring              | Rule-based/schedule-based | Add/remove capacity, custom
Google Compute Engine | CPU load, etc.               | Rule-based                | Add/remove capacity
CloudScale (academic) | Demand prediction            | Control theory            | Voltage scaling
Cataclysm (academic)  | Threshold-based              | Queueing model            | Admission control
IBM Unity (academic)  | Application utility          | Utility functions/RL      | Add/remove capacity
Summary
• The most popular auto-scaling mechanisms today are rule-based
• The effectiveness of rule-based auto-scaling is determined by its trigger conditions
• So, how do we know how to set the right triggers?
Performance Modeling of Elasticity Rules
Basem Suleiman
Elasticity (Auto-Scaling) Rules
Examples (see the sketch below):
• If CPU Utilization ≥ 85% for 7 min, add 1 server (scale out)
• If RespTimeSLA ≥ 95% for 10 min, remove 1 server (scale in)
B. Suleiman, S. Venugopal, Modeling Performance of Elasticity Rules for Cloud-based Applications, EDOC 2013.
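The "for N minutes" clause is essential: the condition must hold over a sustained window, not just for a single sample. A minimal sketch of such a windowed check (class and parameter names are illustrative, not from the paper):

```python
from collections import deque

class SustainedCondition:
    """Fires only when a predicate holds for every sample in a window,
    e.g. "CPU utilisation >= 85% for 7 consecutive minutes"."""
    def __init__(self, predicate, window_minutes):
        self.predicate = predicate
        self.samples = deque(maxlen=window_minutes)  # one sample per minute

    def update(self, value):
        """Feed one per-minute sample; return True when the rule fires."""
        self.samples.append(self.predicate(value))
        return len(self.samples) == self.samples.maxlen and all(self.samples)

# The two example rules above:
scale_out = SustainedCondition(lambda cpu: cpu >= 0.85, window_minutes=7)
scale_in = SustainedCondition(lambda sla: sla >= 0.95, window_minutes=10)
```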
Performance of Different Elasticity Rules
• How well do elasticity rules perform in terms of SLA satisfaction, CPU utilization, cost, and percentage of served requests?
Rule  | Elasticity Rules
CPU75 | If CPU Util. > 75% for 5 min, add 1 server; if CPU Util. ≤ 30% for 5 min, remove 1 server
CPU80 | If CPU Util. > 80% for 5 min, add 1 server; if CPU Util. ≤ 30% for 5 min, remove 1 server
CPU85 | If CPU Util. > 85% for 5 min, add 1 server; if CPU Util. ≤ 30% for 5 min, remove 1 server
SLA90 | If SLA < 90% for 5 min, add 1 server; if SLA ≥ 90% for 5 min, remove 1 server
SLA95 | If SLA < 95% for 5 min, add 1 server; if SLA ≥ 95% for 5 min, remove 1 server
B. Suleiman, S. Sakr, S. Venugopal, W. Sadiq, Trade-off Analysis of Elasticity Approaches for Cloud-Based Business Applications, Proc. WISE 2012.
Cloud Testbed for Collecting Metrics
[Figure: testbed. TPC-W application servers on EC2 instances behind an Elastic Load Balancer, with a TPC-W database on EC2. Metrics collected: % SLA satisfaction, avg. CPU utilization, server costs, % served requests, response time.]
B. Suleiman, S. Sakr, S. Venugopal, W. Sadiq, Trade-off Analysis of Elasticity Approaches for Cloud-Based Business Applications, Proc. WISE 2012.
Performance Evaluation - Different Elasticity Rules
[Figure: box plots (min, Q1, median, mean, Q3, max) of server costs ($0.00–$2.50) and average CPU utilization (0–90%) for rules CPU75, CPU80, CPU85, SLA90 and SLA95.]
B. Suleiman, S. Sakr, S. Venugopal, W. Sadiq, Trade-off Analysis of Elasticity Approaches for Cloud-Based Business Applications, Proc. WISE 2012.
The Challenges of Thresholds
You must be at least this tall to scale up!
• Threshold values determine performance and cost
  – E.g. a low CPU-utilization threshold ⇒ higher cost, better performance
• Thresholds vary from one application to another
• Empirically determining thresholds is expensive
B. Suleiman, S. Venugopal, Modeling Performance of Elasticity Rules for Cloud-based Applications, EDOC 2013.
Can we construct a model that allows us to establish the right thresholds?
Queueing Model of a 3-Tier Application
B. Suleiman, S. Venugopal, Modeling Performance of Elasticity Rules for Cloud-based Applications, EDOC 2013.
Establishing Rule Thresholds
• Developed a model based on the M/M/m queueing model, capturing:
  – Simultaneous session initiations on one server
  – The provider's provisioning lag time
  – The cool-down interval after an elasticity action
  – Algorithms modelling scale-in and scale-out
  – Request mix
• Compared model fidelity with actual cloud executions of the TPC-W workload
B. Suleiman, S. Venugopal, Modeling Performance of Elasticity Rules for Cloud-based Applications, EDOC 2013.
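For reference, the textbook M/M/m quantities that such a model builds on can be computed directly; a sketch using the standard Erlang C formula (the paper's model layers provisioning lag, cool-down, and request mix on top of this):

```python
import math

def erlang_c(m, lam, mu):
    """Probability that an arriving request must wait in an M/M/m queue
    (m servers, arrival rate lam, per-server service rate mu)."""
    a = lam / mu                        # offered load in Erlangs
    rho = a / m                         # per-server utilisation
    assert rho < 1, "unstable: not enough servers"
    top = a**m / (math.factorial(m) * (1 - rho))
    bottom = sum(a**k / math.factorial(k) for k in range(m)) + top
    return top / bottom

def mean_response_time(m, lam, mu):
    """Mean response time = service time + expected queueing delay."""
    return 1 / mu + erlang_c(m, lam, mu) / (m * mu - lam)

# e.g. 3 servers, 1200 req/min arriving, each serving 500 req/min:
print(mean_response_time(3, 1200, 500))   # ~0.0042 minutes
```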
Experiments: Methodology
• Run the TPC-W workload on Amazon cloud resources using the chosen thresholds
• Simulate the model in MATLAB with the same thresholds
• Compare the simulation results to the results from the actual execution
  – If both are equivalent, then we are good :)
B. Suleiman, S. Venugopal, Modeling Performance of Elasticity Rules for Cloud-based Applications, EDOC 2013.
Experiments: Testbed
[Figure: testbed. TPC-W user emulation on an extra-large Linux EC2 instance; TPC-W application tier on small/medium Linux EC2 instances (JBoss/JSDK) behind an Elastic Load Balancer; TPC-W database on an extra-large Linux EC2 instance (MySQL).]
Experiments: Input Workload
[Figure: input workload. Request arrival rate (req/min, up to ~2,400) over time (minutes, 0–570).]
Workload
• Used the TPC-W browsing profile (95% reads)
• Stress is on the application tier
• Number of concurrent users: Zipf-distributed
• Inter-arrival times: Poisson
Experiments: Elasticity Rules
Rule  | Rule Expansion
CPU75 | If CPU Util. > 75% for 5 min, add 1 server; if CPU Util. < 30% for 5 min, remove 1 server
CPU80 | If CPU Util. > 80% for 5 min, add 1 server; if CPU Util. < 30% for 5 min, remove 1 server

Common parameters:
• Waiting time: 10 min; measuring interval: 1 min

Metrics captured:
• Average CPU utilization across all servers
• Average response time in each time interval
• Number of servers in operation at any point in time
Results
CPU Utilization
[Figure: average CPU utilization for CPU75 and CPU80, model (M) vs. empirical (E).]

Average Response Time
[Figure: average response time (sec) for CPU75 and CPU80, model (M) vs. empirical (E).]

CPU Utilization over Time
[Figure: average CPU utilization (%) over time (minutes) for CPU80, model (M) vs. empirical (E).]

Number of Servers Initialized
[Figure: number of app-tier servers over time (minutes) for CPU75 and CPU80, model (M) vs. empirical (E).]
Summary
• Developed a queueing model that can be used to reason about elasticity
• The model captures the effects of thresholds and can be used to test different rules
• Evaluations show that the model closely approximates real-world behaviour
• Future work: handling initial bursts in workload
Autonomic Decentralised Elasticity Management of Cloud Applications
Reza Nouri and Han Li
Cons of Rule-based Autoscaling
• Commercial products are rule-based
  – Gives users an "illusion of control"
  – Leads to the problem of defining the "right" thresholds
• Centralised controllers
  – Communication overhead increases with size
  – Processing overhead also increases (Big Data!)
• One application/VM at a time
Challenges of large-scale elasticity
• Large numbers of instances and apps
  – Deriving solutions takes time
• Dynamic conditions
  – Apps go critical all the time
• Shifting bottlenecks
  – Greedy solutions may create bottlenecks elsewhere
• Network partitions, fault tolerance, ...
H. Li, S. Venugopal, Using Reinforcement Learning for Controlling an Elastic Web Application Hosting Platform, Proceedings of 8th ICAC '11.
Initial Conditions
[Figure: Instance1 (App Server1) hosts app1 and app2; Instance2 (App Server2) hosts app3 and app4, on an IaaS provider.]
A Critical Event
[Figure: the same deployment; one application (app1) experiences a critical event.]
Placement 1
[Figure: app2 is moved to Instance2 (now hosting app2, app3, app4), leaving app1 alone on Instance1.]
Placement 2
[Figure: a new Instance3 (App Server3) is provisioned at extra cost ($$) to host app1; app2 remains on Instance1; Instance2 hosts app3 and app4.]
Placements 3 & 4
[Figure: app1 is duplicated instead of moved — either onto the existing instances or onto a new Instance3.]
Problems for Automatic Placement
• Provisioning
  – Find the smallest number of servers required to satisfy the resource requirements of all applications (a bin-packing problem; see the sketch below)
• Dynamic Placement
  – Distribute applications so as to maximise utilisation while meeting each app's response-time and availability requirements
H. Li, S. Venugopal, Using Reinforcement Learning for Controlling an Elastic Web Application Hosting Platform, Proceedings of 8th ICAC '11.
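Provisioning with the fewest servers is essentially bin packing, which is NP-hard, so heuristics are the norm. A sketch of first-fit-decreasing over a single CPU dimension (purely illustrative; this is not the controller the paper proposes):

```python
def first_fit_decreasing(app_demands, server_capacity=1.0):
    """Pack application CPU demands onto as few servers as possible.
    Returns a list of servers, each a list of (app, demand) pairs."""
    servers = []
    for app, demand in sorted(app_demands.items(),
                              key=lambda kv: kv[1], reverse=True):
        for srv in servers:                       # first server that fits
            if sum(d for _, d in srv) + demand <= server_capacity:
                srv.append((app, demand))
                break
        else:
            servers.append([(app, demand)])       # open a new server
    return servers

# Two servers suffice here: [app1+app3] and [app2+app4]
print(first_fit_decreasing({"app1": 0.6, "app2": 0.5, "app3": 0.4, "app4": 0.3}))
```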
Co-ordinated Control of Elasticity
• Instances control their own utilisation
  – Monitoring, management and feedback
• Local controllers are learning agents
  – Reinforcement learning (see the sketch below)
• Controllers learn from each other
  – Share their knowledge and update their own
• Servers are linked by a DHT
  – Agility, flexibility, co-ordination
H. Li, S. Venugopal, “Using Reinforcement Learning for Controlling an Elastic Web Application Hosting Platform”, Proceedings of 8th ICAC '11.
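As a rough illustration of the learning-agent idea, here is a generic one-step Q-learning sketch; the paper's state, action, and reward design is richer, and the averaging in `merge_knowledge` is an assumed sharing rule, not the paper's method:

```python
import random
from collections import defaultdict

ACTIONS = ["create!", "terminate!", "find!", "move!", "duplicate!", "merge!"]

class LocalController:
    """One learning agent per instance; states might be discretised
    utilisation bands (an assumption made for this sketch)."""
    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = defaultdict(float)            # (state, action) -> value
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def choose(self, state):
        if random.random() < self.epsilon:     # explore occasionally
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        """Standard one-step Q-learning update."""
        best_next = max(self.q[(next_state, a)] for a in ACTIONS)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])

    def merge_knowledge(self, peer):
        """Share learned values with a peer controller (simple averaging)."""
        for key in set(self.q) | set(peer.q):
            self.q[key] = (self.q[key] + peer.q[key]) / 2
```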
Abstract View of the Control Scheme
H. Li, S. Venugopal, “Using Reinforcement Learning for Controlling an Elastic Web Application Hosting Platform”, Proceedings of 8th ICAC '11.
Fuzzy Thresholds
H. Li, S. Venugopal, Using Reinforcement Learning for Controlling an Elastic Web Application Hosting Platform, Proceedings of 8th ICAC '11.
Basic Actions
Actions and rewards (reward in parentheses):
• Instance-level: create! (−3.5), terminate! (3.5), find! (3.5)
• Application-level: move! (0.5), duplicate! (0.5), merge! (0.5)
Co-ordination using find!
• The server looks up the other servers with the least load
  – A DHT lookup
• It sends a move! message to the selected server
• The target replies with accept or reject!
  – accept carries a positive reward (see the sketch below)
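A sketch of the handshake (the DHT lookup and messaging are stubbed out; the capacity rule and the +0.5 reward are illustrative assumptions):

```python
class Server:
    """Minimal stand-in for an app-server node linked into the DHT."""
    def __init__(self, name, load):
        self.name, self._load, self.apps = name, load, []

    def load(self):
        return self._load

    def request_move(self, app, demand=0.2):
        """Accept only if taking the app keeps us under capacity."""
        return "accept" if self._load + demand < 1.0 else "reject"

def find_and_move(local, peers, app):
    """find!: pick the least-loaded peer (a DHT lookup in the real
    system) and send it a move! message; accept earns a reward."""
    peer = min(peers, key=Server.load)
    if peer.request_move(app) == "accept":
        peer.apps.append(app)             # transfer the application
        return peer.name, +0.5            # positive reward for accept
    return None, 0.0                      # rejected: try something else

print(find_and_move(Server("s1", 0.9),
                    [Server("s2", 0.3), Server("s3", 0.6)], "app1"))
```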
Shrinking
• The controller is always reward-maximising
  – The highest reward is for merge + terminate
• A controller initiates its own shutdown
  – When the load on its applications is low
• It acquires an exclusive lock on termination
  – Only one instance can terminate at a time
• It transfers state before shutdown (see the sketch below)
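The lock-guarded shutdown can be sketched as follows; the threading lock stands in for a cluster-wide lock, `move` is the find!/move! handshake from the previous sketch, and the instance object is assumed to expose `apps` and `terminate()`:

```python
import threading

termination_lock = threading.Lock()     # stand-in for a cluster-wide lock

def try_shutdown(instance, peers, move):
    """Shrink step: move all apps to peers, then terminate this instance.
    The lock ensures only one instance terminates at a time."""
    if not termination_lock.acquire(blocking=False):
        return False                      # another instance is terminating
    try:
        for app in list(instance.apps):   # transfer state before shutdown
            move(instance, peers, app)
        instance.terminate()              # merge+terminate: highest reward
        return True
    finally:
        termination_lock.release()
```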
Experiments
• Six web applications
  – Test application: hotel management (Search → Book → Confirm)
• Five were subjected to a background load
  – Uniform random
• One was subjected to the test load
• Application thresholds: 200 and 500 ms
• Metrics: average response time, drop rate, number of servers
H. Li, S. Venugopal, “Using Reinforcement Learning for Controlling an Elastic Web Application Hosting Platform”, Proceedings of 8th ICAC '11.
Experimental Results (EC2)
Elasticising Persistence Layer
Efficient Bootstrapping for Decentralised Shared-nothing Key-value Stores
Han Li
Key-value Stores
• The standard component for cloud data management
• Increasing workload → node bootstrapping
  – Incorporate a new, empty node as a member of the KVS
• Decreasing workload → node decommissioning
  – Remove an existing member, whose data is now redundant, from the KVS
H. Li, S. Venugopal, Efficient Node Bootstrapping for Decentralised Shared-Nothing Key-Value Stores, Proceedings of Middleware 2013.
Research Questions
• As the system scales, how do we efficiently incorporate or remove data nodes?
  – Load balancing, migration overheads, etc.
• How do we partition and place data replicas while the system is elastic?
  – Data consistency, durability, availability, etc.
H. Li, S. Venugopal, Efficient Node Bootstrapping for Decentralised Shared-Nothing Key-Value Stores, Proceedings of Middleware 2013.
Elasticity in Key-Value Stores
• Minimise the overhead of data movement
  – How to partition/store data?
• Balance the load at node bootstrapping
  – Both data volume and workload
  – How to place/allocate data?
• Maintain data consistency and availability
  – How to execute data movement?
H. Li, S. Venugopal, Efficient Node Bootstrapping for Decentralised Shared-Nothing Key-Value Stores, Proceedings of Middleware 2013.
Split-Move Approach
[Figure: Split-Move bootstrapping over a key space with ranges A–I. ① An existing node splits its partition B into B1 and B2; ② the new node takes over B2, and the master/slave replica assignments are updated; ③ the now-redundant copies are marked "to be deleted". Partitioning happens at node bootstrapping.]
H. Li, S. Venugopal, Efficient Node Bootstrapping for Decentralised Shared-Nothing Key-Value Stores, Proceedings of Middleware 2013.
Virtual-Node Approach
[Figure: Virtual-Node bootstrapping. The key space (A–I) is partitioned at system startup; each of the four nodes holds several partitions with master and slave replicas, and a new node bootstraps by taking over whole partitions from many existing nodes.]
Data skew: e.g., the majority of data is stored in a minority of partitions. Moving around giant partitions is not a good idea.
H. Li, S. Venugopal, Efficient Node Bootstrapping for Decentralised Shared-Nothing Key-Value Stores, Proceedings of Middleware 2013.
Our Solution
• Virtual-node based movement
  – Each data partition is stored in separate files
  – Reduced data-movement overhead
  – Many existing nodes can participate in bootstrapping
• Automatic sharding (see the sketches below)
  – Split and merge partitions at runtime
  – Each partition stores a bounded volume of data
  – Easy to reallocate data, easy to balance the load
H. Li, S. Venugopal, Efficient Node Bootstrapping for Decentralised Shared-Nothing Key-Value Stores, Proceedings of Middleware 2013.
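To illustrate the virtual-node idea, here is a generic consistent-hashing ring with virtual nodes (not ElasCass's actual code; partition-to-file storage and bounded partition sizes are handled separately):

```python
import hashlib
from bisect import bisect_right

def h(s):
    """Hash a string onto the key space."""
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class VirtualNodeRing:
    """Each physical node owns many small virtual partitions; a new node
    bootstraps by taking over whole partitions from many existing nodes,
    instead of splitting a single neighbour's range."""
    def __init__(self, nodes, vnodes_per_node=8):
        self.ring = sorted((h(f"{n}#{i}"), n)
                           for n in nodes for i in range(vnodes_per_node))

    def owner(self, key):
        """Walk clockwise from the key's hash to the next virtual node."""
        points = [p for p, _ in self.ring]
        idx = bisect_right(points, h(key)) % len(self.ring)
        return self.ring[idx][1]

ring = VirtualNodeRing(["node1", "node2", "node3"])
print(ring.owner("user:42"))
```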
The timing for data partitioning
• Shard partitions at writes (inserts and deletes)
  – Split Pi when Size(Pi) ≥ Θmax
  – Merge Pi and Pi+1 when Size(Pi) + Size(Pi+1) ≤ Θmin
• Choosing Θmax ≥ 2Θmin avoids oscillation: the two halves of a freshly split partition cannot immediately merge back
[Figure: examples of insert-driven splits (B into B1, B2) and delete-driven merges (adjacent small partitions into M).]
H. Li, S. Venugopal, Efficient Node Bootstrapping for Decentralised Shared-Nothing Key-Value Stores, Proceedings of Middleware 2013.
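A sketch of the write-time check under these thresholds; the sizes, key ranges, and midpoint split are illustrative assumptions, and with Θmax ≥ 2Θmin a fresh split can never trigger an immediate merge:

```python
THETA_MAX = 64 * 2**20    # split threshold (64 MB, illustrative)
THETA_MIN = 16 * 2**20    # merge threshold; THETA_MAX >= 2 * THETA_MIN

def after_write(partitions, i):
    """Re-check partition i after an insert or delete.
    partitions: list of ((lo_key, hi_key), size_bytes), sorted by key."""
    (lo, hi), size = partitions[i]
    if size >= THETA_MAX:                   # too big: split at the midpoint
        mid = (lo + hi) // 2
        partitions[i:i + 1] = [((lo, mid), size // 2),
                               ((mid, hi), size - size // 2)]
    elif (i + 1 < len(partitions)
          and size + partitions[i + 1][1] <= THETA_MIN):
        (_, hi2), size2 = partitions[i + 1]  # both tiny: merge with neighbour
        partitions[i:i + 2] = [((lo, hi2), size + size2)]
```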
Sharding coordination
• Solution: election-based coordination
  – Step 1: Election — the nodes holding the partition are ordered in a sorted list (e.g. C, E, ..., A, ..., B) and a coordinator is chosen
  – Step 2: The coordinator enforces the split/merge and the data/node mapping
  – Step 3: The replicas finish the split/merge in turn (1st, 2nd, 3rd, 4th)
  – Step 4: The coordinator announces the outcome to all nodes
H. Li, S. Venugopal, Efficient Node Bootstrapping for Decentralised Shared-Nothing Key-Value Stores, Proceedings of Middleware 2013.
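The election step can be made deterministic so that every replica independently picks the same coordinator without extra messages; a sketch (the candidate ordering here is taken as given, standing in for the system's own sorted list):

```python
def elect_coordinator(candidates, is_alive):
    """Every node walks the same sorted candidate list and takes the
    first live entry, so all replicas agree on the coordinator."""
    for node in candidates:
        if is_alive(node):
            return node
    raise RuntimeError("no live replica to coordinate the shard operation")

candidates = ["Node-C", "Node-E", "Node-A", "Node-B"]   # the sorted list
# Node-C heads the list but is down, so Node-E becomes coordinator:
print(elect_coordinator(candidates, is_alive=lambda n: n != "Node-C"))
```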
Node failover during sharding
[Figure: flowcharts for node failure before, during, and after execution of a shard operation on Pi. Recoverable elements: failures are detected via gossip; a failed candidate is removed from the candidate list and re-appended if it resurrects; if a non-coordinator stays dead, its replicas are replaced and execution continues without it; if the coordinator fails, a timeout triggers election of a new coordinator, otherwise Pi is invalidated on the affected node.]
H. Li, S. Venugopal, Efficient Node Bootstrapping for Decentralised Shared-Nothing Key-Value Stores, Proceedings of Middleware 2013.
Evaluation Setup
• ElasCass: an implementation of auto-sharding built on Apache Cassandra (version 1.0.5), which itself uses the Split-Move approach
• Key-value stores compared: ElasCass vs. Cassandra (v1.0.5)
• Testbed: Amazon EC2, m1.large instances (2 CPU cores, 8 GB RAM)
• Benchmark: YCSB
H. Li, S. Venugopal, Efficient Node Bootstrapping for Decentralised Shared-Nothing Key-Value Stores, Proceedings of Middleware 2013.
Evaluation – Bootstrap Time
• Start from 1 node holding 100 GB of data (replication factor R = 2) and scale up to 10 nodes
• In Split-Move, the data volume transferred halves from 3 nodes onwards
• In ElasCass, the data volume transferred stays below 10 GB from 2 nodes onwards
• Bootstrap time is determined by the data volume transferred; ElasCass exhibits consistent performance at all scales
H. Li, S. Venugopal, Efficient Node Bootstrapping for Decentralised Shared-Nothing Key-Value Stores, Proceedings of Middleware 2013.
Conclusions
• We have designed and implemented a decentralised auto-sharding scheme that:
  – consolidates each partition replica into a single transferable unit, enabling efficient data movement;
  – automatically shards partitions into bounded ranges to address data skew;
  – reduces node bootstrap time, balances load better, and improves query-processing performance.
A Unified View of Elasticity (?)
Final Thoughts
• Elasticising application logic is done
  – How do we eliminate thresholds?
  – Should it be more autonomic?
• Application view of elasticity
  – Managing state is the big challenge
  – Decoupling of components (service-oriented model)
  – How would you scale interconnected components?