Higher SLA Satisfaction in Datacenters with Continuous VM Placement Constraints

Preview:

Citation preview

Higher SLA Satisfaction in Datacenters with Continuous Placement Constraints

Huynh Tu Dang hdang@polytech.unice.fr

Fabien Hermenier fabien.hermenier@unice.fr

SLA for a virtualised application

SLA for a virtualised application

spread the replicas

SLA for a virtualised application

performance guarantee

SLA for a virtualised application

low latency

N3

N2

time

SLA: spread(VM1, VM2)

VM1

VM2

reconfiguration algorithm

N1

N3

N2

time

SLA: spread(VM1, VM2)sys-admin query: offline(N1)

VM2VM1

VM2

reconfiguration algorithm

VM1

N1

Rec

onfig

urat

ion

proc

ess

N3

N2

time

SLA: spread(VM1, VM2)sys-admin query: offline(N1)

VM2VM1

VM2

reconfiguration algorithm

VM1

N1

with discrete restrictions

Discrete restriction is not enough

not an unpredictable situation,an algorithmic issue

Evaluating the reliability of discrete placement constraints

• simulate a 256-server datacenter

• running 350 HA webapp (5,200 VMs)

• BtrPlace as the reconfiguration algorithm

• 4 reconfiguration scenarios that mimic industrial use case

• 100 instances per scenario

Studied constraints

among

singleResourceCapacity

maxOnline

splitAmong

spread

DBs on a same edge-switch for a fast synchronisation.

keep resource for hypervisor management operations

webapp split over 2 clusters for disaster recovery

240 nodes online at maximum to fit licensing policy

replicas on distinct servers for fault tolerance

verti

cal e

lastic

ity

Tiers 1

Tiers 2

Tiers 3

scenario

verti

cal e

lastic

ity

Tiers 1

Tiers 2

Tiers 3

scenario

scenariohorizontal elasticity

Tiers 1

Tiers 2

Tiers 3

scenariohorizontal elasticity

Tiers 1

Tiers 2

Tiers 3

scenarioboot storm x 400

scenarioserver failure

Scenario Violated SLAs

ActionsVM Boot Migrate Node Boot Node Shutdown

Vertical Elasticity 40.72 0% 99.99% 0.005% 0.005%

Horizontal Elasticity 0.19 99.82% 0.18% 2.82% 0%

Server Failure 29.56 61.29% 35.89% 2.82% 0%

Boot Storm 0.35 98.57% 1.43% 0% 0%

Migrations lead to unanticipated placements

0

25

50

75

100

VerticalElasticity

HorizontalElasticity

ServerFailure

BootStorm

Violations

spread among splitAmong maxOnline

performance lossspof

failure

Migrations tend to violate relative placement constraints

spread(VM[1,2])

Trading unreliable discrete constraints …

we addressed an assignment problem

spread(VM[1,2])

… for safe continuous constraints

we must address a scheduling problem

Continuous placement constraints with

from discrete to continuous among|simpleAmong

N1N2 N3

N4 N5N6

stay on a same partition by the end of the reconfiguration process

from discrete to continuous among|simpleAmong

N1N2 N3

N4 N5N6

stay on a same partition by the end of the reconfiguration process

from discrete to continuous among|simpleAmong

N1N2 N3

N4 N5N6

stay on a same partition by the end of the reconfiguration process

Disallow movements between partitions • basic knowledge of a reconfiguration process • still an assignment problem

N1N2 N3

N4 N5N6

from discrete to continuous among|simpleAmong

Disallow movements between partitions • basic knowledge of a reconfiguration process • still an assignment problem

N1N2 N3

N4 N5N6

from discrete to continuous among|simpleAmong

allDi↵erent(dhost

1 , d

host

2 )

discrete spread(VM[1,2]) ::= continuous spread(VM[1,2]) ::=

allDi↵erent(dhost

1 , d

host

2 ) ^d

host

1 = c

host

2 =) a

start

1 � a

end

2 ^d

host

2 = c

host

1 =) a

start

2 � a

end

1

continuous spread

Disallow temporary overlapping • require to know this may happen • scheduling 101

continuous maxOnlinediscrete maxOnline(N[1..10], 7)::=

continuous maxOnline(N[1..10], 7)::=

10X

i=1

nqi 7

8i 2 [1, 10], n

on

i

=

⇢0 if n

q

i

= 1

a

start

i

otherwise

n

off

i

=

⇢max (T ) if n

q

i

= 0

a

end

i

otherwise

8t 2 T, card({i|non

i

� t ^ n

off

i

}) 7

scheduling 201

detailed knowledge of a reconfiguration process

harder to imagine, model & implement

5 10 20 50 100Duration (sec.)

Solve

d in

stan

ces

discretecontinuous

020

4060

8010

0

10 20 50 100Duration (sec.)

Solve

d in

stan

ces

discretecontinuous

020

4060

8010

0

5 10 20 50 100Duration (sec.)

Solve

d in

stan

ces

discretecontinuous

020

4060

8010

0

boot storm

horitzontal elasticity server failure

Performance overhead

5 10 20 50 100 200Duration (sec.)

Solve

d in

stan

ces

discretecontinuous

020

4060

8010

0

vertical elasticity

5 10 20 50 100Duration (sec.)

Solve

d in

stan

ces

discretecontinuous

020

4060

8010

0

10 20 50 100Duration (sec.)

Solve

d in

stan

ces

discretecontinuous

020

4060

8010

0

5 10 20 50 100Duration (sec.)

Solve

d in

stan

ces

discretecontinuous

020

4060

8010

0

boot storm

horitzontal elasticity server failure

Performance overhead

5 10 20 50 100 200Duration (sec.)

Solve

d in

stan

ces

discretecontinuous

020

4060

8010

0

vertical elasticity

• discrete restriction is not enough

• continuous restriction is a solution

• a different view on the problem

• challenging, but still possible to implement

Conclusions

Future Work

• a broader range of constraints and objectives

• reducing performance overhead

• static analysis to detect un-necessary continuous constraints

• controlled relaxation to handle hard situations

open source, 20+ placement constraints, demo, tutorials, everything for reproducibility

http://btrp.inria.fr

Recommended