Hybrid Machine Learning/Analytical Models for Performance Prediction. Diego Didona and Paolo Romano, INESC-ID / Instituto Superior Técnico. 6th ACM/SPEC International Conference on Performance Engineering (ICPE), Feb 1st, 2015.

Hybrid Machine Learning/Analytical Models for Performance Prediction (romanop/files/papers/tutorial-icpe15.pdf)


Page 1

Hybrid Machine Learning/Analytical Models for Performance Prediction

Diego Didona and Paolo Romano

INESC-ID / Instituto Superior Técnico

6th ACM/SPEC International Conference on Performance Engineering (ICPE)

Feb 1st, 2015

Page 2

Outline

• Base techniques for performance modeling
  – White box modeling
  – Black box modeling
  – Modeling and optimization on two case studies

• Hybrid modeling techniques
  – Divide et impera
  – Bootstrapping
  – Ensemble

• Closing remarks

Page 3

Modeling a system

INPUT FEATURES → SYSTEM → KEY PERFORMANCE INDICATORS

Page 4

Modeling a system

INPUT FEATURES → SYSTEM → KEY PERFORMANCE INDICATORS

Input features:
• Workload: intensity, small vs. large jobs
• Infrastructure: # servers, type of servers
• Application-specific: replication

Key performance indicators:
• Throughput: max jobs/sec
• Response time: exec. time of a job
• Consumed energy: Joules/job

Page 5

What is a performance model?

• Approximator of a KPI function

• Relates input to target output

• Can be implemented in different ways
  – White box
  – Black box

Page 6

Applications of Performance Modeling

• Capacity planning
  – Avoid overload in datacenters

• Anomaly detection
  – Model “normalcy” to detect anomalies

• Self-tuning
  – Maximize performance

• Resource provisioning
  – Elastic scaling in the Cloud

Page 7

Accuracy of a performance model

• Approximation accuracy metrics
  – MAPE (Mean Absolute Percentage Error)
  – RMSE (Root Mean Square Error)
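As an illustration of the two accuracy metrics above, a minimal Python sketch (the data values are invented for the example):

```python
import math

def mape(actual, predicted):
    # Mean Absolute Percentage Error: average of |error| / |actual|, in %
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    # Root Mean Square Error: penalizes large errors quadratically
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

actual = [100.0, 200.0, 400.0]      # e.g., measured throughput
predicted = [110.0, 180.0, 400.0]   # model's predictions
print(mape(actual, predicted))      # ≈ 6.67 (%)
print(rmse(actual, predicted))      # ≈ 12.91
```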

Page 8
Page 9

White box performance modeling

• Leverages knowledge about the target app’s internals

• Formalizes a mapping between
  – application and hosting platform, and
  – performance

• Formalization can be
  – Analytical (e.g., Queueing Theory) [45]
  – Simulation, e.g., [36]

Page 10

Queueing Theory

• A resource is modeled as a server + a queue
• Possible target KPIs
  – Resource utilization
  – Throughput
  – Response time

• Key factors impacting a queue’s performance
  – Arrival of jobs
  – Service demands
  – Service policy (e.g., FCFS)
  – Load generation model (e.g., open vs. closed)
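For a single open FCFS queue, the basic M/M/1 formulas can be evaluated directly; a small sketch (arrival and service rates are invented numbers):

```python
def mm1_metrics(lam, mu):
    # M/M/1 queue: Poisson arrivals at rate lam, exponential service at rate mu.
    rho = lam / mu                      # utilization; stable only if rho < 1
    assert rho < 1, "queue is unstable"
    avg_jobs = rho / (1 - rho)          # mean number of jobs in the system (L)
    resp_time = 1 / (mu - lam)          # mean response time (W); Little's law: L = lam * W
    return rho, avg_jobs, resp_time

rho, L, W = mm1_metrics(lam=80.0, mu=100.0)
print(rho, L, W)  # -> 0.8 4.0 0.05
```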

Page 11

From single queues to networks

Page 12

Queueing Theory pros and cons

• Accurate for a wide spectrum of input parameters
• Specifically crafted for the target app
• Analytical tractability often requires
  – Assumptions (e.g., independent job flows)
  – Approximations
  – Simplifications (e.g., Poisson arrivals)

Page 13

Simulation

• Encode system dynamics via a computer program
• Alternative w.r.t. analytical modeling
  – Simpler (code vs. equations)
  – May rely on fewer assumptions
  – Slower to produce output

• Similar trade-offs w.r.t. analytical modeling
  – Still uses simplifications to avoid overly complex code
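To contrast the simulation route with the analytical one, here is a deliberately tiny single-queue simulation based on Lindley's recurrence (an illustrative sketch, not any particular simulator from the references); its estimate converges to the M/M/1 closed form:

```python
import random

def simulate_mm1(lam, mu, n_jobs, seed=42):
    # FCFS single server, simulated per job via Lindley's recurrence:
    #   wait_i = max(0, wait_{i-1} + service_{i-1} - interarrival_i)
    rng = random.Random(seed)
    wait = 0.0
    prev_service = 0.0
    total_resp = 0.0
    for _ in range(n_jobs):
        interarrival = rng.expovariate(lam)
        wait = max(0.0, wait + prev_service - interarrival)
        prev_service = rng.expovariate(mu)
        total_resp += wait + prev_service   # response = queueing wait + service
    return total_resp / n_jobs

sim = simulate_mm1(lam=80.0, mu=100.0, n_jobs=200_000)
analytical = 1 / (100.0 - 80.0)             # M/M/1 mean response time
print(sim, analytical)                       # simulation hovers around 0.05
```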

Page 14

Black box performance modeling

• Definition
• Taxonomy (offline vs. online, supervised vs. unsupervised, regression vs. classification)
• Examples (DT, SVM, ANN, KNN, UCB, Gradient)
• Ensemble
• Optimization

Page 15

Building black box models

• Infer the performance model from observed behavior

• Machine Learning [8]
  – Observe Y corresponding to different X
  – Obtain a statistical performance model

Page 16

Machine Learning pros and cons

• No need for domain knowledge
• High accuracy in interpolation
  – i.e., for input values close to the observed ones

• Curse of dimensionality
  – # required samples grows exponentially with input size
  – Long training phase to build the model

• Poor accuracy in extrapolation
  – i.e., for input values far away from the observed ones

Page 17

Black box modeling taxonomy

• Target output feature y
  – Classification (discrete y) vs. Regression (y in R)

• Training phase timing
  – Online vs. Offline

• Predict or find hidden structures
  – Supervised vs. unsupervised learning

Page 18

OFF-LINE SUPERVISED LEARNING

• Supervised
  – Known inputs x have a corresponding known y = f(x)

• Offline
  – Model built on a training dataset
  – Dataset {<x,y> : y = f(x)}
  – Learn f' such that f'(x) ≈ f(x)
    • While being able to generalize outside the known dataset

Page 19

Decision Trees [55]

• Predictive model is a tree-like graph
• Intermediate nodes are predicates
• Classification: leaves are classes
• Regression: leaves are functions
  – Piecewise approximation of nonlinear functions
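A minimal illustration of the regression case: a one-split tree (a stump) whose leaves predict the mean of their side, giving a piecewise-constant approximation (the data is invented):

```python
def fit_stump(xs, ys):
    # Try each midpoint between sorted x values as the split predicate; keep
    # the one with the lowest sum of squared errors when each leaf predicts
    # the mean of its side.
    best = None
    pts = sorted(zip(xs, ys))
    for i in range(1, len(pts)):
        thr = (pts[i - 1][0] + pts[i][0]) / 2
        left = [y for x, y in pts if x <= thr]
        right = [y for x, y in pts if x > thr]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, thr, ml, mr)
    _, thr, ml, mr = best
    return lambda x: ml if x <= thr else mr

# A step-shaped KPI: e.g., throughput collapses past a load threshold
xs = [1, 2, 3, 4, 5, 6]
ys = [10.0, 10.0, 10.0, 2.0, 2.0, 2.0]
predict = fit_stump(xs, ys)
print(predict(2), predict(5))  # -> 10.0 2.0
```

Real decision-tree learners repeat this split search recursively per leaf; the stump is the single-predicate base case.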

Page 20

DT: an example

Input features:
• Income range
• Criminal records
• # years in present job
• Use of credit card

Page 21

Support Vector Machines [16]

• A tuple is a point in a multidimensional space
• Find a hyperplane s.t. different classes are as distant as possible

Credits to Erik Kim for SVM-related images, http://www.eric-kim.net/eric-kim-net/posts/1/kernel_trick.html

Page 22

Support Vector Machines

What if points are not linearly separable?

Page 23

SVM: the kernel trick, I

• Map points to a higher dimensional space
• In that space, points are linearly separable

• Here, the mapping is φ(x, y) = (x, y, x^2 + y^2)

Page 24

SVM: the kernel trick, II

• Nonlinear separation in the original domain

• Here, the mapping is φ(x, y) = (x, y, x^2 + y^2)
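The lifting on these two slides can be checked numerically: points that are not linearly separable in 2D become separable by a plane after the mapping (x, y) -> (x, y, x^2 + y^2). The sample points below are invented:

```python
# Inner cluster (one class) and outer ring (other class): no 2D line
# separates them, but in the lifted space the plane z = 2.0 does,
# because z = x^2 + y^2 is the squared distance from the origin.
inner = [(0.0, 0.5), (0.5, 0.0), (-0.5, 0.5), (0.3, -0.4)]
outer = [(2.0, 0.0), (0.0, -2.0), (1.5, 1.5), (-2.0, 1.0)]

def phi(p):
    x, y = p
    return (x, y, x * x + y * y)

assert all(phi(p)[2] < 2.0 for p in inner)
assert all(phi(p)[2] > 2.0 for p in outer)
print("separable by the plane z = 2.0 in the lifted space")
```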

Page 25

Artificial Neural Networks [79]

• Inner model is a graph
• Resembles neuron connections in the brain

Credits to Koné Mamadou Tadiou for image: http://futurehumanevolution.com/artificial-intelligence-future-human-evolution/artificial-neural-networks

Page 26

ANN internals

• Neuron structure
  – Weighted sum of inputs
  – Activation 0/1 function as output

Credits to http://www.theprojectspot.com/tutorial-post/introduction-to-artificial-neural-networks-part-1/7
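The neuron structure above, a weighted sum followed by a 0/1 step activation, in a few lines (the AND-gate weights are hand-picked for the example):

```python
def neuron(inputs, weights, bias):
    # Weighted sum of inputs followed by a 0/1 step activation
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if s > 0 else 0

# Weights chosen by hand so this single neuron computes logical AND:
# the sum exceeds the bias only when both inputs fire.
and_gate = lambda a, b: neuron([a, b], weights=[1.0, 1.0], bias=-1.5)
print([and_gate(a, b) for a in (0, 1) for b in (0, 1)])  # -> [0, 0, 0, 1]
```

Training (the next slide) amounts to adjusting these weights iteratively instead of picking them by hand.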

Page 27

Building an ANN

• Determine its structure
  – # layers
  – # neurons per layer

• Activation function per neuron
• Iteratively learn weights depending on error

Page 28

K Nearest Neighbors [2]

• Predict based on the closest known values to the target
• Proximity given by a distance function

Pic from http://www.cs.bham.ac.uk/internal/courses/robotics/halloffame/2010/team12/knn.htm

Page 29

K Nearest Neighbors

• Classification:
  – Class of X is the most common in the neighborhood

• Regression:
  – Value for X is a function of the values in the neighborhood

Pic from http://www.cs.bham.ac.uk/internal/courses/robotics/halloffame/2010/team12/knn.htm
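Both KNN modes above fit in one small sketch, with Euclidean distance as the proximity function (the training data is invented):

```python
def knn(train, query, k, mode="regression"):
    # train: list of (x_vector, y) pairs. Proximity = Euclidean distance.
    dist = lambda a, b: sum((u - v) ** 2 for u, v in zip(a, b)) ** 0.5
    neighbors = sorted(train, key=lambda t: dist(t[0], query))[:k]
    ys = [y for _, y in neighbors]
    if mode == "classification":
        return max(set(ys), key=ys.count)   # most common class in the neighborhood
    return sum(ys) / k                       # average of the neighbors' values

train = [((1.0,), 10.0), ((2.0,), 12.0), ((3.0,), 11.0), ((10.0,), 50.0)]
print(knn(train, (2.5,), k=3))               # -> 11.0 (far-away outlier ignored)

labels = [((0.0,), "low"), ((1.0,), "low"), ((9.0,), "high")]
print(knn(labels, (0.5,), k=3, mode="classification"))  # -> low
```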

Page 30

ONLINE LEARNING

• We consider Reinforcement Learning [70]
  – Training set not available (nor stored)
  – Given a set of <State, Action> pairs
  – Find the sequence of actions that maximizes payoff (reward)
  – Collect feedback from the system

• Tradeoff between
  – Exploration (try new actions)
  – Exploitation (use good known actions)

Page 31

Multi-armed bandit (MAB)

• Inspired by gambling at slot machines. Find
  – Which arm to play
  – How many times
  – In which order

Page 32

Upper Confidence Bound [3]

• Popular set of algorithms for MAB

• At any time, choose the arm that
  1. maximizes reward, while…
  2. minimizing regret:
     • utility loss due to sub-optimal choices

• Efficiency: regret is logarithmic in the # of trials
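A sketch of the classic UCB1 rule (one member of the UCB family cited above): play each arm once, then repeatedly pick the arm maximizing its empirical mean plus an exploration bonus. The arm means and horizon below are invented:

```python
import math, random

def ucb1(arm_means, horizon, seed=0):
    # UCB1: after one initialization pull per arm, choose the arm maximizing
    #   empirical_mean + sqrt(2 ln t / n_i)   (optimism under uncertainty)
    rng = random.Random(seed)
    n = [0] * len(arm_means)            # pulls per arm
    total = [0.0] * len(arm_means)      # cumulative reward per arm
    for t in range(1, horizon + 1):
        if t <= len(arm_means):
            i = t - 1                   # initialization round: try each arm once
        else:
            i = max(range(len(arm_means)),
                    key=lambda j: total[j] / n[j] + math.sqrt(2 * math.log(t) / n[j]))
        reward = 1.0 if rng.random() < arm_means[i] else 0.0   # Bernoulli reward
        n[i] += 1
        total[i] += reward
    return n

pulls = ucb1([0.2, 0.5, 0.8], horizon=5000)
print(pulls)   # the best arm (mean 0.8) gets the lion's share of pulls
```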

Page 33

Hill Climbing

• Not really “learning”, but online optimization
• Explore the function in the direction that increases/decreases its value

• Possibly coupled with randomization to avoid local max/min

Credit to Ben Etzkorn for pic: http://www.benetzkorn.com/2011/09/non-linear-model-optimization-using-palisades-evolver/
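A minimal hill climber with the randomization mentioned above (random restarts) to escape local maxima; the objective and parameters are illustrative:

```python
import random

def hill_climb(f, x0, step=0.1, iters=1000, restarts=5, seed=1):
    # Greedy ascent: move to a neighbor only if it increases f.
    # Random restarts reduce the chance of ending in a local maximum.
    rng = random.Random(seed)
    best_x, best_y = x0, f(x0)
    for _ in range(restarts):
        x = rng.uniform(-10, 10)
        for _ in range(iters):
            for cand in (x + step, x - step):
                if f(cand) > f(x):
                    x = cand
                    break
            else:
                break                   # no neighbor improves: local maximum
        if f(x) > best_y:
            best_x, best_y = x, f(x)
    return best_x

# Concave objective with its maximum at x = 3
best = hill_climb(lambda x: -(x - 3) ** 2, x0=0.0)
print(best)   # lands within one step of x = 3
```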

Page 34

NO FREE LUNCH THEOREM FOR ML

• There is no “absolute best learner”

• The best learner and parameters depend on the data

• When working in extrapolation, there are no a priori distinctions between learning algorithms [80]

Page 35

ML optimization

• An ML algorithm has meta-parameters
  – # features of the input data
  – Minimum # of cases per leaf in a DT
  – Kernel and its parameters in an SVM
  – Neurons, layers, activation functions in an ANN

• How to choose them to maximize accuracy?
  – It depends on the problem at hand!

Page 36

Feature selection [40]

• Identify the input features that are most correlated with the target output
  – Speedup in building the model
  – Increased accuracy by reducing noise

Page 37

Feature selection

• Wrapper: use the target ML with different combinations of features
  – Forward selection, backward elimination, …

• Filter: independent of the target ML
  – E.g., discard 1 of 2 highly correlated variables

• Dimensionality reduction (PCA, SVD)
  – Find features that account for most of the variance
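A sketch of the filter approach: rank features by |correlation with the target| and discard one of any two highly correlated (redundant) features. The feature table is invented:

```python
def pearson(xs, ys):
    # Pearson correlation coefficient between two equal-length sequences
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def filter_features(features, target, redundancy_thr=0.95):
    # Visit features in decreasing relevance |corr(feature, target)|;
    # keep a feature only if it is not redundant with one already kept.
    ranked = sorted(features, key=lambda name: -abs(pearson(features[name], target)))
    kept = []
    for name in ranked:
        if all(abs(pearson(features[name], features[k])) < redundancy_thr for k in kept):
            kept.append(name)
    return kept

features = {
    "load":      [1, 2, 3, 4, 5, 6],
    "load_copy": [2, 4, 6, 8, 10, 12],   # perfectly correlated with "load"
    "noise":     [5, 1, 4, 2, 6, 3],
}
target = [10, 20, 31, 39, 52, 60]
kept = filter_features(features, target)
print(kept)   # keeps "load", drops the redundant "load_copy"
```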

Page 38

Hyperparameter optimization

• Find hyper-parameters that maximize accuracy
• Based on cross-validation
  – Use part of the dataset as training and part as test

• Different approaches
  – Grid search
  – Random search [6]
  – Bayesian optimization [74]

Page 39

Grid Search

1. Uniformly discretize the features’ domains

2. Take the Cartesian product of the features
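The two steps above can be sketched with a toy error surface standing in for a real cross-validated score (the hyperparameter names c/gamma are SVM-flavored placeholders, not a real API):

```python
import itertools

def validation_error(params):
    # Hypothetical scoring function: in practice this would train the model
    # with `params` on a training split and measure error on a held-out split.
    c, gamma = params
    return (c - 10) ** 2 + (gamma - 0.1) ** 2   # toy error surface, min at (10, 0.1)

# 1. Uniformly discretize each hyperparameter's domain
c_values = [0.1, 1, 10, 100]
gamma_values = [0.001, 0.01, 0.1, 1]

# 2. Evaluate every point of the Cartesian product, keep the best
grid = itertools.product(c_values, gamma_values)
best = min(grid, key=validation_error)
print(best)  # -> (10, 0.1)
```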

Page 40

Random search

• Include randomness
  – Increases sampling granularity of important parameters
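The same idea with random sampling instead of a fixed grid: each trial draws fresh values, so an important parameter gets probed at many distinct values rather than a few grid lines (toy error surface again, standing in for a cross-validated score):

```python
import random

def validation_error(c, gamma):
    # Toy stand-in for a cross-validated error measurement; min at (10, 0.1)
    return (c - 10) ** 2 + (gamma - 0.1) ** 2

rng = random.Random(0)
# Sample hyperparameter combinations uniformly at random from their domains
trials = [(rng.uniform(0, 100), rng.uniform(0, 1)) for _ in range(200)]
best = min(trials, key=lambda p: validation_error(*p))
print(best)   # lands near (10, 0.1) with high probability
```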

Page 41

ENSEMBLING

• Solution to counter the NFL theorem
• Employ multiple learners together

• Bagging [9]
  – Train learners on different training sets

• Boosting [66]
  – Generate 1 strong learner from N weak ones

• Stacking [79]
  – Combine outputs of learners depending on the input

Page 42

Bagging

• Average the output of sub-models
• Generate N sets of size D'
  – Draw uniformly at random with repetition from D

• Generate N black box models
  – Voting for classification
  – Averaging for regression

• Cannot improve predictive power (in extrapolation)…
• Can reduce variance (i.e., better interpolation accuracy)
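The bagging recipe above in miniature; the base "model" is deliberately trivial (1-nearest-neighbor) since the point is the bootstrap-and-average loop, not the learner (data invented):

```python
import random

def bagged_predict(data, query, n_models=100, seed=0):
    # Each model trains on a bootstrap resample (drawn with replacement
    # from D, same size as D); predictions are averaged (regression case).
    rng = random.Random(seed)
    preds = []
    for _ in range(n_models):
        boot = [rng.choice(data) for _ in data]     # resample D' of size |D|
        # Trivial base model: predict the y of the nearest bootstrapped point
        x, y = min(boot, key=lambda p: abs(p[0] - query))
        preds.append(y)
    return sum(preds) / n_models                    # averaging for regression

data = [(1.0, 10.0), (2.0, 12.0), (3.0, 14.0), (4.0, 16.0)]
estimate = bagged_predict(data, query=2.4)
print(estimate)   # a smoothed value near 12, instead of a hard jump
```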

Page 43

Bagging example

• 100 bootstrapped learners
• Reduced variance and overfitting w.r.t. single models

Page 44

Boosting

• Build a strong learner from many weak ones
• Stage-wise training phase
  – Training at stage i depends on the output of stage i-1

• 0/1 AdaBoost
  – Base learners Bi: can classify correctly with p > ½
  – Iteratively try to better classify mis-classified samples
  – At stage i, draw the training set according to distribution Di
  – Di+1 s.t. mis-classified samples have higher relevance
  – Output: weighted average of the weak learners
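One AdaBoost stage, i.e., the Di -> Di+1 reweighting and the weak learner's vote weight alpha, can be written out directly (the toy round below has one misclassified sample out of four):

```python
import math

def adaboost_round(weights, correct):
    # One AdaBoost stage: given per-sample weights Di and which samples the
    # weak learner got right, compute its vote alpha and the distribution
    # Di+1 in which misclassified samples have higher relevance.
    err = sum(w for w, ok in zip(weights, correct) if not ok)
    alpha = 0.5 * math.log((1 - err) / err)      # requires 0 < err < 0.5
    new_w = [w * math.exp(-alpha if ok else alpha) for w, ok in zip(weights, correct)]
    z = sum(new_w)                               # normalize back to a distribution
    return alpha, [w / z for w in new_w]

weights = [0.25, 0.25, 0.25, 0.25]               # D1: uniform
correct = [True, True, True, False]              # weak learner errs on sample 4
alpha, new_weights = adaboost_round(weights, correct)
print(round(alpha, 3), [round(w, 3) for w in new_weights])
# -> 0.549 [0.167, 0.167, 0.167, 0.5]  (the missed sample now carries half the mass)
```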

Page 45

AdaBoost, training

Credits to Kihwan Kim. http://www.cc.gatech.edu/~kihwan23/imageCV/Final2005/FinalProject_KH.htm

Page 46

AdaBoost, result

Pictures from http://www.ieev.org/2010/03/adaboost-haar-features-face-detection.html

Page 47

Stacking

• A meta-learner combines the outputs of ML models
• Partition D into D', D''
• Train learners 1…N on D'
• ML_{N+1} is trained on ML_1…ML_N’s predictions on D''

[Diagram: ML1, ML2, …, MLN feed their outputs to the meta-learner MLN+1]
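A compact sketch in the spirit of this slide: base learners fitted on D', and a (degenerate, selection-based) meta-learner fitted on their predictions over D''. Data and learners are invented; real stacking would typically learn a weighted combination rather than pick one model:

```python
def train_stack(data):
    # Partition D into D' (train the base learners) and D'' (train the meta-learner)
    d1, d2 = data[: len(data) // 2], data[len(data) // 2 :]

    # Two deliberately simple base "learners" fitted on D'
    mean_y = sum(y for _, y in d1) / len(d1)
    base1 = lambda x: mean_y                                 # constant predictor
    slope = sum(x * y for x, y in d1) / sum(x * x for x, _ in d1)
    base2 = lambda x: slope * x                              # through-origin linear fit

    # Meta-step: evaluate the base models on the held-out D'' and keep the better one
    err = lambda m: sum((m(x) - y) ** 2 for x, y in d2)
    return base1 if err(base1) < err(base2) else base2

data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 8.0), (5.0, 9.9), (6.0, 12.1)]
model = train_stack(data)
print(model(10.0))   # the linear base model wins on the held-out half
```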

Page 48
Page 49

Background: Case studies

• Total Order Broadcast primitive
  – Analytical model
  – Black box online optimization

• Distributed NoSQL transactional data grid
  – Simulation model
  – Black box offline supervised learning

Page 50

Total Order Broadcast case study

• TOB allows a set of nodes to deliver broadcast messages in the same order

• Embodies the popular consensus problem
  – Fundamental abstraction for dependable computing

• We consider Sequencer-based TOB
  – Messages are broadcast normally
  – A Sequencer node decides the delivery order

Page 51

Sequencer-Based TOB

[Diagram: nodes N0 (the sequencer), N1, and N2 broadcast messages M1 and M2; the sequencer broadcasts the sequence numbers; all nodes then deliver M1, M2 in the same order.]

Page 52

Performance of STOB

• STOB minimizes message exchange, but…
• The sequencer may become the bottleneck
• Possible solution: batching
• The sequencer
  – Waits to receive N > 1 msgs
  – Sends a single, bigger sequencing msg for the N msgs instead of N smaller ones

Page 53

Batching in STOB

• At high load, batching
  – Allows for amortizing the message sequencing cost
  – Increases sequencer capacity and throughput

• At low load, batching
  – Introduces useless delays
  – The sequencer waits too much and wastes time

Page 54

The need for self-tuning STOB batching

• The optimal batching level depends on the message rate

[Figure: self-delivery latency vs. batch level, for message arrival rates of 1K, 5K, 10K, 15K, and 20K msgs/sec]

Page 55

Tuning the batching level

• White box approaches
  – Forecast the impact of batching given the workload

• Black box approaches
  – On-line optimization

Page 56

STOB white box modeling

• Focus on the performance of the sequencer
• It is representative of the whole system

[Figure 1. Self-delivery latency at the sequencer and at the non-sequencer nodes. x axis: avg. self-delivery latency over non-sequencer nodes (msec); y axis: avg. self-delivery latency at the sequencer node (msec); log-log scale.]

the sequencer process: the message arrival rate, and the self-delivery latency, i.e., the latency experienced by the sequencer to TO-deliver messages whose TO-broadcast was activated locally.

The choice of measuring exclusively the self-delivery latencies allows us to circumvent the issue of ensuring accurate clock synchronization among the communicating nodes, which would clearly have been a crucial requirement had we opted for monitoring the delivery latencies of messages generated by different nodes. Preliminary experiments conducted in our cluster have indeed highlighted that the accuracy achievable using conventional clock synchronization schemes, such as NTP, is inadequate for collecting measurements of the TO broadcast inter-node delivery latency in LAN environments, the latter being frequently around or below a millisecond.

Further, we experimentally verified that, at least in typical LAN settings, there is a very high correlation between the self-delivery latency at the sequencer and the self-delivery latency experienced by the other nodes in the group. This claim is supported by the data in Figure 1 and Figure 2. The data in the plots was obtained using a plain (not self-tuning) STOB protocol in a cluster of 10 machines equipped with two Intel Quad-Core XEONs at 2.0 GHz and 8 GB of RAM, running Linux 2.6.32-26-server and interconnected via a private Gigabit Ethernet. All the experimental data reported in the remainder of this paper have been collected using this cluster. In this experiment, we let the batching level b vary in the set {1,2,3,4,6,8,16,32,64,128} and, for each batching level, we injected 512-byte messages at an arrival rate ranging from 1 msg/sec up to saturating the GCS. In all of the experiments, independently of the batching level used, we verified that the bottleneck was always the sequencer CPU, and the available network bandwidth was always far from saturated. In Figure 1 we report on the x axis the self-delivery latency (in msec) experienced on average by the non-sequencer nodes, and on the y axis the self-delivery latency experienced by the sequencer node.

[Figure 2. Optimal batching value computed on the basis of the sequencer’s vs. non-sequencer nodes’ self-delivery latency. x axis: optimal b based on self-delivery latency across non-sequencer nodes; y axis: optimal b based on self-delivery latency at the sequencer node.]

To generate the plot in Figure 2 we manually computed the optimal batching value that, given the current load, minimized i) the self-delivery latency of the sequencer node, and ii) the average self-delivery latency over all the non-sequencer nodes. The correlations of the datasets in Figure 1 and Figure 2 are, respectively, 0.85 and 0.99. This experimental evidence confirms that, in a LAN, by relying exclusively on local measures of self-delivery latency taken at the sequencer node, it is possible to obtain an accurate picture of the performance perceived, on average, by any node of the system.

V. THE ANALYTICAL PERFORMANCE MODEL

As discussed in the Introduction section, our self-tuning scheme relies on an analytical model which is used to initialize a reinforcement learning system. In this section we first present the analytical model. Next we discuss how it is possible to determine its parameters’ values. Finally we validate its accuracy and effectiveness in driving the self-tuning process without additional feedback from the reinforcement learning subsystem.

A. The Mathematical Model

The design of the analytical performance model used in our system has been driven by two key requirements: i) high computational efficiency, thus allowing its solution to be computed in real time without significantly overloading the CPU of the sequencer node; ii) the possibility to identify its parameters via an automated and extremely fast procedure, thus making it easily employable in practical settings also by non-specialists.

These two requirements led us to opt for a relatively simple model that captures the impact of batching on self-delivery latency, focusing on two main aspects: the additional delay introduced by forcing the sequencer to wait for the completion of a batch of messages, and the alleviation of the load pressure on the CPU of the sequencer due to the generation of a reduced number of sequencing messages. We chose not to explicitly model the possible effects of contention on the network, as typical applications that use TOB services in a LAN [5], [26] generate messages of at most a few KBs. In these settings,


Page 57

STOB model input

• m = message generation rate
• b = batching level
• T_1 = time to process the 1st message in a batch
• T_Add = time to process each additional msg
  – Batching makes sense when T_1 > T_Add

Page 58

STOB analytical model [59]

• Sequencer = M/M/1 queue

• Batch generation rate

• Batch service rate

• Taking derivatives, the optimal b is computed
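The rate formulas on this slide are images lost in this scrape. The sketch below is an illustrative reconstruction from the inputs defined two slides earlier, not the exact model of [59]: batches arrive at rate m/b, a batch takes T_1 + (b-1)·T_Add to serve, and the predicted latency is an average batch-formation wait plus the M/M/1 residence time (the numerical parameters are invented):

```python
def self_delivery_latency(b, m, t1, t_add):
    # Assumed model (illustrative):
    #   batch arrival rate  lam = m / b
    #   batch service rate  mu  = 1 / (t1 + (b - 1) * t_add)
    #   latency = avg. batch-formation wait + M/M/1 residence time
    lam = m / b
    mu = 1.0 / (t1 + (b - 1) * t_add)
    if lam >= mu:
        return float("inf")                  # sequencer saturated
    formation = (b - 1) / (2.0 * m)          # avg. wait for the batch to fill
    return formation + 1.0 / (mu - lam)

def optimal_b(m, t1, t_add, b_max=128):
    # Minimize numerically instead of via derivatives, for simplicity
    return min(range(1, b_max + 1),
               key=lambda b: self_delivery_latency(b, m, t1, t_add))

print(optimal_b(m=15000.0, t1=1e-4, t_add=1e-5))   # high load: batching pays off
print(optimal_b(m=1000.0, t1=1e-4, t_add=1e-5))    # low load: b = 1 is best
```

This reproduces the qualitative behavior of the earlier slides: at low load any batching only adds delay, while near saturation a larger b restores stability.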

Page 59

STOB model’s accuracy

• Assumptions and simplifications
  – Exponential arrival rate and service rate (M/M/1)
  – In computing the overlap of arrivals and computation

Page 60

STOB black box optimization [24]

• Learn the optimal waiting time for a batch of size b
  – Computed at the sequencer

• Hill climbing for each value of b
  – In/decrease the wait time @b depending on feedback

• When delivering a batch of size b
  – Confirm the previous decision if the delivery time is lower
  – Revert the previous decision if the delivery time is higher

Page 61

Hill Climbing in STOB

• But limited expressiveness:
  – Self-tuning at the cost of no predictability

[Figure: message arrival rate (msgs/sec) and the selected batch level over time (sec)]

Page 62

Transactional NoSQL store case study

• Distributed transactional data store
  – Nodes maintain elements of a dataset
    • Full vs. partial replication (# copies per item)
  – Transactional, ACI(D), manipulation of data
    • Concurrency control scheme (enforces isolation)
    • Replication protocol (disseminates modifications)

Page 63

Replication protocols: which one?

[Diagram: taxonomy of transactional data consistency protocols]
• Single master (primary-backup)
• Multi master
  – 2PC-based
  – Total order-based
    • State machine replication
    • Certification
      – Non-voting
      – BFC
      – Voting

Page 64

DSTM Performance

[Figure: committed transactions/sec vs. number of nodes (2–10) for RG-Small, RG-Large, and TPC-C]

• Heterogeneous, nonlinear scalability trends!

Page 65

Factors limiting scalability

[Figure, left: commit probability vs. number of nodes (2–10) for RG-Small, RG-Large, and TPC-C: aborted transactions because of conflicts. Right: network RTT latency (microsec) vs. number of nodes: network latency in the commit phase.]

Page 66: Hybrid'Machine' Learning/Analytical'Models'for ...romanop/files/papers/tutorial-icpe15.pdf · STOB&white&box&modeling • Focus&on&performance&on&sequencer • It&is&representative&of&the&whole&system

White box modeling


Simulator [21]

• Assumptions and approximations
– CPU = G/M/K
– Fixed point-to-point network latency
• Accuracy / resolution-time trade-off

Black box modeling

• MorphR [20]
– Automatic switching among replication protocols
• Decision tree classifier (C5.0)
• Workload characterization
– Xact mix, # ops, throughput, abort rate
• Physical resource usage
– CPU, memory, commit latency
• Output: optimal replication protocol

MorphR in action

[Figure: throughput (committed tx/sec) over time (minutes) across workloads TW1, TW2, TW3, comparing PB, 2PC, TOB and MorphR]

Gray box modeling

• Combine WB and BB modeling
– Lower training time thanks to WBM
– Incremental learning thanks to BBM
• Techniques in this tutorial
– Divide et impera
– Bootstrapping
– Hybrid ensembling

Gray box modeling

• Techniques in this tutorial
– Divide et impera
– Bootstrapping
– Hybrid ensembling

Divide et impera

• Modular approach
– WBM of what is observable/easy to model
– BBM of what is un-observable or too complex
• Reconcile their outputs in a single function
• Higher accuracy in extrapolation thanks to WBM
• Apply BBM only to a sub-problem
– Fewer features, lower training time

NoSQL optimization in the Cloud

• Important to model network-bound ops, but…
• The Cloud hides details about the network!
– No topology info
– No service demand info
– Additional overhead of the virtualization layer
• BBM of network-bound ops' performance
– Train the ML on the target platform

TAS/PROMPT [28,30]

• Analytical modeling
– Concurrency control scheme
• E.g., encounter-time vs commit-time locking
– Replication protocol
• E.g., PB vs 2PC
– Replication scheme
• Partial vs full
– CPU
• Machine learning
– Network-bound ops (prepare, remote gets)
– Decision tree regressor

Analytical model in TAS/PROMPT

• Concurrency control scheme (lock-based)
– A lock is an M/G/1 server
– Conflict probability = utilization of the server
• Replication protocol
– 2PC: all nodes are similar → one model
– PB: primary vs backups → two models
• Replication scheme
– Probability of accessing remote data
– # nodes involved in commit
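To make the M/G/1 abstraction concrete, here is a toy sketch (not the actual TAS/PROMPT equations, and all parameter values are invented): under uniform access, the utilization of a per-item lock approximates the probability that an incoming access finds the item locked.

```python
def lock_conflict_prob(tx_rate, writes_per_tx, num_items, hold_time):
    """Approximate the per-access conflict probability.

    tx_rate       -- transactions per second in the system
    writes_per_tx -- average write accesses per transaction
    num_items     -- dataset size (uniform access assumed)
    hold_time     -- mean lock hold time in seconds
    """
    # Arrival rate of lock requests at a single item (the M/G/1 "server").
    lam = tx_rate * writes_per_tx / num_items
    # Utilization rho = lambda * E[service time]; by PASTA, this is also
    # the probability that an arriving request finds the lock busy.
    rho = lam * hold_time
    return min(rho, 1.0)

print(lock_conflict_prob(tx_rate=5000, writes_per_tx=5,
                         num_items=100000, hold_time=0.002))  # 0.0005
```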


Machine Learning in TAS/PROMPT

• Decision tree regressor
• Operation-specific models
– Latency during prepare
– Latency to retrieve remote data
• Input
– Operation rates (prepare, commit, remote get…)
– Size of messages
– # nodes involved in commit

ML accuracy for network-bound ops

[Figure: predicted vs. real Tprep (µsec) on EC2 and on a private cluster]

• Seamlessly portable across infrastructures
– Here, private cloud and Amazon EC2

AM and ML coupling

• At training time, all features are monitorable
• At query time they are NOT!

EXAMPLE
• Current config: 5 nodes, full replication
– Contact all 5 nodes at commit
• Query config: 10 nodes, partial replication
– How many contacted nodes at commit??

Model resolution

• The AM can provide (estimates of) the missing input
• Iterative coupling scheme:
– The ML takes some input parameters from the AM
– The AM takes the latencies forecast by the ML as input parameters
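The iterative coupling can be sketched as a fixed-point loop. The two models below are toy stand-ins (not TAS/PROMPT): the "AM" needs a network latency to predict throughput and a load feature, the "ML" maps that load back to a latency, and the loop iterates until the exchanged latency stabilizes.

```python
def am_predict(nodes, net_latency):
    # Toy white-box model: throughput drops as commit latency grows;
    # the prepare-message rate (an ML input feature) grows with throughput.
    throughput = nodes * 1000.0 / (1.0 + net_latency)
    prepare_rate = 0.2 * throughput
    return throughput, prepare_rate

def ml_predict(prepare_rate):
    # Toy black-box model: latency grows with network load.
    return 0.5 + 0.0005 * prepare_rate

def solve_coupled(nodes, tol=1e-6, max_iter=100):
    latency = 2.0                                  # initial guess
    for _ in range(max_iter):
        throughput, prepare_rate = am_predict(nodes, latency)
        new_latency = ml_predict(prepare_rate)     # ML refines the latency
        if abs(new_latency - latency) < tol:       # fixed point reached
            break
        latency = new_latency
    return throughput, latency

tput, lat = solve_coupled(10)
print(round(lat, 3))   # the exchanged latency has converged
```

With these toy functions the update is a contraction, so the loop converges in a handful of iterations; real AM/ML pairs need not be so well behaved, which is why a maximum iteration count is kept.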


Model's accuracy

[Figures: commit probability and throughput (tx/sec) vs. number of nodes on EC2, real vs. predicted, for TPCC-W and TPCC-R. TOP: PB, only master node. BOTTOM: 2PC. Full replication.]

COMPARISON WITH PURE BLACK, I

• YCSB (transactified) workloads while varying
– # operations/tx
– Transactional mix
– Scale
– Replication degree

[Figure: mean relative error vs. percentage of additional training set for Cubist, M5R, SMOReg, MLP, PROMPT and Divide et Impera]

COMPARISON WITH PURE BLACK, II

• ML trained with TPCC-R and queried for TPCC-W
• Pure ML blunders when faced with new workloads

[Figure: throughput (tx/sec) vs. number of nodes (real, TAS and pure ML)]

TAS/PROMPT integration

• TAS/PROMPT are the baseline AMs for the case studies
• We will use TAS/PROMPT as a pure white AM
– Trained with a fixed network model
– i.e., we do not retrain it as new data are collected (but it is possible)
– Representative of pure white box models

Gray box modeling

• Techniques in this tutorial
– Divide et impera
– Bootstrapping
– Hybrid ensembling

BOOTSTRAPPING [27]

• Obtain a zero-training-time ML via an initial AM
1. Build the initial (synthetic) training set of the ML from the AM
2. Retrain periodically with "real" samples

(1) Analytical model → sampling of the parameter space → bootstrapping training set → machine learning (model construction) → gray box model
(2) New data come in → current training set → machine learning → gray box model
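A minimal, self-contained sketch of the two phases (a toy AM and a hand-rolled 1-NN regressor stand in for the real models; all numbers are invented):

```python
def toy_am(nodes, threads):
    # Stand-in analytical model: a closed-form throughput estimate.
    return nodes * threads * 100.0 / (1.0 + 0.05 * nodes)

def knn1_predict(dataset, x):
    # 1-NN regressor standing in for the black-box learner.
    return min(dataset, key=lambda s: (s[0] - x[0])**2 + (s[1] - x[1])**2)[2]

# (1) Bootstrap: synthetic training set sampled from the AM over a grid
# of configurations (2-10 nodes, 1-8 threads per node).
dataset = [(n, t, toy_am(n, t)) for n in range(2, 11) for t in range(1, 9)]
print(knn1_predict(dataset, (12, 4)))   # extrapolation: AM-based estimate

# (2) Update (merge policy): fold in real measurements as they arrive.
dataset.append((12, 4, 2100.0))         # hypothetical real sample
print(knn1_predict(dataset, (12, 4)))   # now reflects the measurement: 2100.0
```

Before the update, the query at 12 nodes falls back on what the learner absorbed from the AM; after one real measurement, the prediction at that configuration follows the data.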


How many synthetic samples?

[Figure: MAPE and training time (sec, log scale) vs. training set size for the KVS and TOB case studies]

• Important tradeoff
– Higher # → lower fitting error over the AM output
– Lower # → higher density of real samples in the dataset

How to update

• Merge: simply add real samples to the synthetic set
• Replace only the nearest neighbor (RNN)
• Replace neighbors in a given region (RNR)
– Two variants
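The policies can be sketched as follows (a hypothetical helper written from the descriptions in these slides, not from a published implementation; in particular, RNR2's "overwrite outputs in the radius" behavior is our reading of the slides):

```python
def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def update(dataset, x, y, policy, radius=0.0):
    # dataset entries are (features, output, is_synthetic) tuples
    if policy == "rnn":       # evict the synthetic sample nearest to x
        synth = [s for s in dataset if s[2]]
        if synth:
            dataset.remove(min(synth, key=lambda s: dist(s[0], x)))
    elif policy == "rnr":     # evict all synthetic samples within the radius
        dataset[:] = [s for s in dataset
                      if not (s[2] and dist(s[0], x) <= radius)]
    elif policy == "rnr2":    # overwrite outputs of synthetic samples in the radius
        dataset[:] = [(s[0], y, s[2]) if s[2] and dist(s[0], x) <= radius else s
                      for s in dataset]
    # "merge" does none of the above: the real sample is simply added
    dataset.append((x, y, False))

ds = [((1.0,), 10.0, True), ((2.0,), 20.0, True), ((3.0,), 30.0, True)]
update(ds, (2.1,), 24.0, "rnn")   # evicts the synthetic sample at (2.0,)
print(len(ds))                     # 3: two synthetic + one real
```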


Real vs AM function

[Diagram: the real function vs. the AM function]

Real vs learnt

• Assuming enough points to perfectly learn the AM

[Diagram: synthetic samples and the learnt ML function]

Merge

• Add real samples to the synthetic set

[Diagram: a real sample added on top of the synthetic ones]

Merge

• Problem: same/nearby samples have different outputs

Replace Nearest Neighbor (RNN)

• Remove the nearest neighbor

Replace Nearest Neighbor (RNN)

• Preserves the distribution…

Replace Nearest Neighbor (RNN)

• … but may induce alternating outputs

Replace Nearest Region (RNR)

• Add real and remove synthetic samples within a radius

Replace Nearest Region (RNR)

• R = radius defining the neighborhood


Replace Nearest Region (RNR)

• Skews the samples' distribution

Replace Nearest Region 2 (RNR2)

• Replace all synthetic samples within a radius R

Replace Nearest Region 2 (RNR2)

• Maintains the distribution, piecewise approximation

Weighting

• Give more relevance to some samples
• Fit the model better around real samples
– "Trust" real samples more than synthetic ones
– Useful especially in Merge
• Too high a weight can cause over-fitting!
– Learner too specialized in only some regions
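As a toy illustration of the weighting idea (hand-rolled weighted least squares; the data here are invented, not taken from the tutorial's experiments): the synthetic samples come from a biased AM with slope 2, the real samples follow slope 3, and raising the weight w of the real samples pulls the fit toward them.

```python
def wls_line(points, weights):
    # Weighted least-squares fit of y = a*x + b.
    sw = sum(weights)
    mx = sum(w * x for (x, _), w in zip(points, weights)) / sw
    my = sum(w * y for (_, y), w in zip(points, weights)) / sw
    sxy = sum(w * (x - mx) * (y - my) for (x, y), w in zip(points, weights))
    sxx = sum(w * (x - mx) ** 2 for (x, _), w in zip(points, weights))
    a = sxy / sxx
    return a, my - a * mx

synthetic = [(float(x), 2.0 * x) for x in range(1, 6)]   # biased AM: slope 2
real = [(2.0, 6.0), (4.0, 12.0)]                         # measurements: slope 3
for w in (1, 10, 100):
    slope, _ = wls_line(synthetic + real, [1] * 5 + [w] * 2)
    print(w, round(slope, 2))   # slope drifts from ~2.17 toward 3 as w grows
```

With extreme weights the fit effectively ignores the synthetic set, which is exactly the over-fitting risk flagged above.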


Merge, TOB

• Weighting real samples more reduces error

[Figure: MAPE vs. weight with 1K (left) and 10K (right) synthetic samples, for Merge 20%, Merge 70%, CUB 20%, CUB 70% and the AM]

Replace, KVS

• Examples of over-fitting

[Figure: MAPE vs. weight with 20% (left) and 70% (right) additional training, for RNR2 0.01, RNR2 0.3, RNR 0.1, RNR 0.3, RNN, the AM and CUB]

MERGE VS REPLACE (TOB)

• Assuming optimal parameterization
• Merge and Replace seem *very* similar…

[Figure: MAPE vs. additional training % for Merge, RNR2, CUB and the AM]

Impact of base model (TOB)

• … BUT Replace is better if the base model is poor
– It evicts synthetic samples more aggressively

[Figure: MAPE vs. additional training % for Merge, RNR2, CUB and the AM, with a poor base model]

Visualizing the correction (STOB)

[Plots: BASE MODEL, PURE ML (70% TS), BOOTSTRAPPED ML (70% TS)]

Visualizing the correction (KVS)

[Plots: BASE MODEL, PURE ML (70% TS), BOOTSTRAPPED ML (70% TS)]

BOOTSTRAPPING in RL [59]

• Optimize the batching level in STOB
• Base AM already presented

Hybrid RL in STOB

• UCB: find the optimal batch size (b*) for a given msg. arrival rate (m)
– Discretize the m domain into M = {m_min … m_max}
– A UCB instance for each m_i
– For each instance, a lever for each b
• Initial rewards are determined via the AM
– The convergence speed of UCB is insufficient at high arrival rates:
• Enhance convergence speed using the initial knowledge of the AM
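A sketch of one AM-seeded UCB1 instance (all reward numbers are hypothetical, and the actual algorithm in [59] may differ in details): each lever's statistics start from the AM's predicted reward rather than from scratch, so early pulls already favor promising batch levels.

```python
import math, random

random.seed(1)
batch_levels = [1, 2, 4, 8, 16]
am_reward = [0.3, 0.5, 0.9, 0.7, 0.4]   # rewards predicted by the AM per lever
counts = [1] * len(batch_levels)        # seed: as if each lever was pulled once...
means = list(am_reward)                 # ...and returned the AM's prediction

def ucb_pick(t):
    # UCB1 index: empirical mean + exploration bonus.
    return max(range(len(means)),
               key=lambda i: means[i] + math.sqrt(2 * math.log(t) / counts[i]))

def observe(i):
    # Hidden environment (for the demo only): the AM slightly mis-ranks levers.
    return [0.35, 0.55, 0.80, 0.85, 0.50][i] + random.gauss(0, 0.01)

for t in range(2, 500):
    i = ucb_pick(t)
    r = observe(i)
    counts[i] += 1
    means[i] += (r - means[i]) / counts[i]   # incremental mean update

best = max(range(len(means)), key=lambda i: means[i])
print(batch_levels[best])
```

Even though the AM mis-ranks the two best levers, the seeded instance needs only a few real pulls to correct the ordering, rather than starting exploration from zero knowledge.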


Bootstrapped model

• Enhances response time via better batching
• Faster convergence than UCB (and no thrashing)

Gray box modeling

• Techniques in this tutorial
– Divide et impera
– Bootstrapping
– Hybrid ensembling

Hybrid Ensemble [26]

• Combine the outputs of AM and ML
– Hybrid boosting: correct the errors of the single models
– KNN: select the best model depending on the query
– Probing: train the ML only where the AM is not accurate

Hybrid Boosting

• Implements Logistic Additive Regression
• Chain composed of the AM + a cascade of MLs
• ML1 is trained over the residual error of the AM
• MLi, i>1, is trained over the residual error of MLi-1
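A minimal sketch of the chain (a trivial constant learner stands in for the DT/ANN/SVM stages so the example stays self-contained): stage 0 is the AM, and each ML stage fits the residual left by the chain so far.

```python
def fit_constant(residuals):
    # Stand-in learner: predicts the mean residual everywhere.
    m = sum(residuals) / len(residuals)
    return lambda x: m

def fit_chain(X, y, am, n_stages=3):
    preds = [am(x) for x in X]                # stage 0: the analytical model
    stages = []
    for _ in range(n_stages):
        res = [yi - pi for yi, pi in zip(y, preds)]
        h = fit_constant(res)                 # ML_i fits the chain's residual
        stages.append(h)
        preds = [p + h(x) for p, x in zip(preds, X)]
    return lambda x: am(x) + sum(h(x) for h in stages)

am = lambda x: 2.0 * x                        # biased AM; the truth is 2x + 5
X, y = [1.0, 2.0, 3.0], [7.0, 9.0, 11.0]
model = fit_chain(X, y, am)
print(model(10.0))   # 25.0: the chain has learned the AM's constant bias
```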


BOOSTING: sensitivity

• Chain of 3 BBMs (> 3 were useless here)
– DT, ANN, SVM

[Figure: RMSE normalized w.r.t. Γ_AM vs. percentage of additional training set, Cubist vs. HyBoost, on two case studies]

Online variant of HyBoost

• Self-correcting Transactional Auto Scaler (SC-TAS) [28]
• Identifies the optimal level of parallelism in a distributed NoSQL transactional store
– # nodes in the platform
– # threads active on each node

Parallelism tuning in DTM

• Why not use a simpler exploration-based approach, e.g., hill-climbing?
– Adapting the number of threads per node is simple and effective
– Changing the # of nodes is costly: state transfer!
• Model-based solution
– Input: workload, # nodes, # threads/node
– Output: throughput
– Obtain: the highest-throughput configuration

Final Workshop of the Euro-TM COST Action, Amsterdam, 2015 Jan. 19th

Implemented solution: SC-TAS

• Exploration + modeling + machine learning
1. Explore to gather feedback on the model's accuracy
2. LEARN corrective functions to "patch" the model
• Try to avoid global reconfiguration (# nodes)
– Rely on local # threads exploration (cheap)
• Increase accuracy

SC-TAS control loop

[Loop: Collect Data → Update SC-TAS → End exploration? → NO: explore locally (local scaling) / YES: invoke SC-TAS (global scaling)]

• Collect: workload, # threads, # nodes, TAS' error

SC-TAS control loop

[Loop: Collect Data → Update SC-TAS → End exploration? → NO: explore locally (local scaling) / YES: invoke SC-TAS (global scaling)]

• Update: re-train the HyBoost "patching" ML

SC-TAS control loop

[Loop: Collect Data → Update SC-TAS → End exploration? → NO: explore locally (local scaling) / YES: invoke SC-TAS (global scaling)]

• End exploration? YES if min < # thread explorations < max
• …and the accuracy of the patched model is considered "OK"

SC-TAS control loop

[Loop: Collect Data → Update SC-TAS → End exploration? → NO: explore locally (local scaling) / YES: invoke SC-TAS (global scaling)]

• The patch is not enough
• Change the # of threads and repeat

SC-TAS control loop

[Loop: Collect Data → Update SC-TAS → End exploration? → NO: explore locally (local scaling) / YES: invoke SC-TAS (global scaling)]

• The model is supposedly patched
• Invoke the patched model

Dynamics of SC-TAS

[Fig. 6: average of the relative absolute error across all scenarios vs. number of iterations, for µ = 3, 5, 7]

[Fig. 7: forecasts of SC-TAS after a full optimization cycle (µ = 7): throughput (tx/sec) vs. number of nodes, real vs. TAS vs. SC-TAS, for 2, 4 and 8 threads per node]

Excerpt from the SC-TAS paper shown on the slide: "…concerning the prediction errors of TAS, can significantly improve its accuracy both in explored and unexplored scenarios, lowering the average relative absolute error across all scenarios from 103% to 5%. These results highlight that the self-correcting capabilities of SCTAS can be beneficial not only to optimize the multiprogramming level of DTM applications, but also in applications (such as QoS/cost driven elastic scaling policies and what-if analysis of the scalability of DTM applications) requiring to speculate on the performance of the platform in different scale settings."

• µ = min # of thread explorations per node


SC-TAS: before and after

[Figures: throughput (tx/sec) vs. number of nodes, real vs. predicted, for 2, 4 and 8 threads per node, before and after the correction]

Hybrid Ensemble [26]

• Combine the outputs of AM and ML
– Hybrid boosting: correct the errors of the single models
– KNN: select the best model depending on the query
– Probing: train the ML only where the AM is not accurate

Hybrid KNN

• Split D into D', D''
• Train ML1 … MLN on D'
– The MLs can differ in nature, parameters, training set…
• For a query sample x
– Pick the K training samples in D'' closest to x
– Find the model with the lowest error on those K samples
– Use that model to predict x
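A sketch with hypothetical toy models: D'' acts as a validation set, and the model that is most accurate on the K validation samples nearest to the query answers it.

```python
def knn_select(models, d2, x, k=3):
    # K validation samples (features, output) closest to the query x.
    near = sorted(d2, key=lambda s: sum((a - b) ** 2
                                        for a, b in zip(s[0], x)))[:k]
    # Pick the model with the lowest absolute error on those samples.
    return min(models, key=lambda m: sum(abs(m(f) - o) for f, o in near))

am = lambda f: 10.0 * f[0]          # toy AM: accurate at small scale
ml = lambda f: 8.0 * f[0] + 20.0    # toy ML: accurate at large scale
# Validation set D'': the truth follows the AM below x=5 and the ML above.
d2 = ([((x,), 10.0 * x) for x in (1.0, 2.0, 3.0)] +
      [((x,), 8.0 * x + 20.0) for x in (8.0, 9.0, 10.0)])

best = knn_select([am, ml], d2, (2.0,))
print(best((2.0,)))   # 20.0: the AM wins near small configurations
```

The same call with a large-scale query, e.g. `knn_select([am, ml], d2, (9.0,))`, selects the ML instead.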


KNN, sensitivity (TOB)

• Low cut-off & low % training → collapse to the AM
• High cut-off & high % training → collapse to the ML

[Figure: RMSE normalized w.r.t. Γ_AM vs. c, for Cubist and KNN with 20%, 30%, 50% and 80% additional training]

Hybrid Ensemble [26]

• Combine the outputs of AM and ML
– Hybrid boosting: correct the errors of the single models
– KNN: select the best model depending on the query
– Probing: train the ML only where the AM is not accurate

PROBING

• Build an ML model as specialized as possible
– Use the AM where it is accurate
– Train the ML only where the AM fails
• Differences with KNN
– In KNN, the ML is trained on all samples:
• Here, only where the AM is found to be inaccurate
– In KNN, voting decides between ML and AM:
• Here, a binary classifier determines in which regions the AM is inaccurate

Probing at work

1. DML = empty set
2. Train a classifier: for each x in D
– If the error of the AM on x > cut-off, map x to ML and add x to DML
– Else map x to AM
3. Train the ML on DML

• QUERY for input z
– If classify(z) = AM, return AM(z); else return ML(z)
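The steps above can be sketched compactly (hypothetical helper names; a nearest-neighbor rule stands in for the binary classifier):

```python
def build_probing(D, am, fit_ml, cutoff=0.1):
    # Samples on which the AM's relative error exceeds the cut-off.
    d_ml = [(x, y) for x, y in D
            if abs(am(x) - y) / max(abs(y), 1e-9) > cutoff]
    ml = fit_ml(d_ml) if d_ml else am
    ml_region = {x for x, _ in d_ml}
    def classify(z):
        # Stand-in binary classifier: the nearest training sample decides.
        nearest = min(D, key=lambda s: abs(s[0] - z))[0]
        return "ML" if nearest in ml_region else "AM"
    return lambda z: ml(z) if classify(z) == "ML" else am(z)

am = lambda x: 2.0 * x                      # toy AM: accurate only for x <= 5
D = [(x, 2.0 * x if x <= 5 else 3.0 * x) for x in range(1, 11)]
fit_ml = lambda d: (lambda x: 3.0 * x)      # stand-in learner for the ML region
model = build_probing(D, am, fit_ml)
print(model(4.0), model(8.0))   # 8.0 24.0: AM below the split, ML above
```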


Probing Sensitivity (KVS)

• High cut-off → collapses to the AM
• Low cut-off → collapses to the ML

[Figure: RMSE normalized w.r.t. Γ_AM vs. c, for Cubist and Prob with 50% and 80% additional training]

NFL strikes again

• No one-size-fits-all hybrid model exists!
• Choosing the best hybrid model (with the right parameters) can be cast as a parameter optimization problem

[Figure: RMSE normalized w.r.t. Γ_AM vs. percentage of additional training set for Cubist, KNN, HyBoost and Prob, on two case studies]

Concluding remarks

• WBM and BBM are often conceived as antithetic
• They can be leveraged synergistically
– Increased predictive power thanks to WBM
– Incremental learning capabilities thanks to BBM
• Gray box approaches
– Divide et impera, Bootstrapping, Hybrid ensembling
– Design, implementation and use cases
– Can deliver better accuracy than pure black/white models

THANK YOU

[email protected]

www.gsd.inesc-id.pt/~didona


Hybrid Machine Learning/Analytical Models for Performance Prediction: Bibliography

Diego Didona and Paolo Romano
INESC-ID / Instituto Superior Tecnico, Universidade de Lisboa

White-box performance modeling: principles, applications and fundamental results.
[45] [49] [50] [71] [46] [4] [39] [44] [58] [51] [36]

Principles of machine learning.
[8] [5] [80] [65] [66] [48] [34] [9] [79] [10] [70] [41] [3] [77] [62] [16] [33] [54] [55] [47] [56] [53] [42] [76] [2]

ML ensembling, feature selection and hyper-parameter optimization.
[32] [12] [74] [6] [69] [40]

Application of ML to performance modeling.
[13] [60] [18] [20] [15] [59] [31] [75] [38] [37] [68] [1] [82] [81] [57]

Divide et impera.
[30] [43] [28] [22]

Bootstrapping.
[73] [59] [67] [72] [61] [27]

Hybrid ensembling.
[26] [25] [14]

Case studies: introduction and performance modeling/optimization.
[11] [35] [63] [24] [18] [21] [52] [7] [29] [17] [19] [23] [64] [20] [78] [83] [84] [85] [86] [87] [73]


References

[1] Mert Akdere, Ugur Cetintemel, Matteo Riondato, Eli Upfal, and Stanley B. Zdonik. Learning-based query performance modeling and prediction. In Proceedings of the 2012 IEEE 28th International Conference on Data Engineering, ICDE '12, pages 390–401, Washington, DC, USA, 2012. IEEE Computer Society.

[2] Naomi S. Altman. An introduction to kernel and nearest-neighbor nonparametric regression. The American Statistician, 46(3):175–185, 1992.

[3] Peter Auer, Nicolo Cesa-Bianchi, and Paul Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 2002.

[4] Forest Baskett, K. Mani Chandy, Richard R. Muntz, and Fernando G. Palacios. Open, closed, and mixed networks of queues with different classes of customers. J. ACM, 22(2):248–260, April 1975.

[5] Richard Bellman. Dynamic Programming. Princeton University Press, Princeton, NJ, 1957.

[6] James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. J. Mach. Learn. Res., 13(1):281–305, February 2012.

[7] Philip A. Bernstein and Nathan Goodman. Concurrency control in distributed database systems. ACM Computing Surveys (CSUR), 13(2):185–221, 1981.

[8] Christopher M. Bishop. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer-Verlag New York, Inc., 2006.

[9] Leo Breiman. Bagging predictors. Mach. Learn., 24(2):123–140, August 1996.

[10] Leo Breiman. Stacked regressions. Machine Learning, 24(1):49–64, 1996.

[11] Christian Cachin, Rachid Guerraoui, and Luís Rodrigues. Introduction to Reliable and Secure Distributed Programming (2nd ed.). Springer, 2011.

[12] Rich Caruana, Alexandru Niculescu-Mizil, Geoff Crew, and Alex Ksikes. Ensemble selection from libraries of models. In Proc. of ICML, 2004.

[13] Jin Chen, G. Soundararajan, and C. Amza. Autonomic provisioning of back-end databases in dynamic content web servers. In Proceedings of the 2006 IEEE International Conference on Autonomic Computing, ICAC '06, pages 231–242, Washington, DC, USA, 2006. IEEE Computer Society.


[14] Jin Chen, G. Soundararajan, S. Ghanbari, and C. Amza. Model ensemble tools for self-management in data centers. In Data Engineering Workshops (ICDEW), 2013 IEEE 29th International Conference on, pages 36–43, April 2013.

[15] Tianshi Chen, Qi Guo, Ke Tang, Olivier Temam, Zhiwei Xu, Zhi-Hua Zhou, and Yunji Chen. Archranker: A ranking approach to design space exploration. SIGARCH Comput. Archit. News, 42(3):85–96, June 2014.

[16] Corinna Cortes and Vladimir Vapnik. Support-vector networks. Machine Learning, 1995.

[17] Maria Couceiro, Diego Didona, Luís Rodrigues, and Paolo Romano. Self-tuning in distributed transactional memory. In Rachid Guerraoui and Paolo Romano, editors, Transactional Memory. Foundations, Algorithms, Tools, and Applications, volume 8913 of Lecture Notes in Computer Science, pages 418–448. Springer International Publishing, 2015.

[18] Maria Couceiro, Paolo Romano, and Luis Rodrigues. A machine learning approach to performance prediction of total order broadcast protocols. In Self-Adaptive and Self-Organizing Systems (SASO), 2010 4th IEEE International Conference on, pages 184–193. IEEE, 2010.

[19] Maria Couceiro, Paolo Romano, and Luis Rodrigues. Polycert: Polymorphic self-optimizing replication for in-memory transactional grids. In Proceedings of the 12th ACM/IFIP/USENIX International Conference on Middleware, Middleware '11, pages 309–328, Berlin, Heidelberg, 2011. Springer-Verlag.

[20] Maria Couceiro, Pedro Ruivo, Paolo Romano, and Luis Rodrigues. Chasing the optimum in replicated in-memory transactional platforms via protocol adaptation. In Proceedings of the 2013 43rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), DSN '13, pages 1–12, Washington, DC, USA, 2013. IEEE Computer Society.

[21] Pierangelo Di Sanzo, Francesco Antonacci, Bruno Ciciani, Roberto Palmieri, Alessandro Pellegrini, Sebastiano Peluso, Francesco Quaglia, Diego Rughetti, and Roberto Vitali. A framework for high performance simulation of transactional data grid platforms. In Proceedings of the 6th International ICST Conference on Simulation Tools and Techniques, SimuTools '13, pages 63–72, ICST, Brussels, Belgium, 2013. ICST (Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering).


[22] Pierangelo Di Sanzo, Francesco Quaglia, Bruno Ciciani, Alessandro Pellegrini, Diego Didona, Paolo Romano, Roberto Palmieri, and Sebastiano Peluso. A flexible framework for accurate simulation of cloud in-memory data stores. ArXiv e-prints, December 2014.

[23] Pierangelo Di Sanzo, Diego Rughetti, Bruno Ciciani, and Francesco Quaglia. Auto-tuning of cloud-based in-memory transactional data grids via machine learning. In Proceedings of the 2012 Second Symposium on Network Cloud Computing and Applications, NCCA '12, pages 9–16, Washington, DC, USA, 2012. IEEE Computer Society.

[24] Diego Didona, Daniele Carnevale, Sergio Galeani, and Paolo Romano. An extremum seeking algorithm for message batching in total order protocols. In SASO, pages 89–98. IEEE Computer Society, 2012.

[25] Diego Didona, Pascal Felber, Derin Harmanci, Paolo Romano, and Joerg Schenker. Identifying the optimal level of parallelism in transactional memory applications. Computing Journal, pages 1–21, December 2013.

[26] Diego Didona, Francesco Quaglia, Paolo Romano, and Ennio Torre. Enhancing performance prediction robustness by combining analytical modeling and machine learning. In Proceedings of the 2015 ACM/SPEC 6th International Conference on Performance Engineering (ICPE 2015), ICPE '15, 2015.

[27] Diego Didona and Paolo Romano. On Bootstrapping Machine Learning Performance Predictors via Analytical Models. ArXiv e-prints, October 2014.

[28] Diego Didona and Paolo Romano. Performance modelling of partially replicated in-memory transactional stores. In Proceedings of the 22nd IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS 2014), MASCOTS '14, 2014.

[29] Diego Didona and Paolo Romano. Self-tuning transactional data grids: The cloud-tm approach. In Proceedings of the Symposium on Network Cloud Computing and Applications (NCCA), pages 113–120. IEEE, 2014.

[30] Diego Didona, Paolo Romano, Sebastiano Peluso, and Francesco Quaglia. Transactional auto scaler: Elastic scaling of replicated in-memory transactional data grids. ACM Trans. Auton. Adapt. Syst., 9(2):11:1–11:32, July 2014.

[31] Nuno Diegues and Paolo Romano. Self-tuning Intel transactional synchronization extensions. In 11th International Conference on Autonomic Computing (ICAC 14), pages 209–219, Philadelphia, PA, June 2014. USENIX Association.

[32] Thomas G. Dietterich. Ensemble methods in machine learning. In Proc. of MCS Workshop, 2000.

[33] Harris Drucker, Chris J. C. Burges, Linda Kaufman, Alex Smola, and Vladimir Vapnik. Support vector regression machines. In Advances in Neural Information Processing Systems 9, volume 9, pages 155–161, 1997.

[34] Jerome H. Friedman. Stochastic gradient boosting. Comput. Stat. Data Anal., 38(4):367–378, February 2002.

[35] Roy Friedman and Robbert Van Renesse. Packing messages as a tool for boosting the performance of total ordering protocols. In Proceedings of the 6th IEEE International Symposium on High Performance Distributed Computing, HPDC '97, pages 233–, Washington, DC, USA, 1997. IEEE Computer Society.

[36] Richard M. Fujimoto. Parallel discrete event simulation. Commun. ACM, 33(10):30–53, October 1990.

[37] Archana Ganapathi, Harumi Kuno, Umeshwar Dayal, Janet L. Wiener, Armando Fox, Michael Jordan, and David Patterson. Predicting multiple metrics for queries: Better decisions enabled by machine learning. In Proceedings of the 2009 IEEE International Conference on Data Engineering, ICDE '09, pages 592–603, Washington, DC, USA, 2009. IEEE Computer Society.

[38] Saeed Ghanbari, Gokul Soundararajan, Jin Chen, and Cristiana Amza. Adaptive learning of metric correlations for temperature-aware database provisioning. In Proceedings of the Fourth International Conference on Autonomic Computing, ICAC '07, pages 26–, Washington, DC, USA, 2007. IEEE Computer Society.

[39] Donald Gross, John F. Shortle, James M. Thompson, and Carl M. Harris. Fundamentals of queueing theory. John Wiley & Sons, 2013.

[40] Isabelle Guyon and Andre Elisseeff. An introduction to variable and feature selection. The Journal of Machine Learning Research, 3:1157–1182, 2003.

[41] Martin T. Hagan, Howard B. Demuth, and Mark Beale. Neural Network Design. PWS Publishing Co., Boston, MA, USA, 1996.


[42] Mark Hall et al. The weka data mining software: An update. SIGKDD Explor. Newsl., 11(1):10–18, November 2009.

[43] Herodotos Herodotou, Fei Dong, and Shivnath Babu. No one (cluster) size fits all: automatic cluster sizing for data-intensive analytics. In Proc. of the ACM Symposium on Cloud Computing (SOCC), 2011.

[44] James R. Jackson. Networks of waiting lines. Operations Research, 5(4):518–521, 1957.

[45] Leonard Kleinrock. Queueing Systems, volume I: Theory. Wiley Interscience, 1975.

[46] John D. C. Little. A proof for the queuing formula: L = λW. Operations Research, 9(3):383–387, 1961.

[47] Wei-Yin Loh. Classification and regression trees. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 1(1):14–23, 2011.

[48] Philip M. Long and Rocco A. Servedio. Random classification noise defeats all convex potential boosters. Mach. Learn., 78(3):287–304, March 2010.

[49] Daniel A. Menasce and Virgilio Almeida. Capacity Planning for Web Services: Metrics, Models, and Methods. Prentice Hall PTR, Upper Saddle River, NJ, USA, 1st edition, 2001.

[50] Daniel A. Menasce, Lawrence W. Dowdy, and Virgilio A. F. Almeida. Performance by Design: Computer Capacity Planning By Example. Prentice Hall PTR, Upper Saddle River, NJ, USA, 2004.

[51] T. Murata. Petri nets: Properties, analysis and applications. Proceedings of the IEEE, 77(4):541–580, April 1989.

[52] Matthias Nicola and Matthias Jarke. Performance modeling of distributed and replicated databases. IEEE Trans. on Knowl. and Data Eng., 2000.

[53] J. R. Quinlan. Learning with continuous classes. In Proceedings of the 5th Australian Joint Conference on Artificial Intelligence (AI), pages 343–348. World Scientific, 1992.

[54] J. R. Quinlan. Improved use of continuous attributes in c4.5. J. Artif. Int. Res., 4(1):77–90, March 1996.

[55] J. R. Quinlan. Learning decision tree classifiers. ACM Comput. Surv., 28(1):71–72, March 1996.


[56] J. Ross Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers Inc., 1993.

[57] Jia Rao, Xiangping Bu, Cheng-Zhong Xu, Leyi Wang, and George Yin. Vconf: A reinforcement learning approach to virtual machines auto-configuration. In Proceedings of the 6th International Conference on Autonomic Computing, ICAC '09, pages 137–146, New York, NY, USA, 2009. ACM.

[58] M. Reiser and S. S. Lavenberg. Mean-value analysis of closed multichain queuing networks. J. ACM, 27(2):313–322, April 1980.

[59] Paolo Romano and Matteo Leonetti. Self-tuning batching in total order broadcast protocols via analytical modelling and reinforcement learning. In International Conference on Computing, Networking and Communications, ICNC, 2011.

[60] Diego Rughetti, Pierangelo Di Sanzo, Bruno Ciciani, and Francesco Quaglia. Machine learning-based self-adjusting concurrency in software transactional memory systems. In Proc. of the International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, MASCOTS, 2012.

[61] Diego Rughetti, Pierangelo Di Sanzo, Bruno Ciciani, and Francesco Quaglia. Analytical/ml mixed approach for concurrency regulation in software transactional memory. In IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGRID, 2014.

[62] G. A. Rummery and M. Niranjan. On-line Q-learning using connectionist systems. Technical Report TR 166, Cambridge University Engineering Department, Cambridge, England, 1994.

[63] Nuno Santos and Andre Schiper. Optimizing paxos with batching and pipelining. Theor. Comput. Sci., 496:170–183, July 2013.

[64] Pierangelo Di Sanzo, Francesco Molfese, Diego Rughetti, and Bruno Ciciani. Providing transaction class-based qos in in-memory data grids via machine learning. In Proceedings of the 2014 IEEE 3rd Symposium on Network Cloud Computing and Applications (NCCA 2014), NCCA '14, pages 46–53, Washington, DC, USA, 2014. IEEE Computer Society.

[65] Robert E. Schapire. The strength of weak learnability. Mach. Learn., 5(2):197–227, July 1990.


[66] Robert E. Schapire. A brief introduction to boosting. In Proceedings of the 16th International Joint Conference on Artificial Intelligence - Volume 2, IJCAI '99, pages 1401–1406, San Francisco, CA, USA, 1999. Morgan Kaufmann Publishers Inc.

[67] Bianca Schroeder, Mor Harchol-Balter, Arun Iyengar, Erich Nahum, and Adam Wierman. How to determine a good multi-programming level for external scheduling. In Proc. of the International Conference on Data Engineering, ICDE, 2006.

[68] Karan Singh, Engin Ipek, Sally A. McKee, Bronis R. de Supinski, Martin Schulz, and Rich Caruana. Predicting parallel application performance via machine learning approaches: Research articles. Concurr. Comput.: Pract. Exper., 19(17):2219–2235, December 2007.

[69] Jasper Snoek, Hugo Larochelle, and Ryan P. Adams. Practical bayesian optimization of machine learning algorithms. In Advances in Neural Information Processing Systems, pages 2951–2959, 2012.

[70] Richard S. Sutton and Andrew G. Barto. Introduction to Reinforcement Learning. MIT Press, Cambridge, MA, USA, 1st edition, 1998.

[71] Y. C. Tay. Analytical Performance Modeling for Computer Systems, Second Edition. Morgan & Claypool Publishers, 2013.

[72] Gerald Tesauro, Nicholas K. Jong, Rajarshi Das, and Mohamed N. Bennani. On the use of hybrid reinforcement learning for autonomic resource allocation. Cluster Computing, 2007.

[73] Eno Thereska and Gregory R. Ganger. Ironmodel: Robust performance models in the wild. SIGMETRICS Perform. Eval. Rev., 36, June 2008.

[74] Chris Thornton, Frank Hutter, Holger H. Hoos, and Kevin Leyton-Brown. Auto-weka: Combined selection and hyperparameter optimization of classification algorithms. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '13, pages 847–855, New York, NY, USA, 2013. ACM.

[75] D. Tsoumakos, I. Konstantinou, C. Boumpouka, S. Sioutas, and N. Koziris. Automated, elastic resource provisioning for nosql clusters using tiramola. In Cluster, Cloud and Grid Computing (CCGrid), 2013 13th IEEE/ACM International Symposium on, pages 34–41, May 2013.


[76] Laurens J. P. van der Maaten, Eric O. Postma, and H. Jaap van den Herik. Dimensionality reduction: A comparative review. Technical Report TiCC-TR 2009-005, Tilburg University, 2009.

[77] Christopher John Cornish Hellaby Watkins. Learning from delayed rewards. PhD thesis, University of Cambridge, 1989.

[78] Pawel T. Wojciechowski, Tadeusz Kobus, and Maciej Kokocinski. Model-driven comparison of state-machine-based and deferred-update replication schemes. In Proceedings of the 2012 IEEE 31st Symposium on Reliable Distributed Systems, SRDS '12, pages 101–110, Washington, DC, USA, 2012. IEEE Computer Society.

[79] David H. Wolpert. Original contribution: Stacked generalization. Neural Netw., 5(2):241–259, February 1992.

[80] David H. Wolpert. The lack of a priori distinctions between learning algorithms. Neural Comput., 8(7):1341–1390, October 1996.

[81] Pengcheng Xiong, Yun Chi, Shenghuo Zhu, Hyun Jin Moon, Calton Pu, and Hakan Hacigumus. Intelligent management of virtualized resources for database systems in cloud environment. In Proceedings of the 2011 IEEE 27th International Conference on Data Engineering, ICDE '11, pages 87–98, Washington, DC, USA, 2011. IEEE Computer Society.

[82] Pengcheng Xiong, Yun Chi, Shenghuo Zhu, Junichi Tatemura, Calton Pu, and Hakan Hacigumus. Activesla: A profit-oriented admission control framework for database-as-a-service providers. In Proceedings of the 2nd ACM Symposium on Cloud Computing, SOCC '11, pages 15:1–15:14, New York, NY, USA, 2011. ACM.

[83] Steve Zhang, Ira Cohen, Julie Symons, and Armando Fox. Ensembles of models for automated diagnosis of system performance problems. In Proceedings of the 2005 International Conference on Dependable Systems and Networks, DSN '05, pages 644–653, Washington, DC, USA, 2005. IEEE Computer Society.

[84] Ludmila Cherkasova, Kivanc Ozonat, Ningfang Mi, Julie Symons, and Evgenia Smirni. Automated anomaly detection and performance modeling of enterprise applications. ACM Trans. Comput. Syst., 27(3):6:1–6:32, November 2009.


[85] Yongmin Tan, Hiep Nguyen, Zhiming Shen, Xiaohui Gu, Chitra Venkatramani, and Deepak Rajan. Prepare: Predictive performance anomaly prevention for virtualized cloud systems. In Proceedings of the 2012 IEEE 32nd International Conference on Distributed Computing Systems, ICDCS '12, pages 285–294, Washington, DC, USA, 2012. IEEE Computer Society.

[86] Varun Chandola, Arindam Banerjee, and Vipin Kumar. Anomaly detection: A survey. ACM Computing Surveys (CSUR), 41(3):15, 2009.

[87] Ira Cohen, Moises Goldszmidt, Terence Kelly, Julie Symons, and Jeffrey S. Chase. Correlating instrumentation data to system states: A building block for automated diagnosis and control. In Proceedings of the 6th Conference on Symposium on Operating Systems Design & Implementation - Volume 6, OSDI '04, pages 16–16, Berkeley, CA, USA, 2004. USENIX Association.
