CMG-T 2011 Page 1
Modeling & ForecastingMichael A. Salsburg, Ph.D.
CMG-T 2011 Modeling & Forecasting Slide 2
Summary of Three Sessions
• Session 1– Basic definitions– History Queueing Network Modeling– Fundamental queueing model concepts
• Session 2– Characterizing workloads using basic distributions– Analytical Queueing Models– Basics of Simulation Modeling
• Session 3– A Simulation model example– Forecasting
CMG-T 2011 Page 3
Background
CMG-T 2011 Slide 4
What is a Model?
• A model is an abstraction of a complex system that is constructed to observe phenomena that cannot be easily observed with the original system
Modeling & Forecasting
CMG-T 2011 Slide 5
Computer Performance Model
• An Essential Building Block
ServerQueue
DeparturesArrivals
Question – what is the nature of the arrival and departure pattern?
Modeling & Forecasting
CMG-T 2011 Slide 6
Why is Queueing So Important?
Modeling & Forecasting
CMG-T 2011 Slide 7
One versus Two Queues
teller
customer
Two Queues One Queue
Modeling & Forecasting
CMG-T 2011 Slide 8
The Workload• Transaction-Oriented System (single server)
– Average Transaction Arrival Rate and distribution– For Each Transaction:
• Average CPU Time and distribution• Average number of overall storage I/Os• For Each Storage Device:
Average number of I/OsAverage % of Seeks
• ExampleChecking transactions arrive at a rate of 2 / second. Each
transaction uses, on average, 250 ms of CPU and requests, on average, 30 I/Os to disk “C:” For Disk C, the average transfer length is 16,000 bytes. 50% of the I/Os require a seek.
Question – have we measured and analyzed so that we can specify a workload in this manner?
Average Seek TimeAverage Transfer Length
Modeling & Forecasting
CMG-T 2011 Slide 9
The Configuration
• A Simple Configuration– On CPU Server with a “C” and “D” drive.
2 GHz Intel
Seagate ST3160023ARBarracuda (7200 RPM 160 GB Ultra ATA)
C
D
Modeling & Forecasting
CMG-T 2011 Slide 10
Modeling “What-If” Scenarios
• Performance models are constructed using a configuration and a workload.
• What-If Scenarios have two main areas of interest:– What if the workload changes?– What if the configuration changes?
• Examples:– Given the workload configuration, what if the transaction rate
increases by 50%?– What if we replace Disk “C” with an in-memory disk?– What if we decide to use a virtual server?
Question – can we accurately predict the effects of these changes?
CMG-T 2011 Slide 11
Let’s go to the “Wayback Machine”Erlang
• He was a member of the Danish Mathematicians' Association through which he made contact with other mathematicians including members of the Copenhagen Telephone Company. He went to work for this company in 1908 as scientific collaborator and later as head of its laboratory. Erlang at once started to work on applying the theory of probabilities to problems of telephone traffic and in 1909 published his first work on it "The Theory of Probabilities and Telephone Conversations“ proving that telephone calls distributed at random follow Poisson's law of distribution. At the beginning he had no laboratory staff to help him, so he had to carry out all the measurements of stray currents. He was often to be seen in the streets of Copenhagen, accompanied by a workman carrying a ladder, which was used to climb down into manholes.
Modeling & Forecasting
CMG-T 2011 Slide 12
Let’s go to the “Wayback Machine”Feller
• He was the foremost probabilist outside of Russia. In the middle of the 20th century, probability was not generally viewed as a fruitful area of research in mathematics except in Russia, where Kolmogorov and others were influential. Feller contributed to the study of the relationship between Markov chains and differential equations. He wrote a two-volume treatise on probability that has since been universally regarded as one of the most important treatments of that subject.
• Feller provided the proof of the Central Limit Theorem – a limiting distribution (or LAW)
Modeling & Forecasting
CMG-T 2011 Slide 13
Let’s go to the “Wayback Machine”Kleinrock
• “Queueing Systems “ (1975) - is the bible for knowledge on queueing theory
• Dr. Leonard Kleinrock created the basic principles of packet switching, the technology underpinning the Internet, while a graduate student at MIT. In this effort, he developed the mathematical theory of data networks. This was a decade before the birth of the Internet which occurred when his host computer at UCLA became the first node of the Internet in September 1969.
• He was listed by the Los Angeles Times in 1999 as among the "50 People Who Most Influenced Business This Century".
Modeling & Forecasting
CMG-T 2011 Slide 14
Let’s go to the “Wayback Machine”Arnold Allen
• IBM Systems Science Institute Instructor (1978)
• Invaluable in its clarity and simplicity of exposition
• Former CMG Member
CMG-T 2011 Slide 15
Let’s go to the “Wayback Machine”Computer Performance Models
• 1957 – Jackson publishes an analysis of a multiple device system with parallel servers and jobs
• 1967 – Gordon & Newell simplified the models for “closed systems”
• 1967 – Scherr – used “Machine Repairman” problem to model an MIT Time sharing system
• 1971 – Buzen – introduces the “Central Server Model” and fast computational algorithms
Modeling & Forecasting
CMG-T 2011 Page 16
Analytic Queueing Models
CMG-T 2011 Slide 17
Open and Closed Networks
• Open Networks have a population that can be theoretically infinite
ServerQueue
Modeling & Forecasting
CMG-T 2011 Slide 18
Machine Repairman Model
K = 7 = machinesC = 2 = repairmen
• Closed Networks have a finite population
Modeling & Forecasting
CMG-T 2011 Slide 19
Open and Closed Networks
• Scherr – used machine repairman problem to model a time-shared system
• Original Time-Sharing systems• Batch workloads• Multi-Threaded Applications
Computer System
Max population - 2
“Think Time”
Modeling & Forecasting
CMG-T 2011 Slide 20
Kendall Notation
• A/B/c/k• A/ & B/ are:
• GI – general independent interarrival time• G general service time distribution• Hk – k-state hyper exponential interarrival of service time
distribution• Ek – Erlang-k interarrival or service time distribution• M – Exponential interarrival time or service time distribution• D – Deterministic (constant) interarrival or service time distribution
• /c – number of servers• /k – population count
Modeling & Forecasting
CMG-T 2011 Slide 21
Kendall Notation
• M/M/1
• M/M/2/7
Modeling & Forecasting
CMG-T 2011 Slide 22
Essential Statistics
E[a] expected arrival rate (λ)E[s] expected service time (1/μ)E[w] expected wait timeLq length of queueL average number of “customers” in system (queue + service) Wq waiting time in queueW waiting time in the system (Wq +E[s])
Modeling & Forecasting
CMG-T 2011 Slide 23
Little’s Law• The average number of objects in a steady-state queue/system is
the product of the arrival rate and the average time spent in the queue/system.
• In a “steady state” queueing system (completion rate = arrival rate):
qq WaEL
WaEL
][
][
Example – iostats (Unix/Linux) and Windows performance monitorprovide the average number of I/Os in the Disk queue and the arrival rate
Using Little’s Law, you can estimate the average waiting time in the queueNote – this is independent of distribution
Modeling & Forecasting
CMG-T 2011 Slide 24
Discrete Distribution
• Uniform
nxf
1)(
for x = k1, k2, … kn
nxxF )(
][)( xXPxF ][)( xXPxf p.d.f c.d.f
6
1
6
1
5.36
1)(][
ii
iiFXE
Modeling & Forecasting
CMG-T 2011 Slide 25
Continuous Distribution
• Uniform
1
0
)(ab
axxF
for x < a
for a ≤ x < b
for x ≥ b
][)( xXPxF ][)( xXPxf
2)(][
abdFXdxxfxxE
xb
ax
xb
ax
0
1)( abxf
for a < x < b
otherwise
Modeling & Forecasting
CMG-T 2011 Slide 26
Continuous Distributions• The “Normal” or “Gaussian” Distribution or “Law”
– This is a “limiting distribution”
][)( xXPxF ][)( xXPxf ][xE
Modeling & Forecasting
CMG-T 2011 Slide 27
Continuous Distributions
• Exponential
0)(
xexf
for x > 0
for x ≤ 0
0
1)(
xexF
for x > 0
for x ≤ 0
][xEModeling & Forecasting
CMG-T 2011 Slide 28
The Memoryless Property
)()|)( sXPtXtsXP for s, t > 0
)(
)(
)(
)(
),()|)(
)(
sXP
e
e
e
tXP
tsXP
tXP
tXtsXPtXtsXP
s
t
ts
Only the exponential distribution has the memoryless property:The probability of an occurrence after time s is equal to the probabilityof an occurrence at time s+t
Modeling & Forecasting
CMG-T 2011 Slide 29
The Inspection Paradox
• (Feller) Taxis pass a specific corner with an average time of 20 minutes between taxis. You walk up at a random time. How long will you wait for the taxi?
• Answer – 20 minutes
Modeling & Forecasting
CMG-T 2011 Slide 30
The Poisson Process
• Let (N(t), t ≥ o} be a Poisson process with rate
• The random variable Y describing the number of events in any time interval of length t > 0 has a Poisson distribution with parameter
• Key attribute – the interarrival times are exponentially distributed
• Memoryless – the number of arrivals occurring in any bounded interval of time t is independent of the number of arrivals occurring before time t
0
t
!
)(][
k
tekYP
kt K = 0, 1, 2, …
Modeling & Forecasting
CMG-T 2011 Slide 31
Markov Birth-Death ModelsM/M/1
State 0 1 2 n-1 n n+1
λ0 λ1
μ1 μ2 μn μn+1
][][/
...
...
1......
......1
,...3,2;
1...
210
210
21
110
21
10
1
00
11
0210
sEaE
p
npp
pppp
n
n
nnnn
nn
)1/(][/][
)1/(][
)1(][
)1(1
)1/(1
...11
00
32
sELwEW
NEL
nNPp
pSp
S
S
nn
Empty system Two in system
arrival rate
ser vice rate
λ n-1 λn
MEMORYLESS
Applying Little’s Law…
(geometric pdf)
Modeling & Forecasting
CMG-T 2011 Slide 32
Markov Birth-Death ModelsM/M/1
State 0 1 2 n-1 n n+1
λ0 λ1
μ1 μ2 μn μn+1
)1(][
)1/(][
][
sEW
sEW
sEWW
q
q
Empty system Two in system
1 arrival / s
4 completions / s => E[s] = .25s
λ n-1 λn
25.4/1][][/ sEaE
3/25.
333.)1/(][/][
3/175./25.)1/(][
qW
ssELwEW
NEL
Modeling & Forecasting
CMG-T 2011 Slide 33
M/M/2/7
/1][ OE - Avg operation time
/1][ sE - Avg repair time
][ nNPpn
n
cn
n
n
OE
sE
n
K
cc
n
OE
sE
n
K
p
p
)(
][
!
!
)(
][
0
n = 1, 2, …, c
n = c+1, …, K
1
1 00 1
K
k
k
p
pp
K
cnnq pcnL
1
)(
)/(])[][( qqq LKsEOELW
Modeling & Forecasting
CMG-T 2011 Slide 34
Here’s the Short Cuts….
• See Arnold Allen’s Book – Appendix C (either edition)
• Modeling Packages– Integrated with workload monitoring– Translates raw statistics to statistics that can be used for
modeling– Visualizes modeled results
Modeling & Forecasting
CMG-T 2011 Page 35
Back to The REAL Model
When theory is too theoretical
CMG-T 2011 Slide 36
Back to our Model Example
Checking transactions arrive at a rate of 2 / second. Each transaction uses, on average, 250 ms of CPU and requests, on average, 30 I/Os to disk “C:” For Disk C, the average transfer length is 16,000 bytes. 50% of the I/Os require a seek.
2 GHz Intel
Seagate ST3160023ARBarracuda (7200 RPM 160 GB Ultra ATAAvg Seek Time = 8.5 msAvg Transfer Rate = 58 Mb/sAvg Rotation = 60 sec / 7200 revolutions = 8.3 ms/ revolutionAvg Latency = 4.15
C
D
Modeling & Forecasting
CMG-T 2011 Slide 37
Back to our ExampleThe Central Server Model
E[a]=2 / s
E[s]C = Latency + Seek + Transfer
= 4.15+(.5*8.5)+.016/(58/8)
= (4.15 + 4.25 + 2.2)
= 10.6ms
E[a]cpu = 2*30 = 60/s
E[s]cpu = 250 / 30 = 8.333 ms
C
D
100%E[a]E[a]cpu
E[a]C
E[s]cpu
E[s]C
30 I/Os
U = ρ*100
ρ CPU =( 2/ s)*250 ms = 500 ms/s =.5*100
ρ C = 10.6*30*2 = 636ms / 1000ms = .636
Modeling & Forecasting
CMG-T 2011 Slide 38
Back to our ExampleThe Central Server Model
C
D
100%E[a]E[a]cpu
E[a]C
E[s]cpu
E[s]C
30 I/Os
E[s]CPU=8.333
E[s]C=10.6
ρ CPU =( 2/ s)*250 ms = 500 ms/s =.5
ρ C = 636 / 1000 = .636
WCPU=16.66
WC=29.12
Transaction Time = 30*(16.66+29.12) = 1373.6 ~ 1.4 secondsModeling & Forecasting
CMG-T 2011 Slide 39
What Next?
• Validate the Model with real observations– There could be other effects, such as software queueing
• Now you are ready for the fun part – “What IF”?– The transaction rate doubles? (E[a] increases)– The CPU is upgraded to a faster model (E[s]CPU decreases)
Modeling & Forecasting
CMG-T 2011 Slide 40
Fly in Ointment
• For disk I/Os, the random variables are Uniformly distributed, not exponentially distributed
• Solution – M/G/1
22
2
][
][
2
1
1][
sE
sVarC
mean
stddevC
CsEW
s
s
sq
Modeling & Forecasting
CMG-T 2011 Slide 41
Variance and Queueing
• Exponential Distribution
1
1][;
1][
)(
2
2
s
x
C
xVarxE
exf
• Uniform Distribution
3/1
12
)(][;
2][
0
1)(
2
2
sC
abxVar
abxE
abxf
Modeling & Forecasting
CMG-T 2011 Slide 42
Estimated Waiting Time in Queue - M/G/1
1 2 3 4 5 6 7 8 90
1
2
3
4
5
6
7
8
9
10
Exponential
Uniform
M/D/1
ρ
Modeling & Forecasting
CMG-T 2011 Slide 43
Another Fly…
C
D
Poisson Process?
Modeling & Forecasting
CMG-T 2011 Page 44
Simulation Models
CMG-T 2011
Modeling Using Simulation
• Simulation is not a four letter word– In the past, simulation was considered too CPU intensive
• Simulation emerged as a viable alternative in the 1990s– Fast, cheap CPU cycles (SimCity)– Networking protocols violate a number of traditional
assumptions– These protocols are easily specified
• Various packages are available, from inexpensive (roll your own) to highly sophisticated, with animation
• Best reference – Averill Law
Slide 45Modeling & Forecasting
CMG-T 2011
Back to the “Wayback”..
• 1960 - GPSS developed by Geoffrey Gordon, IBMhttp://www.informs-cs.org/wsc97papers/0567.PDF#search=%22simulation%20GPSS%22
• 1961 – Simula I introduced by K. Nygaard & O. Dahl– It used Algol and extensions to express systems with parallel
processes– Simula 67 is considered to be the first object-oriented
language
• 1980 – MacDougall introduces smpl
• 1986 – Schwetman - Introduces CSIM
Slide 46Modeling & Forecasting
CMG-T 2011 Slide 47
Discrete Event Simulation
• Simulated time– Clock progresses from one event to the next – it can skip
individual time steps
• Parallel Processes– The language specifies independent processes that proceed
in time
Time
Q for Server
Service
Complete
A1 A2
S1 S2
Modeling & Forecasting
CMG-T 2011 Slide 48
A CSIM Example
• void generateTransaction()• /* This is a separate process that creates transactions */• {• int i;• double interArrivalTime;
• create("GenTrans");• interArrivalTime = 1000.0/EATransaction;
• for(i = 1; i <= TotalTransactions; i++) {• hold(exponential(interArrivalTime));
ProcessTrans();• }• set(done);
/* signal "done" event */
• }
Modeling & Forecasting
CMG-T 2011 Slide 49
A CSIM Example
• void ProcessTrans()• /* Each time this is called, an independent process is created that • then requests (and competes for) facilities */• {• TIME t1;• double cpuRequestperIO; • long iosThisTrans;• int ioCount;
• /* Initialize Transaction */• create("ProcessTrans");• t1 = simtime();• cpuRequestperIO = EACPU / IOsPerTrans;• iosThisTrans = (int) uniform(0.0,60.0);/* pick number of I/Os from uniform distribution with mean of 30
*/
• /* Iterate for number of I/Os in transaction */
• for (ioCount = 1; ioCount <= iosThisTrans; ioCount++) {• reserve(cpu); /* reserve cpu */• hold(exponential(cpuRequestperIO)); /* hold cpu */• release(cpu); /* release cpu */• SimulateIO(); /* simluate an I/O */• }• tabulate(totalTransTime, simtime()-t1); /* elapsed time for the transaction */• } /* end of ProcessTrans */
Modeling & Forecasting
CMG-T 2011 Slide 50
A CSIM Example
• void SimulateIO()
• /* simulate a disk I/O on the appropriate disk */• {• double selectDisk;• selectDisk = uniform(0,1);• if (selectDisk <= (ProbDiskC/100.0))• SimulateDiskC();• else• SimulateDiskD();• } /* end of SimulateIO */
Modeling & Forecasting
CMG-T 2011 Slide 51
A CSIM Example
• void SimulateDiskC()
• {• double tempVal;• reserve(diskc);
• tempVal = uniform(0,1)*DiskCFullRot; • hold(tempVal); /* hold for latency */
• tempVal = uniform(0,1); /* determine probability of SEEK */• if (tempVal <= (ProbSeekDiskC/100.0)) /* if Seek, hold for Seek Time */• hold(uniform(0,1)*DiskCSeek*2.0);
• tempVal = uniform(0,((2*8.0*AvgTransDiskC)/DiskCTransRate)); /* hold for transfer time in seconds */
• tempVal = tempVal*1000.0; /* convert to milliseconds */• hold(tempVal); /* hold for transfer */•• release(diskc);• } /* end of SimulateDiskC */
Modeling & Forecasting
CMG-T 2011 Slide 52
Simulation Statistics
• A Simulation shows the results of selecting random numbers from various distributions
• The resulting statistics must be evaluated to take the randomness into consideration
Modeling & Forecasting
CMG-T 2011 Slide 53
CSIM Example
Ending simulation time: 5015467.206Elapsed simulation time: 5015467.206CPU time used (seconds): 1.522
FACILITY SUMMARY
Facility service service through- queue response name time util. put length time -----------------------------------------------------------------cpu fcfs 8.32557 0.487 0.05854 0.84592 14.45151diskc fcfs 10.60914 0.621 0.05853 1.57504 26.90791
E[s]CPU=8.333
E[s]C=10.6
UCPU =( 2/ s)*250 ms = 500 ms/s =.5
UC = 636 / 1000 = .636
WCPU=16.66
WC=29.12Modeling & Forecasting
CMG-T 2011 Slide 54
CSIM Example
confidence intervals for the mean after 9900 observations
level confidence interval rel. error
90 % 1216.357672 +/- 49.942077 = [1166.415594, 1266.299749] 0.042817
95 % 1216.357672 +/- 59.684293 = [1156.673379, 1276.041965] 0.051600
98 % 1216.357672 +/- 71.126979 = [1145.230693, 1287.484651] 0.062107
Transaction Time = 30*(16.66+29.12) = 1373.6 ~ 1.4 seconds
9.8% of the mean
http://www.cmg.org/membersonly/2006/CMG-T/CSim.zip Simulation code -
Modeling & Forecasting
CMG-T 2011 Slide 55
A Non-Trivial Example
• From “It May be Virtual, but the Overhead Isn’t”
http://www.cmg.org/membersonly/2006/CMG-T/VMSim.zip Simulation code -
Modeling & Forecasting
CMG-T 2011 Slide 56
Performance Model
• Central Server Model – Buzen 1971
Modeling & Forecasting
CMG-T 2011 Slide 57
Simplified Model
Modeling & Forecasting
CMG-T 2011 Slide 58
Include Hypervisor
L0 – requests for hypervisorL1 – Arrivals for workload #1Vmq1 – Virtual Machine queuePmq – physical CPU queuePm – physical CPUs
P11 – Prob of wait in hypervisorW1 – Wait in HypervisorP12 – Prob of I/OIO1 – I/O Time
Modeling & Forecasting
CMG-T 2011 Slide 59
A Two VM Model
Modeling & Forecasting
CMG-T 2011 Page 60
Happy Modeling!
Questions???
One More Hint – As Picasso used to say:Never fall in love with your model
CMG-T 2011 Slide 61
Bibliography
• Allen, Arnold, Probability, Statistics, and Queueing Theory, Academic Press, 1978.
• Barnett, Brian, “An Introduction to Time Series Forecasting for CPE”, CMG1991 Proceedings.
• W. Feller, An Introduction to Probability and its Applications, Vol II, 2nd edition, Wiley, 1971.
• Graham, Denning, Buzen, Rose, Chandy, Sauer, Bard, Wong, Muntz, Acm computing surveys – “Special Issue: Queueing Network Models of Computer System Performance
• Kleinrock, L, Queueing Systems, Wiley, 1975.
• Scherr, A. L, An analysis of Time Shared computer systems, MIT Press, 1967.
• Buzen, “Queueing Network Models of multiprogramming, PhD Thesis, Harvard University.
Modeling & Forecasting
CMG-T 2011 Slide 62
Bibliography
• Jackson, J. R, “Networks of Waiting Lines”, Operational Research 5, 1957
• Gordon, W. J, and Newell, G. F., “Closed queueing systems with exponential servers, Operational Research, 1967
• Law, Averil, Simulation Modeling and Analysis, McGraw-Hill, 2001• MacDougall, M. H, “Simulating Computer Systems, MIT Press, 1985• MacDougall,M. H. SMPL - A Simple Portable Simulation Language,
Amdahl Corp, Technical Report April 1, 1980• Salsburg, Michael, “It May be Virtual but the Overhead Isn’t”, CMG2006
Proceedings.
Modeling & Forecasting
CMG-T 2011 Slide 63
Bibliography
• Schwetman, H. D, CSIMt: A C-BASED, PROCESS-ORIENTED SIMULATION LANGUAGE, Proceedings of the 1986 Winter Simulation Conference - http://delivery.acm.org/10.1145/320000/318464/p387-schwetman.pdf?key1=318464&key2=7776578511&coll=&dl=acm&CFID=15151515&CFTOKEN=6184618
• Schwetman, H. D, CSIM17 User's Guide (local WWW copy), Mesquite Software, Inc., 1995 - http://www.cs.helsinki.fi/u/kerola/mesquite.com/1_intro.html
Modeling & Forecasting