Performance Evaluation of Computer Systems: Introduction


Outline

- Introduction to performance evaluation
- Objectives of performance evaluation
- Techniques of performance evaluation
- Metrics in performance evaluation

Introduction

- Computer system users, administrators, and designers are all interested in performance evaluation.
- The goal of system performance evaluation is to provide the highest performance at the lowest cost.
- Computer performance evaluation plays an important role in the selection of computer systems, the design of systems and applications, and the analysis of existing systems.

Objectives of Performance Study

- Evaluating design alternatives (system design)
- Comparing two or more systems (system selection)
- Determining the optimal value of a parameter (system tuning)
- Finding the performance bottleneck (bottleneck identification)
- Characterizing the load on the system (workload characterization)
- Determining the number and sizes of components (capacity planning)
- Predicting the performance at future loads (forecasting)

Basic Terms

System: Any collection of hardware, software, and networks.

Metrics: Criteria used to analyze the performance of the system or its components.

Workloads: The requests made by the users of the system.

Performance Evaluation Activities

Performance evaluation of a system can be done at different stages of system development.

- System in the planning and design stage
  - Use high-level models to obtain performance estimates for alternative system configurations and alternative designs.
- System is operational
  - Measure the system behavior with a view to improving performance.
  - Develop a validated model that can be used for performance prediction and capacity planning.

Techniques for Performance Evaluation

- Performance measurement
  - Obtain measurement data by observing the events and activities on an existing system.
- Performance modeling
  - Represent the system by a model and manipulate the model to obtain information about system performance.

Performance Measurement

- Measure the performance directly on a system.
- Need to characterize the workload placed on the system during measurement.
- Generally provides the most valid results.
- Nevertheless, not very flexible
  - It may be difficult (or even impossible) to vary some workload parameters.

Performance Modeling

- Model
  - An abstraction of the system obtained by making a set of assumptions about how the system works
  - Captures the essential characteristics of the system
- Reasons for using models
  - Experimenting with the real system may be too costly, too risky, or too disruptive to system operation.
  - The system may only be in the design stage.

Performance Modeling

- Workload characterization
  - Capture the resource demands and intensity of the load brought to the system.
- Performance metrics
  - The measures of interest, such as mean response time, the number of transactions completed per second, or the ratio of blocked connection requests.

Performance Modeling

- Solution methods
  - Analytic modeling
  - Simulation modeling


Analytic Modeling

Mathematical methods are used to obtain solutions to the performance measures of interest

Numerical results are easy to compute if a simple analytic solution is available

Useful approach when one only needs rough estimates of performance measures

Solutions to complex models may be difficult to obtain


Simulation Modeling

Develop a simulation program that implements the model

Run the simulation program and use the data collected to estimate the performance measures of interest

A system can be studied at an arbitrary level of detail

It may be costly to develop and run the simulation program
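As a minimal sketch of what such a simulation program can look like, the Python code below simulates a single-server, first-come-first-served queue and estimates the mean response time. The exponential interarrival and service times, the rate values, and the names `simulate_queue`, `arrival_rate`, and `service_rate` are assumptions made for illustration; they do not come from the slides.

```python
import random


def simulate_queue(arrival_rate, service_rate, num_jobs, seed=0):
    """Simulate a single-server FIFO queue and return the mean response time.

    Assumption: interarrival and service times are exponentially distributed.
    """
    rng = random.Random(seed)
    arrival_time = 0.0      # arrival time of the current job
    server_free_at = 0.0    # time at which the server finishes the previous job
    total_response = 0.0

    for _ in range(num_jobs):
        arrival_time += rng.expovariate(arrival_rate)
        start = max(arrival_time, server_free_at)   # wait if the server is busy
        server_free_at = start + rng.expovariate(service_rate)
        total_response += server_free_at - arrival_time

    return total_response / num_jobs


if __name__ == "__main__":
    estimate = simulate_queue(arrival_rate=0.5, service_rate=1.0, num_jobs=200_000)
    print(f"Estimated mean response time: {estimate:.3f}")
```

With these rates the estimate should land near 2 time units, the value given by the standard M/M/1 formula 1/(service rate - arrival rate), although the exact figure depends on the seed and the number of simulated jobs.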

Stochastic Model

- The model contains some random input components, which are characterized by probability distributions, e.g., the time between arrivals to a system modeled by an exponential distribution.
- The output is also random, and provides probability distributions of the performance measures of interest.
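To illustrate a random input component, the sketch below draws interarrival times from an exponential distribution with Python's standard `random` module. The rate of 4 arrivals per second and the helper name `sample_interarrivals` are hypothetical choices.

```python
import random
import statistics


def sample_interarrivals(rate, count, seed=0):
    """Return `count` exponentially distributed interarrival times (mean 1/rate)."""
    rng = random.Random(seed)
    return [rng.expovariate(rate) for _ in range(count)]


if __name__ == "__main__":
    gaps = sample_interarrivals(rate=4.0, count=50_000)
    # The sample mean should be close to 1/rate = 0.25 but changes with the seed,
    # which is why the outputs of a stochastic model are themselves random.
    print(f"sample mean gap: {statistics.mean(gaps):.4f}")
```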


Queuing Model

The most commonly used model to analyze the performance of computer systems and networks.

Single queue: models a component of the overall system, such as a CPU, disk, or communication channel

Network of queues: models system components and their interaction.
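As a small worked instance of a single-queue model, the sketch below evaluates the standard closed-form results for the M/M/1 queue (Poisson arrivals, exponential service times, one server). The slides do not name a specific queue, so the M/M/1 choice, the function name `mm1_metrics`, and the example rates are assumptions.

```python
def mm1_metrics(arrival_rate, service_rate):
    """Steady-state metrics of an M/M/1 queue.

    Requires arrival_rate < service_rate, otherwise the queue is unstable.
    """
    if not 0 < arrival_rate < service_rate:
        raise ValueError("need 0 < arrival_rate < service_rate for a stable queue")
    utilization = arrival_rate / service_rate              # rho
    mean_jobs = utilization / (1.0 - utilization)          # mean number in system
    mean_response = 1.0 / (service_rate - arrival_rate)    # mean time in system
    return {
        "utilization": utilization,
        "mean_jobs_in_system": mean_jobs,
        "mean_response_time": mean_response,
        "throughput": arrival_rate,  # in steady state, completions match arrivals
    }


if __name__ == "__main__":
    # Hypothetical disk: 80 requests/s arriving, 100 requests/s service capacity.
    for name, value in mm1_metrics(80.0, 100.0).items():
        print(f"{name}: {value:.4f}")
```

A closed form like this is an example of an analytic solution that is cheap to evaluate; a network of queues generally needs more elaborate methods or simulation.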


Steps in Performance Modeling

Commonly Used Performance Metrics

- Response Time
  - Turnaround time
  - Reaction time
  - Stretch factor
- Throughput
  - Operations/second
    - Jobs per second
    - Requests per second
    - Millions of Instructions Per Second (MIPS)
    - Millions of Floating Point Operations Per Second (MFLOPS)
    - Packets Per Second (PPS)
    - Bits per second (bps)
    - Transactions Per Second (TPS)
- Efficiency
- Utilization

Commonly Used Performance Metrics (Cont…)

- Reliability
  - R(t)
  - MTTF
- Availability
  - Mean Time to Failure (MTTF)
  - Mean Time to Repair (MTTR)
  - MTTF/(MTTF+MTTR)

Response Time

Interval between the user's request and the system's response.

[Figure: timeline showing the user's request followed by the system's response]

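Response Time (cont…)

- Can have two measures of response time.
- Both are acceptable, but the second is preferred if execution takes long.

[Figure: timeline of events in order: user starts request, user finishes request, system starts execution, system starts response, system finishes response. Marked intervals, all starting when the user finishes the request: reaction time (until the system starts execution), response time 1 (until the system starts its response), response time 2 (until the system finishes its response).]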

Response Time (cont…)

- Turnaround time: time between submission of a job and completion of output
  - For batch job systems
- Reaction time: time between submission of a request and beginning of execution
  - Usually need to measure inside the system since nothing is externally visible
- Stretch factor: ratio of response time at a given load to response time at minimal load
  - Most systems have higher response time as load increases
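The three definitions above can be sketched directly from timestamps. The `RequestTimes` record, its field names, and the sample values below are hypothetical; a real system would obtain these timestamps from its own instrumentation.

```python
from dataclasses import dataclass


@dataclass
class RequestTimes:
    """Timestamps (in seconds) for one request; field names are hypothetical."""
    submitted: float         # user finishes submitting the request
    execution_start: float   # system starts executing it
    output_complete: float   # system finishes producing the output


def reaction_time(r: RequestTimes) -> float:
    """Time between submission of the request and beginning of execution."""
    return r.execution_start - r.submitted


def turnaround_time(r: RequestTimes) -> float:
    """Time between submission of the job and completion of output."""
    return r.output_complete - r.submitted


def stretch_factor(response_at_load: float, response_at_min_load: float) -> float:
    """Ratio of response time at a given load to response time at minimal load."""
    return response_at_load / response_at_min_load


if __name__ == "__main__":
    job = RequestTimes(submitted=10.0, execution_start=10.5, output_complete=14.0)
    print("reaction time:", reaction_time(job))
    print("turnaround time:", turnaround_time(job))
    print("stretch factor:", stretch_factor(response_at_load=6.0, response_at_min_load=2.0))
```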

Throughput

Rate at which requests can be serviced by the system (requests per unit time).
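A minimal sketch of measuring throughput over an observation window is shown below. It counts completed requests rather than estimating the maximum service rate, and the function name `throughput` and the timestamps are hypothetical.

```python
def throughput(completion_times, window_start, window_end):
    """Requests completed per unit time within [window_start, window_end)."""
    if window_end <= window_start:
        raise ValueError("window_end must be greater than window_start")
    completed = sum(1 for t in completion_times if window_start <= t < window_end)
    return completed / (window_end - window_start)


if __name__ == "__main__":
    # Hypothetical completion timestamps in seconds.
    completions = [0.4, 1.1, 1.9, 2.5, 3.2, 3.8, 4.6, 5.3]
    print(throughput(completions, window_start=0.0, window_end=4.0), "requests/second")
```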

Efficiency

- Ratio of maximum achievable throughput (e.g., 9.8 Mbps) to nominal capacity (e.g., 10 Mbps): 98%
- For multiprocessor systems, ratio of the performance of an n-processor system to that of a one-processor system (in MIPS or MFLOPS)

[Figure: efficiency versus number of processors]
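The link example above works out as follows in a short sketch; the function name `efficiency` is a hypothetical helper, and both arguments must use the same unit.

```python
def efficiency(achieved_throughput, nominal_capacity):
    """Fraction of the nominal capacity that is actually achievable."""
    if nominal_capacity <= 0:
        raise ValueError("nominal_capacity must be positive")
    return achieved_throughput / nominal_capacity


if __name__ == "__main__":
    # The slide's example: 9.8 Mbps achievable on a nominal 10 Mbps link.
    print(f"efficiency: {efficiency(9.8, 10.0):.0%}")
```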

Utilization

- Typically, the fraction of time the resource is busy serving requests
  - Time not being used is idle time
- System managers often want to balance resources to have the same utilization
  - Ex: equal load on CPUs
  - But this may not be possible. Ex: CPU when I/O is the bottleneck
- May not be time-based
  - Processors: busy / total
  - Memory: fraction used / total
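For the time-based case, utilization can be computed from a log of busy periods. The sketch below assumes non-overlapping busy intervals; the function name `utilization` and the sample intervals are hypothetical.

```python
def utilization(busy_intervals, observation_start, observation_end):
    """Fraction of the observation period during which the resource was busy.

    Assumes the busy intervals do not overlap each other.
    """
    total = observation_end - observation_start
    if total <= 0:
        raise ValueError("observation period must have positive length")
    busy = 0.0
    for start, end in busy_intervals:
        # Clip each interval to the observation period before adding it up.
        clipped_start = max(start, observation_start)
        clipped_end = min(end, observation_end)
        busy += max(0.0, clipped_end - clipped_start)
    return busy / total


if __name__ == "__main__":
    # Hypothetical CPU busy periods (seconds) observed over a 10-second window.
    cpu_busy = [(0.0, 2.0), (3.0, 4.5), (6.0, 9.0)]
    u = utilization(cpu_busy, 0.0, 10.0)
    print(f"utilization = {u:.2f}, idle fraction = {1 - u:.2f}")
```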

Miscellaneous Metrics

- Reliability
  - Probability of errors or mean time between errors (error-free seconds)
- Availability
  - Fraction of time the system is available to service requests (the fraction not available is downtime)
  - Mean Time To Failure (MTTF) is the mean uptime
    - Useful, since even when availability is high (downtime is small), failures may still be frequent, which is bad for long requests

Definition of Reliability

Recommendation E.800 of the International Telecommunication Union (ITU-T) defines reliability as follows:

“The ability of an item to perform a required function under given conditions for a given time interval.”

In this definition, an item may be a circuit board, a component on a circuit board, a module consisting of several circuit boards, a base transceiver station with several modules, a fiber-optic transport system, or a mobile switching center (MSC) and all its subtending network elements. The definition includes systems with software.

Basic Definitions of Reliability

- X: time to failure of a system
- F(t): distribution function of system lifetime
- f(t): density function of system lifetime
- Reliability R(t):

$$R(t) = P(X > t) = 1 - F(t)$$

- Mean Time To system Failure:

$$MTTF = E[X] = \int_0^\infty t\, f(t)\, dt = \int_0^\infty R(t)\, dt$$
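As a numerical check of the MTTF definition as the integral of R(t), the sketch below assumes an exponentially distributed lifetime, for which R(t) = exp(-rate * t) and the closed-form MTTF is 1/rate. The exponential assumption, the failure rate, and the function names are illustrative only.

```python
import math


def reliability_exponential(t, failure_rate):
    """R(t) = 1 - F(t) for an exponentially distributed lifetime (an assumption)."""
    return math.exp(-failure_rate * t)


def mttf_by_integration(reliability, upper_limit, steps=100_000):
    """Approximate MTTF as the integral of R(t) over [0, upper_limit] (trapezoidal rule)."""
    h = upper_limit / steps
    total = 0.5 * (reliability(0.0) + reliability(upper_limit))
    for i in range(1, steps):
        total += reliability(i * h)
    return total * h


if __name__ == "__main__":
    failure_rate = 0.001  # hypothetical: one failure per 1000 hours on average
    mttf = mttf_by_integration(
        lambda t: reliability_exponential(t, failure_rate), upper_limit=50_000.0
    )
    print(f"numerical MTTF = {mttf:.1f} hours, closed form 1/rate = {1 / failure_rate:.1f} hours")
```

The upper limit stands in for infinity, so it has to be large enough that R(t) is negligible beyond it.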


Definition of Availability

Availability is closely related to reliability, and is also defined in ITU-T Recommendation E.800 as follows:

"The ability of an item to be in a state to perform a required function at a given instant of time or at any instant of time within a given time interval, assuming that the external resources, if required, are provided."

An important difference between reliability and availability is that reliability refers to failure-free operation during an interval, while availability refers to failure-free operation at a given instant of time, usually the time when a device or system is first accessed to provide a required function or service.

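Availability (Cont…)

Instantaneous (point) availability A(t):

$$A(t) = P(\text{system working at } t)$$

Let g(t) be the density function of system repair time (with distribution function G), and let H(t) be the convolution of F and G:

$$H(t) = \int_0^t F(t - x)\, g(x)\, dx$$

Then:

$$A(t) = R(t) + \int_0^t A(t - x)\, dH(x) \qquad (1)$$

Instantaneous availability is at least as large as reliability:

$$A(t) \ge R(t)$$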

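System working at time t, by one of two routes:

- Never failed in (0, t), with probability R(t)
- First failed and got repaired at time x < t, and is up at the end of the interval (x, t), with probability

$$\int_0^t A(t - x)\, dH(x)$$

[Figure: timeline from 0 to t; the first repair is completed in the interval (x, x + dx)]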
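For a concrete case, assume both the time to failure and the repair time are exponentially distributed, with rates `failure_rate` and `repair_rate`, and that the system is up at t = 0. The point availability then has a well-known closed form, which the sketch below compares with the reliability. The exponential assumption, the rate values, and the function names are not from the slides.

```python
import math


def point_availability(t, failure_rate, repair_rate):
    """A(t) for exponential failure and repair times, with the system up at t = 0."""
    total = failure_rate + repair_rate
    return repair_rate / total + (failure_rate / total) * math.exp(-total * t)


def reliability(t, failure_rate):
    """R(t) for an exponentially distributed time to failure."""
    return math.exp(-failure_rate * t)


if __name__ == "__main__":
    lam, mu = 0.01, 0.5  # hypothetical failure and repair rates per hour
    for t in (1.0, 10.0, 100.0, 1000.0):
        a = point_availability(t, lam, mu)
        r = reliability(t, lam)
        print(f"t={t:7.1f}  A(t)={a:.6f}  R(t)={r:.6f}")
```

In this case A(t) stays above R(t) and levels off at repair_rate / (failure_rate + repair_rate), while R(t) keeps decaying toward zero.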

MTTR: Mean Time to Repair

Y: repair period of the system

$$MTTR = E[Y] = \int_0^\infty t\, g(t)\, dt$$

Availability and Reliability are related but different!

We can show from equation (1) that the steady-state availability is:

$$A_{SS} = \frac{MTTF}{MTTF + MTTR}$$

Also:

$$\text{downtime} = (1 - A_{SS}) \times 8760 \times 60 \quad \text{(in minutes per year)}$$
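The two formulas above can be evaluated with a few lines of Python. The MTTF and MTTR values in the example are hypothetical, as are the function names.

```python
def steady_state_availability(mttf, mttr):
    """A_SS = MTTF / (MTTF + MTTR); both arguments in the same time unit."""
    return mttf / (mttf + mttr)


def downtime_minutes_per_year(availability):
    """Expected downtime in minutes per year: (1 - A_SS) * 8760 * 60."""
    return (1.0 - availability) * 8760 * 60


if __name__ == "__main__":
    # Hypothetical system: fails every 2000 hours on average, 4 hours to repair.
    a_ss = steady_state_availability(mttf=2000.0, mttr=4.0)
    print(f"A_SS = {a_ss:.5f}")
    print(f"downtime = {downtime_minutes_per_year(a_ss):.1f} minutes/year")
    # An availability of 0.99999 ("five nines") for comparison.
    print(f"downtime at 0.99999 = {downtime_minutes_per_year(0.99999):.2f} minutes/year")
```

The factor 8760 is the number of hours in a 365-day year, so the result is expressed in minutes per year.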

High Reliability/Availability/Safety

- Traditional applications (long-life/life-critical/safety-critical)
  - Space missions, aircraft control, defense, nuclear systems
- New applications (non-life-critical/non-safety-critical, business critical)
  - Banking, airline reservation, e-commerce applications, web-hosting, telecommunication
- Scientific applications (non-critical)


Motivation – High Availability

IFIP WG10.4

- Failure occurs when the delivered service no longer complies with the specification
- Error is that part of the system state which is liable to lead to subsequent failure
- Fault is the adjudged or hypothesized cause of an error

Faults are the cause of errors that may lead to failures:

Fault → Error → Failure


Three Rules of Validation

Do not trust the results of a simulation model until they have been validated by analytical modeling or measurements.

Do not trust the results of an analytical model until they have been validated by a simulation model or measurements.

Do not trust the results of a measurement until they have been validated by simulation or analytical modeling.
