Performance. What is performance? How to measure performance? Performance metrics Performance evaluation Why does some hardware perform better than others

Performance

• What is performance?
• How to measure performance?
• Performance metrics
• Performance evaluation
• Why does some hardware perform better than others for different programs?
• What factors in hardware are related to performance?
• How does the machine's instruction set affect performance?

Airplane Analogy

Which of these airplanes has the best performance?

Airplane           Passenger   Range     Speed      Passenger throughput
                   capacity    (miles)   (m.p.h.)   (passengers × m.p.h.)
Boeing 777         375         4630      610        228750
Boeing 747         470         4150      610        286700
Concorde           132         4000      1350       178200
Douglas DC-8-50    146         8720      544        79424
Airbus A3xx        656         8400      600        393600
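The answer depends on which metric is chosen. A minimal sketch that recomputes passenger throughput as capacity × speed and picks the winner for each metric; the names `AIRPLANES`, `passenger_throughput`, and `best_by` are illustrative and not from the slides.

```python
# Rows copied from the airplane table:
# (airplane, passenger capacity, range in miles, speed in m.p.h.)
AIRPLANES = [
    ("Boeing 777", 375, 4630, 610),
    ("Boeing 747", 470, 4150, 610),
    ("Concorde", 132, 4000, 1350),
    ("Douglas DC-8-50", 146, 8720, 544),
    ("Airbus A3xx", 656, 8400, 600),
]


def passenger_throughput(capacity, speed_mph):
    """Passenger throughput in passengers x m.p.h."""
    return capacity * speed_mph


def best_by(metric):
    """Name of the airplane that maximizes the given metric (a function of a row)."""
    return max(AIRPLANES, key=metric)[0]


winners = {
    "speed": best_by(lambda row: row[3]),
    "range": best_by(lambda row: row[2]),
    "capacity": best_by(lambda row: row[1]),
    "throughput": best_by(lambda row: passenger_throughput(row[1], row[3])),
}

for metric_name, airplane in winners.items():
    print(f"best {metric_name}: {airplane}")
```

Different metrics pick different airplanes, which is the point of the analogy: "best performance" is only meaningful once the metric is fixed.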

Computer Performance

Response time (latency)
• How long does it take for my job to run?
• How long does it take to execute a program?
• How long must I wait for a database query?

Throughput
• How many jobs can the machine run at once?
• What is the average execution rate?
• How much work is getting done?

If we upgrade the processor of a machine, which metric do we improve?

If we add a new machine to a network, which metric do we improve?

Which Time to Measure?

Elapsed time (wall-clock time, response time)
• Counts everything (disk and memory access, I/O, operating system overhead, work on other processes)
• Useful, but not always good for comparison purposes

CPU (execution) time
• The time the CPU spends computing for the user task
• Does not include time spent waiting for I/O or running other programs
• User CPU time: CPU time spent within the program
• System CPU time: CPU time spent in the operating system performing tasks on behalf of the program

CPU Time

The Unix time command reflects this breakdown. For example, it may report:

90.7u 12.9s 2:39 65%

Interpretation:
• User CPU time is 90.7 s
• System CPU time is 12.9 s
• Elapsed time is 2 min 39 s (159 s)
• CPU time is 65% of the total elapsed time: (90.7 + 12.9) / 159 ≈ 0.65
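The 65% figure follows from the other three fields. A small sketch of that arithmetic, assuming the elapsed field is in minutes:seconds form; `parse_elapsed` and `cpu_fraction` are hypothetical helper names, not part of any Unix tool.

```python
def parse_elapsed(text):
    """Convert an 'm:ss' elapsed-time field such as '2:39' to seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + float(seconds)


def cpu_fraction(user_s, system_s, elapsed_s):
    """Fraction of elapsed time that the CPU spent on this program."""
    return (user_s + system_s) / elapsed_s


elapsed_s = parse_elapsed("2:39")                # 159 seconds
fraction = cpu_fraction(90.7, 12.9, elapsed_s)   # (90.7 + 12.9) / 159

print(f"elapsed = {elapsed_s:.0f} s, CPU share = {fraction:.0%}")
```

The remaining 35% of the elapsed time went to waiting for I/O and to other programs.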

A Definition of Performance

For some program running on machine X:

$\text{Performance}_X = 1 / \text{Execution\_time}_X$

Machine X is said to be “n times faster” than machine Y if

$\text{Performance}_X / \text{Performance}_Y = \text{Execution\_time}_Y / \text{Execution\_time}_X = n$

Example: Machine A runs a program in 10 seconds and machine B runs the same program in 15 seconds. How much faster is A than B?
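The example can be worked directly from the definition. A minimal sketch; the function names `performance` and `times_faster` are illustrative.

```python
def performance(execution_time_s):
    """Performance is defined as the reciprocal of execution time."""
    return 1.0 / execution_time_s


def times_faster(time_x_s, time_y_s):
    """n such that machine X is 'n times faster' than machine Y: n = time_Y / time_X."""
    return time_y_s / time_x_s


# Machine A: 10 s, machine B: 15 s.
n = times_faster(10.0, 15.0)
ratio_of_performances = performance(10.0) / performance(15.0)

print(f"A is {n} times faster than B")
```

Both routes give the same ratio: A is 1.5 times faster than B.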

Metrics of Performance

“Time to execute a program” is the ultimate metric of performance.

However, it is convenient to inspect other metrics as well when we examine the details of a machine.

Computers use a clock that runs at a constant rate and determines when an event takes place in hardware.

These discrete time intervals are called clock cycles (or ticks, clock ticks, clock periods).

Clock rate (frequency) is the inverse of clock period.

Clock Cycles

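Instead of reporting execution time in seconds, we often use cycles:

$\dfrac{\text{seconds}}{\text{program}} = \dfrac{\text{cycles}}{\text{program}} \times \dfrac{\text{seconds}}{\text{cycle}}$

• Clock “ticks” indicate when to start activities
• Events often start on the rising edge of the clock

(Figure: clock signal along a time axis.)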

Clock Cycle

• Cycle time (CT) = time between ticks = seconds per cycle
• Cycle count (CC) = the number of clock cycles needed to execute a program
• Clock rate (frequency) = cycles per second (1 Hz = 1 cycle/sec)
• A 200 MHz clock has a $1/(200 \times 10^6)$ s = ? nanosecond cycle time
• A 4 GHz clock has a $1/(4 \times 10^9)$ s = ? nanosecond cycle time
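The two cycle-time questions are a direct application of "cycle time = 1 / clock rate". A short sketch, with `cycle_time_ns` as an illustrative helper name:

```python
def cycle_time_ns(clock_rate_hz):
    """Clock cycle time in nanoseconds for a clock rate given in Hz."""
    seconds_per_cycle = 1.0 / clock_rate_hz
    return seconds_per_cycle * 1e9


ct_200mhz = cycle_time_ns(200e6)  # 200 MHz clock
ct_4ghz = cycle_time_ns(4e9)      # 4 GHz clock

print(f"200 MHz: {ct_200mhz:.2f} ns per cycle")
print(f"4 GHz:   {ct_4ghz:.2f} ns per cycle")
```

This gives 5 ns for the 200 MHz clock and 0.25 ns for the 4 GHz clock.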

The CPI Metric

CPI: clock cycles per instruction
• The average number of cycles spent on an instruction.
• $CC = IC \times CPI$
• Hard to compute.
• Useful when comparing the performance of two machines with the same ISA. (Why?)

Example: two machines with the same ISA. For a certain program we have
• Machine A: CPI = 2.0
• Machine B: CPI = 1.2

Which machine is faster?

What if machine A uses a 250 ps cycle time and machine B a 500 ps cycle time?
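One way to work the example, as a sketch under a stated assumption: because both machines share the ISA and run the same program, the instruction count is taken to be identical, so comparing the average time per instruction (CPI × cycle time) is enough. The helper name `time_per_instruction_ps` is illustrative.

```python
def time_per_instruction_ps(cpi, cycle_time_ps):
    """Average time per instruction in picoseconds: CPI x cycle time."""
    return cpi * cycle_time_ps


# Assumption: same ISA and same program, so IC is the same on both machines.
a_ps = time_per_instruction_ps(2.0, 250.0)  # machine A
b_ps = time_per_instruction_ps(1.2, 500.0)  # machine B

# Execution time = IC x (time per instruction), so IC cancels in the ratio.
speedup_a_over_b = b_ps / a_ps

print(f"A: {a_ps:.0f} ps/instr, B: {b_ps:.0f} ps/instr")
print(f"A is {speedup_a_over_b:.2f} times faster than B")
```

With equal cycle times the lower CPI of machine B would win; once the cycle times are 250 ps and 500 ps, machine A comes out 1.2 times faster. CPI alone does not decide the comparison.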

Improving Performance

So, to improve performance:
1. Increase the clock frequency (i.e., decrease the clock period)
2. Reduce the number of clock cycles per program ($IC \times CPI$)

$\dfrac{\text{seconds}}{\text{program}} = \dfrac{\text{cycles}}{\text{program}} \times \dfrac{\text{seconds}}{\text{cycle}}$

Instruction = Cycle?

• No!
• The number of cycles per instruction depends on the implementation of the instructions in hardware

• The number differs for each processor (even with the same ISA)

The Reason

Operations take different numbers of cycles:
• Multiplication takes longer than addition
• Floating-point operations take longer than integer operations
• The access time to a register is much shorter than the access time to main memory

Simple Formulae for CPU Time

$\text{CPU execution time} = \text{CPU clock cycles for a program} \times \text{Clock cycle time}$ (that is, $CC \times CT$)

$\text{CPU execution time} = \text{CPU clock cycles for a program} / \text{Clock rate}$

We can write

$\text{CPU clock cycles for a program} = IC \times CPI$

Then

$\text{CPU execution time} = (IC \times CPI) / \text{Clock rate}$

Example

Computer A runs at 800 MHz
• It runs our favorite program in 15 s

Our goal
• Design computer B with the same ISA
• It will run the same program in 8 s
• We may use a new process technology (>GHz) that can increase the clock rate; however, it will also increase CPI by a factor of 1.25

What clock rate should we aim to use?
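A sketch of one way to solve this, assuming "increase CPI by 1.25" means CPI is multiplied by 1.25 (an additive reading would need A's CPI, which is not given). Since the ISA and program are the same, IC is unchanged, so B needs 1.25 times as many cycles as A. The function name is illustrative.

```python
def required_clock_rate_hz(time_a_s, clock_rate_a_hz, cpi_factor, target_time_s):
    """Clock rate machine B needs to hit target_time_s.

    Assumes the same instruction count on both machines and that B's CPI
    is cpi_factor times A's CPI.
    """
    cycles_a = time_a_s * clock_rate_a_hz   # CC = time x clock rate
    cycles_b = cpi_factor * cycles_a        # same IC, CPI scaled by cpi_factor
    return cycles_b / target_time_s         # clock rate = CC / time


rate_b_hz = required_clock_rate_hz(
    time_a_s=15.0, clock_rate_a_hz=800e6, cpi_factor=1.25, target_time_s=8.0
)

print(f"target clock rate for B: {rate_b_hz / 1e9:.3f} GHz")
```

Under that assumption, A executes 12 × 10^9 cycles, B needs 15 × 10^9 cycles, and finishing in 8 s requires a clock of about 1.875 GHz.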

Performance

Performance is determined by execution time (CPU time).

We also have other indicators:
• # of cycles to execute a program
• # of instructions in a program (IC)
• # of cycles per second
• average # of cycles per instruction (CPI)
• average # of instructions per second

Common pitfall: thinking one of the above is indicative of performance when it really isn’t.

Number of Instructions Example

A compiler designer has the following two alternatives to generate a certain piece of code with instructions A (1 cycle), B (2 cycles), and C (3 cycles):

1. $2 \times 10^6$ of A, $10^6$ of B, and $2 \times 10^6$ of C ($IC = 5 \times 10^6$)
2. $4 \times 10^6$ of A, $10^6$ of B, and $10^6$ of C ($IC = 6 \times 10^6$)

• Which code sequence is faster?
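The comparison comes down to total cycle counts, since both sequences run on the same machine with the same clock. A minimal sketch; the dictionary layout and function names are illustrative choices.

```python
# Cycles needed by each instruction class, as given in the example.
CYCLES_PER_CLASS = {"A": 1, "B": 2, "C": 3}


def instruction_count(counts):
    """Total number of instructions (IC) in a code sequence."""
    return sum(counts.values())


def total_cycles(counts):
    """Total clock cycles (CC): sum over classes of count x cycles per class."""
    return sum(CYCLES_PER_CLASS[cls] * n for cls, n in counts.items())


def cpi(counts):
    """Average cycles per instruction: CC / IC."""
    return total_cycles(counts) / instruction_count(counts)


seq1 = {"A": 2_000_000, "B": 1_000_000, "C": 2_000_000}
seq2 = {"A": 4_000_000, "B": 1_000_000, "C": 1_000_000}

for name, seq in [("sequence 1", seq1), ("sequence 2", seq2)]:
    print(f"{name}: IC = {instruction_count(seq)}, "
          f"CC = {total_cycles(seq)}, CPI = {cpi(seq):.2f}")
```

Sequence 2 executes more instructions (6 × 10^6 versus 5 × 10^6) but needs fewer cycles (9 × 10^6 versus 10 × 10^6), so it is the faster one. Instruction count alone is misleading.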

The MIPS Metric

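MIPS = Millions of Instructions Per Second

$\text{MIPS} = IC / (\text{Execution\_time} \times 10^6)$

$\text{MIPS} = IC / (CC \times \text{cycle time} \times 10^6)$

$\text{MIPS} = (IC \times \text{clock rate}) / (IC \times CPI \times 10^6)$

$\text{MIPS} = \text{clock rate} / (CPI \times 10^6)$

• A faster machine has a higher MIPS

$\text{Execution\_time} = IC / (\text{MIPS} \times 10^6)$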

A MIPS Example

A computer with a 500 MHz clock

Three different classes of instructions: A (1 cycle), B (2 cycles), C (3 cycles)

Two compilers are used to produce code for a large piece of software.
• Compiler 1: 5 billion A, 1 billion B, and 1 billion C instructions.
• Compiler 2: 10 billion A, 1 billion B, and 1 billion C instructions.

Which sequence will be faster according to MIPS?

Which sequence will be faster according to execution time?
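Both questions can be answered with the formulas for execution time (CC / clock rate) and MIPS (IC / (execution time × 10^6)). A sketch with illustrative names:

```python
CLOCK_RATE_HZ = 500e6                        # 500 MHz clock
CYCLES_PER_CLASS = {"A": 1, "B": 2, "C": 3}  # cycles per instruction class


def execution_time_s(counts):
    """Execution time = total cycles / clock rate."""
    cycles = sum(CYCLES_PER_CLASS[cls] * n for cls, n in counts.items())
    return cycles / CLOCK_RATE_HZ


def mips(counts):
    """MIPS = IC / (execution time x 10^6)."""
    ic = sum(counts.values())
    return ic / (execution_time_s(counts) * 1e6)


compiler1 = {"A": 5_000_000_000, "B": 1_000_000_000, "C": 1_000_000_000}
compiler2 = {"A": 10_000_000_000, "B": 1_000_000_000, "C": 1_000_000_000}

for name, code in [("compiler 1", compiler1), ("compiler 2", compiler2)]:
    print(f"{name}: {execution_time_s(code):.0f} s, {mips(code):.0f} MIPS")
```

Compiler 2's code has the higher MIPS rating (400 versus 350), yet compiler 1's code finishes sooner (20 s versus 30 s). Ranking by MIPS and ranking by execution time disagree here.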

CPI example

CPI
• Machine A: CPI = 10/7 = 1.43
• Machine B: CPI = 15/12 = 1.25

CPU time
• $\text{CPU time} = (IC \times CPI) / \text{clock rate}$

CPI changes according to instruction mix and freq.

When multiplied by the instruction count and the clock cycle time, CPI gives an accurate execution time.

Problems of MIPS

• MIPS specifies the instruction execution rate.
• MIPS does not take into account the capabilities of the instructions. Thus, it is impossible to compare computers with different ISAs using MIPS.
• MIPS is not constant, even on a single machine; it depends on the application.
• As we saw in the previous example, MIPS can vary inversely with performance.

Overview

A given program will require
1. Some number of instructions
2. Some number of clock cycles
3. Some number of seconds

Vocabulary
• Cycle time: (micro- or nano-) seconds per cycle
• Clock rate (frequency): cycles per second
• CPI: clock cycles per instruction
• MIPS: millions of instructions per second
• MFLOPS: millions of floating-point operations per second

Performance

Performance is ultimately determined by EXECUTION TIME.

Is any of the following metrics good for measuring performance by itself? Why?
• # of cycles to execute a program
• # of instructions in a program
• # of cycles per second
• Average # of cycles per instruction
• Average # of instructions per second

Question

Assuming two machines have the same ISA, which of the following quantities are identical?
• Clock rate
• CPI
• Execution time
• # of instructions
• MIPS

Program Performance

HW or SW component      Affects what?           How?
Algorithm               IC, possibly CPI
Programming language    IC, CPI
Compiler                IC, CPI
ISA                     IC, clock rate, CPI
