Performance Lect

8/3/2019 Performance Lect


Performance

A measure of how fast something works.

              DC to Paris   Speed       Passengers   Throughput (PMPH **)
Boeing 747    6.5 hours     610 mph     470          286,700
Concorde      3 hours       1,350 mph   132          178,200

** PMPH = person miles per hour (Speed * Passengers)

Latency (Response Time)

- Time to run the task (travel time for each passenger)
- Flight Time of Boeing 747 vs. Flight Time of Concorde

Throughput (Bandwidth)

- Tasks run per unit time (person miles per hour)
- Throughput of Boeing 747 vs. Throughput of Concorde
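As a rough illustration of the two metrics, the sketch below recomputes the airplane figures. The passenger counts (470 and 132) are an assumption, derived by dividing each PMPH value by the plane's speed; the function names are illustrative only.

```python
# Figures from the slide. Passenger counts are an assumption derived as PMPH / speed.
planes = {
    "Boeing 747": {"hours": 6.5, "mph": 610, "passengers": 470},
    "Concorde": {"hours": 3.0, "mph": 1350, "passengers": 132},
}


def throughput_pmph(plane):
    """Throughput in person miles per hour: speed * passengers."""
    return plane["mph"] * plane["passengers"]


def latency_ratio(slow, fast):
    """How many times shorter the trip is on the faster plane."""
    return slow["hours"] / fast["hours"]


b747, concorde = planes["Boeing 747"], planes["Concorde"]
print(throughput_pmph(b747))                     # 286700
print(throughput_pmph(concorde))                 # 178200
print(round(latency_ratio(b747, concorde), 2))   # 2.17
```

The Concorde wins on latency (shorter trip per passenger) while the Boeing 747 wins on throughput (more person miles per hour).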

Latency & Throughput

1. How long does it take for my job to run? (Latency)

4. How long does it take to execute a job? (Latency)

6. How long must I wait for the database query? (Latency)

2. How many jobs can the machine run at once? (Throughput)

5. How much work is getting done? (Throughput)

3. What is the average execution rate? (Throughput)

Our Concern:

Latency (Response Time): "Execution Time"

Execution Time

CPU Time

- doesn't count I/O or time spent running other programs
- system CPU time: time spent in the operating system
- user CPU time: time spent in the program

Our Focus

- user CPU time
- (CPU) Execution Time = IC * CPI * cycle time

Elapsed Time

- Counts everything (disk and memory access, I/O, etc)

- A useful number, but often not good for comparison purposes

Ex: CPU Execution time

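$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$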

A program is running on a RISC machine with the following characteristics:

- 40,000,000 instructions
- 6 cycles/instruction
- 1 GHz clock rate

What is the CPU execution time for this program?

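$$\begin{aligned}
\text{CPU Exec. Time} &= IC \times CPI \times \text{Clock cycle time} \\
&= 40{,}000{,}000 \times 6 \times (1 \times 10^{-9}\ \text{s}) \\
&= 0.24\ \text{seconds}
\end{aligned}$$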
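The same calculation can be sketched in Python. The function name `cpu_time` and its parameters are illustrative choices, not part of the lecture; clock cycle time is taken as 1 / clock rate.

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU execution time in seconds: IC * CPI * clock cycle time.

    Clock cycle time is 1 / clock rate, so a 1 GHz clock gives 1e-9 s per cycle.
    """
    return instruction_count * cpi / clock_rate_hz


# 40,000,000 instructions, 6 cycles/instruction, 1 GHz clock
print(cpu_time(40_000_000, 6, 1e9))  # 0.24
```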

Ex: Performance

A program is running on a RISC machine with the following characteristics:

- 20,000,000 instructions
- 5 cycles/instruction
- 1 GHz clock rate

Using the same program with a new compiler:

- 5,000,000 instructions
- 2 cycles/instruction
- 1 GHz clock rate

What is the speedup with the changes?

Speedup = old execution time / new execution time

= 0.1/0.01

= 10 (times faster after change)
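The intermediate execution times (0.1 s and 0.01 s) come from IC * CPI * cycle time. A minimal sketch, with illustrative function and variable names:

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU execution time in seconds: IC * CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz


old_time = cpu_time(20_000_000, 5, 1e9)  # original compiler: 0.1 s
new_time = cpu_time(5_000_000, 2, 1e9)   # new compiler: 0.01 s

speedup = old_time / new_time
print(round(speedup, 2))  # 10.0
```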

Aspects of CPU Performance

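$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$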

               Inst Count   CPI   Clock Rate
Program            X
Compiler           X         X
Inst. Set          X         X
Organization                 X        X
Technology                            X

Caching, pipelining, parallelism,

Evaluation Methodology

[Figure: evaluation flow. Benchmark programs (SPEC95/2000) pass through a compiler to produce benchmark executables (compiled exe. file), which run on a simulator to produce data, trace, and output.]

Simulated Results:

- Execution time (CPI)
- Instruction references
- Data references
- Miss rates
- # of jumps, branches, etc.

Benchmark Programs

Performance best determined by running a real application

- Use programs typical of expected workload
- or, typical of expected class of applications
- e.g., compilers/editors, scientific applications, graphics, etc.

Small benchmarks

- nice for architects and designers

- easy to standardize

- can be abused

SPEC (Standard Performance Evaluation Corporation): performance of a computer's processor, memory architecture, compiler, client/server, etc.

LINPACK: floating-point benchmark program

Dhrystone: small program, emphasis on integer performance

Whetstone: designed for performance testing, mid-1970s

WinBench: desktop, etc.

(Refer to Ch. 4.3)

Ex: Benchmark Programs

Program      Dynamic instructions   # of procedure calls   Instructions/call

SPEC95 CINT: C Programs
go           584,163,226            1,610,807              362.65
gcc          250,494,615            5,203,867              48.13
m88ksim      850,957                16,796                 50.66
compress     41,765,761             1,355,389              30.81
perl         63,028,127             2,611,048              24.14
li           189,184,575            7,971,176              23.73

Suite of C++ Programs
deltablue    42,148,983             1,478,007              28.52
ixx          31,829,777             1,404,978              22.65
eqn          58,401,832             1,999,175              29.21

C Mean       4,894,178              97,407                 37.67
C++ Mean     41,513,735             1,588,521              26.45

Ex: Instruction Classes & CPI

Compute the CPU clock cycles and average CPI

for the following program:

Inst. type      ICi   CPIi
ALU             20    4
Data transfer   20    5
Control         10    3

(Sol) CPU clock cycles = 20*4 + 20*5 + 10*3 = 210

Average CPI = 210/50 = 4.2
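The solution sums ICi * CPIi over the instruction classes and divides by the total instruction count. A short sketch of that arithmetic, with illustrative variable names:

```python
# (instruction type, instruction count ICi, cycles per instruction CPIi)
classes = [
    ("ALU", 20, 4),
    ("Data transfer", 20, 5),
    ("Control", 10, 3),
]

# CPU clock cycles = sum of ICi * CPIi over all classes
total_cycles = sum(ic * cpi for _, ic, cpi in classes)

# Average CPI = total cycles / total instruction count
total_instructions = sum(ic for _, ic, _ in classes)
average_cpi = total_cycles / total_instructions

print(total_cycles, average_cpi)  # 210 4.2
```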

Ex: CPI and Instruction FREQi

Compute the average (effective) CPI for the following:

Inst. type      CPIi   FREQi
ALU             3      40% (0.4)
Data transfer   4      40% (0.4)
Control         2      20% (0.2)

(Sol) Average (Effective) CPI = 3*0.4 + 4*0.4 + 2*0.2 = 3.2
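When only relative frequencies are known, the effective CPI is the frequency-weighted sum of the per-class CPIs. A minimal sketch; the function name `effective_cpi` is an illustrative choice:

```python
def effective_cpi(mix):
    """Frequency-weighted CPI: sum of CPIi * FREQi.

    `mix` is a list of (CPIi, FREQi) pairs whose frequencies sum to 1.
    """
    return sum(cpi * freq for cpi, freq in mix)


# ALU, data transfer, control
mix = [(3, 0.4), (4, 0.4), (2, 0.2)]
print(round(effective_cpi(mix), 2))  # 3.2
```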

Ex: Peak CPI

Compute the Peak CPI for the following:

Inst. type      CPIi   FREQi   Peak FREQi
ALU             3      40%     0%
Data transfer   4      40%     0%
Control         2      20%     100%

(Sol) Peak CPI = 3*0.0 + 4*0.0 + 2*1.0 = 2.0

Ex: Average CPI and Average MIPS

Compute the average (effective) CPI for the following:

Inst. type      CPIi   FREQi
ALU             3      40% (0.4)
Data transfer   4      40% (0.4)
Control         2      20% (0.2)

(Sol) Average (Effective) CPI = 3*0.4 + 4*0.4 + 2*0.2 = 3.2

If the processor is Pentium II (320MHz), what is the MIPS rate?

$$\text{MIPS} = \frac{\text{Clock Rate}}{CPI \times 10^{6}} = \frac{320 \times 10^{6}}{3.2 \times 10^{6}} = 100$$
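The MIPS rate follows from the clock rate and the CPI as MIPS = clock rate / (CPI * 10^6). A small sketch under that definition; the function name `mips` is illustrative:

```python
def mips(clock_rate_hz, cpi):
    """Millions of instructions per second: clock rate / (CPI * 10^6)."""
    return clock_rate_hz / (cpi * 1e6)


# 320 MHz clock with an average (effective) CPI of 3.2
print(round(mips(320e6, 3.2), 1))  # 100.0
```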

Ex: Peak CPI and Peak MIPS

Compute the Peak CPI for the following:

Inst. type      CPIi   FREQi   Peak FREQi
ALU             3      40%     0%
Data transfer   4      40%     0%
Control         2      20%     100%

(Sol) Peak CPI = 3*0.0 + 4*0.0 + 2*1.0 = 2.0

If the processor is Pentium II (320MHz), what is the peak MIPS rate?

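$$\text{Peak MIPS} = \frac{\text{Clock Rate}}{\text{Peak } CPI \times 10^{6}} = \frac{320 \times 10^{6}}{2 \times 10^{6}} = 160$$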
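The peak case assumes every instruction belongs to the class with the lowest CPI (here, control instructions at 100%), so the peak CPI is simply the minimum per-class CPI. A sketch of that reasoning, with illustrative names:

```python
# Per-class CPI values from the example
cpi_by_class = {"ALU": 3, "Data transfer": 4, "Control": 2}

# Peak CPI: the instruction mix is 100% the cheapest class
peak_cpi = min(cpi_by_class.values())

# Peak MIPS = clock rate / (peak CPI * 10^6), for a 320 MHz clock
peak_mips = 320e6 / (peak_cpi * 1e6)

print(peak_cpi, round(peak_mips, 1))  # 2 160.0
```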

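Amdahl's Law (I)

Single Enhancement

F: Fraction enhanced, S: Speedup enhanced

[Figure: execution time bar divided into an Unaffected portion (1 - F) and an Affected portion (F)]

$$\text{Speedup} = \frac{\text{Execution Time (without E)}}{\text{Execution Time (with E)}} = \frac{1}{(1 - F) + \dfrac{F}{S}}$$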

Ex: Amdahl's Law (I)

Floating point instructions improved to run 2X;
but only 10% of actual instructions are FP

What is the Speedup?

Ex: Amdahl's Law (I)

F = 0.1, S = 2

$$\text{Speedup} = \frac{1}{(1 - 0.1) + \dfrac{0.1}{2}} = \frac{1}{0.95} = 1.053$$
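The single-enhancement form of Amdahl's Law, Speedup = 1 / ((1 - F) + F/S), can be checked with a few lines of Python. The function name `amdahl_speedup` is an illustrative choice:

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup for one enhancement: 1 / ((1 - F) + F / S)."""
    unaffected = 1.0 - fraction_enhanced
    affected = fraction_enhanced / speedup_enhanced
    return 1.0 / (unaffected + affected)


# Floating point runs 2X faster, but only 10% of instructions are FP
print(round(amdahl_speedup(0.1, 2), 3))  # 1.053
```

Even though the enhanced part doubles in speed, the overall gain is only about 5% because F is small.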

Make Common Case Fast

Enhance the parts of the program that are used most often, so that the "execution time affected by improvement" is as large as possible.

Floating point instructions improved to run 2X;

but only 10% of actual instructions are FP

What is the Speedup?

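Amdahl's Law (II)

Multiple Enhancements

F1, F2, F3: Fraction enhanced, S1, S2, S3: Speedup enhanced

[Figure: execution time bar divided into an Unaffected portion 1 - (F1+F2+F3) and Affected portions F1, F2, F3; with the enhancements, each affected portion becomes Fi/Si]

$$\text{Speedup} = \frac{\text{Execution Time (without E)}}{\text{Execution Time (with E)}} = \frac{1}{\left(1 - \sum_i F_i\right) + \sum_i \dfrac{F_i}{S_i}}$$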

Ex: Amdahl's Law (II)

Three CPU performance enhancements with the following enhanced speedups and percentages of the execution time:

1) Percentage F1: 20%, Enhanced Speedup S1: 10
2) Percentage F2: 15%, Enhanced Speedup S2: 15
3) Percentage F3: 10%, Enhanced Speedup S3: 30

Assumption: Each enhancement affects a different portion of the code and only one enhancement can be used at a time.

What is the Total Speedup?

$$\text{Speedup} = \frac{1}{\left(1 - \sum_i F_i\right) + \sum_i \dfrac{F_i}{S_i}} = \frac{1}{0.55 + 0.0333} = 1.71$$
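With several non-overlapping enhancements, the unaffected fraction is 1 minus the sum of the Fi, and each enhanced fraction contributes Fi/Si. A minimal sketch under the stated assumption that each enhancement affects a different portion of the code; the function name is illustrative:

```python
def amdahl_speedup_multi(enhancements):
    """Overall speedup for non-overlapping enhancements.

    `enhancements` is a list of (Fi, Si) pairs:
    Speedup = 1 / ((1 - sum(Fi)) + sum(Fi / Si)).
    """
    unaffected = 1.0 - sum(f for f, _ in enhancements)
    affected = sum(f / s for f, s in enhancements)
    return 1.0 / (unaffected + affected)


# (F1, S1), (F2, S2), (F3, S3) from the example
enhancements = [(0.20, 10), (0.15, 15), (0.10, 30)]
print(round(amdahl_speedup_multi(enhancements), 2))  # 1.71
```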
