
Performance Lect


Page 1: Performance Lect

8/3/2019 Performance Lect

http://slidepdf.com/reader/full/performance-lect 1/19

Performance: a measure of how fast something works.

Plane        DC to Paris   Speed       Passengers   PMPH
Boeing 747   6.5 hours     610 mph     470          286,700
Concorde     3 hours       1,350 mph   132          178,200

** PMPH = person miles per hour (Speed * Passengers)

Latency (Response Time): time to run the task (travel time for each passenger)

Flight time of Boeing 747 > flight time of Concorde, so the Concorde has the better latency.

Throughput (Bandwidth): tasks run per unit time (person miles per hour)

Throughput of Boeing 747 > throughput of Concorde, so the Boeing 747 has the better throughput.
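The throughput column of the plane comparison can be recomputed from speed and passenger count. The following is a minimal sketch; the function name `pmph` and the dictionary layout are illustrative and not part of the lecture.

```python
# Figures are taken from the plane comparison on this slide.
planes = {
    "Boeing 747": {"hours": 6.5, "mph": 610, "passengers": 470},
    "Concorde": {"hours": 3.0, "mph": 1350, "passengers": 132},
}


def pmph(mph, passengers):
    """Throughput in person miles per hour (speed * passengers)."""
    return mph * passengers


for name, p in planes.items():
    throughput = pmph(p["mph"], p["passengers"])
    print(f"{name}: latency {p['hours']} h, throughput {throughput:,} PMPH")
```

The Concorde wins on latency (fewer hours per trip), while the Boeing 747 wins on throughput (more person miles per hour).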

Page 2: Performance Lect

Latency & Throughput

1. How long does it take for my job to run? (Latency)

2. How many jobs can the machine run at once? (Throughput)

3. What is the average execution rate? (Throughput)

4. How long does it take to execute a job? (Latency)

5. How much work is getting done? (Throughput)

6. How long must I wait for the database query? (Latency)

Our Concern: Latency (Response Time), i.e. "Execution Time"

Page 3: Performance Lect

Execution Time

CPU Time

- doesn't count I/O or time spent running other programs

- system CPU time: time spent in the operating system

- user CPU time: time spent in the program

Our Focus: user CPU time

(CPU) Execution Time = IC * CPI * cycle time

where IC is the instruction count and CPI is the average number of cycles per instruction.

Elapsed Time

- Counts everything (disk and memory access, I/O, etc.)

- A useful number, but often not good for comparison purposes
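The difference between elapsed time and CPU time can be observed directly. The sketch below assumes Python's standard `time.perf_counter` for elapsed time and `os.times` for user and system CPU time; the helper names `measure` and `busy_then_sleep` are illustrative.

```python
import os
import time


def measure(work):
    """Run work() and return (elapsed, user_cpu, system_cpu) in seconds."""
    wall_start = time.perf_counter()
    cpu_start = os.times()
    work()
    cpu_end = os.times()
    wall_end = time.perf_counter()
    return (
        wall_end - wall_start,
        cpu_end.user - cpu_start.user,
        cpu_end.system - cpu_start.system,
    )


def busy_then_sleep():
    sum(i * i for i in range(200_000))  # CPU-bound part: counts as user CPU time
    time.sleep(0.05)                    # waiting: counts toward elapsed time only


elapsed, user, system = measure(busy_then_sleep)
print(f"elapsed={elapsed:.3f}s user={user:.3f}s system={system:.3f}s")
```

The sleep inflates the elapsed time without adding CPU time, which is one reason elapsed time is often a poor basis for comparing processors.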

Page 4: Performance Lect

Ex: CPU Execution time

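$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$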

A program is running on a RISC machine with the following characteristics:

- 40,000,000 instructions

- 6 cycles/instruction

- 1 GHz clock rate

What is the CPU execution time for this program?

$$\text{CPU Exec. Time} = \text{IC} \times \text{CPI} \times \text{Clock cycle time} = 40{,}000{,}000 \times 6 \times \frac{1}{10^{9}}\ \text{s} = 0.24\ \text{seconds}$$
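The same calculation can be written as a small function. This is a sketch only; the name `cpu_time` and its parameters are illustrative, and the clock cycle time is taken as the reciprocal of the clock rate.

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU time in seconds: IC * CPI * cycle time, with cycle time = 1 / clock rate."""
    return instruction_count * cpi / clock_rate_hz


# 40,000,000 instructions, 6 cycles/instruction, 1 GHz clock
print(cpu_time(40_000_000, 6, 1e9))  # 0.24
```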

Page 5: Performance Lect

Ex: Performance

A program is running on a RISC machine with the following characteristics:

- 20,000,000 instructions

- 5 cycles/instruction

- 1 GHz clock rate

Using the same program with a new compiler:

- 5,000,000 instructions

- 2 cycles/instruction

- 1 GHz clock rate

What is the speedup with the changes?

Speedup = old execution time / new execution time

= (20,000,000 * 5 * 1 ns) / (5,000,000 * 2 * 1 ns)

= 0.1 s / 0.01 s

= 10 (times faster after the change)
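The speedup can be checked numerically. In this sketch the helper names `cpu_time` and `speedup` are illustrative, not from the lecture.

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU time in seconds: IC * CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz


def speedup(old_time, new_time):
    """How many times faster the new configuration is."""
    return old_time / new_time


old = cpu_time(20_000_000, 5, 1e9)  # original compiler
new = cpu_time(5_000_000, 2, 1e9)   # new compiler
print(f"old={old:.2f}s new={new:.2f}s speedup={speedup(old, new):.1f}x")
```

Because the clock rate is unchanged, the speedup here comes entirely from the lower instruction count and the lower CPI.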

Page 6: Performance Lect

Aspects of CPU Performance

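$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$

               Inst Count   CPI   Clock Rate
Program        X
Compiler       X            X
Inst. Set      X            X
Organization                X     X
Technology                        X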

Caching, pipelining, parallelism,

Page 7: Performance Lect

Evaluation Methodology

[Figure: evaluation flow diagram. Labels: Benchmark (C++), Benchmark (SPEC95/2000), Compiler, Compiled Exe. File, Benchmark Executables, Input Data, Trace Tools, Simulator, Output]

Simulated Results:

- Execution time (CPI)

- Instruction references

- Data references

- Miss rates

- # of jumps, branches, etc.

Page 8: Performance Lect

Benchmark Programs

Performance is best determined by running a real application

- Use programs typical of the expected workload

- or, typical of the expected class of applications

- e.g., compilers/editors, scientific applications, graphics, etc.

Small benchmarks

- nice for architects and designers

- easy to standardize

- can be abused

SPEC (Standard Performance Evaluation Corporation): performance of a computer's processor, memory architecture, compiler, client/server, etc.

LINPACK: floating-point benchmark program

Dhrystone: small synthetic program, emphasis on integer performance

Whetstone: synthetic floating-point benchmark designed for performance testing, mid-1970s

WinBench: desktop, etc.

(Refer to Ch. 4.3)

Page 9: Performance Lect

Ex: Benchmark Programs

Program      Dynamic instructions   # of procedure calls   Instructions/call

SPEC95 CINT: C Programs
go           584,163,226            1,610,807              362.65
gcc          250,494,615            5,203,867              48.13
m88ksim      850,957                16,796                 50.66
compress     41,765,761             1,355,389              30.81
perl         63,028,127             2,611,048              24.14
li           189,184,575            7,971,176              23.73

Suite of C++ Programs
deltablue    42,148,983             1,478,007              28.52
ixx          31,829,777             1,404,978              22.65
eqn          58,401,832             1,999,175              29.21

C Mean       4,894,178              97,407                 37.67
C++ Mean     41,513,735             1,588,521              26.45

Page 10: Performance Lect

Ex: Instruction Classes & CPI

Compute the CPU clock cycles and average CPI for the following program:

Inst. type      ICi   CPIi
ALU             20    4
Data transfer   20    5
Control         10    3

(Sol) CPU clock cycles = 20*4 + 20*5 + 10*3 = 210

Average CPI = 210/50 = 4.2
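The solution generalizes to any instruction mix: sum ICi * CPIi over the classes, then divide by the total instruction count. A minimal sketch follows; the `mix` dictionary and function names are illustrative.

```python
# (instruction count, CPI) per instruction class, from the example on this slide
mix = {"ALU": (20, 4), "Data transfer": (20, 5), "Control": (10, 3)}


def total_cycles(mix):
    """CPU clock cycles: sum of ICi * CPIi over all classes."""
    return sum(ic * cpi for ic, cpi in mix.values())


def average_cpi(mix):
    """Average CPI: total cycles divided by total instruction count."""
    return total_cycles(mix) / sum(ic for ic, _ in mix.values())


print(total_cycles(mix), average_cpi(mix))  # 210 4.2
```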

Page 11: Performance Lect

Ex: CPI and Instruction FREQi

Compute the average (effective) CPI for the following instruction mix:

Inst. type      CPIi   FREQi
ALU             3      40% (0.4)
Data transfer   4      40% (0.4)
Control         2      20% (0.2)

(Sol) Average (Effective) CPI = 3*0.4 + 4*0.4 + 2*0.2 = 3.2
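When only frequencies are known, the effective CPI is a weighted average. The sketch below stores frequencies as integer percentages (an illustrative choice that avoids floating-point rounding noise); the names are not from the lecture.

```python
# (CPI, frequency in percent) per instruction class, from the example on this slide
mix = {"ALU": (3, 40), "Data transfer": (4, 40), "Control": (2, 20)}


def effective_cpi(mix):
    """Weighted average CPI: sum of CPIi * FREQi, with FREQi given in percent."""
    total_percent = sum(pct for _, pct in mix.values())
    if total_percent != 100:
        raise ValueError("frequencies must sum to 100%")
    return sum(cpi * pct for cpi, pct in mix.values()) / 100


print(effective_cpi(mix))  # 3.2
```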

Page 12: Performance Lect

Ex: Peak CPI

Compute the Peak CPI for the following instruction mix:

Inst. type      CPIi   FREQi   FREQi (peak)
ALU             3      40%     0%
Data transfer   4      40%     0%
Control         2      20%     100%

(Sol) Peak CPI = 3*0.0 + 4*0.0 + 2*1.0 = 2.0

Page 13: Performance Lect

Ex: Average CPI and Average MIPS

Compute the average (effective) CPI for the following instruction mix:

Inst. type      CPIi   FREQi
ALU             3      40% (0.4)
Data transfer   4      40% (0.4)
Control         2      20% (0.2)

(Sol) Average (Effective) CPI = 3*0.4 + 4*0.4 + 2*0.2 = 3.2

If the processor is Pentium II (320MHz), what is the MIPS rate?

$$\text{MIPS} = \frac{\text{Clock Rate}}{\text{CPI} \times 10^{6}} = \frac{320 \times 10^{6}}{3.2 \times 10^{6}} = 100$$
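The MIPS rate follows directly from the clock rate and the effective CPI. A minimal sketch, with the illustrative function name `mips` and the 320 MHz clock from the example:

```python
def mips(clock_rate_hz, cpi):
    """MIPS rating: clock rate / (CPI * 10^6)."""
    return clock_rate_hz / (cpi * 1e6)


# 320 MHz clock, effective CPI of 3.2
print(f"{mips(320e6, 3.2):.1f} MIPS")  # 100.0 MIPS
```

Note that a higher MIPS rate does not by itself mean a shorter execution time, because the instruction count can differ between machines and compilers.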

Page 14: Performance Lect

Ex: Peak CPI and Peak MIPS

Compute the Peak CPI for the following instruction mix:

Inst. type      CPIi   FREQi   FREQi (peak)
ALU             3      40%     0%
Data transfer   4      40%     0%
Control         2      20%     100%

(Sol) Peak CPI = 3*0.0 + 4*0.0 + 2*1.0 = 2.0

If the processor is Pentium II (320MHz), what is the peak MIPS rate?

$$\text{Peak MIPS} = \frac{\text{Clock Rate}}{\text{Peak CPI} \times 10^{6}} = \frac{320 \times 10^{6}}{2 \times 10^{6}} = 160$$
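The peak figures can be sketched in code under one assumption, suggested by the example: "peak" means that every executed instruction belongs to the class with the lowest CPI. The names below are illustrative.

```python
# CPI per instruction class, from the example on this slide
cpi_by_class = {"ALU": 3, "Data transfer": 4, "Control": 2}


def peak_cpi(cpi_by_class):
    """Best-case CPI: assumes 100% of instructions are in the cheapest class."""
    return min(cpi_by_class.values())


def peak_mips(clock_rate_hz, cpi_by_class):
    """Peak MIPS: clock rate / (peak CPI * 10^6)."""
    return clock_rate_hz / (peak_cpi(cpi_by_class) * 1e6)


print(peak_cpi(cpi_by_class), peak_mips(320e6, cpi_by_class))  # 2 160.0
```

Peak MIPS is an upper bound that no realistic instruction mix reaches; the effective CPI of 3.2 from the earlier example gives only 100 MIPS on the same clock.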

Page 15: Performance Lect

Amdahl's Law (I)

Single Enhancement

F: fraction enhanced, S: speedup of the enhanced fraction

$$\text{Speedup} = \frac{1}{(1 - F) + \frac{F}{S}}$$

[Figure: execution time bars. Execution Time (without E): 1 - F and F. Execution Time (with E): 1 - F (unaffected) and F/S (affected).]

Page 16: Performance Lect

Ex: Amdahl's Law (I)

Floating point instructions are improved to run 2X faster, but only 10% of actual instructions are FP.

What is the Speedup?

Page 17: Performance Lect

Ex: Amdahl's Law (I)

F = 0.1, S = 2

$$\text{Speedup} = \frac{1}{(1 - 0.1) + \frac{0.1}{2}} = \frac{1}{0.95} = 1.053$$

Make Common Case Fast

Enhance the parts of the program that are used most often, so that the 'execution time affected by improvement' is as large as possible.
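The effect of F on the overall speedup can be explored with a short function. This is a sketch that treats F as the fraction of execution time affected by the enhancement; the name `amdahl_speedup` is illustrative.

```python
def amdahl_speedup(fraction, enhancement_speedup):
    """Overall speedup when `fraction` of execution time runs `enhancement_speedup` times faster."""
    return 1.0 / ((1.0 - fraction) + fraction / enhancement_speedup)


print(round(amdahl_speedup(0.1, 2), 3))    # 1.053, the floating point example
print(round(amdahl_speedup(0.1, 1e9), 3))  # 1.111, close to the 1 / (1 - F) limit
print(round(amdahl_speedup(0.9, 2), 3))    # 1.818, same S applied to the common case
```

Even an enormous speedup of a 10% fraction cannot exceed about 1.11 overall, whereas the same 2X improvement applied to 90% of the execution time yields about 1.82.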


Page 18: Performance Lect

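Amdahl's Law (II)

Multiple Enhancements

F1, F2, F3: fractions enhanced; S1, S2, S3: speedups of the enhanced fractions

$$\text{Speedup} = \frac{1}{\left(1 - \sum_{i=1}^{n} F_i\right) + \sum_{i=1}^{n} \frac{F_i}{S_i}}$$

[Figure: execution time bars. Execution Time (without E): 1 - (F1+F2+F3), F1, F2, F3. Execution Time (with E): 1 - (F1+F2+F3) (unaffected) and the Fi/Si terms (affected).]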

Page 19: Performance Lect

Ex: Amdahl's Law (II)

Three CPU performance enhancements are proposed, with the following speedups and percentages of the execution time:

1) Percentage F1: 20%, Enhanced Speedup S1: 10

2) Percentage F2: 15%, Enhanced Speedup S2: 15

3) Percentage F3: 10%, Enhanced Speedup S3: 30

Assumption: Each enhancement affects a different portion of the code and only one enhancement can be used at a time.

What is the Total Speedup?

$$\text{Speedup} = \frac{1}{\left(1 - \sum_{i=1}^{3} F_i\right) + \sum_{i=1}^{3} \frac{F_i}{S_i}} = \frac{1}{0.55 + 0.0333} = 1.71$$
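The multiple-enhancement form can be sketched as a function over (fraction, speedup) pairs. It relies on the assumption stated in the example that each enhancement affects a different portion of the code; the name `amdahl_speedup_multi` is illustrative.

```python
def amdahl_speedup_multi(enhancements):
    """Overall speedup for disjoint enhancements given as (fraction, speedup) pairs."""
    enhancements = list(enhancements)
    total_fraction = sum(f for f, _ in enhancements)
    if total_fraction > 1.0:
        raise ValueError("enhanced fractions must not exceed 100% of execution time")
    unaffected = 1.0 - total_fraction             # 1 - sum(Fi)
    affected = sum(f / s for f, s in enhancements)  # sum(Fi / Si)
    return 1.0 / (unaffected + affected)


# F1 = 20% at 10x, F2 = 15% at 15x, F3 = 10% at 30x
print(round(amdahl_speedup_multi([(0.20, 10), (0.15, 15), (0.10, 30)]), 2))  # 1.71
```

With 55% of the execution time left unaffected, the overall speedup stays below 1 / 0.55, or about 1.82, no matter how large the individual speedups are.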