8/3/2019 Performance Lect
http://slidepdf.com/reader/full/performance-lect 1/19
Performance
A measure of how fast something works.
Plane        DC to Paris   Speed       Passengers   PMPH
Boeing 747   6.5 hours     610 mph     470          286,700
Concorde     3 hours       1,350 mph   132          178,200
** PMPH = person miles per hour (Speed * Passengers)
Latency (Response Time)
Time to run the task (travel time for each passenger)
Flight time of Concorde (3 hours) < Flight time of Boeing 747 (6.5 hours)
Throughput (Bandwidth)
Tasks run per time (person miles per hour)
Throughput of Boeing 747 (286,700 PMPH) > Throughput of Concorde (178,200 PMPH)
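The comparison can be rechecked with a short sketch. The dictionary keys (`hours`, `mph`, `passengers`) and the helper `pmph` are illustrative names chosen here, not part of the slides; the numbers are the ones given for the two planes.

```python
# Latency vs. throughput for the two planes.
# Key names are illustrative assumptions, not from the slides.
planes = {
    "Boeing 747": {"hours": 6.5, "mph": 610, "passengers": 470},
    "Concorde": {"hours": 3.0, "mph": 1350, "passengers": 132},
}


def pmph(plane):
    """Throughput in person miles per hour (Speed * Passengers)."""
    return plane["mph"] * plane["passengers"]


# Best latency: the plane with the shortest flight time.
fastest = min(planes, key=lambda name: planes[name]["hours"])
# Best throughput: the plane with the largest PMPH.
highest_throughput = max(planes, key=lambda name: pmph(planes[name]))

print(fastest)             # Concorde
print(highest_throughput)  # Boeing 747
```

The same machine can win on one metric and lose on the other, which is why the two are kept separate.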
Latency & Throughput
1. How long does it take for my job to run? Latency
4. How long does it take to execute a job? Latency
6. How long must I wait for the database query? Latency
2. How many jobs can the machine run at once? Throughput
5. How much work is getting done? Throughput
3. What is the average execution rate? Throughput
Our Concern:
Latency (Response Time), i.e. "Execution Time"
Execution Time
CPU Time
- doesn't count I/O or time spent running other programs
- system CPU time: time spent in the operating system
- user CPU time: time spent in the program
Our Focus
user CPU time
(CPU) Execution Time = IC * CPI * cycle time
(IC = instruction count, CPI = cycles per instruction)
Elapsed Time
- Counts everything (disk and memory access, I/O, etc)
- A useful number, but often not good for comparison purposes
Ex: CPU Execution time
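$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$
A program is running on a RISC machine with the following characteristics:
- 40,000,000 instructions
- 6 cycles/instruction
- 1 GHz clock rate (clock cycle time = 1 ns)
What is the CPU execution time for this program?
CPU Exec. Time = IC * CPI * Clock cycle time
$$= 40{,}000{,}000 \times 6 \times (1 \times 10^{-9}\ \text{s}) = 0.24\ \text{seconds}$$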
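As a minimal sketch, the same calculation in Python. The function name `cpu_time` and its parameter names are assumptions made for this example; the formula is IC * CPI * clock cycle time, with the cycle time taken as the reciprocal of the clock rate.

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU execution time in seconds: IC * CPI * clock cycle time."""
    cycle_time = 1.0 / clock_rate_hz  # seconds per cycle
    return instruction_count * cpi * cycle_time


# 40,000,000 instructions, 6 cycles/instruction, 1 GHz clock
t = cpu_time(40_000_000, 6, 1e9)
print(f"{t:.2f} seconds")  # 0.24 seconds
```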
Ex: Performance
A program is running on a RISC machine with the following characteristics:
- 20,000,000 instructions
- 5 cycles/instruction
- 1 GHz clock rate
Using the same program with a new compiler:
- 5,000,000 instructions
- 2 cycles/instruction
- 1 GHz clock rate
What is the speedup with the changes?
Old execution time = 20,000,000 * 5 * 1 ns = 0.1 seconds
New execution time = 5,000,000 * 2 * 1 ns = 0.01 seconds
Speedup = old execution time / new execution time
= 0.1/0.01
= 10 (times faster after change)
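A short sketch of the speedup calculation, assuming the same IC * CPI * cycle time model. The names `cpu_time`, `old`, and `new` are illustrative. Because the clock rate is unchanged, the speedup equals the ratio of total cycles (100,000,000 / 10,000,000).

```python
def cpu_time(ic, cpi, clock_rate_hz):
    """CPU execution time in seconds for ic instructions at the given CPI."""
    return ic * cpi / clock_rate_hz


old = cpu_time(20_000_000, 5, 1e9)  # original compiler
new = cpu_time(5_000_000, 2, 1e9)   # new compiler
speedup = old / new

print(f"old = {old:.2f} s, new = {new:.2f} s, speedup = {speedup:.0f}x")
# old = 0.10 s, new = 0.01 s, speedup = 10x
```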
Aspects of CPU Performance
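$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$
               Inst Count   CPI   Clock Rate
Program            X
Compiler           X         X
Inst. Set          X         X
Organization                 X        X
Technology                            X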
Caching, pipelining, parallelism,
Evaluation Methodology
[Figure: benchmark sources (C++, SPEC95/2000) -> Compiler -> compiled benchmark executables (exe. file) -> Simulator, with input data and trace tools -> output]
Simulated Results:
- Execution time (CPI)
- Instruction references
- Data references
- Miss rates
- # of jumps, branches, etc.
Benchmark Programs
Performance is best determined by running a real application
- Use programs typical of the expected workload
- or typical of the expected class of applications
- e.g., compilers/editors, scientific applications, graphics, etc.
Small benchmarks
- nice for architects and designers
- easy to standardize
- can be abused
SPEC (Standard Performance Evaluation Corporation): performance of a computer's processor, memory architecture, compiler, client/server, etc.
LINPACK: floating-point benchmark program
Dhrystone: small program, emphasis on integer performance
Whetstone: designed for floating-point performance testing, mid-1970s
WinBench: desktop, etc.
(Refer to Ch. 4.3)
Ex: Benchmark Programs
Program      Dynamic instructions   # of procedure calls   Instructions/call
SPEC95 CINT: C Programs
go                    584,163,226              1,610,807              362.65
gcc                   250,494,615              5,203,867               48.13
m88ksim                   850,957                 16,796               50.66
compress               41,765,761              1,355,389               30.81
perl                   63,028,127              2,611,048               24.14
li                    189,184,575              7,971,176               23.73
Suite of C++ Programs
deltablue              42,148,983              1,478,007               28.52
ixx                    31,829,777              1,404,978               22.65
eqn                    58,401,832              1,999,175               29.21
C Mean                  4,894,178                 97,407               37.67
C++ Mean               41,513,735              1,588,521               26.45
Ex: Instruction Classes & CPI
Compute the CPU clock cycles and average CPI
for the following program:
Inst. type ICi CPIi
ALU 20 4
Data transfer 20 5
Control 10 3
(Sol) CPU clock cycles = 20*4 + 20*5 + 10*3 = 210
Average CPI = 210/50 = 4.2
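The solution can be sketched in a few lines of Python. The list `classes` and its tuple layout are assumptions made for this example; the counts and per-class CPI values are the ones from the table.

```python
# (instruction type, instruction count IC_i, cycles per instruction CPI_i)
classes = [
    ("ALU", 20, 4),
    ("Data transfer", 20, 5),
    ("Control", 10, 3),
]

# CPU clock cycles = sum over classes of IC_i * CPI_i
total_cycles = sum(ic * cpi for _, ic, cpi in classes)
total_instructions = sum(ic for _, ic, _ in classes)
# Average CPI = total cycles / total instruction count
average_cpi = total_cycles / total_instructions

print(total_cycles, average_cpi)  # 210 4.2
```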
Ex: CPI and Instruction FREQi
Compute the average (effective) CPI for the following instruction mix:
Inst. type      CPIi   FREQi
ALU             3      40% (0.4)
Data transfer   4      40% (0.4)
Control         2      20% (0.2)
(Sol) Average (Effective) CPI = 3*0.4 + 4*0.4 + 2*0.2 = 3.2
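When only relative frequencies are known, the effective CPI is a weighted sum. The sketch below assumes the frequencies are fractions that sum to 1; the dictionary name `mix` is illustrative.

```python
import math

# instruction type -> (CPI_i, FREQ_i as a fraction)
mix = {
    "ALU": (3, 0.4),
    "Data transfer": (4, 0.4),
    "Control": (2, 0.2),
}

# Sanity check: the frequencies should cover the whole instruction stream.
assert math.isclose(sum(freq for _, freq in mix.values()), 1.0)

# Effective CPI = sum over classes of CPI_i * FREQ_i
effective_cpi = sum(cpi * freq for cpi, freq in mix.values())

print(f"{effective_cpi:.1f}")  # 3.2
```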
Ex: Peak CPI
Compute the peak CPI for the following instruction mix:
Inst. type      CPIi   FREQi (average)   FREQi (peak)
ALU             3      40%               0%
Data transfer   4      40%               0%
Control         2      20%               100%
(Sol) Peak CPI = 3*0.0 + 4*0.0 + 2*1.0 = 2.0
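In this example the peak mix puts 100% of the instructions in the class with the lowest CPI, so the peak CPI coincides with the minimum per-class CPI. A small sketch of that observation follows; the names `cpi_by_class` and `peak_mix` are illustrative.

```python
cpi_by_class = {"ALU": 3, "Data transfer": 4, "Control": 2}

# Peak mix from the example: every instruction is a Control instruction.
peak_mix = {"ALU": 0.0, "Data transfer": 0.0, "Control": 1.0}

peak_cpi = sum(cpi_by_class[name] * peak_mix[name] for name in cpi_by_class)

# With all weight on the cheapest class, the weighted sum equals the minimum CPI.
assert peak_cpi == min(cpi_by_class.values())

print(peak_cpi)  # 2.0
```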
Ex: Average CPI and Average MIPS
Compute the average (effective) CPI for the following instruction mix:
Inst. type CPIi FREQi
ALU 3 40% (0.4)
Data transfer 4 40% (0.4)
Control 2 20% (0.2)
(Sol) Average (Effective) CPI = 3*0.4 + 4*0.4 + 2*0.2 = 3.2
If the processor is Pentium II (320MHz), what is the MIPS rate?
$$\text{MIPS} = \frac{\text{Clock Rate}}{\text{CPI} \times 10^{6}} = \frac{320 \times 10^{6}}{3.2 \times 10^{6}} = 100$$
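A minimal sketch of the MIPS calculation, assuming MIPS = clock rate / (CPI * 10^6) as in the example. The function name `mips` is chosen for illustration.

```python
def mips(clock_rate_hz, cpi):
    """Million instructions per second: clock rate / (CPI * 10^6)."""
    return clock_rate_hz / (cpi * 1e6)


# 320 MHz clock with the effective CPI of 3.2 computed above
average_mips = mips(320e6, 3.2)
print(f"{average_mips:.0f} MIPS")  # 100 MIPS
```

A lower CPI at the same clock rate gives a proportionally higher MIPS rating.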
Ex: Peak CPI and Peak MIPS
Compute the peak CPI for the following instruction mix:
Inst. type      CPIi   FREQi (average)   FREQi (peak)
ALU             3      40%               0%
Data transfer   4      40%               0%
Control         2      20%               100%
(Sol) Peak CPI = 3*0.0 + 4*0.0 + 2*1.0 = 2.0
If the processor is Pentium II (320MHz), what is the peak MIPS rate?
$$\text{Peak MIPS} = \frac{\text{Clock Rate}}{\text{Peak CPI} \times 10^{6}} = \frac{320 \times 10^{6}}{2 \times 10^{6}} = 160$$
Amdahl's Law (I)
Single Enhancement
F: Fraction enhanced, S: Speedup enhanced
Execution Time (without E):  | 1 - F (unaffected) | F (affected)   |
Execution Time (with E):     | 1 - F (unaffected) | F/S (affected) |
$$\text{Speedup} = \frac{1}{(1 - F) + \frac{F}{S}}$$
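One way to see where the formula comes from is to write the new execution time in terms of the old one. The symbols $T_{\text{old}}$ and $T_{\text{new}}$ are introduced here for the derivation and do not appear on the slide; F is taken as a fraction of the original execution time.

```latex
\begin{align*}
T_{\text{new}} &= T_{\text{old}} \left[ (1 - F) + \frac{F}{S} \right] \\
\text{Speedup} &= \frac{T_{\text{old}}}{T_{\text{new}}}
               = \frac{1}{(1 - F) + \frac{F}{S}}
\end{align*}
```

The unaffected part $(1 - F)$ is carried over unchanged, while the affected part shrinks from $F$ to $F/S$.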
Ex: Amdahl's Law (I)
Floating point instructions improved to run 2X;
but only 10% of actual instructions are FP
What is the Speedup?
Ex: Amdahl's Law (I)
F = 0.1, S = 2
$$\text{Speedup} = \frac{1}{(1 - 0.1) + \frac{0.1}{2}} = \frac{1}{0.95} = 1.053$$
Make Common Case Fast
Enhance the parts of the program that are used most often,
so that the "execution time affected by improvement" is as large as
possible.
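The effect of the affected fraction can be explored with a small sketch. The function name `amdahl_speedup` is an assumption for this example, and the last two calls use hypothetical inputs (a very large S, and F = 0.9) that are not on the slides; they are only there to show why the common case matters.

```python
def amdahl_speedup(f, s):
    """Overall speedup when a fraction f of execution time is sped up by s."""
    return 1.0 / ((1.0 - f) + f / s)


# Slide example: 10% of the time is floating point, made 2X faster.
print(f"{amdahl_speedup(0.1, 2):.3f}")     # 1.053

# Hypothetical: even a near-infinite FP speedup is capped at 1 / (1 - F).
print(f"{amdahl_speedup(0.1, 1e12):.3f}")  # 1.111

# Hypothetical: the same 2X enhancement applied to 90% of the time.
print(f"{amdahl_speedup(0.9, 2):.3f}")     # 1.818
```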
Amdahl's Law (II)
Multiple Enhancements
F1, F2, F3: Fraction enhanced; S1, S2, S3: Speedup enhanced
Execution Time (without E):  | 1 - (F1+F2+F3) (unaffected) | F1 | F2 | F3 (affected) |
Execution Time (with E):     | 1 - (F1+F2+F3) (unaffected) | Fi/Si for each i (affected) |
$$\text{Speedup} = \frac{1}{\left(1 - \sum_{i=1}^{n} F_i\right) + \sum_{i=1}^{n} \frac{F_i}{S_i}}$$
Ex: Amdahl's Law (II)
Three CPU performance enhancements with the following speedups
and percentages of the execution time:
1) Percentage F1: 20%, Enhanced Speedup S1: 10
2) Percentage F2: 15%, Enhanced Speedup S2: 15
3) Percentage F3: 10%, Enhanced Speedup S3: 30
Assumption: Each enhancement affects a different portion of the code
and only one enhancement can be used at a time.
What is the Total Speedup?
$$\text{Speedup} = \frac{1}{\left(1 - \sum_{i=1}^{3} F_i\right) + \sum_{i=1}^{3} \frac{F_i}{S_i}} = \frac{1}{0.55 + 0.0333} = 1.71$$
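A sketch of the multiple-enhancement form, under the stated assumption that each enhancement affects a different portion of the code. The function name `amdahl_speedup_multi` and the (fraction, speedup) pair layout are choices made for this example.

```python
def amdahl_speedup_multi(enhancements):
    """Overall speedup for (fraction, speedup) pairs on disjoint code portions."""
    enhancements = list(enhancements)
    unaffected = 1.0 - sum(f for f, _ in enhancements)  # 1 - sum(F_i)
    affected = sum(f / s for f, s in enhancements)      # sum(F_i / S_i)
    return 1.0 / (unaffected + affected)


# F1 = 20% at 10X, F2 = 15% at 15X, F3 = 10% at 30X
total = amdahl_speedup_multi([(0.20, 10), (0.15, 15), (0.10, 30)])
print(f"{total:.2f}")  # 1.71
```

With a single pair the function reduces to the single-enhancement formula.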