Upload
reginald-baker
View
214
Download
2
Embed Size (px)
Citation preview
CS252/PattersonLec 1.1
1/17/01
CMPUT429/CMPE382 Winter 2001
Topic2: Technology Trend and Cost/Performance
(Adapted from David A. Patterson’s CS252
lecture slides at Berkeley)
CS252/PattersonLec 1.2
1/17/01
Year
Tra
nsis
tors
1000
10000
100000
1000000
10000000
100000000
1970 1975 1980 1985 1990 1995 2000
i80386
i4004
i8080
Pentium
i80486
i80286
i8086
Technology Trends: Microprocessor Capacity
CMOS improvements:• Die size: 2X every 3 yrs• Line width: halve / 7 yrs
“Graduation Window”
Alpha 21264: 15 millionPentium Pro: 5.5 millionPowerPC 620: 6.9 millionAlpha 21164: 9.3 millionSparc Ultra: 5.2 million
Moore’s Law
CS252/PattersonLec 1.3
1/17/01
Memory Capacity (Single Chip DRAM)
size
Year
Bit
s
1000
10000
100000
1000000
10000000
100000000
1000000000
1970 1975 1980 1985 1990 1995 2000
year size(Mb) cyc time
1980 0.0625 250 ns
1983 0.25 220 ns
1986 1 190 ns
1989 4 165 ns
1992 16 145 ns
1996 64 120 ns
2000 256 100 ns
CS252/PattersonLec 1.4
1/17/01
Technology Trends(Summary)
Capacity Speed (latency)
Logic 2x in 3 years 2x in 3 years
DRAM 4x in 3 years 2x in 10 years
Disk 4x in 3 years 2x in 10 years
CS252/PattersonLec 1.5
1/17/01
Processor PerformanceTrends
Microprocessors
Minicomputers
Mainframes
Supercomputers
Year
0.1
1
10
100
1000
1965 1970 1975 1980 1985 1990 1995 2000
CS252/PattersonLec 1.6
1/17/01
Processor Performance(1.35X before, 1.55X now)
0
200
400
600
800
1000
1200
87 88 89 90 91 92 93 94 95 96 97
DEC Alpha 21264/600
DEC Alpha 5/500
DEC Alpha 5/300
DEC Alpha 4/266IBM POWER 100
DEC AXP/500
HP 9000/750
Sun-4/
260
IBMRS/
6000
MIPS M/
120
MIPS M
2000
1.54X/yr
CS252/PattersonLec 1.7
1/17/01
Performance Trends(Summary)
• Workstation performance (measured in Spec Marks) improves roughly 50% per year (2X every 18 months)
• Improvement in cost performance estimated at 70% per year
CS252/PattersonLec 1.8
1/17/01
Computer Architecture Topics
Instruction Set Architecture
Pipelining, Hazard Resolution,Superscalar, Reordering, Prediction, Speculation,Vector, DSP
Addressing,Protection,Exception Handling
L1 Cache
L2 Cache
DRAM
Disks, WORM, Tape
Coherence,Bandwidth,Latency
Emerging TechnologiesInterleavingBus protocols
RAID
VLSI
Input/Output and Storage
MemoryHierarchy
Pipelining and Instruction Level Parallelism
CS252/PattersonLec 1.9
1/17/01
Computer Architecture Topics
M
Interconnection NetworkS
PMPMPMP° ° °
Topologies,Routing,Bandwidth,Latency,Reliability
Network Interfaces
Shared Memory,Message Passing,Data Parallelism
Processor-Memory-Switch
MultiprocessorsNetworks and Interconnections
CS252/PattersonLec 1.10
1/17/01
Course Focus
Technology ProgrammingLanguages
OperatingSystems History
ApplicationsInterface Design
(ISA)
Measurement & Evaluation
Parallelism
Computer Architecture:• Instruction Set Design• Organization• Hardware
CS252/PattersonLec 1.11
1/17/01
Measurement Tools
• Benchmarks, Traces, Mixes• Hardware: Cost, delay, area, power
estimation• Simulation (many levels)
– ISA, RT, Gate, Circuit
• Queueing Theory• Rules of Thumb• Fundamental “Laws”/Principles
CS252/PattersonLec 1.12
1/17/01
Which is faster?
• Time to run the task (ExTime)– Execution time, response time, latency
• Tasks per day, hour, week, sec, ns … (Performance)
– Throughput, bandwidth
Plane
Boeing 747
BAD/Sud Concodre
Speed
610 mph
1350 mph
DC to Paris
6.5 hours
3 hours
Passengers
470
132
Throughput (pmph)
286,700
178,200
CS252/PattersonLec 1.13
1/17/01
Definitions
•Performance is in units of things per sec– bigger is better
•If we are primarily concerned with response time–performance(x) = 1
execution_time(x)
" X is n times faster than Y" means
Performance(X) Execution_time(Y)
n = =
Performance(Y) Execution_time(X)
CS252/PattersonLec 1.14
1/17/01
Cycles Per Instruction
IC = Instruction CountCPI = Clock Per Instruction
n
jjj ICPI
CPI
1
Time Cycle timeCPU
RateClock
CPIIC timeCPU
timecycleClock CPIIC timeCPUIC
cyclesclock ofNumber
FrequencyClock
cyclesclock ofNumber timeCPU
timecycleClock cyclesclock ofNumber timeCPU
CS252/PattersonLec 1.15
1/17/01
Cycles Per Instruction
We may separate the contribution of each type ofinstruction to the execution time defining:
j
CPIj
IC
ICCPI
j
j
n
jjj
n instructio execute torequired clocks
ofnumber average theis and executed, is
ninstructio that timesofnumber theis where
cyclesclock ofNumber 1
CS252/PattersonLec 1.16
1/17/01
Example: Calculating CPI
Typical Mix of instruction typesin program
Base Machine (Reg / Reg)
Op Freq Cycles CPI(i) (% Time)
ALU 50% 1 .5 (33%)
Load 20% 2 .4 (27%)
Store 10% 2 .2 (13%)
Branch 20% 2 .4 (27%)
1.5
CS252/PattersonLec 1.17
1/17/01
Aspects of CPU Performance (CPU Law)
CPU time = Seconds = Instructions x Cycles x Seconds
Program Program Instruction Cycle
CPU time = Seconds = Instructions x Cycles x Seconds
Program Program Instruction Cycle
Inst Count CPI Clock RateProgram X
Compiler X (X)
Inst. Set. X X
Organization X X
Technology X
CS252/PattersonLec 1.18
1/17/01
Amdahl's Law
Speedup due to enhancement E:
Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected
E w/oePerformanc
E w/ ePerformanc
E w/ Time Exec
E w/oTime ExecSpeedup(E)
CS252/PattersonLec 1.19
1/17/01
Amdahl’s Law
enhanced
enhancedenhancedoldnew Speedup
FractionFraction1ExTimeExTime
enhanced
enhancedenhanced
new
oldoverall
Speedup
FractionFraction1
1
ExTime
ExTimeSpeedup
CS252/PattersonLec 1.20
1/17/01
Amdahl’s Law
• Example: Floating point instructions improved to run 2X; but only 10% of actual instructions are FP
95.0ExTime2
0.11.01ExTimeExTime oldoldnew
053.195.0
1
95.0ExTime
ExTime
ExTime
ExTimeSpeedup
old
old
new
oldoverall
CS252/PattersonLec 1.21
1/17/01
Metrics of Performance
Compiler
Programming Language
Application
DatapathControl
Transistors Wires Pins
ISA
Function Units
(millions) of Instructions per second: MIPS(millions) of (FP) operations per second: MFLOP/s
Cycles per second (clock rate)
Megabytes per second
Answers per monthOperations per second
CS252/PattersonLec 1.22
1/17/01
SPEC: System Performance Evaluation Cooperative
• First Round 1989– 10 programs yielding a single number (“SPECmarks”)
• Second Round 1992– SPECInt92 (6 integer programs) and SPECfp92 (14 floating point
programs)» Compiler Flags unlimited. March 93 of DEC 4000 Model 610:
spice: unix.c:/def=(sysv,has_bcopy,”bcopy(a,b,c)=memcpy(b,a,c)”
wave5: /ali=(all,dcom=nat)/ag=a/ur=4/ur=200
nasa7: /norecu/ag=a/ur=4/ur2=200/lc=blas
• Third Round 1995– new set of programs: SPECint95 (8 integer programs) and SPECfp95
(10 floating point) – “benchmarks useful for 3 years”– Single flag setting for all programs: SPECint_base95, SPECfp_base95
CS252/PattersonLec 1.23
1/17/01
How to Summarize Performance
• Arithmetic mean (weighted arithmetic mean) tracks execution time: (Ti)/n or (Wi*Ti)
• Harmonic mean (weighted harmonic mean) of rates (e.g., MFLOPS) tracks execution time: n/(1/Ri) or n/(Wi/Ri)
• Normalized execution time is handy for scaling performance (e.g., X times faster than SPARCstation 10)
• But do not take the arithmetic mean of normalized execution time, use the geometrici)^1/n)
CS252/PattersonLec 1.24
1/17/01
Performance Evaluation
• “For better or worse, benchmarks shape a field”• Good products created when have:
– Good benchmarks– Good ways to summarize performance
• Given sales is a function in part of performance relative to competition, investment in improving product as reported by performance summary
• If benchmarks/summary inadequate, then choose between improving product for real programs vs. improving product to get more sales;Sales almost always wins!
• Execution time is the measure of computer performance!
CS252/PattersonLec 1.25
1/17/01
Instruction Set Architecture (ISA)
instruction set
software
hardware
CS252/PattersonLec 1.26
1/17/01
Interface Design
A good interface:
• Lasts through many implementations (portability, compatability)
• Is used in many differeny ways (generality)
• Provides convenient functionality to higher levels
• Permits an efficient implementation at lower levels
Interfaceimp 1
imp 2
imp 3
use
use
use
time
CS252/PattersonLec 1.27
1/17/01
Summary, #1
• Designing to Last through TrendsCapacity Speed
Logic 2x in 3 years 2x in 3 years
DRAM 4x in 3 years 2x in 10 years
Disk 4x in 3 years 2x in 10 years
• 6yrs to graduate => 16X CPU speed, DRAM/Disk size
• Time to run the task– Execution time, response time, latency
• Tasks per day, hour, week, sec, ns, …– Throughput, bandwidth
• “X is n times faster than Y” means ExTime(Y) Performance(X)
--------- = --------------
ExTime(X) Performance(Y)
CS252/PattersonLec 1.28
1/17/01
Summary, #2
• Amdahl’s Law:
• CPI Law:
• Execution time is the REAL measure of computer performance!
• Good products created when have:– Good benchmarks, good ways to summarize performance
• Die Cost goes roughly with die area4
• Can PC industry support engineering/research investment?
Speedupoverall =ExTimeold
ExTimenew
=
1
(1 - Fractionenhanced) + Fractionenhanced
Speedupenhanced
CPU time = Seconds = Instructions x Cycles x Seconds
Program Program Instruction Cycle
CPU time = Seconds = Instructions x Cycles x Seconds
Program Program Instruction Cycle