printed 15/11/2005 @ 17:50 PvS 2005 DS2A 7-1
Electronic and Computer Engineering School of Engineering and Design Brunel University
Distributed System Engineering & Micro-Electronics
‘Computer Architecture’
printed 15/11/2005 @ 17:50 PvS 2005 RISC vs CISC DS2A 7-1
Distributed System Engineering Brunel Univ. E&CEng
Distributed System Engineering Computer Architecture
(DS2A) Lecture 7
RISC vs CISC
Peter van Santen
Dept. of Electronic and Computer Engineering 2004
RISC Overview
• Complex Instruction Set Computers
• Reduced Instruction Set Computers
• Instruction Analysis
• RISC machine Analysis
• RISC strategy
• Instruction comparison
• Dynamic performance analysis
Advanced topics
Delayed branch technique
Register windows
Objectives
• Historic review of CISC
• Introduce RISC architecture
• Review background to RISC development
• Able to analyse and compare performance issues
• Relate concurrentising issues to instruction level parallelism and scheduling
Refs.: Hen03 Chapt. 2 http://books.elsevier.com/companions/1558605967
Performance factors (Ic, p, m, k, tcyc):
• reduce instruction count Ic
• reduce processor cpi
• reduce cycle time tcyc
CISC
ISA for Complex Instruction Set Computer
Historic arguments for:
1. Greater variety in instructions would simplify compilers.
2. More sophisticated instructions to reduce software problems.
3. Metrics based on memory size (memory efficiency) and program length.
4. Micro-programming supported higher level functions directly executable by microcode.
5. Closure of semantic gap.
CISC techniques
Memory to memory architecture → less complex compiler.
Reduction in cost of HW → greater hw complexity
    → micro-programming
    → semantic gap closure
    → complex instr. sets (non-orthogonal).
Writeable ctrl store → user instructions
    → virt. memory problems
    → limited address space
    → multi-process swapping.
Performance proportional to prog. size.
Main microprocessor Architecture families
CISC
Intel x86 generic: 86, 286, 386, 486, Pentium, Pentium Pro, PIII, P4
Motorola: M 68x0 & 680x0
Digital VAX (VLSI)
RISC
Digital Alpha series: 21064, 21164, 21264
MIPS: R2000, 3000, 4000, 5000, 8000, 10000
Sun SPARC: SPARC, MicroSPARC, SuperSPARC, UltraSPARC
HP/PA-RISC
PowerPC
Intel: i960
Instruction Set Analysis
Statement     Fortran    C    Pascal   Average
Assignment       51      38     45       45
IF               10      43     29       27
Call              5      12     15       11
Loop              9       3      5        6
Goto              9       3      0        4
Other            16       1      6        8
(figures in % of statements; round-off errors in averages)
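The Average column and its round-off note can be checked with a few lines of Python. This is only a sketch: it assumes the average is the plain unweighted mean over the three languages, rounded to the nearest integer, which the slide does not state. The names `MIX` and `rounded_averages` are made up for this illustration.

```python
# Statement mix per language (Fortran, C, Pascal), in % of statements,
# copied from the table on this slide.
MIX = {
    "Assignment": (51, 38, 45),
    "IF":         (10, 43, 29),
    "Call":       (5, 12, 15),
    "Loop":       (9, 3, 5),
    "Goto":       (9, 3, 0),
    "Other":      (16, 1, 6),
}


def rounded_averages(mix):
    """Return {statement: unweighted mean over languages, rounded to an integer}.

    Assumption: simple mean, rounded half up (int(x + 0.5) for non-negative x).
    """
    return {name: int(sum(vals) / len(vals) + 0.5) for name, vals in mix.items()}


averages = rounded_averages(MIX)
# Each language column should account for all statements (100 %).
column_totals = [sum(vals[i] for vals in MIX.values()) for i in range(3)]

for name, avg in averages.items():
    print(f"{name:<10} {avg:>3}")
print("language column totals:", column_totals)
print("sum of rounded averages:", sum(averages.values()))
```

Under that assumption each language column sums to 100, while the rounded averages sum to 101, which is the round-off error the slide mentions.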
CISC and RISC processor Architectures
[Figure: two block diagrams.
CISC architecture with micro-programmed ctrl unit and unified cache: main memory, instr. & data cache, ctrl unit with micro-programmed ctrl mem, data path.
RISC architecture with hardwired ctrl unit and split cache: main memory (instr., data), instruction cache, data cache, HW ctrl unit, data path.]
RISC example Digital Alpha 21164
CISC Example Digital VAX
CISC/RISC example Intel Pentium Pro
Frequency of Variables
N     terms   locals   parameters
0       -       22        41
1      80       17        19
2      15       20        15
3       3       14         9
4       2        8         7
≥5      0       20         8
Where:
Terms       % occurrence in assignment statements.
Locals      % occurrence of local variables per procedure/function.
Parameters  % occurrence of number of params in procedure calls.
RISC vs CISC
RISC                           CISC
Simple single cycle instr.     Complex instr. in m-cycles
M ref. by Ld and St only       Any instr. may ref. mem.
Highly pipelined (overlap.)    Less pipelined or not
Instr. exec. by HW             Instr. interpreted by micro prg.
Fixed format instr.            Var. instr. format
Few instr. and modes           Many instr. and modes
Multiple reg. sets             Single reg. set
Complexity in compiler         Complexity in micro prg.
RISC Processor examples
IBM 801      started 1975, publ. Radin 1982
RISC I       ~ 1980, Patterson et al. (VLSI)
MIPS         ~ 1981, Hennessy (VLSI)
RISC II      ~ 1982, Patterson & Sequin (VLSI)
HP Prec.     ~ 1985, open architecture
Sun SPARC    ~ 1987, scalable processor arch.
MIPS R2000   ~ micro. without interl. pipe stages
MIPS R3000
MIPS R4000   ~ superscalar-superpipelined
Alpha
Early work at IBM on the 801.
VLSI research at Berkeley and Stanford.
Berkeley uses multiple register windows; the others rely on compiler optimisation.
Refs. Survey of RISC Architectures http://books.elsevier.com/companions/1558605967/appendices/1558605967-appendix-c.pdf
RISC strategy
Observations by John Cocke (1975, IBM): CISC computers execute mostly simple instructions.
Maximise the effective throughput of a design (considering hw and sw) by:
1. analysing applications for key instructions,
2. executing key operations in hardware,
3. performing most functions in sw,
4. adding hw features only if they yield a net performance gain,
5. including features only if indicated by detailed analysis of substantial HLL programs.
Simple RISC pipeline, overlapped
MIPS example

Instr.  time (clock cycles) →
        1     2     3     4     5     6     7     8     9     10
1       IF    D/RR  ALU   MDA   WBR
2             IF    D/RR  ALU   MDA   WBR
3                   IF    D/RR  ALU   MDA   WBR
4                         IF    D/RR  ALU   MDA   WBR
5                               IF    D/RR  ALU   MDA   WBR
6                                     IF    D/RR  ALU   MDA   WBR

IF    Instruction Fetch
D/RR  Decode / Reg to Reg fetch
ALU   Execution / eff. address calc.
MDA   Mem Data Access
WBR   Write Back to Reg.
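The overlap in this pipeline can be sketched in Python. The sketch assumes an ideal pipeline: one instruction issued per cycle, no stalls or hazards, and every stage taking exactly one cycle. The function names `total_cycles`, `schedule` and `render` are illustrative only.

```python
STAGES = ["IF", "D/RR", "ALU", "MDA", "WBR"]  # stage names as listed on the slide


def total_cycles(n_instr, n_stages=len(STAGES)):
    """Cycles to finish n_instr instructions in an ideal, stall-free pipeline."""
    return n_stages + n_instr - 1


def schedule(n_instr):
    """Map each instruction number (1-based) to {cycle: stage}.

    Assumption: instruction i enters IF in cycle i and advances one stage per cycle.
    """
    return {i: {i + s: STAGES[s] for s in range(len(STAGES))}
            for i in range(1, n_instr + 1)}


def render(n_instr):
    """Return the schedule as a fixed-width text diagram."""
    cycles = range(1, total_cycles(n_instr) + 1)
    table = schedule(n_instr)
    lines = ["instr " + "".join(f"{c:<6}" for c in cycles)]
    for i in range(1, n_instr + 1):
        lines.append(f"{i:<6}" + "".join(f"{table[i].get(c, ''):<6}" for c in cycles))
    return "\n".join(line.rstrip() for line in lines)


print(render(6))
print("cycles, overlapped:    ", total_cycles(6))
print("cycles, not overlapped:", 6 * len(STAGES))
```

For the six instructions and five stages of the slide this gives 5 + 6 - 1 = 10 cycles, against 30 cycles if each instruction had to finish before the next one started.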
Instruction Comparison
Metric example, with size in bits: instr. 8, m.addr. 16, reg.addr. 4 and data 32
op: A := B + C

R ↔ R             M ↔ R               M ↔ M
R2 ← MB           Acc ← MB            MA ← MB + MC
R3 ← MC           Acc ← MC + Acc
R1 ← R2 + R3      MA ← Acc
MA ← R1
I = 104 bits      I = 72 bits         I = 56 bits
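The instruction-size totals on this slide (104, 72 and 56 bits) can be reproduced with a short Python sketch. The field counting is an assumption that happens to match the slide's totals: each instruction is one 8-bit opcode plus one 16-bit field per memory operand and one 4-bit field per register operand, and the accumulator is implicit and needs no field. `instr_bits` and `PROGRAMS` are names invented for the example.

```python
# Field widths in bits, as given on the slide.
OPCODE, MEM_ADDR, REG_ADDR = 8, 16, 4


def instr_bits(n_mem=0, n_reg=0):
    """Size of one instruction: opcode plus its memory and register address fields."""
    return OPCODE + n_mem * MEM_ADDR + n_reg * REG_ADDR


# Each program is a list of (memory operands, register operands), one per instruction.
PROGRAMS = {
    "R<->R": [(1, 1),   # R2 <- MB
              (1, 1),   # R3 <- MC
              (0, 3),   # R1 <- R2 + R3
              (1, 1)],  # MA <- R1
    "M<->R": [(1, 0),   # Acc <- MB        (accumulator assumed implicit)
              (1, 0),   # Acc <- MC + Acc
              (1, 0)],  # MA <- Acc
    "M<->M": [(3, 0)],  # MA <- MB + MC
}

instruction_traffic = {name: sum(instr_bits(m, r) for m, r in prog)
                       for name, prog in PROGRAMS.items()}
print(instruction_traffic)
```

The register-to-register version fetches the most instruction bits for this single statement, which is the static code-size cost of a load/store architecture; the 32-bit data transfers are not included in I.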
Dynamic Performance analysis
Time to run a given program can be calculated as:
Texec = tcyc × D × CPI
where:
tcyc = time of a single clock cycle
D = dynamic instruction count
CPI = average cycles per instruction
For a given technology, tcyc will be comparable for a RISC and a CISC architecture, and can possibly be made smaller for RISC.
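Written out for two machines, the same relation gives the ratio used when comparing a RISC with a CISC design. The superscripts RISC and CISC are labels introduced here for illustration and are not part of the slide's notation.

```latex
\[
T_{exec} = t_{cyc} \times D \times CPI
\]
\[
\frac{T_{exec}^{RISC}}{T_{exec}^{CISC}}
  = \frac{t_{cyc}^{RISC}}{t_{cyc}^{CISC}}
    \times \frac{D^{RISC}}{D^{CISC}}
    \times \frac{CPI^{RISC}}{CPI^{CISC}},
\qquad
\text{speedup} = \frac{T_{exec}^{CISC}}{T_{exec}^{RISC}}
\]
```

If both machines are built in the same technology the first factor is close to 1, so the comparison reduces to the instruction-count ratio multiplied by the CPI ratio.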
Dynamic Performance comparison
D is expected to be much greater for RISC, but is found not to be so, due to instruction distribution and advanced code generation techniques
→ D for RISC about 20% greater than for CISC
CPI is greatly reduced for RISC: from 5 - 10 cpi for CISC to 1.6 - 2.0 cpi for RISC, including cache and MM overheads.
→ 0.25 cpi for superscalar
Dyn. Perf. Comparison VAX vs RISC
VAX-11/780:
1 Mips (10^6 average VAX instructions per sec.)
tcyc = 200 ns
CPI = 5
Texec = 0.2 × 5 × D μsec (for VAX)
Texec = 0.2 × 2 × 1.2 D μsec (for RISC)
→ RISC : CISC Texec ratio of 0.48
→ speedup ≈ 2
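The VAX figures can be checked numerically with a small Python sketch. It assumes, as the slide does, that the RISC machine runs at the same 200 ns cycle time, with a CPI of 2 and 1.2 times the instruction count. Deriving the CPI from the Mips rating is an added step, and the helper names `cpi_from_rating` and `exec_time_us` are hypothetical.

```python
def cpi_from_rating(mips, tcyc_ns):
    """CPI implied by a native Mips rating and a clock period in ns."""
    ns_per_instr = 1000.0 / mips   # 1 Mips corresponds to 1000 ns per instruction
    return ns_per_instr / tcyc_ns


def exec_time_us(tcyc_ns, cpi, d_scale=1.0):
    """Texec in microseconds per unit of D; D itself cancels in the ratio."""
    return (tcyc_ns / 1000.0) * cpi * d_scale


vax_cpi = cpi_from_rating(mips=1, tcyc_ns=200)
t_vax = exec_time_us(200, vax_cpi)             # 0.2 x 5 x D
t_risc = exec_time_us(200, 2.0, d_scale=1.2)   # 0.2 x 2 x 1.2 D
ratio = t_risc / t_vax
speedup = 1 / ratio
print(f"VAX CPI = {vax_cpi:.1f}, ratio = {ratio:.2f}, speedup = {speedup:.2f}")
```

The ratio comes out at 0.48 and the speedup at roughly 2.08, which the slide rounds to 2.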
Dyn. Perf. Comparison M68020 vs RISC (exercise)
M 68020:
2 Mips (Mips = 10^6 average VAX instructions per sec.)
tcyc = 60 ns
Determine the average cycles per instruction (CPI), and calculate the speedup for RISC based on the data given in slide 19 (Dynamic Performance comparison).
Misc. benefits
Reduced design complexity results in:
lower design cost, reduced chip complexity, higher chip yield, reduced time to market, etc.
Operand instructions generally: R op R → R or R op I → R
Penalty:
» increased compiler complexity
» pre-runtime scheduling
» code optimisation
» hardware instruction issue units
Refs.: VAX ISA http://books.elsevier.com/companions/1558605967/appendices/1558605967-appendix-e.pdf
RISC summary
RISC designs aim to gain performance by:
• reducing the no. of cycles/instruction much faster than they lose performance by executing more instructions,
• fast context switching by using register windows,
• complex compiler code optimisation to reduce processor stalls,
• some processors have no interlocks for dependencies; these are dealt with by the compiler,
• super-scalar designs use multiple instruction streams for instruction level parallelism,
• super-pipelined designs use multi-phase clocks for enhanced pipelining over standard scalar pipelines,
• code optimisation for instruction level parallelism and scheduling carried out statically at compile time,
• legacy binaries → CISC code running on RISC cores (Intel x86, AMD), with translation and optimisation by hardware at runtime.
→ Superpipelined superscalar of degree (n, m) → cpi < 1
2004 → Multiple cores, Hyper-threading
Answer: Dyn. Perf. Comparison M68020 vs RISC
M 68020:
2 Mips (Mips = 10^6 average VAX instructions per sec.), i.e. 500 ns per instruction
tcyc = 60 ns
CPI = (500 / 60) ≈ 8.3, taken as 8
Texec = tcyc × CPI × D
Texec = 0.06 × 8 × D μsec (for M68020)
Texec = 0.06 × 2 × 1.2 D μsec (for RISC)
RISC : CISC Texec ratio of 0.30
speedup 3.3