printed 15/11/2005 @ 17:50 PvS 2005 DS2A 7-1

Electronic and Computer Engineering School of Engineering and Design Brunel University

Distributed System Engineering & Micro-Electronics

‘Computer Architecture’

printed 15/11/2005 @ 17:50 PvS 2005 RISC vs CISC DS2A 7-1

Distributed System Engineering Brunel Univ. E&CEng

Distributed System Engineering Computer Architecture

(DS2A) lecture 7

RISC vs CISC

Peter van Santen

Dept. of Electronic and Computer Engineering 2004

RISC Overview

• Complex Instruction Set Computers
• Reduced Instruction Set Computers
• Instruction Analysis
• RISC machine Analysis
• RISC strategy
• Instruction comparison
• Dynamic performance analysis

Advanced topics

• Delayed branch technique
• Register windows

Objectives

• Historic review of CISC
• Introduce RISC architecture
• Review background to RISC development
• Able to analyse and compare performance issues
• Relate concurrentising issues to instruction level parallelism and scheduling

Refs.: Hen03 Chapt. 2 http://books.elsevier.com/companions/1558605967

Performance factors (Ic, p, m, k, tcyc):

• reduce instruction count Ic
• reduce processor cpi
• reduce cycle time tcyc

CISC

ISA for Complex Instruction Set Computer

Historic arguments for:

1. Greater variety in instructions would simplify compilers.
2. More sophisticated instructions to reduce software problems.
3. Metrics based on memory size (memory efficiency) and program length.
4. Micro-programming supported higher level functions directly executable by microcode.
5. Closure of semantic gap.

CISC techniques

Memory to memory architecture → less complex compiler.

Reduction in cost of HW → greater hw complexity
    → micro-programming
    → semantic gap closure
    → complex instr. sets (non-orthogonal).

Writeable ctrl store → user instructions
    → vir. memory problems
    → limited address space
    → multi-process swapping.

Performance proportional to prog. size.

Main microprocessor Architecture families

CISC

• Intel x86 generic: 86, 286, 386, 486, Pentium, Pentium Pro, PIII, P4
• Motorola: M 68x0 & 680x0
• Digital VAX (VLSI)

RISC

• Digital Alpha series: 21064, 21164, 21264
• MIPS: R2000, 3000, 4000, 5000, 8000, 10000
• Sun SPARC: SPARC, MicroSPARC, SuperSPARC, UltraSPARC
• HP/PA-RISC
• PowerPC
• Intel: i960

Instruction Set Analysis

Statement     Fortran    C    Pascal   Average
Assignment       51     38      45        45
IF               10     43      29        27
Call              5     12      15        11
Loop              9      3       5         6
Goto              9      3       0         4
Other            16      1       6         8

Note: round-off errors in averages.
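The round-off remark can be checked with a few lines of Python. This is only a sketch: the figures are copied from the table above, the names `FREQ` and `rounded_averages` are illustrative, and the values are assumed to be percentages (each language column sums to 100).

```python
# Statement frequencies copied from the table above
# (columns: Fortran, C, Pascal). Assumed to be percentages.
FREQ = {
    "Assignment": (51, 38, 45),
    "IF": (10, 43, 29),
    "Call": (5, 12, 15),
    "Loop": (9, 3, 5),
    "Goto": (9, 3, 0),
    "Other": (16, 1, 6),
}


def rounded_averages(freq):
    """Average each statement type over the languages, rounded to an integer."""
    return {name: round(sum(vals) / len(vals)) for name, vals in freq.items()}


averages = rounded_averages(FREQ)
# Sum of each language column (Fortran, C, Pascal).
column_totals = [sum(col) for col in zip(*FREQ.values())]
# Sum of the rounded averages; rounding each row separately lets it drift from 100.
average_total = sum(averages.values())

if __name__ == "__main__":
    print(averages)
    print(column_totals, average_total)
```

Each language column adds up to 100, while the rounded averages add up to 101, which is the round-off error the slide mentions.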

CISC and RISC processor Architectures

[Figure: two block diagrams.
(a) CISC architecture with micro-programmed ctrl unit and unified cache. Blocks: main memory, instr. & data cache, ctrl unit with micro-programmed ctrl mem, data path.
(b) RISC with hardwired ctrl unit and split cache. Blocks: main memory (instr., data), instruction cache, data cache, HW ctrl unit, data path.]

RISC example Digital Alpha 21164

CISC Example Digital VAX

CISC/RISC example Intel Pentium Pro

Frequency of Variables

N     terms   locals   parameters
0       -       22         41
1      80       17         19
2      15       20         15
3       3       14          9
4       2        8          7
≥5      0       20          8

Where:

Terms % occurrence in assignment statements.

Locals % occurrence local variables per procedure/function.

Parameters % occurrence of number of params in procedure calls.

RISC vs CISC

RISC                            CISC
Simple single cycle instr.      Complex instr. in m-cycles
M ref. by Ld and St only        Any instr. may ref. mem.
Highly pipelined (overlap.)     Less pipelined or not
Instr. exec. by HW              Instr. interpreted by micro prg.
Fixed format instr.             Var. instr. format
Few instr. and modes            Many instr. and modes
Multiple reg. sets              Single reg. set
Complexity in compiler          Complexity in micro prg.

RISC Processor examples

IBM 801       started 1975, publ. Radin 1982
RISC I        ~ 1980, Patterson et al. (VLSI)
MIPS          ~ 1981, Hennessy (VLSI)
RISC II       ~ 1982, Patterson & Sequin (VLSI)
HP Prec.      ~ 1985, open architecture
Sun SPARC     ~ 1987, scalable processor arch.
MIPS R2000    ~ microprocessor without interlocked pipeline stages
MIPS R3000
MIPS R4000    ~ superscalar-superpipelined
Alpha

• Early work at IBM on 801
• VLSI research at Berkeley and Stanford
• Berkeley uses multiple windows, others use compiler optimisation

Refs.: Survey of RISC Architectures http://books.elsevier.com/companions/1558605967/appendices/1558605967-appendix-c.pdf

RISC strategy

Observation by John Cocke (IBM, 1975): CISC computers execute mostly simple instructions.

Maximise the effective throughput of a design (considering hw and sw) by:

1. analysing applications for key instructions,
2. executing key operations in hardware,
3. performing most functions in sw,
4. adding hw features only if they yield a net performance gain,
5. including features only if indicated by detailed analysis of substantial HLL programs.

Simple RISC pipeline, overlapped

MIPS example

          time (cycles)
Instr.    1     2     3     4     5     6     7     8     9     10
  1       IF    D/RR  ALU   MDA   WBR
  2             IF    D/RR  ALU   MDA   WBR
  3                   IF    D/RR  ALU   MDA   WBR
  4                         IF    D/RR  ALU   MDA   WBR
  5                               IF    D/RR  ALU   MDA   WBR
  6                                     IF    D/RR  ALU   MDA   WBR

IF     Instruction Fetch
D/RR   Decode / Reg to Reg fetch
ALU    Execution / eff. address calc.
MDA    Mem Data Access
WBR    Write back to Reg.
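The overlap in the pipeline diagram can be reproduced with a short Python sketch. It assumes an ideal pipeline (one instruction issued per cycle, no stalls or hazards), and the function names are illustrative, not from the lecture.

```python
# Stage names taken from the slide legend.
STAGES = ["IF", "D/RR", "ALU", "MDA", "WBR"]


def pipeline_schedule(n_instr, stages=STAGES):
    """Map each instruction (1-based) to {cycle: stage} for an ideal pipeline.

    Assumption: instruction i enters the first stage in cycle i and moves
    one stage per cycle, with no stalls.
    """
    return {
        i: {i + s: name for s, name in enumerate(stages)}
        for i in range(1, n_instr + 1)
    }


def total_cycles(n_instr, n_stages=len(STAGES)):
    """Cycles to finish n_instr instructions with full overlap."""
    return n_stages + n_instr - 1


def unpipelined_cycles(n_instr, n_stages=len(STAGES)):
    """Cycles needed if each instruction finishes before the next starts."""
    return n_stages * n_instr


if __name__ == "__main__":
    for instr, slots in pipeline_schedule(6).items():
        print(instr, slots)
    print(total_cycles(6), unpipelined_cycles(6))
```

For the six instructions in the diagram this gives 10 cycles with overlap, against 30 cycles without it.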

Instruction Comparison

Metric example, with size in bits: instr. 8, m.addr. 16, reg.addr. 4 and data 32

op: A := B + C

R ↔ R               M ↔ R                M ↔ M
R2 ← MB             Acc ← MB             MA ← MB + MC
R3 ← MC             Acc ← MC + Acc
R1 ← R2 + R3        MA ← Acc
MA ← R1
I = 104 bits        I = 72 bits          I = 56 bits
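The three totals can be checked with a small Python sketch. The encoding is an assumption, since the slide does not spell it out: each instruction is taken to be one 8-bit opcode plus one address field per explicit operand, and the accumulator is treated as implicit. With that reading the slide's figures come out exactly.

```python
# Field sizes in bits, as given on the slide.
OPCODE_BITS = 8
MEM_ADDR_BITS = 16
REG_ADDR_BITS = 4


def instr_bits(mem_operands, reg_operands):
    """Size of one instruction: opcode plus one address field per operand."""
    return (OPCODE_BITS
            + mem_operands * MEM_ADDR_BITS
            + reg_operands * REG_ADDR_BITS)


# (memory operands, register operands) per instruction for A := B + C.
# Assumption: the accumulator is implicit and needs no address field.
SEQUENCES = {
    "R<->R": [(1, 1), (1, 1), (0, 3), (1, 1)],  # two loads, add, store
    "M<->R": [(1, 0), (1, 0), (1, 0)],          # load, add, store via Acc
    "M<->M": [(3, 0)],                          # single three-address instruction
}


def program_bits(sequence):
    """Total instruction bits fetched for one instruction sequence."""
    return sum(instr_bits(m, r) for m, r in sequence)


totals = {name: program_bits(seq) for name, seq in SEQUENCES.items()}

if __name__ == "__main__":
    print(totals)
```

The register-to-register form needs the most instruction bits for this single statement. The table counts instruction bits only; the 32-bit data size does not enter the totals.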

Dynamic Performance analysis

Time to run a given program can be calculated as:

Texec = tcyc × D × CPI

where:

tcyc = time of single clock cycle

D = dynamic instruction count

CPI = average cycles per instruction

For a given technology, tcyc will be comparable for a RISC and a CISC architecture, and can possibly be made smaller for RISC.

Dynamic Performance comparison

D is expected to be much greater for RISC, but is found not to be so, due to instruction distribution and advanced code generation techniques.

→ about 20% greater than for CISC

CPI is greatly reduced for RISC: from 5 - 10 cpi for CISC to 1.6 - 2.0 cpi for RISC, including cache and MM overheads.

→ 0.25 cpi for superscalar

Dyn. Perf. Comparison VAX vs RISC

VAX-11/780:

1 Mips (10^6 average VAX instructions per sec.)

tcyc = 200 ns

CPI = 5

Texec = 0.2 × 5 × D μsec (for VAX)
Texec = 0.2 × 2 × 1.2 D μsec (for RISC)

→ RISC : CISC Texec ratio of 0.48

→ speedup ≈ 2
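The VAX comparison can be worked through in Python. This is a sketch using only the slide's figures (same tcyc for both machines, RISC CPI of 2, RISC instruction count 1.2 × D); the function names are illustrative.

```python
def t_exec_us(tcyc_us, cpi, d):
    """Execution time in microseconds: Texec = tcyc * CPI * D."""
    return tcyc_us * cpi * d


def risc_vs_cisc(tcyc_us, cisc_cpi, risc_cpi=2.0, risc_d_factor=1.2):
    """Return (RISC:CISC time ratio, speedup) for the same program.

    Assumptions from the slides: both machines use the same cycle time,
    and RISC executes risc_d_factor times as many instructions as CISC.
    """
    d = 1.0  # D cancels in the ratio, so any non-zero value works
    cisc = t_exec_us(tcyc_us, cisc_cpi, d)
    risc = t_exec_us(tcyc_us, risc_cpi, risc_d_factor * d)
    return risc / cisc, cisc / risc


if __name__ == "__main__":
    ratio, speedup = risc_vs_cisc(tcyc_us=0.2, cisc_cpi=5.0)
    print(f"ratio = {ratio:.2f}, speedup = {speedup:.2f}")
```

Because tcyc and D cancel, the ratio depends only on the CPI values and the instruction count factor: (2 × 1.2) / 5 = 0.48, a speedup of just over 2.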

Dyn. Perf. Comparison M68020 vs RISC

M 68020:

2 Mips (Mips = 10^6 average VAX instructions per sec.)

tcyc = 60 ns

Exercise: determine the average cycles per instruction (CPI), and calculate the speedup for RISC based on the data given in slide 19.

Misc. benefits

Reduced design complexity results in: lower design cost, reduced chip complexity, higher chip yield, reduced time to market, etc.

Operand instructions generally: R op R → R or R op I → R

Penalty:

» increased compiler complexity
» pre-runtime scheduling
» code optimisation
» hardware instruction issue units

Refs.: VAX ISA http://books.elsevier.com/companions/1558605967/appendices/1558605967-appendix-e.pdf

RISC summary

RISC designs aim to gain performance by:

• reducing the no. of cycles/instruction much faster than they lose performance by executing more instructions,
• fast context switching by using register windows,
• complex compiler code optimisation to reduce processor stalls,
• some processors have no interlocks for dependencies; these are dealt with by the compiler,
• super-scalar processors use multiple instruction streams for instruction level parallelism,
• super-pipelined processors use multi-phase clocks for enhanced pipelining over standard scalar pipelines,
• code optimisation for instruction level parallelism and scheduling carried out statically at compile time,
• legacy binaries → CISC code running on RISC cores (Intel x86, AMD), translation and optimisation by hardware at runtime.

→ Superpipelined superscalar of degree (n, m) → cpi < 1

2004 → Multiple cores, Hyper-Threading

Answer: Dyn. Perf. Comparison M68020 vs RISC

M 68020:

2 Mips (Mips = 10^6 average VAX instructions per sec.)
tcyc = 60 ns
CPI = (500 / 60) ≈ 8

Texec = tcyc × CPI × D

Texec = 0.06 × 8 × D μsec (for M68020)
Texec = 0.06 × 2 × 1.2 D μsec (for RISC)

→ RISC : CISC Texec ratio of 0.30

→ speedup 3.3
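The answer can be reproduced step by step in Python. The sketch assumes, as the slide does, that the RISC machine runs at the same 60 ns cycle time with CPI = 2 and 1.2 × D instructions; `cpi_from_mips` is an illustrative helper name.

```python
def cpi_from_mips(mips, tcyc_ns):
    """Average cycles per instruction from a MIPS rating and cycle time.

    At `mips` million instructions per second, one instruction takes
    1000 / mips nanoseconds on average.
    """
    time_per_instr_ns = 1000.0 / mips
    return time_per_instr_ns / tcyc_ns


# M68020: 2 Mips, 60 ns cycle time.
exact_cpi = cpi_from_mips(mips=2.0, tcyc_ns=60.0)  # 500 / 60
slide_cpi = round(exact_cpi)                        # the slide rounds to 8

# RISC figures from the earlier slides: CPI = 2, instruction count 1.2 * D.
RISC_CPI = 2.0
RISC_D_FACTOR = 1.2

# tcyc and D cancel because both machines are given the same cycle time.
ratio = (RISC_CPI * RISC_D_FACTOR) / slide_cpi
speedup = 1.0 / ratio

if __name__ == "__main__":
    print(f"CPI = {exact_cpi:.2f} (rounded to {slide_cpi})")
    print(f"ratio = {ratio:.2f}, speedup = {speedup:.1f}")
```

Using the unrounded CPI of about 8.33 instead of 8 would give a slightly larger speedup of roughly 3.5, so the slide's 3.3 reflects its rounding of the CPI.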