lect07

printed 15/11/2005 @ 17:50 PvS 2005 DS2A 7-1

Electronic and Computer Engineering School of Engineering and Design Brunel University

Distributed System Engineering & Micro-Electronics

‘Computer Architecture’



(DS2A) Lecture 7

RISC vs CISC

Peter van Santen

Dept. of Electronic and Computer Engineering 2004

RISC Overview

• Complex Instruction Set Computers
• Reduced Instruction Set Computers
• Instruction Analysis
• RISC machine Analysis
• RISC strategy
• Instruction comparison
• Dynamic performance analysis

Advanced topics

• Delayed branch technique
• Register windows


Objectives

• Historic Review of CISC

• Introduce RISC architecture

• Review background to RISC development

• Able to analyse and compare performance issues

• Relate concurrency issues to instruction-level parallelism and scheduling

Refs.: Hen03 Chapt. 2 http://books.elsevier.com/companions/1558605967

Performance factors (Ic, p, m, k, tcyc)

reduce instruction count Ic
reduce processor cpi
reduce cycle time tcyc


ISA for Complex Instruction Set Computer

Historic arguments for:
1. Greater variety in instructions would simplify compilers.
2. More sophisticated instructions would reduce software problems.
3. Metrics based on memory size (memory efficiency) and program length.
4. Micro-programming supported higher-level functions directly executable by microcode.
5. Closure of the semantic gap.


CISC techniques

Memory to memory architecture → less complex compiler.

Reduction in cost of HW → greater hw complexity
    → micro-programming
    → semantic gap closure
    → complex instr. sets (non-orthogonal).

Writeable ctrl store → user instructions
    → vir. memory problems
    → limited address space
    → multi-process swapping.

Performance proportional to prog. size.

Main microprocessor Architecture families

CISC
Intel x86 generic: 86, 286, 386, 486, Pentium, Pentium Pro, PIII, P4
Motorola: M 68x0 & 680x0
Digital VAX (VLSI)

RISC
Digital Alpha series: 21064, 21164, 21264
MIPS: R2000, 3000, 4000, 5000, 8000, 10000
Sun SPARC: SPARC, MicroSPARC, SuperSPARC, UltraSPARC
HP/PA-RISC
PowerPC
Intel: i960


Instruction Set Analysis

Statement    Fortran    C    Pascal    Average
Assignment      51     38      45        45
IF              10     43      29        27
Call             5     12      15        11
Loop             9      3       5         6
Goto             9      3       0         4
Other           16      1       6         8

(round-off errors in averages)
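The round-off note can be checked with a short sketch. It assumes the entries are percentages (each language column sums to 100) and that the Average column is the rounded mean of the three languages; the names FREQ, column_totals and rounded_averages are illustrative only.

```python
# Statement frequencies transcribed from the Instruction Set Analysis table
# (columns: Fortran, C, Pascal). Assumed to be percentages.
FREQ = {
    "Assignment": (51, 38, 45),
    "IF": (10, 43, 29),
    "Call": (5, 12, 15),
    "Loop": (9, 3, 5),
    "Goto": (9, 3, 0),
    "Other": (16, 1, 6),
}


def column_totals(freq):
    """Sum each language column over all statement types."""
    return tuple(sum(col) for col in zip(*freq.values()))


def rounded_averages(freq):
    """Mean over the three languages, rounded to the nearest integer."""
    return {name: round(sum(vals) / len(vals)) for name, vals in freq.items()}


if __name__ == "__main__":
    print("column totals:", column_totals(FREQ))
    averages = rounded_averages(FREQ)
    for name, avg in averages.items():
        print(f"{name:<10} {avg:>3}")
    # The rounded averages need not add up to 100.
    print("sum of averages:", sum(averages.values()))
```

Under these assumptions the rounded averages reproduce the Average column, and their sum differs slightly from 100, which is the round-off error the slide mentions.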

CISC and RISC processor Architectures

[Figure: CISC architecture with micro-programmed ctrl unit (uCtrl) and unified cache. Labels: main memory, instr. & data, data path, ctrl unit, micro-programmed ctrl mem.]

[Figure: RISC with hardwired ctrl unit and split cache. Labels: main memory (instr., data), instruction cache, data cache, HW ctrl unit, data path.]


RISC example Digital Alpha 21164

CISC Example Digital VAX


CISC/RISC example Intel Pentium Pro

Frequency of Variables

N     terms    locals    parameters
0       -        22         41
1      80        17         19
2      15        20         15
3       3        14          9
4       2         8          7
≥5      0        20          8

Where:

Terms % occurrence in assignment statements.

Locals % occurrence local variables per procedure/function.

Parameters % occurrence of number of params in procedure calls.
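A rough sketch of how these figures might be read: summing the percentages up to a given N shows how many cases a small, fixed number of operands or registers would cover. The link to register sizing is an interpretation, not something the slide states, and the names TERMS, LOCALS, PARAMS and covered are illustrative.

```python
# Percentages transcribed from the Frequency of Variables table.
# Index 0..4 is N = 0..4, index 5 is N >= 5. None marks the "-" entry.
TERMS = [None, 80, 15, 3, 2, 0]
LOCALS = [22, 17, 20, 14, 8, 20]
PARAMS = [41, 19, 15, 9, 7, 8]


def covered(column, max_n):
    """Sum of the percentages for N = 0..max_n (a missing entry counts as 0)."""
    return sum(value or 0 for value in column[: max_n + 1])


if __name__ == "__main__":
    print("assignments with at most 2 terms:    ", covered(TERMS, 2), "%")
    print("procedures with at most 4 locals:    ", covered(LOCALS, 4), "%")
    print("calls with at most 4 parameters:     ", covered(PARAMS, 4), "%")
```

The columns do not sum to exactly 100, so the results are approximate in the same way as the table itself.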


RISC vs CISC

RISC                            CISC
Simple single cycle instr.      Complex instr. in m-cycles
M ref. by Ld and St only        Any instr. may ref. mem.
Highly pipelined (overlap.)     Less pipelined or not
Instr. exec. by HW              Instr. interpreted by micro prg.
Fixed format instr.             Var. instr. format
Few instr. and modes            Many instr. and modes
Multiple reg. sets              Single reg. set
Complexity in compiler          Complexity in micro prg.


RISC Processor examples

IBM 801       started 1975, publ. Radin 1982
RISC I        ~ 1980, Patterson et al. (VLSI)
MIPS          ~ 1981, Hennessy (VLSI)
RISC II       ~ 1982, Patterson & Sequin (VLSI)
HP Prec.      ~ 1985, open architecture
Sun SPARC     ~ 1987, scalable processor arch.
MIPS R2000    ~ micro. without interl. pipe stages
MIPS R3000
MIPS R4000    ~ superscalar-superpipelined
Alpha

Early work at IBM on 801
VLSI research at Berkeley and Stanford
Berkeley used multiple register windows; others used compiler optimisation

Refs. Survey of RISC Architectures http://books.elsevier.com/companions/1558605967/appendices/1558605967-appendix-c.pdf


RISC strategy

Observations by John Cocke (1975, IBM):-
CISC computers execute mostly simple instructions.

Maximise the effective throughput of a design (considering hw and sw) by:-
1. analysing applications for key instructions,
2. executing key operations in hardware,
3. performing most functions in sw.,
4. adding hw. features only if they yield a net performance gain,
5. including features only if indicated by detailed analysis of substantial HLL programs.

Simple RISC pipeline, overlapped

MIPS example

Instr.  Cycle: 1     2     3     4     5     6     7     8     9     10
1              IF    D/RR  ALU   MDA   WBR
2                    IF    D/RR  ALU   MDA   WBR
3                          IF    D/RR  ALU   MDA   WBR
4                                IF    D/RR  ALU   MDA   WBR
5                                      IF    D/RR  ALU   MDA   WBR
6                                            IF    D/RR  ALU   MDA   WBR

IF      Instruction Fetch
D/RR    Decode/Reg to Reg fetch
ALU     Execution/eff. address calc
MDA     Mem Data Access
WBR     Write back to Reg.
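The overlap in the five-stage pipeline can be sketched as follows, assuming an ideal pipeline (one instruction issued per cycle, no stalls or hazards). The function names are illustrative; only the stage names come from the slide.

```python
# Stage names from the MIPS example.
STAGES = ["IF", "D/RR", "ALU", "MDA", "WBR"]


def total_cycles(n_instr, n_stages=len(STAGES)):
    """Cycles to complete n_instr instructions in an ideal overlapped pipeline."""
    if n_instr <= 0:
        return 0
    return n_stages + n_instr - 1


def schedule(n_instr):
    """Map instruction number -> {cycle: stage}; instruction i enters IF in cycle i."""
    return {
        i: {i + s: STAGES[s] for s in range(len(STAGES))}
        for i in range(1, n_instr + 1)
    }


def render(n_instr):
    """Return the pipeline diagram as text, one row per instruction."""
    sched = schedule(n_instr)
    last = total_cycles(n_instr)
    lines = []
    for i in range(1, n_instr + 1):
        cells = [sched[i].get(c, "").ljust(5) for c in range(1, last + 1)]
        lines.append(f"{i:>2} " + " ".join(cells).rstrip())
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(6))
    print("overlapped cycles:    ", total_cycles(6))
    print("non-overlapped cycles:", 6 * len(STAGES))
```

For six instructions this gives 10 cycles with overlap against 30 without, matching the cycle axis 1 to 10 in the diagram.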


Instruction Comparison

metric example, with size in bits:-
instr. 8, m.addr. 16, reg.addr. 4 and data 32

op:- A := B + C

R ↔ R               M ↔ R                M ↔ M
R2 ← MB             Acc ← MB             MA ← MB + MC
R3 ← MC             Acc ← MC + Acc
R1 ← R2 + R3        MA ← Acc
MA ← R1
I = 104 bits        I = 72 bits          I = 56 bits
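The three instruction-traffic totals can be reproduced with a small sketch. It assumes each instruction is one 8-bit opcode plus one address field per explicit operand, and that the accumulator is implicit (no register address field); the helper instr_bits is illustrative.

```python
# Field sizes in bits, from the slide.
OP, MEM, REG = 8, 16, 4


def instr_bits(n_mem=0, n_reg=0):
    """Size of one instruction: opcode plus its memory and register address fields."""
    return OP + n_mem * MEM + n_reg * REG


# A := B + C under each architecture style.
R_R = [
    instr_bits(n_mem=1, n_reg=1),  # R2 <- MB
    instr_bits(n_mem=1, n_reg=1),  # R3 <- MC
    instr_bits(n_reg=3),           # R1 <- R2 + R3
    instr_bits(n_mem=1, n_reg=1),  # MA <- R1
]
M_R = [
    instr_bits(n_mem=1),  # Acc <- MB        (accumulator assumed implicit)
    instr_bits(n_mem=1),  # Acc <- MC + Acc
    instr_bits(n_mem=1),  # MA  <- Acc
]
M_M = [
    instr_bits(n_mem=3),  # MA <- MB + MC
]

if __name__ == "__main__":
    for name, seq in (("R <-> R", R_R), ("M <-> R", M_R), ("M <-> M", M_M)):
        print(f"{name}: {len(seq)} instr., I = {sum(seq)} bits")
```

By this metric the memory-to-memory form is the most compact, which is the historic memory-efficiency argument for CISC; the metric says nothing about how many cycles each instruction takes.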

Dynamic Performance analysis

Time to run a given program can be calculated as:-

Texec = tcyc × D × CPI

where:

tcyc = time of single clock cycle

D = dynamic instruction count

CPI = average cycles per instruction

For a given technology, tcyc will be comparable for a RISC and a CISC architecture, and can possibly be made smaller for RISC.


Dynamic Performance comparison

D is expected to be much greater for RISC, but is found not to be so, due to the instruction distribution and advanced code generation techniques

→ about 20% greater than CISC

CPI is greatly reduced for RISC, from 5 - 10 cpi for CISC to 1.6 - 2.0 cpi for RISC, including cache and MM overheads.

→ 0.25 cpi for superscalar

Dyn. Perf. Comparison VAX vs RISC

VAX-11/780:-

1 Mips (10^6 average VAX instructions per sec.)

tcyc = 200 ns.

CPI = 5

Texec = 0.2 × 5 × D μsec. (for VAX)
Texec = 0.2 × 2 × 1.2 D μsec. (for RISC)

→ RISC : CISC Texec ratio of 0.48

→ speedup ≈ 2
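The VAX figures can be worked through as a sketch. It takes the slide's assumptions as given: the RISC machine has the same 200 ns cycle time, a CPI of 2 and 1.2 times the dynamic instruction count. The function names are illustrative.

```python
def cpi_from_mips(mips, tcyc_ns):
    """Average cycles per instruction implied by a MIPS rating and a cycle time."""
    ns_per_instr = 1000.0 / mips  # 1 Mips means 1000 ns per instruction
    return ns_per_instr / tcyc_ns


def texec_us(tcyc_ns, cpi, d):
    """Texec = tcyc x D x CPI, in microseconds, for d dynamic instructions."""
    return (tcyc_ns / 1000.0) * cpi * d


if __name__ == "__main__":
    d = 1.0  # per unit of dynamic instruction count on the VAX
    vax_cpi = cpi_from_mips(mips=1, tcyc_ns=200)
    t_vax = texec_us(200, vax_cpi, d)
    t_risc = texec_us(200, 2.0, 1.2 * d)  # RISC: CPI 2, 20% more instructions
    print(f"VAX CPI       = {vax_cpi:.1f}")
    print(f"Texec ratio   = {t_risc / t_vax:.2f}  (RISC : CISC)")
    print(f"speedup       = {t_vax / t_risc:.2f}")
```

The ratio comes out at 0.48 and the speedup at just over 2, so the slide's "speedup 2" is a rounded value.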


Dyn. Perf. Comparison M68020 vs RISC

M 68020:-
2 Mips (1 Mips = 10^6 average VAX instructions per sec.)
tcyc = 60 ns.

Determine the average cycles per instruction (CPI), and calculate the speedup for RISC based on the data given in slide 19.

Misc. benefits

Reduced design complexity results in:-

lower design cost, reduced chip complexity, higher chip yield, reduced time to market, etc.

Operand Instructions generally: R op R → R or R op I → R

Penalty:-
» increased compiler complexity
» pre-runtime scheduling
» code optimisation
» hardware instruction issue units

Refs.: VAX ISA http://books.elsevier.com/companions/1558605967/appendices/1558605967-appendix-e.pdf


RISC summary

RISC designs aim to gain performance by:
• reducing the no. of cycles/instruction much faster than they lose performance by executing more instructions,
• fast context switching by using register windows,
• complex compiler code optimisation to reduce processor stalls,
• some processors have no interlocks for dependencies; these are dealt with by the compiler,
• super-scalar designs use multiple instruction streams for instruction-level parallelism,
• super-pipelined designs use multi-phase clocks for enhanced pipelining over standard scalar pipelines,
• code optimisation for instruction-level parallelism and scheduling is carried out statically at compile time,
• Legacy binaries → CISC code running on RISC cores (Intel x86, AMD), with translation and optimisation by hardware at runtime.

→ Superpipelined superscalar of degree (n, m) → cpi < 1
2004 → Multiple cores, Hyper-threading

Answer: Dyn. Perf. Comparison M68020 vs RISC

M 68020:-

2 Mips (1 Mips = 10^6 average VAX instructions per sec.)
tcyc = 60 ns.
CPI = (500 / 60) ≈ 8
Texec = tcyc × CPI × D

Texec = 0.06 × 8 × D μsec. (for M68020)
Texec = 0.06 × 2 × 1.2 D μsec. (for RISC)

RISC : CISC Texec ratio of 0.30

speedup 3.3
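A short check of this answer, and of how much the rounding of CPI to 8 matters. It assumes, as the answer does, that the RISC machine runs at the same 60 ns cycle time, so tcyc cancels out of the ratio. The function speedup and its default values (CPI 2, 1.2 × D) restate the earlier slide figures.

```python
def speedup(cpi_cisc, cpi_risc=2.0, d_growth=1.2):
    """CISC-to-RISC speedup when both machines share the same cycle time.

    Texec_cisc / Texec_risc = (tcyc * cpi_cisc * D) / (tcyc * cpi_risc * d_growth * D)
    """
    return cpi_cisc / (cpi_risc * d_growth)


if __name__ == "__main__":
    cpi_exact = 500 / 60  # 2 Mips means 500 ns per instruction, tcyc = 60 ns
    for label, cpi in (("CPI rounded to 8", 8.0), ("CPI unrounded", cpi_exact)):
        s = speedup(cpi)
        print(f"{label:<18} ratio = {1 / s:.2f}  speedup = {s:.2f}")
```

With CPI = 8 the speedup is 3.3 as stated; keeping the unrounded CPI gives a slightly higher figure, so the answer is good to about one decimal place.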
