
RISC: Reduced Instruction Set Computing

Overview

What is RISC architecture?

How did RISC evolve?

How does RISC use instruction pipelining?

How does RISC use register windowing?

What is the future of RISC?

Early Microprocessors

Early microprocessors were very simple

They had a small instruction set

Gradually, more and more instructions were added

CISC: Complex Instruction Set Computing

May include over 300 instructions

Approximately a 1:1 relationship with higher-level languages

Only some of these instructions are used all the time

Why are more instructions slower?

A 16-instruction set needs a 4-to-16 decoder

A 32-instruction set needs a 5-to-32 decoder

The larger the decoder, the longer the propagation delay
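The decoder sizes above follow from the number of opcode bits needed to tell the instructions apart. A minimal sketch, assuming one decoder output line per opcode pattern (the function name `decoder_size` is hypothetical, and it says nothing about the actual delay of any real decoder):

```python
def decoder_size(num_instructions: int) -> tuple[int, int]:
    """Return (input_bits, output_lines) of the smallest n-to-2^n decoder
    that can distinguish num_instructions different opcodes."""
    if num_instructions < 1:
        raise ValueError("need at least one instruction")
    # (n - 1).bit_length() is ceil(log2(n)) for n >= 1, without float rounding.
    bits = max(1, (num_instructions - 1).bit_length())
    return bits, 2 ** bits


if __name__ == "__main__":
    for n in (16, 32, 300):
        bits, lines = decoder_size(n)
        print(f"{n} instructions: {bits}-to-{lines} decoder")
```

For the 300-instruction case mentioned for CISC, this gives a 9-to-512 decoder.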

Problem with CISC

The more instructions in the instruction set, the larger the propagation delay

CISC is too slow

Get rid of some of those Instructions

It takes 20 ns to complete each instruction

If we reduce the instruction set, we can get it down to 18 ns to complete each instruction

Every instruction we deleted can be replaced by 3 of the simpler remaining instructions

We choose to eliminate instructions used less than 2% of the time

Consider This

100% × 20 ns vs. 98% × 18 ns + 2% × 54 ns (54 ns = 3 replacement instructions × 18 ns)

= 20 ns vs. 17.64 ns + 1.08 ns

20 ns > 18.72 ns

In this case, reducing instructions is faster

Don’t reduce too much

Say we eliminate instructions used 10% of the time

100% × 20 ns vs. 90% × 18 ns + 10% × 54 ns

= 20 ns vs. 16.2 ns + 5.4 ns

20 ns < 21.6 ns

If we reduce our instruction set too much, the end result could be slower
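The two comparisons above can be reproduced with a few lines of arithmetic. This is a sketch using only the slide's numbers (20 ns before, 18 ns after, 3 replacement instructions per removed one); the function names are hypothetical.

```python
def average_time_ns(removed_fraction, reduced_ns=18.0, replacements=3):
    """Average time per original instruction after the reduction.

    Kept instructions take reduced_ns; each removed instruction is
    replaced by `replacements` instructions of reduced_ns each.
    """
    kept = 1.0 - removed_fraction
    return kept * reduced_ns + removed_fraction * replacements * reduced_ns


def break_even_fraction(full_ns=20.0, reduced_ns=18.0, replacements=3):
    """Fraction of removed-instruction usage at which both designs tie."""
    return (full_ns - reduced_ns) / ((replacements - 1) * reduced_ns)


if __name__ == "__main__":
    for fraction in (0.02, 0.10):
        t = average_time_ns(fraction)
        verdict = "faster" if t < 20.0 else "slower"
        print(f"remove {fraction:.0%}: {t:.2f} ns, {verdict} than 20 ns")
    print(f"break-even near {break_even_fraction():.1%}")
```

Under these numbers the reduction pays off only while the removed instructions account for less than about 5.6% of executed instructions.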

RISC: Reduced Instruction Set Architecture

Fewer than 100 instructions in instruction set

Fixed-length instructions

Limited loading and storing instructions

Fewer addressing modes

Instruction pipeline

Large number of registers

RISC: Reduced Instruction Set Architecture cont.

Hardwired control unit

Delayed loads and branches

Speculative execution of instructions

Optimizing compiler

Separated instruction and data streams

RISC vs. CISC

RISC                                CISC
Faster                              Slower
Less complicated instruction set    More complicated instruction set
More difficult to program           Easier to program

Ex: Fixed Length Instructions

Instruction Formats for SPARC CPU

SPARC CPU add: r1 ← r2 + r3

Format of instruction:

Bits        Field                 Meaning
10          op                    arithmetic instruction format
00001       destination register  register 1
000000      operation             add
00010       source register       register 2
0 00000000  not used              unused in this instruction
00011       source register       register 3

10 00001 000000 00010 0 00000000 00011
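To make the fixed 32-bit layout concrete, the sketch below packs the fields of a register-register add into one word. It assumes the SPARC format with field widths 2, 5, 6, 5, 1, 8 and 5 bits; the function names are hypothetical and only the add operation is covered.

```python
FIELD_WIDTHS = (2, 5, 6, 5, 1, 8, 5)  # op, rd, op3, rs1, i, unused, rs2


def encode_add(rd: int, rs1: int, rs2: int) -> int:
    """Pack 'rd <- rs1 + rs2' into a 32-bit instruction word."""
    for reg in (rd, rs1, rs2):
        if not 0 <= reg <= 31:
            raise ValueError("register numbers must fit in 5 bits")
    op = 0b10       # arithmetic instruction format
    op3 = 0b000000  # add
    # The i bit (bit 13) and the 8 unused bits stay zero.
    return (op << 30) | (rd << 25) | (op3 << 19) | (rs1 << 14) | rs2


def show_fields(word: int) -> str:
    """Render a 32-bit word as space-separated bit fields."""
    bits = f"{word:032b}"
    parts, pos = [], 0
    for width in FIELD_WIDTHS:
        parts.append(bits[pos:pos + width])
        pos += width
    return " ".join(parts)


if __name__ == "__main__":
    word = encode_add(rd=1, rs1=2, rs2=3)
    print(hex(word))          # 0x82008003
    print(show_fields(word))  # 10 00001 000000 00010 0 00000000 00011
```

Because every instruction is the same width, the decoder can slice the same bit positions for every instruction, which is part of what makes pipelined fetch and decode simple.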

Pipelines

Assembly Lines and Pipelines

Why are assembly lines cool?

Work on more than one item at a time

Finish more items faster

Instruction Pipelines

Very similar to assembly lines in manufacturing

Divides the execution of a task into several stages

Then it can work on more than one task at a time

Overall, faster and more efficient

Pipeline example: 3 stages

1. Fetch instruction

2. Decode instruction, select registers

3. Execute instruction, store result

Each stage must be completed in 1 clock cycle for this to work

Example 1:

r1 ← r2 + r3
r4 ← r5 + r6
r7 ← r8 + r9

Instruction 1: 10 00001 000000 00010 0 00000000 00011
Add r2 + r3; r2=2, r3=3; 2+3=5; r1 ← 5

Instruction 2: 10 00100 000000 00101 0 00000000 00110
Add r5 + r6; r5=5, r6=6; 5+6=11; r4 ← 11

Instruction 3: 10 00111 000000 01000 0 00000000 01001
Add r8 + r9; r8=8, r9=9; 8+9=17; r7 ← 17

               t1       t2       t3       t4       t5
Instruction 1  Fetch    Decode   Execute
Instruction 2           Fetch    Decode   Execute
Instruction 3                    Fetch    Decode   Execute

(Decode includes selecting registers; Execute includes storing the result.)
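The timing of this example can be generated mechanically: in an ideal pipeline, instruction k enters each stage one cycle after instruction k-1. A minimal sketch, assuming the three stages from the slide and no conflicts between instructions (the names `schedule` and `total_cycles` are hypothetical):

```python
STAGES = ("Fetch", "Decode", "Execute")


def schedule(num_instructions, stages=STAGES):
    """Return one {cycle: stage} dict per instruction; cycles start at 1."""
    return [
        {index + offset + 1: name for offset, name in enumerate(stages)}
        for index in range(num_instructions)
    ]


def total_cycles(num_instructions, num_stages=len(STAGES)):
    """Cycles needed to finish all instructions in an ideal pipeline."""
    if num_instructions == 0:
        return 0
    return num_instructions + num_stages - 1


if __name__ == "__main__":
    n = 3
    cycles = total_cycles(n)
    print("      " + "".join(f"t{c:<8}" for c in range(1, cycles + 1)))
    for index, row in enumerate(schedule(n), start=1):
        cells = "".join(f"{row.get(c, ''):<9}" for c in range(1, cycles + 1))
        print(f"I{index}    {cells}".rstrip())
    print(f"pipelined: {cycles} cycles, unpipelined: {n * len(STAGES)} cycles")
```

Three instructions finish in 5 cycles (t1 to t5) instead of the 9 cycles they would need if each one ran all three stages before the next started.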

Consider a more problematic example

r1 ← r2 + r3
r4 ← r1 + r3
r5 ← r6 + r3

Instruction 1: 10 00001 000000 00010 0 00000000 00011
Add r2 + r3; r2=2, r3=3; 2+3=5; r1 ← 5

Instruction 2: 10 00100 000000 00001 0 00000000 00011
Add r1 + r3; r1=1 (old value), r3=3; 1+3=4; r4 ← 4

Instruction 3: 10 00101 000000 00110 0 00000000 00011
Add r6 + r3; r6=6, r3=3; 6+3=9; r5 ← 9

               t1       t2       t3       t4       t5
Instruction 1  Fetch    Decode   Execute
Instruction 2           Fetch    Decode   Execute
Instruction 3                    Fetch    Decode   Execute

Problem: data conflict

Instruction 2 selects its registers during t3, but instruction 1 does not store r1 until t3 has completed, so instruction 2 reads the old value of r1 and r4 receives 4 instead of 8
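A conflict of this kind can be spotted by checking whether an instruction reads a register that the instruction just before it writes. The sketch below assumes the 3-stage pipeline above, where only directly adjacent instructions can conflict; instructions are written as hypothetical `(dest, src1, src2)` tuples of register numbers.

```python
def find_data_conflicts(program):
    """Return (producer_index, consumer_index, register) for every pair of
    adjacent instructions where the second reads what the first writes."""
    conflicts = []
    for i in range(len(program) - 1):
        dest = program[i][0]
        sources = program[i + 1][1:]
        if dest in sources:
            conflicts.append((i, i + 1, dest))
    return conflicts


if __name__ == "__main__":
    example_1 = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]    # independent adds
    problematic = [(1, 2, 3), (4, 1, 3), (5, 6, 3)]  # r4 needs the new r1
    print(find_data_conflicts(example_1))    # []
    print(find_data_conflicts(problematic))  # [(0, 1, 1)]
```

A deeper pipeline would need to look further back than one instruction; the window of one is an assumption tied to this 3-stage example.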

Solutions to Data Conflict

No-op insertions

Instruction reordering

Stall insertions

Data forwarding

Solution 1: add No-Op

r1 ← r2 + r3
r4 ← r1 + r3
r5 ← r6 + r3

No-op instructions are inserted between instruction 1 and instruction 2, so that instruction 2 selects its registers only after r1 has been stored

Instruction 1: 10 00001 000000 00010 0 00000000 00011
Add r2 + r3; r2=2, r3=3; 2+3=5; r1 ← 5

Instruction 2: 10 00100 000000 00001 0 00000000 00011
Add r1 + r3; r1=5, r3=3; 5+3=8; r4 ← 8

Instruction 3: 10 00101 000000 00110 0 00000000 00011
Add r6 + r3; r6=6, r3=3; 6+3=9; r5 ← 9

[Timing diagram: t1 to t5, with no-op slots between instruction 1 and instruction 2]

Possible problems with no-op

Slower

Wastes time
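No-op insertion can be pictured as a pass over the program that pads the gap between a producer and its consumer. This is a sketch, not a real assembler pass: it assumes one empty slot is enough for the 3-stage pipeline above (the `gap` parameter), and the `(dest, src1, src2)` tuples and the `NOP` marker are hypothetical.

```python
NOP = None  # placeholder for a no-op instruction


def insert_noops(program, gap=1):
    """Return a copy of program with no-ops added so that every instruction
    sits more than `gap` slots after the instruction producing its sources."""
    padded = []
    last_writer = {}  # register number -> position of its latest writer
    for dest, src1, src2 in program:
        for src in (src1, src2):
            if src in last_writer:
                # Pad until the producer is far enough behind us.
                while len(padded) - last_writer[src] <= gap:
                    padded.append(NOP)
        last_writer[dest] = len(padded)
        padded.append((dest, src1, src2))
    return padded


if __name__ == "__main__":
    program = [(1, 2, 3), (4, 1, 3), (5, 6, 3)]
    padded = insert_noops(program)
    print(padded)
    print(f"{len(padded) - len(program)} wasted slot(s)")
```

Every inserted slot is a cycle in which the pipeline fetches an instruction that does no useful work, which is why the slide calls this approach slow and wasteful.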

Solution 2: instruction reordering

r1 ← r2 + r3
r5 ← r6 + r3
r4 ← r1 + r3

Instruction 1: 10 00001 000000 00010 0 00000000 00011
Add r2 + r3; r2=2, r3=3; 2+3=5; r1 ← 5

Instruction 2: 10 00101 000000 00110 0 00000000 00011
Add r6 + r3; r6=6, r3=3; 6+3=9; r5 ← 9

Instruction 3: 10 00100 000000 00001 0 00000000 00011
Add r1 + r3; r1=5, r3=3; 5+3=8; r4 ← 8

               t1       t2       t3       t4       t5
Instruction 1  Fetch    Decode   Execute
Instruction 2           Fetch    Decode   Execute
Instruction 3                    Fetch    Decode   Execute

Possible problems with re-ordering

It is not possible to reorder every set of operations successfully

Consider:

r1 ← r1 + r2
r1 ← r1 + r3
r1 ← r1 + r4
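The difference between the reorderable example and the chain above can be checked with a small sketch. It assumes the 3-stage pipeline, where a conflict exists only between adjacent instructions, and tries to move a later, independent instruction into the gap. The `(dest, src1, src2)` tuples and function names are hypothetical, and this is far simpler than what an optimizing compiler does.

```python
def independent(a, b):
    """True if a and b may swap places: neither reads the other's
    destination and they do not write the same register."""
    dest_a, *src_a = a
    dest_b, *src_b = b
    return dest_a not in src_b and dest_b not in src_a and dest_a != dest_b


def reorder(program):
    """Return a conflict-free ordering, or None if no later independent
    instruction can be moved between a producer and its consumer."""
    prog = list(program)
    i = 0
    while i + 1 < len(prog):
        producer, consumer = prog[i], prog[i + 1]
        if producer[0] in consumer[1:]:
            for j in range(i + 2, len(prog)):
                candidate = prog[j]
                jumps_safely = all(
                    independent(candidate, prog[k]) for k in range(i + 1, j)
                )
                if jumps_safely and producer[0] not in candidate[1:]:
                    prog.insert(i + 1, prog.pop(j))
                    break
            else:
                return None  # nothing suitable to fill the slot
        i += 1
    return prog


if __name__ == "__main__":
    print(reorder([(1, 2, 3), (4, 1, 3), (5, 6, 3)]))
    print(reorder([(1, 1, 2), (1, 1, 3), (1, 1, 4)]))
```

For the first program the independent add into r5 is moved up; for the chain, every instruction needs the result of the one before it, so the sketch returns `None`.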

Solution 3: add stall insertion

r1 ← r2 + r3
r4 ← r1 + r3
r5 ← r6 + r3

The pipeline stalls instruction 2 until instruction 1 has stored r1

Instruction 1: 10 00001 000000 00010 0 00000000 00011
Add r2 + r3; r2=2, r3=3; 2+3=5; r1 ← 5

Instruction 2: 10 00100 000000 00001 0 00000000 00011
Add r1 + r3; r1=5, r3=3; 5+3=8; r4 ← 8

Instruction 3: 10 00101 000000 00110 0 00000000 00011
Add r6 + r3; r6=6, r3=3; 6+3=9; r5 ← 9

[Timing diagram: t1 to t5, with stalls inserted ahead of instruction 2]

Solution 4: data forwarding

r1 ← r2 + r3
r4 ← r1 + r3
r5 ← r6 + r3

Instruction 1: 10 00001 000000 00010 0 00000000 00011
Add r2 + r3; r2=2, r3=3; 2+3=5; r1 ← 5

Instruction 2: 10 00100 000000 00001 0 00000000 00011
Add r1 + r3; r1=5 (forwarded), r3=3; 5+3=8; r4 ← 8

Instruction 3: 10 00101 000000 00110 0 00000000 00011
Add r6 + r3; r6=6, r3=3; 6+3=9; r5 ← 9

               t1       t2       t3       t4       t5
Instruction 1  Fetch    Decode   Execute
Instruction 2           Fetch    Decode   Execute
Instruction 3                    Fetch    Decode   Execute

Data is passed within the same time cycle to the next instruction: instruction 2 receives the new value of r1 directly from instruction 1, without waiting for it to be stored
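The effect of forwarding can be seen in a toy simulation of the decode and execute stages. This is a sketch under stated assumptions: registers are read in decode, written at the end of execute, and the forwarding path hands the result of the instruction in execute to the instruction in decode during the same cycle. The `(dest, src1, src2)` tuples, the starting register values and all function names are hypothetical.

```python
def select(reg, regs, bypass):
    """Read a source register, preferring a forwarded (dest, value) pair."""
    if bypass is not None and bypass[0] == reg:
        return bypass[1]
    return regs[reg]


def run(program, regs, forwarding):
    """Run add instructions through overlapping decode/execute stages."""
    regs = dict(regs)
    latched = None  # operands selected by the instruction decoded last cycle
    for instr in list(program) + [None]:  # one extra cycle to drain
        # Execute stage: compute the result of the previous instruction.
        result = None
        if latched is not None:
            dest, a, b = latched
            result = (dest, a + b)
        bypass = result if forwarding else None
        # Decode stage (same cycle): select registers for this instruction.
        if instr is not None:
            dest, src1, src2 = instr
            latched = (dest, select(src1, regs, bypass), select(src2, regs, bypass))
        else:
            latched = None
        # End of cycle: the execute stage stores its result.
        if result is not None:
            regs[result[0]] = result[1]
    return regs


PROGRAM = [(1, 2, 3), (4, 1, 3), (5, 6, 3)]
START = {1: 1, 2: 2, 3: 3, 4: 0, 5: 0, 6: 6}

if __name__ == "__main__":
    print("without forwarding: r4 =", run(PROGRAM, START, forwarding=False)[4])  # 4
    print("with forwarding:    r4 =", run(PROGRAM, START, forwarding=True)[4])   # 8
```

Unlike no-ops and stalls, the forwarding run takes no extra cycles; the cost is the additional hardware path from the execute stage back to operand selection.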

Solutions to Data Conflict

No-op insertions: slow and wasteful

Stall insertions

Instruction reordering: not always possible

Data forwarding

Register Windowing

Each window overlaps with the next

Main method would be window 1

Subroutine is window 2

Since they overlap, window 2 can return values to window 1 easily
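The overlap can be modelled as one flat register file in which the last registers of one window are the first registers of the next. The sketch below is a simplified illustration, not a model of any specific CPU: the window count, the sizes of the shared and local groups, and the class and method names are all assumptions, and it raises an error on overflow where real hardware would spill a window to memory.

```python
class RegisterWindows:
    """Overlapping register windows over one flat register file."""

    def __init__(self, num_windows=4, shared=8, local=8):
        self.num_windows = num_windows
        self.shared = shared
        self.stride = shared + local  # distance between window starts
        self.file = [0] * (num_windows * self.stride + shared)
        self.current = 0  # index of the active window

    def _base(self):
        return self.current * self.stride

    # "in" registers: shared with the caller's "out" registers.
    def get_in(self, i):
        return self.file[self._base() + i]

    def set_in(self, i, value):
        self.file[self._base() + i] = value

    # "out" registers: shared with the callee's "in" registers.
    def get_out(self, i):
        return self.file[self._base() + self.stride + i]

    def set_out(self, i, value):
        self.file[self._base() + self.stride + i] = value

    def call(self):
        if self.current + 1 >= self.num_windows:
            raise OverflowError("out of windows")
        self.current += 1

    def ret(self):
        if self.current == 0:
            raise IndexError("no caller window")
        self.current -= 1


if __name__ == "__main__":
    w = RegisterWindows()
    w.set_out(0, 7)      # main (window 1) passes an argument
    w.call()             # subroutine runs in window 2
    w.set_in(0, w.get_in(0) * 6)  # subroutine overwrites it with the result
    w.ret()
    print(w.get_out(0))  # main reads 42 from the overlapping registers
```

No values are copied on call or return; only the window index changes, which is what makes passing arguments and results cheap.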

Summary

RISC architecture defined

Benefits and drawbacks of RISC architecture

Pipelines

– Problems with pipelines

Register Windowing

Future of RISC

Hotly debated

CISC is still easier to support

– Provides backward compatibility

RISC is faster

More than likely, we will see a convergence of the two systems

– Ex: Pentium Processor
