63
Sequential CPU Sequential CPU Implementation Implementation

Sequential CPU Implementation

  • Upload
    onaona

  • View
    25

  • Download
    0

Embed Size (px)

DESCRIPTION

Sequential CPU Implementation. Suggested Reading. - Chap 4.3. Byte. 0. 1. 2. 3. 4. 5. nop. 0. 0. addl. 6. 0. halt. 1. 0. subl. 6. 1. rrmovl rA , rB. 2. 0. rA. rB. andl. 6. 2. irmovl V , rB. 3. 0. 8. rB. V. xorl. 6. 3. rmmovl rA , D ( rB ). 4. 0. rA. - PowerPoint PPT Presentation

Citation preview

Page 1: Sequential CPU Implementation

Sequential CPUSequential CPUImplementationImplementationSequential CPUSequential CPUImplementationImplementation

Page 2: Sequential CPU Implementation

– 2 – Processor

Suggested ReadingSuggested Reading

- - Chap 4.3Chap 4.3

Page 3: Sequential CPU Implementation

– 3 – Processor

Y86 Instruction Set P259Y86 Instruction Set P259Byte 0 1 2 3 4 5

pushl rA A 0 rA 8

jXX Dest 7 fn Dest

popl rA B 0 rA 8

call Dest 8 0 Dest

rrmovl rA, rB 2 0 rA rB

irmovl V, rB 3 0 8 rB V

rmmovl rA, D(rB) 4 0 rA rB D

mrmovl D(rB), rA 5 0 rA rB D

OPl rA, rB 6 fn rA rB

ret 9 0

nop 0 0

halt 1 0

addl 6 0

subl 6 1

andl 6 2

xorl 6 3

jmp 7 0

jle 7 1

jl 7 2

je 7 3

jne 7 4

jge 7 5

jg 7 6

Page 4: Sequential CPU Implementation

– 4 – Processor

Building Blocks P278, P279, P280Building Blocks P278, P279, P280

Combinational LogicCombinational Logic Compute Boolean functions of

inputs Continuously respond to input

changes Operate on data and implement

control

Storage ElementsStorage Elements Store bits Addressable memories Non-addressable registers Loaded only as clock rises

Registerfile

Registerfile

A

B

W dstW

srcA

valA

srcB

valB

valW

Clock

ALU

fun

A

B

MUX

0

1

=

Clock

Page 5: Sequential CPU Implementation

– 5 – Processor

Hardware Control LanguageHardware Control Language

Very simple hardware description language Can only express limited aspects of hardware operation

Parts we want to explore and modify

Data TypesData Types bool: Boolean

a, b, c, …

int: wordsA, B, C, …Does not specify word size---bytes, 32-bit words, …

StatementsStatements bool a = bool-expr ; int A = int-expr ;

Page 6: Sequential CPU Implementation

– 6 – Processor

HCL OperationsHCL Operations

Classify by type of value returned

Boolean ExpressionsBoolean Expressions Logic Operations

a && b, a || b, !a Word Comparisons

A == B, A != B, A < B, A <= B, A >= B, A > B Set Membership

A in { B, C, D }» Same as A == B || A == C || A == D

Word ExpressionsWord Expressions Case expressions

[ a : A; b : B; c : C ]Evaluate test expressions a, b, c, … in sequenceReturn word expression A, B, C, … for first successful test

Page 7: Sequential CPU Implementation

– 7 – Processor

4.3.1Instruction Execution Stages P281

4.3.1Instruction Execution Stages P281

FetchFetch Read instruction from instruction memory

DecodeDecode Read program registers

ExecuteExecute Compute value or address

MemoryMemory Read or write data

Write BackWrite Back Write program registers

PCPC Update program counter

Page 8: Sequential CPU Implementation

– 8 – Processor

Instruction DecodingInstruction Decoding

Instruction FormatInstruction Format Instruction byte icode:ifun Optional register byte rA:rB Optional constant word valC

5 0 rA rB D

icodeifun

rArB

valC

Optional Optional

Page 9: Sequential CPU Implementation

– 9 – Processor

Figure 4.16 P283 Executing Arith./Logical OperationFigure 4.16 P283 Executing Arith./Logical Operation

FetchFetch Read 2 bytes

DecodeDecode Read operand registers

ExecuteExecute Perform operation Set condition codes

MemoryMemory Do nothing

Write backWrite back Update register

PC UpdatePC Update Increment PC by 2

OPl rA, rB 6 fn rA rB

Page 10: Sequential CPU Implementation

– 10 – Processor

Stage Computation: Arith/Log. OpsP283 Figure 4.16

Stage Computation: Arith/Log. OpsP283 Figure 4.16

Formulate instruction execution as sequence of simple steps

Use same general form for all instructions

OPl rA, rB

icode:ifun M1[PC]

rA:rB M1[PC+1]

valP PC+2

Fetch

Read instruction byte

Read register byte

Compute next PC

valA R[rA]

valB R[rB]Decode

Read operand A

Read operand B

valE valB OP valA

Set CCExecute

Perform ALU operation

Set condition code register

Memory

R[rB] valE

Write

back

Write back result

PC valPPC update Update PC

M1[PC] 表示从 PC开始的内存中读取一个字节的数据。

Page 11: Sequential CPU Implementation

– 11 – Processor

Executing rrmovlExecuting rrmovl

FetchFetch Read 2 bytes

DecodeDecode Read operand register rA

ExecuteExecute Do nothing

MemoryMemory Do nothing

Write backWrite back Update register

PC UpdatePC Update Increment PC by 2

rrmovl rA, rB 2 0 rA rB

Page 12: Sequential CPU Implementation

– 12 – Processor

Stage Computation: rrmovlP283 Figure 4.16

Stage Computation: rrmovlP283 Figure 4.16

Formulate instruction execution as sequence of simple steps

Use same general form for all instructions

rrmovl rA, rB

icode:ifun M1[PC]

rA:rB M1[PC+1]

valP PC+2

Fetch

Read instruction byte

Read register byte

Compute next PC

valA R[rA]Decode

Read operand A

valE 0 + valAExecute

Perform ALU operation

*valE Memory

R[rB] valE

Write

back

Write back result

PC valPPC update Update PC

Page 13: Sequential CPU Implementation

– 13 – Processor

Executing irmovlExecuting irmovl

FetchFetch Read 6 bytes

DecodeDecode Do nothing

ExecuteExecute Do nothing

MemoryMemory Do nothing

Write backWrite back Update register

PC UpdatePC Update Increment PC by 6

irmovl V, rB 3 0 8 rB V

Page 14: Sequential CPU Implementation

– 14 – Processor

Stage Computation: irmovlP283 Figure 4.16

Stage Computation: irmovlP283 Figure 4.16

Formulate instruction execution as sequence of simple steps

Use same general form for all instructions

irmovl rA, rB

icode:ifun M1[PC]

rA:rB M1[PC+1]

valC M4[PC+2]valP PC+6

Fetch

Read instruction byte

Read register byte

Read constant value

Compute next PC

Decode

valE 0 + valCExecute

Perform ALU operation

Memory

R[rB] valE

Write

back

Write back result

PC valPPC update Update PC

Page 15: Sequential CPU Implementation

– 15 – Processor

Figure 4.17 P283

Executing rmmovlFigure 4.17 P283

Executing rmmovl

FetchFetch Read 6 bytes

DecodeDecode Read operand registers

ExecuteExecute Compute effective address

MemoryMemory Write to memory

Write backWrite back Do nothing

PC UpdatePC Update Increment PC by 6

rmmovl rA, D(rB) 4 0 rA rB D

Page 16: Sequential CPU Implementation

– 16 – Processor

Stage Computation: rmmovlP283Figure 4.17

Stage Computation: rmmovlP283Figure 4.17

Use ALU for address computation

rmmovl rA, D(rB)

icode:ifun M1[PC]

rA:rB M1[PC+1]

valC M4[PC+2]valP PC+6

Fetch

Read instruction byte

Read register byte

Read displacement D

Compute next PC

valA R[rA]

valB R[rB]Decode

Read operand A

Read operand B

valE valB + valCExecute

Compute effective address(sum of the displacement and the base register value)

M4[valE] valAMemory Write value to memory

Write

back

PC valPPC update Update PC

Page 17: Sequential CPU Implementation

– 17 – Processor

Executing mrmovlExecuting mrmovl

FetchFetch Read 6 bytes

DecodeDecode Read operand registers rB

ExecuteExecute Compute effective address

MemoryMemory Read from memory

Write backWrite back Update register rA

PC UpdatePC Update Increment PC by 6

mrmovl D(rB),rA 5 0 rA rB D

Page 18: Sequential CPU Implementation

– 18 – Processor

Stage Computation: mrmovlP283 Figure 4.17

Stage Computation: mrmovlP283 Figure 4.17

Use ALU for address computation

mrmovl D(rB) , rAicode:ifun M1[PC]

rA:rB M1[PC+1]

valC M4[PC+2]valP PC+6

Fetch

Read instruction byte

Read register byte

Read displacement D

Compute next PC

valB R[rB]Decode

Read operand B

valE valB + valCExecute

Compute effective address

valM M4[valE]Memory Read data from memory

R[rA] valM

Write

back Update register rA

PC valPPC update Update PC

Page 19: Sequential CPU Implementation

– 19 – Processor

Figure 4.18 P284 Executing pushlFigure 4.18 P284 Executing pushl

FetchFetch Read 2 bytes

DecodeDecode Read stack pointer and rA

ExecuteExecute Decrement stack pointer by

4

MemoryMemory Store valA at the address of

new stack pointer

Write backWrite back Update stack pointer

PC UpdatePC Update Increment PC by 2

pushl rA a 0 rA 8

Page 20: Sequential CPU Implementation

– 20 – Processor

Stage Computation: pushl P284Figure 4.18

Stage Computation: pushl P284Figure 4.18

Use ALU to Decrement stack pointer

pushl rA

icode:ifun M1[PC]

rA:rB M1[PC+1]

valP PC+2

Fetch

Read instruction byte

Read register byte

Compute next PC

valA R[rA]

valB R [%esp]Decode

Read valA

Read stack pointer

valE valB + (-4)Execute

Decrement stack pointer

M4[valE] valAMemory Store to stack

R[%esp] valEWrite

back

Update stack pointer*在 write back之前实际上写入的元素在堆栈外。

PC valPPC update Update PC

Page 21: Sequential CPU Implementation

– 21 – Processor

Executing poplExecuting popl

FetchFetch Read 2 bytes

DecodeDecode Read stack pointer

ExecuteExecute Increment stack pointer by 4

MemoryMemory Read from old stack pointer

Write backWrite back Update stack pointer Write result to register

PC UpdatePC Update Increment PC by 2

popl rA b 0 rA 8

Page 22: Sequential CPU Implementation

– 22 – Processor

Stage Computation: poplP284 Figure 4.18

Stage Computation: poplP284 Figure 4.18

Use ALU to increment stack pointer Must update two registers

Popped valueNew stack pointer

popl rA

icode:ifun M1[PC]

rA:rB M1[PC+1]

valP PC+2

Fetch

Read instruction byte

Read register byte

Compute next PC

valA R[%esp]

valB R [%esp]Decode

Read stack pointer

Read stack pointer

valE valB + 4Execute

Increment stack pointer

valM M4[valA]Memory Read from stack

R[%esp] valE

R[rA] valM

Write

back

Update stack pointer

Write back result

PC valPPC update Update PC

在写回阶段要写两个寄存器,这两个写是有先后次序的。必须按照上面的方法进行,因为 rA可能就是%esp。具体见书 P270 Practice Problem 4.5.

Page 23: Sequential CPU Implementation

– 23 – Processor

Figure 4.19 P284 Executing JumpsFigure 4.19 P284 Executing Jumps

FetchFetch Read 5 bytes Increment PC by 5

DecodeDecode Do nothing

ExecuteExecute Determine whether to take

branch based on jump condition and condition codes

MemoryMemory Do nothing

Write backWrite back Do nothing

PC UpdatePC Update Set PC to Dest if branch

taken or to incremented PC if not branch

jXX Dest 7 fn Dest

XX XXfall thru:

XX XXtarget:

Not taken

Taken

Page 24: Sequential CPU Implementation

– 24 – Processor

Stage Computation: JumpsP284 Figure 4.19

Stage Computation: JumpsP284 Figure 4.19

Compute both addresses Choose based on setting of condition codes and branch

condition

jXX Dest

icode:ifun M1[PC]

valC M4[PC+1]valP PC+5

Fetch

Read instruction byte

Read destination address

Fall through address

Decode

Bch Cond(CC,ifun)Execute

Take branch?

Memory

Write

back

PC Bch ? valC : valPPC update Update PC

Page 25: Sequential CPU Implementation

– 25 – Processor

Executing callExecuting call

FetchFetch Read 5 bytes Increment PC by 5

DecodeDecode Read stack pointer

ExecuteExecute Decrement stack pointer by

4

MemoryMemory Write incremented PC to

new value of stack pointer

Write backWrite back Update stack pointer

PC UpdatePC Update Set PC to Dest

call Dest 8 0 Dest

XX XXreturn:

XX XXtarget:

Page 26: Sequential CPU Implementation

– 26 – Processor

Stage Computation: callP284 Figure 4.19

Stage Computation: callP284 Figure 4.19

Use ALU to decrement stack pointer Store incremented PC

call Dest

icode:ifun M1[PC]

valC M4[PC+1]valP PC+5

Fetch

Read instruction byte

Read destination address

Compute return point

valB R[%esp]Decode

Read stack pointer

valE valB + –4Execute

Decrement stack pointer

M4[valE] valP Memory Write return value on stack

R[%esp] valE

Write

back

Update stack pointer

PC valCPC update Set PC to destination

Page 27: Sequential CPU Implementation

– 27 – Processor

Executing retExecuting ret

FetchFetch Read 1 byte

DecodeDecode Read stack pointer

ExecuteExecute Increment stack pointer by 4

MemoryMemory Read return address from

old stack pointer

Write backWrite back Update stack pointer

PC UpdatePC Update Set PC to return address

ret 9 0

XX XXreturn:

Page 28: Sequential CPU Implementation

– 28 – Processor

Stage Computation: retP284 Figure 4.19

Stage Computation: retP284 Figure 4.19

Use ALU to increment stack pointer Read return address from memory

ret

icode:ifun M1[PC]

Fetch

Read instruction byte

valA R[%esp]

valB R[%esp]Decode

Read operand stack pointer

Read operand stack pointer

valE valB + 4Execute

Increment stack pointer

valM M4[valA] Memory Read return address

R[%esp] valE

Write

back

Update stack pointer

PC valMPC update Set PC to return address

Page 29: Sequential CPU Implementation

– 29 – Processor

Computation StepsFigure 4.16 P285Computation StepsFigure 4.16 P285

All instructions follow same general pattern Differ in what gets computed on each step

OPl rA, rB

icode:ifun M1[PC]

rA:rB M1[PC+1]

valP PC+2

Fetch

Read instruction byte

Read register byte

[Read constant word]

Compute next PC

valA R[rA]

valB R[rB]Decode

Read operand A

Read operand B

valE valB OP valA

Set CCExecute

Perform ALU operation

Set condition code register

Memory [Memory read/write]

R[rB] valE

Write

back

Write back ALU result

[Write back memory result]

PC valPPC update Update PC

icode,ifun

rA,rB

valC

valP

valA, srcA

valB, srcB

valE

Cond code

valM

dstE

dstM

PC

Page 30: Sequential CPU Implementation

– 30 – Processor

Computation Steps Figure 4.19 P284

Computation Steps Figure 4.19 P284

All instructions follow same general pattern Differ in what gets computed on each step

call Dest

Fetch

Decode

Execute

Memory

Write

back

PC update

icode,ifun

rA,rB

valC

valP

valA, srcA

valB, srcB

valE

Cond code

valM

dstE

dstM

PC

icode:ifun M1[PC]

valC M4[PC+1]valP PC+5

valB R[%esp]

valE valB + –4

M4[valE] valP R[%esp] valE

PC valC

Read instruction byte

[Read register byte]

Read constant word

Compute next PC

[Read operand A]

Read operand B

Perform ALU operation

[Set condition code reg.]

[Memory read/write]

[Write back ALU result]

Write back memory result

Update PC

Page 31: Sequential CPU Implementation

– 31 – Processor

Computed ValuesComputed Values

FetchFetch

icode Instruction code

ifun Instruction function

rA Instr. Register A

rB Instr. Register B

valC Instruction constant

valP Incremented PC

Decode § Write backDecode § Write back

valA Register value A

valB Register value B

ExecuteExecute valE ALU result Bch Branch flag

MemoryMemory valM Value from

memory

Page 32: Sequential CPU Implementation

– 32 – Processor

4.3.2 , 4.3.3, 4.3.4SEQ Operation4.3.2 , 4.3.3, 4.3.4SEQ Operation

StateState Program counter register (PC) Condition code register (CC) Register File Memories

Access same memory space Data: for reading/writing

program data Instruction: for reading

instructions

All updated as clock rises

Combinational LogicCombinational Logic ALU Control logic Memory reads

Instruction memory Register file Data memory

CombinationalLogic Data

memoryData

memory

Registerfile

Registerfile

PC0x00c

CCCCReadPorts

WritePorts

Read WriteRead Write

Page 33: Sequential CPU Implementation

– 33 – Processor

CombinationalLogic Data

memoryData

memory

Registerfile

%ebx = 0x100

Registerfile

%ebx = 0x100

PC0x00c

CC100CC100

ReadPorts

WritePorts

Read WriteRead Write

0x00c: addl %edx,%ebx # %ebx <-- 0x300 CC <-- 000

0x00e: je dest # Not taken

Cycle 3:

Cycle 4:

0x006: irmovl $0x200,%edx # %edx <-- 0x200Cycle 2:

0x000: irmovl $0x100,%ebx # %ebx <-- 0x100Cycle 1:

Clock

Cycle 1 Cycle 2 Cycle 3 Cycle 4SEQ Operation #2 (point 1)

SEQ Operation #2 (point 1)

state set according to second irmovl instruction

combinational logic starting to react to state changes

Figure 4.23 P297

Page 34: Sequential CPU Implementation

– 34 – Processor

0x00c: addl %edx,%ebx # %ebx <-- 0x300 CC <-- 000

0x00e: je dest # Not taken

Cycle 3:

Cycle 4:

0x006: irmovl $0x200,%edx # %edx <-- 0x200Cycle 2:

0x000: irmovl $0x100,%ebx # %ebx <-- 0x100Cycle 1:

Clock

Cycle 1 Cycle 2 Cycle 3 Cycle 4SEQ Operation #3 (point 2)

SEQ Operation #3 (point 2)

state set according to second irmovl instruction

combinational logic generates results for addl instruction

CombinationalLogic Data

memoryData

memory

Registerfile

%ebx = 0x100

Registerfile

%ebx = 0x100

PC0x00c

CC100CC100

ReadPorts

WritePorts

0x00e

000

Read WriteRead Write

Page 35: Sequential CPU Implementation

– 35 – Processor

0x00c: addl %edx,%ebx # %ebx <-- 0x300 CC <-- 000

0x00e: je dest # Not taken

Cycle 3:

Cycle 4:

0x006: irmovl $0x200,%edx # %edx <-- 0x200Cycle 2:

0x000: irmovl $0x100,%ebx # %ebx <-- 0x100Cycle 1:

Clock

Cycle 1 Cycle 2 Cycle 3 Cycle 4SEQ Operation #4 (point 3)

SEQ Operation #4 (point 3)

state set according to addl instruction

combinational logic starting to react to state changes

CombinationalLogic Data

memoryData

memory

Registerfile

%ebx = 0x300

Registerfile

%ebx = 0x300

PC0x00e

CC000CC000

ReadPorts

WritePorts

Read WriteRead Write

Page 36: Sequential CPU Implementation

– 36 – Processor

0x00c: addl %edx,%ebx # %ebx <-- 0x300 CC <-- 000

0x00e: je dest # Not taken

Cycle 3:

Cycle 4:

0x006: irmovl $0x200,%edx # %edx <-- 0x200Cycle 2:

0x000: irmovl $0x100,%ebx # %ebx <-- 0x100Cycle 1:

Clock

Cycle 1 Cycle 2 Cycle 3 Cycle 4SEQ Operation #5 (point 4)

SEQ Operation #5 (point 4)

state set according to addl instruction

combinational logic generates results for je instruction

CombinationalLogic Data

memoryData

memory

Registerfile

%ebx = 0x300

Registerfile

%ebx = 0x300

PC0x00e

CC000CC000

ReadPorts

WritePorts

0x013

CombinationalLogic Data

memoryData

memory

Registerfile

%ebx = 0x300

Registerfile

%ebx = 0x300

PC0x00e

CC000CC000

ReadPorts

WritePorts

0x013

Read WriteRead Write

Page 37: Sequential CPU Implementation

– 37 – Processor

SEQ SemanticsSEQ Semantics

Achieve the same effect as a sequential execution of Achieve the same effect as a sequential execution of the assignment shown in the tables of Figures 4.16 the assignment shown in the tables of Figures 4.16 to 4.19to 4.19 Though all of the state updates occur simultaneously at the

clock rises to the next cycle. A problem: popl %esp need to sequentially write two

registers. So the register file control logic must process it.

Principle: never needs to read back the state updated Principle: never needs to read back the state updated by an instruction in order to complete the processing by an instruction in order to complete the processing of this instruction of this instruction (P295)(P295) If so, the update must happen in the instruction cycle E.g. pushl semantics E.g. no one instruction need to both set and then read the

condition codes.

Page 38: Sequential CPU Implementation

– 38 – Processor

Exercise (2010.4.7)Exercise (2010.4.7)

See page 282 codes, modified as below.See page 282 codes, modified as below.

1 irmovl $-9, %edx1 irmovl $-9, %edx

2 irmovl $9, %ebx2 irmovl $9, %ebx

3 addl %edx, %ebx3 addl %edx, %ebx

Question:Question:

1)1) Write the binary code of these three instructions, to Write the binary code of these three instructions, to be note to have the address value, start from 0x000; be note to have the address value, start from 0x000;

2)2) Trace the generic form of addl instruction Trace the generic form of addl instruction implementation, and the specific execution of this implementation, and the specific execution of this instruction. Also give the CC value after addl instruction. Also give the CC value after addl execution.execution.

Page 39: Sequential CPU Implementation

– 39 – Processor

SEQ CPU Implementation

SEQ CPU Implementation

Page 40: Sequential CPU Implementation

– 40 – Processor

What we will discuss today?What we will discuss today?

The implementation of a sequential CPU ---- SEQThe implementation of a sequential CPU ---- SEQ Every Instruction finished in one cycle. Instruction executes in sequential No two instruction execute in parallel or overlap

An revised version of SEQ ---- SEQ+An revised version of SEQ ---- SEQ+ Modify the PC Update stage of SEQ to show the difference between ISA and implementation

Page 41: Sequential CPU Implementation

– 41 – Processor

Some MacrosFigure 4.24 P299Some MacrosFigure 4.24 P299

NameName Value Value MeaningMeaning

INOPINOP 00 Code for Code for nopnop instruction instruction

IHALTIHALT 11 Code for Code for halthalt instruction instruction

IRRMOVLIRRMOVL 22 Code for Code for rrmovlrrmovl instruction instruction

IIRMOVLIIRMOVL 33 Code for Code for irmovlirmovl instruction instruction

IRMMOVLIRMMOVL 44 Code for Code for rmmovlrmmovl instruction instruction

IMRMOVLIMRMOVL 55 Code for Code for mrmovlmrmovl instruction instruction

IOPLIOPL 66 Code for integer op instructionsCode for integer op instructions

IJXXIJXX 77 Code for jump instructionsCode for jump instructions

…………………… ………… …………………………………………………………

IPOPLIPOPL BB Code for Code for poplpopl instruction instruction

RESPRESP

RENONERENONE

66

88

Register ID for %espRegister ID for %esp

Indicates no register file accessIndicates no register file access

ALUADDALUADD 00 Function for addition operationFunction for addition operation

Page 42: Sequential CPU Implementation

– 42 – Processor

SEQ Hardware Structure Figure 4.20 P292

SEQ Hardware Structure Figure 4.20 P292

StagesStages Fetch

Read instruction from memory Decode

Read program registers Execute

Compute value or address Memory

Read or write data Write Back

Write program registers PC

Update program counter

Instruction FlowInstruction Flow Read instruction at address

specified by PC Process through stages Update program counter

Instructionmemory

Instructionmemory

PCincrement

PCincrement

CCCCALUALU

Datamemory

Datamemory

Fetch

Decode

Execute

Memory

Write back

icode, ifunrA , rB

valC

Registerfile

Registerfile

A BM

E

Registerfile

Registerfile

A BM

E

PC

valP

srcA, srcBdstA, dstB

valA, valB

aluA, aluB

Bch

valE

Addr, Data

valM

PCvalE, valM

newPC

Page 43: Sequential CPU Implementation

– 43 – Processor

Difference between semantics and implementationDifference between semantics and implementationISAISA

Every stage may update some states, these updates occur sequentially

SEQSEQ All the state update operations occur simultaneously at

clock rising (except memory and CC)

Page 44: Sequential CPU Implementation

– 44 – Processor

SEQ Hardware Figure 4.21 P293

SEQ Hardware Figure 4.21 P293

KeyKey Blue boxes:

predesigned hardware blocks

E.g., memories, ALU

Gray boxes: control logic

Describe in HCL

White ovals ( 椭圆 ): labels for signals

Thick lines: 32-bit word values

Thin lines: 4-8 bit values

Dotted lines: 1-bit values

Instructionmemory

Instructionmemory

PCincrement

PCincrement

CCCC ALUALU

Datamemory

Datamemory

NewPC

rB

dstE dstM

ALUA

ALUB

Mem.control

Addr

srcA srcB

read

write

ALUfun.

Fetch

Decode

Execute

Memory

Write back

data out

Registerfile

Registerfile

A BM

E

Registerfile

Registerfile

A BM

E

Bch

dstE dstM srcA srcB

icode ifun rA

PC

valC valP

valBvalA

Data

valE

valM

PC

newPC

Page 45: Sequential CPU Implementation

– 45 – Processor

Fetch LogicFetch Logic

Predefined BlocksPredefined Blocks PC: Register containing PC Instruction memory: Read 6 bytes (PC to PC+5) Split: Divide instruction byte into icode and ifun Align: Get fields for rA, rB, and valC

Instructionmemory

Instructionmemory

PCincrement

PCincrement

rBicode ifun rA

PC

valC valP

Needregids

NeedvalC

Instrvalid

AlignAlignSplitSplit

Bytes 1-5Byte 0

Figure 4.25 P299

Page 46: Sequential CPU Implementation

– 46 – Processor

Fetch LogicFetch Logic

Control LogicControl Logic Instr. Valid: Is this instruction valid? Need regids: Does this instruction have a register

bytes? Need valC: Does this instruction have a constant word?

Instructionmemory

Instructionmemory

PCincrement

PCincrement

rBicode ifun rA

PC

valC valP

Needregids

NeedvalC

Instrvalid

AlignAlignSplitSplit

Bytes 1-5Byte 0

Page 47: Sequential CPU Implementation

– 47 – Processor

Fetch Control Logic P300

Fetch Control Logic P300

pushl rA A 0 rA 8

jXX Dest 7 fn Dest

popl rA B 0 rA 8

call Dest 8 0 Dest

rrmovl rA, rB 2 0 rA rB

irmovl V, rB 3 0 8 rB V

rmmovl rA, D(rB) 4 0 rA rB D

mrmovl D(rB), rA 5 0 rA rB D

OPl rA, rB 6 fn rA rB

ret 9 0

nop 0 0

halt 1 0

pushl rA A 0 rA 8pushl rA A 0A 0 rA 8rA 8

jXX Dest 7 fn DestjXX Dest 7 fn7 fn Dest

popl rA B 0 rA 8popl rA B 0B 0 rA 8rA 8

call Dest 8 0 Destcall Dest 8 08 0 Dest

rrmovl rA, rB 2 0 rA rBrrmovl rA, rB 2 02 0 rA rBrA rB

irmovl V, rB 3 0 8 rB Virmovl V, rB 3 03 0 8 rB8 rB V

rmmovl rA, D(rB) 4 0 rA rB Drmmovl rA, D(rB) 4 04 0 rA rBrA rB D

mrmovl D(rB), rA 5 0 rA rB Dmrmovl D(rB), rA 5 05 0 rA rBrA rB D

OPl rA, rB 6 fn rA rBOPl rA, rB 6 fn6 fn rA rBrA rB

ret 9 0ret 9 09 0

nop 0 0nop 0 00 0

halt 1 0halt 1 01 0

bool need_regids =icode in { IRRMOVL, IOPL, IPUSHL, IPOPL,

IIRMOVL, IRMMOVL, IMRMOVL };

bool instr_valid = icode in { INOP, IHALT, IRRMOVL, IIRMOVL, IRMMOVL, IMRMOVL, IOPL, IJXX, ICALL, IRET, IPUSHL, IPOPL };

Page 48: Sequential CPU Implementation

– 48 – Processor

Decode & Write-Back LogicDecode & Write-Back Logic

Register FileRegister File Read ports A, B Write ports E, M Addresses are register IDs or

8 (no access)

rB

dstE dstM srcA srcB

Registerfile

Registerfile

A BM

EdstE dstM srcA srcB

icode rA

valBvalA valEvalM

Control LogicControl Logic srcA, srcB: read port

addresses dstA, dstB: write port

addresses

Figure 4.26 P300

Page 49: Sequential CPU Implementation

– 49 – Processor

A SourceP301A SourceP301 OPl rA, rB

valA R[rA]Decode Read operand A

rmmovl rA, D(rB)

valA R[rA]Decode Read operand A

popl rA

valA R[%esp]Decode Read stack pointer

jXX Dest

Decode No operand

call Dest

valA R[%esp]Decode Read stack pointer

ret

Decode No operand

int srcA = [icode in { IRRMOVL, IRMMOVL, IOPL, IPUSHL } : rA;icode in { IPOPL, IRET } : RESP;1 : RNONE; # Don't need register

];

Page 50: Sequential CPU Implementation

– 50 – Processor

E DestinationP301E DestinationP301

None

R[%esp] valE Update stack pointer

None

R[rB] valE

OPl rA, rB

Write-back

rmmovl rA, D(rB)

popl rA

jXX Dest

call Dest

ret

Write-back

Write-back

Write-back

Write-back

Write-back

Write back result

R[%esp] valE Update stack pointer

R[%esp] valE Update stack pointer

int dstE = [icode in { IRRMOVL, IIRMOVL, IOPL} : rB;icode in { IPUSHL, IPOPL, ICALL, IRET } : RESP;1 : RNONE; # Don't need register

];

Page 51: Sequential CPU Implementation

– 51 – Processor

Execute LogicExecute Logic

UnitsUnits ALU

Implements 4 required functions Generates condition code values

CC Register with 3 condition code

bits

bcond Computes branch flag

Control LogicControl Logic Set CC: Should condition code

register be loaded? ALU A: Input A to ALU ALU B: Input B to ALU ALU fun: What function should

ALU compute?

CCCC ALUALU

ALUA

ALUB

ALUfun.

Bch

icode ifun valC valBvalA

valE

SetCC

bcondbcond

Figure 4.27 P302

Page 52: Sequential CPU Implementation

– 52 – Processor

ALU A InputALU A Input

valE valB + –4 Decrement stack pointer

No operation

valE valB + 4 Increment stack pointer

valE valB + valC Compute effective address

valE valB OP valA Perform ALU operation

OPl rA, rB

Execute

rmmovl rA, D(rB)

popl rA

jXX Dest

call Dest

ret

Execute

Execute

Execute

Execute

Execute valE valB + 4 Increment stack pointer

int aluA = [icode in { IRRMOVL, IOPL } : valA;icode in { IIRMOVL, IRMMOVL, IMRMOVL } : valC;icode in { ICALL, IPUSHL } : -4;icode in { IRET, IPOPL } : 4;# Other instructions don't need ALU

];

P302

Page 53: Sequential CPU Implementation

– 53 – Processor

ALU OperationALU Operation

valE valB + –4 Decrement stack pointer

No operation

valE valB + 4 Increment stack pointer

valE valB + valC Compute effective address

valE valB OP valA Perform ALU operation

OPl rA, rB

Execute

rmmovl rA, D(rB)

popl rA

jXX Dest

call Dest

ret

Execute

Execute

Execute

Execute

Execute valE valB + 4 Increment stack pointer

int alufun = [icode == IOPL : ifun;1 : ALUADD;

];

P303

Page 54: Sequential CPU Implementation

– 54 – Processor

Condition SetCondition Set

Bool set_cc = icode in { IOPL };Bool set_cc = icode in { IOPL };

We will not discuss the detail of We will not discuss the detail of BcondBcond Though it is also a control unit

P303

Page 55: Sequential CPU Implementation

– 55 – Processor

Memory LogicMemory Logic

MemoryMemory Reads or writes memory word

Control LogicControl Logic Mem. read: should word be read? Mem. write: should word be

written? Mem. addr.: Select address Mem. data.: Select data

Datamemory

Datamemory

Mem.read

Memaddr

read

write

data out

Memdata

valE

valM

valA valP

Mem.write

data in

icode

Figure 4.28 P303

Page 56: Sequential CPU Implementation

– 56 – Processor

Memory AddressMemory AddressOPl rA, rB

Memory

rmmovl rA, D(rB)

popl rA

jXX Dest

call Dest

ret

No operation

M4[valE] valAMemory Write value to memory

valM M4[valA]Memory Read from stack

M4[valE] valP Memory Write return value on stack

valM M4[valA] Memory Read return address

Memory No operation

int mem_addr = [icode in { IRMMOVL, IPUSHL, ICALL, IMRMOVL } : valE;icode in { IPOPL, IRET } : valA;# Other instructions don't need address

];

P304

Page 57: Sequential CPU Implementation

– 57 – Processor

Memory ReadMemory Read

OPl rA, rB

Memory

rmmovl rA, D(rB)

popl rA

jXX Dest

call Dest

ret

No operation

M4[valE] valAMemory Write value to memory

valM M4[valA]Memory Read from stack

M4[valE] valP Memory Write return value on stack

valM M4[valA] Memory Read return address

Memory No operation

bool mem_read = icode in { IMRMOVL, IPOPL, IRET };bool mem_write = icode in { IRMMOVL, IPUSHL, ICALL };

P304

Page 58: Sequential CPU Implementation

– 58 – Processor

PC Update LogicPC Update Logic

New PCNew PC Select next value of PC

NewPC

Bchicode valC valPvalM

PC

Figure 4.29 P304

Page 59: Sequential CPU Implementation

– 59 – Processor

PCUpdatePCUpdate

OPl rA, rB

rmmovl rA, D(rB)

popl rA

jXX Dest

call Dest

ret

PC valPPC update Update PC

PC valPPC update Update PC

PC valPPC update Update PC

PC Bch ? valC : valPPC update Update PC

PC valCPC update Set PC to destination

PC valMPC update Set PC to return address

int new_pc = [icode == ICALL : valC;icode == IJXX && Bch : valC;icode == IRET : valM;1 : valP;

];

P304

Page 60: Sequential CPU Implementation

– 60 – Processor

SEQ Hardware(Review)SEQ Hardware(Review)

Stages occur in sequence One operation in process

at a time

Instructionmemory

Instructionmemory

PCincrement

PCincrement

CCCC ALUALU

Datamemory

Datamemory

NewPC

rB

dstE dstM

ALUA

ALUB

Mem.control

Addr

srcA srcB

read

write

ALUfun.

Fetch

Decode

Execute

Memory

Write back

data out

Registerfile

Registerfile

A BM

E

Registerfile

Registerfile

A BM

E

Bch

dstE dstM srcA srcB

icode ifun rA

PC

valC valP

valBvalA

Data

valE

valM

PC

newPC

Figure 4.21 P293

Page 61: Sequential CPU Implementation

– 61 – Processor

Instructionmemory

Instructionmemory

PCincrement

PCincrement

CCCC ALUALU

Datamemory

Datamemory

PC

rB

dstE dstM

ALUA

ALUB

Mem.control

Addr

srcA srcB

read

write

ALUfun.

Fetch

Decode

Execute

Memory

Write back

data out

Registerfile

Registerfile

A BM

E

Registerfile

Registerfile

A BM

E

Bch

dstE dstM srcA srcB

icode ifun rA

pBch pValM pValC pValPpIcode

PC

valC valP

valBvalA

Data

valE

valM

PC

4.3.5SEQ+ Hardware4.3.5SEQ+ Hardware

Still sequential implementation

Reorder PC stage to put at beginning

PC StagePC Stage Task is to select PC for

current instruction Based on results

computed by previous instruction

Processor StateProcessor State PC is no longer stored in

register But, can determine PC

based on other stored information

Page 62: Sequential CPU Implementation

– 62 – Processor

PC ComputationPC Computation

Int pc= [Int pc= [

pIcode == ICALL : pValC;pIcode == ICALL : pValC;

pIcode == IJXX && bBch : pValC;pIcode == IJXX && bBch : pValC;

PIcode == IRET : pValM;PIcode == IRET : pValM;

1 : pValP;1 : pValP;

];];

P308

Page 63: Sequential CPU Implementation

– 63 – Processor

SEQ SummarySEQ Summary

ImplementationImplementation Express every instruction as series of simple steps Follow same general flow for each instruction type Assemble registers, memories, predesigned combinational

blocks Connect with control logic

LimitationsLimitations Too slow to be practical In one cycle, must propagate through instruction memory,

register file, ALU, and data memory Would need to run clock very slowly Hardware units only active for fraction of clock cycle