
Chapter 5. Control Design

* Two approaches for control unit design

• A hard-wired control unit : a sequential logic circuit that generates specific, fixed sequences of control signals.

→ behavior can be changed only by redesigning the circuit.


• A microprogrammed control unit : control signals are organized into microinstructions stored in a control memory, so the signals are implemented by a kind of software (firmware) rather than hardware.

→ design change : change the contents of the control memory.

→ emulation : a microprogrammed CPU can execute programs written in the machine language of other computers.

Disadvantages:

① Slower, because each microinstruction must be fetched from the control memory.

② More costly, because of the control memory and its access circuits.

5.1.2. Hardwired Control

design method 1 : The classical method of sequential circuit design. A P-state circuit requires ⌈log2 P⌉ flip-flops.

design method 2 : The one-hot method, with one flip-flop per state. It is expensive in terms of flip-flops but simplifies CU design and debugging.
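The flip-flop cost of the two design methods can be compared with a small sketch. The function names are illustrative only and are not taken from the text.

```python
def flip_flops_classical(num_states: int) -> int:
    """Flip-flops for a binary-encoded state register: ceil(log2 P)."""
    # (P - 1).bit_length() equals ceil(log2 P) for P >= 2, using integers only.
    return max(1, (num_states - 1).bit_length())


def flip_flops_one_hot(num_states: int) -> int:
    """Flip-flops for a one-hot state register: one per state."""
    return num_states


if __name__ == "__main__":
    for p in (4, 16, 100):
        print(p, flip_flops_classical(p), flip_flops_one_hot(p))
```

For the four-state GCD control unit discussed next, the classical method needs 2 flip-flops and the one-hot method needs 4.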

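• GCD processor

The control unit has four states S0 to S3 and uses two status signals from the datapath, (XR > 0) and (XR ≥ YR). Its outputs are the control signals Subtract, Swap, SelectXY, LoadXR and LoadYR.

Classical method

S0 = 00, S1 = 01, S2 = 10 and S3 = 11 (state variables D1 D0). For example, LoadYR is active in S0 and S1:

$LoadYR = \overline{D_1}\,\overline{D_0} + \overline{D_1} D_0$

One-hot method

S0 = 0001, S1 = 0010, S2 = 0100 and S3 = 1000 (state variables D3 D2 D1 D0)

State   D3 D2 D1 D0
S0      0  0  0  1
S1      0  0  1  0
S2      0  1  0  0
S3      1  0  0  0

• The one-hot method is limited to a small number of states.
• The next-state and output equations have a simple and systematic form.

The one-hot design method

1. Construct a P-row state table that defines the desired input-output behavior.

2. Associate a separate D-type flip-flop Di with each state Si, and assign the P-bit one-hot binary code D1, D2, …, Di-1, Di, Di+1, …, DP = 0,0,…,0,1,0,…,0 to Si.

3. Design a combinational circuit C that generates the next-state signals { Di+ } and the output signals { zk }. Di+ is defined by the logic equation

$D_i^+ = \sum_{j=1}^{P} D_j (I_{j,1} + I_{j,2} + \cdots + I_{j,n_j}) = D_1(I_{1,1} + I_{1,2} + \cdots + I_{1,n_1}) + D_2(I_{2,1} + I_{2,2} + \cdots + I_{2,n_2}) + \cdots$   (5.9)

where $I_{j,1}, I_{j,2}, \ldots, I_{j,n_j}$ denote all input combinations that cause a transition from Sj to Si. If zk = 1 (active) only in rows k,h for h = 1,2,…,mk, then zk is defined by

$z_k = D_{k,1} + D_{k,2} + \cdots + D_{k,m_k}$   (5.10)

One-hot equations for the GCD processor   (5.11)

$D_0^+ = 0$
$D_1^+ = D_0 (XR > 0)\overline{(XR \ge YR)} + D_2 (XR > 0)\overline{(XR \ge YR)}$
$D_2^+ = D_0 (XR > 0)(XR \ge YR) + D_1 + D_2 (XR > 0)(XR \ge YR)$
$D_3^+ = D_0 \overline{(XR > 0)} + D_2 \overline{(XR > 0)} + D_3$

$Subtract = D_2$
$Swap = D_1$
$SelectXY = D_0$
$LoadXR = D_0 + D_1 + D_2$
$LoadYR = D_0 + D_1$

For example, S1 is entered from S0 or S2, so its next-state equation factors as

$D_1^+ = (D_0 + D_2)(XR > 0)\overline{(XR \ge YR)}$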
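The three steps of the one-hot method can be tried out on the GCD control unit. The following is a minimal sketch under stated assumptions: the state table `TABLE` and output table `OUTPUTS` are a hypothetical reading of the GCD controller (S0 loads, S1 swaps, S2 subtracts, S3 is the final state), the condition labels `gt0` and `ge` stand for (XR > 0) and (XR ≥ YR), and the status signals are assumed to be evaluated on the register values just written. None of the names come from the text.

```python
# Assumed state table: present state -> {input condition: next state}.
TABLE = {
    "S0": {"~gt0": "S3", "gt0&~ge": "S1", "gt0&ge": "S2"},
    "S1": {"1": "S2"},
    "S2": {"~gt0": "S3", "gt0&~ge": "S1", "gt0&ge": "S2"},
    "S3": {"1": "S3"},
}
# Assumed output table: control signals active in each state.
OUTPUTS = {
    "S0": ["SelectXY", "LoadXR", "LoadYR"],
    "S1": ["Swap", "LoadXR", "LoadYR"],
    "S2": ["Subtract", "LoadXR"],
    "S3": [],
}


def one_hot_equations(table, outputs):
    """Return (next_state, z).

    next_state[Si] maps each source state Sj to the input conditions that
    cause Sj -> Si (the terms of Di+); z[k] lists the states where z_k = 1.
    """
    next_state = {s: {} for s in table}
    for sj, row in table.items():
        for cond, si in row.items():
            next_state[si].setdefault(sj, []).append(cond)
    z = {}
    for state, signals in outputs.items():
        for name in signals:
            z.setdefault(name, []).append(state)
    return next_state, z


def gcd_one_hot(x, y):
    """Run the GCD datapath under the one-hot controller (requires y > 0)."""
    next_state, z = one_hot_equations(TABLE, OUTPUTS)
    state = {"S0": 1, "S1": 0, "S2": 0, "S3": 0}  # one flip-flop per state
    xr = yr = 0
    while not state["S3"]:
        # Output equations: z_k is the OR of the flip-flops of its states.
        active = {k for k, states in z.items() if any(state[s] for s in states)}
        if "SelectXY" in active:
            new_xr, new_yr = x, y
        elif "Swap" in active:
            new_xr, new_yr = yr, xr
        else:  # Subtract
            new_xr, new_yr = xr - yr, yr
        if "LoadXR" in active:
            xr = new_xr
        if "LoadYR" in active:
            yr = new_yr
        cond = {
            "1": True,
            "~gt0": xr <= 0,
            "gt0&~ge": xr > 0 and xr < yr,
            "gt0&ge": xr > 0 and xr >= yr,
        }
        # Next-state equations: Di+ = OR over j of Dj AND (conditions j -> i).
        state = {
            si: int(any(state[sj] and cond[c] for sj, cs in srcs.items() for c in cs))
            for si, srcs in next_state.items()
        }
    return yr
```

Calling `gcd_one_hot(12, 18)` steps through S0, S1, S2, S1, S2, S2 and S3. The derived `next_state["S1"]` contains exactly the two terms, from S0 and S2, that a one-hot equation for D1+ would have under this assumed table.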

Design of 2C multiplier hardwired control

5.2 Microprogrammed Control

Instruction

: implemented by a sequence of one or more sets of concurrent micro-operations.

Microprogramming

: control-signal selection and sequencing information is stored in a ROM or RAM called the control memory (CM), and each microinstruction is fetched from the CM.

A microprogrammed computer C1 can be used to execute programs written in the machine language L2 of some other computer C2 by placing an emulator for L2 in the CM of C1.

Wilkes's design : each microinstruction (μI) has two parts.

– Control field
– Address field

How to decide the μI word length

1. The degree of parallelism required at the micro-operation level
2. How the control information is represented or encoded
3. How the next μI address is specified

o Parallelism in the μI

If every useful combination of parallel micro-operations were specified by a single opcode, the number of opcodes would be enormous and the decoder would be complicated.

→ divide the micro-operation specification part into k disjoint control fields; an operation from any one field can be performed simultaneously with operations from the other fields.

① IBM 360/50: 90-bit μI (21 partitioned control fields).
② Wilkes's design: a 1-bit control field for each control signal.

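[Figure: register R is loaded from one of four sources X0, X1, X2, X3, gated by the control signals c0, c1, c2, c3.]

Un-encoded form (4-bit)

c0 c1 c2 c3   Micro-operation
1  0  0  0    R ← X0
0  1  0  0    R ← X1
0  0  1  0    R ← X2
0  0  0  1    R ← X3
0  0  0  0    No op

Encoded form (3-bit)

K0 K1 K2   Micro-operation
0  0  1    R ← X0
0  1  0    R ← X1
0  1  1    R ← X2
1  0  0    R ← X3
0  0  0    No op

5 operations (four transfers plus no-op) ⇒ ⌈log2 5⌉ = 3 encoded bits.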
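The relation between the encoded and un-encoded forms can be sketched as a decoder. The function names are illustrative, and the code assumes, as in the tables above, that code 0 is the no-op and code i+1 activates signal ci.

```python
def encoded_width(n_signals: int) -> int:
    """Bits needed to encode n mutually exclusive signals plus a no-op."""
    # n.bit_length() equals ceil(log2(n + 1)) for n >= 1.
    return n_signals.bit_length()


def decode(code: int, n_signals: int):
    """Expand an encoded field value into the un-encoded signals c0..c(n-1)."""
    if not 0 <= code <= n_signals:
        raise ValueError("code does not name a signal or the no-op")
    return [1 if code == i + 1 else 0 for i in range(n_signals)]


if __name__ == "__main__":
    print(encoded_width(4))   # width of the encoded field for c0..c3
    print(decode(0b011, 4))   # K0 K1 K2 = 0 1 1 selects R <- X2
```

The decoder is the price of the shorter format: the encoded field saves one bit per microinstruction here, but only one of the four transfers can be selected at a time.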

μI : horizontal vs. vertical

horizontal form :
① long format
② able to express a high degree of parallelism
③ little encoding of the control information.

vertical form :
① short format
② limited ability to express parallelism
③ considerable encoding of the control information.

n control signals encoded in one field → the field needs ⌈log2(n+1)⌉ bits (the extra code is the no-op) and a decoder.

μI addressing

– use the μPC (as the primary source of addresses)

– conditional branching

   · condition select subfield : selects the condition to be tested

   · branch address : stores either a complete address or only the lower-order bits of the address; the latter restricts the range of a branch to a small region of the CM

Timing

– monophase : a single clock pulse synchronizes all the control signals. Control signals are active for the duration of the microinstruction's execution cycle.

– polyphase : a clock cycle is divided into phases and each control signal is active during one of the phases. This increases the complexity of the μI format, which must specify the phase in which each control signal is active.

Ex) Timing of a 4-phase μI ( R ← R1 op R2 )

A microprogram sequencer generates the μI addresses for the CM; it comprises the μPC and all the logic needed for next-address generation.

Minimizing the width of the CM

μIs: I1, I2, ···, In. Each activates a subset of the control signals C1, C2, ···, Cm.

⇒ want an encoding method that places in one control field only control signals that can't be activated at the same time.

[Figure: the CM (length × width); each control field of the μI feeds its own decoder (decoder 1, decoder 2, decoder 3, ···), whose outputs are control signals ci, cj, ck, ···]

⇒ achieve the minimum number of bits in the control fields while maintaining the parallelism.

An encoded control field can activate only one control signal at a time. Two control signals can be included in the same control field if and only if they are never simultaneously activated by a μI (i.e. they are compatible).

Two control signals Ci1 and Ci2 are compatible if, for every μI Ij, Ci1 ∈ Ij implies Ci2 ∉ Ij, and vice versa. A compatibility class is a set of control signals that are pairwise compatible.

• The minimization problem: find a set of compatibility classes {Ci} such that

1. Every control signal is contained in at least one {Ci}.

2. The width $W = \sum_i \lceil \log_2(|C_i| + 1) \rceil$ is minimized.

Algorithm

1. Find the set of maximal compatibility classes (MCCs), defined as the compatibility classes to which no control signal can be added without introducing a pair of incompatible control signals.

2. Determine all minimal MCC covers. A minimal MCC cover is a minimal set of MCCs that includes each control signal. ( Note that a minimal MCC cover does not always yield a minimum value of the cost function W. )

3. For each minimal MCC cover, include each control signal in exactly one subset of some {Ci}, compute the cost W of the resulting solutions, and select one with the minimal cost.

Deriving the MCCs

: Denote by Si the set of compatibility classes {Ci} that contain exactly i control signals.

S1 = {the n original control signals, each as a one-member class}

Si consists of all possible i-member compatibility classes.

Using Si, construct Si+1 as follows:

For each {Ci} ∈ Si, add a control signal Cik ∉ {Ci} to {Ci} to form {C}.

If {C} is a compatibility class, then add {C} to Si+1 and delete {Ci} and all other subsets of {C} from Si.

Stop when Sk = ∅ for some k ≤ n+1. The MCCs are the classes remaining in $\bigcup_{i=1}^{k-1} S_i$.

Example: Find the minimum number of bits in the control fields.

μI   Control signals
I1   a, b, c, g
I2   a, c, e, h
I3   a, d, f
I4   b, c, f

S1 = a, b, c, d, e, f, g, h
S2 = bd, be, bh, cd, de, dg, dh, ef, eg, fg, fh, gh
S3 = bde, bdh, deg, dgh, efg, fgh
S4 = ∅
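The level-by-level construction of the compatibility classes can be checked mechanically. The sketch below is one straightforward way to do it, not necessarily the procedure intended in the text: it grows every class by one compatible signal per level and keeps a class as maximal when no signal can be added. `MICRO` holds the four microinstructions of the example; all function names are illustrative.

```python
MICRO = {"I1": "abcg", "I2": "aceh", "I3": "adf", "I4": "bcf"}


def compatible(x, y, micro):
    """Two signals are compatible if no microinstruction activates both."""
    return not any(x in sigs and y in sigs for sigs in micro.values())


def maximal_compatibility_classes(micro):
    signals = sorted(set("".join(micro.values())))
    level = [frozenset(s) for s in signals]        # S1: one-member classes
    mccs = []
    while level:
        next_level = set()
        for cls in level:
            grew = False
            for s in signals:
                if s not in cls and all(compatible(s, t, micro) for t in cls):
                    next_level.add(cls | {s})      # member of the next level
                    grew = True
            if not grew:                           # nothing can be added
                mccs.append(cls)
        level = sorted(next_level, key=sorted)
    return sorted("".join(sorted(c)) for c in mccs)


if __name__ == "__main__":
    print(maximal_compatibility_classes(MICRO))
```

For this example the result is the eight classes a, bde, bdh, cd, deg, dgh, efg and fgh: signal a is compatible with nothing, and cd is the only pair that cannot be extended to a triple.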

Cover table
– a row for each MCC Ci
– a column for each control signal Cij

C1 = a, C2 = cd, C3 = bde, C4 = bdh, C5 = deg, C6 = dgh, C7 = efg, C8 = fgh

          a  b  c  d  e  f  g  h
C1=a      ×
C2=cd           ×  ×
C3=bde       ×     ×  ×
C4=bdh       ×     ×           ×
C5=deg             ×  ×     ×
C6=dgh             ×        ×  ×
C7=efg                ×  ×  ×
C8=fgh                   ×  ×  ×

If a control signal Cij is covered by only one MCC {Ci}, then {Ci} is an essential MCC.

If MCC {Ci} contains an × in every column where MCC {Ck} contains an ×, then {Ci} dominates {Ck}.

If a control signal Cij has an × in every row where a control signal Ckl has an ×, then Cij dominates Ckl.

Minimal MCC covers (similar to the prime implicant covering problem)

Finding the minimal MCC covers : row and column deletion from a cover table.

1. Delete all essential MCCs (rows) and all columns with an × in an essential row.
2. Delete all but one of any set of identical columns.
3. Delete all dominating columns.
4. Delete all dominated rows.

After finding the two essential MCCs {C1} and {C2} and deleting the columns they cover (a, c, d), we get the reduced cover table.

          b  e  f  g  h
C3=bde    ×  ×
C4=bdh    ×           ×
C5=deg       ×     ×
C6=dgh             ×  ×
C7=efg       ×  ×  ×
C8=fgh          ×  ×  ×

C5 is dominated by C7 and C6 is dominated by C8; therefore, C5 and C6 can be removed.

Minimal covers : {C1 + C2} + {C3 + C8} or {C1 + C2} + {C4 + C7}

Cover C1 + C2 + C3 + C8

          a  b  c  d  e  f  g  h
C1=a      ×
C2=cd           ×  ×
C3=bde       ×     ×  ×
C8=fgh                   ×  ×  ×

⇒ d is covered two times, so it is kept in only one class.

If {C1,C2,C3,C8} = {a, cd, be, fgh} → width W = Σ ⌈log2(|Ci|+1)⌉ = 1+2+2+2 = 7 bits
If {C1,C2,C3,C8} = {a, c, bde, fgh} → width W = 1+1+2+2 = 6 bits

Another minimal MCC cover : C1 + C2 + C4 + C7

          a  b  c  d  e  f  g  h
C1=a      ×
C2=cd           ×  ×
C4=bdh       ×     ×           ×
C7=efg                ×  ×  ×

If {C1,C2,C4,C7} = {a, cd, bh, efg} → width W = 1+2+2+2 = 7 bits
If {C1,C2,C4,C7} = {a, c, bdh, efg} → width W = 1+1+2+2 = 6 bits

Minimized width, using {a, c, bde, fgh}

Control field   Bits   Code   Control signal
0               0      0      No op
                       1      a
1               1      0      No op
                       1      c
2               2,3    00     No op
                       01     b
                       10     d
                       11     e
3               4,5    00     No op
                       01     f
                       10     g
                       11     h

Encoded μ-Instructions

μ-Instruction   0  1  2  3  4  5
I1              1  1  0  1  1  0
I2              1  1  1  1  1  1
I3              1  0  1  0  0  1
I4              0  1  0  1  0  1
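The width calculation and the final encoding can be reproduced in a few lines. This is a sketch with illustrative names; it assumes, as in the table above, that within each field code 0 is the no-op and the signals take codes 1, 2, 3 in the order listed.

```python
def field_width(n_signals: int) -> int:
    """ceil(log2(n + 1)) bits for n signals plus a no-op code."""
    return n_signals.bit_length()


def total_width(fields) -> int:
    """Cost W of a partition of the control signals into encoded fields."""
    return sum(field_width(len(f)) for f in fields)


def encode(active: str, fields) -> str:
    """Encode the active signals of one microinstruction, field by field."""
    bits = ""
    for f in fields:
        hits = [s for s in f if s in active]
        if len(hits) > 1:
            raise ValueError("incompatible signals share a field: " + "".join(hits))
        code = f.index(hits[0]) + 1 if hits else 0
        bits += format(code, "0{}b".format(field_width(len(f))))
    return bits


if __name__ == "__main__":
    fields = ["a", "c", "bde", "fgh"]
    print(total_width(fields))
    for name, sigs in {"I1": "abcg", "I2": "aceh", "I3": "adf", "I4": "bcf"}.items():
        print(name, encode(sigs, fields))
```

With the fields a, c, bde and fgh the four microinstructions encode to 110110, 111111, 101001 and 010101 (bits 0 to 5, left to right), using 6 bits instead of the 8 bits of the un-encoded format.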

A drawback of the minimum-width control field : functionally unrelated control signals are combined in the same field.

→ Encoding by function

Multiple μ-Instruction formats

– Branch μ-instructions, which specify no control signals.

– Action μ-instructions, which have no branching capability.

This approach is used at the instruction level.

Formats

Branch μ-Instruction : | Condition select | Branch address |

Condition select   Branch
0 0
0 1                if Q(7) = 0
1 0                if COUNT6 = 1
1 1                jump

Action μ-Instruction : | Control fields |
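The next-address choice made for a branch μ-instruction can be sketched as follows. The function and parameter names are illustrative. The behavior of condition-select code 00 is not given in the notes; the sketch assumes it means "do not branch", and that a branch that is not taken continues at the next sequential address.

```python
def next_address(upc: int, cond_select: int, branch_addr: int,
                 q7: int, count6: int) -> int:
    """Next control-memory address after a branch microinstruction at upc."""
    if cond_select == 0b01:
        taken = (q7 == 0)          # branch if Q(7) = 0
    elif cond_select == 0b10:
        taken = (count6 == 1)      # branch if COUNT6 = 1
    elif cond_select == 0b11:
        taken = True               # unconditional jump
    else:
        taken = False              # 00: assumed to mean "no branch"
    return branch_addr if taken else upc + 1


if __name__ == "__main__":
    print(next_address(upc=5, cond_select=0b01, branch_addr=12, q7=0, count6=0))
    print(next_address(upc=5, cond_select=0b10, branch_addr=12, q7=0, count6=0))
```

Keeping the branch logic in a separate format means an action μ-instruction needs no address field at all, at the cost of one extra μ-instruction for every branch.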

μ-program sequencer

: with the advance of VLSI, all the circuitry required to generate μI addresses can be placed in a single IC.
– a general-purpose building block for μ-programmed CUs.
– simplifies CPU design.

• Nanoprogrammed computer vs. μ-programmed computer

[Figure: μ-programmed computer. Instruction → μPC → CM → μIR → Control signals]

[Figure: nanoprogrammed computer. Instruction → μPC → CM → μIR → nPC → nCM → nIR → Control signals]

Criteria

① Size of the CM (compared below).

② Speed reduction : μ-programming needs one control-memory fetch, nanoprogramming needs two, because of the extra memory access and the more complex controller.

③ The advantage of nanoprogramming is the greater design flexibility.

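Size of control memory in nanoprogramming

CM : Hm words × Wm bits ⇒ Hm·Wm

nCM : Hn words × Wn bits ⇒ Hn·Wn

Total size : Hm·Wm + Hn·Wn = S2

Size of a comparable single-level CM

Hm words × Wm bits ⇒ Hm·Wm = S1 (here Wm is the word width of the single-level design, which is wider than the Wm of the two-level CM because it holds the control signals directly)

Usually, Hm is large, Wm small, Hn small and Wn large. (Many micro-instructions can use the same nano-programmed control.)

Big advantage of nanoprogramming : "design flexibility"

1-level CM (N = number of control-signal bits)

μI format : | address (log2 Hm bits) | control signals (N bits) |, Hm words

size = Hm (log2 Hm + N) = S1

Nanoprogramming

CM word : | address (log2 Hm bits) | nCM address (log2 Hn bits) |, Hm words

nCM word : | control signals (N bits) |, Hn words

Assuming no branching,

S2 = Hm (log2 Hm + log2 Hn) + Hn·N

Let r = Hn/Hm = ratio of unique nano-control states to the total number of μ-control states for all instructions, so Hn = r·Hm.

S2 = Hm (log2 Hm + log2(r·Hm)) + r·Hm·N = Hm (2·log2 Hm + log2 r + r·N)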

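Example) For the 68000 processor (N = 70, Hm = 650, r = 0.4), which approach is better?

Hn = r·Hm = 0.4 × 650 = 260. Address widths are rounded up to whole bits: ⌈log2 650⌉ = 10 and ⌈log2 260⌉ = 9.

1-level CM design :

S1 = 650 (⌈log2 650⌉ + 70) = 650 × 80 = 52,000 bits

Nanoprogramming :

S2 = 650 (⌈log2 650⌉ + ⌈log2 260⌉) + 260 × 70 = 12,350 + 18,200 = 30,550 bits

In this case, nanoprogramming needs less control memory than single-level microprogramming.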
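The comparison can be repeated for other parameter values with a short sketch. The function names are illustrative; as in the example, address fields are rounded up to whole bits and no branching is assumed.

```python
def address_bits(words: int) -> int:
    """Bits needed to address `words` locations: ceil(log2 words)."""
    return max(1, (words - 1).bit_length())


def single_level_size(hm: int, n: int) -> int:
    """S1 = Hm (log2 Hm + N)."""
    return hm * (address_bits(hm) + n)


def two_level_size(hm: int, hn: int, n: int) -> int:
    """S2 = Hm (log2 Hm + log2 Hn) + Hn N."""
    return hm * (address_bits(hm) + address_bits(hn)) + hn * n


if __name__ == "__main__":
    hm, n, r = 650, 70, 0.4
    hn = round(r * hm)
    print(single_level_size(hm, n), two_level_size(hm, hn, n))
```

The saving depends strongly on r: as r approaches 1 the nCM term Hn·N approaches Hm·N and the two-level design becomes larger than the single-level one.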

5.3 Pipeline Control

Performance measure : throughput in MIPS (millions of instructions per second)

$\text{Cycles per instruction (CPI)} = \dfrac{f}{\text{MIPS}}$

where f is the pipeline's clock frequency (in MHz).

Efficiency (utilization) :

$E(m) = \dfrac{\text{busy area}}{\text{total area}}$

Speedup :

$S(m) = \dfrac{T(1)}{T(m)}$

T(m) : the execution time on an m-stage pipeline

T(1) : the execution time on a non-pipelined processor

S(m) = m × E(m)
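The relation S(m) = m × E(m) can be checked numerically for an idealized case. The sketch below assumes a stall-free pipeline in which N instructions finish in m + N − 1 cycles, and a non-pipelined processor that needs m cycles of the same length per instruction; these timing assumptions and the function name are illustrative, not taken from the text.

```python
def pipeline_metrics(n_instructions: int, m_stages: int):
    """Return (speedup, efficiency) for an ideal m-stage pipeline."""
    t1 = n_instructions * m_stages        # T(1): m cycles per instruction
    tm = m_stages + n_instructions - 1    # T(m): fill once, then 1 per cycle
    speedup = t1 / tm
    busy_area = n_instructions * m_stages # stage-cycles actually used
    total_area = m_stages * tm            # all stage-cycles available
    return speedup, busy_area / total_area


if __name__ == "__main__":
    s, e = pipeline_metrics(100, 5)
    print(round(s, 3), round(e, 3))
```

As N grows, the efficiency approaches 1 and the speedup approaches the number of stages m.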

Performance/cost ratio :

$PCR = \dfrac{f}{K}$

where
f : the pipeline's clock frequency
K : hardware cost

Suppose the pipeline P has m stages for SI.

a : the delay of a non-pipelined processor for SI
each stage of P : delay a/m, plus an extra delay b due to the buffer register

$T_c = \dfrac{a}{m} + b, \qquad f = \dfrac{1}{T_c}$

hardware cost K = cm + d
c : buffer-register cost per stage
d : cost of the pipeline's data-processing logic

$PCR = \dfrac{f}{K} = \dfrac{1}{T_c K} = \dfrac{m}{bcm^2 + (ac + bd)m + ad}$

To maximize PCR with respect to m,

$\dfrac{d}{dm} PCR = \dfrac{(bcm^2 + (ac + bd)m + ad) - m(2bcm + ac + bd)}{(bcm^2 + (ac + bd)m + ad)^2} = \dfrac{ad - bcm^2}{(bcm^2 + (ac + bd)m + ad)^2} = 0$

$m_{opt} = \sqrt{\dfrac{ad}{bc}}$
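The optimum can be confirmed numerically. In the sketch below the parameter values a = 100, b = 5, c = 2 and d = 40 are hypothetical, chosen only so that the optimum is a whole number; the function names are illustrative.

```python
import math


def pcr(m: float, a: float, b: float, c: float, d: float) -> float:
    """PCR = f / K with f = 1 / (a/m + b) and K = c*m + d."""
    return m / ((a + b * m) * (c * m + d))


def optimal_stages(a: float, b: float, c: float, d: float) -> float:
    """Stage count that maximizes PCR: sqrt(a*d / (b*c))."""
    return math.sqrt((a * d) / (b * c))


if __name__ == "__main__":
    a, b, c, d = 100.0, 5.0, 2.0, 40.0   # hypothetical delays and costs
    m_opt = optimal_stages(a, b, c, d)
    print(m_opt, pcr(m_opt, a, b, c, d))
```

A larger buffer delay b or buffer cost c pushes the optimum toward fewer stages, while a larger logic delay a or logic cost d favors a deeper pipeline.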

5.3.3 Superscalar Processing

Superscalar operation executes more than one instruction per cycle by fetching, decoding, and executing several instructions concurrently.

A superscalar computer has a single CPU with multiple execution units (E-units) that attempts to exploit the parallelism that is implicit in computer programs.

In Fig. 5.66, the superscalar design has a potential speedup of 10. With k independent m-stage pipelined E-units, the potential speedup factor of a superscalar CPU is m × k.

Superscalar operation requires
– instruction-fetch logic that can meet a heavy demand
– a large, fast instruction and data cache

Important factors for the PCU of a superscalar computer

• Instruction types : A floating-point add instruction has to be issued to a floating-point E-unit, not to an integer E-unit.
• E-unit availability.
• Data dependencies : To avoid conflicting use of registers, data-dependency constraints among the operands must be satisfied.
• Control dependencies : The impact of branch instructions on pipeline efficiency must be reduced.
• Program order : Instructions must eventually produce results in the order specified by the program, even if the results are computed out of order internally.

→ dynamic instruction scheduling and branch prediction.
