Digital Building Blocks

Chapter 5 <1>

Digital Design and Computer Architecture, 2nd Edition
Chapter 5
David Money Harris and Sarah L. Harris

Chapter 5 S - users.tricity.wsu.eduusers.tricity.wsu.edu/~bobl/cpts260/u05_processor/notes_ch5and7.pdf · Carry-Lookahead Adder. Chapter 5 S ... Compare delay of: 32-bit


Chapter 5 <2>

Chapter 5 :: Topics

•  Introduction
•  Arithmetic Circuits
•  Number Systems (no)
•  Sequential Building Blocks (later)
•  Memory Arrays (later)
•  Logic Arrays (maybe)

Chapter 5 <3>

Introduction

•  Digital building blocks:
   - Gates, multiplexers, decoders, registers, arithmetic circuits, counters, memory arrays, logic arrays
•  Building blocks demonstrate hierarchy, modularity, and regularity:
   - Hierarchy of simpler components
   - Well-defined interfaces and functions
   - Regular structure easily extends to different sizes
•  Will use these building blocks in Chapter 7 to build a microprocessor

Chapter 5 <6>

1-Bit Adders

Half adder (inputs A, B; outputs S, Cout):

A B | Cout S
0 0 |  0   0
0 1 |  0   1
1 0 |  0   1
1 1 |  1   0

$S = A \oplus B$
$C_{out} = AB$

Full adder (inputs A, B, Cin; outputs S, Cout):

Cin A B | Cout S
 0  0 0 |  0   0
 0  0 1 |  0   1
 0  1 0 |  0   1
 0  1 1 |  1   0
 1  0 0 |  0   1
 1  0 1 |  1   0
 1  1 0 |  1   0
 1  1 1 |  1   1

$S = A \oplus B \oplus C_{in}$
$C_{out} = AB + AC_{in} + BC_{in}$
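The two 1-bit adders can be modeled directly from their equations. The following is a minimal Python sketch; the function names `half_adder` and `full_adder` are chosen here for illustration and do not come from the slides.

```python
from itertools import product


def half_adder(a: int, b: int) -> tuple[int, int]:
    """Return (S, Cout) for 1-bit inputs: S = A xor B, Cout = A and B."""
    return a ^ b, a & b


def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Return (S, Cout): S = A xor B xor Cin, Cout = AB + ACin + BCin."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout


if __name__ == "__main__":
    # Print the full-adder truth table in Cin, A, B order.
    for cin, a, b in product((0, 1), repeat=3):
        s, cout = full_adder(a, b, cin)
        print(cin, a, b, "->", cout, s)
```

In every row, the pair (Cout, S) read as a 2-bit number equals A + B + Cin, which is a quick way to check a truth table.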

Chapter 5 <7>

Multibit Adders (CPAs)

•  Types of carry propagate adders (CPAs):
   - Ripple-carry (slow)
   - Carry-lookahead (fast)
   - Prefix (faster)
•  Carry-lookahead and prefix adders are faster for large adders but require more hardware

[Figure: symbol of an N-bit adder with N-bit inputs A and B, carry in Cin, N-bit sum S, and carry out Cout]

Chapter 5 <8>

Ripple-Carry Adder

•  Chain 1-bit adders together
•  Carry ripples through entire chain
•  Disadvantage: slow

[Figure: 32-bit ripple-carry adder built from 32 full adders. Column i adds Ai, Bi, and the carry Ci-1 from the column below to produce Si and Ci; Cin enters column 0 and Cout leaves column 31.]

Chapter 5 <9>

Ripple-Carry Adder Delay

$t_{ripple} = N t_{FA}$, where $t_{FA}$ is the delay of a full adder
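The chaining of full adders and the resulting linear delay can be sketched in Python. This is a behavioral model only, and the names `ripple_carry_add` and `ripple_delay_ps` are illustrative assumptions rather than anything defined in the slides.

```python
def ripple_carry_add(a: int, b: int, cin: int = 0, n: int = 32) -> tuple[int, int]:
    """Add two n-bit values one column at a time; return (sum, carry_out)."""
    carry = cin
    total = 0
    for i in range(n):  # column 0 first: each carry feeds the next column
        ai, bi = (a >> i) & 1, (b >> i) & 1
        total |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (ai & carry) | (bi & carry)
    return total, carry


def ripple_delay_ps(n: int, t_fa_ps: int) -> int:
    """t_ripple = N * t_FA, with the full-adder delay given in picoseconds."""
    return n * t_fa_ps


if __name__ == "__main__":
    print(ripple_carry_add(25, 32))
    print(ripple_delay_ps(32, 300), "ps")
```

The loop body is one full adder, so the loop count N is also the number of full-adder delays on the carry path.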

Chapter 5 <10>

Carry-Lookahead Adder

•  Compute carry out (Cout) for k-bit blocks using generate and propagate signals
•  Some definitions:
   - Column i produces a carry out by either generating a carry out or propagating a carry in to the carry out
   - Generate ($G_i$) and propagate ($P_i$) signals for each column:
      •  Column i will generate a carry out if $A_i$ AND $B_i$ are both 1:
         $G_i = A_i B_i$
      •  Column i will propagate a carry in to the carry out if $A_i$ OR $B_i$ is 1:
         $P_i = A_i + B_i$
      •  The carry out of column i ($C_i$) is:
         $C_i = A_i B_i + (A_i + B_i) C_{i-1} = G_i + P_i C_{i-1}$

Chapter 5 <11>

Carry-Lookahead Addition

•  Step 1: Compute $G_i$ and $P_i$ for all columns
•  Step 2: Compute G and P for k-bit blocks
•  Step 3: Cin propagates through each k-bit propagate/generate block

Chapter 5 <12>

Carry-Lookahead Adder

•  Example: 4-bit blocks ($G_{3:0}$ and $P_{3:0}$):
   $G_{3:0} = G_3 + P_3 (G_2 + P_2 (G_1 + P_1 G_0))$
   $P_{3:0} = P_3 P_2 P_1 P_0$
•  Generally, for a 4-bit block spanning columns i down to j = i - 3:
   $G_{i:j} = G_i + P_i (G_{i-1} + P_{i-1} (G_{i-2} + P_{i-2} G_j))$
   $P_{i:j} = P_i P_{i-1} P_{i-2} P_j$
   $C_i = G_{i:j} + P_{i:j} C_{j-1}$, where $C_{j-1}$ is the carry in to the block
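The three carry-lookahead steps can be traced in a short behavioral model. The sketch below is in Python and assumes N is a multiple of k; the helper names `column_pg`, `block_pg`, and `cla_add` are hypothetical. It models the logic, not the gate-level timing.

```python
def column_pg(a: int, b: int, n: int) -> tuple[list[int], list[int]]:
    """Step 1: G_i = A_i B_i and P_i = A_i + B_i for every column."""
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(n)]
    p = [((a >> i) & 1) | ((b >> i) & 1) for i in range(n)]
    return g, p


def block_pg(g: list[int], p: list[int], hi: int, lo: int) -> tuple[int, int]:
    """Step 2: block signals G_{hi:lo}, P_{hi:lo}, e.g. G3 + P3(G2 + P2(G1 + P1 G0))."""
    big_g, big_p = g[lo], p[lo]
    for i in range(lo + 1, hi + 1):
        big_g = g[i] | (p[i] & big_g)
        big_p = p[i] & big_p
    return big_g, big_p


def cla_add(a: int, b: int, cin: int = 0, n: int = 32, k: int = 4) -> tuple[int, int]:
    """Step 3: the carry hops from block to block; sums ripple inside a block."""
    g, p = column_pg(a, b, n)
    total, block_cin = 0, cin
    for lo in range(0, n, k):
        hi = lo + k - 1
        carry = block_cin
        for i in range(lo, hi + 1):  # k full adders inside the block
            s = ((a >> i) & 1) ^ ((b >> i) & 1) ^ carry
            total |= s << i
            carry = g[i] | (p[i] & carry)
        big_g, big_p = block_pg(g, p, hi, lo)
        block_cin = big_g | (big_p & block_cin)  # carry out of the block
    return total, block_cin


if __name__ == "__main__":
    print(cla_add(25, 32))
```

The block carry depends only on the block's G and P and on the carry in to the block, which is why it does not have to wait for the ripple inside the block.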

Chapter 5 <13>

32-bit CLA with 4-bit Blocks

[Figure: eight 4-bit CLA blocks chained from Cin to Cout through the block carries C3, C7, ..., C23, C27. Each block takes a 4-bit slice of A and B (A3:0/B3:0 through A31:28/B31:28) and produces the matching sum slice (S3:0 through S31:28). Inside a block, four full adders produce S0 to S3 with internal carries C0, C1, C2, while lookahead logic forms G3:0 and P3:0 to drive the block's Cout.]

Chapter 5 <14>

Carry-Lookahead Adder Delay

For an N-bit CLA with k-bit blocks:
$t_{CLA} = t_{pg} + t_{pg\_block} + (N/k - 1) t_{AND\_OR} + k t_{FA}$

   - $t_{pg}$: delay to generate all $P_i$, $G_i$
   - $t_{pg\_block}$: delay to generate all $P_{i:j}$, $G_{i:j}$
   - $t_{AND\_OR}$: delay from Cin to Cout of the final AND/OR gate in a k-bit CLA block

An N-bit carry-lookahead adder is generally much faster than a ripple-carry adder for N > 16

Chapter 5 <15>

Prefix Adder

•  Computes carry in ($C_{i-1}$) for each column, then computes sum:
   $S_i = (A_i \oplus B_i) \oplus C_{i-1}$
•  Computes G and P for 1-, 2-, 4-, 8-bit blocks, etc. until the carry in to every column is known
•  $\log_2 N$ stages

Chapter 5 <16>

Prefix Adder

•  Carry in is either generated in a column or propagated from a previous column.
•  Column -1 holds Cin, so $G_{-1} = C_{in}$, $P_{-1} = 0$
•  Carry in to column i = carry out of column i-1:
   $C_{i-1} = G_{i-1:-1}$
   $G_{i-1:-1}$: generate signal spanning columns i-1 to -1
•  Sum equation:
   $S_i = (A_i \oplus B_i) \oplus G_{i-1:-1}$
•  Goal: Quickly compute $G_{0:-1}, G_{1:-1}, G_{2:-1}, G_{3:-1}, G_{4:-1}, G_{5:-1}, \ldots$ (called prefixes)

Chapter 5 <17>

Prefix Adder

•  Generate and propagate signals for a block spanning bits i:j:
   $G_{i:j} = G_{i:k} + P_{i:k} G_{k-1:j}$
   $P_{i:j} = P_{i:k} P_{k-1:j}$
•  In words:
   - Generate: block i:j will generate a carry if:
      •  the upper part (i:k) generates a carry, or
      •  the upper part propagates a carry generated in the lower part (k-1:j)
   - Propagate: block i:j will propagate a carry if both the upper and lower parts propagate the carry
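The block-combining rule can be applied in doubling steps to obtain every prefix. The Python sketch below uses a simple recursive-doubling arrangement, which is an assumption made for brevity and not necessarily the cell arrangement used in the slides. Because it also computes Cout, it works on N + 1 columns and may use one more stage than $\log_2 N$. The name `prefix_add` is hypothetical.

```python
def prefix_add(a: int, b: int, cin: int = 0, n: int = 16) -> tuple[int, int]:
    """Return (sum, carry_out) using prefix generate/propagate signals.

    List index 0 stands for column -1 (G = Cin, P = 0);
    list index i + 1 stands for column i.
    """
    big_g = [cin] + [((a >> i) & 1) & ((b >> i) & 1) for i in range(n)]
    big_p = [0] + [((a >> i) & 1) | ((b >> i) & 1) for i in range(n)]

    span = 1
    while span < n + 1:  # each pass doubles the width every entry covers
        new_g, new_p = big_g[:], big_p[:]
        for idx in range(span, n + 1):
            # G_{i:j} = G_{i:k} + P_{i:k} G_{k-1:j};  P_{i:j} = P_{i:k} P_{k-1:j}
            new_g[idx] = big_g[idx] | (big_p[idx] & big_g[idx - span])
            new_p[idx] = big_p[idx] & big_p[idx - span]
        big_g, big_p = new_g, new_p
        span *= 2

    # big_g[i] now holds G_{i-1:-1}, the carry in to column i.
    total = 0
    for i in range(n):
        s = ((a >> i) & 1) ^ ((b >> i) & 1) ^ big_g[i]
        total |= s << i
    return total, big_g[n]


if __name__ == "__main__":
    print(prefix_add(25, 32))
```

All updates within one pass are independent of each other, so in hardware each pass corresponds to one row of prefix cells operating in parallel.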

Chapter 5 <18>

Prefix Adder Schematic

[Figure: 16-bit prefix adder. Columns -1 through 15 form $G_{i:i}$ and $P_{i:i}$ from $A_i$ and $B_i$. Successive rows of prefix cells combine $G_{i:k}, P_{i:k}$ with $G_{k-1:j}, P_{k-1:j}$ into $G_{i:j}, P_{i:j}$ for spans such as 0:-1, 2:1, 6:3, and 14:7, until every prefix from 0:-1 through 14:-1 is available. A final row computes $S_i$ from $A_i$, $B_i$, and $G_{i-1:-1}$.]

Chapter 5 <19>

Prefix Adder Delay

$t_{PA} = t_{pg} + \log_2 N (t_{pg\_prefix}) + t_{XOR}$

   - $t_{pg}$: delay to produce $P_i$, $G_i$ (AND or OR gate)
   - $t_{pg\_prefix}$: delay of black prefix cell (AND-OR gate)

Chapter 5 <21>

Adder Delay Comparisons

Compare delay of: 32-bit ripple-carry, carry-lookahead, and prefix adders
•  CLA has 4-bit blocks
•  2-input gate delay = 100 ps; full adder delay = 300 ps

$t_{ripple} = N t_{FA} = 32 (300\ \text{ps}) = 9.6\ \text{ns}$
$t_{CLA} = t_{pg} + t_{pg\_block} + (N/k - 1) t_{AND\_OR} + k t_{FA} = [100 + 600 + (7)200 + 4(300)]\ \text{ps} = 3.3\ \text{ns}$
$t_{PA} = t_{pg} + \log_2 N (t_{pg\_prefix}) + t_{XOR} = [100 + \log_2 32 (200) + 100]\ \text{ps} = 1.2\ \text{ns}$
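The three delay formulas are easy to evaluate for other word sizes. The Python sketch below plugs in the numbers used in the comparison above (100 ps, 600 ps, 200 ps, 300 ps); the function and parameter names are illustrative assumptions.

```python
import math


def t_ripple(n: int, t_fa: int) -> int:
    """t_ripple = N * t_FA."""
    return n * t_fa


def t_cla(n: int, k: int, t_pg: int, t_pg_block: int, t_and_or: int, t_fa: int) -> int:
    """t_CLA = t_pg + t_pg_block + (N/k - 1) * t_AND_OR + k * t_FA."""
    return t_pg + t_pg_block + (n // k - 1) * t_and_or + k * t_fa


def t_prefix(n: int, t_pg: int, t_pg_prefix: int, t_xor: int) -> int:
    """t_PA = t_pg + log2(N) * t_pg_prefix + t_XOR."""
    return t_pg + math.ceil(math.log2(n)) * t_pg_prefix + t_xor


if __name__ == "__main__":
    # All delays in picoseconds, taken from the 32-bit comparison.
    print(t_ripple(32, 300))                    # ripple-carry
    print(t_cla(32, 4, 100, 600, 200, 300))     # carry-lookahead, 4-bit blocks
    print(t_prefix(32, 100, 200, 100))          # prefix
```

Changing `n` shows how the ripple-carry delay grows linearly with N while the prefix delay grows only with $\log_2 N$.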

Chapter 5 <22>

Subtracter

[Figure: symbol and implementation of an N-bit subtracter with inputs A and B and output Y = A - B; the implementation is built around an N-bit adder.]

Chapter 5 <23>

Comparator: Equality

[Figure: symbol and implementation of a 4-bit equality comparator. The bit pairs A3/B3, A2/B2, A1/B1, and A0/B0 are compared and combined into the single output Equal.]

Chapter 5 <24>

Comparator: Less Than

[Figure: N-bit less-than comparator. A subtracter computes A - B, and bit [N-1] of the result is the output A < B.]

Copyright © 2007 Elsevier

Chapter 5 <25>

Arithmetic Logic Unit (ALU)

F2:0   Function
000    A & B
001    A | B
010    A + B
011    not used
100    A & ~B
101    A | ~B
110    A - B
111    SLT

[Figure: ALU symbol with N-bit inputs A and B, 3-bit control input F, and N-bit output Y]

Chapter 5 <26>

ALU Design

[Figure: ALU implementation for the same F2:0 function table. A 2:1 multiplexer controlled by F2 feeds the second operand of an N-bit adder, which produces S and Cout. A 4:1 multiplexer controlled by F1:0 selects the output Y. Bit [N-1] of S is zero-extended to N bits for the SLT result.]

Chapter 5 <28>

Set Less Than (SLT) Example

•  Configure 32-bit ALU for SLT operation: A = 25 and B = 32
   - A < B, so Y should be the 32-bit representation of 1 (0x00000001)
   - F2:0 = 111
   - F2 = 1 (adder acts as subtracter), so 25 - 32 = -7
   - -7 has 1 in the most significant bit (S31 = 1)
   - F1:0 = 11, so the multiplexer selects Y = S31 (zero extended) = 0x00000001.

[Figure: ALU implementation, repeated from the ALU Design slide]
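The SLT walk-through can be reproduced with a small behavioral model of the ALU. The Python sketch below assumes the structure described in the example: F2 inverts B and supplies the carry in, and F1:0 selects among AND, OR, the adder sum, and the zero-extended sign bit. The function name `alu` is hypothetical.

```python
def alu(a: int, b: int, f: int, n: int = 32) -> int:
    """Behavioral model of the N-bit ALU controlled by the 3-bit code F2:0."""
    mask = (1 << n) - 1
    a &= mask
    b &= mask
    f2 = (f >> 2) & 1
    bb = (~b & mask) if f2 else b      # F2 = 1 selects ~B
    s = (a + bb + f2) & mask           # F2 is also the adder's carry in
    select = f & 0b11                  # F1:0 drives the output multiplexer
    if select == 0b00:
        return a & bb
    if select == 0b01:
        return a | bb
    if select == 0b10:
        return s
    return (s >> (n - 1)) & 1          # sign bit of S, zero extended


if __name__ == "__main__":
    print(hex(alu(25, 32, 0b110)))     # A - B as a 32-bit two's complement value
    print(hex(alu(25, 32, 0b111)))     # SLT
```

Like the example on the slide, this model reads A < B from the sign of A - B and does not account for overflow in the subtraction.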

Chapter 5 <29>

Shifters

•  Logical shifter: shifts value to left or right and fills empty spaces with 0's
   - Ex: 11001 >> 2 = 00110
   - Ex: 11001 << 2 = 00100
•  Arithmetic shifter: same as logical shifter, but on right shift, fills empty spaces with the old most significant bit (msb)
   - Ex: 11001 >>> 2 = 11110
   - Ex: 11001 <<< 2 = 00100
•  Rotator: rotates bits in a circle, such that bits shifted off one end are shifted into the other end
   - Ex: 11001 ROR 2 = 01110
   - Ex: 11001 ROL 2 = 00111
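The six shift and rotate examples can be checked with fixed-width helpers. The Python sketch below works on n-bit values held in ordinary integers; the helper names are assumptions made for illustration.

```python
def lsl(x: int, k: int, n: int = 5) -> int:
    """Logical (and arithmetic) shift left: vacated low bits are filled with 0."""
    return (x << k) & ((1 << n) - 1)


def lsr(x: int, k: int, n: int = 5) -> int:
    """Logical shift right: vacated high bits are filled with 0."""
    return (x & ((1 << n) - 1)) >> k


def asr(x: int, k: int, n: int = 5) -> int:
    """Arithmetic shift right: vacated high bits copy the old msb."""
    mask = (1 << n) - 1
    x &= mask
    y = x >> k
    if (x >> (n - 1)) & 1:
        y |= (mask << (n - k)) & mask
    return y


def ror(x: int, k: int, n: int = 5) -> int:
    """Rotate right: bits leaving the low end re-enter at the high end."""
    mask = (1 << n) - 1
    k %= n
    x &= mask
    return ((x >> k) | (x << (n - k))) & mask


def rol(x: int, k: int, n: int = 5) -> int:
    """Rotate left: bits leaving the high end re-enter at the low end."""
    mask = (1 << n) - 1
    k %= n
    x &= mask
    return ((x << k) | (x >> (n - k))) & mask


if __name__ == "__main__":
    for name, fn in [("lsr", lsr), ("lsl", lsl), ("asr", asr), ("ror", ror), ("rol", rol)]:
        print(name, format(fn(0b11001, 2), "05b"))
```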

Chapter 5 <31>

Shifter Design

[Figure: 4-bit shifter with input A3:0, 2-bit shift amount shamt1:0, and output Y3:0. Each output bit Y3, Y2, Y1, Y0 is driven by a 4:1 multiplexer whose select input S1:0 is shamt1:0 and whose data inputs 00, 01, 10, 11 are taken from A3, A2, A1, A0.]

Chapter 5 <32>

Shifters as Multipliers, Dividers

•  A << N = $A \times 2^N$
   - Example: 00001 << 2 = 00100 ($1 \times 2^2 = 4$)
   - Example: 11101 << 2 = 10100 ($-3 \times 2^2 = -12$)
•  A >>> N = $A \div 2^N$
   - Example: 01000 >>> 2 = 00010 ($8 \div 2^2 = 2$)
   - Example: 10000 >>> 2 = 11100 ($-16 \div 2^2 = -4$)
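The four examples rely on reading the 5-bit patterns as two's complement numbers. The Python sketch below makes that interpretation explicit; `to_signed` and `to_bits` are hypothetical helper names, and the code relies on Python's `>>` being an arithmetic shift for negative integers.

```python
def to_signed(x: int, n: int) -> int:
    """Interpret the low n bits of x as a two's complement number."""
    x &= (1 << n) - 1
    return x - (1 << n) if x >> (n - 1) else x


def to_bits(v: int, n: int) -> str:
    """Format a signed value as an n-bit two's complement bit string."""
    return format(v & ((1 << n) - 1), f"0{n}b")


if __name__ == "__main__":
    a = to_signed(0b11101, 5)            # -3
    print(a, "<< 2 =", to_bits(a << 2, 5), "=", a << 2)
    b = to_signed(0b10000, 5)            # -16
    print(b, ">>> 2 =", to_bits(b >> 2, 5), "=", b >> 2)
```

Note that an arithmetic right shift rounds toward negative infinity, so for negative values that are not multiples of $2^N$ the result differs from a division that truncates toward zero.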

Chapter 5 <33>

Multipliers

•  Partial products formed by multiplying a single digit of the multiplier with the multiplicand
•  Shifted partial products summed to form result

Decimal:

       230     multiplicand
    x   42     multiplier
    ------
       460     partial
  +   920      products
    ------
      9660     result

230 x 42 = 9660

Binary:

       0101     multiplicand
    x  0111     multiplier
    -------
       0101     partial
      0101      products
     0101
  + 0000
    -------
    0100011     result

5 x 7 = 35
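The partial-product view of binary multiplication maps directly onto shifts and adds. In the Python sketch below, each multiplier bit selects either a shifted copy of the multiplicand or zero; the function names are illustrative assumptions.

```python
def partial_products(multiplicand: int, multiplier: int, n: int = 4) -> list[int]:
    """Return the n shifted partial products, one per multiplier bit (bit 0 first)."""
    return [(multiplicand << i) if (multiplier >> i) & 1 else 0 for i in range(n)]


def multiply(multiplicand: int, multiplier: int, n: int = 4) -> int:
    """Sum the shifted partial products to form the 2n-bit result."""
    return sum(partial_products(multiplicand, multiplier, n))


if __name__ == "__main__":
    for pp in partial_products(0b0101, 0b0111):
        print(format(pp, "07b"))
    print(format(multiply(0b0101, 0b0111), "07b"))
```

In binary every partial product is either 0 or the multiplicand itself, so each one reduces to an AND of the multiplicand bits with a single multiplier bit.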

Chapter 5 <34>

4 x 4 Multiplier

                           A3   A2   A1   A0
x                          B3   B2   B1   B0
    ----------------------------------------
                         A3B0 A2B0 A1B0 A0B0
                    A3B1 A2B1 A1B1 A0B1
               A3B2 A2B2 A1B2 A0B2
+         A3B3 A2B3 A1B3 A0B3
    ----------------------------------------
       P7   P6   P5   P4   P3   P2   P1   P0

[Figure: symbol of a 4 x 4 multiplier with 4-bit inputs A and B and 8-bit product P, and its array implementation, in which each row combines A3:0 with one bit of B (B0 through B3) and adds the shifted row into the running sum to produce P7:0.]

Chapter 5 <35>

4 x 4 Divider

A/B = Q + R/B

Algorithm:
   R' = 0
   for i = N-1 to 0
      R = {R' << 1, Ai}
      D = R - B
      if D < 0, Qi = 0, R' = R
      else      Qi = 1, R' = D
   R = R'

[Figure: 4 x 4 array divider with dividend bits A3:A0, divisor bits B3:B0, quotient bits Q3:Q0, and remainder bits R3:R0. Legend: each cell takes R, B, and Cin, computes D = R - B with an adder, passes Cout on, and uses a multiplexer with select N to output R' as either R or D.]
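The division algorithm above translates almost line for line into Python. The sketch below handles unsigned N-bit operands; the function name `divide` and the explicit divide-by-zero check are additions made here for illustration.

```python
def divide(a: int, b: int, n: int = 4) -> tuple[int, int]:
    """Return (Q, R) with A = Q * B + R for unsigned n-bit A and B."""
    if b == 0:
        raise ZeroDivisionError("divisor B must be nonzero")
    r_prev = 0                                  # R' = 0
    q = 0
    for i in range(n - 1, -1, -1):              # for i = N-1 to 0
        r = (r_prev << 1) | ((a >> i) & 1)      # R = {R' << 1, Ai}
        d = r - b                               # D = R - B
        if d < 0:
            r_prev = r                          # Qi = 0, R' = R
        else:
            q |= 1 << i                         # Qi = 1, R' = D
            r_prev = d
    return q, r_prev                            # R = R'


if __name__ == "__main__":
    print(divide(0b1101, 0b0011))               # 13 / 3
```

Each loop iteration corresponds to one row of the array divider: a subtraction followed by a choice between keeping R or taking the difference D.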

Chapter 5 <36>

Digital Design and Computer Architecture, 2nd Edition
Chapter 7
David Money Harris and Sarah L. Harris

Chapter 5 <37>

Chapter 7 :: Topics

•  Introduction
•  Performance Analysis
•  Single-Cycle Processor
•  Multicycle Processor
•  Pipelined Processor
•  Exceptions
•  Advanced Microarchitecture

Chapter 5 <38>

Introduction

•  Microarchitecture: how to implement an architecture in hardware
•  Processor:
   - Datapath: functional blocks
   - Control: control signals

Levels of abstraction, lowest to highest, with typical elements:

Physics                 electrons
Devices                 transistors, diodes
Analog Circuits         amplifiers, filters
Digital Circuits        AND gates, NOT gates
Logic                   adders, memories
Micro-architecture      datapaths, controllers
Architecture            instructions, registers
Operating Systems       device drivers
Application Software    programs

Chapter 5 <39>

Microarchitecture

•  Multiple implementations for a single architecture:
   - Single-cycle: Each instruction executes in a single cycle
   - Multicycle: Each instruction is broken into a series of shorter steps
   - Pipelined: Each instruction is broken up into a series of steps, and multiple instructions execute at once

RESUME

Chapter 5 <40>

Processor Performance

•  Program execution time:
   Execution Time = (# instructions)(cycles/instruction)(seconds/cycle)
•  Definitions:
   - CPI: cycles/instruction
   - Clock period: seconds/cycle
   - IPC: instructions/cycle = 1/CPI
•  Challenge is to satisfy constraints of:
   - Cost
   - Power
   - Performance
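The execution-time product can be evaluated with a few lines of Python. The numbers in the example call are hypothetical and chosen only to show the units; they do not come from the slides.

```python
def execution_time(n_instructions: float, cpi: float, clock_period_s: float) -> float:
    """Execution time = (# instructions) * (cycles/instruction) * (seconds/cycle)."""
    return n_instructions * cpi * clock_period_s


def ipc(cpi: float) -> float:
    """Instructions per cycle is the reciprocal of cycles per instruction."""
    return 1.0 / cpi


if __name__ == "__main__":
    # Hypothetical program: 1e9 instructions, CPI = 1.5, 2 ns clock period.
    print(execution_time(1e9, 1.5, 2e-9), "seconds")
    print(ipc(1.5), "instructions per cycle")
```

Because the three factors multiply, halving any one of them halves the execution time, provided the change does not increase the others.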

Chapter 5 <41>

MIPS Processor

•  Consider a subset of MIPS instructions:
   - R-type instructions: and, or, add, sub, slt
   - Memory instructions: lw, sw
   - Branch instructions: beq

Chapter 5 <42>

Architectural State

•  Determines everything about a processor:
   - PC
   - 32 registers (and a few extra ones we'll ignore here)
   - Memory

Chapter 5 <43>

MIPS State Elements

[Figure: the four state elements. PC: 32-bit register with input PC' and output PC, clocked by CLK. Instruction memory: 32-bit address input A and 32-bit read data output RD. Register file: 5-bit address inputs A1, A2, A3, 32-bit read data outputs RD1 and RD2, 32-bit write data input WD3, and write enable WE3, clocked by CLK. Data memory: 32-bit address input A, 32-bit read data output RD, 32-bit write data input WD, and write enable WE, clocked by CLK.]

Chapter 5 <44>

Single-Cycle MIPS Processor

•  Datapath
•  Control

Chapter 5 <45>

Single-Cycle Datapath: lw fetch

STEP 1: Fetch instruction

[Figure: PC drives the address input A of the instruction memory, whose read data output RD is the instruction Instr.]

Chapter 5 <46>

Single-Cycle Datapath: lw Register Read

STEP 2: Read source operands from RF

[Figure: Instr 25:21 drives address input A1 of the register file.]

Chapter 5 <47>

Single-Cycle Datapath: lw Immediate

STEP 3: Sign-extend the immediate

[Figure: Instr 15:0 passes through the Sign Extend unit to produce SignImm.]

Chapter 5 <48>

Single-Cycle Datapath: lw address

STEP 4: Compute the memory address

[Figure: the ALU takes SrcA and SrcB with ALUControl2:0 = 010 (add) and produces ALUResult and Zero.]

Chapter 5 <49>

Single-Cycle Datapath: lw Memory Read

•  STEP 5: Read data from memory and write it back to register file

[Figure: ALUResult addresses the data memory; ReadData returns to the register file write data input WD3, Instr 20:16 drives write address A3, and RegWrite = 1 enables the write. ALUControl2:0 = 010.]

Chapter 5 <50>

Single-Cycle Datapath: lw PC Increment

STEP 6: Determine address of next instruction

[Figure: an adder computes PCPlus4 = PC + 4, which becomes PC'. RegWrite = 1 and ALUControl2:0 = 010, as in the previous step.]

RESUME

Chapter 5 <51>

Single-Cycle Datapath: sw

Write data in rt to memory

[Figure: Instr 20:16 drives register file address input A2; RD2 supplies WriteData to the data memory write data input WD. Control values: RegWrite = 0, ALUControl2:0 = 010, MemWrite = 1.]

Chapter 5 <52>

Single-Cycle Datapath: R-Type

•  Read from rs and rt
•  Write ALUResult to register file
•  Write to rd (instead of rt)

[Figure: datapath extended with three multiplexers. ALUSrc selects SrcB, RegDst selects WriteReg4:0 between Instr 20:16 and Instr 15:11, and MemtoReg selects Result between ALUResult and ReadData. Control values for R-type: RegWrite = 1, RegDst = 1, ALUSrc = 0, ALUControl2:0 varies, MemWrite = 0, MemtoReg = 0.]

Chapter 5 <53>

Single-Cycle Datapath: beq

•  Determine whether values in rs and rt are equal
•  Calculate branch target address:
   BTA = (sign-extended immediate << 2) + (PC + 4)

[Figure: SignImm shifted left by 2 is added to PCPlus4 to form PCBranch; PCSrc, derived from Branch and the ALU Zero output, selects PC' between PCPlus4 and PCBranch. Control values for beq: RegWrite = 0, RegDst = x, ALUSrc = 0, Branch = 1, MemWrite = 0, MemtoReg = x, ALUControl2:0 = 110.]
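The branch target calculation can be illustrated in Python. The sketch below assumes 32-bit addresses that wrap around, and the helper names `sign_extend16` and `branch_target` are hypothetical.

```python
def sign_extend16(imm: int) -> int:
    """Interpret a 16-bit immediate field as a signed two's complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def branch_target(pc: int, imm16: int) -> int:
    """BTA = (sign-extended immediate << 2) + (PC + 4), truncated to 32 bits."""
    return ((sign_extend16(imm16) << 2) + (pc + 4)) & 0xFFFFFFFF


if __name__ == "__main__":
    print(hex(branch_target(0x1000, 3)))        # forward branch
    print(hex(branch_target(0x1000, 0xFFFF)))   # immediate of -1
```

An immediate of -1 yields the address of the branch instruction itself, because the offset is counted in words relative to PC + 4.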

Chapter 5 <54>

Single-Cycle Processor

[Figure: complete single-cycle datapath with control unit. The control unit takes Op (Instr 31:26) and Funct (Instr 5:0) and produces MemtoReg, MemWrite, Branch, ALUControl2:0, ALUSrc, RegDst, and RegWrite.]

Chapter 5 <55>

Single-Cycle Control

[Figure: control unit composed of a main decoder and an ALU decoder. The main decoder takes Opcode5:0 and produces MemtoReg, MemWrite, Branch, ALUSrc, RegDst, RegWrite, and ALUOp1:0. The ALU decoder takes ALUOp1:0 and Funct5:0 and produces ALUControl2:0.]

Chapter 5 <56>

Review: ALU

F2:0   Function
000    A & B
001    A | B
010    A + B
011    not used
100    A & ~B
101    A | ~B
110    A - B
111    SLT

[Figure: ALU symbol with N-bit inputs A and B, 3-bit control input F, and N-bit output Y]

Chapter 5 <57>

Review: ALU

[Figure: ALU implementation, repeated from the ALU Design slide]

Chapter 5 <58>

Control Unit: ALU Decoder

ALUOp1:0   Meaning
00         Add
01         Subtract
10         Look at Funct
11         Not Used

ALUOp1:0   Funct          ALUControl2:0
00         X              010 (Add)
X1         X              110 (Subtract)
1X         100000 (add)   010 (Add)
1X         100010 (sub)   110 (Subtract)
1X         100100 (and)   000 (And)
1X         100101 (or)    001 (Or)
1X         101010 (slt)   111 (SLT)
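The ALU decoder truth table can be expressed as a small lookup. The Python sketch below checks the rows in the order they appear in the table; the names `FUNCT_TO_ALUCONTROL` and `alu_decoder` are illustrative assumptions.

```python
# Funct field -> ALUControl2:0 for the R-type instructions in the table.
FUNCT_TO_ALUCONTROL = {
    0b100000: 0b010,  # add
    0b100010: 0b110,  # sub
    0b100100: 0b000,  # and
    0b100101: 0b001,  # or
    0b101010: 0b111,  # slt
}


def alu_decoder(alu_op: int, funct: int) -> int:
    """Return ALUControl2:0 for the given ALUOp1:0 and Funct5:0."""
    if alu_op == 0b00:
        return 0b010                      # add
    if alu_op & 0b01:
        return 0b110                      # X1: subtract
    return FUNCT_TO_ALUCONTROL[funct]     # 1X: look at Funct


if __name__ == "__main__":
    print(format(alu_decoder(0b10, 0b101010), "03b"))   # slt
```

An unsupported Funct value raises `KeyError` in this sketch; a hardware decoder would instead produce some don't-care output.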

Chapter 5 <60>

Control Unit: Main Decoder

Instruction  Op5:0   RegWrite  RegDst  AluSrc  Branch  MemWrite  MemtoReg  ALUOp1:0
R-type       000000  1         1       0       0       0         0         10
lw           100011  1         0       1       0       0         1         00
sw           101011  0         X       1       0       1         X         00
beq          000100  0         X       0       1       0         X         01

Chapter 5 <61>

Single-Cycle Datapath: or

[Figure: complete single-cycle datapath annotated with the control values for or: RegWrite = 1, RegDst = 1, ALUSrc = 0, Branch = 0, MemWrite = 0, MemtoReg = 0, ALUControl2:0 = 001, PCSrc = 0.]

Chapter 5 <62>

Extended Functionality: addi

No change to datapath

[Figure: complete single-cycle datapath with control unit, unchanged]

Chapter 5 <64>

Control Unit: addi

Instruction  Op5:0   RegWrite  RegDst  AluSrc  Branch  MemWrite  MemtoReg  ALUOp1:0
R-type       000000  1         1       0       0       0         0         10
lw           100011  1         0       1       0       0         1         00
sw           101011  0         X       1       0       1         X         00
beq          000100  0         X       0       1       0         X         01
addi         001000  1         0       1       0       0         0         00

Chapter 5 <65>

Extended Functionality: j

[Figure: single-cycle datapath extended for j. Instr 25:0 shifted left by 2 forms bits 27:0 of PCJump, bits 31:28 are taken from PCPlus4, and a multiplexer controlled by the new Jump signal selects PCJump as PC'.]
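The jump address is assembled from two bit fields, which the following Python sketch illustrates. It assumes the upper four bits come from PC + 4, and the function name `jump_target` and the example encoding are hypothetical.

```python
def jump_target(pc: int, instr: int) -> int:
    """PCJump = {(PC + 4)[31:28], Instr[25:0] << 2}."""
    upper = (pc + 4) & 0xF0000000          # bits 31:28
    lower = (instr & 0x03FFFFFF) << 2      # bits 27:0
    return upper | lower


if __name__ == "__main__":
    # Hypothetical j instruction word with address field 0x0100004.
    print(hex(jump_target(0x00400000, 0x08100004)))
```

Shifting the 26-bit field left by 2 gives a word-aligned 28-bit address, so a jump can reach any instruction within the same 256 MB region as PC + 4.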

Chapter 5 <67>

Control Unit: Main Decoder

Instruction  Op5:0   RegWrite  RegDst  AluSrc  Branch  MemWrite  MemtoReg  ALUOp1:0  Jump
R-type       000000  1         1       0       0       0         0         10        0
lw           100011  1         0       1       0       0         1         00        0
sw           101011  0         X       1       0       1         X         00        0
beq          000100  0         X       0       1       0         X         01        0
j            000010  0         X       X       X       0         X         XX        1

Review: Processor Performance

Program Execution Time = (# instructions)(cycles/instruction)(seconds/cycle) = # instructions × CPI × Tc

Single-Cycle Performance

Tc is limited by the critical path (lw).

[Figure: single-cycle datapath with the lw critical path highlighted and the control-signal values for lw annotated.]

Single-Cycle Performance

•  Single-cycle critical path:
   Tc = tpcq_PC + tmem + max(tRFread, tsext + tmux) + tALU + tmem + tmux + tRFsetup
•  Typically, the limiting paths are memory, ALU, and register file:
   Tc = tpcq_PC + 2tmem + tRFread + tmux + tALU + tRFsetup

Single-Cycle Performance Example

Element              Parameter  Delay (ps)
Register clock-to-Q  tpcq_PC    30
Register setup       tsetup     20
Multiplexer          tmux       25
ALU                  tALU       200
Memory read          tmem       250
Register file read   tRFread    150
Register file setup  tRFsetup   20

Tc = ?

Single-Cycle Performance Example

Using the same element delays:

Tc = tpcq_PC + 2tmem + tRFread + tmux + tALU + tRFsetup
   = [30 + 2(250) + 150 + 25 + 200 + 20] ps
   = 925 ps
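The arithmetic can be checked with a short script. This is a sketch only: the delays are copied from the delay table for this example, while the dictionary keys, the function name, and the default of 0 for tsext (which the table does not list) are assumptions.

```python
# Element delays in picoseconds, from the delay table for this example.
delay = {"tpcq_PC": 30, "tsetup": 20, "tmux": 25, "tALU": 200,
         "tmem": 250, "tRFread": 150, "tRFsetup": 20}

def single_cycle_tc(d: dict, tsext: int = 0) -> int:
    """Single-cycle critical path (lw). tsext is not in the table, so it
    defaults to 0 here; the register file read dominates that max() anyway."""
    return (d["tpcq_PC"] + d["tmem"] + max(d["tRFread"], tsext + d["tmux"])
            + d["tALU"] + d["tmem"] + d["tmux"] + d["tRFsetup"])

print(single_cycle_tc(delay))  # 925
```

Memory appears twice in the sum because lw reads both the instruction memory and the data memory within the same cycle.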

Single-Cycle Performance Example

Program with 100 billion instructions:

Execution Time = # instructions × CPI × Tc
               = (100 × 10^9)(1)(925 × 10^-12 s)
               = 92.5 seconds
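As a quick check of this result, the execution-time formula can be written as a one-line function. The function name and the choice of picoseconds for the cycle time are assumptions made for this sketch.

```python
def execution_time_s(instructions: float, cpi: float, tc_ps: float) -> float:
    """Execution time in seconds = # instructions x CPI x Tc (Tc given in ps)."""
    return instructions * cpi * tc_ps * 1e-12

t_single = execution_time_s(100e9, 1, 925)
print(f"{t_single:.1f} s")
```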

Multicycle MIPS Processor

•  Single-cycle:
   + simple
   - cycle time limited by longest instruction (lw)
   - 2 adders/ALUs & 2 memories
•  Multicycle:
   + higher clock speed
   + simpler instructions run faster
   + reuse expensive hardware on multiple cycles
   - sequencing overhead paid many times
•  Same design steps: datapath & control

Multicycle State Elements

•  Replace the instruction and data memories with a single unified memory, which is more realistic

[Figure: multicycle state elements, all clocked by CLK: PC (PC', PC, EN), unified Instr / Data Memory (A, RD, WD, WE), and Register File (A1, A2, A3, WD3, WE3, RD1, RD2).]

Multicycle Datapath: Instruction Fetch

STEP 1: Fetch instruction

[Figure: PC drives memory address A; the word read on RD is stored in the Instr register, enabled by IRWrite.]

Multicycle Datapath: lw Register Read

STEP 2a: Read source operands from RF

[Figure: Instr 25:21 drives register file address A1; RD1 is latched into register A.]

Multicycle Datapath: lw Immediate

STEP 2b: Sign-extend the immediate

[Figure: Instr 15:0 passes through Sign Extend to produce SignImm.]

Multicycle Datapath: lw Address

STEP 3: Compute the memory address

[Figure: the ALU, controlled by ALUControl2:0, adds SrcA (register A) and SrcB (SignImm); ALUResult is latched into ALUOut.]

Multicycle Datapath: lw Memory Read

STEP 4: Read data from memory

[Figure: a multiplexer controlled by IorD selects PC or ALUOut as the memory address Adr; the word read is latched into the Data register.]

Multicycle Datapath: lw Write Register

STEP 5: Write data back to register file

[Figure: Data drives WD3, Instr 20:16 drives A3, and RegWrite enables the register file write.]

Multicycle Datapath: Increment PC

STEP 6: Increment PC

[Figure: ALUSrcA selects PC as SrcA and ALUSrcB1:0 selects the constant 4 as SrcB; ALUResult is written to PC when PCWrite is asserted.]
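The six lw steps can be traced in software. The sketch below is a behavioral toy model, not the hardware: the dictionary-based memory, the helper names, and the sample program are all assumptions made for illustration.

```python
def sign_extend16(value: int) -> int:
    """Interpret a 16-bit field as a signed two's-complement number."""
    return value - 0x10000 if value & 0x8000 else value

def run_lw(pc: int, memory: dict, regs: list) -> int:
    """Walk one lw through the six slide steps; returns the updated PC."""
    instr = memory[pc]                        # Step 1: fetch instruction
    rs = (instr >> 21) & 0x1F                 # Instr 25:21
    rt = (instr >> 16) & 0x1F                 # Instr 20:16
    a = regs[rs]                              # Step 2a: read base register from RF
    sign_imm = sign_extend16(instr & 0xFFFF)  # Step 2b: sign-extend Instr 15:0
    alu_out = (a + sign_imm) & 0xFFFFFFFF     # Step 3: compute memory address
    data = memory[alu_out]                    # Step 4: read data from memory
    regs[rt] = data                           # Step 5: write data back to rt
    return pc + 4                             # Step 6: increment PC

# Hypothetical program: lw $s2, 40($0) at address 0, with 1234 stored at address 40.
LW_S2_40_ZERO = (0b100011 << 26) | (0 << 21) | (18 << 16) | 40
memory = {0: LW_S2_40_ZERO, 40: 1234}
regs = [0] * 32
next_pc = run_lw(0, memory, regs)
print(next_pc, regs[18])
```

Each commented step corresponds to one datapath slide; in hardware the intermediate values live in the Instr, A, ALUOut, and Data registers between cycles.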

Multicycle Datapath: sw

Write data in rt to memory

[Figure: Instr 20:16 drives A2; RD2 is latched into register B and drives memory WD; MemWrite enables the write.]

Multicycle Datapath: R-Type

•  Read from rs and rt

•  Write ALUResult to register file

•  Write to rd (instead of rt)

[Figure: register B feeds SrcB; RegDst selects Instr 20:16 or Instr 15:11 as A3; MemtoReg selects ALUOut or Data as WD3.]

Multicycle Datapath: beq

•  rs == rt?

•  BTA = (sign-extended immediate << 2) + (PC + 4)

[Figure: SignImm << 2 becomes the fourth ALUSrcB input; PCSrc selects ALUResult or ALUOut as PC'; PCEn is asserted by PCWrite, or by Branch when Zero is set.]
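The branch target address formula can be checked numerically. This is a minimal sketch; the function names and the sample PC and immediate values are hypothetical.

```python
def sign_extend16(value: int) -> int:
    """Interpret a 16-bit field as a signed two's-complement number."""
    return value - 0x10000 if value & 0x8000 else value

def branch_target(pc: int, imm16: int) -> int:
    """BTA = (sign-extended immediate << 2) + (PC + 4), wrapped to 32 bits."""
    return ((sign_extend16(imm16) << 2) + (pc + 4)) & 0xFFFFFFFF

print(hex(branch_target(0x1000, 3)))       # forward branch
print(hex(branch_target(0x1000, 0xFFFF)))  # immediate of -1 branches back to itself
```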

Multicycle Processor

[Figure: complete multicycle datapath with control unit. The control unit takes Op (Instr 31:26) and Funct (Instr 5:0) and drives IorD, MemWrite, IRWrite, RegDst, MemtoReg, RegWrite, ALUSrcA, ALUSrcB1:0, ALUControl2:0, PCSrc, Branch, and PCWrite.]

Multicycle Control

Control Unit:
•  Main Controller (FSM): input Opcode5:0
   – Multiplexer selects: MemtoReg, RegDst, IorD, PCSrc, ALUSrcB1:0, ALUSrcA
   – Register enables: IRWrite, MemWrite, PCWrite, Branch, RegWrite
   – ALUOp1:0, passed to the ALU Decoder
•  ALU Decoder: inputs ALUOp1:0 and Funct5:0, output ALUControl2:0

Main Controller FSM: Fetch

S0: Fetch (entered on Reset)

[Figure: multicycle datapath annotated with the control-signal values for the fetch state.]

Main Controller FSM: Fetch

S0: Fetch (entered on Reset): IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite

[Figure: multicycle datapath annotated with the control-signal values for the fetch state.]

Main Controller FSM: Decode

S0: Fetch (entered on Reset): IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite
S1: Decode

[Figure: multicycle datapath annotated with the control-signal values for the decode state; most select signals are don't-cares (X) and the enables are 0.]

Main Controller FSM: Address

S0: Fetch (entered on Reset): IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite
S1: Decode
S2: MemAdr (from S1 when Op = LW or Op = SW)

[Figure: multicycle datapath annotated with the control-signal values for the address-computation state.]

Main Controller FSM: Address

S0: Fetch (entered on Reset): IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite
S1: Decode
S2: MemAdr (from S1 when Op = LW or Op = SW): ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00

[Figure: multicycle datapath annotated with the control-signal values for the address-computation state.]

Main Controller FSM: lw

S0: Fetch (entered on Reset): IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite
S1: Decode
S2: MemAdr (from S1 when Op = LW or Op = SW): ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
S3: MemRead (from S2 when Op = LW): IorD = 1
S4: MemWriteback: RegDst = 0, MemtoReg = 1, RegWrite

Main Controller FSM: sw

States S0 to S4 are the same as for lw, plus:
S5: MemWrite (from S2 when Op = SW): IorD = 1, MemWrite

Main Controller FSM: R-Type

States S0 to S5 are the same as for lw and sw, plus:
S6: Execute (from S1 when Op = R-type): ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
S7: ALUWriteback: RegDst = 1, MemtoReg = 0, RegWrite

Main Controller FSM: beq

States S0 and S2 to S7 are unchanged, with these additions:
S1: Decode now drives ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00
S8: Branch (from S1 when Op = BEQ): ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCSrc = 1, Branch

Multicycle Controller FSM

S0: Fetch (entered on Reset): IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite
S1: Decode: ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00
S2: MemAdr (from S1 when Op = LW or Op = SW): ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
S3: MemRead (from S2 when Op = LW): IorD = 1
S4: MemWriteback: RegDst = 0, MemtoReg = 1, RegWrite
S5: MemWrite (from S2 when Op = SW): IorD = 1, MemWrite
S6: Execute (from S1 when Op = R-type): ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
S7: ALUWriteback: RegDst = 1, MemtoReg = 0, RegWrite
S8: Branch (from S1 when Op = BEQ): ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCSrc = 1, Branch

Extended Functionality: addi

States S0 to S8 are unchanged. Two states are added, reached from S1 when Op = ADDI, with their outputs still to be filled in:
S9: ADDI Execute
S10: ADDI Writeback

Main Controller FSM: addi

States S0 to S8 are unchanged, plus:
S9: ADDI Execute (from S1 when Op = ADDI): ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
S10: ADDI Writeback: RegDst = 0, MemtoReg = 0, RegWrite

Extended Functionality: j

[Figure: multicycle datapath extended for j. Instr 25:0 (jump address) is shifted left by 2 to form bits 27:0 and joined with bits 31:28 to give PCJump; the PC' multiplexer, now selected by PCSrc1:0 (00, 01, 10), takes PCJump as its third input.]

Main Controller FSM: j

PCSrc widens to two bits: S0 drives PCSrc = 00 and S8 drives PCSrc = 01. States S1 to S7, S9, and S10 are unchanged. One state is added, reached from S1 when Op = J, with its outputs still to be filled in:
S11: Jump

Main Controller FSM: j

States S0 to S10 are unchanged apart from the two-bit PCSrc (00 in S0, 01 in S8), plus:
S11: Jump (from S1 when Op = J): PCSrc = 10, PCWrite
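The state transitions of the finished controller can be written as a small next-state function, which also gives the number of cycles each instruction takes. This is a sketch: the function names and lowercase opcode labels are illustrative, and returning to S0 after an instruction's last state is an assumption that the extracted slide text does not spell out.

```python
def next_state(state: str, op: str) -> str:
    """Next state of the main controller FSM for the instruction class `op`."""
    if state == "S0":                       # Fetch always goes to Decode
        return "S1"
    if state == "S1":                       # Decode dispatches on the opcode
        return {"lw": "S2", "sw": "S2", "rtype": "S6",
                "beq": "S8", "addi": "S9", "j": "S11"}[op]
    if state == "S2":                       # MemAdr: lw reads, sw writes
        return "S3" if op == "lw" else "S5"
    # S3 -> S4, S6 -> S7, S9 -> S10; every other state is assumed to return to Fetch.
    return {"S3": "S4", "S6": "S7", "S9": "S10"}.get(state, "S0")

def cycles(op: str) -> int:
    """Count the states visited before the FSM is back in S0."""
    state, count = "S0", 0
    while True:
        count += 1
        state = next_state(state, op)
        if state == "S0":
            return count

for op in ("lw", "sw", "rtype", "beq", "addi", "j"):
    print(op, cycles(op))
```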

Multicycle Processor Performance

•  Instructions take different numbers of cycles:
   - 3 cycles: beq, j
   - 4 cycles: R-type, sw, addi
   - 5 cycles: lw

•  CPI is a weighted average

•  SPECINT2000 benchmark:
   - 25% loads
   - 10% stores
   - 11% branches
   - 2% jumps
   - 52% R-type

Average CPI = (0.11 + 0.02)(3) + (0.52 + 0.10)(4) + (0.25)(5) = 4.12
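The weighted average can be reproduced directly from the instruction mix. In this sketch the mix is held as integer percentages to avoid floating-point noise; the variable names are assumptions.

```python
# Instruction mix in percent (SPECINT2000 figures from the slide) and cycle counts.
mix_percent = {"lw": 25, "sw": 10, "beq": 11, "j": 2, "rtype": 52}
cycles_per_instr = {"lw": 5, "sw": 4, "beq": 3, "j": 3, "rtype": 4}

avg_cpi = sum(mix_percent[k] * cycles_per_instr[k] for k in mix_percent) / 100
print(avg_cpi)  # 4.12
```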

Multicycle Processor Performance

Multicycle critical path:

Tc = tpcq + tmux + max(tALU + tmux, tmem) + tsetup

[Figure: complete multicycle datapath with control unit.]

Multicycle Performance Example

Element              Parameter  Delay (ps)
Register clock-to-Q  tpcq_PC    30
Register setup       tsetup     20
Multiplexer          tmux       25
ALU                  tALU       200
Memory read          tmem       250
Register file read   tRFread    150
Register file setup  tRFsetup   20

Tc = ?

Multicycle Performance Example

Using the same element delays:

Tc = tpcq_PC + tmux + max(tALU + tmux, tmem) + tsetup
   = tpcq_PC + tmux + tmem + tsetup
   = [30 + 25 + 250 + 20] ps
   = 325 ps

Multicycle Performance Example

Program with 100 billion instructions:

Execution Time = (# instructions) × CPI × Tc
               = (100 × 10^9)(4.12)(325 × 10^-12 s)
               = 133.9 seconds

This is slower than the single-cycle processor (92.5 seconds). Why?
- Not all steps are the same length
- Sequencing overhead for each step (tpcq + tsetup = 50 ps)
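The multicycle numbers and the comparison with the single-cycle design can be reproduced as follows. This is a sketch using the delays, CPI, and single-cycle Tc given in these examples; the variable names are assumptions.

```python
# Element delays in picoseconds, from the delay table for this example.
delay = {"tpcq_PC": 30, "tsetup": 20, "tmux": 25, "tALU": 200, "tmem": 250}

# Multicycle critical path: the slower of (ALU + mux) and a memory access.
tc_multi = (delay["tpcq_PC"] + delay["tmux"]
            + max(delay["tALU"] + delay["tmux"], delay["tmem"])
            + delay["tsetup"])

instructions = 100e9
t_multi = instructions * 4.12 * tc_multi * 1e-12   # CPI = 4.12 from the mix
t_single = instructions * 1 * 925 * 1e-12          # single-cycle: CPI = 1, Tc = 925 ps

print(tc_multi, round(t_multi, 1), round(t_single, 1))
print("multicycle / single-cycle:", round(t_multi / t_single, 2))
```

The cycle time drops by less than a factor of three while the CPI rises by more than a factor of four, which is why the multicycle design loses here.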

Review: Single-Cycle Processor

[Figure: complete single-cycle datapath with control unit, including the Jump multiplexer and PCJump path.]

Review: Multicycle Processor

[Figure: complete multicycle datapath with control unit, including the PCJump path formed from Instr 25:0 (Addr) shifted left by 2 and bits 31:28.]

Pipelined MIPS Processor

•  Temporal parallelism

•  Divide single-cycle processor into 5 stages:
   - Fetch
   - Decode
   - Execute
   - Memory
   - Writeback

•  Add pipeline registers between stages

Single-Cycle vs. Pipelined

[Figure: timing diagram, 0 to 1900 ps, comparing single-cycle and pipelined execution. Each instruction passes through Fetch Instruction, Decode / Read Reg, Execute ALU, Memory Read / Write, and Write Reg. The single-cycle processor runs instructions 1 and 2 back to back; the pipelined processor overlaps instructions 1, 2, and 3.]

Pipelined Processor Abstraction

   lw  $s2, 40($0)
   add $s3, $t1, $t2
   sub $s4, $s1, $s5
   and $s5, $t5, $t6
   sw  $s6, 20($s1)
   or  $s7, $t3, $t4

[Figure: pipeline diagram over cycles 1 to 10. Each instruction starts one cycle after the previous one and passes through IM, RF (read), ALU, DM, and RF (write).]
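The staggered schedule in this diagram can be generated programmatically. The sketch below assumes an ideal pipeline with one instruction issued per cycle and no stalls; the stage names follow the five stages listed earlier, and `schedule` is a hypothetical helper.

```python
STAGES = ("Fetch", "Decode", "Execute", "Memory", "Writeback")
PROGRAM = ["lw $s2, 40($0)", "add $s3, $t1, $t2", "sub $s4, $s1, $s5",
           "and $s5, $t5, $t6", "sw $s6, 20($s1)", "or $s7, $t3, $t4"]

def schedule(program):
    """For each instruction, map stage name -> cycle number (1-based),
    assuming one instruction enters the pipeline per cycle with no stalls."""
    return [{stage: i + s + 1 for s, stage in enumerate(STAGES)}
            for i, _ in enumerate(program)]

table = schedule(PROGRAM)
total_cycles = table[-1]["Writeback"]
for text, row in zip(PROGRAM, table):
    print(f"{text:20s} {row}")
print("total cycles:", total_cycles)  # 10
```

With n instructions and 5 stages the ideal total is n + 4 cycles, which matches the 10 cycles on the time axis for these six instructions.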

Single-Cycle & Pipelined Datapath

[Figure: the single-cycle datapath shown alongside the pipelined datapath, which adds pipeline registers between the Fetch, Decode, Execute, Memory, and Writeback stages. Pipelined signal names carry a stage suffix, for example PCF, InstrD, SrcAE, ALUOutM, and ResultW.]

Corrected Pipelined Datapath

WriteReg must arrive at the same time as Result.

[Figure: pipelined datapath in which WriteRegE4:0 is carried through the Memory and Writeback stages as WriteRegM4:0 and WriteRegW4:0.]

Pipelined Processor Control

•  Same control unit as single-cycle processor

•  Control delayed to proper pipeline stage

[Figure: pipelined datapath with control unit. Control signals are generated in Decode (RegWriteD, MemtoRegD, MemWriteD, BranchD, ALUControlD, ALUSrcD, RegDstD) and carried through the pipeline registers to the stage that uses them: RegDstE, ALUSrcE, and ALUControlE2:0 in Execute; MemWriteM and BranchM in Memory; RegWriteW and MemtoRegW in Writeback. PCSrcM is formed from BranchM and ZeroM.]

Pipeline Hazards

•  A hazard occurs when an instruction depends on the result of an instruction that hasn't completed

•  Types:
   – Data hazard: register value not yet written back to register file
   – Control hazard: next instruction not decided yet (caused by branches)

Data Hazard

   add $s0, $s2, $s3
   and $t0, $s0, $s1
   or  $t1, $s4, $s0
   sub $t2, $s0, $s5

[Figure: pipeline diagram over cycles 1 to 8. add writes $s0 to the register file in cycle 5, but and and or read $s0 from the register file in cycles 3 and 4; sub reads it in cycle 5.]
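The dependences in this sequence can be found mechanically. The following is a rough sketch that assumes the register file can be written and read in the same cycle, so only the two instructions immediately after a producer are at risk; the tuple encoding and the function name are illustrative.

```python
# Each instruction is (mnemonic, destination, source1, source2).
PROGRAM = [
    ("add", "$s0", "$s2", "$s3"),
    ("and", "$t0", "$s0", "$s1"),
    ("or",  "$t1", "$s4", "$s0"),
    ("sub", "$t2", "$s0", "$s5"),
]

def raw_hazards(program, window=2):
    """Return (producer, consumer) index pairs where the consumer reads a
    register before the producer has written it back.
    window=2 is an assumption: write-then-read within one cycle is allowed."""
    hazards = []
    for i, instr in enumerate(program):
        dest = instr[1]
        for j in range(i + 1, min(i + 1 + window, len(program))):
            if dest in program[j][2:]:
                hazards.append((i, j))
    return hazards

print(raw_hazards(PROGRAM))  # [(0, 1), (0, 2)]
```

Under that assumption, and and or are affected while sub is not.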

Handling Data Hazards

•  Insert nops in code at compile time
•  Rearrange code at compile time
•  Forward data at run time
•  Stall the processor at run time

Compile-Time Hazard Elimination

•  Insert enough nops for result to be ready
•  Or move independent useful instructions forward

    add $s0, $s2, $s3
    nop
    nop
    and $t0, $s0, $s1
    or  $t1, $s4, $s0
    sub $t2, $s0, $s5

[Figure: pipeline timing diagram, cycles 1 to 10, for the sequence above with two nops inserted after add.]
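The nop-insertion idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tuple encoding of instructions, the function name `insert_nops`, and the `window` parameter are hypothetical, and `window=2` assumes that an instruction is safe once two other slots separate it from the producer, which matches the two nops shown on the slide.

```python
# Hypothetical encoding: (destination register, source registers, assembly text).
PROGRAM = [
    ("$s0", ("$s2", "$s3"), "add $s0, $s2, $s3"),
    ("$t0", ("$s0", "$s1"), "and $t0, $s0, $s1"),
    ("$t1", ("$s4", "$s0"), "or $t1, $s4, $s0"),
    ("$t2", ("$s0", "$s5"), "sub $t2, $s0, $s5"),
]


def insert_nops(program, window=2):
    """Pad with nops so that no instruction reads a register written by
    one of the `window` slots immediately before it (window >= 1)."""
    slots = []  # emitted instructions; None stands for a nop
    for instr in program:
        _dest, srcs, _text = instr
        # Keep adding nops while a recent producer writes one of our sources.
        while any(
            prev is not None and prev[0] != "$0" and prev[0] in srcs
            for prev in slots[-window:]
        ):
            slots.append(None)
        slots.append(instr)
    return [s[2] if s is not None else "nop" for s in slots]


for line in insert_nops(PROGRAM):
    print(line)
```

For the four-instruction sequence on the slide, the sketch places two nops between add and and; or and sub then need no extra padding because they are already far enough from add.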

Data Forwarding

    add $s0, $s2, $s3
    and $t0, $s0, $s1
    or  $t1, $s4, $s0
    sub $t2, $s0, $s5

[Figure: pipeline timing diagram, cycles 1 to 8, for the sequence above with data forwarding.]

Data Forwarding

[Figure: pipelined datapath with Hazard Unit. Hazard Unit inputs: RsE, RtE, WriteRegM, WriteRegW, RegWriteM, RegWriteW. Outputs: ForwardAE and ForwardBE, which control three-input multiplexers (00, 01, 10) on the ALU operands SrcAE and SrcBE.]

Data Forwarding

•  Forward to Execute stage from either:
   – Memory stage or
   – Writeback stage
•  Forwarding logic for ForwardAE:
     if      ((rsE != 0) AND (rsE == WriteRegM) AND RegWriteM) then ForwardAE = 10
     else if ((rsE != 0) AND (rsE == WriteRegW) AND RegWriteW) then ForwardAE = 01
     else                                                           ForwardAE = 00
•  Forwarding logic for ForwardBE is the same, but replace rsE with rtE
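The ForwardAE rule above translates almost directly into code. The following Python sketch is a behavioral model only, not hardware description; the function name `forward_select` and the snake_case parameter names are illustrative stand-ins for the slide's signals.

```python
def forward_select(rs_e, write_reg_m, reg_write_m, write_reg_w, reg_write_w):
    """Return the 2-bit select for one ALU operand.

    0b10: forward from the Memory stage
    0b01: forward from the Writeback stage
    0b00: use the value read from the register file
    """
    # The Memory stage holds the newer result, so it is checked first.
    if rs_e != 0 and rs_e == write_reg_m and reg_write_m:
        return 0b10
    if rs_e != 0 and rs_e == write_reg_w and reg_write_w:
        return 0b01
    return 0b00


# ForwardBE uses the same rule with rtE in place of rsE.
forward_ae = forward_select(16, 16, True, 8, True)   # $s0 is register 16
forward_be = forward_select(17, 16, True, 8, True)   # $s1 is register 17
print(format(forward_ae, "02b"), format(forward_be, "02b"))
```

Checking the Memory stage before the Writeback stage matters when both stages are about to write the same register: the Memory-stage value is the more recent one.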

Stalling

    lw  $s0, 40($0)
    and $t0, $s0, $s1
    or  $t1, $s4, $s0
    sub $t2, $s0, $s5

[Figure: pipeline timing diagram, cycles 1 to 8. The dependence of and on the $s0 value loaded by lw is marked "Trouble!".]

Stalling

    lw  $s0, 40($0)
    and $t0, $s0, $s1
    or  $t1, $s4, $s0
    sub $t2, $s0, $s5

[Figure: pipeline timing diagram, cycles 1 to 9. and repeats its RF (Decode) stage and or repeats its IM (Fetch) stage for one cycle, marked "Stall".]

Stalling Hardware

[Figure: pipelined datapath with Hazard Unit. Added Hazard Unit inputs: RsD, RtD, MemtoRegE. Added outputs: StallF and StallD, which drive the enable (EN) inputs of the Fetch and Decode stage registers, and FlushE, which drives the clear (CLR) input of the Execute stage register.]

Stalling Logic

    lwstall = ((rsD == rtE) OR (rtD == rtE)) AND MemtoRegE
    StallF = StallD = FlushE = lwstall
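As a quick check of the lwstall equation, here is a small Python model. The function name `lw_stall` and the argument names are illustrative; the register numbers in the example follow the standard MIPS numbering ($s0 = 16, $s1 = 17).

```python
def lw_stall(rs_d, rt_d, rt_e, mem_to_reg_e):
    """True when the instruction in Decode reads the register that a
    load currently in Execute is going to write."""
    return bool((rs_d == rt_e or rt_d == rt_e) and mem_to_reg_e)


# lw $s0, 40($0) is in Execute (rtE = 16, MemtoRegE = 1) while
# and $t0, $s0, $s1 is in Decode (rsD = 16, rtD = 17).
stall = lw_stall(rs_d=16, rt_d=17, rt_e=16, mem_to_reg_e=True)
stall_f = stall_d = flush_e = stall
print(stall_f, stall_d, flush_e)
```

The same signal holds the Fetch and Decode stages in place and clears the Execute stage, so the stalled instruction is not executed twice.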

Control Hazards

•  beq:
   – Branch not determined until 4th stage of pipeline
   – Instructions after branch fetched before branch occurs
   – These instructions must be flushed if branch happens
•  Branch misprediction penalty
   – Number of instructions flushed when branch is taken
   – May be reduced by determining branch earlier

Control Hazards: Original Pipeline

[Figure: pipelined datapath with Hazard Unit (StallF, StallD, FlushE, ForwardAE, ForwardBE). The branch decision PCSrcM is formed in the Memory stage from BranchM and ZeroM, and the branch target is PCBranchM.]

Control Hazards

    20  beq $t1, $t2, 40
    24  and $t0, $s0, $s1
    28  or  $t1, $s4, $s0
    2C  sub $t2, $s0, $s5
    30  ...
        ...
    64  slt $t3, $s2, $s3

[Figure: pipeline timing diagram, cycles 1 to 9. The instructions fetched after beq are marked "Flush these instructions"; slt at address 64 follows.]

Early Branch Resolution

Introduced another data hazard in Decode stage

[Figure: pipelined datapath with an equality comparator (=) in the Decode stage. Its output EqualD and BranchD form PCSrcD, the branch target is PCBranchD, and a second clear (CLR) input is added to the pipeline registers.]

Early Branch Resolution

    20  beq $t1, $t2, 40
    24  and $t0, $s0, $s1
    28  or  $t1, $s4, $s0
    2C  sub $t2, $s0, $s5
    30  ...
        ...
    64  slt $t3, $s2, $s3

[Figure: pipeline timing diagram, cycles 1 to 9. Only the and instruction is marked "Flush this instruction"; slt at address 64 follows.]

Handling Data & Control Hazards

[Figure: pipelined datapath with Hazard Unit handling both hazard types. Added Hazard Unit inputs: BranchD and RegWriteE. Added outputs: ForwardAD and ForwardBD, which control two-input multiplexers (0, 1) on the operands of the Decode-stage equality comparator.]

Control Forwarding & Stalling Logic

•  Forwarding logic:
     ForwardAD = (rsD != 0) AND (rsD == WriteRegM) AND RegWriteM
     ForwardBD = (rtD != 0) AND (rtD == WriteRegM) AND RegWriteM
•  Stalling logic:
     branchstall = BranchD AND RegWriteE AND (WriteRegE == rsD OR WriteRegE == rtD)
                   OR
                   BranchD AND MemtoRegM AND (WriteRegM == rsD OR WriteRegM == rtD)
     StallF = StallD = FlushE = lwstall OR branchstall
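The Decode-stage forwarding and branch-stall equations can be modeled the same way. This Python sketch is a behavioral illustration; the function names `forward_d` and `branch_stall` and the names of the two intermediate terms are assumptions, not signals from the slides.

```python
def forward_d(reg_d, write_reg_m, reg_write_m):
    """ForwardAD when reg_d is rsD, ForwardBD when reg_d is rtD."""
    return bool(reg_d != 0 and reg_d == write_reg_m and reg_write_m)


def branch_stall(branch_d, rs_d, rt_d,
                 reg_write_e, write_reg_e,
                 mem_to_reg_m, write_reg_m):
    """True when a branch in Decode needs a register that is not ready yet."""
    # First term: an instruction in Execute will write a branch source.
    producer_in_execute = (
        branch_d and reg_write_e and write_reg_e in (rs_d, rt_d)
    )
    # Second term: a load in Memory will write a branch source.
    load_in_memory = (
        branch_d and mem_to_reg_m and write_reg_m in (rs_d, rt_d)
    )
    return bool(producer_in_execute or load_in_memory)


def stall_signals(lwstall, branchstall):
    """StallF = StallD = FlushE = lwstall OR branchstall."""
    stall = lwstall or branchstall
    return stall, stall, stall
```

For example, `branch_stall(True, 9, 10, True, 9, False, 0)` models a beq on $t1 and $t2 (registers 9 and 10) directly behind an instruction that writes $t1, which requires a stall.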

Branch Prediction

•  Guess whether branch will be taken
   – Backward branches are usually taken (loops)
   – Consider history to improve guess
•  Good prediction reduces fraction of branches requiring a flush
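The "backward branches are usually taken" heuristic can be written as a one-line static predictor. This is a minimal sketch of that heuristic only; the function name `predict_taken` is hypothetical, and history-based prediction is not modeled.

```python
def predict_taken(branch_pc, target_pc):
    """Static guess: a branch whose target is at or before its own
    address is treated as a loop branch and predicted taken."""
    return target_pc <= branch_pc


print(predict_taken(0x40, 0x20))  # backward branch
print(predict_taken(0x20, 0x64))  # forward branch
```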


Pipelined Performance Example

•  SPECINT2000 benchmark:
   - 25% loads
   - 10% stores
   - 11% branches
   - 2% jumps
   - 52% R-type
•  Suppose:
   - 40% of loads used by next instruction
   - 25% of branches mispredicted
   - All jumps flush next instruction
•  What is the average CPI?
   - Load/Branch CPI = 1 when no stalling, 2 when stalling
   - CPIlw = 1(0.6) + 2(0.4) = 1.4
   - CPIbeq = 1(0.75) + 2(0.25) = 1.25
   - Average CPI = (0.25)(1.4) + (0.1)(1) + (0.11)(1.25) + (0.02)(2) + (0.52)(1) = 1.15
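The CPI arithmetic is easy to reproduce. In this Python sketch the dictionary keys and the function name `average_cpi` are illustrative; the percentages are the ones given on the slide.

```python
# Instruction mix from the slide (fractions of all instructions).
MIX = {"load": 0.25, "store": 0.10, "branch": 0.11, "jump": 0.02, "rtype": 0.52}


def average_cpi(mix, load_use=0.40, mispredict=0.25):
    """Weighted CPI, charging one extra cycle per load-use stall,
    per mispredicted branch, and per jump."""
    cpi = {
        "load": 1 * (1 - load_use) + 2 * load_use,        # 1.4
        "store": 1.0,
        "branch": 1 * (1 - mispredict) + 2 * mispredict,  # 1.25
        "jump": 2.0,
        "rtype": 1.0,
    }
    return sum(mix[kind] * cpi[kind] for kind in mix)


print(f"{average_cpi(MIX):.4f}")
```

The unrounded result is 1.1475, which the slide reports as 1.15.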

Pipelined Performance

•  Pipelined processor critical path (one term per stage):

   Tc = max {
          t_pcq + t_mem + t_setup,                                (Fetch)
          2(t_RFread + t_mux + t_eq + t_AND + t_mux + t_setup),   (Decode)
          t_pcq + t_mux + t_mux + t_ALU + t_setup,                (Execute)
          t_pcq + t_memwrite + t_setup,                           (Memory)
          2(t_pcq + t_mux + t_RFwrite)                            (Writeback)
        }

Pipelined Performance Example

   Element                Parameter     Delay (ps)
   Register clock-to-Q    t_pcq_PC      30
   Register setup         t_setup       20
   Multiplexer            t_mux         25
   ALU                    t_ALU         200
   Memory read            t_mem         250
   Register file read     t_RFread      150
   Register file setup    t_RFsetup     20
   Equality comparator    t_eq          40
   AND gate               t_AND         15
   Memory write           t_memwrite    220
   Register file write    t_RFwrite     100

   Tc = 2(t_RFread + t_mux + t_eq + t_AND + t_mux + t_setup)
      = 2[150 + 25 + 40 + 15 + 25 + 20] ps
      = 550 ps
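To see why the Decode-stage term sets the cycle time, all five terms of the critical-path expression can be evaluated with the delays in the table. In this Python sketch the dictionary keys and the stage names attached to each term are illustrative labels (the terms are taken in pipeline order); the numbers are the table's.

```python
# Delays in picoseconds, from the table on this slide.
DELAYS_PS = {
    "pcq": 30, "setup": 20, "mux": 25, "alu": 200, "mem": 250,
    "rf_read": 150, "eq": 40, "and": 15, "mem_write": 220, "rf_write": 100,
}


def stage_delays(d):
    """Evaluate each term of the Tc = max{...} expression."""
    return {
        "fetch": d["pcq"] + d["mem"] + d["setup"],
        "decode": 2 * (d["rf_read"] + d["mux"] + d["eq"]
                       + d["and"] + d["mux"] + d["setup"]),
        "execute": d["pcq"] + d["mux"] + d["mux"] + d["alu"] + d["setup"],
        "memory": d["pcq"] + d["mem_write"] + d["setup"],
        "writeback": 2 * (d["pcq"] + d["mux"] + d["rf_write"]),
    }


terms = stage_delays(DELAYS_PS)
tc_ps = max(terms.values())
print(terms, tc_ps)
```

With these delays the other four terms come to between 270 ps and 310 ps, so the 550 ps Decode term is the limit.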

Pipelined Performance Example

Program with 100 billion instructions

   Execution Time = (# instructions) × CPI × Tc
                  = (100 × 10^9)(1.15)(550 × 10^-12)
                  = 63 seconds
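The execution-time product can be checked in one line of code. The function name `execution_time_s` below is illustrative; the inputs are the slide's values.

```python
def execution_time_s(instructions, cpi, tc_ps):
    """Execution time in seconds; tc_ps is the cycle time in picoseconds."""
    return instructions * cpi * tc_ps * 1e-12


t = execution_time_s(100e9, 1.15, 550)
print(f"{t:.2f} s")
```

The product is 63.25 s, which the slide rounds to 63 seconds.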

Processor Performance Comparison

   Processor       Execution Time (seconds)    Speedup (single-cycle as baseline)
   Single-cycle    92.5                        1
   Multicycle      133                         0.70
   Pipelined       63                          1.47
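The speedup column is each design's execution time divided into the single-cycle time. A short Python check, with the dictionary and function name chosen here for illustration:

```python
# Execution times in seconds, from the comparison table.
TIMES_S = {"Single-cycle": 92.5, "Multicycle": 133.0, "Pipelined": 63.0}


def speedups(times, baseline="Single-cycle"):
    """Speedup of each processor relative to the baseline."""
    return {name: times[baseline] / t for name, t in times.items()}


for name, s in speedups(TIMES_S).items():
    print(f"{name}: {s:.2f}")
```

A speedup below 1, as for the multicycle design, means that design is slower than the baseline.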