13
Pipelined Design ללללל9

Pipelined Design

  • Upload
    lotte

  • View
    25

  • Download
    0

Embed Size (px)

DESCRIPTION

Pipelined Design. תרגול 9. Latency and Throughput. Latency - Total time to perform a single operation from start to end. Throughput – operations / latency ratio or GOPS, giga-operations per second. The clock cycle is always bounded by the slowest operation. Picoseconds - 10^-12 - PowerPoint PPT Presentation

Citation preview

Page 1: Pipelined Design

Pipelined Design

9תרגול

Page 2: Pipelined Design

Latency and Throughput

Latency - Total time to perform a single operation from start to end.

Throughput – operations / latency ratio or GOPS, giga-operations per second.

The clock cycle is always bounded by the slowest operation.

Picoseconds - 10^-12 Nanoseconds - 10^-9

Page 3: Pipelined Design

Latency and Throughput

A B C D E FR

E

G

80ps 30ps 60ps 50ps 70ps 10ps 20ps

How to maximize throughput using an additional register? Put it between C and D

How to maximize throughput using 2 additional registers? AB, CD, EF

How to maximize throughput using 3 additional registers? A, BC, D, EF

What is the cycle time, latency and throughput in that case? 110ps, 440ps, (1/110) * 1000 = 9.09

How to get max throughput with min stages? 5 stages, since we cannot go lower than 100ps for a clock cycle

Page 4: Pipelined Design

Pipeline registers

Select A,Since only jxx and call need valP at further stages (E and M resp.)

Predicted PC

Page 5: Pipelined Design

5

Control Hazards

• PC prediction (Fetch stage) jxx , ret → ? call, jmp → ValC Other → ValP

• Branch prediction strategies• Always-taken• Never-taken• Backward taken, forward not taken

Why are forward branches less common ? How can we deal with stack prediction (ret) ?

Page 6: Pipelined Design

6

Data Hazards

Data dependencies in Processors Result from one instruction is being used as an

operand for another (nop example) Stalling / forwarding / load-use hazard

Page 7: Pipelined Design

Data Dependencies: 2 Nop’s

0x000: irmovl $10,%edx

1 2 3 4 5 6 7 8 9

F D E M WF D E M W

0x006: irmovl $3,%eax F D E M WF D E M W

0x00c: nop F D E M WF D E M W

0x00d: nop F D E M WF D E M W

0x00e: addl %edx,%eax F D E M WF D E M W

0x010: halt F D E M WF D E M W

10# demo-h2.ys

W

R[ %eax  ] 3

D

valA R[ %edx  = ]10

valB R[ %eax  = ]0

•••

W

R[ %eax  ] 3

W

R[ %eax  ]← 3

D

valA R[ %edx  = ]10

valB R[ %eax  = ]0

D

valA R[ %edx [ = 10

valB R[ %eax [ = 0

•••

Cycle 6

Error

Page 8: Pipelined Design

Stalling solution

If instruction follows too closely after one that writes register, slow it down

Hold instruction in decode Dynamically inject nop into execute stage

0x000: irmovl $10,%edx

1 2 3 4 5 6 7 8 9

F D E M W

0x006: irmovl $3,%eax F D E M W

0x00c: nop F D E M W

bubble

F

E M W

0x00e: addl %edx,%eax D D E M W

0x010: halt F D E M W

10

#demo-h2.ys

F

F D E M W0x00d: nop

Page 9: Pipelined Design

Data Forwarding Solution

irmovl in write-back stage

Destination value in W pipeline register

Forward as valB for decode stage

0x000: irmovl $10,%edx

1 2 3 4 5 6 7 8 9

F D E M WF D E M W

0x006: irmovl $3,%eax F D E M WF D E M W

0x00c: nop F D E M WF D E M W

0x00d: nop F D E M WF D E M W

0x00e: addl%edx,%eax F D E M WF D E M W

0x010: halt F D E M WF D E M W

#demo-h2.ys

Cycle 6

W

R[ %eax ] 3

D

valA R[ %edx [ = 10

valB W_valE= 3

•••

W_dstE=  %eaxW_valE = 3

srcA=  %edxsrcB=  %eax

←←

Page 10: Pipelined Design

Limitation of Forwarding

Load-use dependency Value needed by end of

decode stage in cycle 7 Value read from memory in

memory stage of cycle 8

0x000: irmovl $128,%edx

1 2 3 4 5 6 7 8 9

F D E M

W0x006: irmovl $3,%ecx F D E M

W

0x00c: rmmovl %ecx, 0(%edx) F D E M W

0x012: irmovl $10,%ebx F D E M W

0x018: mrmovl 0(%edx),%eax # Load %eax F D E M W

#demo-luh.ys

0x01e: addl%ebx,%eax # Use %eax

0x020: halt

F D E M W

F D E M W

10

F D E M W

11

Error

MM_dstM=  %eaxm_valM M]128[ = 3

Cycle 7 Cycle 8

D

valA M_valE= 10valB R[ %eax  = ]0

D

valA M_valE= 10valB R[ %eax[ = 0

MM_dstE=  %ebxM_valE= 10

•••

←←

Page 11: Pipelined Design

Avoiding Load/Use Hazard

Stall using instruction for one cycle

Can then pick up loaded value by forwarding from

memory stage

0x000: irmovl $128,%edx

1 2 3 4 5 6 7 8 9

F D E M

W

F D E M

W0x006: irmovl $3,%ecx F D E M

W

F D E M

W

0x00c: rmmovl %ecx, 0(%edx) F D E M WF D E M W

0x012: irmovl $10,%ebx F D E M WF D E M W

0x018: mrmovl 0(%edx),%eax# Load %eax F D E M WF D E M W

#demo-luh.ys

0x01e:addl%ebx,%eax# Use %eax

0x020: halt

F D E M W

E M W

10

D D E M W

11

bubble

F D E M W

F

F

MM_dstM = %eax

m_valM0M]128[ = 3

MM_dstM=  %eaxm_valM M]128[ = 3

Cycle 8

D

valA W_valE= 10valB m_valM= 3

D

valA W_valE= 10valB m_valM= 3

WW_dstE = %ebx

W_valE= 10

WW_dstE=  %ebxW_valE= 10

••

←←

Page 12: Pipelined Design

12

Control Logic

Processing “ret”Must stall until instruction reaches write back

Load/use hazardMust stall between read memory and use

Mis-predicted branch removing instructions from the pipe

Page 13: Pipelined Design

• Implementing the forwarding logic

• Note: 5 forwarding sources