UC Regents Fall 2005 © UCB CS 152 L7: Pipelining I 2005-9-20 John Lazzaro (www.cs.berkeley.edu/~lazzaro) CS 152 Computer Architecture and Engineering Lecture 7 Pipelining I www-inst.eecs.berkeley.edu/~cs152/ TAs: David Marquardt and Udam Saini

Office Hours Change

David: W 3-4, Th 3-4, 125 Cory

Udam: W 3-5 125 Cory, Tu 10-12 345 Soda

John: Mon 9:30-10:30 AM, 315 Soda

Last Time: Performance Equation

$$\frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$

Goal is to optimize execution time, not individual equation terms.

The CPI of the program. Reflects the program’s instruction mix.

Machines are optimized with respect to program workloads.

Clock period. Optimize jointly with machine CPI.
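As a small worked example of the equation above, the sketch below multiplies the three terms. The function name `execution_time` and every number in it are hypothetical, chosen only to show why the terms are optimized jointly: halving the clock period still helps even if CPI rises to 1.5.

```python
def execution_time(instructions: int, cpi: float, clock_period_s: float) -> float:
    """Seconds/Program = Instructions/Program * Cycles/Instruction * Seconds/Cycle."""
    return instructions * cpi * clock_period_s


# Hypothetical workload and machines (illustrative numbers only).
baseline = execution_time(1_000_000, cpi=1.0, clock_period_s=10e-9)
faster_clock = execution_time(1_000_000, cpi=1.5, clock_period_s=5e-9)

print(f"baseline:     {baseline * 1e3:.2f} ms")
print(f"faster clock: {faster_clock * 1e3:.2f} ms")
```

Under these assumed numbers the second machine wins despite its worse CPI, because the product of CPI and clock period is smaller.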

Today: Introduction to Pipelining

How to apply the performance equation to our single-cycle CPU.

Why pipelining is hard: data hazards, control hazards, structural hazards.

Pipelining: an idea from assembly line production applied to CPU design

Also: Introduction to Lab 3

Note: Reading is Fundamental ...

The lectures are a gentle introduction, to prepare you to read the book ...

The book presentation of pipelined processors is sufficient to do Lab 3.

These lectures are not.

Recall: Our single-cycle processor

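[Figure: single-cycle datapath: PC with +0x4 adder, Instr Mem, RegFile (rs1, rs2, ws, wd, WE; rd1, rd2), Ext, 32-bit ALU (op), Data Memory (Addr, Din, Dout, WE), MemToReg mux]

$$\frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$

CPI == 1. This is good.

Clock is slow. This is bad.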

Challenge: Speed up clock while keeping CPI == 1

Recall: An R-format CPU design

[Figure: R-format datapath: RegFile (5-bit rs1, rs2, ws; 32-bit rd1, rd2, wd; WE) feeding a 32-bit ALU (op), with decode Logic]

Instruction fields: opcode | rs | rt | rd | shamt | funct

Decode fields to get: ADD $8 $9 $10
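The field decode on this slide can be sketched in a few lines. This assumes the standard 32-bit MIPS R-format layout (6-bit opcode, three 5-bit register fields, 5-bit shamt, 6-bit funct); the names `RFormat` and `decode_r_format` are hypothetical, not part of the course materials.

```python
from typing import NamedTuple


class RFormat(NamedTuple):
    opcode: int
    rs: int
    rt: int
    rd: int
    shamt: int
    funct: int


def decode_r_format(word: int) -> RFormat:
    """Split a 32-bit R-format instruction word into its six fields."""
    return RFormat(
        opcode=(word >> 26) & 0x3F,  # bits 31..26
        rs=(word >> 21) & 0x1F,      # bits 25..21, first source register
        rt=(word >> 16) & 0x1F,      # bits 20..16, second source register
        rd=(word >> 11) & 0x1F,      # bits 15..11, destination register
        shamt=(word >> 6) & 0x1F,    # bits 10..6
        funct=word & 0x3F,           # bits 5..0
    )


# 0x012A4020 is the MIPS encoding of ADD $8, $9, $10.
fields = decode_r_format(0x012A4020)
print(fields)
```

In the datapath drawn here, rs and rt would plausibly drive the register file read ports (rs1, rs2) and rd the write select (ws).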

Reminder: How data flows after posedge

[Figure: data flow after the clock edge through PC (+0x4), Instr Mem, decode Logic, RegFile (rs1, rs2, ws, wd, WE; rd1, rd2), and the 32-bit ALU (op)]

Next posedge: Update state and repeat

[Figure: state elements updated at the next clock edge: PC (D, Q) and RegFile (rs1, rs2, ws, wd, WE; rd1, rd2)]
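The "compute during the cycle, update state at the next posedge" rhythm can be mimicked in software. The following is a toy sketch, not the Lab design: `step` is a hypothetical helper, instructions are pre-decoded tuples `(op, rd, rs, rt)`, only three R-format operations are modeled, and the register file starts with each register holding its own index.

```python
def step(pc, regs, imem):
    """Simulate one clock cycle of a toy single-cycle CPU (R-format ops only)."""
    op, rd, rs, rt = imem[pc // 4]           # instruction fetch and decode
    a, b = regs[rs], regs[rt]                # register file read ports rd1, rd2
    # Like hardware, compute every candidate result and let `op` select one.
    alu_out = {"ADD": a + b, "OR": a | b, "SUB": a - b}[op]
    next_regs = list(regs)
    next_regs[rd] = alu_out & 0xFFFFFFFF     # value captured by the RegFile at the posedge
    return pc + 4, next_regs                 # new PC captured at the same posedge


imem = [("ADD", 4, 3, 2), ("OR", 7, 6, 5), ("SUB", 10, 9, 8)]
pc, regs = 0, list(range(32))                # assumed initial state: R[i] == i
for _ in range(len(imem)):
    pc, regs = step(pc, regs, imem)

print(pc, regs[4], regs[7], regs[10])
```

Each call to `step` returns the next state without modifying the current one, which mirrors how the PC and register file only change at the clock edge.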

Observation: Logic idle most of cycle

[Figure: single-cycle datapath: PC with +0x4 adder, Instr Mem, RegFile (rs1, rs2, ws, wd, WE; rd1, rd2), Ext, 32-bit ALU (op), Data Memory (Addr, Din, Dout, WE), MemToReg mux]

For most of cycle, ALU is either “waiting” for its inputs, or “holding” its output

Ideal: a CPU architecture where each part is always “working”.

Inspiration: Automobile assembly line

Assembly line moves on a steady clock.

Each station does the same task on each car.

[Figure: a car body shell and a car chassis meet at a merge station, followed by a bolting station; the line advances on the clock]

Inspiration: Automobile assembly line

Simpler station tasks → more cars per hour.

Simple tasks take less time, clock is faster.

Inspiration: Automobile assembly line

Line speed limited by slowest task.

Most efficient if all tasks take same time to do

Inspiration: Automobile assembly line

Simpler tasks, complex car → long line!

These lines go 24 x 7, and rarely shut down.

Why?

Lessons from car assembly lines

Faster line movement yields more cars per hour off the line.

Faster line movement requires more stages, each doing simpler tasks.

To maximize efficiency, all stages should take same amount of time (if not, workers in fast stages are idle)

“Filling”, “flushing”, and “stalling” assembly line are all bad news.
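The lessons above can be put into numbers with a short sketch. `line_stats` is a hypothetical helper and the stage delays are made up: the clock period is set by the slowest stage, and utilization measures how busy the average station is.

```python
def line_stats(stage_delays_ns):
    """Return (clock period, throughput, average station utilization) for one line."""
    period = max(stage_delays_ns)        # line speed is limited by the slowest task
    throughput = 1.0 / period            # items finished per ns once the line is full
    # Fraction of each clock period that the average station spends working.
    utilization = sum(stage_delays_ns) / (period * len(stage_delays_ns))
    return period, throughput, utilization


# Same total work (10 ns) split two ways; the numbers are illustrative only.
balanced = line_stats([2, 2, 2, 2, 2])
unbalanced = line_stats([1, 1, 5, 1, 2])

print("balanced:  ", balanced)
print("unbalanced:", unbalanced)
```

With these assumed delays the balanced line runs at a 2 ns period with every station fully busy, while the unbalanced line is held to 5 ns by one slow station and the others sit idle most of the time.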

Key Analogy: The instruction is the car

[Figure: five-stage pipeline. Stage #1 is Instruction Fetch (PC, +0x4 adder, Instr Mem). An instruction register (IR) sits at each stage boundary; the IR entering each of stages #2 through #5 controls the hardware in that stage.]

“Data-stationary control”
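One way to picture data-stationary control is as a chain of instruction registers that shifts on every clock edge, so each instruction carries its own control information down the line. The sketch below is an illustration under stated assumptions: `advance` is a hypothetical helper, instructions are plain strings, and `None` marks an empty slot.

```python
def advance(irs, fetched):
    """Clock edge: every IR copies the IR before it; the first IR latches the new fetch."""
    return [fetched] + irs[:-1]


program = ["ADD R4,R3,R2", "OR R7,R6,R5", "SUB R10,R9,R8"]
irs = [None] * 4      # irs[0] controls stage 2, ..., irs[3] controls stage 5
history = []
for cycle in range(7):
    fetched = program[cycle] if cycle < len(program) else None
    irs = advance(irs, fetched)
    history.append(list(irs))

for cycle, snapshot in enumerate(history):
    print(cycle, snapshot)
```

Reading a row of the printout left to right shows which instruction is steering stages 2 through 5 in that cycle; the `None` entries at the start and end correspond to filling and draining the pipe.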

Example: Decode & Register Fetch Stage

[Figure: Stage #1, Instr Fetch (PC, +0x4 adder, Instr Mem, IR); Stage #2, Decode & Reg Fetch (RegFile, Ext, pipeline registers IR, A, B, M); Stage #3]

A sample program:

ADD R4,R3,R2
OR R7,R6,R5
SUB R10,R9,R8

R’s chosen so that instructions are independent - like cars on the line.

Performance Equation and Pipelining

[Figure: pipelined datapath: Instr Fetch (PC, +0x4 adder, Instr Mem), Decode & Reg Fetch (RegFile, Ext), Stage #3; pipeline registers IR, A, B, M]

$$\frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$

To get shortest clock period, balance the work to do in each pipeline stage.

CPI == 1. Once pipe is full, one instruction completes per cycle.

Clock period is shorter. Less work to do in each cycle.
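The phrase "once pipe is full" hides a small startup cost. A minimal sketch, assuming a hazard-free program and the hypothetical helpers `pipelined_cycles` and `effective_cpi`, shows how the fill time is amortized so that CPI approaches 1 for long programs.

```python
def pipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Cycles for a hazard-free program: fill the pipe, then one completion per cycle."""
    return n_stages + n_instructions - 1


def effective_cpi(n_instructions: int, n_stages: int) -> float:
    """Average cycles per instruction, including the pipeline fill time."""
    return pipelined_cycles(n_instructions, n_stages) / n_instructions


for n in (3, 100, 1_000_000):
    print(n, pipelined_cycles(n, 5), round(effective_cpi(n, 5), 6))
```

Both helpers assume at least one instruction and no stalls; the three-instruction case spends most of its cycles filling and draining a five-stage pipe.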

Hazards: An instruction is not a car ...

[Figure: pipelined datapath: Stage #1, Instr Fetch (PC, +0x4 adder, Instr Mem); Stage #2, Decode & Reg Fetch (RegFile, Ext); Stage #3; pipeline registers IR, A, B, M]

New sample program:

ADD R4,R3,R2
OR R5,R4,R2

R4 not written yet ... wrong value of R4 fetched from RegFile, contract with programmer broken! Oops!

An example of a “hazard” -- we must (1) detect and (2) resolve all hazards to make a CPU that matches ISA
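The "detect" half of that job can be sketched in software for the read-after-write case shown here. This is an illustration, not the hardware detection logic: `parse` and `raw_hazards` are hypothetical helpers, instructions are assumed to be three-register strings like those on the slides, and `window` (how many following instructions can still read a stale value) is an assumption that depends on the pipeline.

```python
import re


def parse(instr):
    """'ADD R4,R3,R2' -> ('ADD', 4, (3, 2)): opcode, destination, sources."""
    op, operands = instr.split(maxsplit=1)
    rd, rs, rt = (int(r) for r in re.findall(r"R(\d+)", operands))
    return op, rd, (rs, rt)


def raw_hazards(program, window=2):
    """List (producer index, consumer index, register) read-after-write pairs."""
    parsed = [parse(i) for i in program]
    hazards = []
    for j, (_, _, sources) in enumerate(parsed):
        for i in range(max(0, j - window), j):
            dest = parsed[i][1]
            if dest in sources:          # consumer reads what the producer writes
                hazards.append((i, j, dest))
    return hazards


independent = ["ADD R4,R3,R2", "OR R7,R6,R5", "SUB R10,R9,R8"]
dependent = ["ADD R4,R3,R2", "OR R5,R4,R2"]
print(raw_hazards(independent))
print(raw_hazards(dependent))
```

The independent program yields no pairs, while the new sample program reports that instruction 1 reads R4 before instruction 0 has written it.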

Performance Equation and Hazards

[Figure: pipelined datapath: Instr Fetch (PC, +0x4 adder, Instr Mem), Decode & Reg Fetch (RegFile, Ext), Stage #3; pipeline registers IR, A, B, M]

$$\frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$

“Software slows the machine down” - Seymour Cray

Some ways to cope with hazards make CPI > 1 (“stalling pipeline”).

Added logic to detect and resolve hazards increases clock period.
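Both effects eat into the ideal speedup, which a rough calculation makes concrete. The helper `speedup` and all of its inputs are hypothetical: a 10 ns single-cycle clock, a five-stage pipeline with an ideal 2 ns clock, 0.5 ns of added hazard logic, and an average of 0.25 stall cycles per instruction.

```python
def speedup(single_cycle_period, pipelined_period, stall_cycles_per_instr):
    """Single-cycle time per instruction divided by pipelined time per instruction."""
    single = 1.0 * single_cycle_period                          # CPI == 1
    pipelined = (1.0 + stall_cycles_per_instr) * pipelined_period
    return single / pipelined


ideal = speedup(10.0, 2.0, 0.0)          # perfectly balanced stages, no hazards
realistic = speedup(10.0, 2.5, 0.25)     # slower clock and stalls, assumed values

print(f"ideal:     {ideal:.2f}x")
print(f"realistic: {realistic:.2f}x")
```

Under these assumptions the pipeline is still a clear win, but well short of the five-fold gain that the stage count alone would suggest.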

A (simplified) 5-stage pipelined CPU

[Figure: five-stage datapath. (1) “IF” Stage, Instr Fetch: PC, +0x4 adder, Instr Mem. (2) “ID/RF” Stage, Decode & Reg Fetch: RegFile, Ext; pipeline registers IR, A, B, M. (3) “EX” Stage, Execution: 32-bit ALU (op); pipeline registers IR, Y, M. (4) “MEM” Stage, Memory: Data Memory (Addr, Din, Dout, WE), MemToReg; pipeline registers IR, R. (5) “WB” Stage, WriteBack: mux and logic driving the RegFile write port.]

Welcome to Lab 3!

Administrivia: Upcoming deadlines ...

Thursday 9/29: At 11:59 PM via email: Lab 2 peer evaluations, and Lab 3 preliminary design document due.

Monday 9/26: Lab 2 final report due via the submit program, 11:59 PM.

Friday 9/23: Lab 2 “Xilinx Checkoff”, in section. For non-150 students, “150 Lab Lecture 4”, 2-3 PM, 125 Cory.

Lab 3 now available on the web site

Starting 9/29: Homework, Midterm, Lab

HW graded on effort

Midterm two weeks from today, in evening, no class that day.

Thursday review session. Will cover format, material, and ground rules for test.

Lab 3 design doc, checkoffs, later in week ...

Lab 3 Introduction

“Pipelining Your Processor”

Week 1 for Lab 3: Pipelining Processors

Week 2: Hazard-Free Code on the Board

A sample program:

ADD R4,R3,R2
OR R7,R6,R5
SUB R10,R9,R8

R’s chosen so that instructions are independent - like cars on the line.

Week 3: Run TA’s “Hard Tests” on Xilinx

New sample program:

ADD R4,R3,R2
OR R5,R4,R2

An example of a “hazard” -- we must (1) detect and (2) resolve all hazards to make a CPU that matches ISA

Next 2 Lectures: Pipelining details ...

Control, Hazards, Forwarding