Computer Architecture: Intro Lecture 7- Enhancing performance via pipelining; ASMs as an alternative…

Page 1: Computer Architecture: Intro Lecture 7- Enhancing performance via pipelining; ASMs as an alternative…

Computer Architecture: Intro

Lecture 7 - Enhancing performance via pipelining; ASMs as an alternative SM design tool

J. Schmalzel, S. Mandayam

Page 2: Computer Architecture: Intro Lecture 7- Enhancing performance via pipelining; ASMs as an alternative…

Simple Model Data Path (7-18)

[Figure: block diagram of the simple model data path. Blocks: Register File, MuxB, Function Unit, MuxD. Labeled signals: DA, AA, BA, RW, Const, MB, FS, MD, Status, Aout, Dout, Din, Dbus (n-bit bus).]

Page 3: Computer Architecture: Intro Lecture 7- Enhancing performance via pipelining; ASMs as an alternative…

Maximum Clock Rate

Consider delays through each functional element, e.g.,

Register File: 1.4 ns

Muxes: 0.6 ns

Function Unit: 2.0 ns

Fmax = 1/Ttotal

where Ttotal = Σ Ti (the sum of the delays Ti along the path)

For the above example, Ttotal = 1.4 + 0.6 + 2.0 = 4.0 ns, so Fmax = 1/(4.0 ns) = 250 MHz
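The arithmetic above can be checked with a minimal sketch. The dictionary keys and the function name `fmax_mhz` are illustrative choices, not from the slides; only the three delay values are taken from the example.

```python
# Delays from the slide, in nanoseconds (key names are illustrative).
delays_ns = {"register_file": 1.4, "muxes": 0.6, "function_unit": 2.0}

def fmax_mhz(path_delays_ns):
    """Maximum clock rate in MHz for a path whose delays add up."""
    t_total_ns = sum(path_delays_ns)
    return 1000.0 / t_total_ns  # 1/ns is GHz; multiply by 1000 for MHz

t_total = sum(delays_ns.values())
f_unpipelined = fmax_mhz(delays_ns.values())
print(f"Ttotal = {t_total:.1f} ns, Fmax = {f_unpipelined:.0f} MHz")
```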

Page 4: Computer Architecture: Intro Lecture 7- Enhancing performance via pipelining; ASMs as an alternative…

Considerations for Ti

Combinatorial delays: propagation delay (number of gate levels); rise and fall times

Sequential delays: propagation delay; setup and hold times (w.r.t. the clock edge)

Page 5: Computer Architecture: Intro Lecture 7- Enhancing performance via pipelining; ASMs as an alternative…

How to Speed Up Architecture?

Brute force: technology speed-up (scaling, power); parallelism

Architecture alternatives: pipelining

Page 6: Computer Architecture: Intro Lecture 7- Enhancing performance via pipelining; ASMs as an alternative…

Pipeline Registers

Insert a pipeline register between each significant architectural element

[Figure: DP Element 1, DP Element 2, and DP Element 3 in sequence, each followed by a Pipeline Register.]

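Speedup is due to sequential clocking through each stage of the pipeline. The clock period is now set by the slowest single stage plus the pipeline register delay, not by the sum of all delays:

Fmax = 1/(Tmax + TPR)

where Tmax is the largest stage delay and TPR is the pipeline register delay.

If we use the same time delays from before (Tmax = 2.0 ns, the Function Unit) and assume TPR = 0.6 ns,

Fmax = 1/(2.0 + 0.6) = 1/(2.6 ns) ≈ 385 MHz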
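As a rough check of the pipelined figure, the sketch below compares it with the unpipelined rate. It assumes the register file, the muxes, and the function unit each form one pipeline stage, which is consistent with Tmax = 2.0 ns but is not stated explicitly on the slide; the function name is illustrative.

```python
def pipelined_fmax_mhz(stage_delays_ns, t_pr_ns):
    """Clock rate in MHz when the slowest stage plus register delay sets the period."""
    t_max_ns = max(stage_delays_ns)
    return 1000.0 / (t_max_ns + t_pr_ns)

# Assumed stage split: register file, muxes, function unit (ns).
stage_delays_ns = [1.4, 0.6, 2.0]

f_pipe = pipelined_fmax_mhz(stage_delays_ns, t_pr_ns=0.6)
f_flat = 1000.0 / sum(stage_delays_ns)  # unpipelined: delays add up
print(f"{f_pipe:.1f} MHz pipelined vs {f_flat:.1f} MHz unpipelined "
      f"({f_pipe / f_flat:.2f}x)")
# prints: 384.6 MHz pipelined vs 250.0 MHz unpipelined (1.54x)
```

The gain is well below the stage count because the stages are unbalanced and the register delay is added to every cycle.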

Page 7: Computer Architecture: Intro Lecture 7- Enhancing performance via pipelining; ASMs as an alternative…

4-Stage Pipeline Diagram

Rows: Opn #; columns: Clock #

Opn #   CK1  CK2  CK3  CK4  CK5  CK6  CK7  CK8  CK9
1       1A   1B   1C   1D
2            2A   2B   2C   2D
3                 3A   3B   3C   3D
4                      4A   4B   4C   4D
5                           5A   5B   5C   5D
--                               --   --   --   --
--                                    --   --   --
--                                         --   --

Pipeline fills: CK1-CK3; Pipeline is full: CK4-CK5; Pipeline is emptying: CK6-CK8

Page 8: Computer Architecture: Intro Lecture 7- Enhancing performance via pipelining; ASMs as an alternative…

Description of Pipeline Diagram

Each stage of the pipeline is denoted as A, B, C, D…

Each operation presented to the pipeline is numbered, e.g., 1, 2, 3…

Each clock is numbered: CK1, CK2…

A flush operation is indicated by "--", which suggests nothing is presented to the start of the pipeline. In actuality, some other operation sequence would be started, but what is shown emphasizes the fact that it takes 3 additional clock cycles to finish what was started.
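The labeling scheme described above can be reproduced with a short sketch. The function name `pipeline_schedule` and its parameters are illustrative; the code covers only the five real operations (CK1 through CK8) and leaves out the "--" flush rows.

```python
def pipeline_schedule(num_ops, stages="ABCD"):
    """Return one row per operation and one column per clock.

    Operation k enters stage A at clock k and advances one stage per clock,
    so each cell holds a label such as "3B" or an empty string.
    """
    num_clocks = num_ops + len(stages) - 1
    rows = []
    for op in range(1, num_ops + 1):
        row = [""] * num_clocks
        for offset, stage in enumerate(stages):
            row[op - 1 + offset] = f"{op}{stage}"
        rows.append(row)
    return rows

rows = pipeline_schedule(5)
print(" ".join(f"CK{c:<3}" for c in range(1, len(rows[0]) + 1)))
for row in rows:
    print(" ".join(f"{cell:<5}" for cell in row))
```

With five operations and four stages the last label, 5D, lands in CK8, matching the three extra clocks needed to drain the pipeline after the last operation enters at CK5.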

Page 9: Computer Architecture: Intro Lecture 7- Enhancing performance via pipelining; ASMs as an alternative…

Pipeline Issues

Delays due to pipeline filling: an N-stage pipeline produces no useful output until N clocks have elapsed

Highest performance when pipeline is full

Periodic need to flush (empty) the pipeline to accommodate branching, etc.
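To put a number on the fill and flush cost, the sketch below counts the clocks needed to push a burst of operations through an N-stage pipeline that starts empty and is drained afterwards. The "results per clock" figure is a hypothetical metric chosen for illustration; it is not defined on the slides.

```python
def clocks_needed(num_ops, num_stages):
    """Clocks to complete num_ops operations, starting from an empty pipeline."""
    return num_stages + num_ops - 1

def results_per_clock(num_ops, num_stages):
    """Average completed operations per clock over the whole burst (at most 1.0)."""
    return num_ops / clocks_needed(num_ops, num_stages)

# Short bursts between flushes pay proportionally more fill/drain overhead.
for burst in (1, 5, 100):
    print(burst, clocks_needed(burst, 4), f"{results_per_clock(burst, 4):.3f}")
```

For the 4-stage diagram, a burst of 5 operations takes 8 clocks, so frequent flushes keep the pipeline well short of one result per clock.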

Page 10: Computer Architecture: Intro Lecture 7- Enhancing performance via pipelining; ASMs as an alternative…

Algorithmic State Machines (ASMs)

State blocks for Moore (1-hot) machines include a State box and Decision box(es).

State boxes correspond to state bubbles on a state diagram (SD).

The output list is provided in the state box.

Outputs of an input decision box exit the state block and enter other state boxes.

Decision boxes correspond to the input tests shown on SD arcs into and out of SD bubbles.

Page 11: Computer Architecture: Intro Lecture 7- Enhancing performance via pipelining; ASMs as an alternative…

Direct Implementation of 1-hot ASMs

State boxes map to D flip-flops (D-F/Fs)

State block entries map to OR gates

Decision boxes map to AND gates
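The mapping above can be illustrated with a small behavioral sketch. The three-state machine (IDLE, RUN, DONE) and its input `go` are hypothetical and do not come from the lecture; each boolean in the state tuple stands for one D flip-flop output.

```python
def next_state(state, go):
    """One clock edge of a hypothetical 3-state one-hot machine.

    state is (idle, run, done); exactly one entry is True.
    """
    idle, run, done = state
    # Decision box on `go` inside the IDLE state block: one AND gate per exit.
    idle_stay = idle and not go
    idle_to_run = idle and go
    # Each state box is a D flip-flop; multiple entry paths are OR-ed at its D input.
    d_idle = idle_stay or done   # entered from IDLE (go = 0) or from DONE
    d_run = idle_to_run          # entered from IDLE (go = 1)
    d_done = run                 # entered unconditionally from RUN
    return (d_idle, d_run, d_done)

state = (True, False, False)  # reset into IDLE
trace = []
for go in (False, True, False, False):
    state = next_state(state, go)
    trace.append(state)
print(trace)
```

Exactly one flip-flop is set after every clock, which is the property that lets the ASM chart be translated box by box into gates.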

Page 12: Computer Architecture: Intro Lecture 7- Enhancing performance via pipelining; ASMs as an alternative…

Control Path

Hard-wired control

Sequencer

Microprogramming (also may see "Microsequencer")

Page 13: Computer Architecture: Intro Lecture 7- Enhancing performance via pipelining; ASMs as an alternative…

Questions, Comments, Discussion