Computer Architecture: Intro
Lecture 7 - Enhancing performance via pipelining; ASMs as an alternative SM design tool
J. Schmalzel, S. Mandayam
Simple Model Data Path (7-18)
[Figure: data path block diagram. Blocks: Register File, MuxB, Function Unit, MuxD. Labels: DA, AA, BA, RW, Const, MB, FS, MD, Status, Din, Aout, Dout, Dbus (n-bit bus).]
Maximum Clock Rate
Consider delays through each functional element, e.g.,
Register File: 1.4 ns
Muxes: 0.6 ns
Function Unit: 2.0 ns
Fmax = 1/Ttotal
where Ttotal = Σ Ti
For the above example, Ttotal = 1.4 + 0.6 + 2.0 = 4.0 ns, so Fmax = 1/(4.0 ns) = 250 MHz
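The calculation can be sketched in a few lines of Python. This is only an illustration, assuming the three listed delays lie in series on one path; the names `delays_ns` and `max_clock_mhz` are made up for this example and do not come from the lecture.

```python
# Element delays from the slide, in nanoseconds.
delays_ns = {
    "register_file": 1.4,
    "mux": 0.6,
    "function_unit": 2.0,
}


def max_clock_mhz(delays_ns):
    """Return Fmax in MHz for an unpipelined path: 1 / (sum of delays)."""
    t_total_ns = sum(delays_ns.values())
    # 1/ns is GHz, so scale by 1000 to report MHz.
    return 1e3 / t_total_ns


print(f"Fmax = {max_clock_mhz(delays_ns):.0f} MHz")
```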
Considerations for Ti
Combinatorial delays:
  Propagation delay (number of gate levels)
  Rise times and fall times
Sequential delays:
  Propagation delay
  Setup and hold times (w.r.t. clock edge)
How to Speed Up Architecture?
Brute force:
  Technology speed up (scaling, power)
  Parallelism
Architecture alternatives:
  Pipelining
Pipeline Registers
Insert a pipeline register between each significant architectural element.
[Figure: three data path elements (DP Element 1, DP Element 2, DP Element 3) and three pipeline registers.]
Speedup is due to sequential clocking through each stage of the pipeline:
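Fmax = 1/(Tmax + TPR)
where Tmax is the largest single-stage delay and TPR is the pipeline register delay.
If we use the same time delays from before, and assume TPR = 0.6 ns,
Fmax = 1/(2.0 ns + 0.6 ns) = 1/(2.6 ns) ≈ 385 MHz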
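The pipelined figure can be checked the same way. The sketch below assumes each functional element sits in its own stage, so the clock period is set by the slowest stage plus the pipeline register delay; `pipelined_max_clock_mhz` is a hypothetical helper name used only for this example.

```python
def pipelined_max_clock_mhz(stage_delays_ns, t_pr_ns):
    """Return Fmax in MHz for a pipeline: 1 / (slowest stage + register delay)."""
    t_cycle_ns = max(stage_delays_ns) + t_pr_ns
    # 1/ns is GHz, so scale by 1000 to report MHz.
    return 1e3 / t_cycle_ns


# Stage delays from the earlier example (register file, mux, function unit).
stage_delays_ns = [1.4, 0.6, 2.0]
f_pipe = pipelined_max_clock_mhz(stage_delays_ns, t_pr_ns=0.6)
f_flat = 1e3 / sum(stage_delays_ns)  # unpipelined Fmax for comparison

print(f"Pipelined Fmax   = {f_pipe:.0f} MHz")
print(f"Unpipelined Fmax = {f_flat:.0f} MHz")
print(f"Clock-rate gain  = {f_pipe / f_flat:.2f}x")
```

Note that the gain is smaller than the number of stages, because the stages are unbalanced and the register delay is added to every cycle.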
4-Stage Pipeline Diagram
(rows: Opn #; columns: Clock #)
Opn #  CK1  CK2  CK3  CK4  CK5  CK6  CK7  CK8  CK9
1      1A   1B   1C   1D
2           2A   2B   2C   2D
3                3A   3B   3C   3D
4                     4A   4B   4C   4D
5                          5A   5B   5C   5D
--                              --   --   --   --
--                                   --   --   --
--                                        --   --
Pipeline fills: CK1-CK3; Pipeline is full: CK4-CK5; Pipeline is emptying: CK6-CK8
Description of Pipeline Diagram
Each stage of the pipeline is denoted as A, B, C, D...
Each operation presented to the pipeline is numbered, e.g., 1, 2, 3...
Each clock is numbered: CK1, CK2...
A flush operation is indicated by "--", which suggests nothing is presented to the start of the pipeline. In actuality, some other operation sequence would be started, but what is shown emphasizes the fact that it takes 3 additional clock cycles to finish what was started.
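The occupancy pattern described above can be generated programmatically. The following is a small sketch, assuming one new operation enters stage A on every clock and nothing stalls; `pipeline_diagram` and `print_diagram` are illustrative names, and the flush rows of the slide are simply left out.

```python
def pipeline_diagram(num_ops, stages="ABCD"):
    """Return one row per operation; row[c] is the stage label active in clock c+1."""
    num_clocks = num_ops + len(stages) - 1
    rows = []
    for op in range(1, num_ops + 1):
        row = [""] * num_clocks
        for offset, stage in enumerate(stages):
            # Operation `op` enters stage A at clock `op` and advances one stage per clock.
            row[op - 1 + offset] = f"{op}{stage}"
        rows.append(row)
    return rows


def print_diagram(rows):
    """Print the rows as a fixed-width table with a clock header."""
    num_clocks = len(rows[0])
    print("Opn  " + "".join(f"CK{c:<3}" for c in range(1, num_clocks + 1)))
    for op, row in enumerate(rows, start=1):
        print(f"{op:<5}" + "".join(f"{cell:<5}" for cell in row))


rows = pipeline_diagram(num_ops=5)
print_diagram(rows)
```

With five operations and four stages, the last operation leaves stage D in CK8, which matches the diagram.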
Pipeline Issues
Delays due to pipeline filling: no useful output from an N-stage pipeline until N clocks have elapsed
Highest performance when pipeline is full
Periodic need to flush (empty) the pipeline to accommodate branching, etc.
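The cost of filling and flushing can be put in numbers. The sketch below assumes an ideal pipeline (one operation issued per clock, no stalls), for which K operations through N stages take N + K - 1 clocks; this count agrees with the 4-stage diagram, where 5 operations finish in CK8. The function names are hypothetical.

```python
def clocks_to_finish(num_ops, num_stages):
    """Clocks needed for num_ops operations to drain from an ideal pipeline."""
    return num_stages + num_ops - 1


def utilization(num_ops, num_stages):
    """Average completed operations per clock over the whole run (1.0 is ideal)."""
    return num_ops / clocks_to_finish(num_ops, num_stages)


for k in (1, 5, 100, 1000):
    print(f"{k:5d} ops, 4 stages: {clocks_to_finish(k, 4):5d} clocks, "
          f"utilization {utilization(k, 4):.3f}")
```

Short bursts between flushes spend most of their time filling and emptying, which is why frequent branching hurts pipelined performance.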
Algorithmic State Machines (ASMs)
State blocks for Moore (1-hot) machines include a State box and Decision box(es).
State boxes correspond to state bubbles on a state diagram (SD).
The output list is provided in the state box.
Outputs of the input decision box exit the state block and enter other state boxes.
Decision boxes correspond to the input tests shown on SD arcs into and out of SD bubbles.
Direct Implementation of 1-hot ASMs
State boxes map to D flip-flops
State block entries map to OR gates
Decision boxes map to AND gates
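To make the mapping concrete, here is a small Python simulation of a made-up three-state ASM (IDLE, RUN, DONE) with inputs `go` and `stop`. The machine, its state names, and its inputs are assumptions chosen for illustration and are not taken from the lecture. Each state is one flip-flop output, each decision is an `and` of the source state with an input test, and each state's D input is the `or` of all paths entering it.

```python
def next_state(q, x):
    """Compute the D inputs of the one-hot flip-flops from current outputs q and inputs x."""
    idle, run, done = q["IDLE"], q["RUN"], q["DONE"]
    go, stop = x["go"], x["stop"]
    return {
        # OR gate per state entry; AND gate per decision-box exit path.
        "IDLE": (idle and not go) or done,
        "RUN": (idle and go) or (run and not stop),
        "DONE": run and stop,
    }


def simulate(inputs):
    """Clock the machine once per input dict and return the active state after each clock."""
    q = {"IDLE": True, "RUN": False, "DONE": False}  # reset: only IDLE is hot
    trace = []
    for x in inputs:
        q = next_state(q, x)  # clock edge: flip-flops load their D inputs
        assert sum(q.values()) == 1, "one-hot invariant violated"
        trace.append(next(name for name, hot in q.items() if hot))
    return trace


inputs = [
    {"go": False, "stop": False},
    {"go": True, "stop": False},
    {"go": False, "stop": False},
    {"go": False, "stop": True},
    {"go": False, "stop": False},
]
print(simulate(inputs))
```

Because exactly one flip-flop is set at a time, no state decoder is needed; the cost is one flip-flop per state.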
Control Path
Hard-wired control
Sequencer
Microprogramming (also may see “Microsequencer”)
Questions, Comments, Discussion