View
219
Download
1
Tags:
Embed Size (px)
Citation preview
1
From From Combinational to Combinational to
Sequential Circuits Sequential Circuits to Simple to Simple
ProcessorsProcessors
What we covered on Friday meeting?
1. Design of SOP circuits from KMaps. Prime implicants and Covering2. Design of POS circuits from KMaps. Prime implicates and Covering3. Design of ESOP circuits from KMaps. Algebraic rules for AND/EXOR
logic.4. Design using NAND and NOR gates. De Morgan Rules.5. Factorization.6. Multiplexers.7. Iterative circuits and their types.8. Using State Machines to design one-directional iterative circuits9. Predicates10. Oracles11. SAT oracles12. Graph Coloring oracles and distributed processors13. SEND+MORE=MONEY problem and its oracle.14. The idea of Constraint Satisfaction and Distributed Software/hardware for it.
Ask questions to Mr Parasa and Mr Mathias Sunardi who actively participated.
3
Reminder Embedded SystemsReminder Embedded Systems
Outline
• Introduction• Combinational logic• Sequential logic• FSM design• Custom single-purpose processor design• RT-level custom single-purpose processor design
5
Increasing abstraction level in design specification
• Higher abstraction level focus of hardware/software design evolution– Description smaller/easier to capture
• E.g., Line of sequential program code can translate to 1000 gates
– Many more possible implementations available• (a) Like flashlight, the higher above the ground, the more ground illuminated
– Sequential program designs may differ in performance/transistor count by orders of magnitude
– Logic-level designs may differ by only power of 2
• (b) Design process proceeds to lower abstraction level, narrowing in on single implementation
(a) (b)
idea
implementation
back-of-the-envelope
sequential programregister-transfers
logic
mod
elin
g co
st in
crea
ses
oppo
rtun
itie
s de
crea
se
idea
implementation
6
What is Synthesis
• Automatically converting system’s behavioral description to a structural implementation– Complex whole formed by parts
– Structural implementation must optimize design metrics
• More expensive, complex than compilers– Cost = $100s to $10,000s
– User controls 100s of synthesis options
– Optimization criticalOptimization critical• Otherwise could use software
– Optimizations different for each user
– Run time = hours, days
7
Gajski’s Y-chart
• Each axis represents type of description– BehavioralBehavioral
• Defines outputs as function of inputs• Algorithms but no implementation
– StructuralStructural• Implements behavior by connecting
components with known behavior
– PhysicalPhysical• Gives size/locations of components and
wires on chip/board
• Synthesis converts behavior at given level to structure at same level or lower
– E.g.,• FSM → gates, flip-flops (same level)• FSM → transistors (lower level)• FSM X registers, FUs (higher level)• FSM X processors, memories (higher
level)
Behavior
Physical
Structural
Processors, memories
Registers, FUs, MUXs
Gates, flip-flops
Transistors
Sequential programs
Register transfers
Logic equations/FSM
Transfer functions
Cell Layout
Modules
Chips
Boards
FU = functional unitFU = functional unit
FSM = finite state machineFSM = finite state machine
Introduction
• Processor– Digital circuit that performs a
computation tasks– Controller and datapath– General-purpose: variety of computation
tasks– Single-purpose: one particular
computation task– Custom single-purpose: non-standard task
• A custom single-purpose processor may be
– Fast, small, low power– But, high NRE, longer time-to-market,
less flexible
Microcontroller
CCD preprocessor
Pixel coprocessorA2D
D2A
JPEG codec
DMA controller
Memory controller ISA bus interface UART LCD ctrl
Display ctrl
Multiplier/Accum
Digital camera chip
lens
CCD
CMOS transistor on silicon
• Transistor– The basic electrical component in digital systems
– Acts as an on/off switch
– Voltage at “gate” controls whether current flows from source to drain
– Don’t confuse this “gate” with a logic gate
source drainoxidegate
IC package IC channel
Silicon substrate
gate
source
drain
Conductsif gate=1
1
CMOS transistor implementations
• Complementary Metal Oxide Semiconductor
• We refer to logic levels– Typically 0 is 0V, 1 is 5V
• Two basic CMOS types– nMOS conducts if gate=1
– pMOS conducts if gate=0
– Hence “complementary”
• Basic gates– Inverter, NAND, NOR
x F = x'
1
inverter
0
F = (xy)'
x
1
x
y
y
NAND gate
0
1
F = (x+y)'
x y
x
y
NOR gate
0
gate
source
drain
nMOS
Conductsif gate=1
gate
source
drain
pMOS
Conductsif gate=0
Basic logic gates
F = x yAND
F = (x y)’NAND
F = x yXOR
F = xDriver
F = x’Inverter
x F
F = x + yOR
F = (x+y)’NOR
x F
x
yF
Fx
y
x
yF
x
yF
x
yF
F = x yXNOR
Fyxx
0y0
F0
0 1 01 0 01 1 1
x0
y0
F0
0 1 11 0 11 1 1
x0
y0
F0
0 1 11 0 11 1 0
x0
y0
F1
0 1 01 0 01 1 1
x0
y0
F1
0 1 11 0 11 1 0
x0
y0
F1
0 1 01 0 01 1 0
x F0 01 1
x F0 11 0
Combinational logic designCombinational logic design A) Problem description
y is 1 if a is to 1, or b and c are 1. z is 1 if b or c is to 1, but not both, or if all are 1.
D) Minimized output equations
00
0
1
01 11 10
0
1
0 1 0
1 1 1
abcy
y = a + bc
00
0
1
01 11 10
0
0
1 0 1
1 1 1
z
z = ab + b’c + bc’
abc
C) Output equations
y = a'bc + ab'c' + ab'c + abc' + abc
z = a'b'c + a'bc' + ab'c + abc' + abc
B) Truth table
1 0 1 1 11 1 0 1 11 1 1 1 1
0 0 1 0 10 1 0 0 10 1 1 1 01 0 0 1 0
00 0 0 0
Inputsa b c
Outputsy z
E) Logic Gates
abc
y
z
Combinational components
With enable input e all O’s are 0 if e=0
With carry-in input Ci
sum = A + B + Ci
May have status outputs carry, zero, etc.
O =I0 if S=0..00I1 if S=0..01…I(m-1) if S=1..11
O0 =1 if I=0..00O1 =1 if I=0..01…O(n-1) =1 if I=1..11
sum = A+B (first n bits)carry = (n+1)’th bit of A+B
less = 1 if A<B equal =1 if A=Bgreater=1 if A>B
O = A op Bop determinedby S.
n-bit, m x 1 Multiplexor
O
…S0
S(log m)
n
n
I(m-1) I1 I0
…
log n x n Decoder
…
O1 O0O(n-1)
I0I(log n -1)
…
n-bitAdder
n
A B
n
sumcarry
n-bitComparator
n n
A B
less equal greater
n bit, m function
ALU
n n
A B
…S0
S(log m)
n
O
14
Logic synthesisLogic synthesis• Logic-level behavior to structural implementation
– Logic equations and/or FSM to connected gates
• Combinational logic synthesis– Two-level minimization (Sum of products/product of sums)
• Best possible performance– Longest path = 2 gates (AND gate + OR gate/OR gate + AND gate)
• Minimize size– Minimum cover– Minimum cover that is prime– Heuristics
– Multilevel minimization• Trade performance for size• Pareto-optimal solution
– Heuristics
• FSM synthesis– State minimization– State encoding
15
Two-level minimization
• Represent logic function as sum of products (or product of sums)– AND gate for each product
– OR gate for each sum
• Gives best possible performance– At most 2 gate delay
• Goal: minimize size– Minimum cover
• Minimum # of AND gates (sum of products)
– Minimum cover that is prime• Minimum # of inputs to each AND gate (sum
of products)
F = abc'd' + a'b'cd + a'bcd + ab'cd
Sum of products
4 4-input AND gates and 1 4-input OR gate
→ 40 transistors
a
b
c
d
F
Direct implementation
16
Minimum cover
• Minimum # of AND gates (sum of products)
• Literal: variable or its complement – a or a’, b or b’, etc.
• Minterm: product of literals– Each literal appears exactly once
• abc’d’, ab’cd, a’bcd, etc.
• Implicant: product of literals– Each literal appears no more than once
• abc’d’, a’cd, etc.
– Covers 1 or more minterms• a’cd covers a’bcd and a’b’cd
• Cover: set of implicants that covers all minterms of function
• Minimum cover: cover with minimum # of implicants
17
Minimum cover: K-map approach
• Karnaugh map (K-map)– 1 represents minterm
– Circle represents implicant
• Minimum cover– Covering all 1’s with min # of
circles
– Example: direct vs. min cover• Less gates
– 4 vs. 5
• Less transistors– 28 vs. 40
11
10 0 0
0 0 1 0
1 0 0 0
0 0 0
abcd
00
01
11
10
00 01 10
1
10 0 0
0 0 1 0
1 0 0 0
0 0 0
abcd
00
01
11
10
00 01 11 10
1
F=abc'd' + a'cd + ab'cd
a
bc
d
F
2 4-input AND gate1 3-input AND gates1 4 input OR gate → 28 transistors
K-map: sum of products K-map: minimum cover
Minimum cover
Minimum cover implementation
18
Minimum cover that is prime
• Minimum # of inputs to AND gates• Prime implicant
– Implicant not covered by any other implicant
– Max-sized circle in K-map
• Minimum cover that is prime– Covering with min # of prime implicants– Min # of max-sized circles– Example: prime cover vs. min cover
• Same # of gates– 4 vs. 4
• Less transistors– 26 vs. 28
10 0 0
0 0 1 0
1 0 0 0
0 0 0
abcd
00
01
11
10
00 01 11 10
1
K-map: minimum cover that is prime
Minimum cover that is prime
F=abc'd' + a'cd + b'cd
1 4-input AND gate 2 3-input AND gates1 4 input OR gate
→ 26 transistors
F
a
bc
d
Implementation
19
Minimum cover: heuristics
• K-maps give optimal solution every time– Functions with > 6 inputs too complicated
– Use computer-based tabular method• Finds all prime implicants
• Finds min cover that is prime
• Also optimal solution every time
• Problem: 2n minterms for n inputs– 32 inputs = 4 billion minterms
– Exponential complexity
• Heuristic– Solution technique where optimal solution not guaranteed
– Hopefully comes close
20
Heuristics: iterative improvement
• Start with initial solution– i.e., original logic equation
• Repeatedly make modifications toward better solution• Common modifications
– ExpandExpand• Replace each nonprime implicant with a prime implicant covering it• Delete all implicants covered by new prime implicant
– ReduceReduce• Opposite of expand
– ReshapeReshape• Expands one implicant while reducing another• Maintains total # of implicants
– IrredundantIrredundant• Selects min # of implicants that cover from existing implicants
• Synthesis tools differ in modifications used and the order they are used
21
Multilevel logic minimization
• Trade performance for size– Increase delay for lower # of gates
– Gray area represents all possible solutions
– Circle with X represents ideal solution• Generally not possible
– 2-level gives best performance• max delay = 2 gates
• Solve for smallest size
– Multilevel gives pareto-optimal pareto-optimal solutionsolution
• Minimum delay for a given size
• Minimum size for a given delay
size
dela
y
multi-lev
el minim
.
2-level minim.
22
Example of logic factorization
• Minimized 2-level logic function:– F = adef + bdef + cdef + gh– Requires 5 gates with 18 total gate inputs
• 4 ANDS and 1 OR
• After algebraic manipulation:– F = (a + b + c)def + gh– Requires only 4 gates with 11 total gate inputs
• 2 ANDS and 2 ORs
– Less inputs per gate– Assume gate inputs = 2 transistors
• Reduced by 14 transistors– 36 (18 * 2) down to 22 (11 * 2)
– Sacrifices performance for size• Inputs a, b, and c now have 3-gate delay
• Iterative improvement heuristic commonly used
F
b
ce
a
d
fgh
2-level minimized
F
bc
e
a
d
fgh
multilevel minimized
23
FSM synthesisFSM synthesis• FSM to gates• State minimization
– Reduce # of states• Identify and merge equivalent states
– Outputs, next states same for all possible inputs– Tabular method gives exact solution
• Table of all possible state pairs• If n states, n2 table entries• Thus, heuristics used with large # of states
• State encoding– Unique bit sequence for each state– If n states, log2(n) bits– n! possible encodings– Thus, heuristics common
Sequential componentsSequential components
Q = 0 if clear=1, I if load=1 and clock=1, Q(previous) otherwise.
Q = 0 if clear=1, Q(prev)+1 if count=1 and clock=1.
clear
n-bitRegister
n
n
load
I
Q
shift
I Q
n-bitShift register
n-bitCounter
n
Q
Q = lsb - Content shifted - I stored in msb
Reversible shifter shifts left and rigth
Reversible counter counts up and down
Reading it operation in most of registers – generalized registers.generalized registers.
Sequential logic designSequential logic designA) Problem Description
You want to construct a clock divider. Slow down your pre-existing clock so that you output a 1 for every four clock cycles
0
1 2
3
x=0
x=1x=0
x=0
a=1 a=1
a=1
a=1
a=0
a=0
a=0
a=0
B) State Diagram
C) Implementation Model
Combinational logic
State register
a x
I0
I0
I1
I1
Q1 Q0
D) State Table (Moore-type)
1 0 1 1 11 1 0 1 11 1 1 0 0
0 0 1 0 10 1 0 0 10 1 1 1 01 0 0 1 0
00 0 0 0
InputsQ1 Q0 a
OutputsI1 I0
1
0
0
0
x
• Given this implementation model– Sequential logic design quickly reduces to
combinational logic design
Sequential logic design (cont.)
00
1
Q1Q0 I1
I1 = Q1’Q0a + Q1a’ + Q1Q0’
0 1
1
1
010
00 11 10 a 01
0
0
0
1 0 1
1
00 01 11 a
1
10 I0 Q1Q0
I0 = Q0a’ + Q0’a0
1
0 0
0
1
1
0
0
00 01 11 10
x = Q1Q0
x
0
1
0
a
Q1Q0
E) Minimized Output Equations F) Combinational Logic
a
Q1 Q0
I0
I1
x
Custom single-purpose processor Custom single-purpose processor basic modelbasic model
controller and datapath
controller datapath
…
…
externalcontrolinputs
externalcontrol outputs
…
externaldata
inputs
…
externaldata
outputs
datapathcontrolinputs
datapathcontroloutputs
… …
a view inside the controller and datapath
controllercontroller datapathdatapath
… …
stateregister
next-stateand
controllogic
registers
functionalunits
Example:Example: greatest common divisor
GCD
(a) black-box (a) black-box viewview
x_i y_i
d_o
go_i
0: int x, y;1: while (1) {2: while (!go_i);3: x = x_i; 4: y = y_i;5: while (x != y) {6: if (x < y) 7: y = y - x; else 8: x = x - y; }9: d_o = x; }
(b) desired functionality(b) desired functionality
y = y -x7: x = x - y8:
6-J:
x!=y
5: !(x!=y)
x<y !(x<y)
6:
5-J:
1:
1
!1
x = x_i3:
y = y_i4:
2:
2-J:
!go_i
!(!go_i)
d_o = x
1-J:
9:
(c) state (c) state diagramdiagram
• First create algorithm
• Convert algorithmConvert algorithm to “complex” state machine– Known as FSMD: finite-
state machine with datapath
– Can use templates to perform such conversion
State diagram templates
Assignment statement
a = bnext statement
a = b
next statement
Loop statement
while (cond) { loop-body-
statements}next statement
loop-body-statements
cond
next statement
!cond
J:
C:
Branch statement
if (c1) c1 stmtselse if c2 c2 stmtselse other stmtsnext statement
c1
c2 stmts
!c1*c2 !c1*!c2
next statement
othersc1 stmts
J:
C:
Creating the datapath
• Create a register for any declared variable
• Create a functional unit for each arithmetic operation
• Connect the ports, registers and functional units– Based on reads and writes
– Use multiplexors for multiple sources
• Create unique identifier – for each datapath component
control input and output
y = y -x7: x = x - y8:
6-J:
x!=y
5: !(x!=y)
x<y !(x<y)
6:
5-J:
1:
1
!1
x = x_i3:
y = y_i4:
2:
2-J:
!go_i
!(!go_i)
d_o = x
1-J:
9:
subtractor subtractor
7: y-x8: x-y5: x!=y 6: x<y
x_i y_i
d_o
0: x 0: y
9: d
n-bit 2x1 n-bit 2x1x_sel
y_sel
x_ld
y_ld
x_neq_y
x_lt_y
d_ld
<
5: x!=y
!=
Datapath
Creating the controller’s FSM
• Same structure as FSMD
• Replace complex actions/conditions with datapath configurations
y = y -x7: x = x - y8:
6-J:
x!=y
5: !(x!=y)
x<y !(x<y)
6:
5-J:
1:
1
!1
x = x_i3:
y = y_i4:
2:
2-J:
!go_i
!(!go_i)
d_o = x
1-J:
9:
y_sel = 1y_ld = 1
7: x_sel = 1x_ld = 1
8:
6-J:
x_neq_y
5:!x_neq_y
x_lt_y !x_lt_y
6:
5-J:
d_ld = 1
1-J:
9:
x_sel = 0x_ld = 13:
y_sel = 0y_ld = 14:
1:1
!1
2:
2-J:
!go_i
!(!go_i)
go_i
0000
0001
0010
0011
0100
0101
0110
0111 1000
1001
1010
1011
1100
Controller
subtractor subtractor
7: y-x8: x-y5: x!=y 6: x<y
x_i y_i
d_o
0: x 0: y
9: d
n-bit 2x1 n-bit 2x1x_sel
y_sel
x_ld
y_ld
x_neq_y
x_lt_y
d_ld
<
5: x!=y
!=
Datapath
SplittingSplitting into a controller and datapath
y_sel = 1y_ld = 1
7: x_sel = 1x_ld = 1
8:
6-J:
x_neq_y=1
5:x_neq_y=0
x_lt_y=1 x_lt_y=0
6:
5-J:
d_ld = 1
1-J:
9:
x_sel = 0x_ld = 13:
y_sel = 0y_ld = 14:
1:1
!1
2:
2-J:
!go_i
!(!go_i)
go_i
0000
0001
0010
0011
0100
0101
0110
0111 1000
1001
1010
1011
1100
ControllerController implementation model
y_sel
x_selCombinational
logic
Q3 Q0
State register
go_i
x_neq_y
x_lt_y
x_ld
y_ld
d_ld
Q2 Q1
I3 I0I2 I1
subtractor subtractor
7: y-x8: x-y5: x!=y 6: x<y
x_i y_i
d_o
0: x 0: y
9: d
n-bit 2x1 n-bit 2x1x_sel
y_sel
x_ld
y_ld
x_neq_y
x_lt_y
d_ld
<
5: x!=y
!=
(b) Datapath
Controller state tableController state table for the GCD example
Inputs Outputs
Q3 Q2 Q1 Q0 x_neq_y
x_lt_y
go_i I3 I2 I1 I0 x_sel y_sel x_ld y_ld d_ld
0 0 0 0 * * * 0 0 0 1 X X 0 0 0
0 0 0 1 * * 0 0 0 1 0 X X 0 0 0
0 0 0 1 * * 1 0 0 1 1 X X 0 0 0
0 0 1 0 * * * 0 0 0 1 X X 0 0 0
0 0 1 1 * * * 0 1 0 0 0 X 1 0 0
0 1 0 0 * * * 0 1 0 1 X 0 0 1 0
0 1 0 1 0 * * 1 0 1 1 X X 0 0 0
0 1 0 1 1 * * 0 1 1 0 X X 0 0 0
0 1 1 0 * 0 * 1 0 0 0 X X 0 0 0
0 1 1 0 * 1 * 0 1 1 1 X X 0 0 0
0 1 1 1 * * * 1 0 0 1 X 1 0 1 0
1 0 0 0 * * * 1 0 0 1 1 X 1 0 0
1 0 0 1 * * * 1 0 1 0 X X 0 0 0
1 0 1 0 * * * 0 1 0 1 X X 0 0 0
1 0 1 1 * * * 1 1 0 0 X X 0 0 1
1 1 0 0 * * * 0 0 0 0 X X 0 0 0
1 1 0 1 * * * 0 0 0 0 X X 0 0 0
1 1 1 0 * * * 0 0 0 0 X X 0 0 0
1 1 1 1 * * * 0 0 0 0 X X 0 0 0
Completing the GCD custom single-purpose processor design
• We finished the datapath• We have a state table for
the next state and control logic– All that’s left is
combinational logic design
• This is not an optimized design, but we see the basic steps
… …
a view inside the controller and datapath
controller datapath
… …
stateregister
next-stateand
controllogic
registers
functionalunits
You may be asked in homeworks or exams or projects to optimize the design with some respect such as area, speed , power or testability
• We often start with a state machine– Rather than algorithm
– Cycle timing often too central to functionality
• Example– Bus bridge that converts 4-bit
bus to 8-bit bus
– Start with FSMD
– Known as register-transfer (RT) level
– Exercise: complete the design
RT-level custom single-purpose processor design – Example “Bus Bridge”
Pro
blem
Spe
cifi
cati
on
BridgeA single-purpose processor that
converts two 4-bit inputs, arriving one at a time over data_in along with a
rdy_in pulse, into one 8-bit output on data_out along with a rdy_out pulse.
Sen
de r
data_in(4)
rdy_in rdy_out
data_out(8)
Receiver
clock
FS
MD
WaitFirst4 RecFirst4Startdata_lo=data_in
WaitSecond4
rdy_in=1
rdy_in=0
RecFirst4End
rdy_in=1
RecSecond4Startdata_hi=data_in
RecSecond4End
rdy_in=1rdy_in=0
rdy_in=1
rdy_in=0
Send8Startdata_out=data_hi
& data_lordy_out=1
Send8Endrdy_out=0
Bridge
rdy_in=0Inputs rdy_in: bit; data_in: bit[4];Outputs rdy_out: bit; data_out:bit[8]Variables data_lo, data_hi: bit[4];
RT-level custom single-purpose processor design (cont’)
WaitFirst4 RecFirst4Startdata_lo_ld=1
WaitSecond4
rdy_in=1
rdy_in=0
RecFirst4End
rdy_in=1
RecSecond4Startdata_hi_ld=1
RecSecond4End
rdy_in=1rdy_in=0
rdy_in=1
rdy_in=0
Send8Startdata_out_ld=1
rdy_out=1
Send8Endrdy_out=0
(a) Controller
rdy_in rdy_out
data_lodata_hi
data_in(4)
(b) Datapath
data_out
data_out_ld data_hi_ld data_lo_ld
clk
to all registers
data_out
Bridge
Example “Bus Bridge”
Optimizing single-purpose processors
• Optimization is the task of making design metric values the best possible
• Optimization opportunities– original program
– FSMD
– datapath
– FSM
Optimizing the original program
• Analyze program attributes and look for areas of possible improvement– number of computations
– size of variable
– time and space complexity
– operations used• multiplication and division very expensive
Optimizing the original program (cont’)
0: int x, y;1: while (1) {2: while (!go_i);3: x = x_i; 4: y = y_i;5: while (x != y) {6: if (x < y) 7: y = y - x; else 8: x = x - y; }9: d_o = x; }
0: int x, y, r; 1: while (1) { 2: while (!go_i); // x must be the larger number 3: if (x_i >= y_i) { 4: x=x_i; 5: y=y_i; } 6: else { 7: x=y_i; 8: y=x_i; } 9: while (y != 0) {10: r = x % y;11: x = y; 12: y = r; }13: d_o = x; }
original program optimized program
replace the subtraction operation(s) with modulo
operation in order to speed up program
GCD(42, 8) - 9 iterations to complete the loop
x and y values evaluated as follows : (42, 8), (43, 8), (26,8), (18,8), (10, 8), (2,8), (2,6), (2,4), (2,2).
GCD(42,8) - 3 iterations to complete the loop
x and y values evaluated as follows: (42, 8), (8,2), (2,0)
Optimizing the FSMD
• Areas of possible improvements– merge states
• states with constants on transitions can be eliminated, transition taken is already known
• states with independent operations can be merged
– separate states• states which require complex operations (a*b*c*d) can be broken
into smaller states to reduce hardware size
– scheduling
Optimizing the FSMD (cont.)
int x, y;
2:
go_i !go_i
x = x_iy = y_i
x<y x>y
y = y -x x = x - y
3:
5:
7: 8:
d_o = x9:
y = y -x7: x = x - y8:
6-J:
x!=y
5: !(x!=y)
x<y !(x<y)
6:
5-J:
1:
1
!1
x = x_i
y = y_i4:
2:
2-J:
!go_i
!(!go_i)
d_o = x
1-J:
9:
int x, y;
3:
original FSMD optimized FSMD
eliminate state 1 – transitions have constant values
merge state 2 and state 2J – no loop operation in between them
merge state 3 and state 4 – assignment operations are independent of one another
merge state 5 and state 6 – transitions from state 6 can be done in state 5
eliminate state 5J and 6J – transitions from each state can be done from state 7 and state 8, respectively
eliminate state 1-J – transition from state 1-J can be done directly from state 9
Optimizing the datapath
• Sharing of functional units– one-to-one mapping, as done previously, is not necessary
– if same operation occurs in different states, they can share a single functional unit
• Multi-functional units– ALUs support a variety of operations, it can be shared
among operations occurring in different states
Optimizing the FSM
• State encoding– task of assigning a unique bit pattern to each state in an FSM
– size of state register and combinational logic vary
– can be treated as an ordering problem
• State minimization– task of merging equivalent states into a single state
• state equivalent if for all possible input combinations the two states generate the same outputs and transitions to the next same state
44
Technology mappingTechnology mapping• Library of gates available for implementation
– Simple• only 2-input AND,OR gates
– Complex• various-input AND,OR,NAND,NOR,etc. gates
• Efficiently implemented meta-gates (i.e., AND-OR-INVERT,MUX)
• Final structure consists of specified library’s components only
• If technology mapping integrated with logic synthesis– More efficient circuit
– More complex problem
– Heuristics required
45
Complexity impact on user
• As complexity grows, heuristics used
• Heuristics differ tremendously among synthesis tools– Computationally expensive
• Higher quality results
• Variable optimization effort settings
• Long run times (hours, days)
• Requires huge amounts of memory
• Typically needs to run on servers, workstations
– Fast heuristics• Lower quality results
• Shorter run times (minutes, hours)
• Smaller amount of memory required
• Could run on PC
• Super-linear-time (i.e. n3) heuristics usually used– User can partition large systems to reduce run times/size
– 1003 > 503 + 503 (1,000,000 > 250,000)
46
Integrating logic design and physical design
• Past– Gate delay much greater than wire delay
– Thus, performance evaluated as # of levels of gates only
• Today– Gate delay shrinking as feature size
shrinking
– Wire delay increasing• Performance evaluation needs wire length
– Transistor placement (needed for wire length) domain of physical design
– Thus, simultaneous logic synthesis and physical design required for efficient circuits
Wire
Transistor
Del
ay
Reduced feature size
Embedded Systems Embedded Systems CaseCaseStudyStudy
47
Elevator Controller
48
Elevator System
• CRC cardsCRC cards is a well-known method for analyzing a system and developing an architecture.
• CRCCRC– Classes: logical groupings of data and functionality
– Responsibilities: describe what the class do
– Collaborators: other classes w/ which a given class works
• Elevator Control ClassesElevator Control Classes– Elevator car, Passenger, Floor control, Car control, Car sensors, etc.
• Architectural ClassesArchitectural Classes– Car state, Floor control reader, Car control reader, Car control sender,
Scheduler
49
50
F floorsF floors
N hoistwaysN hoistways
51
52
53
54
55
Classes: logical groupings of data and functionality
Responsibilities: describe what the class do
Collaborators: other classes w/ which a given class works
Elevator Control ClassesElevator Control Classes
Elevator car, Passenger, Floor control, Car control, Car sensors, etc.
Architectural ClassesArchitectural Classes
Car state, Floor control reader, Car control reader, Car control sender, Scheduler
Physical Physical InterfacesInterfaces
56
Architecture
• Computation and I/O occur at:– Floor control panels/displays– Elevator cars– System controller
• Panels Controller
• Car Controller– read buttons and send events to system controller– read sensor inputs and send to system controller
57
System ControllerSystem Controller• Must take inputs from many sources:
• Must control cars to hard real-time deadlines
• User interface, scheduling are soft deadlines
• Testing– Build an elevator simulator using SystemC,
Verilog, VHDL and FPGA• Simulate multiple elevators• Simulate real-time control demands
58
Homework 2Homework 2• The simplest possible custom single-purpose processor
– Design a processor to multiply two numbers. The initial data are in registers/counters A and B. The result should be in register/counter C.
– You have only reversible counters (with reading) to be used in the data path. – The counters perform the following operations:
• Add one• Subtract one• Read new value
– Invent the algorithm for multiplication. Use minimum number of counters– Design the reversible counter by hand using logic gates and D FFs.– Design the control unit– Design the data path– Draw the timing diagram of the whole system.– You can use VHDL or Verilog to help you, but I need your design by hand.
Summary
• Custom single-purpose processors– Straightforward design techniques
– Can be built to execute algorithms
– Typically start with FSMD
– CAD tools can be of great assistance
Questions to Exams (1)
1. What are the main methods of Combinational logic design?2. What is Mealy FSM (Finite State Machine)?3. What is Moore State Machine?4. Think about a robot controller as a Sequential logic Circuit. What are the
blocks and their role?5. Role of abstraction in FSM design. Give examples.6. Explain the concepts from Gajski’s Chart in a Custom single-purpose
processor design7. RT-level custom single-purpose processor design. Explain briefly all design
stages from bottom of design hierarchy (layout) to the top (system design of a GCD processor as an example)
8. List and explain logic gates.9. List and explain combinational blocks.10. List and explain sequential blocks.11. List and explain sensors to be used with embedded systems of FSM type.12. List and explain actuators to be used with such embedded systems.
Questions to Exams (2)
1. What are the main synthesis processes and CAD tools in Combinational logic design?
2. What are the methods to solve the covering problem?3. Explain the concept of search and give examples.4. Explain the concept of heuristic in search and give examples. SOP
minimization can be very useful. Also ESOP.5. Explain design tradeoffs and Pareto Optimization on one practical example.6. Explain in detail on example the basic synthesis method for Mealy FSM from
specification to a circuit from D type flip-flops (FFs) and logic gates.7. Explain and illustrate how D, T and JK flip-flops work.8. What is a difference between
• Register with enable• Register without enable• Reversible register
9. Draw the schematic of the FSMD.10. Explain GCD algorithm of Euclides on examples.11. Without looking to the slides, convert GCD algorithm to a FSMD. 12. How can we optimize GCD?13. Apply these ideas to Least Common Multiplier algorithm and FSMD for two
numbers.
Questions to Exams (3)
1. The role of GO-TO commands in FSMD design. Are they good or bad? Give examples. The role of structured design of FSMD.
2. How the data path is created from FSMD? This is one of main topics for this whole class. You have to know it well.
3. How CU (Control Unit) is created from FSMD? This is one of main topics for this whole class. You have to know it well.
4. Compare state graph, state transition table and flow-chart. Why we need all of them?
5. In this class we are not optimizing combinational logic or FSMs too much. But if you have taken ECE 572 or ECE 573 classes you know many methods to optimize on these levels. Can you give practical examples of these optimizations in GCD or other similar system?
6. Complete the “Bus bridge” FSMD that converts 4-bit bus to 8-bit bus and is given in these slides.
7. Discuss Optimizing the single-purpose processors. Give examples. Explain levels of optimization, such as the original program, the FSMD, the data path, the CU, the register, the combinational logic, finally the technology mapping.
8. Design the complete elevator system for a villa of a crazy millionaire artist from Hollywood. Cost does not count. You have to amaze his guests.
64
Sources
Slides from S. Mohammadi
Vahid, Siamak Mohammadi Givargis and Marwedel
•EECE 353-1
•Real-Time Systems
•T. John Koo
•Embedded Computing Systems Laboratory
•Institute for Software Integrated Systems
•Department of Electrical Engineering and Computer Science
•Vanderbilt University
•5306 Stevenson Center
•January 16, 2006