Upload
vokhue
View
215
Download
1
Embed Size (px)
Citation preview
EECS 150 - Components and Design Techniques for Digital Systems
Lec 18 – Arithmetic II (Multiplication)
David CullerElectrical Engineering and Computer Sciences
University of California, Berkeley
http://www.eecs.berkeley.edu/~cullerhttp://www-inst.eecs.berkeley.edu/~cs150
Review• Circuit design for unsigned addition
– Full adder per bit slice– Delay limited by Carry Propagation
» Ripple is algorithmically slow, but wires are short
• Carry select– Simple, resource-intensive– Excellent layout
• Carry look-ahead– Excellent asymptotic behavior– Great at the board level, but wire length effects are significant on chip
• Digital number systems– How to represent negative numbers– Simple operations– Clean algorithmic properties
• 2s complement is most widely used– Circuit for unsigned arithmetic– Subtract by complement and carry in– Overflow when cin xor cout of sign-bit is 1
Computer Number Systems
• Positional notation– Dn-1 Dn-2 …D0 represents Dn-1Bn-1 + Dn-2Bn-2 + …+ D0 B0
where Di ∈∈∈∈ { 0, …, B-1 }
• 2s Complement– Dn-1 Dn-2 …D0 represents: - Dn-12n-1 + Dn-22n-2 + …+ D0 20
– MSB has negative weight
0000
0001
0010
0011
1000
0101
0110
0100
1001
1010
1011
1100
1101
0111
11101111
+0
+1
+2
+3
+4
+5
+6
+7-8
-7
-6
-5
-4
-3
-2
-1
-8 + 5
2s Complement Overflow
��������������� ����������������������� ����
��������������� ���������������������� ����
5 + 3 = -8! -7 - 2 = +7!
0000
0001
0010
0011
1000
0101
0110
0100
1001
1010
1011
1100
1101
0111
11101111
+0
+1
+2
+3
+4
+5
+6
+7-8
-7
-6
-5
-4
-3
-2
-1
0000
0001
0010
0011
1000
0101
0110
0100
1001
1010
1011
1100
1101
0111
11101111
+0
+1
+2
+3
+4
+5
+6
+7-8
-7
-6
-5
-4
-3
-2
-1
How can you tell an overflow occurred?
2s comp. Overflow Detection
�
�
��
��������������
�������
��� ����
��
��
�
���������� ����
�������
���������
�
�
�
��������������
�������
�������
��
��
��
��������������
��� ����
����� ����
� ������ � ������
� ��������� � ���������
������������� ���������������������������������
2s Complement Adder/Subtractor
��� � � ���!�"�� #� ���!�� �!��
A B
CO
S
+ CI
A B
CO
S
+ CI
A B
CO
S
+ CI
A B
CO
S
+ CI
0 1
Add/Subtract
A 3 B 3 B 3
0 1
A 2 B 2 B 2
0 1
A 1 B 1 B 1
0 1
A 0 B 0 B 0
Sel Sel Sel Sel
S 3 S 2 S 1 S 0
Overflow
Adders on the Xilinx Virtex• Dedicated carry logic
provides fast arithmetic carry capability for high-speed arithmetic functions. The Virtex-E CLB supports two separate carry chains, one per Slice. The height of the carry chains is two bits per CLB.
• The arithmetic logic includes an XOR gate and AND gate that allows a 2-bit full adder to be implemented within a slice.
• Cin to Cout delay = 0.1ns, versus 0.4ns for F to X delay. How do we map a 2-bit adder to one slice?
Carry Look-ahead Adders
• In general, for n-bit addition best we can achieve is
delay αααα log(n)
• How do we arrange this? (think trees)• First, reformulate basic adder stage:
carry “kill”ki = ai’ bi’
carry “propagate”pi = ai ⊕ bi
carry “generate”gi = ai bi
ci+1 = gi + picisi = pi ⊕ ci
0 0 0 0 00 0 1 0 10 1 0 0 10 1 1 1 01 0 0 0 11 0 1 1 01 1 0 1 01 1 1 1 1
a b ci ci+1 s
Carry Look-ahead Adders – in blocks
• “Group” propagate and generate signals:
• P true if the group as a whole propagates a carry to cout
• G true if the group as a whole generates a carry
• Group P and G can be generated hierarchically.
pi
gi
pi+1
gi+1
pi+k
gi+k
P = pi pi+1 … pi+kG = gi+k + pi+kgi+k-1 + … + (pi+1pi+2 … pi+k)gi
cin
cout
Cout = G + PCin
Carry Look-ahead Adders
a0b0a1b1a2b2
a
a3b3a4b4a5b5
b
c3 = Ga + Pac0
Pa
Ga
Pb
Gb
a6b6a7b7a8b8
c
c6 = Gb + Pbc3
Pc
Gc
P = PaPbPc
G = Gc + PcGb + PbPcGa
c9 = G + Pc0
c0
9-bit Example of hierarchically generated P and G signals:
Parallel Prefix (generalizing CLA)
• Compute all the prefixes Fi = Fi-1 op Fi-2 op … op F0• Assume associative and commutative
B ABA
BAx
BAxAx
01234567
76 54 32 10
10
10
32
32
54
54
76
76
6 4 2 0
74 30
3074
54 10
70
30
54 10
74 64 30 20
30
30
70 60 50 40
c0
a0b0s0
a1b1s1
c1
a2b2s2
a3b3s3
c3
c2
c0
c0
a4b4s4
a5b5s5
c5
a6b6s6
a7b7s7
c7
c6
c0
c4
c0
c8
p,g
P,G
P,G
cin
cout
P,GPa,Ga
Pb,Gb
P = PaPbG = Gb + GaPb
Cout = G + cinP
aibisi
p,g
ci
ci+1
p = a ⊕ bg = ab
s = p ⊕ ci
ci+1 = g + cip
8-bit Carry Look-ahead Adder
Time / Space (resource) Trade-offs
• Carry select and CLA utilize more silicon to reduce time.
• Can we use more time to reduce silicon?
• How few FAs does it take to do addition?
Bit-serial Adder
• Addition of 2 n-bit numbers:– takes n clock cycles,– uses 1 FF, 1 FA cell, plus registers– the bit streams may come from or go to other circuits, therefore
the registers may be optional.
• Requires controller– What does the FSM look like? Implemented?
• Final carry out?
• A, B, and R held in shift-registers. Shift right once per clock cycle.
• Reset is asserted by controller.
n-bit shift register
n-bit shift registers
sc
reset
R
FAFF
B
A
lsb
Announcements• Reading: 5.8• Regrades in with homework on Friday• Digital Design in the news – from UCB
– Organic e-textiles (Prof. Vivek Subramanian)
Basic concept of multiplication
� ����$���
� ������
�������"��#
�������"��#
����
����
����
����
%
�������� "�&�#
'���������� $��
• product of 2 n-bit numbers is an 2n-bit number– sum of n n-bit partial products
• unsigned
Combinational Multiplier:accumulation of partial products
��
��
�����
��
��
�����
�����
��
��
�����
�����
�����
��
��
�����
�����
�����
�����
�����
�����
�����
�����
����������
( ) ( � ( & ( � ( � ( � ( �( �
Array Multiplier
b3 0 b2 0 b1 0 b0 0
P7 P6 P5 P4
a0
0
a1
0
a2
0
a3
0
P0
P1
P2
P3
FA
bj sum in
sum out
carryout
ai
carryin
Each row: n-bit adder with AND gates
What is the critical path?
Generates all n partial products simultaneously.
“Shift and Add” Multiplier• Sums each partial
product, one at a time.
• In binary, each partial product is shifted versions of A or 0.
Control Algorithm:1. P ←←←← 0, A ←←←← multiplicand,
B ←←←← multiplier2. If LSB of B==1 then add A to P
else add 03. Shift [P][B] right 14. Repeat steps 2 and 3 n-1 times.5. [P][B] has product.
Bn-bit shift registers
P
An-bit register
+01
0 n-bit adder
• Cost αααα n, ΤΤΤΤ = n clock cycles.• What is the critical path for
determining the min clock period?
Carry-save Addition• Speeding up multiplication is
a matter of speeding up the summing of the partial products.
• “Carry-save” addition can help.
• Carry-save addition passes (saves) the carries to the output, rather than propagating them.
• Example: sum three numbers,310 = 0011, 210 = 0010, 310 = 0011
310 0011+ 210 0010
c 0100 = 410
s 0001 = 110
310 0011c 0010 = 210
s 0110 = 610
1000 = 810
carry-save add
carry-save add
carry-propagate add
• In general, carry-save addition takes in 3 numbers and produces 2.• Whereas, carry-propagate takes 2 and produces 1.• With this technique, we can avoid carry propagation until final addition
Carry-save Circuits
• When adding sets of numbers, carry-save can be used on all but the final sum.
• Standard adder (carry propagate) is used for final sum.
FA FA FA FA FA FA FA FA 0
CSA
s cs cs cs cs cs cs cs cc
CSA
CPA
CSA
CSA
x0
x1
x2
Array Mult. using Carry-save Additionb3 0 b2 0 b1 0 b0 0
P7 P6 P5 P4
a0
0
a1
a2
a3
P0
P1
P2
P3
1
0
0
0
0
0
0
0
000
FA
bj sum in
sum out
carryout
ai
carryin
Fast carry-propagate adder
Another Representation (from book)
A3 B0
SC
A2 B0
SC
A1 B0
SC
A0 B0
SC
A3 B1
SC
A2 B1
SC
A1 B1
SC
A0 B1
SC
A3 B2
SC
A2 B2
SC
A1 B2
SC
A0 B2
SC
A3 B3
SC
A2 B3
S
A1 B3
S
A0 B3
S
B0
B1
B2
B3
P7 P6 P5 P4 P3 P2 P1 P0
A3 A2 A1 A0
� ��������$*+�� ���������!����
&�,�&�����-����� ��������$*�
F A
X
Y
A B
S CI CO
Cin Sum In
Sum Out Cout
Add
CPA
Carry-save Addition
CSA is associative and communitive. For example:(((X0 + X1) + X2 ) + X3 ) = ((X0 + X1) +( X2 + X3 ))
• A balanced tree can be used to reduce the logic delay.
• This structure is the basis of the Wallace Tree Multiplier.
• Partial products are summed with the CSA tree. Fast CPA (ex: CLA) is used for final sum.
• Multiplier delay α log3/2N + log2NCSA
CPA
CSA
CSA
x0x1x2
CSA
CSACSA
x3x4x5x6x7
log3/2N
log2N
Signed Multiplier
Signed Multiplication:Remember for 2’s complement numbers MSB has negative weight:
ex: -6 = 110102 = 0•20 + 1•21 + 0•22 + 1•23 - 1•24
= 0 + 2 + 0 + 8 - 16 = -6
• Therefore for multiplication:a) subtract final partial productb) sign-extend partial products
• Modifications to shift & add circuit:a) adder/subtractorb) sign-extender on P shifter register
11
2
0
22 −−
−
=−=�
nn
iN
ii xxX
Signed multiplication
� ����$���
� ������
1101 (-3)
1011 (-5)
1101
11010
000000
1101000
*
"��#
• product of 2 n-bit numbers is an 2n-bit number– sum of n n-bit partial products
• unsigned
1111 Note: 2s complement
Sign extension111
00
1
+(-3)+
++
-
+(-6)
-(-24)
00001111
Signed Array Multiplierb3 0 b2 0 b1 0 b0 0
P7 P6 P5 P4
a0
0
a1
0
a2
0
a3
0
P0
P1
P2
P3
Implicit
Sign
extension
- - - -
“Shift and Add” Signed Multiplier
Bn-bit shift registers
P
An-bit register
+01
0 n-bit adder
• Signed extend partial product at each stage
• Final step is a subtract
Summary
• 2 complement number systems– Algebraic and corresponding bit manipulations– Overflow detection– Signficance of “sign bit” -2n-1
• Carry look ahead is form a parallel prefix• Time / Space tradeoffs
– Bit serial adder
• Binary Multiplication algorithm– Array multiplier– Serial multiply (with bit parallel adder)
• Signed multiplication– Sign extend multipicand– Sign bit of multiplier treated as subtract