6
ELECTRICAL AND COMPUTER ENGINEERING DEPARTMENT, OAKLAND UNIVERSITY Digital Library: Arithmetic Cores RECRLAB@OU 1 Daniel Llamocca Divider Implementation ALGORITHM The division of two unsigned integer numbers (where is the dividend and the divisor), results in a quotient and a remainder . These quantities are related by =×+. For the implementation, we follow the hand-division method. We grab bits of A one by one and comparing it with the divisor. If the result is greater or equal than B, then we subtract B from it. On each iteration, we get one bit of Q. Fig. 1 shows the algorithm as well as an example: A = 10001100; B = 1001 For hardware implementation, we consider restoring dividers (i.e., those that keep the actual residue value at every step). SUBTRACTION OF UNSIGNED NUMBERS REPRESENTED WITH BITS: = This point deserves special attention as the divider hardware relies on a result obtained here. We usually determine the sign of the subtraction by sign-extending and so that they are in 2’s complement representation with +1 bits. Then, we do: = + () + 1, where = −1 −2 0 , and determines the sign of the subtraction result. However, when and are unsigned, we can compute () without sign-extending . We then analyze = : - If =1→≥ (and is equal to −1 −2 0 , i.e. it is an unsigned number with bits) - If =0→< (here is NOT equal to −1 −2 0 ) NOTE ABOUT THE 2’S COMPLEMENT OF ZERO Let be a number in 2’s complement with bits: = −1 −2 0 , where = − −1 2 −1 +∑ 2 −2 =0 is the signed decimal value of . The 2’s complement of is given by: = () + 1. = −1 −2 0 If and are thought as -bit unsigned numbers, i.e.: =∑ 2 −1 =0 ,=∑ 2 −1 =0 then: =2 . What if =0? Here =2 requires +1 bits. Why is not zero? This is actually consistent with 2’s complement arithmetic, as in the operation : − = + () + 1, we let hold the value of 1, so that if =0, then () = 11 … 11 and = 1. This way, () + 1 is properly represented. Fig. 2 shows this operation. Note that with = 1, all carries (from 0 to ) are one. The result of the operation is then Q. There is no overflow as = −1 =0. Thus, the case =0 works very well for 2’s complement operations, if we include let carry the value of 1. COMPUTING WITH bits = −1 −2 0 ,= −1 −2 0 . With , unsigned, we have 0 ≤ , ≤ 2 −1 To do , we sign-extend and to +1 bits turning them into two numbers in 2’s complement representation. The sign-extension actually amounts to zero-extending. Then: = 0 −1 −2 0 , = 0 −1 −2 0 . = =0. In 2’s complement, we have that: 0 ≤ , ≤ 2 −1. It follows that: −(2 − 1) ≤ − ≤ 2 −1. Thus can be represented in 2’s complement with +1 bits (as expected). Let = () + 1, = −1 −2 0 . In unsigned representation, =2 +1 . Fig. 3 shows the operation by using: +, where = () + 1. Recall that we let 1 be held by . Note that if = 0→=2 +1 (here is represented by the second operator as well as = 1) Figure 1. Division Algorithm 1 q n-1 q n-2 q n-3 ...q 0 + 1 1 1 ...1 q n-1 q n-2 q n-3 ...q 0 c in =c 0 c n =1 c n-1 =1 Q: P: Figure 2. Q-A when A=0 00001111 10001100 1001 10001 1001 10000 1001 1110 1001 101 1001 A B Q R ALGORITHM R = 0 for i = n-1 downto 0 left shift R (input = a i ) if R B q i = 1, R R-B else q i = 0 end end

ELECTRICAL AND COMPUTER ENGINEERING …llamocca/dig_library/arith/Divider...ELECTRICAL AND COMPUTER ENGINEERING DEPARTMENT, OAKLAND UNIVERSITY Digital Library: Arithmetic Cores RECRLAB@OU

Embed Size (px)

Citation preview

ELECTRICAL AND COMPUTER ENGINEERING DEPARTMENT, OAKLAND UNIVERSITY Digital Library: Arithmetic Cores RECRLAB@OU

1 Daniel Llamocca

Divider Implementation

ALGORITHM The division of two unsigned integer numbers 𝐴

𝐵⁄ (where 𝐴 is the dividend and 𝐵 the divisor), results in a quotient 𝑄 and

a remainder 𝑅. These quantities are related by 𝐴 = 𝐵 × 𝑄 + 𝑅.

For the implementation, we follow the hand-division method. We grab bits of A one by one and comparing it with the divisor. If the result is greater or equal than B, then we subtract B from it. On each iteration, we get one bit of Q. Fig. 1 shows the algorithm as well as an example: A = 10001100; B = 1001

For hardware implementation, we consider restoring dividers (i.e., those that keep the actual residue value at every step).

SUBTRACTION OF UNSIGNED NUMBERS REPRESENTED WITH 𝑛 BITS: 𝑇 = 𝑅 − 𝐵 This point deserves special attention as the divider hardware relies on a result obtained here. We usually determine the sign of the subtraction by sign-extending 𝑅 and 𝐵 so that they are in 2’s complement representation

with 𝑛 + 1 bits. Then, we do: 𝑇 = 𝑅 + 𝑛𝑜𝑡(𝐵) + 1, where 𝑇 = 𝑡𝑛𝑡𝑛−1𝑡𝑛−2 … 𝑡0, and 𝑡𝑛 determines the sign of the subtraction

result. However, when 𝑅 and 𝐵 are unsigned, we can compute 𝑛𝑜𝑡(𝐵) without sign-extending 𝐵. We then analyze 𝑐𝑛 = 𝑐𝑜𝑢𝑡: - If 𝑐𝑛 = 1 → 𝑅 ≥ 𝐵 (and 𝑅 − 𝐵 is equal to 𝑡𝑛−1𝑡𝑛−2 … 𝑡0, i.e. it is an unsigned number with 𝑛 bits) - If 𝑐𝑛 = 0 → 𝑅 < 𝐵 (here 𝑅 − 𝐵 is NOT equal to 𝑡𝑛−1𝑡𝑛−2 … 𝑡0)

NOTE ABOUT THE 2’S COMPLEMENT OF ZERO Let 𝐴 be a number in 2’s complement with 𝑛 bits: 𝐴 = 𝑎𝑛−1𝑎𝑛−2 … 𝑎0, where 𝐴 = −𝑎𝑛−12𝑛−1 + ∑ 𝑎𝑖2𝑖𝑛−2

𝑖=0 is the signed decimal

value of 𝐴.

The 2’s complement of 𝐴 is given by: 𝑃 = 𝑛𝑜𝑡(𝐴) + 1. 𝑃 = 𝑝𝑛−1𝑝𝑛−2 … 𝑝0

If 𝑃 and 𝐴 are thought as 𝑛-bit unsigned numbers, i.e.: 𝐴 = ∑ 𝑎𝑖2𝑖𝑛−1𝑖=0 , 𝑃 = ∑ 𝑝𝑖2𝑖𝑛−1

𝑖=0 then: 𝑃 = 2𝑛 − 𝐴.

What if 𝐴 = 0? Here 𝑃 = 2𝑛 requires 𝑛 + 1 bits. Why 𝑃 is not zero? This is actually

consistent with 2’s complement arithmetic, as in the operation 𝑄 − 𝐴: 𝑄 − 𝐴 = 𝑄 + 𝑛𝑜𝑡(𝑃) + 1, we let 𝑐𝑖𝑛 hold the value of 1, so that if 𝐴 = 0, then

𝑛𝑜𝑡(𝐴) = 11 … 11 and 𝑐𝑖𝑛 = 1. This way, 𝑛𝑜𝑡(𝐴) + 1 is properly represented. Fig.

2 shows this operation. Note that with 𝑐𝑖𝑛 = 1, all carries (from 𝑐0 to 𝑐𝑁) are one. The result of the operation is then Q. There is no overflow as 𝑜𝑣𝑒𝑟𝑓𝑙𝑜𝑤 =𝑐𝑛𝑐𝑛−1 = 0. Thus, the case 𝐴 = 0 works very well for 2’s complement

operations, if we include let 𝑐𝑖𝑛 carry the value of 1.

COMPUTING 𝑹 − 𝑩 WITH 𝒏 bits

𝑅 = 𝑟𝑛−1𝑟𝑛−2 … 𝑟0, 𝐵 = 𝑏𝑛−1𝑏𝑛−2 … 𝑏0. With 𝑅, 𝐵 unsigned, we have 0 ≤ 𝑅, 𝐵 ≤ 2𝑛 − 1 To do 𝑅 − 𝐵, we sign-extend 𝑅 and 𝐵 to 𝑛 + 1 bits turning them into two numbers in 2’s complement representation. The

sign-extension actually amounts to zero-extending. Then: 𝑅 = 0𝑟𝑛−1𝑟𝑛−2 … 𝑟0, 𝐵 = 0𝑏𝑛−1𝑏𝑛−2 … 𝑏0. 𝑟𝑛 = 𝑏𝑛 = 0. In 2’s

complement, we have that: 0 ≤ 𝑅, 𝐵 ≤ 2𝑛 − 1. It follows that: −(2𝑛 − 1) ≤ 𝑅 − 𝐵 ≤ 2𝑛 − 1. Thus 𝑅 − 𝐵 can be represented

in 2’s complement with 𝑛 + 1 bits (as expected). Let 𝐾 = 𝑛𝑜𝑡(𝐵) + 1, 𝐾 = 𝑘𝑛𝑘𝑛−1𝑘𝑛−2 … 𝑘0. In unsigned representation, 𝐾 = 2𝑛+1 − 𝐵.

Fig. 3 shows the operation 𝑅 − 𝐵 by using: 𝑅 + 𝐾, where 𝐾 = 𝑛𝑜𝑡(𝐵) + 1. Recall that we let 1 be held by 𝑐𝑖𝑛. Note that if 𝐵 =0 → 𝐾 = 2𝑛+1 (here 𝐾 is represented by the second operator as well as 𝑐𝑖𝑛 = 1)

Figure 1. Division Algorithm

1

qn-1qn-2qn-3...q0 +

1 1 1 ...1

qn-1qn-2qn-3...q0

cin = c0

cn=1

cn-1=1

Q:

P:

Figure 2. Q-A when A=0

00001111

10001100

1001

10001

1001

10000

1001

1110

1001

101

1001 AB

Q

R

ALGORITHM

R = 0

for i = n-1 downto 0

left shift R (input = ai)

if R B

qi = 1, R R-B

else

qi = 0

end

end

ELECTRICAL AND COMPUTER ENGINEERING DEPARTMENT, OAKLAND UNIVERSITY Digital Library: Arithmetic Cores RECRLAB@OU

2 Daniel Llamocca

Now, we determine the value of 𝑘𝑛−1:

Case 𝐵 ≠ 0: 1 ≤ 𝐵 ≤ 2𝑛 − 1 → 2𝑛+1 − (2𝑛 − 1) ≤ 𝐾 ≤ 2𝑛+1 − 1 ∴ 2𝑛 + 1 ≤ 𝐾 ≤ 2𝑛+1 − 1. Thus, 𝑘𝑛 = 1 Case 𝐵 = 0: 𝐾 = 2𝑛+1. 𝐾 requires 𝑁 + 2 bits, with 𝑘𝑛+1 = 1, and 𝑘𝑛 = 0:

𝐾 𝑘𝑛𝑘𝑛−1𝑘𝑛−2 … 𝑘0 𝑘𝑛

𝐵 ≠ 0

(or 𝐵 > 0)

2𝑛 100…0

𝑘𝑛 = 1 2𝑛 + 1 100…1

… … 2𝑛+1 − 1 111…1

𝐵 = 0 2𝑛+1 1000…0 𝑘𝑛 = 0

Now, we consider 𝑅, 𝐵, and 𝐾 to represent unsigned integers.

𝑅 − 𝐵 ≡ 𝑅 + 𝐾 = ∑ 𝑟𝑖2𝑖

𝑛

𝑖=0

+ ∑ 𝑘𝑖2𝑖

𝑛

𝑖=0

= ∑ 𝑟𝑖2𝑖

𝑛−1

𝑖=0

+ 𝑘𝑛2𝑛 + ∑ 𝑘𝑖2𝑖

𝑛−1

𝑖=0

𝑅 + 𝐾 = 𝑅 + 2𝑛+1 − 𝐵 = ∑ 𝑟𝑖2𝑖

𝑛−1

𝑖=0

+ 2𝑛+1 − ∑ 𝑏𝑖2𝑖

𝑛−1

𝑖=0

𝑅 − 𝐵 < 0:

Since 𝑅 ≥ 0 → 𝐵 > 0 → 𝑘𝑛 = 1

𝑅 + 2𝑛+1 − 𝐵 = ∑ 𝑟𝑖2𝑖𝑛−1𝑖=0 + 2𝑛+1 − ∑ 𝑏𝑖2𝑖𝑛−1

𝑖=0 < 2𝑛+1

𝑅 + 𝐾 = ∑ 𝑟𝑖2𝑖𝑛−1𝑖=0 + 𝑘𝑛2𝑛 + ∑ 𝑘𝑖2𝑖𝑛−1

𝑖=0 < 2𝑛+1 → ∑ 𝑟𝑖2𝑖𝑛−1𝑖=0 + ∑ 𝑘𝑖2𝑖𝑛−1

𝑖=0 < 2𝑛

o The (𝑛 + 1)-bit sum (considering the operation as unsigned) of R and K is lower than 2𝑛+1. Then, there is no overflow

in the (𝑛 + 1)- bit unsigned sum. Thus 𝑐𝑛+1 = 0.

o The 𝑛-bit sum (considering the operations as unsigned) of 𝑅 and 𝑘𝑛−1𝑘𝑛−2 … 𝑘0 is lower than 2𝑛. Thus, there is no

overflow of the 𝑛-bit unsigned sum. Thus 𝑐𝑛 = 0.

𝑅 − 𝐵 ≥ 0:

𝑅 + 2𝑛+1 − 𝐵 = ∑ 𝑟𝑖2𝑖𝑛−1𝑖=0 + 2𝑛+1 − ∑ 𝑏𝑖2𝑖𝑛−1

𝑖=0 ≥ 2𝑛+1

𝑅 + 𝐾 = ∑ 𝑟𝑖2𝑖𝑛−1𝑖=0 + 𝑘𝑛2𝑛 + ∑ 𝑘𝑖2𝑖𝑛−1

𝑖=0 ≥ 2𝑛+1 → ∑ 𝑟𝑖2𝑖𝑛−1𝑖=0 + ∑ 𝑘𝑖2𝑖𝑛−1

𝑖=0 ≥ 2𝑛+1 − 𝑘𝑛2𝑛

o The (𝑛 + 1)-bit sum (considering the operation as unsigned) of R and K is greater or equal than 2𝑛+1. Then, there is

overflow of the (𝑛 + 1)-bit unsigned sum. Thus 𝑐𝑛+1 = 1. o For the n-bit sum of R and 𝑘𝑛−1𝑘𝑛−2 … 𝑘0, we have two cases:

𝐵 > 0 → 𝑘𝑛 = 1. Then ∑ 𝑟𝑖2𝑖𝑛−1𝑖=0 + ∑ 𝑘𝑖2𝑖𝑛−1

𝑖=0 ≥ 2𝑛+1 − 2𝑛 → ∑ 𝑟𝑖2𝑖𝑛−1𝑖=0 + ∑ 𝑘𝑖2𝑖𝑛−1

𝑖=0 ≥ 2𝑛

𝐵 = 0 → 𝑘𝑛 = 0. Then ∑ 𝑟𝑖2𝑖𝑛−1𝑖=0 + ∑ 𝑘𝑖2𝑖𝑛−1

𝑖=0 ≥ 2𝑛+1

In both cases, the n-bit sum (considering the operands as unsigned) of 𝑅 and 𝑘𝑛−1𝑘𝑛−2 … 𝑘0 is greater of equal than 2𝑛. So, there is overflow of the 𝑛-bit unsigned sum. Thus 𝑐𝑛 = 1 when 𝑅 ≥ 𝐵.

2’s complement operation 𝑅 − 𝐵 with 𝑛 + 1 bits: There is no overflow of the subtraction as 𝑐𝑛 = 𝑐𝑛−1. For 𝑅 − 𝐵 ≥ 0: The result 𝑇 = 𝑅 − 𝐵 is a positive number, thus 𝑇𝑛 = 0. Therefore 𝑡𝑛−1𝑡𝑛−2 … 𝑡0 contains 𝑅 − 𝐵 in unsigned

representation. In conclusion: 𝐼𝑓 𝑅 < 𝐵 → 𝑐𝑛 = 0. The 𝑛 bits 𝑇𝑛−1𝑇𝑛−2 … 𝑇0 DO NOT contain the result 𝑅 − 𝐵.

𝐼𝑓 𝑅 ≥ 𝐵 → 𝑐𝑛 = 1. The 𝑛 bits 𝑇𝑛−1𝑇𝑛−2 … 𝑇0 DO represent 𝑅 − 𝐵 in unsigned representation.

0rn-1rn-2...r0 -

0bn-1bn-2...b0

R:

B:

1

0rn-1rn-2...r0 +

1kn-1kn-2...k0

cin

R:

K:

Figure 3. Operation 𝑅 − 𝐵 ≡ 𝑅 + 𝐾 = 𝑅 + 𝑛𝑜𝑡(𝐵) + !

ELECTRICAL AND COMPUTER ENGINEERING DEPARTMENT, OAKLAND UNIVERSITY Digital Library: Arithmetic Cores RECRLAB@OU

3 Daniel Llamocca

RESTORING ARRAY DIVIDER FOR UNSIGNED INTEGERS 𝐴, 𝐵: positive integers in unsigned representation. 𝐴 = 𝑎𝑁−1𝑎𝑁−2 … 𝑎0 with 𝑁 bits, and 𝐵 = 𝑏𝑀−1𝑏𝑀−2 … 𝑏0 with 𝑀 bits, with

the condition that 𝑁 ≥ 𝑀. 𝑄 = 𝑞𝑢𝑜𝑡𝑖𝑒𝑛𝑡, 𝑅 = 𝑟𝑒𝑠𝑖𝑑𝑢𝑒. 𝐴 = 𝐵 × 𝑄 + 𝑅.

In this parallel implementation, the result of every stage is called the remainder 𝑅𝑖.

Fig. 4 depicts the parallel algorithm with 𝑁 stages. For each stage

𝑖, 𝑖 = 0, … , 𝑁 − 1, we have:

𝑅𝑖: output of stage 𝑖. Remainder after every stage. 𝑌𝑖: input of stage 𝑖. It holds the minuend.

For the next stage, we append the next bit of 𝐴 to 𝑅𝑖. This becomes

𝑌𝑖+1 (the minuend): 𝑌𝑖+1 = 𝑅𝑖&𝑎𝑁−1−𝑖 , 𝑖 = 0, … , 𝑁 − 1

At each stage 𝑖, the subtraction 𝑌𝑖 − 𝐵 is performed. If 𝑌𝑖 ≥ 𝐵 then

𝑅𝑖 = 𝑌𝑖 − 𝐵. If 𝑌𝑖 < 𝐵, then 𝑅𝑖 = 𝑌𝑖.

Stage 𝑌𝑖 Computation of 𝑅𝑖 # of

𝑅𝑖 bits

0 𝑌0 = 𝑎𝑁−1 𝑅0 = 𝑌0 − 𝐵, 𝑖𝑓 𝑌0 ≥ 𝐵 𝑅0 = 𝑌0, 𝑖𝑓 𝑌0 < 𝐵

1

1 𝑌1 = 𝑅0&𝑎𝑁−2 𝑅1 = 𝑌1 − 𝐵, 𝑖𝑓 𝑌1 ≥ 𝐵 𝑅1 = 𝑌1, 𝑖𝑓 𝑌1 < 𝐵

2

2 𝑌2 = 𝑅1&𝑎𝑁−3 𝑅2 = 𝑌2 − 𝐵, 𝑖𝑓 𝑌2 ≥ 𝐵 𝑅2 = 𝑌2, 𝑖𝑓 𝑌2 < 𝐵

3

… … … …

M-1 𝑌𝑀−1 = 𝑅𝑀−2&𝑎𝑀−𝑁 𝑅𝑀−1 = 𝑌𝑀−1 − 𝐵, 𝑖𝑓 𝑌𝑀−1 ≥ 𝐵 𝑅𝑀−1 = 𝑌𝑀−1, 𝑖𝑓 𝑌𝑀−1 < 𝐵

M

Since 𝐵 has 𝑀 bits, the operation 𝑌𝑖 − 𝐵 requires 𝑀 bits for both

operands. To maintain consistency, we let 𝑌𝑖 be represented with 𝑀 bits.

𝑅𝑖: output of each stage. For the first 𝑀 stages, 𝑅𝑖 requires 𝑖 + 1 bits. However, for consistency and clarity’s sake, since 𝑅𝑖 might be

the result of a subtraction, we let 𝑅𝑖 use M bits.

For stages 0 𝑡𝑜 𝑀 − 2:

𝑅𝑖 is always transferred onto the next stage. Note that we transfer

𝑅𝑖 with 𝑀 − 1 least significant bits. There is no loss of accuracy here since 𝑅𝑖 at most requires M-1 bits for stage M-2. We need 𝑅𝑖

with M-1 bits since 𝑌𝑖+1 uses 𝑀 bits.

Stages 𝑀 − 1 𝑡𝑜 𝑁 − 1:

Starting from stage 𝑀 − 1, 𝑅𝑖 requires 𝑀 bits. We also know that

the remainder requires at most 𝑀 bits (maximum value is 2𝑀 − 2).

So, starting from stage M-1 we need to transfer 𝑀 bits.

As 𝑌𝑖+1 now requires 𝑀 + 1 bits, we need 𝑀 + 1 units starting from stage 𝑀.

To implement the operation 𝑌𝑖 − 𝐵 we use a subtractor. When 𝑌𝑖 ≥ 𝐵 → 𝑐𝑜𝑢𝑡𝑖 = 1, and when 𝑌𝑖 < 𝐵 → 𝑐𝑜𝑢𝑡𝑖 = 0. This 𝑐𝑜𝑢𝑡𝑖

becomes a bit of the quotient: 𝑄𝑖 = 𝑐𝑜𝑢𝑡𝑁−1−𝑖. This quotient Q requires N bits at most.

Also, the final remainder is the result of the last stage. The maximum theoretical value of the remainder is 2𝑀 − 2, thus the remainder 𝑅 requires 𝑀 bits. 𝑅 = 𝑅𝑁−1.

Also, note that we should avoid a division by 0. If B=0, then, in our circuit: 𝑄 = 2𝑁 − 1 and R = 𝑎𝑀−1𝑎𝑀−2 … 𝑎0.

Figure 4. Parallel implementation algorithm

Y0

R0

...

...

Y1

R1

...

Y2

R2

...

Y3

RM-2

...

YM-1

...

...Stage 0

Stage 1

Stage 2

Stage 3

Stage M-1

RM-1

...

YM

Stage M

RM

...

YM+1

Stage M+1

RM+1

...

YM+2

Stage M+2

...

...

RN-2

...

YN-1

Stage N-1

RN-1

M bits

M+1 bits

ELECTRICAL AND COMPUTER ENGINEERING DEPARTMENT, OAKLAND UNIVERSITY Digital Library: Arithmetic Cores RECRLAB@OU

4 Daniel Llamocca

COMBINATIONAL ARRAY DIVIDER Fig. 5 shows the hardware of this array divider for N=8, M=4. Note that the first M=4 stages only require 4 units, while the next stages require 5 units. This is fully combinatorial implementation. Each level computes 𝑅𝑖. It first computes 𝑌𝑖 − 𝐵. When 𝑌𝑖 ≥ 𝐵 → 𝑐𝑜𝑢𝑡𝑖 = 1, and when 𝑌𝑖 < 𝐵 → 𝑐𝑜𝑢𝑡𝑖 = 0. This 𝑐𝑜𝑢𝑡𝑖 is used

to determine whether the next 𝑅𝑖 is 𝑌𝑖 − 𝐵 or 𝑌𝑖.

Each Processing Unit (PU) is used to process 𝑌𝑖 − 𝐵 one bit at a time, and to let a particular bit of either 𝑌𝑖 − 𝐵 or 𝑌𝑖 be

transferred on to the next stage.

FULLY PIPELINED ARRAY DIVIDER Fig. 6 shows the hardware core of the fully pipelined array divider with its inputs, outputs, and parameters.

PU PU PU PU

PU PU PU PU

PU PU PU PU

PU PU PU PU

PU PU PU PU

PU PU PU PU

PU PU PU

PU

PU

PU PU

b3 b2 b1 b0

a6

PU

PU

PU

PU PU

a7

0

000

a5

a4

a3

a2

a1

a0

q7 1

1

1

1

1

1

1

1

q6

q5

q4

q3

q2

q1

q0

r0r1r2r3

FA

b

cin

1 0

cout

a

s

r

x00

c00c03c04 c02 c01

c10c11c12c13c14

c21c22c23c24 c20

c32c33c34 c30c31

c43c44c45 c41c42 c40

c54c55 c52c53 c51 c50

c65 c63c64 c62 c61 c60

c70c74c75 c73 c72 c71

x01x02x03

x11x12x13 x10

x22x23 x20x21

x33 x30x31x32

x44 x41x42x43 x40

x50x52x53x54 x51

x61x63x64 x60x62

x72x74 x70x71x73

y03

y12 y11 y10y13

y23

y02 y01 y00

y22 y21 y20

y33 y32 y31 y30

y43y44 y42 y41 y40

y54 y53 y52 y51 y50

y64 y63 y62 y61 y60

y74 y73 y72 y71 y70

PU

Q

R

ARRAY

DIVIDER

M N

NA

B M

N

M

Figure 5. Fully Combinatorial Array Divider architecture for N=8, M=4

Figure 6. Fully pipelined IP core for the array divider

Q

R

v

ARRAY

DIVIDER

M N

NA

B

E

resetn

clock

M

N

M

ELECTRICAL AND COMPUTER ENGINEERING DEPARTMENT, OAKLAND UNIVERSITY Digital Library: Arithmetic Cores RECRLAB@OU

5 Daniel Llamocca

Fig. 7 shows the internal architecture of this pipelined array divider for N=8, M=4. Note that the first M=4 stages only require 4 units, while the next stages require 5 units. Note that the enable input ‘E’ is only an input to the shift register on the left, which is used to generate the valid output 𝑣. This way, valid outputs are readily signaled. If E=’1’, the output result is computed

in N cycles (and v=’1’ after N cycles).

Figure 7. Fully Pipelined Array Divider architecture for N=8, M=4

PU PU PU PU

PU PU PU PU

PU PU PU PU

PU PU PU PU

PU PU PU PU

PU PU PU PU

PU PU PU

PU

PU

PU PU

b3 b2 b1 b0 a6

PU

PU

PU

PU PU

a7

0

000 a5 a4 a3 a2 a1 a0

q7

1

1

1

1

1

1

1

1

q6 q5 q4 q3 q2 q1 q0 r0r1r2r3

x00

c00c03c04 c02 c01

c10c11c12c13c14

c21c22c23c24 c20

c32c33c34 c30c31

c43c44c45 c41c42 c40

c54c55 c52c53 c51 c50

c65 c63c64 c62 c61 c60

c70c74c75 c73 c72 c71

x01x02x03

x11x12x13 x10

x22x23 x20x21

x33 x30x31x32

x44 x41x42x43 x40

x50x52x53x54 x51

x61x63x64 x60x62

x72x74 x70x71x73

y03

y12 y11 y10y13

y23

y02 y01 y00

y22 y21 y20

y33 y32 y31 y30

y43y44 y42 y41 y40

y54 y53 y52 y51 y50

y64 y63 y62 y61 y60

y74 y73 y72 y71 y70

E

v

ELECTRICAL AND COMPUTER ENGINEERING DEPARTMENT, OAKLAND UNIVERSITY Digital Library: Arithmetic Cores RECRLAB@OU

6 Daniel Llamocca

ITERATIVE RESTORING DIVIDER Fig. 8 shows the iterative hardware architecture as well as the state machine. Here, 𝑅𝑖 is always held at register R. The subtractor

computes 𝑌𝑖 − 𝐵. This requires 𝑀 + 1 bits in the worst case.

If 𝑌𝑖 ≥ 𝐵 then 𝑅𝑖 = 𝑌𝑖 − 𝐵. Yi here is the minuend. 𝑌𝑖 − 𝐵 is loaded onto register R. Note that only M bits are needed.

If 𝑌𝑖 < 𝐵, then 𝑅𝑖 = 𝑌𝑖. Here only 𝑌𝑖 is loaded onto register R. This is done by just shifting 𝑎𝑁−1 into register R

Note that R requires M bits since it holds the remainder at every stage. Also, since we always shift 𝑐𝑜𝑢𝑡𝑖 onto register A, the

quotient Q is held at A in the last iteration.

LEFT SHIFT

REGISTER

LE w REGISTER

E

DA DB

-cout

Q

B

LEFT SHIFT

REGISTER

sclrL

Ew

MN

M

M+1

0&B

R

M

A

aN-1

M+1

0

M+1

M

RM-1RM-2...R0aN-1

RM-1RM-2...R0

E

FSMsclrRLR

ER

done

LAB

EA

MN

cout

cout

aN-1

M

Y

sclrR 1, ER1

C 0

S1

1

resetn=0

E0

ER 1, EA 1

S2

done1

S3

1cout

0

noC=N-1 C C+1

yes

LAB, EA 1

LR 1

E10

Figure 8. Iterative Divider