High Performance FPGA-based Floating Point Adder with Three Inputs
Authors: A. Guntoro and M. Glesner
Institute of Microelectronic System
Conference: Field Programmable Logic and Applications (FPL), 2008
Presenter: Tareq Hasan Khan
ID: 11083577
ECE, U of S
Literature review-2 (EE 800)

Outline
IEEE 754 Standard
Floating point addition algorithm
Proposed three input floating point adder
  Overall architecture
  Brief description of each stage
Results
Conclusion

IEEE 754 Standard
Issued by IEEE in the year 1985
Covers different types of floating point format
  Single
  Double… etc
In radix-2, a floating point number can be written as
(-1)^s x 1.f x 2^e
where
  s = sign bit
  f = mantissa (fraction)
  e = exponent (stored in biased form)
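
As an illustration of the format above, the following Python sketch splits a single-precision number into its three fields and rebuilds the value from them. The function names are made up for this example, and the bias of 127 and the 23-bit fraction are the standard single-precision parameters; denormals, infinities and NaNs are not handled.

```python
import struct

def decode_single(x):
    """Return (sign, biased exponent, fraction) of x as an IEEE 754 single."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    biased_exp = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    return sign, biased_exp, fraction

def value_of(sign, biased_exp, fraction):
    """Evaluate (-1)^s x 1.f x 2^(e - 127) for a normalized number."""
    return (-1) ** sign * (1 + fraction / 2**23) * 2.0 ** (biased_exp - 127)
```

For example, -6.5 = -1.625 x 2^2, so its stored exponent is 2 + 127 = 129 and its fraction field encodes 0.625.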

Floating point addition algorithm
1. Calculate the exponent difference.
2. Align the mantissa by shifting the mantissa with the lower exponent to the right.
3. Add/sub both mantissas depending on the sign bits.
4. Perform the Leading-One Detection (LOD) to determine the location of the first logic one.
5. Normalize and round the result.
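
The five steps can be sketched in Python for the two-input case. This is a simplified model, not the hardware from the paper: operands are hypothetical (sign, exponent, mantissa) triples with unbiased exponents and a 24-bit mantissa that already includes the hidden bit, and rounding is replaced by plain truncation with no guard bits and no overflow handling.

```python
MANT_BITS = 24  # assumed width: 23 fraction bits plus the hidden bit

def fp_add(a, b):
    """Add two (sign, exponent, mantissa) triples following the five steps."""
    (sa, ea, ma), (sb, eb, mb) = a, b
    # 1. Exponent difference.
    diff = ea - eb
    # 2. Align: shift the mantissa with the lower exponent to the right.
    if diff >= 0:
        mb >>= diff
        e = ea
    else:
        ma >>= -diff
        e = eb
    # 3. Add or subtract depending on the sign bits.
    if sa == sb:
        m, s = ma + mb, sa
    elif ma >= mb:
        m, s = ma - mb, sa
    else:
        m, s = mb - ma, sb
    if m == 0:
        return (0, 0, 0)
    # 4. Leading-one detection: position of the most significant set bit.
    lead = m.bit_length() - 1
    # 5. Normalize so the leading one sits at bit MANT_BITS - 1 (truncating).
    shift = lead - (MANT_BITS - 1)
    m = m >> shift if shift > 0 else m << -shift
    return (s, e + shift, m)

def to_float(t):
    """Convert a triple back to a Python float for checking."""
    s, e, m = t
    return (-1) ** s * (m / 2 ** (MANT_BITS - 1)) * 2.0 ** e
```

With 1.5 encoded as `(0, 0, 0xC00000)`, adding it to itself yields `(0, 1, 0xC00000)`, i.e. 1.5 x 2^1 = 3.0: the sum overflows by one bit, the leading one is found at bit 24, and the normalization shifts right by one while incrementing the exponent.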

Proposed three input floating point adder architecture
Used in lifting based Discrete Wavelet Transform (DWT)
5 stage pipeline
Unique research

Stage 1
Mantissa comparator: compares the two mantissas Ma and Mb and latches both mantissas.
Zero logic: detects if the corresponding input is zero.
Exponent difference: computes the two differences between Ea and Eb (i.e., Ea − Eb and Eb − Ea).

Stage 2
Shift, swap, add guard block
  Shifts the mantissa with the smaller exponent to the right by the amount determined by the exponent selector block.
  Swaps the mantissas when (Ma < Mb and Ea = Eb) or (Ea < Eb) is true.
  The hidden bit and the guard bits are appended, resulting in fractions Fa and Fb.
  If a zero number is detected, the corresponding fraction is set to zero.
Exponent difference block computes the two differences between Ed and Ec.
Mc is latched in a register.
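
A behavioural Python sketch of how stages 1 and 2 might handle the operands a and b is given below. Several details are assumptions made for illustration and are not taken from the paper: the 23-bit stored mantissa, the three guard bits, the zero test (exponent and mantissa both zero), and the order of operations (append hidden and guard bits, swap, then shift). The function name `align_pair` is hypothetical.

```python
FRAC_BITS = 23   # assumed stored mantissa width
GUARD_BITS = 3   # assumed number of guard bits

def align_pair(ea, ma, eb, mb):
    """Return (Fa, Fb, larger exponent, swapped) for biased exponents and mantissas."""
    # Stage 1: mantissa comparison, zero detection, both exponent differences.
    ma_lt_mb = ma < mb
    a_zero = ea == 0 and ma == 0
    b_zero = eb == 0 and mb == 0
    d_ab, d_ba = ea - eb, eb - ea

    # Stage 2: the swap condition as stated on the slide.
    swap = (ma_lt_mb and ea == eb) or (ea < eb)
    # Select the non-negative difference as the shift amount.
    shift = d_ba if ea < eb else d_ab
    # Append the hidden bit and guard bits; zero inputs give a zero fraction.
    fa = 0 if a_zero else ((1 << FRAC_BITS) | ma) << GUARD_BITS
    fb = 0 if b_zero else ((1 << FRAC_BITS) | mb) << GUARD_BITS
    if swap:
        fa, fb = fb, fa
    # The fraction with the smaller exponent is shifted right.
    fb >>= shift
    return fa, fb, max(ea, eb), swap
```

Computing both differences in stage 1 means the shift amount can be picked with a multiplexer in stage 2 instead of negating a single difference, which is one plausible reason for the duplicated subtractor.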

Stage 3
Add/sub and shift
  The fractions Fa and Fb are added/subtracted depending on the sign difference (Sa XOR Sb), resulting in the fraction Fab.
  If the exponent Ec is greater than max(Ea, Eb), the result will be shifted to the right.
Shift and add guard
  Prepares the mantissa Mc. If Ec is less than max(Ea, Eb), Mc will be shifted right instead.
  The hidden bit and the guard bits are appended to Mc, resulting in fraction Fc.
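
The stage 3 behaviour can be modelled with a few lines of Python. The sketch assumes that Fa is not smaller than Fb after the stage 2 swap (so the subtraction never goes negative) and that Fc already carries its hidden and guard bits; the name `stage3` and the argument order are chosen for this example only.

```python
def stage3(fa, fb, sa, sb, e_ab, fc, ec):
    """Combine Fa and Fb, then align the result with Fc.

    e_ab stands for max(Ea, Eb). Returns (Fab, Fc, common exponent).
    """
    # Add when the signs agree, subtract when Sa XOR Sb is 1.
    fab = fa - fb if (sa ^ sb) else fa + fb
    if ec > e_ab:
        # Ec dominates: shift the intermediate result to the right.
        fab >>= ec - e_ab
    else:
        # max(Ea, Eb) dominates (or ties): shift Fc to the right instead.
        fc >>= e_ab - ec
    return fab, fc, max(e_ab, ec)
```

After this step both fractions refer to the same exponent, which is what allows stage 4 to add or subtract them directly.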

Stage 4
Operand swap and add/sub block
  Swaps the operands Fab and Fc if necessary (notice that both operands have the same exponent).
  Performs the addition or subtraction, which results in Fr.
Leading One Prediction (LOP) block
  Predicts the first occurrence of the "logic one" directly from the operands.
  A one-bit inaccuracy might occur, so it gives two values at the output.
Exponent adjustment block
  Prepares the dominant exponent by simply adding two to the largest exponent (i.e., max(Ea, Eb, Ec) + 2), because adding/subtracting three operands can increase the exponent by up to two.
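
The headroom of two can be checked numerically. Three normalized mantissas each lie in [1, 2), so their sum is below 6, which is below 8 = 2^3; the leading one can therefore move up by at most two bit positions. The Python sketch below measures that growth for integer mantissas that include the hidden bit; the 24-bit width and the function name are assumptions for illustration.

```python
WIDTH = 24  # assumed mantissa width including the hidden bit

def exponent_growth(fa, fb, fc, width=WIDTH):
    """Number of bit positions the leading one moves up when adding three aligned mantissas."""
    total = fa + fb + fc
    return total.bit_length() - width

# Worst case: three all-ones mantissas with equal exponents.
worst = exponent_growth(0xFFFFFF, 0xFFFFFF, 0xFFFFFF)
# Smallest normalized mantissas with equal exponents.
mild = exponent_growth(0x800000, 0x800000, 0x800000)
```

Here `worst` evaluates to 2 and `mild` to 1, so starting from max(Ea, Eb, Ec) + 2 and then only shifting left during normalization covers every case.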

Stage 5
The LOP error is corrected from Fr.
Normalization is basically a shift-left block with the amount given by the corrected LOP value.
The overflow and underflow detector verifies if the resulting fraction and exponent lie outside the floating-point range.
The rounding logic implements two rounding mechanisms: rounding to zero and rounding to nearest.
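
The normalization and the two rounding modes can be sketched in Python as follows. The sketch assumes that Fr fits in `width` bits (so normalization only ever shifts left), that at least one guard bit exists, and that "rounding to nearest" breaks ties toward the even value as in the IEEE 754 default; the slides do not state the tie rule. The carry out of the rounding increment is not renormalized here.

```python
def normalize(fr, width):
    """Shift fr left until its leading one sits at bit width - 1.

    Returns (normalized fraction, shift amount). Assumes fr fits in width bits.
    """
    if fr == 0:
        return 0, 0
    shift = width - fr.bit_length()
    return fr << shift, shift

def round_fraction(f, guard_bits, mode):
    """Drop the guard bits using mode 'zero' (truncate) or 'nearest' (ties to even)."""
    kept = f >> guard_bits
    if mode == "zero":
        return kept
    dropped = f & ((1 << guard_bits) - 1)
    half = 1 << (guard_bits - 1)
    if dropped > half or (dropped == half and kept & 1):
        kept += 1
    return kept
```

In this model the shift amount returned by `normalize` plays the role of the corrected LOP value and would be subtracted from the exponent prepared in stage 4.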

Result
Xilinx Virtex2 XC2V2000-5
Xilinx Virtex2Pro XC2VP30-7
Config. Format: exponent–mantissa–guard

Result
Slice usage
  Slightly higher compared to Malik, but still lower compared to the IP core.
Operating speeds
  Higher than both the IP core and Malik on most of the target devices.
  About 19% speed gain can be achieved on Virtex2Pro and 22% on Virtex2 compared to Malik.
Addition of three floating-point numbers
  The architectures from IP core and Malik will consume at least twice as many slices and will have a 10-stage pipeline.

Conclusion
Design of a 3 input floating point adder
5 stage pipeline
Can be operated on Xilinx Virtex2 XC2V2000-5 and Virtex2Pro XC2VP30-7 at 105 MHz and 143 MHz respectively.

Thanks