
Introduction to Integer Arithmetic

Suggested Reading

• Computer Arithmetic – Behrooz Parhami – Oxford University Press: pages 211-224 (Basic Division Schemes), pages 261-272 (Division by Convergence) and pages 345-356 (Square-Rooting Methods)

• Digital Computer Arithmetic – Joseph F. F. Cavanagh – McGraw-Hill

• Computer Arithmetic Simulator by Israel Koren

Topics

• Numeric Encodings
    Unsigned & Two’s complement

• Programming Implications
    C promotion rules

• Basic operations
    Addition, negation, multiplication

• Programming Implications
    Consequences of overflow
    Using shifts to perform power-of-2 multiply/divide

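Number Range

Decimal: $X = (x_{k-1} x_{k-2} \ldots x_1 x_0 . x_{-1} \ldots x_{-l})_{10}$, with $X_{min} = 0$ and $X_{max} = 10^k - 10^{-l}$

Binary: $X = (x_{k-1} x_{k-2} \ldots x_1 x_0 . x_{-1} \ldots x_{-l})_{2}$, with $X_{min} = 0$ and $X_{max} = 2^k - 2^{-l}$

Conventional fixed-radix number system: $X = (x_{k-1} x_{k-2} \ldots x_1 x_0 . x_{-1} \ldots x_{-l})_{r}$, with $X_{min} = 0$ and $X_{max} = r^k - r^{-l}$ (reached when all k + l digits equal r - 1)

Notation: $ulp = r^{-l}$, the unit in the least significant position (also called the unit in the last position).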

Representations of signed numbers

• Signed-magnitude

• Biased: for 4 bits, the range [-8, +7] is mapped onto [0, 15] (used for floating-point exponents)

• Complement
    Radix complement: two’s complement for r = 2 (the most used representation)
    Diminished-radix complement (digit complement): one’s complement for r = 2

4-bit values in each representation ("–" = not representable):

Value   Signed-magnitude   Biased   Two’s complement   One’s complement
 +7     0111               1111     0111               0111
 +6     0110               1110     0110               0110
 +5     0101               1101     0101               0101
 +4     0100               1100     0100               0100
 +3     0011               1011     0011               0011
 +2     0010               1010     0010               0010
 +1     0001               1001     0001               0001
  0     0000               1000     0000               0000
 -0     1000               –        –                  1111
 -1     1001               0111     1111               1110
 -2     1010               0110     1110               1101
 -3     1011               0101     1101               1100
 -4     1100               0100     1100               1011
 -5     1101               0011     1011               1010
 -6     1110               0010     1010               1001
 -7     1111               0001     1001               1000
 -8     –                  0000     1000               –
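The table can be regenerated with a few lines of C. This is a minimal sketch. The function names (enc_sign_magnitude, enc_biased, enc_twos, enc_ones) are illustrative, and each one returns the 4-bit pattern in the low bits of an unsigned int.

```c
#include <assert.h>

/* Signed-magnitude: sign bit, then |v|. Valid for -7..+7 (two zeros). */
unsigned enc_sign_magnitude(int v) {
    assert(v >= -7 && v <= 7);
    return v < 0 ? 0x8u | (unsigned)(-v) : (unsigned)v;
}

/* Biased (excess-8): store v + 8, mapping [-8, +7] onto [0, 15]. */
unsigned enc_biased(int v) {
    assert(v >= -8 && v <= 7);
    return (unsigned)(v + 8);
}

/* Two's complement (radix complement): v modulo 2^4. Valid for -8..+7.
   C defines int -> unsigned conversion as reduction modulo 2^N. */
unsigned enc_twos(int v) {
    assert(v >= -8 && v <= 7);
    return (unsigned)v & 0xFu;
}

/* One's complement (digit complement): a negative value is the bitwise
   complement of |v|. Valid for -7..+7 (two zeros). */
unsigned enc_ones(int v) {
    assert(v >= -7 && v <= 7);
    return v < 0 ? ~(unsigned)(-v) & 0xFu : (unsigned)v;
}
```

For example, -3 gives 1011, 0101, 1101 and 1100, which matches the -3 row of the table.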

Encoding Integers

    short int x = 15213;
    short int y = -15213;

In C, a short is 2 bytes long on typical platforms.

Unsigned: $B2U(X) = \sum_{i=0}^{w-1} x_i \cdot 2^i$

Two’s complement: $B2T(X) = -x_{w-1} \cdot 2^{w-1} + \sum_{i=0}^{w-2} x_i \cdot 2^i$, where $x_{w-1}$ is the sign bit

Sign bit: for two’s complement, the most significant bit indicates the sign
• 0 for nonnegative
• 1 for negative

     Decimal   Hex     Binary
x    15213     3B 6D   00111011 01101101
y    -15213    C4 93   11000100 10010011

Encoding Example (Cont.)

x = 15213: 00111011 01101101
y = -15213: 11000100 10010011

Weight    15213 (bit, value)   -15213 (bit, value)
1         1   1                1   1
2         0   0                1   2
4         1   4                0   0
8         1   8                0   0
16        0   0                1   16
32        1   32               0   0
64        1   64               0   0
128       0   0                1   128
256       1   256              0   0
512       1   512              0   0
1024      0   0                1   1024
2048      1   2048             0   0
4096      1   4096             0   0
8192      1   8192             0   0
16384     0   0                1   16384
-32768    0   0                1   -32768
Sum           15213                -15213
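The two weighted sums B2U and B2T translate directly into C. This is a minimal sketch. It assumes the w-bit pattern is passed in the low bits of a uint64_t and that w is at most 63. The names b2u and b2t are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* B2U: the low w bits of `bits` read as an unsigned number,
   the sum of x_i * 2^i for i = 0 .. w-1. */
uint64_t b2u(uint64_t bits, int w) {
    assert(w >= 1 && w <= 63);
    uint64_t sum = 0;
    for (int i = 0; i < w; i++)
        sum += ((bits >> i) & 1u) << i;            /* x_i * 2^i */
    return sum;
}

/* B2T: the same bits read as two's complement; the sign bit x_{w-1}
   carries weight -2^(w-1). */
int64_t b2t(uint64_t bits, int w) {
    assert(w >= 1 && w <= 63);
    int64_t sum = 0;
    for (int i = 0; i < w - 1; i++)
        sum += (int64_t)(((bits >> i) & 1u) << i); /* x_i * 2^i */
    if ((bits >> (w - 1)) & 1u)
        sum -= (int64_t)1 << (w - 1);              /* -x_{w-1} * 2^(w-1) */
    return sum;
}
```

For example, b2t(0xC493, 16) evaluates the -15213 column above. b2u(0xC493, 16) reads the same bits as 50323.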

Numeric Ranges

Unsigned values
• $UMin = 0$ (000…0)
• $UMax = 2^w - 1$ (111…1)

Two’s complement values
• $TMin = -2^{w-1}$ (100…0)
• $TMax = 2^{w-1} - 1$ (011…1)

Other values
• Minus 1 (-1): 111…1

Values for w = 16
        Decimal   Hex     Binary
UMax    65535     FF FF   11111111 11111111
TMax    32767     7F FF   01111111 11111111
TMin    -32768    80 00   10000000 00000000
-1      -1        FF FF   11111111 11111111
0       0         00 00   00000000 00000000

Values for Different Word Sizes

w       8       16        32                 64
UMax    255     65,535    4,294,967,295      18,446,744,073,709,551,615
TMax    127     32,767    2,147,483,647      9,223,372,036,854,775,807
TMin    -128    -32,768   -2,147,483,648     -9,223,372,036,854,775,808

Observations
• |TMin| = TMax + 1 (asymmetric range)
• UMax = 2 * TMax + 1

C programming
• #include <limits.h> (K&R App. B11)
• Declares constants, e.g., ULONG_MAX, LONG_MAX, LONG_MIN
• Values are platform-specific
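The formulas for UMax, TMax and TMin can be checked for any word size w. This sketch assumes 1 <= w <= 63, so that every value fits in a 64-bit type. The names umax_w, tmax_w and tmin_w are illustrative; they are not limits.h constants.

```c
#include <assert.h>
#include <stdint.h>

/* UMax = 2^w - 1 */
uint64_t umax_w(int w) { assert(w >= 1 && w <= 63); return ((uint64_t)1 << w) - 1; }

/* TMax = 2^(w-1) - 1 */
int64_t tmax_w(int w) { assert(w >= 1 && w <= 63); return ((int64_t)1 << (w - 1)) - 1; }

/* TMin = -2^(w-1) */
int64_t tmin_w(int w) { assert(w >= 1 && w <= 63); return -((int64_t)1 << (w - 1)); }
```

For w = 64, use the limits.h constants instead, e.g., ULONG_MAX on platforms where long is 64 bits.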

Unsigned & Signed Numeric Values

X      B2T(X)   B2U(X)
0000   0        0
0001   1        1
0010   2        2
0011   3        3
0100   4        4
0101   5        5
0110   6        6
0111   7        7
1000   -8       8
1001   -7       9
1010   -6       10
1011   -5       11
1100   -4       12
1101   -3       13
1110   -2       14
1111   -1       15

Equivalence: same encodings for nonnegative values.

Uniqueness: every bit pattern represents a unique integer value, and each representable integer has a unique bit encoding.

Casting Signed to Unsigned

C allows conversions from signed to unsigned:

    short int x = 15213;
    unsigned short int ux = (unsigned short) x;
    short int y = -15213;
    unsigned short int uy = (unsigned short) y;

Resulting value
• No change in bit representation
• Nonnegative values are unchanged: ux = 15213
• Negative values change into (large) positive values: uy = 50323 = -15213 + 2^16
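The value of uy can be derived without the cast: a negative 16-bit value gains 2^16 when it is reinterpreted as unsigned. This is a minimal sketch that assumes a 16-bit short. The name t2u16 is illustrative.

```c
#include <assert.h>

/* T2U for a 16-bit short, computed arithmetically: nonnegative values are
   unchanged and negative values gain 2^16. Returns long so that results
   such as 50323 fit even where int is only 16 bits. */
long t2u16(short x) {
    assert(sizeof(short) == 2);          /* sketch assumes 16-bit short */
    return x >= 0 ? (long)x : (long)x + 65536L;
}
```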

Signed vs. Unsigned in C

Constants
• By default, constants are considered to be signed integers
• They are unsigned if they have "U" as a suffix: 0U, 4294967259U

Casting
• Explicit casting between signed and unsigned is the same as U2T and T2U:

    int tx, ty;
    unsigned ux, uy;
    tx = (int) ux;
    uy = (unsigned) ty;

• Implicit casting also occurs via assignments and procedure calls:

    tx = ux;
    uy = ty;
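These rules have a practical consequence. When a signed int and an unsigned int appear in the same comparison, C converts the int to unsigned, so -1 is compared as UINT_MAX. Here is a small sketch; the names mixed_less and safe_less are illustrative.

```c
#include <assert.h>

/* s < u with s an int and u an unsigned int: the usual arithmetic
   conversions turn s into unsigned first, so -1 behaves like UINT_MAX. */
int mixed_less(int s, unsigned u) {
    return s < u;                       /* unsigned comparison */
}

/* Sign-aware version: a negative s is smaller than every unsigned u. */
int safe_less(int s, unsigned u) {
    return s < 0 || (unsigned)s < u;
}
```

Here mixed_less(-1, 0u) is 0 (false) and safe_less(-1, 0u) is 1. Many compilers can warn about the first form, e.g., GCC with -Wsign-compare.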

Sign Extension

Task: given a w-bit signed integer x, convert it to a (w+k)-bit integer with the same value.

Rule: make k copies of the sign bit:

$X' = x_{w-1}, \ldots, x_{w-1}, x_{w-1}, x_{w-2}, \ldots, x_0$ (the first k bits are copies of the MSB $x_{w-1}$)

[Figure: the w-bit word X is widened to the (w+k)-bit word X' by replicating its MSB k times.]

Sign Extension Example

Converting from a smaller to a larger integer data type: C automatically performs sign extension.

    short int x = 15213;
    int ix = (int) x;
    short int y = -15213;
    int iy = (int) y;

      Decimal   Hex           Binary
x     15213     3B 6D         00111011 01101101
ix    15213     00 00 3B 6D   00000000 00000000 00111011 01101101
y     -15213    C4 93         11000100 10010011
iy    -15213    FF FF C4 93   11111111 11111111 11000100 10010011
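The rule "make k copies of the sign bit" can be written with masks. This sketch targets 32-bit results. It works on uint32_t bit patterns so that every operation is well defined in C. The name sign_extend32 is illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low w bits of `bits` (1 <= w <= 32) to a 32-bit pattern:
   bit w-1 is copied into the k = 32 - w upper positions. */
uint32_t sign_extend32(uint32_t bits, int w) {
    assert(w >= 1 && w <= 32);
    if (w == 32)
        return bits;                               /* nothing to extend */
    uint32_t mask = ((uint32_t)1 << w) - 1u;       /* low w bits */
    bits &= mask;
    if ((bits >> (w - 1)) & 1u)                    /* sign bit is 1 */
        bits |= ~mask;                             /* k copies of the sign bit */
    return bits;
}
```

sign_extend32(0xC493, 16) produces the iy pattern FF FF C4 93 from the table above.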

Negating with Complement & Increment

Claim: the following holds for two’s complement: ~x + 1 == -x

Complement. Observation: ~x + x == 111…1 (binary) == -1

Increment. Adding (-x + 1) to both sides of the equation:
~x + x + (-x + 1) == -1 + (-x + 1)
~x + 1 == -x

Example (8 bits):
       x = 10010111
    + ~x = 01101000
      -1 = 11111111

Comp. & Incr. Examples

x = 15213
        Decimal   Hex     Binary
x       15213     3B 6D   00111011 01101101
~x      -15214    C4 92   11000100 10010010
~x+1    -15213    C4 93   11000100 10010011
y       -15213    C4 93   11000100 10010011

x = 0
        Decimal   Hex     Binary
0       0         00 00   00000000 00000000
~0      -1        FF FF   11111111 11111111
~0+1    0         00 00   00000000 00000000
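The same identity can be checked on 16-bit patterns. This sketch works on uint16_t, so the arithmetic is modulo 2^16 and avoids signed overflow in C. The name negate16 is illustrative. Negating TMin (0x8000) gives TMin again, a consequence of the asymmetric range.

```c
#include <assert.h>
#include <stdint.h>

/* Two's-complement negation of a 16-bit pattern: complement, then
   increment. The cast keeps the result modulo 2^16. */
uint16_t negate16(uint16_t x) {
    return (uint16_t)(~x + 1u);
}
```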

Unsigned Addition

Standard addition function: ignores the carry output.

Operands: w bits           u
                         + v
True sum: w+1 bits         u + v
Discard carry: w bits      UAdd_w(u, v)

This implements modular arithmetic: $UAdd_w(u, v) = (u + v) \bmod 2^w$.
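UAdd_w can be modelled by computing the (w+1)-bit true sum in a wider type and keeping only the low w bits. Here is a sketch for w = 16; the names uadd16 and uadd16_carry are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* UAdd_16: compute the 17-bit true sum, then discard the carry. */
uint16_t uadd16(uint16_t u, uint16_t v) {
    uint32_t true_sum = (uint32_t)u + v;   /* up to w + 1 = 17 bits */
    return (uint16_t)true_sum;             /* true_sum mod 2^16 */
}

/* The discarded carry-out: 1 exactly when u + v >= 2^16. */
int uadd16_carry(uint16_t u, uint16_t v) {
    return (int)(((uint32_t)u + v) >> 16);
}
```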

Class Exercise - 1

Suppose that you have a number (positive or negative) represented in two’s complement using a 4-bit word length.

Specify the steps necessary to perform fast division by 2 and obtain the correct result for both positive and negative numbers. Remember that arithmetic right shifts are shifts in which the sign bit is replicated into the vacated most significant position.

Think about 4-bit numbers in two’s complement, that is, numbers in the range [-8, +7].

Carry and Overflow Detection in Software for Two’s Complement Arithmetic

Detection in hardware is slightly different, because the result (sum) bit is not used. Only the operand sign bits and the carry into the sign position are used (equivalently, the carries into and out of the sign position).

CARRY (addition or subtraction)

Sign $A_{i-1}$   Sign $B_{i-1}$   Carry-in $Cyin_{i-1}$   Carry-out $Cyout_i$
0                0                0                       0
0                0                1                       0
0                1                0                       0
0                1                1                       1
1                0                0                       0
1                0                1                       1
1                1                0                       1
1                1                1                       1

$Cyout_i = A_{i-1} B_{i-1} + (A_{i-1} \oplus B_{i-1})\, Cyin_{i-1}$

OVERFLOW

Sign $A_{i-1}$   Sign $B_{i-1}$   Carry-in $Cyin_{i-1}$   Overflow $Ovf_i$
0                0                0                       0
0                0                1                       1
0                1                0                       0
0                1                1                       0
1                0                0                       0
1                0                1                       0
1                1                0                       1
1                1                1                       0

$Ovf_i = \overline{A_{i-1}}\,\overline{B_{i-1}}\,Cyin_{i-1} + A_{i-1} B_{i-1}\,\overline{Cyin_{i-1}}$ is the hardware method of detection.

The software method of detection: the operands have equal signs and the sign of the result differs from theirs.

It is possible to have a CARRY out and not have an OVERFLOW!
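Both detection methods can be compared on 4-bit operands. In this sketch, ovf_hw follows the hardware formula, using only the operand signs and the carry into the sign position. ovf_sw applies the software rule to the sum. The names add4 and Add4 are illustrative.

```c
#include <assert.h>

/* Result of a 4-bit two's-complement addition with its flags. */
typedef struct {
    unsigned sum;     /* 4-bit result */
    unsigned carry;   /* carry out of the sign position */
    unsigned ovf_hw;  /* overflow, hardware formula */
    unsigned ovf_sw;  /* overflow, software rule */
} Add4;

Add4 add4(unsigned a, unsigned b) {
    Add4 r;
    a &= 0xFu;
    b &= 0xFu;
    unsigned full = a + b;                                /* 5-bit true sum */
    unsigned sa = (a >> 3) & 1u, sb = (b >> 3) & 1u;      /* operand sign bits */
    unsigned cin = (((a & 0x7u) + (b & 0x7u)) >> 3) & 1u; /* carry into the sign bit */
    r.sum = full & 0xFu;
    r.carry = (full >> 4) & 1u;
    /* Hardware: Ovf = ~A ~B Cyin + A B ~Cyin (the sum bit is not used) */
    r.ovf_hw = ((!sa && !sb && cin) || (sa && sb && !cin)) ? 1u : 0u;
    /* Software: equal operand signs and a result sign that differs */
    unsigned sr = (r.sum >> 3) & 1u;
    r.ovf_sw = (sa == sb && sr != sa) ? 1u : 0u;
    return r;
}
```

For example, add4(0x7, 0xA), that is +7 + (-6), reports carry = 1 with no overflow. This is the "carry without overflow" case.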

Basic Operations in Two’s Complement: Addition and Subtraction Have the Same Treatment

A = -7; B = +8
A - B = A + (-B) = ? (4 bits)
          1001   (-7)
        + 1000   (-8)
(CY=1)    0001   (true result -15 is out of range)   (Borrow=0)
OVERFLOW

A = +7; B = +8
A - B = A + (-B) = ?
          0111   (+7)
        + 1000   (-8)
(CY=0)    1111   (-1)   (Borrow=1)
NO OVERFLOW

A = +7; B = +6
A - B = A + (-B) = ?
          0111   (+7)
        + 1010   (-6)
(CY=1)    0001   (+1)   (Borrow=0)
NO OVERFLOW

NOTES:

1 – Multi-word arithmetic (e.g., a 16-bit subtraction on an 8-bit microcontroller) demands arithmetic operations that use the CARRY flag.

2 – The hardware has only one flag, usually called CARRY, and the corresponding instructions are usually called ADDc and SUBc (or SUBnc).

3 – The examples above show that subtraction actually has to be SUBTRACT with BORROW: the borrow that propagates is the complement of the carry (Borrow = NOT CY).

4 – OVERFLOW (hardware detection)

Overflow occurs when both sign bits are zero and there is a carry from the previous bits, or when both sign bits are one and there is no carry from the previous bits:

$Ov = \overline{X_{n-1}}\,\overline{Y_{n-1}}\,CY_{n-1} + X_{n-1} Y_{n-1}\,\overline{CY_{n-1}}$

5 – OVERFLOW (software detection)

If both numbers have the same sign and the result of the addition has a different sign, then an overflow occurred!
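Note 1 can be illustrated by a 16-bit subtraction performed one byte at a time, as an 8-bit microcontroller would do it. The low bytes are subtracted first, and their borrow is fed into the subtraction of the high bytes. This is a C sketch that is not tied to any particular instruction set. The name sub16_bytewise is illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* 16-bit subtraction done in two 8-bit steps. */
uint16_t sub16_bytewise(uint16_t a, uint16_t b) {
    uint8_t a_lo = (uint8_t)a, a_hi = (uint8_t)(a >> 8);
    uint8_t b_lo = (uint8_t)b, b_hi = (uint8_t)(b >> 8);

    unsigned borrow = (a_lo < b_lo);                /* borrow out of the low byte */
    uint8_t lo = (uint8_t)(a_lo - b_lo);            /* low byte, modulo 2^8 */
    uint8_t hi = (uint8_t)(a_hi - b_hi - borrow);   /* subtract with borrow */

    return (uint16_t)(((uint32_t)hi << 8) | lo);
}
```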

Multiplication in Two’s Complement

It can easily be performed in software by a sequence of shift-and-add operations, but it takes many clock cycles: the number of clock cycles is directly proportional to the number of bits of the operands.

The software algorithm can be improved, as will be seen in the next slide.

If the microprocessor has a parallel combinational multiplier, multiplication can be done in one clock cycle for single-word operands, or in approximately 6 clock cycles for double-word operands. DSPs and other modern microprocessors have such a multiplier.

Example (unsigned 7-bit operands):

           1010001        (81)
         × 1101101        (109)
    --------------
           1010001
          0000000
         1010001
        1010001
       0000000
      1010001
   + 1010001
    --------------
    10001001111101        (8829)

Optimized Multiplication in Software (http://www.convict.lu/Jeunes/Math/Fast_operations.htm)

The algorithm is based on a particularity of binary notation.

Imagine multiplying the base-10 numbers $x_{10} = 7$ and $y_{10} = 5$:

$x_2 = 111$

$y_2 = 101$, which means $y_{10} = 1 \cdot 2^2 + 0 \cdot 2^1 + 1 \cdot 2^0 = 1 \cdot 100_2 + 0 \cdot 10_2 + 1 \cdot 1_2$

The distributive rule gives us:

111 * 101 = 111 * (1*100 + 0*10 + 1*1) = 111*(1*100) + 111*(0*10) + 111*(1*1)

The associative and commutative rules give us:

= (111*100)*1 + (111*10)*0 + (111*1)*1

In binary notation, multiplying by powers of 2 is equivalent to shifting the number:

= 11100*1 + 1110*0 + 111*1

= 11100 + 111 = 100011 = $35_{10}$

Thus a simple algorithm may be written for multiplication. To compute z = x * y:

    z := 0
    while y <> 0 do
        is the least significant bit of y 1?
            yes: z := z + x; no: continue;
        shift x one digit to the left;
        shift y one digit to the right;
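Here is the loop above in C. This sketch handles unsigned 8-bit operands and returns a 16-bit product, so no bits are lost. The name mul_shift_add is illustrative. Signed operands would need their signs handled separately.

```c
#include <assert.h>
#include <stdint.h>

/* Shift-and-add multiplication: add x to z whenever the LSB of y is 1,
   then shift x left and y right until y becomes 0. */
uint16_t mul_shift_add(uint8_t x8, uint8_t y8) {
    uint16_t x = x8, z = 0;
    uint8_t y = y8;
    while (y != 0) {
        if (y & 1u)
            z = (uint16_t)(z + x);       /* LSB of y is 1: add shifted x */
        x = (uint16_t)(x << 1);          /* shift x one digit left */
        y = (uint8_t)(y >> 1);           /* shift y one digit right */
    }
    return z;
}
```

mul_shift_add(7, 5) returns 35, as in the derivation above, and mul_shift_add(81, 109) returns 8829.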

Let's now analyze the function MULV8. A program calls it by preparing the temporary variables TEMPX and TEMPY, calling the function, and finally retrieving the product from the variable RESULT.

For example, if we want our program to compute z := x * y, in PIC assembler this becomes:

    MOVF   x,W
    MOVWF  TEMPX
    MOVF   y,W
    MOVWF  TEMPY
    CALL   MULV8
    MOVF   RESULT,W
    MOVWF  z

The multiplication time (number of cycles) depends on the bit pattern of the operands, in particular of the multiplier y, because the loop ends as soon as y becomes zero. For example, multiplying by 3 is quite fast: y becomes zero after two iterations and the routine leaves the loop.

Optimized Multiplication in Software – 8 bits

Here is what the computer will do:

• clrf means 'clear file' (in PIC terminology, a file is an 8-bit register)
• movf: 'transfer value from file to itself (F) or to the accumulator (W)'
• btfsc means 'skip next instruction if the designated bit is clear'
• bcf: 'bit clear in file'; BCF STATUS,C clears the carry flag
• rrf: 'rotate right file (through the carry flag) and store it to itself or the accumulator'
• rlf: 'rotate left file (through the carry flag) and store...'
• movlw: 'fill accumulator with literal value'
• movwf: 'transfer value from accumulator to file'
• btfss: 'skip next instruction if the designated bit is set'; STATUS,Z is the zero flag

MULV8       CLRF   RESULT          ; RESULT := 0
MULU8LOOP   MOVF   TEMPX,W         ; W := TEMPX
            BTFSC  TEMPY,0         ; if bit 0 of TEMPY is 1...
            ADDWF  RESULT,F        ; ...RESULT := RESULT + TEMPX
            BCF    STATUS,C        ; clear carry before rotating
            RRF    TEMPY,F         ; TEMPY := TEMPY >> 1
            BCF    STATUS,C
            RLF    TEMPX,F         ; TEMPX := TEMPX << 1
            MOVF   TEMPY,F         ; sets Z if TEMPY == 0
            BTFSS  STATUS,Z
            GOTO   MULU8LOOP       ; repeat while TEMPY != 0
            RETURN                 ; RESULT keeps only the low 8 bits

Multiplication for 16 bits

ADD16       MOVF   TEMPX16,W       ; add the low bytes
            ADDWF  RESULT16,F
            BTFSC  STATUS,C        ; carry out of the low byte?
            INCF   RESULT16_H,F    ; propagate it into the high byte
            MOVF   TEMPX16_H,W     ; add the high bytes
            ADDWF  RESULT16_H,F
            RETURN

MULV16      CLRF   RESULT16
            CLRF   RESULT16_H
MULU16LOOP  BTFSC  TEMPY16,0       ; LSB of y set?
            CALL   ADD16           ; RESULT16 := RESULT16 + TEMPX16
            BCF    STATUS,C
            RRF    TEMPY16_H,F     ; 16-bit right shift of y: high byte first,
            RRF    TEMPY16,F       ; the carry links the two bytes
            BCF    STATUS,C
            RLF    TEMPX16,F       ; 16-bit left shift of x: low byte first
            RLF    TEMPX16_H,F
            MOVF   TEMPY16,F       ; loop while the low byte of y...
            BTFSS  STATUS,Z
            GOTO   MULU16LOOP
            MOVF   TEMPY16_H,F     ; ...or the high byte of y is nonzero
            BTFSS  STATUS,Z
            GOTO   MULU16LOOP
            RETURN

Fixed-Point Arithmetic

Representation using two’s complement, with a sign bit S, m integer bits and n fractional bits:

• Integer part: $2^m - 1$ positive values and $2^m$ negative values
• Fractional part: nonzero fractions lie in $[2^{-n}, 1)$, with n bits
• Smallest fraction: $2^{-n}$; largest fraction: $1 - 2^{-n} \approx 1$

Let us suppose a fractional part with 10 bits.

The smallest fraction is $2^{-10} = 1/1024 \approx 0.0009765 \approx 0.001$.

The largest fraction is $1/2^1 + 1/2^2 + \ldots + 1/2^{10} = (2^{10} - 1)/2^{10} = 1023/1024 \approx 0.99902$.

Where is the position of the binary point? It depends on the application.

Word layout:  S | m integer bits | n fractional bits
Bit weights:  S   2^(m-1) … 2^1 2^0 . 2^(-1) 2^(-2) 2^(-3) … 2^(-n)
(fraction bits 0…01 give the smallest fraction; 1…11 give the largest)
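Converting between real values and this representation amounts to scaling by 2^n. This sketch assumes n is small (say n <= 16) and leaves range checking to the caller. to_fixed truncates toward minus infinity, which is what dropping the bits below 2^-n does. The names to_fixed and from_fixed are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Encode a real value with n fractional bits: raw = floor(value * 2^n). */
int32_t to_fixed(double value, int n) {
    assert(n >= 0 && n <= 16);
    double scaled = value * (double)(1L << n);
    int32_t t = (int32_t)scaled;      /* conversion truncates toward zero */
    if ((double)t > scaled)
        t -= 1;                       /* round negative values down (floor) */
    return t;
}

/* Decode: value = raw * 2^-n. */
double from_fixed(int32_t raw, int n) {
    assert(n >= 0 && n <= 16);
    return (double)raw / (double)(1L << n);
}
```

With n = 10, from_fixed(1, 10) is the smallest fraction 1/1024, and from_fixed(1023, 10) is the largest fraction 1023/1024.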

Class Exercise - 2

Let us suppose an application (software for construction engineers) that must represent on the screen objects as small as 1 mm or as large as a medium-size building. Consider that the computer has a word length of 16 bits.

Suppose also that the image used by the application can be rotated by very small angles (fractions of a degree, or radians) to give the illusion of continuous movement. The viewer can navigate inside the building as in a video game.

Devise and justify one possible fixed-point representation for this application that would satisfy the restrictions above.

Now suppose a flight simulator. What distances and object sizes can be represented using the same word partition?

Class-Exercise 3

Represent the following real numbers (in base 10) in two’s complement fixed-point arithmetic, with a total of eight bits, four of them in the fractional part, and perform the following operations:

A = +4.75 =
B = +2.375 =
A + B =
A - B =

WHAT IF?

A = +5.5 =
B = +7.5 =
A + B =

Addition and Subtraction Using Fixed-Point Arithmetic

Addition: everything happens as if the numbers being added were integers, and the integer unit of the ALU is used. Let us consider numbers in two’s complement with a total of 8 bits, 4 of them in the fractional part:

A = +4.75 = 0100.1100
B = +2.375 = 0010.0110
A + B = +7.125 = 0111.0010 (the carry propagates from the fractional part into the integer part)

A - B = A + (-B):

A = +4.75 = 0100.1100
-B = -2.375 = 1101.1010
A + (-B) = +2.375 = 0010.0110

WHAT IF?

A = +5.5 = 0101.1000
B = +7.5 = 0111.1000
A + B = +13.0, but the 8-bit sum is 1101.0000, which reads as -3.0: OVERFLOW! (the largest representable value is +7.9375)
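The examples on this slide can be reproduced in C by storing each value as its raw 8-bit pattern (value × 16). This is a sketch; the q44_* names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Values are 8-bit two's-complement patterns with 4 fractional bits. */

/* Addition is plain integer addition; the fraction carry simply
   propagates into the integer part. */
uint8_t q44_add(uint8_t a, uint8_t b) { return (uint8_t)(a + b); }

/* Negation for subtraction: A - B = A + (~B + 1). */
uint8_t q44_neg(uint8_t b) { return (uint8_t)(~b + 1); }

/* Software overflow rule: operands share a sign, result sign differs. */
int q44_add_overflows(uint8_t a, uint8_t b) {
    uint8_t s = q44_add(a, b);
    return ((a ^ b) & 0x80u) == 0 && ((a ^ s) & 0x80u) != 0;
}

/* Real value of a pattern, for checking results. */
double q44_value(uint8_t p) { return (p < 0x80u ? (int)p : (int)p - 256) / 16.0; }
```

q44_add(0x58, 0x78) returns 0xD0 (1101.0000), and q44_add_overflows flags it, matching the WHAT IF case.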

Class-Exercise 4

Represent the following numbers in two’s complement fixed-point arithmetic, with a total of eight bits, four of them in the fractional part, and obtain their product. The result has to fit into 8 bits using the same representation:

A = +2.75 =
B = +2.375 =
A * B =

Fixed-Point Multiplication

Fixed-point arithmetic, together with scaling, is used to deal with integer and fractional values.

Overflow can occur and has to be treated by software.

Multiplication example:

A = +2.75 = 0010.1100
B = +2.375 = 0010.0110
A * B = ?

NOTES: if x = (0000 0000 . 0000 0011), a small number, then $x^2$ → UNDERFLOW!
If y = (0101 0000 . 0000 0000), a large number, then $y^2$ → OVERFLOW!

[Figure: A and B are 16-bit words (bits 15..0), each split into Integer | Fraction. Their 32-bit product (A * B)32 (bits 31..0) is split into Integer H | Integer L | Fraction H | Fraction L. SCALING back to 16 bits keeps (A * B)16 = Integer L | Fraction H. Significant bits lost from Integer H cause overflow; bits lost from Fraction L cause underflow.]
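The multiplication example can be worked out the same way. The raw product of two numbers with 4 fractional bits has 8 fractional bits, so it is shifted right by 4 to return to the original format. This sketch uses raw signed values (value × 16). The names q44_mul_scaled and q44_fits are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply two raw values with 4 fractional bits. The full product has
   8 fractional bits; dividing by 2^4 with rounding toward minus infinity
   (like an arithmetic right shift) returns it to 4 fractional bits. */
int32_t q44_mul_scaled(int8_t a, int8_t b) {
    int32_t p = (int32_t)a * b;                     /* 8 fractional bits */
    return p >= 0 ? p / 16 : -((-p + 15) / 16);     /* floor(p / 2^4) */
}

/* Overflow check: the scaled product must fit in 8-bit two's complement. */
int q44_fits(int32_t raw) { return raw >= INT8_MIN && raw <= INT8_MAX; }
```

q44_mul_scaled(44, 38), that is 2.75 × 2.375, returns 104 = 0110.1000 = 6.5. The exact product is 6.53125; the low bits are lost by truncation.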

Losing Precision Because of Truncation

Consider an application that uses two’s complement with a word length of 16 bits: a sign bit, 5 bits for the integer part and 10 bits for the fractional part.

Consider that we have to multiply a distance of 5.675 meters by the cosine of 45° (0.707).

The correct value should be: 5.675 * 0.707 = 4.012

A = 5.675 ≈ 000101.1010110011 (16 bits; truncated to 5811/1024 ≈ 5.6748)

B = 0.707 ≈ 000000.1011010011 (16 bits; truncated to 723/1024 ≈ 0.7061)

A * B = 000000000100.00000001101110001001 (32 bits, 20 fractional bits)

A * B = 000100.0000000110 (scaled to 16 bits)

A * B ≈ 4.006

Because of truncation (of the operands and of the product), the result differs by about 0.16%.
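The numbers on this slide can be checked with a short C sketch of the 16-bit format (1 sign, 5 integer and 10 fractional bits). FRAC_BITS, q10_from, q10_mul and q10_to_double are illustrative names, and only nonnegative operands are handled.

```c
#include <assert.h>
#include <stdint.h>

#define FRAC_BITS 10   /* 16-bit word: 1 sign + 5 integer + 10 fractional bits */

/* Truncate a nonnegative real value to this format (raw = value * 2^10). */
int32_t q10_from(double v) { return (int32_t)(v * (1 << FRAC_BITS)); }

/* Multiply: the 32-bit product has 20 fractional bits; shifting right by
   10 truncates it back to 10. Nonnegative operands only. */
int32_t q10_mul(int32_t a, int32_t b) {
    assert(a >= 0 && b >= 0);
    return (int32_t)(((int64_t)a * b) >> FRAC_BITS);
}

double q10_to_double(int32_t raw) { return raw / (double)(1 << FRAC_BITS); }
```

q10_from(5.675) = 5811, q10_from(0.707) = 723 and q10_mul(5811, 723) = 4102, i.e., 4102/1024 ≈ 4.006.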

Does the Order of Computation Matter?

Let us consider a = 48221, b = 51324 and c = 33600. We want to add these three values and scale the result by 1/1024, discarding a 10-bit fractional part. Should we scale first and add afterwards, or vice versa?

Scaling operands before performing computations:

Result1 = int(a/1024) + int(b/1024) + int(c/1024) = 47 + 50 + 32 = 129

Scaling after adding the three values:

Result2 = int((a + b + c)/1024) = int(133145/1024) = 130

Result2 is more accurate than Result1 (the exact value is 133145/1024 ≈ 130.02)!

CONCLUSION: sometimes we have to change the order of the operations to obtain better results.
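Here are both orders of evaluation in C, with scaling done as a right shift by 10. This is a sketch; the function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Scale each operand by 2^-10 (drop 10 fractional bits), then add. */
uint32_t scale_then_add(uint32_t a, uint32_t b, uint32_t c) {
    return (a >> 10) + (b >> 10) + (c >> 10);
}

/* Add at full precision, then drop the 10 fractional bits once. */
uint32_t add_then_scale(uint32_t a, uint32_t b, uint32_t c) {
    return (a + b + c) >> 10;
}
```

On a 16-bit processor the add-first version needs a wider accumulator. a + b + c = 133145 does not fit in 16 bits, which is why the sketch uses uint32_t.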

Some Comments About Arithmetic Operations on Embedded Systems

Many arithmetic operations, e.g., trigonometric functions and division, can be sped up by using tables.

Software tricks (which would be very hard to implement in hardware) can be used to speed up arithmetic operations in embedded systems.

One of these techniques is to analyse the operands and decide how many loop iterations to perform.

Floating-Point x Fixed-Point

Floating-point provides a large dynamic range and fixed-point does not. What about precision?

A floating-point co-processor is very convenient for the programmer, who does not have to worry about data ranges, the alignment of the binary point, or overflow and underflow detection. In fixed-point, the programmer has to worry about all these problems.

However, a floating-point unit demands considerable silicon area and power, which is at odds with low-power embedded devices. In fact, most DSP processors have avoided floating-point units because of these restrictions.

When there is no floating-point unit, floating-point arithmetic and most trigonometric functions have to be implemented in software.

Class Exercise 5

Consider the following problem, where numbers are in two’s complement, with 8 bits in total and 4 bits in the fractional part:

A = 7.5
B = 0.25
C = 6.25

We want to compute (A + B + C) / 3.25.

1 – Can I perform the addition and then divide the result? Does the result of the addition fit into the word length? What can I do?

What if:

A = 0.25
B = 0.5
C = 0.125

and we want to compute (A + B + C) / 1.25 * 6.0?

2 – What happens if I perform the operations from left to right? What can I do to avoid losing significant bits during my operations?

37

Block Floating Point Operations - IBlock Floating Point Operations - I Block Floating Point Provides Some of the Benefits of Floating Point Block Floating Point Provides Some of the Benefits of Floating Point

Representation, but by Representation, but by Scaling Blocks of Numbers Rather than each Individual Scaling Blocks of Numbers Rather than each Individual NumberNumber..

Block Floating Point Numbers are Represented by the Full Word Length of a Block Floating Point Numbers are Represented by the Full Word Length of a Fixed Point Number.Fixed Point Number.

If Any One of a Block of Numbers Becomes Too Large for the Available Word If Any One of a Block of Numbers Becomes Too Large for the Available Word Length, the Programmer Scales Down all the Numbers in the Block, by Shifting Length, the Programmer Scales Down all the Numbers in the Block, by Shifting Them to the Right. Example with word length = 8 bits. In the example, variables A, Them to the Right. Example with word length = 8 bits. In the example, variables A, B, and C are the result of some computation and bits 8 and 9 do not fit into the B, and C are the result of some computation and bits 8 and 9 do not fit into the original word length (overflow). original word length (overflow). To continue use them and maintain their relative To continue use them and maintain their relative values, they have to be scaled as a group (as a block), and undo the scaling values, they have to be scaled as a group (as a block), and undo the scaling operation later.operation later. Example: Example: A =A = 1010 | 0001 . 1000| 0001 . 1000

B = B = 0000 | 0110 . 0000 | 0110 . 0000C = C = 0101 | 1001 . 0001 | 1001 . 0001

Similarly, if the Largest of a Block of Numbers is Small, the Programmer Scales Similarly, if the Largest of a Block of Numbers is Small, the Programmer Scales up all the Numbers in the Block to Use the Full Available word length of the up all the Numbers in the Block to Use the Full Available word length of the Mantissa. Example with word length = 8 bits. A, B and C are the result of some Mantissa. Example with word length = 8 bits. A, B and C are the result of some previous computation (where there was an underflow – in yellow). previous computation (where there was an underflow – in yellow). If we scale up If we scale up the block of variables we do not loose the least significant bits. We have to undo the block of variables we do not loose the least significant bits. We have to undo the scale up later to bring the result to its proper domain.the scale up later to bring the result to its proper domain.

Example:
    7654   3210
A = 0000 . 0100 | 1101
B = 0000 . 0010 | 1000
C = 0000 . 0011 | 1100


Block Floating Point Operations - II

This approach is used to make the most of the mantissa of the operands and also to minimize the loss of significant bits during arithmetic operations with scaling (truncation).

EXAMPLE: Normalize operands (left shift until MSB = 1)

[Figure: operand values before and after normalization, with the sign bits (S) marked and the resulting shared exponent]


Example

16-bit word processor. After converting our floating-point representation into a fixed-point representation, suppose that we have:

A = 5000   B = 9000   C = 8000

Suppose that we have to perform: (-B + SQRT(B * B - 4 * A * C)) / (2 * A)

The intermediate results B*B = 81,000,000 and 4*A*C = 160,000,000 are too big to fit into a 16-bit word. However, the final result is expected to fit into a 16-bit word. (With these particular values the discriminant B*B - 4*A*C is negative, so they only illustrate the magnitudes involved.)

Thus, we can use a block floating-point representation: shift all the data by the same amount and then perform the operations.

A = (5000 >> 10) (divide by $2^{10}$ = 1024), thus exponent = 10
B = (9000 >> 10)
C = (8000 >> 10)

which gives A = 4, B = 8 and C = 7.

After the operation, the result is shifted back by the number of bits specified by the exponent.
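The block scaling above can be sketched in C. This is a minimal, hypothetical example (the names isqrt16 and block_root are not from the slides), assuming non-negative operands, one shared right shift e for A, B and C, and every intermediate required to fit a signed 16-bit word. Because the slide's values give a negative discriminant, the usage below assumes A = 1000, B = 9000, C = 8000 instead. The exponent bookkeeping follows the operations: products add exponents (2e), the square root halves them (back to e), and the quotient subtracts them, so for this particular formula the scale factors cancel.

```c
#include <assert.h>
#include <stdint.h>

/* Bit-by-bit integer square root for 0 <= v <= INT16_MAX (result <= 181). */
static int32_t isqrt16(int32_t v)
{
    int32_t y = 0;
    for (int32_t bit = 128; bit != 0; bit >>= 1) {
        int32_t t = y + bit;              /* tentatively set this bit */
        if (t * t <= v)
            y = t;                        /* keep it only if t*t does not exceed v */
    }
    return y;
}

/* Hypothetical helper: (-B + sqrt(B*B - 4*A*C)) / (2*A) with A, B, C
   block-scaled by one shared exponent e (all shifted right by e bits).
   Returns 0 if an intermediate does not fit a 16-bit word, the
   discriminant is negative, or the scaled A is 0; otherwise returns 1. */
int block_root(int32_t a, int32_t b, int32_t c, int e, int16_t *root)
{
    assert(a >= 0 && b >= 0 && c >= 0 && e >= 0 && e < 31);
    int32_t as = a >> e, bs = b >> e, cs = c >> e;   /* exponent e each */

    int64_t bb  = (int64_t)bs * bs;                  /* exponent 2e */
    int64_t fac = 4 * (int64_t)as * cs;              /* exponent 2e */
    if (bb > INT16_MAX || fac > INT16_MAX)
        return 0;                                    /* does not fit 16 bits */

    int32_t disc = (int32_t)(bb - fac);
    if (disc < 0 || as == 0)
        return 0;

    int32_t num = -bs + isqrt16(disc);   /* sqrt halves 2e back to e */
    int32_t den = 2 * as;                /* exponent e */
    /* Quotient exponent is e - e = 0: no shift back needed for this formula. */
    *root = (int16_t)(num / den);
    return 1;
}
```

With e = 6 the scaled operands are 15, 140 and 125, every intermediate fits in 16 bits, and block_root returns -1, the same as the unscaled (-9000 + 7000) / 2000. With e = 0 the call fails because B*B alone needs 27 bits; with the slide's values and e = 10 it fails on the negative discriminant.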


Normalization Operations for 2's Complement Numbers

To use block floating point (and also other arithmetic operations), normalization operations are required.

Let us suppose 5-bit 2's complement numbers. I have to calculate the normalization factor. How do I calculate it?

Let us try with the numbers +2 = $(00010)_2$ and -2 = $(11110)_2$. For the positive number I shift left 2 positions (x4), giving 01000; for the negative number I shift left 3 positions (x8), giving 10000. So I have two different normalization factors? Does this work?

It works because, after the arithmetic operations, the resulting number is shifted right by the same amount.

Thus, calculate the number of left-shift positions that bring the most significant "1" (for positive numbers) or the most significant "0" (for negative numbers) to the position just below the sign bit.
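The rule above can be sketched in C as follows. This is a hypothetical illustration (norm_shift and block_shift are made-up names), assuming an n-bit two's complement value held in an int with 2 <= n <= 31. It shifts until the bit below the sign bit differs from the sign bit. For a block, one common choice is the smallest individual shift, so that no member overflows; zero is treated as placing no limit on the block.

```c
#include <assert.h>

/* Number of left shifts that normalize an n-bit two's complement value x:
   shift until the bit below the sign bit differs from the sign bit
   (most significant '1' for positives, most significant '0' for negatives).
   Zero cannot be normalized; it returns n - 1 so it never limits a block. */
int norm_shift(int x, int n)
{
    assert(n >= 2 && n <= 31);
    assert(x >= -(1 << (n - 1)) && x < (1 << (n - 1)));   /* fits in n bits */
    if (x == 0)
        return n - 1;

    unsigned mask = (1u << n) - 1u;
    unsigned v = (unsigned)x & mask;           /* n-bit pattern of x */
    unsigned sign = (v >> (n - 1)) & 1u;
    int shifts = 0;
    while (((v >> (n - 2)) & 1u) == sign) {    /* next bit still equals sign */
        v = (v << 1) & mask;
        shifts++;
    }
    return shifts;
}

/* Block normalization: one shared shift, the smallest individual one. */
int block_shift(const int *xs, int count, int n)
{
    int s = n - 1;
    for (int i = 0; i < count; i++) {
        int k = norm_shift(xs[i], n);
        if (k < s)
            s = k;
    }
    return s;
}
```

For the slide's 5-bit example, norm_shift(2, 5) gives 2 and norm_shift(-2, 5) gives 3.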


Division



It is much more difficult to accelerate than multiplication. Some existing methods of implementation are:

• Shift and subtract, or programmed division (similar to the paper-and-pencil method)
    Restoring method
    Non-restoring method

• Division by convergence: obtain the reciprocal (inverse) of the divisor by some convergence method and multiply it by the dividend. This is also a software method, but it assumes that the microprocessor has a (very fast) hardware multiplier.
    Successive approximation methods to obtain the reciprocal of the divisor
    Look-up table for the reciprocal (partial or total)

• High-radix division: mostly methods for hardware implementation
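Division by convergence can be sketched with Newton-Raphson iteration for the reciprocal, X <- X(2 - D*X), which is one common successive-approximation method (a swapped-in choice, not necessarily the one the slides have in mind). Assumptions in this sketch: 16-bit unsigned operands; the divisor normalized by left shifts into D in [0.5, 1); the reciprocal kept in Q30 fixed point (value x 2^30) in 64-bit integers; the linear initial guess 48/17 - (32/17)D; and a final correction step, because the truncated reciprocal can leave the product-based quotient off by one. The name nr_divide is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: z / d for 16-bit unsigned operands by multiplying z
   by a Newton-Raphson approximation of 1/d, then correcting the quotient. */
uint16_t nr_divide(uint16_t z, uint16_t d)
{
    assert(d != 0);

    /* Normalize d so that D = dn / 2^16 lies in [0.5, 1). */
    uint32_t dn = d;
    int s = 0;
    while ((dn & 0x8000u) == 0) {
        dn <<= 1;
        s++;
    }

    /* x approximates 1/D in Q30. Initial guess 48/17 - 32/17 * D keeps
       |1 - D*x| <= 1/17; (dn << 19) is 32 * D in Q30. */
    uint64_t x = ((48ull << 30) - ((uint64_t)dn << 19)) / 17;

    /* Newton-Raphson: x <- x * (2 - D*x); the relative error is squared. */
    for (int i = 0; i < 4; i++) {
        uint64_t dx = ((uint64_t)dn * x) >> 16;   /* D*x in Q30     */
        uint64_t two_minus = (2ull << 30) - dx;   /* 2 - D*x in Q30 */
        x = (x * two_minus) >> 30;
    }

    /* 1/d = x * 2^(s - 46), so q ~= (z * x) >> (46 - s). */
    uint64_t q = ((uint64_t)z * x) >> (46 - s);

    /* The reciprocal is truncated: adjust q to the exact floor(z / d). */
    while ((q + 1) * d <= z)
        q++;
    while (q * d > z)
        q--;
    return (uint16_t)q;
}
```

The correction loops make the result exact regardless of rounding in the reciprocal; with an accurate reciprocal they run at most a step or two.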


Programmed (Restoring) Division Example – Integer Numbers

================ INTEGER DIV =====================
z           0 1 1 1 0 1 0 1     = 117 (decimal)
2^4 d       1 0 1 0             (d = 10)
==================================================
s(0)        0 1 1 1 0 1 0 1
2s(0)     0 1 1 1 0 1 0 1
-q3 2^4 d   1 0 1 0             {q3 = 1}
--------------------------------------------------
s(1)        0 1 0 0 1 0 1
2s(1)     0 1 0 0 1 0 1
-q2 2^4 d   0 0 0 0             {q2 = 0}
--------------------------------------------------
s(2)        1 0 0 1 0 1
2s(2)     1 0 0 1 0 1
-q1 2^4 d   1 0 1 0             {q1 = 1}
--------------------------------------------------
s(3)        1 0 0 0 1
2s(3)     1 0 0 0 1
-q0 2^4 d   1 0 1 0             {q0 = 1}
--------------------------------------------------
s(4)        0 1 1 1
s           0 1 1 1             = 7 (remainder)
q           1 0 1 1             = 11 (quotient)

This method assumes that the dividend has 2n bits and the divisor has n bits.

The method is similar to the "paper and pencil" method. Negative numbers have to be converted to positive first.

First, compare the divisor with the higher part of the dividend: the divisor must be larger, otherwise the quotient does not fit into n bits (overflow). Then, on each step, shift the dividend left by one position; if the higher part of the shifted dividend is greater than or equal to the divisor, subtract the divisor from the higher part and set the corresponding quotient bit to "1".

If the higher part of the shifted dividend is lower than the divisor, do not subtract anything from it and set the corresponding quotient bit to "0".

The number of iterations is equal to the number of bits of the divisor.

The remainder is left in the higher part of the dividend.
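The procedure above can be sketched in C. This is a hypothetical illustration (the name restoring_div is made up), assuming an unsigned 2n-bit dividend and n-bit divisor with 1 <= n <= 16, held in 32-bit variables. The higher half of the dividend plays the role of the partial remainder and the lower half gradually fills with quotient bits, as on the slide.

```c
#include <assert.h>
#include <stdint.h>

/* Restoring division of a 2n-bit unsigned dividend z by an n-bit divisor d.
   Returns 1 with the n-bit quotient and remainder, or 0 for divide-by-zero
   or quotient overflow (higher half of z >= d). */
int restoring_div(uint32_t z, uint32_t d, int n, uint32_t *q, uint32_t *rem)
{
    assert(n >= 1 && n <= 16);
    uint32_t mask = (1u << n) - 1u;
    assert(d <= mask);               /* divisor has at most n bits */
    uint32_t high = z >> n;          /* higher part: becomes the remainder */
    uint32_t low  = z & mask;        /* lower part: becomes the quotient   */
    if (d == 0 || high >= d)
        return 0;

    for (int i = 0; i < n; i++) {
        /* Shift high:low left one bit; high may briefly need n + 1 bits. */
        high = (high << 1) | (low >> (n - 1));
        low  = (low << 1) & mask;
        if (high >= d) {             /* divisor fits: subtract, quotient bit 1 */
            high -= d;
            low |= 1u;
        }                            /* otherwise: no subtraction, bit stays 0 */
    }
    *q = low;
    *rem = high;
    return 1;
}
```

Called as restoring_div(117, 10, 4, &q, &r), it reproduces the slide: quotient 11, remainder 7.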


Programmed (Restoring) Division Example – Fractional (Real) Numbers

================ FRACTIONAL DIV ==================
z_frac     .0 1 1 1 0 1 0 1
d_frac     .1 0 1 0
==================================================
s(0)       .0 1 1 1 0 1 0 1
2s(0)     0.1 1 1 0 1 0 1
-q-1 d     .1 0 1 0         {q-1 = 1}
--------------------------------------------------
s(1)       .0 1 0 0 1 0 1
2s(1)     0.1 0 0 1 0 1
-q-2 d     .0 0 0 0         {q-2 = 0}
--------------------------------------------------
s(2)       .1 0 0 1 0 1
2s(2)     1.0 0 1 0 1
-q-3 d     .1 0 1 0         {q-3 = 1}
--------------------------------------------------
s(3)       .1 0 0 0 1
2s(3)     1.0 0 0 1
-q-4 d     .1 0 1 0         {q-4 = 1}
--------------------------------------------------
s(4)       .0 1 1 1
s_frac     .0 0 0 0 0 1 1 1 (remainder)
q_frac     .1 0 1 1         (quotient)

For fractional (real) numbers, the procedure is exactly the same as for integer numbers.

The only difference is that the remainder, which is left in the higher part of the shifted dividend, has to be moved to the lower part to be correct.

The main problem with this method is that it requires a comparison (which can be done by subtraction) on each step. This implies more clock cycles than necessary.

The next slide shows nonrestoring division, which is simpler to implement, either in software or in hardware.


Nonrestoring Unsigned Division

==================================================
z             0 1 1 1 0 1 0 1   = 117 (decimal)
2^4 d       0 1 0 1 0           = 10 (decimal)
-2^4 d      1 0 1 1 0
==================================================
s(0)        0 0 1 1 1 0 1 0 1
2s(0)       0 1 1 1 0 1 0 1     Positive, so subtract
+(-2^4 d)   1 0 1 1 0
--------------------------------------------------
s(1)        0 0 1 0 0 1 0 1
2s(1)       0 1 0 0 1 0 1       Positive, so set q3 = 1 and subtract
+(-2^4 d)   1 0 1 1 0
--------------------------------------------------
s(2)        1 1 1 1 1 0 1
2s(2)       1 1 1 1 0 1         Negative, so set q2 = 0 and add
+2^4 d      0 1 0 1 0
--------------------------------------------------
s(3)        0 1 0 0 0 1
2s(3)       1 0 0 0 1           Positive, so set q1 = 1 and subtract
+(-2^4 d)   1 0 1 1 0
--------------------------------------------------
s(4)        0 0 1 1 1           Positive, so set q0 = 1
s             0 1 1 1           = 7 (remainder)
q             1 0 1 1           = 11 (quotient)

No overflow, since in the higher part $(0111)_2 < (1010)_2$.

z = Dividend

s = Remainder

d = Divisor

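The big advantage of this method is that, on each iteration, the decision to add or subtract the divisor depends only on the sign of the partial remainder, so no separate comparison is needed. This means a simple implementation. If the final remainder is negative, one correction step (adding the divisor back) is required.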
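A sketch of the nonrestoring scheme in C, under the same hypothetical assumptions as a 2n-by-n unsigned division (1 <= n <= 16; the name nonrestoring_div is made up). The partial remainder is a signed value: its sign alone selects subtraction or addition of the divisor, the sign of the new partial remainder gives the quotient bit, and a negative final remainder is corrected once at the end.

```c
#include <assert.h>
#include <stdint.h>

/* Nonrestoring division of a 2n-bit unsigned dividend z by an n-bit divisor.
   Returns 1 with quotient and remainder, or 0 for divide-by-zero/overflow. */
int nonrestoring_div(uint32_t z, uint32_t d, int n, uint32_t *q, uint32_t *rem)
{
    assert(n >= 1 && n <= 16);
    uint32_t mask = (1u << n) - 1u;
    assert(d <= mask);                   /* divisor has at most n bits */
    int32_t s = (int32_t)(z >> n);       /* higher half: partial remainder */
    uint32_t low = z & mask;             /* lower half, consumed bit by bit */
    uint32_t quot = 0;
    if (d == 0 || (uint32_t)s >= d)
        return 0;

    for (int i = n - 1; i >= 0; i--) {
        int32_t bit = (int32_t)((low >> i) & 1u);
        /* 2s + next dividend bit, then subtract d if s >= 0, add d if s < 0 */
        if (s >= 0)
            s = 2 * s + bit - (int32_t)d;
        else
            s = 2 * s + bit + (int32_t)d;
        quot = (quot << 1) | (s >= 0 ? 1u : 0u);   /* sign gives the bit */
    }
    if (s < 0)
        s += (int32_t)d;                 /* final remainder correction */
    *q = quot;
    *rem = (uint32_t)s;
    return 1;
}
```

For 117 / 10 with n = 4 the partial remainders are 4, -1, 8 and 7, matching s(1) to s(4) on the slide; 8 / 3 with n = 2 ends with a negative remainder and exercises the correction step.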


Programmed Division Using Left Shifts – Pseudo ASM

Using left shifts, divide the unsigned 2k-bit dividend z_high | z_low, storing the k-bit quotient and remainder.

{Registers: R0 holds 0           Rc for counter
            Rd for divisor       Rs for z_high & remainder
            Rq for z_low & quotient}

{Load operands into registers Rd, Rs and Rq}
div:      load    Rd with divisor
          load    Rs with z_high
          load    Rq with z_low
{Check for exceptions}
          branch  d_by_0 if Rd = R0
          branch  d_ovfl if Rs >= Rd     {quotient would need more than k bits}
{Initialize counter}
          load    k into Rc
{Begin division loop}
d_loop:   shift   Rq left 1              {zero to LSB, MSB to cy}
          rotate  Rs left 1              {cy to LSB, MSB to cy}
          skip if carry = 1              {a 1 shifted out: Rs surely exceeds Rd}
          branch  no_sub if Rs < Rd
          sub     Rd from Rs             {2's complement subtract}
          incr    Rq                     {set quotient digit to 1}
no_sub:   decr    Rc                     {decrement counter by 1}
          branch  d_loop if Rc != 0
{Store the quotient and remainder}
          store   Rq into quotient
          store   Rs into remainder
          branch  d_done
d_by_0:   - - - - -
d_ovfl:   - - - - -
d_done:   - - - - -

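Even though it is an unsigned division, a 2's complement subtraction instruction is required: when a 1 is shifted out of Rs into the carry, Rs holds only the low k bits of the partial remainder, and the modulo-$2^k$ subtraction still leaves the correct result. Ignoring operand load and result store instructions, the function of a divide instruction is accomplished by executing between 6k+3 and 8k+3 machine instructions. For a 16-bit divisor (k = 16) this means between 99 and 131 instructions.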

[Figure: registers Rd (divisor), Rs (partial remainder) and Rq (remainder/quotient)]
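The pseudo-ASM loop can be mirrored in C to show why the "skip if carry" test and the 2's complement subtraction matter. This is a sketch for k = 16, assuming 16-bit registers and an explicitly modelled carry flag; the function name asm_style_div is hypothetical and the code is not the slide's target machine.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the pseudo-ASM for k = 16: Rs:Rq holds the 32-bit dividend, Rd the
   divisor, and the carry flag is modelled explicitly. Returns 0 on divide
   by zero or overflow, 1 otherwise. */
int asm_style_div(uint16_t z_high, uint16_t z_low, uint16_t divisor,
                  uint16_t *quotient, uint16_t *remainder)
{
    uint16_t Rd = divisor, Rs = z_high, Rq = z_low;
    if (Rd == 0) return 0;                 /* branch d_by_0 */
    if (Rs >= Rd) return 0;                /* branch d_ovfl */

    for (int Rc = 16; Rc != 0; Rc--) {     /* load k into Rc ... decr Rc */
        unsigned cy = Rq >> 15;            /* shift Rq left 1: MSB to cy */
        Rq = (uint16_t)(Rq << 1);
        unsigned cy_out = Rs >> 15;        /* rotate Rs left 1: MSB to cy */
        Rs = (uint16_t)((Rs << 1) | cy);   /* old cy into the LSB */
        if (cy_out == 1 || Rs >= Rd) {     /* skip if carry=1, else compare */
            /* With cy_out = 1 the true value is 2^16 + Rs; the modulo-2^16
               (2's complement) subtraction still gives the right remainder. */
            Rs = (uint16_t)(Rs - Rd);
            Rq++;                          /* set quotient digit to 1 */
        }
    }
    *quotient = Rq;
    *remainder = Rs;
    return 1;
}
```

The carry path is taken, for example, when dividing 0xFFFEFFFF by 0xFFFF: every step shifts a 1 out of Rs, and the wrapped subtraction still yields quotient 0xFFFF and remainder 0xFFFE.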


Division Algorithm - 1 (http://www.sxlist.com/techref/microchip/math/div/24by16.htm)


Division Algorithm – 2 http://www.convict.lu/Jeunes/Math/Fast_operations2.htm

Fast division for PICs

If you went through our fast multiplying, now try the fast division if you dare. The algorithm applied here belongs to the CORDIC family. Also have a look at our CORDIC square-root function. Normally, division algorithms follow the way children are taught to divide. Let's take an example:

16546 is the numerator, 27 the divisor:

• start with the left-most digit: 1 < 27, so add the second digit
• 16 < 27, so add the third digit
• 165 > 27, so integer-divide: 165 div 27 = 6
• get the remainder, which is 165 - 6 * 27 = 3
• now restart with the remainder
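The paper-and-pencil procedure can be sketched in C, finding each quotient digit with the successive-subtraction counter that the next paragraph describes. This is a hypothetical illustration (long_divide_dec is a made-up name), assuming non-negative operands, a non-zero divisor, and a divisor small enough that r * 10 + 9 does not overflow.

```c
#include <assert.h>
#include <stdio.h>

/* Decimal long division: bring down one digit at a time and find each
   quotient digit by repeated subtraction with a counter. */
unsigned long long_divide_dec(unsigned long numerator, unsigned long divisor,
                              unsigned long *rem)
{
    char digits[32];
    unsigned long q = 0, r = 0;
    assert(divisor != 0);
    snprintf(digits, sizeof digits, "%lu", numerator);  /* MSD first */

    for (const char *p = digits; *p != '\0'; p++) {
        r = r * 10 + (unsigned long)(*p - '0');   /* bring down the next digit */
        unsigned long count = 0;
        while (r >= divisor) {                    /* successive subtractions */
            r -= divisor;
            count++;
        }
        q = q * 10 + count;                       /* count is the quotient digit */
    }
    *rem = r;
    return q;
}
```

For 16546 / 27 the partial numerators are 1, 16, 165, 34 and 76, giving the quotient 612 and remainder 22.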

With RISC technology, at assembler level, the tests are performed with subtractions, checking whether the results are negative, zero or positive. The integer division is done by successive subtractions until the result is negative; a counter then indicates how many subtractions were made. As already pointed out, CORDIC has a very different approach to mathematical operations. The speed of these algorithms results from a divide-and-conquer approach. In practice, let's see how our CORDIC division works:

Suppose you want to integer-divide $87_{10} = 1010111_2$ by $6_{10} = 110_2$:

numerator  01010111     base_index := 00000001 = 1
divisor    00000110     result := 0

Rotate (shift) divisor and base_index to the left until the most significant bits of numerator and divisor are in the same position:

00001100    00000010 = 2
00011000    00000100 = 4
00110000    00001000 = 8
01100000    00010000 = 16

Now subtract the altered divisor from the numerator:

01010111 - 01100000 < 0

If negative (which is the case here), rotate divisor and base_index back one digit to the right:

00110000    00001000 = 8

Subtract the rotated divisor from the numerator again:

01010111 - 00110000 = 00100111, positive remainder

Now replace the numerator by the remainder, and this time add the base_index to the result:

new numerator := 00100111
result := result(0) + 8 = 8

Now rotate divisor and base_index one digit to the right:

00011000    00000100 = 4

Subtract again:

00100111 - 00011000 = 00001111, remainder positive, so
new numerator := 00001111
result := result + base_index = 8 + 4 = 12

Rotate:

00001100    00000010 = 2

Subtract:

00001111 - 00001100 = 00000011, remainder positive, so
new numerator := 00000011
result := result + base_index = 12 + 2 = 14

Rotate:

00000110    00000001 = 1

Subtract:

00000011 - 00000110 < 0, so do nothing

Stop: result = 14, and the numerator holds the remainder 3 (87 = 14 × 6 + 3).
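The walkthrough above can be sketched in C for 8-bit unsigned operands. The web page groups this algorithm with CORDIC; functionally it is the same shift-and-subtract (restoring) division shown earlier, with base_index marking the current quotient bit. This is a hypothetical illustration (base_index_div is a made-up name) that assumes a non-zero divisor. Like the PIC routine DIVV8, it aligns the divisor's MSB with bit 7 rather than with the numerator's MSB; the extra steps simply fail their subtraction.

```c
#include <assert.h>
#include <stdint.h>

/* Shift-and-subtract division with a base_index, 8-bit unsigned operands.
   Returns the quotient; *rem receives the remainder. */
uint8_t base_index_div(uint8_t numerator, uint8_t divisor, uint8_t *rem)
{
    assert(divisor != 0);
    uint8_t base_index = 1, result = 0;

    /* Shift divisor and base_index left until the divisor's MSB is bit 7. */
    while ((divisor & 0x80u) == 0) {
        divisor = (uint8_t)(divisor << 1);
        base_index = (uint8_t)(base_index << 1);
    }
    /* Try each shifted divisor, from the largest down to the original. */
    while (base_index != 0) {
        if (numerator >= divisor) {          /* subtraction stays non-negative */
            numerator = (uint8_t)(numerator - divisor);
            result = (uint8_t)(result + base_index);
        }
        divisor = (uint8_t)(divisor >> 1);
        base_index = (uint8_t)(base_index >> 1);
    }
    *rem = numerator;                        /* what is left is the remainder */
    return result;
}
```

base_index_div(87, 6, &r) returns 14 with r = 3, as in the walkthrough.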

Here is the PIC 16F84 and 628 code:

DIVV8                           ; TEMPX8 / TEMPY8 -> RESULT8, remainder in TEMPX8
        MOVF    TEMPY8,F
        BTFSC   STATUS,Z        ; skip if divisor non-zero
        RETURN
        CLRF    RESULT8
        MOVLW   1
        MOVWF   IDX16           ; base_index := 1
SHIFT_IT8
        BTFSC   TEMPY8,7        ; divisor MSB already in bit 7? start dividing
        GOTO    DIVU8LOOP
        BCF     STATUS,C
        RLF     IDX16,F         ; base_index <<= 1
        BCF     STATUS,C
        RLF     TEMPY8,F        ; divisor <<= 1
        GOTO    SHIFT_IT8
DIVU8LOOP
        MOVF    TEMPY8,W
        SUBWF   TEMPX8,F        ; numerator -= divisor
        BTFSC   STATUS,C        ; C = 0 means borrow (negative result)
        GOTO    COUNT8
        ADDWF   TEMPX8,F        ; negative: restore the numerator
        GOTO    FINAL8
COUNT8
        MOVF    IDX16,W
        ADDWF   RESULT8,F       ; result += base_index
FINAL8
        BCF     STATUS,C
        RRF     TEMPY8,F        ; divisor >>= 1
        BCF     STATUS,C
        RRF     IDX16,F         ; base_index >>= 1; its last bit ends in C
        BTFSS   STATUS,C
        GOTO    DIVU8LOOP
        RETURN

SUB16                           ; TEMPX16_H:TEMPX16 -= TEMPY16_H:TEMPY16
        MOVF    TEMPY16,W
        SUBWF   TEMPX16,F       ; low byte; C = 0 on borrow
        MOVF    TEMPY16_H,W
        BTFSS   STATUS,C        ; borrow from the low byte?
        INCFSZ  TEMPY16_H,W     ; yes: W = high + 1 (skip SUBWF if it wraps to 0)
        SUBWF   TEMPX16_H,F     ; high byte; final C = 0 means result < 0
        RETURN

ADD16BIS                        ; TEMPX16_H:TEMPX16 += TEMPY16_H:TEMPY16 (restore)
        MOVF    TEMPY16,W
        ADDWF   TEMPX16,F
        BTFSC   STATUS,C
        INCF    TEMPX16_H,F
        MOVF    TEMPY16_H,W
        ADDWF   TEMPX16_H,F
        RETURN

DIVV16                          ; TEMPX16 / TEMPY16 -> RESULT16, remainder in TEMPX16
        MOVF    TEMPY16,F
        BTFSS   STATUS,Z
        GOTO    ZERO_TEST_SKIPPED
        MOVF    TEMPY16_H,F
        BTFSC   STATUS,Z        ; divisor = 0: return
        RETURN
ZERO_TEST_SKIPPED
        MOVLW   1
        MOVWF   IDX16
        CLRF    IDX16_H         ; base_index := 1
        CLRF    RESULT16
        CLRF    RESULT16_H
SHIFT_IT16
        BTFSC   TEMPY16_H,7     ; divisor MSB already in bit 15? start dividing
        GOTO    DIVU16LOOP
        BCF     STATUS,C
        RLF     IDX16,F
        RLF     IDX16_H,F       ; base_index <<= 1
        BCF     STATUS,C
        RLF     TEMPY16,F
        RLF     TEMPY16_H,F     ; divisor <<= 1
        GOTO    SHIFT_IT16
DIVU16LOOP
        CALL    SUB16
        BTFSC   STATUS,C        ; no borrow: count this base_index
        GOTO    COUNTX
        CALL    ADD16BIS        ; borrow: restore the numerator
        GOTO    FINALX
COUNTX
        MOVF    IDX16,W
        ADDWF   RESULT16,F
        BTFSC   STATUS,C
        INCF    RESULT16_H,F
        MOVF    IDX16_H,W
        ADDWF   RESULT16_H,F    ; result += base_index
FINALX
        BCF     STATUS,C
        RRF     TEMPY16_H,F
        RRF     TEMPY16,F       ; divisor >>= 1
        BCF     STATUS,C
        RRF     IDX16_H,F
        RRF     IDX16,F         ; base_index >>= 1; its last bit ends in C
        BTFSS   STATUS,C
        GOTO    DIVU16LOOP
        RETURN

; ... somewhere in the code
        CALL    DIVV16

Note that these programs work only for unsigned variables. The worst case for DIVV8 is about 144 cycles, which at 20 MHz (200 ns per instruction cycle) is about 30 microseconds. The benefit of this algorithm becomes clearer when larger variables are used.


CORDIC – Square Root http://www.convict.lu/Jeunes/Math/square_root_CORDIC.htm

Square root based on CORDIC

We explained the CORDIC basics for trig functions earlier. The solution of exercise 2 of that page is shown here, but first some preliminary explanations. Perhaps you know the following card game: you tell a candidate to choose and remember a number from 1 to 31. Then you show him five cards one by one, and for each card he must answer yes or no to the question whether his number is written on it. As if by magic, you can tell him the number he chose. The card order is irrelevant.

The trick is to mentally add the first number of each card for which he answered YES. Let's take an example; the candidate chooses 23:

card 1: Yes, so remember 1
card 2: Yes, so add 2 --> 3
card 3: Yes, add 4 --> 7
card 4: No, do nothing
card 5: Yes, add 16 --> 23

How does this game work? By answering yes or no, the candidate is simply converting the decimal number 23 into a binary number:

$23_{10} = 10111_2$; read from the least significant bit (card 1) to the most significant bit (card 5), this is [Yes, Yes, Yes, No, Yes], where Yes = 1 and No = 0.

Each card shows all the numbers that have the same binary digit set to 1. The quiz master converts back to decimal by evaluating the base polynomial:

$$1 \times 2^4 + 0 \times 2^3 + 1 \times 2^2 + 1 \times 2^1 + 1 \times 2^0 = 1 \times 16 + 0 \times 8 + 1 \times 4 + 1 \times 2 + 1 \times 1 = 23$$

In fact, CORDIC algorithms are based on this sort of computing. The interest, of course, is the proximity of the binary system to computer systems. Multiplying by 2 is equivalent to shifting the binary number 1 digit to the left. Dividing by 2 is the same as shifting it 1 digit to the right:

$$11101_2 \times 2_{10} = 111010_2$$

$$11101_2 / 2_{10} = 1110.1_2$$
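The card trick and the shift rules above can be sketched in C with the shift operators. The functions on_card and guess_number are hypothetical names for illustration: on_card tests one bit with a right shift, and guess_number rebuilds the number from powers of two formed by left shifts. Note that a right shift of an integer discards the fractional bit (29 >> 1 is 14, not 14.5).

```c
#include <assert.h>

/* Card k + 1 (k = 0..4) lists every number from 1 to 31 whose bit k is 1,
   so the first number printed on it is 2^k. */
int on_card(int number, int k)
{
    assert(number >= 1 && number <= 31 && k >= 0 && k < 5);
    return (number >> k) & 1;           /* shift right k places, test the last bit */
}

/* The quiz master's trick: add the first number (2^k) of every "Yes" card. */
int guess_number(const int answers[5])  /* answers[k] != 0 means Yes on card k + 1 */
{
    int total = 0;
    for (int k = 0; k < 5; k++)
        if (answers[k])
            total += 1 << k;            /* 2^k as a left shift */
    return total;
}
```

For 23 the answers come out as Yes, Yes, Yes, No, Yes, and guess_number adds 1 + 2 + 4 + 16.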

These shift operations are extremely quick. NOTE: only one of our examples uses this shift trick, because not all high-level languages give direct access to these low-level operations. But CORDIC has another speed advantage, which comes from the exponential approach: multiplying and dividing may be reduced to additions and subtractions.

To compute a square root with CORDIC, the result is built up by multiplying, adding and testing.

x = 12056

L     2^L    y      test                   action
             0                             initial value
7     128    0      128 × 128 > 12056      do nothing
6     64     64     64 × 64 < 12056        add 64 to initial y --> 64
5     32     96     (64 + 32)² < 12056     add 32 to last y --> 96
4     16     96     (96 + 16)² > 12056     do nothing
3     8      104    (96 + 8)² < 12056      add 8 to last y --> 104
2     4      108    (104 + 4)² < 12056     add 4 to last y --> 108
1     2      108    (108 + 2)² > 12056     do nothing
0     1      109    (108 + 1)² < 12056     add 1 to last y --> 109
-1    0.5    ...                           and so on

Here is a C routine for the integer square root of numbers from 0 to 65535:

    unsigned int int_sqrt(unsigned long x)
    {
        unsigned long base = 128, y = 0;
        int i;

        for (i = 1; i <= 8; i++) {
            y += base;
            if (y * y > x) {
                y -= base;   // base should not have been added, so subtract it again
            }
            base >>= 1;      // shift 1 digit to the right = divide by 2
        }
        return (unsigned int)y;
    }
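The routine above still needs a multiplication (y * y) on every step. A well-known multiplication-free variant of the same bit-by-bit idea keeps the running remainder x - y² and updates it with shifts, additions and subtractions only. This is a swapped-in technique (the classic digit-by-digit binary square root), not the routine from the web page; the name isqrt32 is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Digit-by-digit binary integer square root: floor(sqrt(x)) for any 32-bit x,
   using only shifts, additions, subtractions and comparisons. */
uint32_t isqrt32(uint32_t x)
{
    uint32_t res = 0;
    uint32_t bit = 1u << 30;        /* largest power of 4 in 32 bits */

    while (bit > x)                 /* start at the highest power of 4 <= x */
        bit >>= 2;

    while (bit != 0) {
        if (x >= res + bit) {       /* this result bit can be set */
            x -= res + bit;         /* x keeps the running remainder */
            res = (res >> 1) + bit;
        } else {
            res >>= 1;
        }
        bit >>= 2;
    }
    return res;
}
```

For x = 12056 it returns 109, the same value as the table, and leaves the remainder 12056 - 109² = 175 in x.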

Here is a Robolab version (you may use our text-based modifiers or the variable numbers of your choice):

