1 Floating Point Representation and Arithmetic (see Patterson Chapter 4)

1

Floating Point

Representation and Arithmetic

(see Patterson Chapter 4)

2

Outline

• Review of floating point scientific notation
• Floating point binary
• IEEE Floating Point Standard
• Addition in Floating Point
• Remarks about multiplication

3

Floating Point Notation

• Decimal
  • 12.4568_ten (decimal notation) means
    • 10*1 + 2 + 4/10 + 5/100 + 6/1000 + 8/10000
• In scientific notation
  • 12.4568 =
    • 124568 * 10^-4 = 1245680 * 10^-5 =
    • 12456.8 * 10^-3 = 1245.68 * 10^-2 =
    • 124.568 * 10^-1 = 12.4568 * 10^0 =
    • 1.24568 * 10^1
• 1.24568 * 10^1 is an example of normalised scientific notation.

4

Floating Point in Binary

• Binary
  • 0.010011_two =
    • (0/2) + (1/2^2) + (0/2^3) + (0/2^4) + (1/2^5) + (1/2^6) =
    • 0 + 1/4 + 0 + 0 + 1/32 + 1/64 =
    • (0.25 + 0.03125 + 0.015625)_ten =
    • 0.296875_ten
• In scientific notation
  • 10011 * 2^-6 = 1001.1 * 2^-5 = 100.11 * 2^-4 = 1.0011 * 2^-2 (normalised)
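The conversion on this slide can be checked with a short sketch. The helper names `binary_to_fraction` and `normalise` are made up for illustration, and exact rational arithmetic is used so that no rounding creeps into the check.

```python
from fractions import Fraction

def binary_to_fraction(bits: str) -> Fraction:
    """Evaluate a binary string such as '0.010011' exactly (hypothetical helper)."""
    whole, _, frac = bits.partition(".")
    value = Fraction(int(whole, 2) if whole else 0)
    # The i-th bit after the binary point has weight 1 / 2^i.
    for i, bit in enumerate(frac, start=1):
        value += Fraction(int(bit), 2 ** i)
    return value

def normalise(value: Fraction):
    """Return (m, e) with value == m * 2**e and 1 <= m < 2. Assumes value > 0."""
    m, e = value, 0
    while m >= 2:
        m /= 2
        e += 1
    while m < 1:
        m *= 2
        e -= 1
    return m, e

x = binary_to_fraction("0.010011")
print(float(x))      # 0.296875
print(normalise(x))  # (Fraction(19, 16), -2), i.e. 1.0011_two * 2^-2
```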

5

Normalised Notation

• In normalised binary scientific notation
  • unless the number is 0
  • always have 1.sssssss...sss * 2^E
    • sss...sss is the significand
    • E is the exponent
• The significand s_1 s_2 ... s_n represents

    Σ_{i=1}^{n} s_i / 2^i

6

Representation

• Note that not every decimal number can be represented exactly in this way (e.g. 0.3 has no finite binary expansion)
• Problem: representing floating point numbers in a fixed word length
  • need to represent
    • sign
    • significand
    • exponent
  • in one word (32 bits).
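The inexactness of 0.3 can be observed directly. This is a small sketch, assuming Python's built-in `float`, which is a 64-bit double rather than the 32-bit word discussed here; the same effect occurs at either width.

```python
from decimal import Decimal
from fractions import Fraction

x = 0.3
# Decimal(x) prints the exact value that is actually stored for 0.3.
print(Decimal(x))

# The stored value is a nearby binary fraction, not exactly 3/10.
print(Fraction(x) == Fraction(3, 10))       # False

# 0.3125 = 5/16 is a sum of powers of two, so it is stored exactly.
print(Fraction(0.3125) == Fraction(5, 16))  # True
```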

7

Representation

• Represents floating point number:
  • (-1)^S * (1.0 + F) * 2^E
    • S is 1 bit (if S = 1 then negative)
    • F is 23 bits
    • E is 8 bits

Bit layout of the 32-bit word:

    bit 31       | bits 30-23          | bits 22-0
    sign bit S   | exponent E: 8 bits  | significand F: 23 bits

8

Squeezing out More from the Bits

• Since every non-zero binary f.p. number (normalised) is of the form:
  • 1.sss...sss * 2^E
• We do not have to represent the leading 1 explicitly in the word, and can therefore interpret the bit-pattern as:
  • (-1)^S * (1 + significand) * 2^E
  • thus ‘reclaiming’ an extra bit!
• E = 0000 0000 is reserved for zero.

9

Requirements

• As far as possible the ALU should be able to reuse integer machinery in the implementation of f.p.
• E.g., comparison with zero
  • easy because of the sign bit
  • fp numbers can be easily classified as negative, zero or positive without additional hardware.
• Comparison of two fp numbers x < y is not so straightforward
  • how are negative exponents to be represented?

10

Bad Example: (1/2) > 2 ???

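• Suppose the exponent were stored in two's complement.
• Representation of 1/2 is
  • 0.1_two = 1.0 * 2^-1 (normalised)

    0  1111 1111  0000 .... 0000
    S  E          significand

• Representation of 2 is
  • 10_two = 1.0 * 2^1 (normalised)

    0  0000 0001  0000 .... 0000
    S  E          significand

• Compared as integers, the bit pattern for 1/2 is larger than the one for 2.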

11

Representation of Exponent

• Inappropriate to use two's complement for the exponent
• Ideally want 0000 0000 to represent the most negative number, 1111 1111 the most positive.
• Number range:

    1111 1111    positive
    1111 1110    positive
    ...
    0111 1111    use this for 2^0 (0111 1111 = 127_ten)
    0111 1110    negative
    ...
    0000 0000    negative

12

Biased Representation(IEEE FP Standard)

• The ‘bias’ 127 represents exponent 0
• 128 to 254 represent positive exponents (IEEE 754 reserves 255 for infinities and NaN)
• 1 to 126 represent negative exponents
  • (remember 0 is reserved for the entire number being zero).
• The actual exponent is therefore:
  • E - bias
• (-1)^S * (1 + significand) * 2^(E - bias)
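A brief sketch of why the bias helps: with a biased exponent, the bit patterns of positive floats sort in the same order as their values, so integer comparison can be reused. The helpers `float_bits` and `biased_exponent` are illustrative names; Python's standard `struct` module is used to obtain the 32-bit pattern.

```python
import struct

def float_bits(x: float) -> int:
    """Return the 32-bit single-precision pattern of x as an unsigned integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def biased_exponent(x: float) -> int:
    """Extract the 8-bit exponent field E (bits 30-23)."""
    return (float_bits(x) >> 23) & 0xFF

half, two = 0.5, 2.0
print(biased_exponent(half))  # 126, i.e. -1 + 127
print(biased_exponent(two))   # 128, i.e.  1 + 127

# For positive numbers, integer comparison of the patterns now agrees with
# the numeric order, unlike the two's complement 'bad example'.
print(float_bits(half) < float_bits(two))  # True
```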

13

Example 1

• Represent 0.3125_ten = 5/16
  • 5/16 = 1/4 + 1/16 = 0.0101_two = 1.01 * 2^-2
• S = 0
• E = ???
  • -2 = E - bias = E - 127
  • E = 125_ten = 0111 1101_two
• Significand = 0100...000
• 0 0111 1101 010000...000

14

Example 2

• What does
  • 0 0111 1101 010000...000
• represent?
• S = 0
• E = 0111 1101 = 125_ten
  • Exponent = E - bias = 125 - 127 = -2
• Significand = 0.01_two = 1/4
• (-1)^S * (1 + sig.) * 2^(E - bias) = (1 + 1/4) * (1/4) = 5/16
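Examples 1 and 2 can be reproduced with a short sketch. The names `decode` and `value` are hypothetical, and the formula is only applied to normalised numbers (it does not handle the reserved exponent patterns).

```python
import struct

BIAS = 127

def decode(x: float):
    """Split the single-precision pattern of x into the fields (S, E, F)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    s = bits >> 31           # bit 31: sign
    e = (bits >> 23) & 0xFF  # bits 30-23: biased exponent
    f = bits & 0x7FFFFF      # bits 22-0: significand field
    return s, e, f

def value(s: int, e: int, f: int) -> float:
    """Evaluate (-1)^S * (1 + significand) * 2^(E - bias) for a normalised number."""
    return (-1) ** s * (1 + f / 2 ** 23) * 2.0 ** (e - BIAS)

s, e, f = decode(0.3125)
print(s, format(e, "08b"), format(f, "023b"))
# 0 01111101 01000000000000000000000
print(value(s, e, f))  # 0.3125
```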

15

Addition of FP Numbers

• Given two numbers:
  • normalise them both
  • shift the significand of the number with the smaller exponent so that its exponent matches the larger one
  • add the significands
  • renormalise
  • check for underflow/overflow of the exponent
    • if so, signal an exception and stop
  • round the significand to the required number of bits
  • might need renormalisation again (e.g., 11111 rounded to 4 bits carries out to 100000).

16

Addition Example

• 0.5 + 2.75 = 3.25
• 0.1_two + 10.11_two
• 1.0 * 2^-1 + 1.011 * 2^1
• 0.010 * 2^1 + 1.011 * 2^1
• 1.101 * 2^1 (already normalised)
• (1 + (1/2) + (1/8)) * 2
• 3.25
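The alignment and add steps of this example can be sketched in a toy format. Everything here is an assumption made for illustration: positive operands only, 3 fraction bits (`FRAC_BITS`), significands held as integers with the leading 1 included, bits shifted out are simply truncated, and the rounding step is omitted. Real hardware keeps extra guard bits and rounds.

```python
FRAC_BITS = 3            # toy size: bits after the binary point (assumption)
E_MIN, E_MAX = -126, 127 # exponent range of the 32-bit format

def fp_add(sig1: int, exp1: int, sig2: int, exp2: int):
    """Add two positive normalised numbers sig / 2**FRAC_BITS * 2**exp."""
    # Make operand 1 the one with the larger exponent.
    if exp1 < exp2:
        sig1, exp1, sig2, exp2 = sig2, exp2, sig1, exp1
    # Align: shift the smaller-exponent significand right (truncating).
    sig2 >>= exp1 - exp2
    total, exp = sig1 + sig2, exp1
    # Renormalise: the sum of two positive significands may reach 2 or more.
    while total >= 1 << (FRAC_BITS + 1):
        total >>= 1
        exp += 1
    # Check the exponent range.
    if not E_MIN <= exp <= E_MAX:
        raise OverflowError("exponent out of range")
    return total, exp

def to_float(sig: int, exp: int) -> float:
    return sig / 2 ** FRAC_BITS * 2.0 ** exp

# 0.5 = 1.000 * 2^-1 and 2.75 = 1.011 * 2^1
sig, exp = fp_add(0b1000, -1, 0b1011, 1)
print(format(sig, "b"), exp)  # 1101 1, i.e. 1.101 * 2^1
print(to_float(sig, exp))     # 3.25
```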

17

Remarks

• The IEEE FP standard represents single-precision floats in 32 bits; higher precision (doubles) uses 64 bits, i.e. two 32-bit words.

• Multiplication is relatively easy: the exponents add, and the significands can be multiplied using integer multiplication.

• There can be huge pitfalls in reliably transferring floating point code to different hardware!
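The multiplication remark can be sketched in the same kind of toy format: add the exponents, multiply the significands as integers, then renormalise. This is an illustration under stated assumptions, not a full implementation: positive operands, 3 fraction bits (`FRAC_BITS`), unbiased exponents, truncation instead of rounding, and no exponent range check. With biased exponents, one bias would also have to be subtracted from the sum.

```python
FRAC_BITS = 3  # toy size: bits after the binary point (assumption)

def fp_mul(sig1: int, exp1: int, sig2: int, exp2: int):
    """Multiply two positive normalised numbers sig / 2**FRAC_BITS * 2**exp."""
    exp = exp1 + exp2                # exponents add
    prod = sig1 * sig2               # integer multiply: 2 * FRAC_BITS fraction bits
    prod >>= FRAC_BITS               # drop the extra fraction bits (truncating)
    # The product of two values in [1, 2) lies in [1, 4): at most one shift.
    if prod >= 1 << (FRAC_BITS + 1):
        prod >>= 1
        exp += 1
    return prod, exp

# 1.5 = 1.100 * 2^0 and 2.5 = 1.010 * 2^1
sig, exp = fp_mul(0b1100, 0, 0b1010, 1)
print(format(sig, "b"), exp)         # 1111 1, i.e. 1.111 * 2^1 = 3.75
```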

18

Summary

• FP scientific notation
• normalised representation in binary
• Bias to represent -ve to +ve range in exponent
• Addition
• Notice how a 32-bit binary string can represent many different entities in memory.
• Memory architectures NEXT.
