alexandra-reed
1
Floating Point
Representation and Arithmetic
(see Patterson Chapter 4)
2
Outline
• Review of floating point scientific notation
• Floating point binary
• IEEE Floating Point Standard
• Addition in Floating Point
• Remarks about multiplication
3
Floating Point Notation
• Decimal
  • $12.4568_{\text{ten}}$ (decimal notation) means
  • $1 \times 10 + 2 + 4/10 + 5/100 + 6/1000 + 8/10000$
• In scientific notation
  • $12.4568 = 124568 \times 10^{-4} = 1245680 \times 10^{-5}$
  • $= 12456.8 \times 10^{-3} = 1245.68 \times 10^{-2}$
  • $= 124.568 \times 10^{-1} = 12.4568 \times 10^{0}$
  • $= 1.24568 \times 10^{1}$
• $1.24568 \times 10^{1}$ is an example of normalised scientific notation.
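As a small illustration of the normalisation step above, the following sketch shifts the decimal point so that exactly one non-zero digit precedes it. The function name `normalised_decimal` is made up for this example; it relies only on Python's standard `decimal` module.

```python
from decimal import Decimal

def normalised_decimal(text: str) -> tuple[str, int]:
    """Return (mantissa, power) with one non-zero digit before the point."""
    sign, digits, exponent = Decimal(text).as_tuple()
    if not any(digits):
        raise ValueError("zero has no normalised form")
    # as_tuple() gives value = (digits read as an integer) * 10**exponent
    power = exponent + len(digits) - 1
    mantissa = f"{digits[0]}." + ("".join(map(str, digits[1:])) or "0")
    return ("-" if sign else "") + mantissa, power

print(normalised_decimal("12.4568"))  # ('1.24568', 1)
```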
4
Floating Point in Binary
• Binary
  • $0.010011_{\text{two}} = (0/2) + (1/2^2) + (0/2^3) + (0/2^4) + (1/2^5) + (1/2^6)$
  • $= 0 + 1/4 + 0 + 0 + 1/32 + 1/64$
  • $= (0.25 + 0.03125 + 0.015625)_{\text{ten}}$
  • $= 0.296875_{\text{ten}}$
• In scientific notation
  • $10011 \times 2^{-6} = 1001.1 \times 2^{-5} = 100.11 \times 2^{-4} = 1.0011 \times 2^{-2}$ (normalised)
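The positional expansion above can be checked mechanically. This is a minimal sketch, assuming unsigned binary strings with an optional binary point; `binary_to_fraction` is a hypothetical helper name, and exact rational arithmetic is used so no rounding creeps in.

```python
from fractions import Fraction

def binary_to_fraction(bits: str) -> Fraction:
    """Evaluate an unsigned binary string such as '0.010011' exactly."""
    whole, _, frac = bits.partition(".")
    value = Fraction(int(whole, 2)) if whole else Fraction(0)
    for i, bit in enumerate(frac, start=1):
        if bit not in "01":
            raise ValueError(f"not a binary digit: {bit!r}")
        value += Fraction(int(bit), 2 ** i)  # bit i is worth 1/2**i
    return value

print(float(binary_to_fraction("0.010011")))  # 0.296875
```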
5
Normalised Notation
• In normalised binary scientific notation
  • unless the number is 0
  • we always have $1.sss\ldots sss \times 2^{E}$
  • $sss\ldots sss$ is the significand
  • $E$ is the exponent
• The significand $s_1 s_2 \ldots s_n$ represents $\sum_{i=1}^{n} s_i / 2^{i}$
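One way to see the normalised form of an arbitrary number is via Python's `math.frexp`, which returns a mantissa in the range [0.5, 1); doubling it and lowering the exponent by one gives the 1.sss... form used here. The wrapper `normalise` is an illustrative name, not part of any library.

```python
import math

def normalise(x: float) -> tuple[float, int]:
    """Return (m, E) with abs(x) == m * 2**E and 1 <= m < 2."""
    if x == 0:
        raise ValueError("zero has no normalised form")
    m, e = math.frexp(abs(x))  # abs(x) == m * 2**e with 0.5 <= m < 1
    return m * 2, e - 1

# 0.296875 is 1.0011 (binary) * 2**-2, and 1.0011 (binary) is 1.1875
print(normalise(0.296875))  # (1.1875, -2)
```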
6
Representation
• Note that not every decimal number can be represented exactly in this way (eg 0.3)
• Problem: representing floating point numbers in a fixed word length
  • need to represent
    • sign
    • significand
    • exponent
  • in one word (32 bits).
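The 0.3 remark can be illustrated by comparing a decimal string with the value actually stored once it is converted to binary floating point. This sketch assumes Python's built-in `float` (a 64-bit double, not the 32-bit word discussed here), but the same effect occurs at any finite word length; `is_exact_as_float` is a made-up helper name.

```python
from fractions import Fraction

def is_exact_as_float(decimal_text: str) -> bool:
    """True if the decimal string survives conversion to a binary float unchanged."""
    stored = Fraction(float(decimal_text))  # exact value of the stored float
    wanted = Fraction(decimal_text)         # exact value of the decimal text
    return stored == wanted

print(is_exact_as_float("0.3"))     # False: 3/10 has no finite binary expansion
print(is_exact_as_float("0.3125"))  # True: 5/16 is a sum of powers of two
```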
7
Representation
• Represents floating point number:
  • $(-1)^{S} \times (1.0 + F) \times 2^{E}$
  • S is 1 bit (if S=1 then negative)
  • F is 23 bits
  • E is 8 bits
• Layout of the 32-bit word:
  bit 31: sign bit S
  bits 30 to 23: exponent E (8 bits)
  bits 22 to 0: significand F (23 bits)
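The three fields can be pulled out of a real 32-bit float with shifts and masks. The sketch below uses Python's `struct` module to obtain the bit pattern; `float32_fields` is an illustrative name. Note that it returns the raw 8-bit exponent field exactly as stored, without interpreting it.

```python
import struct

def float32_fields(x: float) -> tuple[int, int, int]:
    """Split the 32-bit pattern of x into (S, E field, F field)."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    sign = word >> 31                # bit 31
    exponent = (word >> 23) & 0xFF   # bits 30 to 23
    fraction = word & 0x7FFFFF       # bits 22 to 0
    return sign, exponent, fraction

print(float32_fields(-2.0))  # (1, 128, 0)
```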
8
Squeezing out More from the Bits
• Since every non-zero binary f.p. number (normalised) is of the form:
  • $1.sss\ldots sss \times 2^{E}$
• We do not have to represent the leading 1 explicitly in the word, and can therefore interpret the bit-pattern as:
  • $(-1)^{S} \times (1 + \text{significand}) \times 2^{E}$
  • thus ‘reclaiming’ an extra bit!
• E = 0000 0000 is reserved for zero.
9
Requirements
• As far as possible the ALU should be able to reuse integer machinery in the implementation of f.p.
• Eg, comparison with zero
  • easy because of the sign bit
  • fp numbers can be easily classified as negative, zero or positive without additional hardware.
• Comparison of two fp numbers x<y is not so straightforward
  • how are negative exponents to be represented?
10
Bad Example: (1/2) > 2 ???
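• Suppose the exponent is stored in two's complement.
• Representation of 1/2 is
  • $0.1_{\text{two}} = 1.0 \times 2^{-1}$ (normalised)
  0 1111 1111 0000.... 0000
  S E         significand
• Representation of 2 is
  • $10_{\text{two}} = 1.0 \times 2^{1}$ (normalised)
  0 0000 0001 0000.... 0000
  S E         significand
• Compared as integers, the bit pattern for 1/2 is larger than the one for 2.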
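The problem can be reproduced by packing the two patterns into 32-bit integers and comparing them the way integer hardware would. `pack_twos_complement` is a hypothetical helper written for this example; it stores the exponent as an 8-bit two's complement value, which is the scheme this slide argues against.

```python
def pack_twos_complement(sign: int, exponent: int, fraction_bits: int = 0) -> int:
    """Build a 32-bit word whose exponent field is 8-bit two's complement."""
    if not -128 <= exponent <= 127:
        raise ValueError("exponent does not fit in 8 bits")
    return (sign << 31) | ((exponent & 0xFF) << 23) | fraction_bits

half = pack_twos_complement(0, -1)  # 1.0 * 2**-1, exponent field 1111 1111
two = pack_twos_complement(0, 1)    # 1.0 * 2**1,  exponent field 0000 0001

# An integer comparison gets the ordering wrong.
print(half > two)  # True
```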
11
Representation of Exponent
• Inappropriate to use two’s complement for the exponent
• Ideally want 0000 0000 to represent most negative number, 1111 1111 most positive.
• Number range:
  1111 1111
  1111 1110
  ...          (positive exponents)
  0111 1111    = $127_{\text{ten}}$, use this for $2^{0}$
  0111 1110
  ...          (negative exponents)
  0000 0000
12
Biased Representation(IEEE FP Standard)
• The ‘bias’ 127 represents 0
• 128 to 254 represent positive exponents (IEEE 754 reserves 255 for infinities and NaNs)
• 1 to 126 represent negative exponents
  • (remember 0 is reserved for the entire number being zero).
• The actual exponent is therefore:
  • E - bias
• $(-1)^{S} \times (1 + \text{significand}) \times 2^{E - \text{bias}}$
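With a biased exponent field, integer comparison of positive numbers agrees with numerical order. The sketch below packs 1/2 and 2 using a bias of 127; `pack_biased` is an illustrative helper, and the range check reflects the exponent fields 1 to 254 available for ordinary numbers in IEEE 754.

```python
BIAS = 127

def pack_biased(sign: int, exponent: int, fraction_bits: int = 0) -> int:
    """Build a 32-bit word whose exponent field is exponent + BIAS."""
    e_field = exponent + BIAS
    if not 1 <= e_field <= 254:  # 0 and 255 are reserved
        raise ValueError("exponent out of range for a normalised number")
    return (sign << 31) | (e_field << 23) | fraction_bits

half = pack_biased(0, -1)  # exponent field 126 = 0111 1110
two = pack_biased(0, 1)    # exponent field 128 = 1000 0000

# The integer ordering now matches the numerical ordering.
print(half < two)  # True
```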
13
Example 1
• Represent $0.3125_{\text{ten}} = 5/16$
  • $5/16 = 1/4 + 1/16 = 0.0101_{\text{two}} = 1.01 \times 2^{-2}$
• S = 0
• E = ???
  • $-2 = E - \text{bias} = E - 127$
  • $E = 125_{\text{ten}} = 0111\ 1101_{\text{two}}$
• Significand = 0100...000
• 0 0111 1101 010000...000
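The encoding steps of this example can be followed in code. This is a sketch only: `encode_float32` is a made-up name, it assumes a non-zero input whose exponent is in range, and it truncates the significand rather than rounding it, which is enough for values like 0.3125 that fit exactly.

```python
import math

BIAS = 127

def encode_float32(x: float) -> str:
    """Return 'S EEEEEEEE FFF...F' for a non-zero, in-range x (truncating)."""
    if x == 0:
        raise ValueError("zero is handled by the reserved pattern E = 0")
    sign = 1 if x < 0 else 0
    m, e = math.frexp(abs(x))                  # abs(x) == m * 2**e, 0.5 <= m < 1
    significand, exponent = m * 2, e - 1       # normalised: 1 <= significand < 2
    e_field = exponent + BIAS                  # stored exponent
    fraction = int((significand - 1) * 2**23)  # drop the implicit leading 1
    return f"{sign} {e_field:08b} {fraction:023b}"

print(encode_float32(0.3125))
```

The printed pattern has sign 0, exponent field 0111 1101 and a fraction beginning 01 followed by zeros, matching the result above.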
14
Example 2
• What does
  • 0 0111 1101 010000...000
  • represent?
• S = 0
• $E = 0111\ 1101_{\text{two}} = 125_{\text{ten}}$
• Exponent = E - bias = 125 - 127 = -2
• Significand = 1/4
• $(-1)^{S} \times (1 + \text{sig.}) \times 2^{E - \text{bias}} = (1 + 1/4) \times (1/4) = 5/16$
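Decoding can be sketched the same way, going from a 32-character bit string back to a value. `decode_bits` is a hypothetical helper; it follows the slides' simplification that an exponent field of all zeros means the number is zero, and it does not treat the field 1111 1111 specially.

```python
BIAS = 127

def decode_bits(pattern: str) -> float:
    """Decode a 32-bit pattern written as 'S EEEEEEEE FFF...F' (spaces optional)."""
    bits = pattern.replace(" ", "")
    if len(bits) != 32 or set(bits) - {"0", "1"}:
        raise ValueError("expected exactly 32 binary digits")
    sign = int(bits[0])
    e_field = int(bits[1:9], 2)
    fraction = int(bits[9:], 2)
    if e_field == 0:
        return 0.0  # slides' convention: E = 0000 0000 is reserved for zero
    return (-1) ** sign * (1 + fraction / 2**23) * 2.0 ** (e_field - BIAS)

pattern = "0 01111101 " + "01" + "0" * 21
print(decode_bits(pattern))  # 0.3125
```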
15
Addition of FP Numbers
• Given two numbers:
  • normalise them both
  • shift the significand of the number with the smaller exponent so that its exponent matches the larger one
  • add them together
  • renormalise
  • check for underflow/overflow of the exponent
    • if so, stop and signal an exception
  • round the significand to the required number of bits
  • might need renormalisation again (eg, 11111 rounded to 4 bits carries out to 100000).
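The steps above can be sketched on small integer significands. Everything here is an assumption made for illustration: numbers are positive pairs `(significand, exponent)` where the significand is a `precision`-bit integer whose top bit is the leading 1, alignment is done exactly by shifting left (real hardware shifts the smaller operand right and keeps guard bits), and rounding is simple round-half-up rather than the IEEE default.

```python
def fp_add(a, b, precision=4, exp_max=127):
    """Add two positive normalised numbers given as (significand, exponent).

    value = significand * 2**(exponent - (precision - 1))
    """
    (sig_a, exp_a), (sig_b, exp_b) = a, b
    # Align: express both significands at the smaller exponent.
    exp = min(exp_a, exp_b)
    total = (sig_a << (exp_a - exp)) + (sig_b << (exp_b - exp))
    # Renormalise and round back to `precision` bits.
    excess = total.bit_length() - precision
    if excess > 0:
        total = (total + (1 << (excess - 1))) >> excess  # round half up
        exp += excess
        if total.bit_length() > precision:  # rounding carried out of the top bit
            total >>= 1
            exp += 1
    if exp > exp_max:
        raise OverflowError("exponent too large")
    return total, exp

def value(sig, exp, precision=4):
    """Numerical value of a (significand, exponent) pair."""
    return sig * 2.0 ** (exp - (precision - 1))

# 1.000 * 2**-1 (0.5) plus 1.011 * 2**1 (2.75)
result = fp_add((0b1000, -1), (0b1011, 1))
print(result, value(*result))  # (13, 1) 3.25
```

Because both inputs are positive, the exponent can only grow, so this sketch checks for overflow only; underflow would need handling once subtraction is allowed.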
16
Addition Example
• 0.5 + 2.75 = 3.25
• $0.1_{\text{two}} + 10.11_{\text{two}}$
• $1.0 \times 2^{-1} + 1.011 \times 2^{1}$
• $0.010 \times 2^{1} + 1.011 \times 2^{1}$
• $1.101 \times 2^{1}$ (already normalised)
• (1 + (1/2) + (1/8)) * 2
• 3.25
17
Remarks
• The IEEE FP standard represents single-precision floats in 32 bits; higher precision (doubles) is represented across two words (64 bits).
• Multiplication is relatively easy, since the exponents add and the significands can be multiplied using integer multiplication.
• There can be huge pitfalls in reliably transferring floating point code to different hardware!
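The remark about multiplication can be sketched with the same kind of toy representation: add the exponents, multiply the significands as integers, then renormalise and round. As before, the representation (positive `(significand, exponent)` pairs with a `precision`-bit integer significand), the name `fp_multiply` and the round-half-up rule are assumptions made for illustration, and range checks are omitted.

```python
def fp_multiply(a, b, precision=4):
    """Multiply two positive normalised numbers given as (significand, exponent).

    value = significand * 2**(exponent - (precision - 1)); precision >= 2.
    """
    (sig_a, exp_a), (sig_b, exp_b) = a, b
    product = sig_a * sig_b  # plain integer multiplication
    exp = exp_a + exp_b      # exponents add
    # The product has 2*precision - 1 bits, or 2*precision bits if it reached 2.0.
    exp += product.bit_length() - (2 * precision - 1)
    excess = product.bit_length() - precision
    product = (product + (1 << (excess - 1))) >> excess  # round half up
    if product.bit_length() > precision:  # rounding carried out of the top bit
        product >>= 1
        exp += 1
    return product, exp

# 1.100 * 2**0 (1.5) times itself is 1.001 * 2**1 (2.25)
print(fp_multiply((0b1100, 0), (0b1100, 0)))  # (9, 1)
```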
18
Summary
• FP scientific notation
• normalised representation in binary
• Bias to represent -ve to +ve range in exponent
• Addition
• Notice how a 32-bit binary string can represent many different entities in memory.
• Memory architectures NEXT.