
Floating Point Representations


Page 1: Floating Point Representations

Floating Point Representations

CDA 3101

Discussion Session 02

Page 2: Floating Point Representations

Question 1

• Convert the binary number $(1010\ 0100\ 1001\ 0010\ 0100\ 1001\ 0010\ 0100)_2$ to decimal, if the binary is interpreted as:

  a. Unsigned?
  b. 2's complement?
  c. Single-precision floating point?

Page 3: Floating Point Representations

Question 1.1

• Converting binary (unsigned) to decimal: $(1010\ 0100\ 1001\ 0010\ 0100\ 1001\ 0010\ 0100)_2$

$1 \times 2^{31} + 1 \times 2^{29} + \dots + 1 \times 2^{8} + 1 \times 2^{5} + 1 \times 2^{2}$

$= 2761050404$
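The weighted sum above can be checked with a short Python sketch. The variable names are illustrative only; the bit pattern is the one from Question 1.

```python
# Bit pattern from Question 1, with the grouping spaces removed.
bits = "1010 0100 1001 0010 0100 1001 0010 0100".replace(" ", "")

# Unsigned interpretation: bit i (counted from the right, starting at 0)
# contributes 2**i when it is set.
unsigned_value = sum(1 << i for i, b in enumerate(reversed(bits)) if b == "1")

print(unsigned_value)   # 2761050404
print(int(bits, 2))     # same value via Python's built-in base-2 parser
```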

Page 4: Floating Point Representations

Question 1.2

• Converting binary (2's complement) to decimal: $(1010\ 0100\ 1001\ 0010\ 0100\ 1001\ 0010\ 0100)_2$

$-1 \times 2^{31} + 1 \times 2^{29} + \dots + 1 \times 2^{8} + 1 \times 2^{5} + 1 \times 2^{2}$

$= -1533916892$
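In 2's complement the only change is that the most significant bit carries a negative weight. A minimal sketch, assuming a hypothetical helper name `twos_complement`:

```python
def twos_complement(bits: str) -> int:
    """Interpret a bit string (spaces allowed) as a 2's complement integer."""
    bits = bits.replace(" ", "")
    value = int(bits, 2)            # unsigned reading first
    if bits[0] == "1":              # sign bit set: its weight is -2**(n-1)
        value -= 1 << len(bits)     # equivalent to subtracting 2**n
    return value

print(twos_complement("1010 0100 1001 0010 0100 1001 0010 0100"))  # -1533916892
```

Subtracting $2^{32}$ from the unsigned value gives the same result as flipping the weight of bit 31 from $+2^{31}$ to $-2^{31}$.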

Page 5: Floating Point Representations

Question 1.3

• Converting binary (single-precision FP) to decimal: $(1010\ 0100\ 1001\ 0010\ 0100\ 1001\ 0010\ 0100)_2$

Field layout: S (1 bit) | Biased Exponent (8 bits) | Fraction (23 bits)

Sign bit: 1

Exponent: $01001001_2 = 73$

Fraction: $00100100100100100100100_2 = 1 \times 2^{-3} + 1 \times 2^{-6} + \dots + 1 \times 2^{-15} + 1 \times 2^{-18} + 1 \times 2^{-21}$

$= 0.142857074$

$(-1)^{S} \times (1.\text{Fraction}) \times 2^{(\text{Exponent} - 127)}$

$= (-1)^{1} \times 1.142857074 \times 2^{(73 - 127)}$

$= -1.142857074 \times 2^{-54}$

$\approx -6.344131191 \times 10^{-17}$
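The field-by-field decoding can be sketched in Python and compared against the platform's own interpretation of the same 32 bits. `decode_single` is a hypothetical helper that handles only normalized numbers; `struct` is the standard-library module used here as the reference.

```python
import struct

def decode_single(bits: str) -> float:
    """Decode a normalized single-precision pattern field by field."""
    bits = bits.replace(" ", "")
    sign = int(bits[0], 2)                 # 1 bit
    exponent = int(bits[1:9], 2)           # 8 bits, biased by 127
    fraction = int(bits[9:], 2) / 2**23    # 23 bits, value in [0, 1)
    return (-1) ** sign * (1 + fraction) * 2.0 ** (exponent - 127)

pattern = "1010 0100 1001 0010 0100 1001 0010 0100"
manual = decode_single(pattern)

# Reference: reinterpret the same 32 bits as a big-endian IEEE 754 float.
raw = int(pattern.replace(" ", ""), 2).to_bytes(4, "big")
reference = struct.unpack(">f", raw)[0]

print(manual == reference)   # True
print(manual)                # about -6.344131191e-17
```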

Page 6: Floating Point Representations

Question 2

• Show the IEEE 754 binary representation for the floating-point number $0.1_{10}$ in single precision and double precision

Page 7: Floating Point Representations

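Question 2.1

• Converting $0.1_{10}$ to single-precision FP

Step 1: Convert the fraction 0.1 to binary (multiply by 2 repeatedly; the integer part of each product is the next bit):
0.1*2 = 0.2, 0.2*2 = 0.4, 0.4*2 = 0.8, 0.8*2 = 1.6, 0.6*2 = 1.2, 0.2*2 = 0.4, 0.4*2 = 0.8, 0.8*2 = 1.6, 0.6*2 = 1.2, …

$0.1_{10} = (0.000110011\ldots)_2 = (1.10011\ldots)_2 \times 2^{-4}$

Step 2: Express in single-precision format, $(-1)^{S} \times (1.\text{Fraction}) \times 2^{(\text{Exponent} - 127)}$, where the biased Exponent is $-4 + 127 = 123 = 01111011_2$:

$= (-1)^{0} \times (1.10011001100110011001100)_2 \times 2^{(123 - 127)}$

0 01111011 10011001100110011001100

This result keeps the first 23 fraction bits (truncation). Under the IEEE 754 default round-to-nearest mode, the discarded bits (11001…) are more than half a unit in the last place, so the last fraction bit rounds up: 0 01111011 10011001100110011001101.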
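A minimal Python sketch of both steps, assuming a hypothetical helper `fraction_bits` for the repeated-doubling procedure and the standard-library `struct` module for the actual encoding. Note that `struct` rounds to nearest, so the last fraction bit it produces is 1, whereas truncating the expansion after 23 bits gives 0.

```python
import struct
from fractions import Fraction

def fraction_bits(x: Fraction, n: int) -> str:
    """First n binary digits of 0 <= x < 1, found by repeated doubling."""
    out = []
    for _ in range(n):
        x *= 2
        bit = int(x >= 1)      # integer part of the product is the next bit
        out.append(str(bit))
        x -= bit               # keep only the fractional part
    return "".join(out)

print(fraction_bits(Fraction(1, 10), 12))   # 000110011001

# Encode 0.1 as a big-endian single and show the three fields.
raw = struct.unpack(">I", struct.pack(">f", 0.1))[0]
word = f"{raw:032b}"
print(word[0], word[1:9], word[9:])   # 0 01111011 10011001100110011001101
```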

Page 8: Floating Point Representations

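Question 2.2

• Converting $0.1_{10}$ to double-precision FP

Step 1: Convert the fraction 0.1 to binary (multiply by 2 repeatedly; the integer part of each product is the next bit):
0.1*2 = 0.2, 0.2*2 = 0.4, 0.4*2 = 0.8, 0.8*2 = 1.6, 0.6*2 = 1.2, 0.2*2 = 0.4, 0.4*2 = 0.8, 0.8*2 = 1.6, 0.6*2 = 1.2, …

$0.1_{10} = (0.000110011\ldots)_2 = (1.10011\ldots)_2 \times 2^{-4}$

Step 2: Express in double-precision format, $(-1)^{S} \times (1.\text{Fraction}) \times 2^{(\text{Exponent} - 1023)}$, where the biased Exponent is $-4 + 1023 = 1019 = 01111111011_2$:

$= (-1)^{0} \times (1.1001100110011001100110\ldots)_2 \times 2^{(1019 - 1023)}$

0 01111111011 1001100110011001100110011001100110011001100110011001

This result keeps the first 52 fraction bits (truncation). Under the IEEE 754 default round-to-nearest mode, the discarded bits (1001…) are more than half a unit in the last place, so the fraction rounds up and ends in …1010 instead of …1001.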
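The double-precision fields can be inspected the same way. This sketch assumes Python's `float` is an IEEE 754 double (true on mainstream platforms) and uses `fractions.Fraction` to show that the stored value is slightly above 0.1 because of rounding.

```python
import struct
from fractions import Fraction

# Reinterpret the 8 bytes of the double 0.1 as a 64-bit unsigned integer.
raw = struct.unpack(">Q", struct.pack(">d", 0.1))[0]
word = f"{raw:064b}"
sign, exponent, fraction = word[0], word[1:12], word[12:]

print(sign, exponent, fraction)
print(int(exponent, 2) - 1023)                  # unbiased exponent: -4

# Fraction(0.1) is the exact rational value of the stored double.
print(Fraction(0.1) > Fraction(1, 10))          # True: rounded up
```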

Page 9: Floating Point Representations

Question 3

• Convert the following single-precision numbers into decimal

  a. 0 11111111 00000000000000000000000
  b. 0 00000000 00000000000000000000010

Page 10: Floating Point Representations

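Question 3.1

• Converting binary (single-precision FP) to decimal: $(0\ 11111111\ 00000000000000000000000)_2$

Field layout: S (1 bit) | Biased Exponent (8 bits) | Fraction (23 bits)

Sign bit: 0

Exponent: $11111111_2 = 255$ (all ones, reserved for infinity and NaN)

Fraction: $00000000000000000000000_2 = 0$

An all-ones exponent with a zero fraction and sign 0 encodes +Infinity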
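A quick check of this special case, as a sketch using the standard-library `struct` and `math` modules to reinterpret the pattern as a single-precision value:

```python
import math
import struct

bits = "0 11111111 00000000000000000000000".replace(" ", "")

# Reinterpret the 32 bits as a big-endian IEEE 754 single.
value = struct.unpack(">f", int(bits, 2).to_bytes(4, "big"))[0]

print(value)              # inf
print(math.isinf(value))  # True
```

Any nonzero fraction with the same all-ones exponent would decode to NaN instead of infinity.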

Page 11: Floating Point Representations

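Question 3.2

• Converting binary (single-precision FP) to decimal: $(0\ 00000000\ 00000000000000000000010)_2$

Field layout: S (1 bit) | Biased Exponent (8 bits) | Fraction (23 bits)

Sign bit: 0

Exponent: $00000000_2 = 0$ (with a nonzero fraction, this is a denormalized number)

Fraction: $00000000000000000000010_2 = 1 \times 2^{-22} \approx 0.000000238$

$(-1)^{S} \times (0.\text{Fraction}) \times 2^{-126}$

$= (-1)^{0} \times 2^{-22} \times 2^{-126}$

$= 2^{-148} \approx 2.802596929 \times 10^{-45}$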
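The denormalized case differs from the normalized one in two ways: there is no hidden leading 1, and the exponent is fixed at -126. A minimal sketch, again using `struct` as the reference; the variable names are illustrative:

```python
import struct

bits = "0 00000000 00000000000000000000010".replace(" ", "")

fraction = int(bits[9:], 2) / 2**23   # 0.Fraction = 2**-22, no hidden 1
manual = fraction * 2.0 ** -126       # exponent fixed at -126 for denormals

# Reference: reinterpret the same 32 bits as a big-endian IEEE 754 single.
reference = struct.unpack(">f", int(bits, 2).to_bytes(4, "big"))[0]

print(manual == reference == 2.0 ** -148)   # True
print(manual)                               # about 2.8026e-45
```

Carrying the exact power of two through the calculation avoids the error introduced by rounding $2^{-22}$ to 0.000000238 before multiplying.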

Page 12: Floating Point Representations

Question 4

• Consider the 80-bit extended-precision IEEE 754 floating point standard that uses 1 bit for the sign, 16 bits for the biased exponent and 63 bits for the fraction (f). Then, write (i) the 80-bit extended-precision floating point representation in binary and (ii) the corresponding value in base-10 positional (decimal) system of

  a. the third smallest positive normalized number
  b. the largest (farthest from zero) negative normalized number
  c. the third smallest positive denormalized number

that can be represented.

Page 13: Floating Point Representations

Question 4.1

• The third smallest positive normalized number

Bias: $2^{15} - 1 = 32767$

Sign: 0

Biased Exponent: 0000 0000 0000 0001

Fraction (f): 61 zeros followed by 10

Decimal Value: $(-1)^{0} \times 2^{(1 - 32767)} \times (1 + 2^{-62}) = 2^{-32766} + 2^{-32828}$
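Values this small underflow ordinary floats, so a sketch with exact rational arithmetic is a convenient way to check the answer. The `decode` helper below is hypothetical and assumes the 1/16/63 layout exactly as stated in Question 4 (it is not a model of the x87 extended format).

```python
from fractions import Fraction

EXP_BITS, FRAC_BITS = 16, 63
BIAS = 2 ** (EXP_BITS - 1) - 1   # 32767

def decode(sign: int, biased_exp: int, frac: int) -> Fraction:
    """Exact value of a finite number in the question's 1/16/63 layout."""
    f = Fraction(frac, 2 ** FRAC_BITS)
    if biased_exp == 0:
        # Denormalized: 0.f * 2**(1 - BIAS)
        magnitude = f * Fraction(2) ** (1 - BIAS)
    else:
        # Normalized: 1.f * 2**(biased_exp - BIAS)
        magnitude = (1 + f) * Fraction(2) ** (biased_exp - BIAS)
    return -magnitude if sign else magnitude

# Smallest normalized numbers have biased exponent 1 and fraction 0, 1, 2, ...
third_smallest_normal = decode(0, 1, 0b10)
print(third_smallest_normal == Fraction(2) ** -32766 + Fraction(2) ** -32828)  # True
```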

Page 14: Floating Point Representations

Question 4.2

• The largest (farthest from zero) negative normalized number

Sign: 1

Biased Exponent: 1111 1111 1111 1110

Fraction: 63 ones

Decimal Value: $(-1)^{1} \times 2^{(65534 - 32767)} \times (1 + 2^{-1} + 2^{-2} + \dots + 2^{-63}) = -2^{32767} \times (2^{64} - 1) \times 2^{-63} \approx -2^{32768}$
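The closed form and the quality of the approximation can be checked with Python's arbitrary-precision integers. This is a sketch under the layout stated in Question 4; the variable names are illustrative.

```python
from fractions import Fraction

BIAS = 2**15 - 1                              # 32767

# 1 + 2**-1 + ... + 2**-63 is a geometric sum equal to (2**64 - 1) / 2**63.
significand = Fraction(2**64 - 1, 2**63)
value = -significand * 2 ** (65534 - BIAS)

# Closed form from the slide: -(2**64 - 1) * 2**(32767 - 63), an exact integer.
print(value == -(2**64 - 1) * 2 ** (32767 - 63))   # True

# Relative gap between the exact value and the approximation -2**32768.
ratio = value / -(2 ** 32768)
print(1 - ratio == Fraction(1, 2**64))             # True
```

The exact value falls short of $-2^{32768}$ by a relative amount of only $2^{-64}$, which is why the approximation is reasonable.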

Page 15: Floating Point Representations

Question 4.3

• The third smallest positive denormalized number

Sign: 0

Biased Exponent: 0000 0000 0000 0000

Fraction: 61 zeros followed by 11

Decimal Value: $(-1)^{0} \times 2^{-32766} \times (2^{-62} + 2^{-63}) = 3 \times 2^{-32829}$
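Because every positive denormalized number shares the biased exponent 0, the k-th smallest one simply has its fraction field equal to k. A minimal sketch of that observation, assuming the layout from Question 4 and a hypothetical helper `kth_smallest_denormal`:

```python
from fractions import Fraction

FRAC_BITS = 63
BIAS = 2**15 - 1   # 32767

def kth_smallest_denormal(k: int) -> Fraction:
    """Exact value of the k-th smallest positive denormal (fraction field = k)."""
    return Fraction(k, 2**FRAC_BITS) * Fraction(2) ** (1 - BIAS)

third = kth_smallest_denormal(3)

print(f"{3:063b}"[-4:])                      # 0011: the fraction field ends in 11
print(third == 3 * Fraction(2) ** -32829)    # True
```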