CSC 2400 Computer Systems I Lecture 2 Representations

The Machine

Why Don’t Computers Use Base 10?

Base 10 Number Representation
– That’s why fingers are known as “digits”
– Natural representation for financial transactions
    A floating point number cannot exactly represent $1.20
– Even carries through in scientific notation
    1.5213 x 10^4

Implementing Electronically
– Hard to store
    ENIAC (first electronic computer) used 10 vacuum tubes per digit
– Hard to transmit
    Need high precision to encode 10 signal levels on a single wire
– Messy to implement digital logic functions
    Addition, multiplication, etc.

How do we represent data in a computer?

At the lowest level, a computer is an electronic machine.
– works by controlling the flow of electrons

Easy to recognize two conditions:
1. presence of a voltage – we’ll call this state “1”
2. absence of a voltage – we’ll call this state “0”

Could base state on value of voltage, but control and detection circuits are more complex.
– compare turning on a light switch to measuring or regulating voltage

Computer is a binary digital system.

Basic unit of information is the binary digit, or bit.
Values with more than two states require multiple bits.
– A collection of two bits has four possible states: 00, 01, 10, 11
– A collection of three bits has eight possible states: 000, 001, 010, 011, 100, 101, 110, 111
– A collection of n bits has 2^n possible states.

Binary (base two) system:
– has two states: 0 and 1

Digital system:
– finite number of symbols

What kinds of data do we need to represent?

– Numbers – signed, unsigned, integers, floating point, complex, rational, irrational, …
– Text – characters, strings, …
– Images – pixels, colors, shapes, …
– Sound
– Logical – true, false
– Instructions
– …

Data type:
– representation and operations within the computer

Machine Words

Machine Has “Word Size”
– Nominal size of integer-valued data
    Including addresses
– Most current machines are 32 bits (4 bytes)
    Limits addresses to 4GB
    Becoming too small for memory-intensive applications
– High-end systems are 64 bits (8 bytes)
    Potentially address 1.8 x 10^19 bytes
– Machines support multiple data formats
    Fractions or multiples of word size
    Always integral number of bytes

Word-Oriented Memory Organization

Addresses Specify Byte Locations
– Address of first byte in word
– Addresses of successive words differ by 4 (32-bit) or 8 (64-bit)

[Figure: bytes at addresses 0000 through 0015, grouped into 32-bit and 64-bit words]

Word size       Word addresses
32-bit words    0000, 0004, 0008, 0012
64-bit words    0000, 0008

Data Representations

Sizes of C Objects (in Bytes)

C Data Type    Sparc/Unix    Typical 32-bit    Intel IA32
int            4             4                 4
long int       8             4                 4
char           1             1                 1
short          2             2                 2
float          4             4                 4
double         8             8                 8
long double    8             8                 10/12
char *         8             4                 4
– Or any other “pointer”

Pointers and Arrays

Pointer
– Address of a variable in memory
– Allows us to indirectly access variables
    in other words, we can talk about its address rather than its value

Array
– A list of values arranged sequentially in memory
– Example: a list of telephone numbers
– Expression a[4] refers to the 5th element of the array a

Address vs. Value

Sometimes we want to deal with the address of a memory location, rather than the value it contains.

Adding a column of numbers.
– R2 contains address of first location.
– Read value, add to sum, and increment R2 until all numbers have been processed.

R2 is a pointer -- it contains the address of data we’re interested in.

address    value
x3100      x3107
x3101      x2819
x3102      x0110
x3103      x0310
x3104      x0100
x3105      x1110
x3106      x11B1
x3107      x0019

R2 = x3100

Byte Ordering

How should bytes within a multi-byte word be ordered in memory?

Conventions
– Sun’s, Mac’s are “Big Endian” machines
    Least significant byte has highest address
– Alphas, PC’s are “Little Endian” machines
    Least significant byte has lowest address

Byte Ordering Example

Example
– Variable x has 4-byte representation 0x01234567
– Address given by &x is 0x100

Address          0x100    0x101    0x102    0x103
Big Endian       01       23       45       67
Little Endian    67       45       23       01

Machine-Level Code Representation

Encode Program as Sequence of Instructions
– Each simple operation
    Arithmetic operation
    Read or write memory
    Conditional branch
– Instructions encoded as bytes
    Alpha’s, Sun’s, Mac’s use 4 byte instructions
    – Reduced Instruction Set Computer (RISC)
    PC’s use variable length instructions
    – Complex Instruction Set Computer (CISC)
– Different instruction types and encodings for different machines
    Most code not binary compatible

Programs are Byte Sequences Too!

Big Idea: Information is Bits + Context

Computer stores the bits
You decide how to interpret them

Example (binary)

00000000 00000000 00000000 00001101

Decimal = 13
Float = 1.82169E-44

Integers

Unsigned Integers

Non-positional notation
– could represent a number (“5”) with a string of ones (“11111”)
– problems?

Weighted positional notation
– like decimal numbers: “329”
– “3” is worth 300, because of its position, while “9” is only worth 9

   3       2       9               1      0      1
  10^2    10^1    10^0            2^2    2^1    2^0
  (most significant digit on the left, least significant on the right)

3x100 + 2x10 + 9x1 = 329          1x4 + 0x2 + 1x1 = 5

Unsigned Integers (cont.)

An n-bit unsigned integer represents 2^n values: from 0 to 2^n - 1.

2^2    2^1    2^0    value
 0      0      0       0
 0      0      1       1
 0      1      0       2
 0      1      1       3
 1      0      0       4
 1      0      1       5
 1      1      0       6
 1      1      1       7

Unsigned Binary Arithmetic

Base-2 addition – just like base-10!
– add from right to left, propagating carry

  10010        10010         1111        10111
+  1001      +  1011      +     1      +   111
  -----        -----        -----        -----
  11011        11101        10000        11110

Signed Integers

With n bits, we have 2^n distinct values.
– assign about half to positive integers (1 through 2^(n-1) - 1) and about half to negative (-(2^(n-1) - 1) through -1)
– that leaves two values: one for 0, and one extra

Positive integers
– just like unsigned – zero in most significant bit
    00101 = 5

Negative integers
– sign-magnitude – set top bit to show negative, other bits are the same as unsigned
    10101 = -5
– one’s complement – flip every bit to represent negative
    11010 = -5
– in either case, MS bit indicates sign: 0=positive, 1=negative

Two’s Complement

Problems with sign-magnitude and 1’s complement
– two representations of zero (+0 and –0)
– arithmetic circuits are complex
    How to add two sign-magnitude numbers?
    – e.g., try 2 + (-3)
    How to add two one’s complement numbers?
    – e.g., try 4 + (-3)

Two’s complement representation developed to make circuits easy for arithmetic.
– for each positive number (X), assign value to its negative (-X), such that X + (-X) = 0 with “normal” addition, ignoring carry out

  00101 (5)         01001 (9)
+ 11011 (-5)      + 10111 (-9)
  ----------        ----------
  00000 (0)         00000 (0)

Two’s Complement Representation

If number is positive or zero,
– normal binary representation, zeroes in upper bit(s)

If number is negative,
– start with positive number
– flip every bit (i.e., take the one’s complement)
– then add one

  00101 (5)             01001 (9)
  11010 (1’s comp)      10110 (1’s comp)
+     1               +     1
  -----                 -----
  11011 (-5)            10111 (-9)

Two’s Complement Signed Integers

MS bit is sign bit – it has weight -2^(n-1).
Range of an n-bit number: -2^(n-1) through 2^(n-1) - 1.
– The most negative number (-2^(n-1)) has no positive counterpart.

-2^3   2^2   2^1   2^0   value        -2^3   2^2   2^1   2^0   value
  0     0     0     0      0            1     0     0     0     -8
  0     0     0     1      1            1     0     0     1     -7
  0     0     1     0      2            1     0     1     0     -6
  0     0     1     1      3            1     0     1     1     -5
  0     1     0     0      4            1     1     0     0     -4
  0     1     0     1      5            1     1     0     1     -3
  0     1     1     0      6            1     1     1     0     -2
  0     1     1     1      7            1     1     1     1     -1

ASCII, etc.

Text: ASCII Characters

ASCII: Maps 128 characters to 7-bit code. (type man ascii)
– both printable and non-printable (ESC, DEL, …) characters

00 nul   10 dle   20 sp   30 0   40 @   50 P   60 `   70 p
01 soh   11 dc1   21 !    31 1   41 A   51 Q   61 a   71 q
02 stx   12 dc2   22 "    32 2   42 B   52 R   62 b   72 r
03 etx   13 dc3   23 #    33 3   43 C   53 S   63 c   73 s
04 eot   14 dc4   24 $    34 4   44 D   54 T   64 d   74 t
05 enq   15 nak   25 %    35 5   45 E   55 U   65 e   75 u
06 ack   16 syn   26 &    36 6   46 F   56 V   66 f   76 v
07 bel   17 etb   27 '    37 7   47 G   57 W   67 g   77 w
08 bs    18 can   28 (    38 8   48 H   58 X   68 h   78 x
09 ht    19 em    29 )    39 9   49 I   59 Y   69 i   79 y
0a nl    1a sub   2a *    3a :   4a J   5a Z   6a j   7a z
0b vt    1b esc   2b +    3b ;   4b K   5b [   6b k   7b {
0c np    1c fs    2c ,    3c <   4c L   5c \   6c l   7c |
0d cr    1d gs    2d -    3d =   4d M   5d ]   6d m   7d }
0e so    1e rs    2e .    3e >   4e N   5e ^   6e n   7e ~
0f si    1f us    2f /    3f ?   4f O   5f _   6f o   7f del

Interesting Properties of ASCII Code

What is the relationship between a decimal digit ('0', '1', …) and its ASCII code?

What is the difference between an upper-case letter ('A', 'B', …) and its lower-case equivalent ('a', 'b', …)?

Given two ASCII characters, how do we tell which comes first in alphabetical order?

Are 128 characters enough? (http://www.unicode.org/)

Other Data Types

Text strings
– sequence of characters, terminated with NUL (0)
– typically, no special hardware support

Image
– array of pixels
    monochrome: one bit (1/0 = black/white)
    color: red, green, blue (RGB) components (e.g., 8 bits each)
    other properties: transparency
– hardware support: typically none, in general-purpose processors
    MMX -- multiple 8-bit operations on 32-bit word

Sound
– sequence of fixed-point numbers

Pointers in C

C lets us talk about and manipulate “pointers” (addresses) as variables and in expressions.

Declaration
    int *p; /* p is a pointer to an int */

A pointer in C is always a pointer to a particular data type: int*, double*, char*, etc.

Operators
    *p -- returns the value pointed to by p (“dereference”)
    &z -- returns the address of variable z (“address of”)

Example

    int i;
    int *ptr;

    i = 4;             /* store the value 4 into the memory location associated with i */
    ptr = &i;          /* store the address of i into the memory location associated with ptr */
    *ptr = *ptr + 1;   /* read the contents of memory at the address stored in ptr, add 1, */
                       /* and store the result into memory at the address stored in ptr */

Converting Integer Representations

Converting Binary (2’s C) to Decimal

1. If leading bit is one, take two’s complement to get a positive number.
2. Add powers of 2 that have “1” in the corresponding bit positions.
3. If original number was negative, add a minus sign.

Assuming 8-bit 2’s complement numbers.

X = 01101000 (binary)
  = 2^6 + 2^5 + 2^3 = 64 + 32 + 8
  = 104 (decimal)

n      0   1   2   3   4    5    6    7     8     9     10
2^n    1   2   4   8   16   32   64   128   256   512   1024

More Examples

Assuming 8-bit 2’s complement numbers.

X = 00100111 (binary)
  = 2^5 + 2^2 + 2^1 + 2^0 = 32 + 4 + 2 + 1
  = 39 (decimal)

X = 11100110 (binary)
-X = 00011010
   = 2^4 + 2^3 + 2^1 = 16 + 8 + 2
   = 26 (decimal)
X = -26 (decimal)

Converting Decimal to Binary (2’s C)

First Method: Division
1. Divide by two – remainder is least significant bit.
2. Keep dividing by two until answer is zero, writing remainders from right to left.
3. Append a zero as the MS bit; if original number negative, take two’s complement.

X = 104 (decimal)

104/2 = 52 r0    bit 0
 52/2 = 26 r0    bit 1
 26/2 = 13 r0    bit 2
 13/2 =  6 r1    bit 3
  6/2 =  3 r0    bit 4
  3/2 =  1 r1    bit 5
  1/2 =  0 r1    bit 6

X = 01101000 (binary)

Converting Decimal to Binary (2’s C)

Second Method: Subtract Powers of Two
1. Change to positive decimal number.
2. Subtract largest power of two less than or equal to number.
3. Put a one in the corresponding bit position.
4. Keep subtracting until result is zero.
5. Append a zero as MS bit; if original was negative, take two’s complement.

X = 104 (decimal)

104 - 64 = 40    bit 6
 40 - 32 =  8    bit 5
  8 -  8 =  0    bit 3

X = 01101000 (binary)

Hexadecimal Notation

It is often convenient to write binary (base-2) numbers as hexadecimal (base-16) numbers instead.
– fewer digits -- four bits per hex digit
– less error prone -- easy to corrupt long string of 1’s and 0’s

Binary   Hex   Decimal        Binary   Hex   Decimal
0000     0     0              1000     8     8
0001     1     1              1001     9     9
0010     2     2              1010     A     10
0011     3     3              1011     B     11
0100     4     4              1100     C     12
0101     5     5              1101     D     13
0110     6     6              1110     E     14
0111     7     7              1111     F     15

Converting from Binary to Hexadecimal

Every four bits is a hex digit.
– start grouping from right-hand side

 011   1010   1000   1111   0100   1101   0111
   3      A      8      F      4      D      7

This is not a new machine representation, just a convenient way to write the number.

Operations in C


Overview of some C operators

! Logical NOT or “bang”

~ Bitwise NOT (“flips” bits)

& Bitwise AND

^ Bitwise XOR

| Bitwise OR

+ Addition

<< Bitwise left shift (shifts bits to left)

>> Bitwise right shift (shifts bits to right)

Addition

2’s comp. addition is just binary addition.
– assume all integers have the same number of bits
– ignore carry out
– for now, assume that sum fits in n-bit 2’s comp. representation

Assuming 8-bit 2’s complement numbers.

  01101000 (104)        11110110 (-10)
+ 11110000 (-16)      + 11110111 (-9)
  --------------        --------------
  01011000 (88)         11101101 (-19)

Subtraction

Negate subtrahend (2nd no.) and add.
– assume all integers have the same number of bits
– ignore carry out
– for now, assume that difference fits in n-bit 2’s comp. representation

Assuming 8-bit 2’s complement numbers.

  01101000 (104)        11110110 (-10)
- 00010000 (16)       - 11110111 (-9)

becomes

  01101000 (104)        11110110 (-10)
+ 11110000 (-16)      + 00001001 (9)
  --------------        --------------
  01011000 (88)         11111111 (-1)

Sign Extension

To add two numbers, we must represent them with the same number of bits.

If we just pad with zeroes on the left:

4-bit         8-bit
0100 (4)      00000100 (still 4)
1100 (-4)     00001100 (12, not -4)

Instead, replicate the MS bit -- the sign bit:

4-bit         8-bit
0100 (4)      00000100 (still 4)
1100 (-4)     11111100 (still -4)

Overflow

If operands are too big, then sum cannot be represented as an n-bit 2’s comp number.

We have overflow if:
– signs of both operands are the same, and
– sign of sum is different.

Another test -- easy for hardware:
– carry into MS bit does not equal carry out

  01000 (8)          11000 (-8)
+ 01001 (9)        + 10111 (-9)
  -----------        -----------
  10001 (-15)        01111 (+15)

Logical Operations

Operations on logical TRUE or FALSE
– two states -- takes one bit to represent: TRUE=1, FALSE=0

View n-bit number as a collection of n logical values
– operation applied to each bit independently

A   B   A AND B        A   B   A OR B        A   NOT A
0   0      0           0   0      0          0     1
0   1      0           0   1      1          1     0
1   0      0           1   0      1
1   1      1           1   1      1

Examples of Logical Operations

AND
– useful for clearing bits
    AND with zero = 0
    AND with one = no change

        11000101
    AND 00001111
        --------
        00000101

OR
– useful for setting bits
    OR with zero = no change
    OR with one = 1

        11000101
    OR  00001111
        --------
        11001111

NOT
– unary operation -- one argument
– flips every bit

    NOT 11000101
        --------
        00111010

Bit-Level Operations in C

Operations &, |, ~, ^ Available in C
– Apply to any “integral” data type
    long, int, short, char
– View arguments as bit vectors
– Arguments applied bit-wise

Examples (Char data type)
– ~0x41 --> 0xBE
    ~01000001 --> 10111110 (binary)
– ~0x00 --> 0xFF
    ~00000000 --> 11111111 (binary)
– 0x69 & 0x55 --> 0x41
    01101001 & 01010101 --> 01000001 (binary)
– 0x69 | 0x55 --> 0x7D
    01101001 | 01010101 --> 01111101 (binary)

Relations Between Operations

DeMorgan’s Laws
– Express & in terms of |, and vice-versa
    A & B = ~(~A | ~B)
    – A and B are true if and only if neither A nor B is false
    A | B = ~(~A & ~B)
    – A or B are true if and only if A and B are not both false

Exclusive-Or using Inclusive Or
    A ^ B = (~A & B) | (A & ~B)
    – Exactly one of A and B is true
    A ^ B = (A | B) & ~(A & B)
    – Either A is true, or B is true, but not both

General Boolean Algebras

Operate on Bit Vectors
– Operations applied bitwise

  01101001        01101001        01101001
& 01010101      | 01010101      ^ 01010101      ~ 01010101
  --------        --------        --------        --------
  01000001        01111101        00111100        10101010

All of the Properties of Boolean Algebra Apply

Contrast: Logic Operations in C

Contrast to Logical Operators
– &&, ||, !
    View 0 as “False”
    Anything nonzero as “True”
    Always return 0 or 1
    Early termination

Examples (char data type)
– !0x41 --> 0x00
– !0x00 --> 0x01
– !!0x41 --> 0x01
– 0x69 && 0x55 --> 0x01
– 0x69 || 0x55 --> 0x01
– p && *p (avoids null pointer access)

Shift Operations

Left Shift: x << y
– Shift bit-vector x left y positions
    Throw away extra bits on left
    Fill with 0’s on right

Right Shift: x >> y
– Shift bit-vector x right y positions
    Throw away extra bits on right
– Logical shift
    Fill with 0’s on left
– Arithmetic shift
    Replicate most significant bit on left
    Useful with two’s complement integer representation

Argument x     01100010        Argument x     10100010
<< 3           00010000        << 3           00010000
Log. >> 2      00011000        Log. >> 2      00101000
Arith. >> 2    00011000        Arith. >> 2    11101000

Limits of the Machine

Numeric Ranges

Unsigned Values
– UMin = 0
    000…0
– UMax = 2^w - 1
    111…1

Two’s Complement Values
– TMin = -2^(w-1)
    100…0
– TMax = 2^(w-1) - 1
    011…1

Other Values
– Minus 1
    111…1

Values for W = 16

        Decimal    Hex      Binary
UMax    65535      FF FF    11111111 11111111
TMax    32767      7F FF    01111111 11111111
TMin    -32768     80 00    10000000 00000000
-1      -1         FF FF    11111111 11111111
0       0          00 00    00000000 00000000

Values for Different Word Sizes

W       8       16         32                64
UMax    255     65,535     4,294,967,295     18,446,744,073,709,551,615
TMax    127     32,767     2,147,483,647     9,223,372,036,854,775,807
TMin    -128    -32,768    -2,147,483,648    -9,223,372,036,854,775,808

Observations
– |TMin| = TMax + 1
    Asymmetric range
– UMax = 2 * TMax + 1

C Programming
– #include <limits.h>
    K&R App. B11
– Declares constants, e.g.,
    ULONG_MAX
    LONG_MAX
    LONG_MIN
– Values platform-specific

Casting Signed to Unsigned

    short int x = 15213;
    unsigned short int ux = (unsigned short) x;
    short int y = -15213;
    unsigned short int uy = (unsigned short) y;

C Allows Conversions from Signed to Unsigned

Resulting Value
– No change in bit representation
– Nonnegative values unchanged
    ux = 15213
– Negative values change into (large) positive values
    uy = 50323

Signed vs. Unsigned in C

Constants
– By default are considered to be signed integers
– Unsigned if have “U” as suffix
    0U, 4294967259U

Casting
– Explicit casting between signed & unsigned same as U2T and T2U (the unsigned-to-two’s-complement and two’s-complement-to-unsigned mappings)
    int tx, ty;
    unsigned ux, uy;
    tx = (int) ux;
    uy = (unsigned) ty;
– Implicit casting also occurs via assignments and procedure calls
    tx = ux;
    uy = ty;

Why Should I Use Unsigned?

Don’t Use Just Because Number Nonnegative
– C compilers on some machines generate less efficient code
    unsigned i;
    for (i = 1; i < cnt; i++)
        a[i] += a[i-1];
– Easy to make mistakes
    for (i = cnt-2; i >= 0; i--)   /* i >= 0 is always true for unsigned i, so the loop never ends */
        a[i] += a[i+1];

Do Use When Performing Modular Arithmetic
– Multiprecision arithmetic
– Other esoteric stuff

Do Use When Need Extra Bit’s Worth of Range
– Working right up to limit of word size

Multiplication

Computing Exact Product of w-bit numbers x, y
– Either signed or unsigned

Ranges
– Unsigned:
    Result requires up to 2w bits
– Two’s complement:
    Result requires up to 2w-1 bits, or 2w bits in the one case x = y = TMin

Maintaining Exact Results
– Would need to keep expanding word size with each product computed
– Done in software by “arbitrary precision” arithmetic packages

Unsigned Multiplication in C

Standard Multiplication Function
– Ignores high order w bits

Implements Modular Arithmetic
    UMult_w(u, v) = u · v mod 2^w

Operands: w bits                 u, v
True Product: 2*w bits           u · v
Discard w bits: w bits           UMult_w(u, v)

Unsigned vs. Signed Multiplication

Unsigned Multiplication
    unsigned ux = (unsigned) x;
    unsigned uy = (unsigned) y;
    unsigned up = ux * uy;
– Truncates product to w-bit number up = UMult_w(ux, uy)
– Modular arithmetic: up = ux · uy mod 2^w

Two’s Complement Multiplication
    int x, y;
    int p = x * y;
– Compute exact product of two w-bit numbers x, y
– Truncate result to w-bit number p = TMult_w(x, y)

Power-of-2 Multiply with Shift

Operation
– u << k gives u * 2^k
– Both signed and unsigned

Examples
– u << 3 == u * 8
– (u << 5) - (u << 3) == u * 24 (the parentheses are needed because - binds tighter than << in C)
– Most machines shift and add much faster than multiply
    Compiler generates this code automatically

Operands: w bits                 u, 2^k
True Product: w+k bits           u · 2^k
Discard k bits: w bits           UMult_w(u, 2^k), TMult_w(u, 2^k) (low k bits are 0)

Unsigned Power-of-2 Divide with Shift

Quotient of Unsigned by Power of 2
– u >> k gives ⌊u / 2^k⌋
– Uses logical shift

           Division       Computed    Hex      Binary
x          15213          15213       3B 6D    00111011 01101101
x >> 1     7606.5         7606        1D B6    00011101 10110110
x >> 4     950.8125       950         03 B6    00000011 10110110
x >> 8     59.4257813     59          00 3B    00000000 00111011

[Figure: u divided by 2^k; the k bits to the right of the binary point are discarded]

Signed Power-of-2 Divide with Shift

Quotient of Signed by Power of 2
– x >> k gives ⌊x / 2^k⌋
– Uses arithmetic shift
– Rounds wrong direction when x < 0

           Division        Computed    Hex      Binary
y          -15213          -15213      C4 93    11000100 10010011
y >> 1     -7606.5         -7607       E2 49    11100010 01001001
y >> 4     -950.8125       -951        FC 49    11111100 01001001
y >> 8     -59.4257813     -60         FF C4    11111111 11000100

[Figure: x divided by 2^k; the result is RoundDown(x / 2^k), with the k bits to the right of the binary point discarded]

Correct Power-of-2 Divide

Quotient of Negative Number by Power of 2
– Want ⌈x / 2^k⌉ (Round Toward 0)
– Compute as ⌊(x + 2^k - 1) / 2^k⌋
    In C: (x + (1<<k)-1) >> k
    Biases dividend toward 0

Case 1: No rounding
– The low k bits of the dividend are all 0, so adding 2^k - 1 only sets those bits to 1, and the shift discards them
– Biasing has no effect

Case 2: Rounding
– At least one of the low k bits of the dividend is 1, so adding 2^k - 1 carries into bit k
– Biasing adds 1 to final result

Floating Point

Fractions: Fixed-Point

How can we represent fractions?
– Use a “binary point” to separate positive from negative powers of two -- just like “decimal point.”
– 2’s comp addition and subtraction still work
    if binary points are aligned

2^-1 = 0.5
2^-2 = 0.25
2^-3 = 0.125

  00101000.101 (40.625)
+ 11111110.110 (-1.25)     [note, it is in 2s compl.]
  ------------
  00100111.011 (39.375)

No new operations -- same as integer arithmetic.

Very Large and Very Small: Floating-Point

Large values: 6.023 x 10^23 -- requires 79 bits
Small values: 6.626 x 10^-34 -- requires >110 bits

Use equivalent of “scientific notation”: F x 2^E

Need to represent F (fraction), E (exponent), and sign.

IEEE 754 Floating-Point Standard (32-bits):

S      Exponent    Fraction
1b     8b          23b

N = (-1)^S x 1.fraction x 2^(exponent - 127),    1 ≤ exponent ≤ 254
N = (-1)^S x 0.fraction x 2^(-126),              exponent = 0

Floating Point Example

Single-precision IEEE floating point number:

1 01111110 10000000000000000000000
(sign, exponent, fraction)

– Sign is 1 – number is negative.
– Exponent field is 01111110 = 126 (decimal).
– Fraction is 0.100000000000… = 0.5 (decimal).

Value = -1.5 x 2^(126-127) = -1.5 x 2^-1 = -0.75.

Floating Point in C

C Guarantees Two Levels
    float     single precision
    double    double precision

Conversions
– Casting between int, float, and double changes numeric values
– Double or float to int
    Truncates fractional part
    Like rounding toward zero
    Not defined when out of range
    – Generally saturates to TMin or TMax
– int to double
    Exact conversion, as long as int has ≤ 53 bit word size
– int to float
    Will round according to rounding mode