IT 252 Computer Organization and Architecture Number Representation Chia-Chi Teng

IT 252Computer Organizationand Architecture

Number Representation

Chia-Chi Teng

Where Are We Now?

CS142 & 124

IT344

Review (do you remember from 124/104?)

• 8 bit signed 2’s complement binary # -> decimal #

• 0111 1111 = ?

• 1000 0000 = ?

• 1111 1111 = ?

• Decimal # -> 8 bit signed 2’s complement binary #

• 32 = ?

• -2 = ?

• 200 = ?

Decimal Numbers: Base 10

Digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9

Example:

3271 =

(3x103) + (2x102) + (7x101) + (1x100)

Numbers: positional notation

• Number Base B B symbols per digit:• Base 10 (Decimal):0, 1, 2, 3, 4, 5, 6, 7, 8, 9

Base 2 (Binary): 0, 1

• Number representation:

• d31d30 ... d1d0 is a 32 digit number

• value = d31 B31 + d30 B30 + ... + d1 B1 + d0 B0

• Binary: 0,1 (In binary digits called “bits”)• 0b11010 = 124 + 123 + 022 + 121 + 020

= 16 + 8 + 2= 26

• Here 5 digit binary # turns into a 2 digit decimal #

• Can we find a base that converts to binary easily?

#s often written0b…

Hexadecimal Numbers: Base 16

• Hexadecimal: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F• Normal digits + 6 more from the alphabet

• In C, written as 0x… (e.g., 0xFAB5)

• Conversion: BinaryHex• 1 hex digit represents 16 decimal values

• 4 binary digits represent 16 decimal values1 hex digit replaces 4 binary digits

• One hex digit is a “nibble”. Two is a “byte”• 2 bits is a “half-nibble”. Shave and a haircut…

• Example:• 1010 1100 0011 (binary) = 0x_____ ?

Decimal vs. Hexadecimal vs. Binary

Examples:

1010 1100 0011 (binary) = 0xAC3

10111 (binary) = 0001 0111 (binary) = 0x17

0x3F9 = 11 1111 1001 (binary)

How do we convert between hex and Decimal?

00 0 000001 1 000102 2 001003 3 001104 4 010005 5 010106 6 011007 7 011108 8 100009 9 100110 A 101011 B 101112 C 110013 D 110114 E 111015 F 1111

MEMORIZE!

Precision and Accuracy

Precision is a count of the number bits in a computer word used to represent a value.

Accuracy is a measure of the difference between the actual value of a number and its computer representation.

Don’t confuse these two terms!

High precision permits high accuracy but doesn’t guarantee it. It is possible to have high precisionbut low accuracy.

Example: float pi = 3.14;pi will be represented using all bits of the significant (highly precise), but is only an approximation (not accurate).

What to do with representations of numbers?

• Just what we do with numbers!• Add them• Subtract them• Multiply them• Divide them• Compare them

• Example: 10 + 7 = 17• …so simple to add in binary that we can

build circuits to do it!• subtraction just as you would in decimal• Comparison: How do you tell if X > Y ?

1 0 1 0

+ 0 1 1 1

-------------------------

1 0 0 0 1

11

02

46

810

1214

0

2

4

6

8

10

1214

0

4

8

12

16

20

24

28

32

Integer Addition

Visualizing (Mathematical) Integer Addition

• Integer Addition• 4-bit integers u,

v

• Compute true sum Add4(u , v)

• Values increase linearly with u and v

• Forms planar surface

Add4(u , v)

uv

02

46

810

1214

0

2

4

6

8

10

1214

0

2

4

6

8

10

12

14

16

Visualizing Unsigned Addition

• Wraps Around• If true sum ≥ 2w

• At most once

0

2w

2w+1

UAdd4(u , v)

uv

True Sum

Modular Sum

Overflow

Overflow

BIG IDEA: Bits can represent anything!!• Characters?

• 26 letters 5 bits (25 = 32)• upper/lower case + punctuation

7 bits (in 8) (“ASCII”)• standard code to cover all the world’s

languages 8,16,32 bits (“Unicode”)www.unicode.com

• Logical values?• 0 False, 1 True

• colors ? Ex:

• locations / addresses? commands?

• MEMORIZE: N bits at most 2N things

Red (00) Green (01) Blue (11)

How to Represent Negative Numbers?

• So far, unsigned numbers

• Obvious solution: define leftmost bit to be sign! • 0 +, 1 –

• Rest of bits can be numerical value of number

• Representation called sign and magnitude

• x86 uses 32-bit integers. +1ten would be:

0000 0000 0000 0000 0000 0000 0000 0001

• And –1ten in sign and magnitude would be:

1000 0000 0000 0000 0000 0000 0000 0001

Shortcomings of sign and magnitude?

• Arithmetic circuit complicated• Special steps depending whether signs are the same or not

• Also, two zeros• 0x00000000 = +0ten

• 0x80000000 = –0ten

• What would two 0s mean for programming?

• Therefore sign and magnitude abandoned

Another try: complement the bits

• Example: 710 = 001112 –710 = 110002

• Called One’s Complement

• Note: positive numbers have leading 0s, negative numbers have leadings 1s.

00000 00001 01111...

111111111010000 ...

• What is -00000 ? Answer: 11111

• How many positive numbers in N bits?

• How many negative numbers?

Standard Negative Number Representation• What is result for unsigned numbers if tried to subtract large number from a small one?• Would try to borrow from string of leading 0s, so result would have a string of leading 1s

3 - 4 00…0011 – 00…0100 = 11…1111

• With no obvious better alternative, pick representation that made the hardware simple

• As with sign and magnitude, leading 0s positive, leading 1s negative

000000...xxx is ≥ 0, 111111...xxx is < 0 except 1…1111 is -1, not -0 (as in sign & mag.)

• This representation is Two’s Complement

2’s Complement Number “line”: N = 5

• 2N-1 non-negatives

• 2N-1 negatives

• one zero

• how many positives?

00000 0000100010

1111111110

10000 0111110001

0 12

-1-2

-15 -16 15

.

.

.

.

.

.

-311101

-411100

00000 00001 01111...

111111111010000 ...

Numeric Ranges

• Unsigned Values

• UMin = 0000…0

• UMax = 2w – 1111…1

• Two’s Complement Values

• TMin = –2w–1

100…0

• TMax = 2w–1 – 1011…1

• Other Values

• Minus 1111…1

Decimal Hex Binary UMax 65535 FF FF 11111111 11111111 TMax 32767 7F FF 01111111 11111111 TMin -32768 80 00 10000000 00000000 -1 -1 FF FF 11111111 11111111 0 0 00 00 00000000 00000000

Values for W = 16

Values for Different Word Sizes

• Observations• |TMin | = TMax +

1 Asymmetric range

• UMax = 2 * TMax + 1

W 8 16 32 64

UMax 255 65,535 4,294,967,295 18,446,744,073,709,551,615 TMax 127 32,767 2,147,483,647 9,223,372,036,854,775,807 TMin -128 -32,768 -2,147,483,648 -9,223,372,036,854,775,808

C Programming #include <limits.h> Declares constants, e.g.,

ULONG_MAX LONG_MAX LONG_MIN

Values platform specific

Unsigned & Signed Numeric Values

• Equivalence• Same encodings for

nonnegative values

• Uniqueness• Every bit pattern represents

unique integer value

• Each representable integer has unique bit encoding

• Can Invert Mappings• U2B(x) = B2U-1(x)

Bit pattern for unsigned integer

• T2B(x) = B2T-1(x) Bit pattern for two’s comp

integer

X B2T(X)B2U(X)0000 00001 10010 20011 30100 40101 50110 60111 7

–88–79–610–511–412–313–214–115

10001001101010111100110111101111

01234567

Two’s Complement Formula

• Can represent positive and negative numbers in terms of the bit value times a power of 2:

d31 x -(231) + d30 x 230 + ... + d2 x 22 + d1 x 21 + d0 x 20

• Example: 1101two

= 1x-(23) + 1x22 + 0x21 + 1x20

= -23 + 22 + 0 + 20

= -8 + 4 + 0 + 1

= -8 + 5

= -3ten

Two’s Complement shortcut: Negation• Change every 0 to 1 and 1 to 0 (invert or

complement), then add 1 to the result

• Proof*: Sum of number and its (one’s) complement must be 111...111two

However, 111...111two= -1ten

Let x’ one’s complement representation of x

Then x + x’ = -1 x + x’ + 1 = 0 -x = x’ + 1

• Example: -3 to +3 to -3x : 1111 1111 1111 1111 1111 1111 1111 1101twox’: 0000 0000 0000 0000 0000 0000 0000 0010two+1: 0000 0000 0000 0000 0000 0000 0000 0011two()’: 1111 1111 1111 1111 1111 1111 1111 1100two+1: 1111 1111 1111 1111 1111 1111 1111 1101twoYou should be able to do this in your head…

*Check out www.cs.berkeley.edu/~dsw/twos_complement.html

What if too big?• Binary bit patterns above are simply representatives of numbers. Strictly speaking they are called “numerals”.

• Numbers really have an number of digits• with almost all being same (00…0 or 11…1) except

for a few of the rightmost digits

• Just don’t normally show leading digits

• If result of add (or -, *, / ) cannot be represented by these rightmost HW bits, overflow is said to have occurred.

00000 00001 00010 1111111110

unsigned

Peer Instruction Question

X = 1111 1111 1110 1100two

Y = 0011 1010 0000 0000two

A. X > Y (if signed)

B. X > Y (if unsigned)

C. X = -19 (if signed)

ABC0: FFF1: FFT2: FTF3: FTT4: TFF5: TFT6: TTF7: TTT

Peer Instruction Question

X = 1111 1111 1110 1100two

Y = 0011 1010 0000 0000two

A. X > Y (if signed)

B. X > Y (if unsigned)

C. X = -19 (if signed)

ABC0: FFF1: FFT2: FTF3: FTT4: TFF5: TFT6: TTF7: TTT

A: False (X negative)B: TrueC: False(X = -20)

Number summary...

• We represent “things” in computers as particular bit patterns: N bits 2N things

• Decimal for human calculations, binary for computers, hex to write binary more easily

• 1’s complement - mostly abandoned

• 2’s complement universal in computing: cannot avoid, so learn

• Overflow: numbers ; computers finite,errors!

00000 00001 01111...

111111111010000 ...

00000 00001 01111...

111111111010000 ...

META: We often make design decisions to make HW simple

Information units

• Basic unit is the bit (has value 0 or 1)

• Bits are grouped together in units and operated on together:

• Byte = 8 bits

• Word = 4 bytes

• Double word = 2 words

• etc.

Encoding Byte Values

• Byte = 8 bitsByte = 8 bits• Binary 000000002 to 111111112

• Decimal: 010 to 25510

First digit must not be 0 in C

• Hexadecimal 0016 to FF16

Base 16 number representation Use characters ‘0’ to ‘9’ and ‘A’ to ‘F’

Write FA1D37B16 in C as 0xFA1D37B• Or 0xfa1d37b

0 0 00001 1 00012 2 00103 3 00114 4 01005 5 01016 6 01107 7 01118 8 10009 9 1001A 10 1010B 11 1011C 12 1100D 13 1101E 14 1110F 15 1111

HexDecim

al

Binary

Memory addressing

• Memory is an array of information units

– Each unit has the same size

– Each unit has its own address

– Address of an unit and contents of the unit at that address are different

address

012

123-170

contents

Addressing

• In most of today’s computers, the basic unit that can be addressed is a byte. (how many bit is a byte?)

– x86 (and pretty much all CPU today) is byte addressable

• The address space is the set of all memory units that a program can reference

– The address space is usually tied to the length of the registers

– x86 has 32-bit registers. Hence its address space is 4G bytes

– Older micros (minis) had 16-bit registers, hence 64 KB address space (too small)

– Some current (Alpha, Itanium, Sparc, Altheon) machines have 64-bit registers, hence an enormous address space

Machine Words

• Machine Has “Word Size”Machine Has “Word Size”• Nominal size of integer-valued data

Including addresses

• Many current machines still use 32 bits (4 bytes) words Limits addresses to 4GB Becoming too small for memory-intensive applications

• New or high-end systems use 64 bits (8 bytes) words Potential address space 1.8 X 1019 bytes x86-64 machines support 48-bit addresses: 256 Terabytes

• Machines support multiple data formats Fractions or multiples of word size Always integral number of bytes

Addressing words

• Although machines are byte-addressable, 4 byte integers are the most commonly used units

• Every 32-bit integer starts at an address divisible by 4

int at address 0int at address 4int at address 8

Word-Oriented Memory Organization

• Addresses Specify Addresses Specify Byte LocationsByte Locations

• Address of first byte in word

• Addresses of successive words differ by 4 (32-bit) or 8 (64-bit)

000000010002000300040005000600070008000900100011

32-bitWords Bytes Addr.

0012001300140015

64-bitWords

Addr =??

Addr =??

Addr =??

Addr =??

Addr =??

Addr =??

0000

0004

0008

0012

0000

0008

Data Representations

• Sizes of C Objects (in Bytes)Sizes of C Objects (in Bytes)• C Data Type Typical 32-bit Intel IA32

x86-64 char 1 1

1 short 2 2

2 int 4 4

4 long 4 4

8 long long 8 8

8 float 4 4

4 double 8 8

8 long double 8 10/12

10/16 char * 4 4

8• Or any other pointer

Byte Ordering

How should bytes within multi-byte word be ordered in memory?

ConventionsBig Endian: Sun, PPC Mac, Internet

Least significant byte has highest address

Little Endian: x86Least significant byte has lowest address

Byte Ordering Example

• Big EndianBig Endian• Least significant byte has highest address

• Big End First

• Little EndianLittle Endian• Least significant byte has lowest address

• Little End First

• ExampleExample

• Variable x has 4-byte representation 0x01234567

• Address given by &x is 0x100

0x100 0x101 0x102 0x103

450x100 0x101 0x102 0x103

Big Endian

Little Endian

01 23 45 67

67 45 23 01

Big-endian vs. little-endian

• Byte order within a word:

0

0

123

1 2 3

Little-endian (we’ll use this)

Big-endian

012

Memory address

3

Word #0byteValue of Word #0

Reading Byte-Reversed Listings

• DisassemblyDisassembly• Text representation of binary machine code

• Generated by program that reads the machine code

• Example FragmentExample Fragment Address Instruction Code Assembly Rendition 8048365: 5b pop %ebx 8048366: 81 c3 ab 12 00 00 add $0x12ab,%ebx 804836c: 83 bb 28 00 00 00 00 cmpl $0x0,0x28(%ebx)

Deciphering NumbersDeciphering Numbers Value: 0x12ab Pad to 32 bits: 0x000012ab Split into bytes: 00 00 12 ab Reverse: ab 12 00 00

Examining Data RepresentationsCode to Print Byte Representation of Data

Casting pointer to unsigned char * creates byte array

typedef unsigned char *pointer;

void show_bytes(pointer start, int len){ int i; for (i = 0; i < len; i++) printf("0x%p\t0x%.2x\n", start+i, start[i]); printf("\n");}

Printf directives:%p: Print pointer%x: Print Hexadecimal

show_bytes Execution Example

int a = 15213;

printf("int a = 15213;\n");

show_bytes((pointer) &a, sizeof(int));

Result (Linux):

int a = 15213;

0x11ffffcb8 0x6d

0x11ffffcb9 0x3b

0x11ffffcba 0x00

0x11ffffcbb 0x00

Representing & Manipulating Sets

• RepresentationRepresentation

• Width w bit vector represents subsets of {0, …, w–1}

• aj = 1 if j A

01101001 { 0, 3, 5, 6 }

76543210

01010101 { 0, 2, 4, 6 }

76543210

• OperationsOperations

• & Intersection 01000001 { 0, 6 }

• | Union 01111101 { 0, 2, 3, 4, 5, 6 }

• ^ Symmetric difference 00111100 { 2, 3, 4, 5 }

• ~ Complement 10101010 { 1, 3, 5, 7 }

Bit-Level Operations in C

• Operations &, |, ~, ^ Available in COperations &, |, ~, ^ Available in C• Apply to any “integral” data type

long, int, short, char, unsigned

• View arguments as bit vectors

• Arguments applied bit-wise

• Examples (Char data type)Examples (Char data type)• ~0x41 --> 0xBE

~010000012 --> 101111102

• ~0x00 --> 0xFF

~000000002 --> 111111112

• 0x69 & 0x55 --> 0x41

011010012 & 010101012 --> 010000012

• 0x69 | 0x55 --> 0x7D

011010012 | 010101012 --> 011111012

Contrast: Logic Operations in C

• Contrast to Logical OperatorsContrast to Logical Operators• &&, ||, !

View 0 as “False” Anything nonzero as “True” Always return 0 or 1 Early termination

• Examples (char data type)Examples (char data type)• !0x41 --> 0x00• !0x00 --> 0x01• !!0x41 --> 0x01

• 0x69 && 0x55 --> 0x01• 0x69 || 0x55 --> 0x01• p && *p (avoids null pointer access)

Shift Operations• Left Shift: Left Shift: x << yx << y

• Shift bit-vector x left y positions• Throw away extra bits on left

Fill with 0’s on right

• Right Shift: Right Shift: x >> yx >> y• Shift bit-vector x right y positions

Throw away extra bits on right

• Logical shift Fill with 0’s on left

• Arithmetic shift Replicate most significant bit on

right

• Undefined BehaviorUndefined Behavior• Shift amount < 0 or word size

01100010Argument x

00010000<< 3

00011000Log. >> 2

00011000Arith. >> 2

10100010Argument x

00010000<< 3

00101000Log. >> 2

11101000Arith. >> 2

0001000000010000

0001100000011000

0001100000011000

00010000

00101000

11101000

00010000

00101000

11101000

The CPU - Instruction Execution Cycle

• The CPU executes a program by repeatedly following this cycle

1. Fetch the next instruction, say instruction i

2. Execute instruction i

3. Compute address of the next instruction, say j

4. Go back to step 1

• Of course we’ll optimize this but it’s the basic concept

What’s in an instruction?

• An instruction tells the CPU

– the operation to be performed via the OPCODE

– where to find the operands (source and destination)

• For a given instruction, the ISA specifies

– what the OPCODE means (semantics)

– how many operands are required and their types, sizes etc.(syntax)

• Operand is either

– register (integer, floating-point, PC)

– a memory address

– a constant

Reference slides

You ARE responsible for the material on these slides (they’re

just taken from the reading anyway) ; we’ve moved them to

the end and off-stage to give more breathing room to lecture!

Kilo, Mega, Giga, Tera, Peta, Exa, Zetta, Yotta• Common use prefixes (all SI, except K [= k in SI])

• Confusing! Common usage of “kilobyte” means 1024 bytes, but the “correct” SI value is 1000 bytes

• Hard Disk manufacturers & Telecommunications are the only computing groups that use SI factors, so what is advertised as a 30 GB drive will actually only hold about 28 x 230 bytes, and a 1 Mbit/s connection transfers 106 bps.

Name Abbr Factor SI sizeKilo K 210 = 1,024 103 = 1,000

Mega M 220 = 1,048,576 106 = 1,000,000

Giga G 230 = 1,073,741,824 109 = 1,000,000,000

Tera T 240 = 1,099,511,627,776 1012 = 1,000,000,000,000

Peta P 250 = 1,125,899,906,842,624 1015 = 1,000,000,000,000,000

Exa E 260 = 1,152,921,504,606,846,976 1018 = 1,000,000,000,000,000,000

Zetta Z 270 = 1,180,591,620,717,411,303,424

1021 = 1,000,000,000,000,000,000,000

Yotta Y 280 = 1,208,925,819,614,629,174,706,176

1024 = 1,000,000,000,000,000,000,000,000

physics.nist.gov/cuu/Units/binary.html

kibi, mebi, gibi, tebi, pebi, exbi, zebi, yobi

• New IEC Standard Prefixes [only to exbi officially]

• International Electrotechnical Commission (IEC) in 1999 introduced these to specify binary quantities. •Names come from shortened versions of the original SI prefixes (same pronunciation) and bi is short for “binary”, but pronounced “bee” :-(

•Now SI prefixes only have their base-10 meaning and never have a base-2 meaning.

Name Abbr Factorkibi Ki 210 = 1,024mebi Mi 220 = 1,048,576gibi Gi 230 = 1,073,741,824tebi Ti 240 = 1,099,511,627,776pebi Pi 250 = 1,125,899,906,842,624exbi Ei 260 = 1,152,921,504,606,846,976zebi Zi 270 =

1,180,591,620,717,411,303,424yobi Yi 280 =

1,208,925,819,614,629,174,706,176

en.wikipedia.org/wiki/Binary_prefix

As of thiswriting, thisproposal hasyet to gainwidespreaduse…

• What is 234? How many bits addresses (I.e., what’s ceil log2 = lg of) 2.5 TiB?

• Answer! 2XY means…X=0 ---X=1 kibi ~103

X=2 mebi ~106

X=3 gibi ~109

X=4 tebi ~1012

X=5 pebi ~1015

X=6 exbi ~1018

X=7 zebi ~1021

X=8 yobi ~1024

The way to remember #s

Y=0 1Y=1 2Y=2 4Y=3 8Y=4 16Y=5 32Y=6 64Y=7 128Y=8 256Y=9 512

MEMORIZE!

Which base do we use?

• Decimal: great for humans, especially when doing arithmetic

• Hex: if human looking at long strings of binary numbers, its much easier to convert to hex and look 4 bits/symbol

• Terrible for arithmetic on paper

• Binary: what computers use; you will learn how computers do +, -, *, /

• To a computer, numbers always binary

• Regardless of how number is written:

• 32ten == 3210 == 0x20 == 1000002 == 0b100000

• Use subscripts “ten”, “hex”, “two” in book, slides when might be confusing

Two’s Complement for N=32

0000 ... 0000 0000 0000 0000two =

0ten0000 ... 0000 0000 0000 0001two =

1ten0000 ... 0000 0000 0000 0010two =

2ten. . .0111 ... 1111 1111 1111 1101two =

2,147,483,645ten0111 ... 1111 1111 1111 1110two =

2,147,483,646ten0111 ... 1111 1111 1111 1111two =

2,147,483,647ten1000 ... 0000 0000 0000 0000two =

–2,147,483,648ten1000 ... 0000 0000 0000 0001two =

–2,147,483,647ten1000 ... 0000 0000 0000 0010two =

–2,147,483,646ten. . . 1111 ... 1111 1111 1111 1101two =

–3ten1111 ... 1111 1111 1111 1110two =

–2ten1111 ... 1111 1111 1111 1111two =

–1ten

• One zero; 1st bit called sign bit

• 1 “extra” negative:no positive 2,147,483,648ten

Two’s comp. shortcut: Sign extension

• Convert 2’s complement number rep. using n bits to more than n bits

• Simply replicate the most significant bit (sign bit) of smaller to fill new bits• 2’s comp. positive number has infinite 0s

• 2’s comp. negative number has infinite 1s

• Binary representation hides leading bits; sign extension restores some of them

• 16-bit -4ten to 32-bit:

1111 1111 1111 1100two

1111 1111 1111 1111 1111 1111 1111 1100two

Preview: Signed vs. Unsigned Variables

• Java and C declare integers int• Use two’s complement (signed integer)

• Also, C declaration unsigned int• Declares a unsigned integer

• Treats 32-bit number as unsigned integer, so most significant bit is part of the number, not a sign bit

Documents

IT 252 Computer Organization and Architecture Number Representation Chia-Chi Teng