2
Numbers Bit patterns have no inherent meaning
conventions define relationship between bits and numbers
Numbers can be represented in any base
Binary numbers (base 2) 0000 0001 0010 0011 0100 0101 0110 0111 1000 1001... decimal: 0...2n-1
Of course it gets more complicated: negative numbers fractions and real numbers numbers are finite (overflow)
How do we represent negative numbers?i.e., which bit patterns will represent which numbers?
ni
ii 0
digit base
3
Possible Representations
Sign Magnitude One's Complement Two's Complement
000 = +0 000 = +0 000 = +0001 = +1 001 = +1 001 = +1010 = +2 010 = +2 010 = +2011 = +3 011 = +3 011 = +3100 = -0 100 = -3 100 = -4101 = -1 101 = -2 101 = -3110 = -2 110 = -1 110 = -2111 = -3 111 = -0 111 = -1
Issues: balance, number of zeros, ease of operations Which one is best? Why?
4
Two’s complement format
00000001
0010
0011
0100
0101
0110
011110001001
1010
1011
1100
1101
1110
1111
01
2
3
4
5
6
7-8-7
-6
-5
-4
-3
-2
-1
… -4 -3 -2 -1 0 1 2 3 4 …
5
Two’s complement operations
Negating a two’s complement number: invert all bits and add 1
remember: “negate” and “invert” are quite different ! The sum of a number and its inverted representation must be
111….111two, which represent –1
Converting n bit numbers into numbers with more than n bits:
copy the most significant bit (the sign bit) into the other bits
0010 0000 0010
1010 1111 1010
Referred as "sign extension"
MIPS 16 bit immediate gets converted to 32 bits for arithmetic
x x + 1 = 0 x x = 1 x x 1
6
Two’s complement
Two’s complement gets its name from the rule that the unsigned sum of an n bit number and its negative is 2n
Thus, the negation of a number x is 2n-x adding a number x and its negate considering the bit
patterns as unsigned:
Negative integers in two’s complement notation look like large numbers in unsigned notation
nn 211)(2
1)x(x1xx
111…11two
7
MIPS word is 32 bits long
32 bit signed numbers:
0000 0000 0000 0000 0000 0000 0000 0000two = 0ten
0000 0000 0000 0000 0000 0000 0000 0001two = + 1ten
0000 0000 0000 0000 0000 0000 0000 0010two = + 2ten
...
0111 1111 1111 1111 1111 1111 1111 1110two = + 2,147,483,646ten
0111 1111 1111 1111 1111 1111 1111 1111two = + 2,147,483,647ten = 231-1
1000 0000 0000 0000 0000 0000 0000 0000two = – 2,147,483,648ten = -231
1000 0000 0000 0000 0000 0000 0000 0001two = – 2,147,483,647ten
1000 0000 0000 0000 0000 0000 0000 0010two = – 2,147,483,646ten
...
1111 1111 1111 1111 1111 1111 1111 1101two = – 3ten
1111 1111 1111 1111 1111 1111 1111 1110two = – 2ten
1111 1111 1111 1111 1111 1111 1111 1111two = – 1ten
msb (bit 31) lsb (bit 0)
maxint
minint
8
Addition and subtraction
Just like in grade school (carry/borrow 1s) 0111 0111 0110+ 0110 - 0110 - 0101
Two's complement operations easy subtraction using addition of negative numbers
0111+ 1010
Overflow (result too large for finite computer word): adding two n-bit numbers does not yield an n-bit number
0111+ 0001
0001 0001
0001
_1101
_1000
9
Detecting overflow
No overflow when adding a positive and a negative number No overflow when signs are the same for subtraction Overflow occurs when the value affects the sign:
overflow when adding two positives yields a negative or, adding two negatives gives a positive or, subtract a negative from a positive and get a negative or, subtract a positive from a negative and get a positive
Consider the operations A + B, and A – B Can overflow occur if B is 0 ? Can overflow occur if A is 0 ?
cannot occur !can occur !
10
Effects of overflow
An exception (interrupt) occurs Control jumps to predefined address for exception Interrupted address is saved for possible resumption
Details based on software system / language
Don't always want to detect overflow new MIPS instructions: addu, addiu, subu unsigned integers are commonly used for memory
addresses where overflows are ignored
note: addiu still sign-extends!note: sltu, sltiu for unsigned
comparisons
11
Floating point numbers (a brief look)
We need a way to represent
numbers with fractions, e.g., 3.1416
very small numbers, e.g., .000000001
very large numbers, e.g., 3.15576 x 109
solution: scientific representation
sign, exponent, significand: (–1)sign x significand x 2E
more bits for significand gives more accuracy
more bits for exponent increases range
A number in scientific notation that has no leading 0s is called normalized
Esign 2xxx1.xxx1)(
(1+fraction)
12
Floating point numbers
A floating point number represent a number in which the binary point is not fixed
IEEE 754 floating point standard:
single precision: (32 bits)
1 bit sign, 8 bit exponent, 23 bit fraction
double precision: (64 bits)
1 bit, 11 bit exponent, 52 bit fraction
13
IEEE 754 floating-point standard Leading “1” bit of significand is implicit
Exponent is “biased” to make sorting easier all 0s is smallest exponent all 1s is largest (almost) bias of 127 for single precision and 1023 for double precision summary: (–1)sign fraction) 2exponent – bias
Example:
decimal: -.75 = - ( ½ + ¼ ) binary: -.11 = -1.1 x 2-1
floating point: exponent = E + bias = -1+127=126 = 01111110two
IEEE single precision:
sign
1 01111110 10000000000000000000
exponent fraction
14
IEEE 754 encoding of floating points
Single precision Double precision Number
Exponent Fraction Exponent Fraction
0 0 0 0 0
0 nonzero 0 nonzero +/- denormalized
1 - 254 nonzero 1 - 2046 nonzero +/- floating point
255 0 2047 0 +/- infinity
255 nonzero 2047 nonzero NaN
15
Floating point complexities
Operations are somewhat more complicated (see text) In addition to overflow we can have “underflow”
Overflow: a positive exponent too large to fit in the exponent field Underflow: a negative exponent too large to fit in the exponent field
Accuracy can be a big problem
IEEE 754 keeps two extra bits, guard and round
four rounding modes
positive divided by zero yields “infinity”
zero divide by zero yields “not a number” (NaN)
other complexities
Implementing the standard can be tricky Not using the standard can be even worse: Pentium bug!
16
IEEE 754 encoding
Single precision Double precision Number
Exponent Fraction Exponent Fraction
0 0 0 0 0
0 nonzero 0 nonzero +/- denormilized
1 to 254 nonzero 1 to 2046 nonzero +/- floating point
255 0 2047 0 +/- infinity
255 nonzero 2047 nonzero NaN
17
Floating point addition/subtraction
To add/sub two floating point numbers:
Step1. we must align the decimal point of the number with smaller exponent compare the two exponents shift the significand of the smaller number to the right by an
amount equal to the difference of the two exponents Step 2. add/sub the significands Step 3. The sum is not in normalized scientific notation, so we
need to adjust it either shifting right the significand and incrementing the
exponent, or shifting left and decrementing the exponent Step 4. We must round the number
add 1 to the least significant bit if the first bit being thrown away is 1
Step 5. If necessary re-normalize
18
Floating point addition/subtraction
Still normalized?
4. Round the significand to the appropriate
number of bits
YesOverflow or
underflow?
Start
No
Yes
Done
1. Compare the exponents of the two numbers.
Shift the smaller number to the right until its
exponent would match the larger exponent
2. Add the significands
3. Normalize the sum, either shifting right and
incrementing the exponent or shifting left
and decrementing the exponent
No Exception
Small ALU
Exponentdifference
Control
ExponentSign Fraction
Big ALU
ExponentSign Fraction
0 1 0 1 0 1
Shift right
0 1 0 1
Increment ordecrement
Shift left or right
Rounding hardware
ExponentSign Fraction
19
Accuracy of floating point arithmetic
Floating-point numbers are often approximations for a number they can’t really represent
Rounding requires the hardware to include extra bits in the calculation
IEEE 754 always keeps 2 extra bits on the right during intermediate additions called guard and round respectively
20
Common fallacies
Floating point addition is associative x + (y+z) = (x+y)+z Example:
Just as a left shift can replace an integer multiply by a power of 2, a right shift is the same as an integer division by a power of 2 (true for unsigned, but not for signed even when we sign extend) Example: (shift right by two = divide by 4ten)
-5ten = 1111 1111 1111 1111 1111 1111 1111 1011two
1111 1111 1111 1111 1111 1111 1111 10two = -2ten
0.0)10(1.5101.51.0)10(1.5101.5 38ten
38ten
38ten
38ten
1.01.0)(0.01.0)101.5101.5( ten38
ten38
ten
It should be –1ten
21
Concluding remarks
Computer arithmetic is constrained by limited precision Bit patterns have no inherent meaning (side effect of the
stored-program concept) but standards do exist two’s complement IEEE 754 floating point
Computer instructions determine “meaning” of the bit patterns Performance and accuracy are important so there are many
complexities in real machines Algorithm choice is important and may lead to hardware
optimizations for both space and time (e.g., multiplication)