8/6/2019 Computer Number Systems, Approximation in Numerical Computation
http://slidepdf.com/reader/full/computer-number-systems-approximation-in-numerical-computation 1/21
Computer Representation of Numbers and Computer Arithmetic
In a computer, numbers are represented by the binary digits 0 and 1, and computers use binary arithmetic to perform operations on numbers. Since it is cumbersome to display large numbers in binary form, computers usually display them in hexadecimal, octal or decimal form. All of these number systems are positional systems. In a positional system a number is represented by a string of symbols, each of which denotes a particular value depending on its position. The number of symbols used in a positional system equals its base. Let us now discuss the various positional number systems:
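Decimal System:
The decimal system uses 10 as its base and employs the ten symbols 0 to 9 to represent numbers. Consider the decimal number 7402, which consists of the four symbols 7, 4, 0 and 2. In terms of base 10 it can be expressed as follows:
$$7402 = 7\times10^{3} + 4\times10^{2} + 0\times10^{1} + 2\times10^{0}$$
So each symbol of a number is multiplied by a power of the base (10) that depends on its position counted from the right. The count always begins with 0.
In general, a decimal number $d_{n-1}d_{n-2}\ldots d_1d_0$ consisting of $n$ symbols can be expressed as:
$$(d_{n-1}d_{n-2}\ldots d_1d_0)_{10} = d_{n-1}\times10^{n-1} + d_{n-2}\times10^{n-2} + \cdots + d_1\times10^{1} + d_0\times10^{0} = \sum_{i=0}^{n-1} d_i\times10^{i}$$
where $d_i \in \{0, 1, \ldots, 9\}$.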
mywbut.com
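Similarly, the fractional part of a decimal number can be expressed as
$$(0.d_{-1}d_{-2}\ldots d_{-m})_{10} = d_{-1}\times10^{-1} + d_{-2}\times10^{-2} + \cdots + d_{-m}\times10^{-m} = \sum_{i=1}^{m} d_{-i}\times10^{-i}$$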
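Binary system:
The binary system is the positional system with the two symbols 0 and 1 and with 2 as its base. A binary number $(b_{n-1}b_{n-2}\ldots b_1b_0)_2$ represents the decimal value
$$\sum_{i=0}^{n-1} b_i\times2^{i}$$
where $b_i \in \{0, 1\}$.
Consider the binary number 10101. Its decimal equivalent is given by
$$(10101)_2 = 1\times2^{4} + 0\times2^{3} + 1\times2^{2} + 0\times2^{1} + 1\times2^{0} = 16 + 4 + 1 = 21$$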
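Hexadecimal System:
The hexadecimal system is the positional system with the sixteen symbols 0, 1, 2, ..., 9, A, B, C, D, E, F and with 16 as its base. Here the symbol A denotes 10, B denotes 11, and so on up to F, which denotes 15. The decimal equivalent of a hexadecimal number $(h_{n-1}\ldots h_1h_0)_{16}$ is given by $\sum_{i=0}^{n-1} h_i\times16^{i}$. For example, consider
$$(2F)_{16} = 2\times16^{1} + 15\times16^{0} = 32 + 15 = 47$$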
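All of the positional expansions above follow one rule, so they can be evaluated by a single routine. The sketch below is a minimal Python illustration, assuming the digits are given as a string over 0-9 and A-F; the function name `positional_value` is hypothetical. It accumulates the value digit by digit (multiply the running total by the base, then add the next digit), which gives the same result as summing each digit times its power of the base.

```python
DIGITS = "0123456789ABCDEF"


def positional_value(digits: str, base: int) -> int:
    """Decimal value of an unsigned integer written in `base` (2 to 16)."""
    if not 2 <= base <= 16:
        raise ValueError("base must be between 2 and 16")
    value = 0
    for symbol in digits.upper():
        d = DIGITS.index(symbol)          # A -> 10, B -> 11, ..., F -> 15
        if d >= base:
            raise ValueError(f"symbol {symbol!r} is not valid in base {base}")
        # Multiplying the running total by the base moves every earlier digit
        # one position to the left, so each digit ends up weighted by base**i.
        value = value * base + d
    return value


print(positional_value("7402", 10))   # 7402
print(positional_value("10101", 2))   # 21
print(positional_value("2F", 16))     # 47
print(positional_value("157", 8))     # 111
```

Python's built-in `int(text, base)` performs the same conversion (for instance `int("10101", 2)` is 21); the loop is written out only to mirror the formula.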
We can convert a binary number directly to a hexadecimal number by grouping the binary digits, starting from the right, into sets of four and converting each group to its equivalent hexadecimal digit. If the last (leftmost) group has fewer than four binary digits, pad it on the left with enough 0s.
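For example, let us find the hexadecimal equivalent of $(1011010110)_2$:
$$(1011010110)_2 = (0010\;1101\;0110)_2 = (2D6)_{16}$$
The reverse also holds: each hexadecimal digit can be expanded into its four-bit binary equivalent.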
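Octal System: The octal system is the positional system that uses 8 as its base and $\{0, 1, 2, \ldots, 7\}$ as its symbol set of size 8. The decimal equivalent of an octal number $(o_{n-1}\ldots o_1o_0)_8$ is given by $\sum_{i=0}^{n-1} o_i\times8^{i}$. For example, consider
$$(157)_8 = 1\times8^{2} + 5\times8^{1} + 7\times8^{0} = 64 + 40 + 7 = 111$$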
We can get the octal equivalent of a binary number by grouping the binary digits, starting from the right, into sets of three and converting each set to its octal equivalent. If the last group has fewer than three digits, pad it on the left with enough 0s. As an example, the octal equivalent of $(1011010110)_2$ is
$$(1011010110)_2 = (001\;011\;010\;110)_2 = (1326)_8$$
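The grouping rule for both hexadecimal and octal can be sketched as one routine: pad the bit string on the left to a multiple of k bits, then convert each k-bit group to one digit (k = 4 for hexadecimal, k = 3 for octal). This is a minimal illustration; the helper name `group_binary` is hypothetical and not from the text.

```python
DIGITS = "0123456789ABCDEF"


def group_binary(bits: str, k: int) -> str:
    """Convert a binary string to base 2**k by grouping bits from the right.

    k = 4 gives hexadecimal, k = 3 gives octal.
    """
    if not bits or any(b not in "01" for b in bits):
        raise ValueError("expected a non-empty string of 0s and 1s")
    width = -(-len(bits) // k) * k     # round the length up to a multiple of k
    bits = bits.zfill(width)           # pad on the left with 0s
    out = []
    for i in range(0, width, k):
        group = bits[i:i + k]
        out.append(DIGITS[int(group, 2)])   # each k-bit group is exactly one digit
    return "".join(out)


print(group_binary("1011010110", 4))   # 2D6
print(group_binary("1011010110", 3))   # 1326
```

Going the other way simply expands each digit back into its k-bit pattern.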
Conversion from the decimal system to a non-decimal system:
To convert a decimal number to a number in any other system, we consider the integer and fractional parts separately and follow the procedure below:
Conversion of integer part:
(a) Divide the integer part of the given decimal number by the base b of the new number system. The remainder is the rightmost digit of the integer part of the new number.
(b) Next, divide the quotient again by the base b. The remainder is the second digit from the right in the new system.
Continue this process until the quotient becomes zero. The last remainder is the leftmost digit of the new number.
Conversion of fractional part :
(a) Multiply the fractional part of the given decimal number by the base b of the new system. The integral part of the product is the leftmost digit of the fractional part in the new system.
(b) Multiply the fractional part of the product from step (a) by the base b again. The integral part of this product is the second digit from the left in the new system.
Repeat step (b) until the fractional part becomes zero or a fractional part repeats. If the fractional part becomes zero, the integral part of the last product is the rightmost digit of the fractional part of the new number; if a fractional part repeats, the digits generated from its first occurrence onwards recur indefinitely.
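The two procedures above can be combined into one routine. This is a sketch, assuming the input is given as a decimal string so that it can be held exactly with Python's `fractions.Fraction`; a binary `float` cannot hold a value such as 0.45 exactly, which would defeat the check for a repeated fractional part. The name `to_base` and the use of parentheses (instead of an overbar) to mark the repeating digits are choices made for this illustration.

```python
from fractions import Fraction

DIGITS = "0123456789ABCDEF"


def to_base(decimal_text: str, base: int, max_frac_digits: int = 64) -> str:
    """Convert a non-negative decimal string to `base`; a repeating fraction is put in ( )."""
    if not 2 <= base <= 16:
        raise ValueError("base must be between 2 and 16")
    x = Fraction(decimal_text)            # exact, e.g. Fraction("0.45") == 9/20
    if x < 0:
        raise ValueError("this sketch handles non-negative numbers only")
    integer = int(x)                      # integer part (x >= 0, so this is the floor)
    frac = x - integer                    # fractional part

    # Integer part: divide by the base repeatedly; remainders are digits, right to left.
    int_digits = []
    while True:
        integer, r = divmod(integer, base)
        int_digits.append(DIGITS[r])
        if integer == 0:
            break
    result = "".join(reversed(int_digits))

    # Fractional part: multiply by the base repeatedly; integral parts are digits, left to right.
    # max_frac_digits is only a safety limit for very long repeating periods.
    frac_digits = []
    seen = {}                             # fractional part -> index of the digit it produces
    while frac != 0 and len(frac_digits) < max_frac_digits:
        if frac in seen:                  # duplicate fractional part: the digits repeat
            start = seen[frac]
            return (result + "." + "".join(frac_digits[:start])
                    + "(" + "".join(frac_digits[start:]) + ")")
        seen[frac] = len(frac_digits)
        product = frac * base
        digit = int(product)              # integral part of the product
        frac_digits.append(DIGITS[digit])
        frac = product - digit
    return result + ("." + "".join(frac_digits) if frac_digits else "")


print(to_base("54.45", 2))   # 110110.01(1100)
print(to_base("0.1", 2))     # 0.0(0011)
```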
Eg: Convert 54.45 into its binary equivalent.
(a) Consider the integer part, i.e. 54, and apply the steps listed under conversion of the integer part:
54 ÷ 2 = 27, remainder 0
27 ÷ 2 = 13, remainder 1
13 ÷ 2 = 6, remainder 1
6 ÷ 2 = 3, remainder 0
3 ÷ 2 = 1, remainder 1
1 ÷ 2 = 0, remainder 1
Reading the remainders from the last to the first gives $(54)_{10} = (110110)_2$.
(b) Conversion of the fractional part:
Product             Integral part   Binary number
0.45 × 2 = 0.90     0               0.0
0.90 × 2 = 1.80     1               0.01
0.80 × 2 = 1.60     1               0.011
0.60 × 2 = 1.20     1               0.0111
0.20 × 2 = 0.40     0               0.01110
0.40 × 2 = 0.80     0               0.011100
0.80 × 2 = 1.60     1               0.0111001
0.60 × 2 = 1.20     1               0.01110011
0.20 × 2 = 0.40     0               0.011100110
0.40 × 2 = 0.80     0               0.0111001100
0.80 × 2 = 1.60     1               0.01110011001
The fractional part 0.80 recurs, so the digits 1100 repeat indefinitely:
$$(54.45)_{10} = (110110.01\overline{1100})_2$$
Here the overbar denotes the repetition of the binary digits.
Note: Using the binary system as an intermediate stage, we can easily convert octal numbers to hexadecimal numbers and vice versa. For example:
$$(1326)_8 = (001\;011\;010\;110)_2 = (0010\;1101\;0110)_2 = (2D6)_{16}$$
$$(A7)_{16} = (1010\;0111)_2 = (010\;100\;111)_2 = (247)_8$$
In the above two examples we have regrouped the binary digits into quadruplets or triplets to convert octal to hexadecimal and hexadecimal to octal numbers, respectively.
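The detour through binary can be sketched as follows: expand every source digit into a fixed number of bits (3 for octal, 4 for hexadecimal), pad on the left, and regroup into the target width. This is a minimal illustration; `convert_via_binary` is a hypothetical name, and leading zero digits produced by the padding are stripped.

```python
DIGITS = "0123456789ABCDEF"
BITS_PER_DIGIT = {8: 3, 16: 4}   # an octal digit is 3 bits, a hexadecimal digit is 4 bits


def convert_via_binary(number: str, src: int, dst: int) -> str:
    """Convert between octal (8) and hexadecimal (16) using binary as the intermediate stage."""
    k_src, k_dst = BITS_PER_DIGIT[src], BITS_PER_DIGIT[dst]
    bits = ""
    for c in number.upper():
        d = DIGITS.index(c)
        if d >= src:
            raise ValueError(f"{c!r} is not a base-{src} digit")
        bits += format(d, f"0{k_src}b")       # step 1: each digit -> exactly k_src bits
    width = -(-len(bits) // k_dst) * k_dst    # round the length up to a multiple of k_dst
    bits = bits.zfill(width)                  # step 2: pad with 0s on the left ...
    out = "".join(DIGITS[int(bits[i:i + k_dst], 2)]
                  for i in range(0, width, k_dst))   # ... and regroup
    return out.lstrip("0") or "0"             # drop leading zeros created by padding


print(convert_via_binary("1326", 8, 16))   # 2D6
print(convert_via_binary("A7", 16, 8))     # 247
```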
Computer Representation of Numbers
Computers are designed to use binary digits to represent numbers and other information. The computer memory is organized into strings of bits of equal length called words. Decimal numbers are first converted into their binary equivalents and are then represented in either integer or floating point form.
Integer Representation
The largest decimal number that can be represented in binary form in a computer depends on its word length. An n-bit word computer can handle a number as large as $2^{n}-1$. For instance, a 16-bit word machine can represent numbers as large as $2^{16}-1 = 65535$. How do we represent negative numbers? Negative numbers are stored using 2's complement. This is obtained by taking the 1's complement of the binary representation of the positive number (i.e. flipping every bit) and then adding 1 to it.
For example, let us represent $-13$ in binary form:
$$+13 = (01101)_2 \;\rightarrow\; \text{1's complement } (10010)_2 \;\rightarrow\; \text{add } 1:\ (10011)_2 = -13$$
Here, in $(01101)_2$, an extra zero is appended to the left of the binary number to indicate that it is positive. If this extra leftmost binary digit is set to 1, it indicates that the binary number is negative. So the general convention for storing signed numbers is to append a binary digit 0 or 1 to the left of the binary number, depending on whether the number is positive or negative. Since one bit of an n-bit word is reserved for the sign, at most $n-1$ bits are left to store the magnitude of a signed number. So the largest signed number a 16-bit word can represent is $2^{15}-1 = 32767$. On this machine zero is defined as 0000000000000000, so it would be redundant to use the bit pattern 1000000000000000 to define a "minus zero". This pattern is instead used to represent one additional negative number, namely $-2^{15} = -32768$, and hence the range of signed numbers that can be represented on a 16-bit word machine is from $-32768$ to $32767$.
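A short sketch of the 2's-complement rule for an n-bit word, assuming Python's unbounded integers stand in for the machine word; `twos_complement` and `from_twos_complement` are illustrative names. It follows the text literally: flip every bit of the magnitude (1's complement), add 1, and keep only the low n bits.

```python
def twos_complement(value: int, n: int) -> str:
    """Return the n-bit 2's-complement bit pattern of `value`."""
    lo, hi = -(1 << (n - 1)), (1 << (n - 1)) - 1
    if not lo <= value <= hi:
        raise OverflowError(f"{value} does not fit in {n} bits (range {lo}..{hi})")
    mask = (1 << n) - 1                         # n ones: keeps only the low n bits
    if value >= 0:
        return format(value, f"0{n}b")          # leading 0 marks a positive number
    ones = ~(-value) & mask                     # 1's complement of the magnitude
    return format((ones + 1) & mask, f"0{n}b")  # add 1, keep n bits


def from_twos_complement(bits: str) -> int:
    """Interpret a bit string as a signed 2's-complement integer."""
    raw = int(bits, 2)
    return raw - (1 << len(bits)) if bits[0] == "1" else raw


print(twos_complement(-13, 5))        # 10011
print(twos_complement(-32768, 16))    # 1000000000000000
```

The range check makes the asymmetry visible: for n = 16 the accepted values run from -32768 to 32767.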
Floating Point Representation
Fractional numbers, and large numbers that fall outside the range of a d-bit word machine (for instance a 16-bit word machine), are stored and processed in exponential form. In exponential form these numbers have an embedded decimal point and are called floating point numbers or real numbers. The floating point representation of a real number is $M \times 10^{E}$, where $M$ is called the mantissa and $E$ the exponent. So the floating point representation of a fractional number such as 0.00695 is $0.695 \times 10^{-2}$, and that of a large number such as 684000 is $0.684 \times 10^{6}$.
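A 32-bit representation is commonly used for a floating point number. In the layout described here, the leftmost bit is reserved for the sign, the next seven bits for the exponent and the last twenty-four bits for the mantissa. (The widely used IEEE 754 single-precision format instead uses eight exponent bits and twenty-three stored mantissa bits plus an implicit leading 1.)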
The shifting of the decimal point to the left of the most significant digit is called normalization, and the numbers represented in normalized form are known as normalized floating point numbers.
For example, the normalized floating point forms of the numbers 0.00695, 56.2547 and -684.6 are:
0.00695 = $0.695 \times 10^{-2}$ = .695E-2
56.2547 = $0.562547 \times 10^{2}$ = .562547E2
-684.6 = $-0.6846 \times 10^{3}$ = -.6846E3
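A sketch of decimal normalization using Python's `decimal` module, assuming the number is given as a string so that no binary conversion error creeps in. `Decimal.adjusted()` returns the exponent of the most significant digit, so the normalized exponent is one more than that. The names `normalize_decimal` and `e_notation` are hypothetical.

```python
from decimal import Decimal
from typing import Tuple


def normalize_decimal(text: str) -> Tuple[Decimal, int]:
    """Return (M, E) with x = M * 10**E and 0.1 <= |M| < 1 (M = 0, E = 0 for zero)."""
    x = Decimal(text)
    if x == 0:
        return Decimal(0), 0
    e = x.adjusted() + 1          # adjusted() is the exponent of the leading digit
    m = x.scaleb(-e)              # move the decimal point e places
    return m, e


def e_notation(text: str) -> str:
    """Format the normalized form in the .695E-2 style used above."""
    m, e = normalize_decimal(text)
    sign = "-" if m < 0 else ""
    return f"{sign}{str(abs(m))[1:]}E{e}"   # "0.695" -> ".695"


for t in ["0.00695", "56.2547", "-684.6"]:
    print(t, normalize_decimal(t), e_notation(t))
```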
Inherent Errors
Inherent errors arise either from errors in the data or from conversion errors.
Data Errors
If the data supplied for a problem is obtained from an experiment or a measurement, it is prone to errors due to limitations in instrumentation or in reading. Such errors are also referred to as empirical errors. So when the data supplied is correct only to, say, two decimal places, there is no use performing arithmetic accurate to four decimal places!
Conversion Errors
Conversion errors arise from the limited number of bits used to represent numbers, under both integer and floating point representation, so they are also called representation errors. The digits that are not retained constitute the round-off error.
For example, consider representing the decimal number 0.1 in a computer. The binary equivalent of 0.1 has the non-terminating form $(0.000110011001100\ldots)_2 = (0.0\overline{0011})_2$, but the computer has a limited number of bits. If we add ten such numbers in a computer, the result will not be exactly 1 because of the round-off error made when converting 0.1 to binary form.
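This effect is easy to observe with Python's `float`, which is an IEEE 754 double on virtually all platforms (an assumption about the platform, not something the text specifies). `Fraction(0.1)` reveals the exact binary value that is actually stored for 0.1.

```python
from fractions import Fraction

total = 0.0
for _ in range(10):
    total += 0.1          # each addition uses the rounded binary value of 0.1

print(total)              # 0.9999999999999999
print(total == 1.0)       # False
# The stored value of 0.1 is a fraction whose denominator is a power of two:
print(Fraction(0.1))      # 3602879701896397/36028797018963968
print(Fraction(0.1) == Fraction(1, 10))   # False
```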
Computer Arithmetic
The most common kinds of computer arithmetic are integer arithmetic and floating point arithmetic. These are briefly discussed below.
Integer Arithmetic :
The result of any integer arithmetic operation is always an integer, and the range of integers that can be represented on a given computer is finite. The result of an integer division is usually just the quotient; the remainder is truncated, because fractional quantities cannot be represented under the integer representation.
Eg: $7/2 = 3$ and $1/3 = 0$.
Remark:
(1) Simple rules like $\frac{a}{b}\times b = a$, where $a$ and $b$ are integers, may not hold under computer integer arithmetic because the remainder is truncated; for example, $(7/2)\times2 = 3\times2 = 6 \neq 7$.
(2) An integer operation may produce a result that is too large or too small (i.e. too negative) for the range the computer can handle. When the result is larger than the maximum limit, it is referred to as an overflow, and when it is less than the lower limit, it is referred to as underflow.
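A sketch of both remarks, assuming a 16-bit signed word and integer division that truncates toward zero (as C does). Python's own integers are unbounded and its `//` operator floors rather than truncates, so both behaviours are emulated explicitly; `int_div` and `checked_add16` are hypothetical helpers.

```python
INT16_MIN, INT16_MAX = -(2 ** 15), 2 ** 15 - 1


def int_div(a: int, b: int) -> int:
    """Integer division that discards the remainder (truncates toward zero)."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b > 0) else -q


def checked_add16(a: int, b: int) -> int:
    """Add two integers, refusing results outside the 16-bit signed range."""
    s = a + b
    if s > INT16_MAX:
        raise OverflowError(f"overflow: {a} + {b} = {s} > {INT16_MAX}")
    if s < INT16_MIN:
        # The text calls this case underflow; Python has no separate exception for it.
        raise OverflowError(f"underflow: {a} + {b} = {s} < {INT16_MIN}")
    return s


print(int_div(7, 2))       # 3 (the remainder 1 is dropped)
print(int_div(7, 2) * 2)   # 6, so (a/b)*b == a fails for a = 7, b = 2
```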
Floating Point Arithmetic:
In floating point arithmetic all numbers are stored and processed in normalized exponential form. First, the process of addition under floating point arithmetic is discussed.
Addition under Floating Point Arithmetic:
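Let $x$ and $y$ be the two numbers to be added and $z$ be the result. The normalized floating point representations of $x$, $y$ and $z$ are $x = M_x \times 10^{E_x}$, $y = M_y \times 10^{E_y}$ and $z = M_z \times 10^{E_z}$, respectively. The rules for carrying out the addition are as follows:
(a) Set $E_z = \max(E_x, E_y)$. Say $E_x \ge E_y$; then $E_z = E_x$.
(b) Right shift $M_y$ by $E_x - E_y$ places, so that the exponents of $x$ and $y$ are the same, and call the result $M_y'$.
(c) Set $M_z = M_x + M_y'$.
(d) Normalize $M_z \times 10^{E_z}$ and let $M_z' \times 10^{E_z'}$ be its normalized representation.
(e) Set $z = M_z' \times 10^{E_z'}$.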
E.g: Add the numbers $x = 0.2345 \times 10^{5}$ and $y = 0.4567 \times 10^{2}$.
(a) $E_z = \max(5, 2) = 5$
(b) On right shifting $M_y = 0.4567$ by $5 - 2 = 3$ places we get $M_y' = 0.0004567$
(c) $M_z = 0.2345 + 0.0004567 = 0.2349567$
(d) $0.2349567 \times 10^{5}$ is already in normalized form, i.e. $M_z' = 0.2349567$ and $E_z' = 5$
(e) $z = 0.2349567 \times 10^{5}$ (check: $23450 + 45.67 = 23495.67$)
Remark: Subtraction is simply the addition of numbers with opposite signs.
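The five addition steps can be sketched with Python's `decimal` module, assuming a decimal machine that keeps a d-digit mantissa and chops the final result. The names `fp_add`, `normalize` and `chop` are illustrative. A real machine would usually also chop the shifted operand in step (b); this sketch keeps it exact for simplicity. Because subtraction is addition of numbers with opposite signs, the same function covers it.

```python
from decimal import Decimal, ROUND_DOWN
from typing import Tuple

Float = Tuple[Decimal, int]   # (mantissa M, exponent E) meaning M * 10**E


def normalize(m: Decimal, e: int) -> Float:
    """Shift the mantissa until 0.1 <= |m| < 1, adjusting the exponent."""
    if m == 0:
        return Decimal(0), 0
    while abs(m) >= 1:
        m, e = m.scaleb(-1), e + 1      # shift right
    while abs(m) < Decimal("0.1"):
        m, e = m.scaleb(1), e - 1       # shift left
    return m, e


def chop(m: Decimal, d: int) -> Decimal:
    """Keep d mantissa digits, dropping the rest."""
    return m.quantize(Decimal(1).scaleb(-d), rounding=ROUND_DOWN)


def fp_add(x: Float, y: Float, d: int = 4) -> Float:
    (mx, ex), (my, ey) = x, y
    if ex < ey:                         # make x the operand with the larger exponent
        (mx, ex), (my, ey) = (my, ey), (mx, ex)
    ez = ex                             # (a) E_z = max(E_x, E_y)
    my_shifted = my.scaleb(ey - ex)     # (b) right shift M_y by E_x - E_y places
    mz = mx + my_shifted                # (c) M_z = M_x + M_y'
    mz, ez = normalize(mz, ez)          # (d) normalize
    return chop(mz, d), ez              # (e) z = M_z' * 10**E_z'


print(fp_add((Decimal("0.2345"), 5), (Decimal("0.4567"), 2)))    # 4 digits kept
print(fp_add((Decimal("0.5678"), 0), (Decimal("-0.5677"), 0)))  # a subtraction
```

With `d=4` the first call returns `(Decimal('0.2349'), 5)`: the trailing digits 567 of the shifted operand are chopped. The second call shows a subtraction of nearly equal numbers, which leaves a single significant digit after normalization.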
Multiplication Under Floating Point Arithmetic:
If $x = M_x \times 10^{E_x}$ and $y = M_y \times 10^{E_y}$ are two real numbers in normalized form, then their product is
$$x \times y = (M_x \times M_y) \times 10^{E_x + E_y}$$
E.g: Say $x = 0.5 \times 10^{3}$ and $y = 0.4 \times 10^{-1}$; then
$$x \times y = (0.5 \times 0.4) \times 10^{3 + (-1)} = 0.20 \times 10^{2}$$
Since $0.20 \times 10^{2}$ is already in normalized form, $x \times y = 0.2 \times 10^{2}$ (check: $500 \times 0.04 = 20$).
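Multiplication needs no alignment step. A sketch under the same assumptions as a decimal-mantissa model (d digits kept by chopping; `fp_mul` is an illustrative name): since both normalized mantissas lie in [0.1, 1), their product lies in [0.01, 1), so at most one left shift is needed to renormalize.

```python
from decimal import Decimal, ROUND_DOWN
from typing import Tuple


def fp_mul(x: Tuple[Decimal, int], y: Tuple[Decimal, int], d: int = 4) -> Tuple[Decimal, int]:
    """Multiply two normalized numbers (M, E) meaning M * 10**E, keeping d chopped digits."""
    (mx, ex), (my, ey) = x, y
    m, e = mx * my, ex + ey               # multiply mantissas, add exponents
    if m != 0 and abs(m) < Decimal("0.1"):
        m, e = m.scaleb(1), e - 1         # a single left shift always suffices here
    return m.quantize(Decimal(1).scaleb(-d), rounding=ROUND_DOWN), e


print(fp_mul((Decimal("0.5"), 3), (Decimal("0.4"), -1)))        # no shift needed
print(fp_mul((Decimal("0.1234"), 0), (Decimal("0.5678"), 0)))   # shifted, then chopped
```

In the second call the exact product 0.07006652 is shifted to 0.7006652 with the exponent reduced by one and then chopped to 0.7006.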
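Remark:
(1) If the product of the mantissas is less than 0.1, the result must be normalized by shifting the mantissa left and decreasing the exponent; for example,
$$(0.2 \times 10^{1}) \times (0.3 \times 10^{1}) = 0.06 \times 10^{2} = 0.6 \times 10^{1} \quad \text{(after normalization)}$$
During floating point arithmetic, the mantissa $M$ may also be truncated because of the limited number of bits available for its representation on a computer.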
(2) Floating point arithmetic is prone to the following errors:
a) Errors due to the inexact representation of a decimal number in binary form. For example, $0.1 = (0.000110011\ldots)_2$. Since the binary equivalent of 0.1 has a repeating fraction, it has to be terminated at some point.
b) Errors due to the round-off effect.
c) Subtractive cancellation: When two nearly equal numbers are subtracted, their leading digits cancel, so the result has only a few significant digits and the remaining mantissa positions are unspecified; these positions may be filled arbitrarily by the computer. This may lead to a serious loss of significance. For example, if $x = 0.5678$ and $y = 0.5677$, then $x - y = 0.0001 = 0.1 \times 10^{-3}$ has only one significant digit. The mantissa has room to store more significant digits, but those extra positions carry no information about the true result. Further, if the operands themselves are only approximate representations, this loss of significance becomes even more serious.
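d) Basic laws of arithmetic, such as the associative and distributive laws, may not be satisfied, i.e. in floating point arithmetic it is possible that
$$(x + y) + z \neq x + (y + z) \qquad \text{and} \qquad x \times (y + z) \neq x \times y + x \times z$$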
(3) A numerical computation involves a series of basic arithmetic operations, and there may be a round-off or truncation error at every step of the computation. These errors accumulate as the number of computations in a process increases. There can be situations where even a single operation magnifies the round-off errors to a level that completely ruins the result.
A computation process in which the cumulative effect of all input errors is grossly magnified is said to be numerically unstable. It is important to understand the conditions under which a process is likely to be sensitive to input errors and become unstable. Investigations of how small changes in the input parameters influence the output are termed sensitivity analysis.
(4) The effect of round-off and truncation errors on the final numerical result may be reduced by:
a) Increasing the number of significant figures used by the computer, either through hardware or through software. For instance, one may use double precision for floating point arithmetic operations.
b) Minimizing the number of arithmetic operations. Here one may try to rearrange a formula to reduce the number of arithmetic operations. For example, the polynomial
$$f(x) = a x^{3} + b x^{2} + c x + d$$
may be rearranged as
$$f(x) = ((a x + b) x + c) x + d$$
which requires fewer arithmetic operations (three multiplications instead of six).
c) A formula like $\sqrt{x+1} - \sqrt{x}$ may be replaced by the equivalent $\dfrac{1}{\sqrt{x+1} + \sqrt{x}}$ to avoid subtractive cancellation when $x$ is large.
d) While finding the sum of a set of numbers, arrange the set so that the numbers are in ascending order of absolute value, i.e. when $|a| < |b| < |c|$, then $(a + b) + c$ is better than $(c + b) + a$.
(5) It may not be possible to reduce both the truncation and the round-off error effects on the final result of a numerical computation at the same time. For instance, in an iterative procedure, increasing the step size reduces the number of steps and hence the accumulated round-off error, but it may lead to a higher truncation error, and vice versa. Hence the step size has to be chosen carefully so that both errors remain acceptably small.
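Several of these remarks can be observed directly with Python's double-precision floats. The sketch below (all function names are illustrative, and the numbers are chosen only for demonstration) shows that floating point addition is not associative, evaluates a cubic in the nested form of remark (4)(b), and sums a list in ascending order of absolute value as in remark (4)(d), using `math.fsum` as a correctly rounded reference.

```python
import math


def horner(coeffs, x):
    """Evaluate a_n x^n + ... + a_0, with coeffs = [a_n, ..., a_0], by nested multiplication."""
    result = 0.0
    for a in coeffs:
        result = result * x + a       # one multiplication and one addition per coefficient
    return result


def sum_as_given(values):
    total = 0.0
    for v in values:
        total += v
    return total


def sum_ascending(values):
    """Add the numbers in ascending order of absolute value."""
    return sum_as_given(sorted(values, key=abs))


# (2)(d): addition is not associative in floating point.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left, right, left == right)        # 0.6000000000000001 0.6 False

# (4)(b): 2x^3 - 3x^2 + 0x + 5 at x = 2 with three multiplications.
print(horner([2.0, -3.0, 0.0, 5.0], 2.0))   # 9.0

# (4)(d): one large value followed by many tiny ones.
values = [1.0] + [1e-16] * 10
print(sum_as_given(values))    # 1.0: each tiny term is lost against 1.0
print(sum_ascending(values))   # slightly above 1.0: the tiny terms are added first
print(math.fsum(values))       # correctly rounded reference
```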
Numerical Errors:
Numerical errors arise during computations due to round-off errors and truncation
errors.
Round-off Errors:
Round-off error occurs because computers use a fixed number of bits, and hence a fixed number of binary digits, to represent numbers. In a numerical computation, round-off errors are introduced at every stage. Hence, though the individual round-off error introduced at a given step may be small, the cumulative effect can be significant.
When a number requires more bits than are available, it is rounded to fit the available number of bits. This is done either by chopping or by symmetric rounding.
Chopping: Rounding a number by chopping amounts to dropping the extra digits, i.e. the given number is truncated. Suppose that we are using a computer with a fixed word length of four digits. Then the truncated representation of the number 42.7893 will be 42.78; the digits 93 will be dropped. To evaluate the error due to chopping, consider the normalized representation of the given number:
$$42.7893 = 0.427893 \times 10^{2} = 0.4278 \times 10^{2} + 0.93 \times 10^{2-4}$$
so the chopping error in representing 42.7893 is $0.93 \times 10^{-2} = 0.0093$.
So in general, if $x$ is the true value of a given number, $f_x \times 10^{E}$ is the normalized form of the rounded (chopped) number and $g_x \times 10^{E-d}$ is the normalized form of the chopping error, where $d$ is the number of digits in the mantissa, then
$$x = f_x \times 10^{E} + g_x \times 10^{E-d}$$
Since $0 \le g_x < 1$, the chopping error satisfies
$$\left| g_x \times 10^{E-d} \right| < 10^{E-d}$$
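Symmetric Round-off Error:
In the symmetric round-off method, the last retained significant digit is rounded up by 1 if the first discarded digit is greater than or equal to 5. In other words, if $g_x$ in $x = f_x \times 10^{E} + g_x \times 10^{E-d}$ is such that $|g_x| \ge 0.5$, then the last digit in $f_x$ is raised by 1 before chopping. For example, let $x = 56.78923$ and $y = 56.78967$ be two numbers to be rounded to five digits. The normalized forms of $x$ and $y$ are $0.5678923 \times 10^{2}$ and $0.5678967 \times 10^{2}$. On rounding these numbers to five digits we get $0.56789 \times 10^{2}$ and $0.56790 \times 10^{2}$, respectively. Here $E - d = 2 - 5 = -3$. With respect to $x$, $g_x = 0.23 < 0.5$, so the digits are simply dropped and the error is $0.23 \times 10^{-3}$; with respect to $y$, $g_y = 0.67 \ge 0.5$, so the last digit is raised and the error is $(1 - 0.67) \times 10^{-3} = 0.33 \times 10^{-3}$.
In either case, the error satisfies $|\text{error}| \le 0.5 \times 10^{E-d}$.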
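Both rounding methods can be sketched with Python's `decimal` module: `ROUND_DOWN` drops the extra digits (chopping), and `ROUND_HALF_UP` raises the last kept digit when the first dropped digit is 5 or more (symmetric rounding). The helper names are hypothetical. The loop checks the two error bounds derived above, $10^{E-d}$ for chopping and $0.5 \times 10^{E-d}$ for symmetric rounding.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP


def keep_digits(x: Decimal, d: int, rounding: str) -> Decimal:
    """Keep d significant digits of x, i.e. d digits of the mantissa f in x = f * 10**E."""
    e = x.adjusted() + 1                 # E of the normalized form, 0.1 <= |f| < 1
    quantum = Decimal(1).scaleb(e - d)   # 10**(E-d): place value of the last kept digit
    return x.quantize(quantum, rounding=rounding)


def chop_digits(x: Decimal, d: int) -> Decimal:
    return keep_digits(x, d, ROUND_DOWN)        # drop the extra digits


def round_digits(x: Decimal, d: int) -> Decimal:
    return keep_digits(x, d, ROUND_HALF_UP)     # first dropped digit >= 5: round up


for text, d in [("42.7893", 4), ("56.78923", 5), ("56.78967", 5)]:
    x = Decimal(text)
    bound = Decimal(10) ** (x.adjusted() + 1 - d)   # 10**(E-d)
    chopped, rounded = chop_digits(x, d), round_digits(x, d)
    assert abs(x - chopped) < bound                 # chopping error bound
    assert abs(x - rounded) <= bound / 2            # symmetric rounding bound
    print(text, chopped, rounded)
```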
Truncation Errors:
Often an approximation is used in place of an exact mathematical procedure. For instance, consider the Taylor series expansion of, say, $\sin x$, i.e.
$$\sin x = x - \frac{x^{3}}{3!} + \frac{x^{5}}{5!} - \frac{x^{7}}{7!} + \cdots$$
In practice we cannot use all of the infinitely many terms of the series to compute the sine of the angle $x$; we usually terminate the process after a certain number of terms. The error that results from such a termination, or truncation, is called the truncation error.
Usually, in evaluating logarithms, exponentials, trigonometric functions, hyperbolic functions, etc., an infinite series of the form $S = \sum_{i=0}^{\infty} a_i x^{i}$ is replaced by a finite series $S_n = \sum_{i=0}^{n} a_i x^{i}$. Thus a truncation error of $S - S_n = \sum_{i=n+1}^{\infty} a_i x^{i}$ is introduced into the computation.
For example, let us evaluate the exponential function using the first three terms of its series at $x = 0.2$:
$$e^{0.2} \approx 1 + 0.2 + \frac{(0.2)^{2}}{2!} = 1.22$$
Truncation error $= \dfrac{(0.2)^{3}}{3!} + \dfrac{(0.2)^{4}}{4!} + \cdots = e^{0.2} - 1.22 \approx 1.2214028 - 1.22 = 0.0014028$
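A sketch of truncating the exponential and sine series after a given number of terms, with `math.exp` and `math.sin` used as reference values (the function names are illustrative). Each new term is obtained from the previous one, so no factorials are computed explicitly.

```python
import math


def exp_partial(x: float, n_terms: int) -> float:
    """Sum of the first n_terms of e**x = 1 + x + x**2/2! + ..."""
    term, total = 1.0, 0.0
    for k in range(n_terms):
        total += term
        term *= x / (k + 1)              # x**(k+1)/(k+1)! from x**k/k!
    return total


def sin_partial(x: float, n_terms: int) -> float:
    """Sum of the first n_terms of sin x = x - x**3/3! + x**5/5! - ..."""
    term, total = x, 0.0
    for k in range(n_terms):
        total += term
        term *= -x * x / ((2 * k + 2) * (2 * k + 3))   # next odd-power term
    return total


approx = exp_partial(0.2, 3)
print(approx)                        # about 1.22
print(math.exp(0.2) - approx)        # truncation error, about 0.0014028
print(math.sin(0.5) - sin_partial(0.5, 5))   # much smaller with five terms
```

Adding more terms shrinks the truncation error, at the cost of more arithmetic operations (and hence more round-off).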
Some Fundamental definitions of Error Analysis:
Absolute and Relative Errors:
Absolute Error: Suppose that $x_t$ and $x_a$ denote the true and approximate values of a datum. Then the error incurred on approximating $x_t$ by $x_a$ is given by
$$e = x_t - x_a$$
and the absolute error, i.e. the magnitude of the error, is given by
$$|e| = |x_t - x_a|$$
Relative Error: The relative error, or normalized error, in representing a true datum $x_t$ by an approximate value $x_a$ is defined by
$$e_r = \frac{x_t - x_a}{x_t}$$
and
$$|e_r| = \frac{|x_t - x_a|}{|x_t|}$$
Sometimes $e_r$ is defined by
$$e_r = \frac{x_t - x_a}{x_a}$$
If $x_t = 1/3$ and $x_a = 0.333$, then
$$e = \frac{1}{3} - 0.333 = 0.000333\ldots, \qquad e_r = \frac{0.000333\ldots}{1/3} = 0.001$$
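A sketch of both error measures; the function names are illustrative. Exact rational arithmetic with `fractions.Fraction` is used so that the computation does not add representation error of its own. The optional flag switches to the alternative definition that divides by the approximate value.

```python
from fractions import Fraction


def absolute_error(x_true, x_approx):
    return abs(x_true - x_approx)


def relative_error(x_true, x_approx, relative_to_approx=False):
    """|x_t - x_a| / |x_t| by default; divide by |x_a| if relative_to_approx is True."""
    ref = x_approx if relative_to_approx else x_true
    if ref == 0:
        raise ZeroDivisionError("relative error is undefined for a zero reference value")
    return abs(x_true - x_approx) / abs(ref)


xt, xa = Fraction(1, 3), Fraction("0.333")
print(absolute_error(xt, xa))          # 1/3000
print(relative_error(xt, xa))          # 1/1000
print(relative_error(xt, xa, True))    # 1/999
```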
Machine Epsilon: Let us assume that we have a decimal computer system with a $d$-digit mantissa. We know that we will encounter round-off error when a number is represented in floating-point form. The relative round-off error due to chopping is defined by
$$e_r = \frac{g_x \times 10^{E-d}}{f_x \times 10^{E}}$$
Here we know that $0 \le g_x < 1$ and $0.1 \le |f_x| < 1$. Therefore
$$|e_r| < \frac{1 \times 10^{E-d}}{0.1 \times 10^{E}} = 10^{1-d}$$
i.e. the maximum relative round-off error due to chopping is given by $10^{1-d}$. The value of $d$, i.e. the length of the mantissa, is machine dependent; hence the maximum relative round-off error due to chopping is also known as the machine epsilon $\epsilon$.
Similarly, the maximum relative round-off error due to symmetric rounding is given by
$$|e_r| \le \frac{0.5 \times 10^{E-d}}{0.1 \times 10^{E}} = 0.5 \times 10^{1-d}$$
so the machine epsilon for symmetric rounding is given by
$$\epsilon = \frac{1}{2} \times 10^{1-d}$$
It is important to note that the machine epsilon is an upper bound on the relative round-off error due to floating point representation.
For a computer system with binary representation, the machine epsilon due to chopping and due to symmetric rounding is given by
$$\epsilon = 2^{1-d} \qquad \text{and} \qquad \epsilon = 2^{-d}$$
respectively.
Eg: Assume that our binary machine has a 24-bit mantissa. Then $\epsilon = 2^{1-24} = 2^{-23} \approx 1.19 \times 10^{-7}$. Say that our system can represent a $q$ decimal digit mantissa. Then
$$10^{1-q} = 2^{-23}$$
i.e.
$$q = 1 + 23 \log_{10} 2 \approx 7.9$$
that is, our machine can reliably store numbers with seven significant decimal digits.
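The bound can be checked empirically. The sketch below halves a candidate epsilon until adding half of it to 1 no longer changes the result; this finds the gap between 1 and the next representable number, which equals the chopping bound $2^{1-d}$ (round-to-nearest hardware keeps the actual worst-case relative error at half of that, $2^{-d}$). It assumes that Python's `float` is an IEEE 754 double (53-bit significand) and emulates single precision (24-bit significand) by round-tripping each value through `struct`; the helper names are illustrative.

```python
import math
import struct
import sys


def to_float32(x: float) -> float:
    """Round a double to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def machine_epsilon(round_fn=lambda v: v) -> float:
    """Smallest power of two eps with round_fn(1 + eps) != 1."""
    eps = 1.0
    while round_fn(1.0 + eps / 2) != 1.0:
        eps /= 2
    return eps


eps64 = machine_epsilon()
eps32 = machine_epsilon(to_float32)
print(eps64 == 2.0 ** -52 == sys.float_info.epsilon)   # True: 2**(1-53)
print(eps32 == 2.0 ** -23)                              # True: 2**(1-24)
# Decimal mantissa length q solving 10**(1-q) = eps for the 24-bit case:
print(1 - math.log10(eps32))                            # about 7.92
```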
Approximations and Round-off Errors
Approximations and errors are an integral part of numerical methods. Before using numerical methods, it is essential to know how errors arise, how they grow during numerical computations and how they affect the accuracy of a solution. Errors come in a variety of forms and sizes. To get a quick feel, let us look at the following taxonomy of errors:
[Figure: taxonomy of errors]
Further discussion will focus on errors due to the computing machine and those due to the numerical method. First, the notion of significant digits is introduced.
Significant Digits
Usually, the numerical solution to a given problem is sought to a desired level of accuracy and precision, where the error is below a set tolerance level. The idea of significant digits is essential for understanding the accuracy and precision of a solution and for designating the reliability of a numerical value.
The significant digits of a number are those that can be used with confidence. Suppose we seek a numerical solution to an accuracy of $10^{-3}$ and obtain $12.34521$ as the solution. Here the solution is reliable only up to the first three decimal places, i.e. $12.345$, so the solution has five significant digits. Some numbers, like $\pi$, $e$ and $\sqrt{2}$, have an infinite number of significant digits. For example, consider $\pi$:
$$\pi = 3.14159265358979\ldots$$
Such numbers can never be represented exactly on a computer, which operates with a fixed number of significant digits due to hardware limitations. The omission of some digits of such numbers results in what is called round-off error. Some rules of thumb for significant digits, within the desired level of accuracy, are:
(a) All non-zero digits are significant.
(b) All zeros occurring between non-zero digits are significant.
(c) Trailing zeros following a decimal point are significant (e.g. 3.50, 65.0 and 0.230 have three significant digits).
(d) Zeros between the decimal point and the first non-zero digit are not significant. For example, 0.0001234, 0.001234 and 0.01234 have four significant digits.
(e) Trailing zeros in large numbers without a decimal point are not significant. For instance, 45000 may be written in scientific notation as $4.5 \times 10^{4}$ and contains only two significant digits.
The concepts of accuracy and precision are closely related to significant digits, as follows:
Accuracy refers to the number of significant digits in a value. For example, the number 57.396 is accurate to five significant digits. Precision refers to the number of decimal positions, i.e. the order of magnitude of the last digit in a value. The number 57.396 has a precision of 0.001 or $10^{-3}$.
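A rough sketch of rules (a) to (e) for numbers written as plain decimal strings (no exponent notation). `significant_digits` and `precision` are hypothetical helpers. The precision helper only accepts numbers written with a decimal point, because rule (e) makes the last written digit of a number like 45000 ambiguous.

```python
from decimal import Decimal


def significant_digits(text: str) -> int:
    """Count significant digits of a plain decimal string using rules (a)-(e)."""
    s = text.strip().lstrip("+-")
    has_point = "." in s
    digits = s.replace(".", "").lstrip("0")   # leading zeros are never significant (d)
    if not has_point:
        digits = digits.rstrip("0")           # trailing zeros without a point are not (e)
    return len(digits)                        # inner zeros (b) and zeros after a point (c) stay


def precision(text: str) -> Decimal:
    """Place value of the last written digit, e.g. 57.396 -> 0.001."""
    if "." not in text:
        raise ValueError("precision is only defined here for numbers with a decimal point")
    exponent = Decimal(text).as_tuple().exponent
    return Decimal(1).scaleb(exponent)


for t in ["57.396", "3.50", "0.0001234", "45000", "1002"]:
    print(t, significant_digits(t))
print(precision("57.396"))   # 0.001
```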