
Summer 2007 CISC121 - Prof. McLeod 1

CISC121 – Lecture 12

• Last time:
– Efficient recursive and non-recursive sorts.
– Analyzing the complexity of recursive methods.

• Today:
– Last lecture!


You Will Need To:

• Look at exercise 5 and assignment 5.

• If you wish to replace any of your assignment marks you can submit the “makeup” assignment on the day of the final exam.


Final Exam…

• On the 16th, unless other arrangements have been made.

• Exam topics are listed on the course web site.
• Exambank has 18 exams from 1996 to 2006.
• I will not provide full exam solutions, but will be happy to discuss individual exam problems.
• Ana or Krista can supply extra tutoring if needed. Ask them if you want an exam prep tutorial.


Today

• Analyze the complexity of mergesort.

• Number representation and roundoff error.

• Movie!

• “Satisfaction” survey


Mergesort – “aMergeSort” Code

• Code for sorting arrays:

    public static void aMergeSort (int[] A) {
        aMergeSort(A, 0, A.length-1);
    } // end aMergeSort

    public static void aMergeSort (int[] A, int first, int last) {
        if (first < last) {
            int mid = (first + last) / 2;
            aMergeSort(A, first, mid);
            aMergeSort(A, mid + 1, last);
            aMerge(A, first, last);
        } // end if
    } // end aMergeSort recursive


Mergesort – “aMerge” Code

    private static void aMerge (int[] A, int first, int last) {
        int mid = (first + last) / 2;
        int i1 = 0, i2 = first, i3 = mid + 1;
        int[] temp = new int[last - first + 1];
        // Merge the two sorted halves into temp.
        while (i2 <= mid && i3 <= last) {
            if (A[i2] < A[i3]) {
                temp[i1] = A[i2];
                i2++;
            }
            else {
                temp[i1] = A[i3];
                i3++;
            }
            i1++;
        } // end while
        // Copy whatever is left of the first half.
        while (i2 <= mid) {
            temp[i1] = A[i2];
            i2++;
            i1++;
        } // end while
        // Copy whatever is left of the second half.
        while (i3 <= last) {
            temp[i1] = A[i3];
            i3++;
            i1++;
        } // end while
        // Copy the merged values back into A.
        i1 = 0;
        i2 = first;
        while (i2 <= last) {
            A[i2] = temp[i1];
            i1++;
            i2++;
        } // end while
    } // end aMerge


Complexity of Mergesort

• Consider the aMergeSort code shown above.
• Suppose that the entire method takes t(n) time, where n is A.length. We want to know the big O notation for t(n).
• There are no loops in aMergeSort, just some constant time operations (taking a total time a), the two recursive calls and the call to aMerge:

$$t(n) = a + t\left(\lceil n/2 \rceil\right) + t\left(\lfloor n/2 \rfloor\right) + t_{merge}(n)$$


Complexity of Mergesort - Cont.

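• What is the time function for aMerge?
• There is some O(1) stuff and four loops that are O(n):

$$t_{merge}(n) = a + cn$$

• So, folding the constant terms into a single constant a:

$$t(n) = a + t\left(\lceil n/2 \rceil\right) + t\left(\lfloor n/2 \rfloor\right) + cn$$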


Complexity of Mergesort - Cont.

• So far, we have not made any mention of the state of the data. Does it make any difference if the data is in reverse order (worst case), random order (average case) or in order already (best case)?

• Express t(n) in a recursive expression:

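$$t(n) = \begin{cases} b & \text{if } n \le 1 \\ a + t\left(\lceil n/2 \rceil\right) + t\left(\lfloor n/2 \rfloor\right) + cn & \text{otherwise} \end{cases}$$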


Complexity of Mergesort - Cont.

• Assume that n is a power of 2:

$$t(n) = \begin{cases} b & \text{if } n \le 1 \\ a + 2\,t\left(\frac{n}{2}\right) + cn & \text{otherwise} \end{cases}$$

• (It is easy enough to show that the proof still holds when n is not a power of two - but I’m not going to do that here).


Complexity of Mergesort - Cont.

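• Substitute n/2 for n, to get t(n/2):

$$t\left(\frac{n}{2}\right) = a + 2\,t\left(\frac{n}{2^2}\right) + c\,\frac{n}{2}$$

so,

$$t(n) = a + 2\left(a + 2\,t\left(\frac{n}{2^2}\right) + c\,\frac{n}{2}\right) + cn$$

or,

$$t(n) = 3a + 2^2\,t\left(\frac{n}{2^2}\right) + 2cn \qquad (i = 2)$$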


Complexity of Mergesort - Cont.

• Do the next unrolling, substituting for t(n/2²):

$$t(n) = 7a + 2^3\,t\left(\frac{n}{2^3}\right) + 3cn \qquad (i = 3)$$

• So, after i unrollings:

$$t(n) = (2^i - 1)\,a + 2^i\,t\left(\frac{n}{2^i}\right) + icn$$


Complexity of Mergesort - Cont.

• This recursion stops when the anchor case, n ≤ 1, is encountered. This will occur when:

$$\frac{n}{2^i} = 1, \quad \text{or when} \quad 2^i = n, \quad \text{or} \quad i = \log_2 n$$

• Substituting this back in the equation on the previous slide:


Complexity of Mergesort - Cont.

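• At the anchor case:

$$t(n) = (n-1)\,a + n\,t\left(\frac{n}{n}\right) + cn\log_2 n$$

or,

$$t(n) = (n-1)\,a + n\,t(1) + cn\log_2 n$$

or,

$$t(n) = (n-1)\,a + nb + cn\log_2 n = jn + k + cn\log_2 n$$

where j = a + b and k = -a.

• Now the equation can be simplified to yield the big O notation, which indicates that t(n) is O(nlog(n)).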
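As a rough empirical check on the n log(n) result, the sketch below counts how many elements are copied into temporary arrays during a mergesort. The class `MergeCount` and its `copies` counter are hypothetical names introduced only for this illustration; the sorting logic follows the aMergeSort code, with `mid` passed to the merge step for brevity. When n is a power of 2, every level of the recursion copies n elements and there are log₂(n) levels.

```java
public class MergeCount {
    // Hypothetical counter: number of elements copied into temp arrays.
    static long copies = 0;

    static void sort(int[] a, int first, int last) {
        if (first < last) {
            int mid = (first + last) / 2;
            sort(a, first, mid);
            sort(a, mid + 1, last);
            merge(a, first, mid, last);
        }
    }

    static void merge(int[] a, int first, int mid, int last) {
        int[] temp = new int[last - first + 1];
        int i1 = 0, i2 = first, i3 = mid + 1;
        while (i2 <= mid && i3 <= last) {
            temp[i1++] = (a[i2] < a[i3]) ? a[i2++] : a[i3++];
            copies++;
        }
        while (i2 <= mid) { temp[i1++] = a[i2++]; copies++; }
        while (i3 <= last) { temp[i1++] = a[i3++]; copies++; }
        System.arraycopy(temp, 0, a, first, temp.length);
    }

    // Sorts n values that start in reverse order and returns the copy count.
    static long countCopies(int n) {
        int[] a = new int[n];
        for (int i = 0; i < n; i++) a[i] = n - i;
        copies = 0;
        sort(a, 0, n - 1);
        return copies;
    }

    public static void main(String[] args) {
        for (int n = 1024; n <= 8192; n *= 2) {
            System.out.println(n + " elements: " + countCopies(n) + " copies");
        }
    }
}
```

The copy count does not depend on the initial order of the data, which fits the observation that the derivation never used the state of the data.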


    public static void quickSort (int[] A, int first, int last) {
        int lower = first + 1;
        int upper = last;
        swap(A, first, (first+last)/2);
        int pivot = A[first];
        while (lower <= upper) {
            while (A[lower] < pivot)
                lower++;
            while (A[upper] > pivot)
                upper--;
            if (lower < upper)
                swap(A, lower++, upper--);
            else
                lower++;
        }
        swap(A, upper, first);
        if (first < upper - 1)
            quickSort(A, first, upper-1);
        if (upper + 1 < last)
            quickSort(A, upper+1, last);
    } // end quickSort(subarrays)


Complexity of Quicksort

• The worst case occurs when a near-median value is not chosen, so that the pivot value is always a maximum or a minimum value of the subarray. Then the algorithm is O(n²).

• However, if the pivot values are always near the median value of the arrays, the algorithm is O(nlog(n)) – which is the best case. (See the derivation of this complexity for merge sort).

• The average case also turns out to be O(nlog(n)).
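The difference between the two cases can be seen by counting comparisons. The sketch below is not the quickSort code shown earlier: it is a simplified variant (a single-scan partition, with hypothetical names `QuickCount`, `count` and `useMiddle`) written only to make the counting easy. On data that is already sorted, always using the first element as the pivot gives the worst case, n(n-1)/2 comparisons, while using the middle element keeps the two subarrays balanced.

```java
public class QuickCount {
    // Hypothetical counter: number of comparisons against the pivot.
    static long comparisons;

    static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }

    static void sort(int[] a, int first, int last, boolean useMiddle) {
        if (first >= last) return;
        if (useMiddle) swap(a, first, (first + last) / 2);
        int pivot = a[first];
        int boundary = first;          // a[first+1..boundary] holds values < pivot
        for (int i = first + 1; i <= last; i++) {
            comparisons++;
            if (a[i] < pivot) swap(a, ++boundary, i);
        }
        swap(a, first, boundary);      // put the pivot between the two parts
        sort(a, first, boundary - 1, useMiddle);
        sort(a, boundary + 1, last, useMiddle);
    }

    // Sorts n values that are already in order and returns the comparison count.
    static long count(int n, boolean useMiddle) {
        int[] a = new int[n];
        for (int i = 0; i < n; i++) a[i] = i;
        comparisons = 0;
        sort(a, 0, n - 1, useMiddle);
        return comparisons;
    }

    public static void main(String[] args) {
        System.out.println("first element as pivot:  " + count(1023, false));
        System.out.println("middle element as pivot: " + count(1023, true));
    }
}
```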


Number Representation

• Binary numbers or “base 2” is a natural representation of numbers to a computer.

• As a transition, hexadecimal (or “hex”, base 16) numbers are also used.

• Octal (base 8) numbers are used to a lesser degree.

• Decimal (base 10) numbers are *not* naturally represented in computers.


Number Representation - Cont.

• In base 2 (digits either 0 or 1):

r = 2, a binary number:

$(110101.11)_2 = 1\times2^5 + 1\times2^4 + 0\times2^3 + 1\times2^2 + 0\times2^1 + 1\times2^0 + 1\times2^{-1} + 1\times2^{-2} = 53.75$ (in base 10)

“r” is the “radix” or the base of the number.


Number Representation - Cont.

• Octal Numbers: a base-8 system with 8 digits: 0, 1, 2, 3, 4, 5, 6 and 7:

• For example:

$(127.4)_8 = 1\times8^2 + 2\times8^1 + 7\times8^0 + 4\times8^{-1} = 87.5$


Number Representation - Cont.

• Hexadecimal Numbers: a base-16 system with 16 digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, and F:

• For example:

$(\text{B65F})_{16} = 11\times16^3 + 6\times16^2 + 5\times16^1 + 15\times16^0 = 46687$


Number Representation - Cont.

• The above series show how you can convert from binary, octal or hex to decimal.

• How do you convert from decimal to one of the other bases?
• Integral part: divide by r repeatedly and keep the remainders.
• Fractional part: multiply by r repeatedly and keep the carries.
• “r” is the base - either 2, 8 or 16.


Number Representation - Cont.

• For example, convert 625.76 (base 10) to binary. For the integral part:

Divisor (r)   Dividend         Remainder
2             625
2             312 (quotient)   1   (least significant digit)
2             156              0
2             78               0
2             39               0
2             19               1
2             9                1
2             4                1
2             2                0
2             1                0
              0                1   (most significant digit)

• So, $625_{10}$ is $1001110001_2$.


Number Representation - Cont.

• For the “0.76” (base 10) part:

Multiplier (r)   Multiplicand     Carry
2                0.76
2                1.52 (product)   1   (most significant digit)
2                1.04             1
2                0.08             0
2                0.16             0
2                0.32             0
2                0.64             0
2                1.28             1
2                0.56             0
2                1.12             1
2                0.24             0
...

• So, $0.76_{10}$ is approximately $0.1100001010_2$ (the expansion does not terminate).

• 625.76 is approximately:

$(1001110001.1100001010)_2$
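The two rules (divide and keep the remainders, multiply and keep the carries) can be sketched in Java as follows. The class and method names (`BaseConvert`, `integerPart`, `fractionPart`) are made up for this illustration, and the number of fractional digits has to be chosen by the caller since the expansion may not terminate.

```java
public class BaseConvert {
    static final String DIGITS = "0123456789ABCDEF";

    // Integral part: divide by r repeatedly. The remainders are the digits,
    // least significant first, so each one is inserted at the front.
    static String integerPart(int value, int r) {
        if (value == 0) return "0";
        StringBuilder sb = new StringBuilder();
        while (value > 0) {
            sb.insert(0, DIGITS.charAt(value % r));
            value /= r;
        }
        return sb.toString();
    }

    // Fractional part: multiply by r repeatedly. The carries are the digits,
    // most significant first, so each one is appended at the end.
    static String fractionPart(double fraction, int r, int places) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < places; i++) {
            fraction *= r;
            int carry = (int) fraction;
            sb.append(DIGITS.charAt(carry));
            fraction -= carry;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(integerPart(625, 2) + "." + fractionPart(0.76, 2, 10));
        System.out.println(integerPart(46687, 16));
    }
}
```

For the integral part, the library call `Integer.toString(625, 2)` gives the same digits.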


Number Representation - Cont.

• Converting between binary, octal and hex is much easier - done by “grouping” the numbers:

• For example:

$(010110001101011.111100000110)_2 = (?)_8$

010 110 001 101 011 . 111 100 000 110

$(2\ 6\ 1\ 5\ 3\ .\ 7\ 4\ 0\ 6)_8$


Number Representation - Cont.

• Another example:

$(\text{2C6B.F06})_{16} = (?)_2$

$(2\ \text{C}\ 6\ \text{B}\ .\ \text{F}\ 0\ 6)_{16}$

$(0010\ 1100\ 0110\ 1011\ .\ 1111\ 0000\ 0110)_2$
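Grouping is simple enough to sketch in a few lines of Java. The method below (a hypothetical helper named `group`) handles only the integral part of a binary number: it pads on the left to a whole number of groups and converts each group of 3 bits (octal) or 4 bits (hex) to one digit. A fractional part would be padded on the right instead.

```java
public class GroupBits {
    // Converts a string of bits to octal (groupSize 3) or hex (groupSize 4).
    static String group(String bits, int groupSize) {
        StringBuilder padded = new StringBuilder(bits);
        while (padded.length() % groupSize != 0) {
            padded.insert(0, '0');            // pad on the left with zeros
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < padded.length(); i += groupSize) {
            int digit = Integer.parseInt(padded.substring(i, i + groupSize), 2);
            out.append(Character.toUpperCase(Character.forDigit(digit, 16)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(group("010110001101011", 3));   // octal digits
        System.out.println(group("010110001101011", 4));   // hex digits
    }
}
```

The two examples on these slides have the same integral part: the bits 010110001101011 are 26153 in octal and 2C6B in hex.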


From Before: Integer Primitive Types in Java

• For byte, from -128 to 127, inclusive (1 byte).
• For short, from -32768 to 32767, inclusive (2 bytes).
• For int, from -2147483648 to 2147483647, inclusive (4 bytes).
• For long, from -9223372036854775808 to 9223372036854775807, inclusive (8 bytes).

• A “byte” is 8 bits, where a “bit” is either 1 or 0.


Storage of Integers

• An “un-signed” 8 digit binary number can range from 00000000 to 11111111.

• 00000000 is 0 in base 10.
• 11111111 is $1\times2^0 + 1\times2^1 + 1\times2^2 + \dots + 1\times2^7 = 255$, base 10.


Storage of Integers - Cont.

• So, how can a negative binary number be stored?
• One way is to use the Two’s Complement system of storage.
• Give the most significant bit a negative weight.
• So, the lowest “signed” binary 8 digit number is now: 10000000, which is $-1\times2^7$, or -128 base 10.


Storage of Integers - Cont.

• Two’s Complement System:

binary base 10

10000000 -128

10000001 -127

11111111 -1

00000000 0

00000001 1

01111111 127


Storage of Integers - Cont.

• For example, the binary number

10010101 is

$1\times2^0 + 1\times2^2 + 1\times2^4 - 1\times2^7$

= 1 + 4 + 16 - 128

= -107 base 10

• Now you can see how the primitive integer type, byte, ranges from -128 to 127.
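The weighting rule can be written out directly. In this sketch the method `decode8` is a made-up helper that treats the leftmost of 8 bits as having weight -2⁷ and the other bits as having their usual positive weights. The cast in `main` shows Java's own byte type agreeing with it.

```java
public class TwosComplement {
    // Interprets 8 characters of '0' and '1' as a two's complement value.
    static int decode8(String bits) {
        int value = 0;
        for (int i = 0; i < 8; i++) {
            if (bits.charAt(i) == '1') {
                int weight = 1 << (7 - i);
                // Only the most significant bit (i == 0) counts as negative.
                value += (i == 0) ? -weight : weight;
            }
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(decode8("10010101"));
        System.out.println((byte) 0b10010101);   // same bit pattern stored in a byte
    }
}
```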


Storage of Integers - Cont.

• Suppose we wish to add 1 to the largest byte value: 01111111+00000001

• This would be equivalent to adding 1 to 127 in base 10 - the result would normally be 128.

• In base 2, using two’s complement, the result of the addition is 10000000, which is -128 in base 10!

• So integer numbers wrap around, in the case of overflow - no warning is given in Java!
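A short sketch of the wrap-around, using hypothetical helper names. `Math.addExact` (available since Java 8) is one way to ask for an exception instead of a silent wrap.

```java
public class Overflow {
    // Adds 1 to a byte; the cast back to byte keeps only the low 8 bits.
    static byte addOneToByte(byte b) {
        return (byte) (b + 1);
    }

    // Reports whether a + b would overflow an int.
    static boolean overflows(int a, int b) {
        try {
            Math.addExact(a, b);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(addOneToByte((byte) 127));
        System.out.println(Integer.MAX_VALUE + 1);
        System.out.println(overflows(Integer.MAX_VALUE, 1));
    }
}
```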


Storage of Integers - Cont.

• An int is stored in 4 bytes using “two’s complement”.

• An int ranges from:

10000000 00000000 00000000 00000000

to

01111111 11111111 11111111 11111111

or -2147483648 to 2147483647 in base 10


Real Primitive Types

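• For float, (4 bytes) roughly $\pm1.2\times10^{-38}$ to $\pm3.4\times10^{38}$ for normalized values, to about 7 significant digits. (The smallest subnormal value, Float.MIN_VALUE, is about $1.4\times10^{-45}$.)

• For double, (8 bytes) roughly $\pm2.2\times10^{-308}$ to $\pm1.8\times10^{308}$ for normalized values, to about 15 significant digits. (The smallest subnormal value, Double.MIN_VALUE, is about $4.9\times10^{-324}$.)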


Storage of Real Numbers

• The system used to store real numbers in Java complies with the IEEE standard number 754.

• Like an int, a float is stored in 4 bytes or 32 bits.

• These bits consist of 24 bits for the mantissa and 8 bits for the exponent:

[Figure: the 32 bits of a float, 00000000 00000000 00000000 00000000, divided into mantissa and exponent fields.]


Storage of Real Numbers - Cont.

• So a value is stored as:

$\text{value} = \text{mantissa} \times 2^{\text{exponent}}$

• The factor $2^{\text{exponent}}$ for a float can range from about $2^{-128}$ to $2^{128}$, which is about $10^{-38}$ to $10^{38}$.

• The float mantissa must lie between -1.0 and 1.0 exclusive, and will have about 7 significant digits when converted to base 10.
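The stored fields of a float can be inspected with `Float.floatToIntBits`. Note that the sketch below shows the IEEE 754 bit layout itself (1 sign bit, 8 exponent bits stored with a bias of 127, and 23 fraction bits with an implied leading 1), which scales the mantissa differently from the simplified description above. The helper names are made up for this illustration.

```java
public class FloatBits {
    static int signBit(float f) {
        return Float.floatToIntBits(f) >>> 31;
    }

    // The 8 stored exponent bits (biased by 127).
    static int exponentField(float f) {
        return (Float.floatToIntBits(f) >>> 23) & 0xFF;
    }

    // The 23 stored fraction bits (the leading 1 is not stored).
    static int fractionField(float f) {
        return Float.floatToIntBits(f) & 0x7FFFFF;
    }

    public static void main(String[] args) {
        float f = 0.1f;
        System.out.println("bits     = " + Integer.toHexString(Float.floatToIntBits(f)));
        System.out.println("sign     = " + signBit(f));
        System.out.println("exponent = " + (exponentField(f) - 127));
        System.out.println("fraction = " + Integer.toBinaryString(fractionField(f)));
    }
}
```

For 0.1f the unbiased exponent is -4, since 0.1 is 1.6 × 2⁻⁴, and the fraction bits show the repeating 1001 pattern of 0.1 in binary.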


Storage of Real Numbers - Cont.

• The double type is stored using 8 bytes or 64 bits - 53 bits for the mantissa, and 11 bits for the exponent.

• The exponent gives factors between about $2^{-1024}$ and $2^{1024}$, which is about $10^{-308}$ and $10^{308}$.

• The mantissa allows for the storage of about 16 significant digits in base 10.

• (Double.MAX_VALUE is: 1.7976931348623157E308)


Storage of Real Numbers - Cont.

• See the following web site for more info:

http://grouper.ieee.org/groups/754/

• Or:

http://en.wikipedia.org/wiki/IEEE_floating-point_standard


Storage of Real Numbers - Cont.

• So, a real number can only occupy a finite amount of storage in memory.

• This effect is very important for two kinds of numbers:
– Numbers like 0.1 that can be written exactly in base 10, but cannot be stored exactly in base 2.
– Real numbers (like π or e) that have an infinite number of digits in their “real” representation can only be stored in a finite number of digits in memory.

• And, we will see that it has an effect on the accuracy of mathematical operations.


Roundoff Error

• Consider 0.1:

(0.1)10 = (0.0 0011 0011 0011 0011 0011…)2

• What happens to the part of a real number that cannot be stored?

• It is lost - the number is either truncated or rounded (Java follows IEEE 754 and rounds to the nearest representable value).

• The “lost part” is called the Roundoff Error.
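One way to see the value that is actually stored is to hand the double to `java.math.BigDecimal`, whose `BigDecimal(double)` constructor converts the binary value exactly. This is only a sketch, and the class name `StoredTenth` is made up.

```java
import java.math.BigDecimal;

public class StoredTenth {
    // The exact decimal expansion of the binary value stored for d.
    static BigDecimal exact(double d) {
        return new BigDecimal(d);
    }

    public static void main(String[] args) {
        System.out.println(exact(0.1));       // stored double value of 0.1
        System.out.println(exact(0.1f));      // stored float value of 0.1
        System.out.println(0.1 + 0.2 == 0.3); // a classic consequence of roundoff
    }
}
```

Both stored values turn out to be slightly larger than 0.1, and the difference between the stored value and 0.1 is the roundoff error.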


Storage of “Real” or “Floating-Point” Numbers - Cont.

• Compute:

$$\sum_{i=1}^{10000} 0.1$$

• And, compare to 1000.

    float sum = 0;
    for (int i = 0; i < 10000; i++)
        sum += 0.1;
    System.out.println(sum);


Storage of “Real” or “Floating-Point” Numbers - Cont.

• Prints a value of 999.9029 to the screen.
• If sum is declared to be a double then the value: 1000.0000000001588 is printed to the screen.

• So, the individual roundoff errors have piled up to contribute to a cumulative error in this calculation.

• As expected, the roundoff error is smaller for a double than for a float.


Roundoff Error – Cont.

• This error is referred to in two different ways:

• The absolute error:

$\text{absolute error} = |x - x_{approx}|$

• The relative error:

$\text{relative error} = \dfrac{\text{absolute error}}{|x|}$


Roundoff Error - Cont.

• So for the calculation of 1000 as shown above, the errors are:

Type     Absolute    Relative
float    0.0971      9.71E-5
double   1.588E-10   1.588E-13

• The relative error on the sum is the absolute error divided by 1000.
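The two error measures are easy to compute once the sums are available. This is a sketch with made-up names; the float version adds the literal 0.1f, which may differ in the last digits from a loop that adds the double literal 0.1 to a float variable.

```java
public class SumError {
    static float floatSum(int n) {
        float sum = 0;
        for (int i = 0; i < n; i++) sum += 0.1f;
        return sum;
    }

    static double doubleSum(int n) {
        double sum = 0;
        for (int i = 0; i < n; i++) sum += 0.1;
        return sum;
    }

    static double absoluteError(double exact, double approx) {
        return Math.abs(exact - approx);
    }

    static double relativeError(double exact, double approx) {
        return absoluteError(exact, approx) / Math.abs(exact);
    }

    public static void main(String[] args) {
        double exact = 1000.0;
        System.out.println("float:  " + absoluteError(exact, floatSum(10000))
                + "  " + relativeError(exact, floatSum(10000)));
        System.out.println("double: " + absoluteError(exact, doubleSum(10000))
                + "  " + relativeError(exact, doubleSum(10000)));
    }
}
```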


The Effects of Roundoff Error

• Roundoff error can have an effect on any arithmetic operation carried out involving real numbers.

• For example, consider subtracting two numbers that are very close together:

• Use the function

$$f(x) = 1 - \cos(x)$$

for example. As x approaches zero, cos(x) approaches 1.


The Effects of Roundoff Error

• Using double variables, and a value of x of 1.0E-12, f(x) evaluates to 0.0.

• But, it can be shown that the function f(x) can also be represented by f’(x) (an alternative form of f, not its derivative):

$$f(x) = f'(x) = \frac{\sin^2(x)}{1 + \cos(x)}$$

• For x = 1.0E-12, f’(x) evaluates to 5.0E-25.
• The f’(x) function is less susceptible to roundoff error.
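Both forms can be tried side by side. In this sketch `direct` is 1 - cos(x) and `rewritten` is sin²(x)/(1 + cos(x)); the method names are made up.

```java
public class Cancellation {
    // Subtracts two nearly equal numbers when x is small.
    static double direct(double x) {
        return 1.0 - Math.cos(x);
    }

    // Algebraically the same function, but with no subtraction of close values.
    static double rewritten(double x) {
        double s = Math.sin(x);
        return s * s / (1.0 + Math.cos(x));
    }

    public static void main(String[] args) {
        double x = 1.0E-12;
        System.out.println(direct(x));
        System.out.println(rewritten(x));
    }
}
```

The identity behind the rewrite is (1 - cos x)(1 + cos x) = sin²x.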


The Effects of Roundoff Error - Cont.

• Another example. Consider the smallest (in magnitude) root of the polynomial $ax^2 + bx + c = 0$, taking b > 0:

$$x_1 = \frac{-b + \sqrt{b^2 - 4ac}}{2a}$$

• What happens when ac is small, compared to b²?

• It is known that for the two roots, x₁ and x₂:

$$x_1 x_2 = \frac{c}{a}$$


The Effects of Roundoff Error - Cont.

• Which leads to an equation for the root which is not as susceptible to roundoff error:

$$x_1 = \frac{-2c}{b + \sqrt{b^2 - 4ac}}$$

• This equation approaches –c/b instead of zero when $ac \ll b^2$.
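The two formulas for the small root can be compared numerically. The sketch below assumes b > 0 and uses the made-up test values a = 1, b = 1.0E8, c = 1, for which the small root is very close to -c/b = -1.0E-8.

```java
public class SmallRoot {
    // Textbook formula: subtracts two nearly equal numbers when ac << b*b.
    static double naive(double a, double b, double c) {
        return (-b + Math.sqrt(b * b - 4 * a * c)) / (2 * a);
    }

    // Rearranged formula: the two terms in the denominator are added instead.
    static double stable(double a, double b, double c) {
        return -2 * c / (b + Math.sqrt(b * b - 4 * a * c));
    }

    public static void main(String[] args) {
        double a = 1.0, b = 1.0E8, c = 1.0;
        System.out.println("naive:  " + naive(a, b, c));
        System.out.println("stable: " + stable(a, b, c));
        System.out.println("-c/b:   " + (-c / b));
    }
}
```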


The Effects of Roundoff Error - Cont.

• The examples above show what can happen when two numbers that are very close are subtracted.

• Remember that this effect is a direct result of these numbers being stored with finite accuracy in memory.


The Effects of Roundoff Error - Cont.

• A similar effect occurs when an attempt is made to add a comparatively small number to a large number:

    boolean aVal = ((1.0E10 + 1.0E-20) == 1.0E10);
    System.out.println(aVal);

• Prints out true to the screen.
• This is because 1.0E-20 is just too small to affect any of the bit values used to store 1.0E10. The small number would have to be about 1.0E-6 or larger to affect the large number.

• So, keep this behaviour in mind when designing expressions!
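The size of the gap between 1.0E10 and the next larger double can be found with `Math.ulp`. A small number is lost completely when it is less than about half of that gap. The helper name `absorbed` is made up for this sketch.

```java
public class Absorb {
    // True if adding small to large leaves large unchanged.
    static boolean absorbed(double large, double small) {
        return large + small == large;
    }

    public static void main(String[] args) {
        System.out.println(Math.ulp(1.0E10));          // spacing of doubles near 1.0E10
        System.out.println(absorbed(1.0E10, 1.0E-20)); // far below the spacing
        System.out.println(absorbed(1.0E10, 1.0E-5));  // above the spacing
    }
}
```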


The Effect on Summations

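• Taylor Series are used to approximate many functions. For example:

$$\ln(1+x) = \sum_{i=1}^{\infty} (-1)^{i+1}\,\frac{x^i}{i}$$

• For ln(2):

$$\ln(2) = \sum_{i=1}^{\infty} \frac{(-1)^{i+1}}{i} = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \dots$$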


The Effect on Summations – Cont.

• Since we cannot loop to infinity, how many terms would be sufficient?

• Since the sum is stored in a finite memory space, at some point the terms to be added will be much smaller than the sum itself.

• If the sum is stored in a float, which has about 7 significant digits, a term of about $1\times10^{-8}$ would not be significant. So, i would be about $10^8$ - that’s a lot of iterations!


The Effect on Summations - Cont.

• On testing using a float, it took 33554433 iterations and 25540 msec to compute! (sum no longer changing, value = 0.6931375)

• Math.log(2) = 0.6931471805599453
• So, roundoff error had a significant effect and the summation did not even provide the correct value. A float could only provide about 5 correct significant digits, tops.
• For double, about $10^{15}$ iterations would be required! (I didn’t try this one…)

• So, this series does not converge quickly, and roundoff error has a strong effect on the answer!
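A sketch of the float experiment follows. It keeps adding terms of the alternating series until the stored sum stops changing; the class name, the `iterations` field and the stopping rule are assumptions made for this illustration, so the exact count and value may differ slightly from the numbers quoted above.

```java
public class SlowLn2 {
    // Number of terms used by the most recent call to sum().
    static int iterations;

    // Sums 1 - 1/2 + 1/3 - 1/4 + ... in a float until the sum stops changing.
    static float sum() {
        float sum = 0.0f;
        float previous;
        int i = 0;
        do {
            previous = sum;
            i++;
            float term = 1.0f / i;
            sum += (i % 2 == 1) ? term : -term;
        } while (sum != previous);
        iterations = i;
        return sum;
    }

    public static void main(String[] args) {
        float result = sum();
        System.out.println(iterations + " iterations, sum = " + result);
        System.out.println("Math.log(2) = " + Math.log(2));
    }
}
```

The loop ends because, once a term is smaller than half the spacing between floats near 0.69, adding it no longer changes the sum.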


The Effect on Summations - Cont.

• Here is another way to compute natural logs:

$$\ln\left(\frac{1+x}{1-x}\right) = 2\sum_{i=0}^{\infty} \frac{1}{2i+1}\,x^{2i+1}$$

• Using x = 1/3 will provide ln(2).


The Effect on Summations - Cont.

• For float, this took 8 iterations and <1 msec (value = 0.6931472).
• Math.log(2) = 0.6931471805599453
• For double, it took 17 iterations, <1 msec to give the value = 0.6931471805599451
• Using the Windows calculator ln(2) = 0.69314718055994530941723212145818 (!!)
• So, the use of the 17 iterations still introduced a slight roundoff error.
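The faster series can be sketched the same way, here in double. The names `FastLn2`, `lnRatio` and `iterations` are made up, and the loop again stops when the stored sum no longer changes, so the iteration count may be off by one or two from the figure quoted above.

```java
public class FastLn2 {
    // Number of terms used by the most recent call to lnRatio().
    static int iterations;

    // Returns ln((1 + x) / (1 - x)) for |x| < 1, using the odd-power series.
    static double lnRatio(double x) {
        double sum = 0.0;
        double previous;
        double power = x;              // holds x^(2i+1)
        int i = 0;
        do {
            previous = sum;
            sum += power / (2 * i + 1);
            power *= x * x;
            i++;
        } while (sum != previous);
        iterations = i;
        return 2.0 * sum;
    }

    public static void main(String[] args) {
        double result = lnRatio(1.0 / 3.0);
        System.out.println(iterations + " iterations, value = " + result);
        System.out.println("Math.log(2) = " + Math.log(2));
    }
}
```

Each term is about 9 times smaller than the one before, which is why so few terms are needed.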


Aside - Extended Precision in Windows


Numeric Calculations

• Error is introduced into a calculation through two sources (assuming the formulae are correct!):
– The inherent error in the numbers used in the calculation.
– Error resulting from roundoff error.

• Often the inherent error dominates the roundoff error.

• But, watch for conditions of slow convergence or ill-conditioned matrices, where roundoff error will accumulate or is amplified and end up swamping out the inherent error.


Numeric Calculations - Cont.

• Once a number is calculated, it is very important to be able to estimate the error using both sources, if necessary.

• The error must be known in order that the number produced by your program can be reported in a valid manner.

• This is a non-trivial topic in numeric calculation that we will not discuss in this course.


Real World Roundoff Error Disasters

• Ariane 5 Launch
• Patriot Missiles

• See the movie!


Patriot Missile Problem

• The Patriot’s tracking system used a 24 bit number to keep track of the number of tenths of a second that had passed since the tracking system was turned “on”.

• 0.1 in binary is 0.0001100110011001100110011001100....

• If you only have 24 bits to store this number then the error is 0.0000000000000000000000011001100…. or 0.000000095 in base 10


Patriot Missile Problem, Cont.

• So over 100 hours of operation:

0.000000095×100×60×60×10=0.34 seconds out.

• A Scud moving at Mach 5 travels 1,676 metres per second, so the error is 0.34×1,676 = 570 metres.
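The arithmetic can be reproduced with a few lines of Java. This sketch assumes that 0.1 was chopped (not rounded) after 23 binary places, which is the assumption that reproduces the 0.000000095 error quoted above; the class and method names are made up. Carrying the unrounded drift through gives a distance a few metres larger than the 570 obtained from the rounded 0.34.

```java
public class PatriotDrift {
    // The value of 0.1 after chopping it to the given number of binary places.
    static double choppedTenth(int fractionBits) {
        double scale = (double) (1L << fractionBits);
        return Math.floor(0.1 * scale) / scale;
    }

    // Clock error in seconds after the given number of hours of counting tenths.
    static double clockDrift(double hours, int fractionBits) {
        double errorPerTick = 0.1 - choppedTenth(fractionBits);
        double ticks = hours * 60 * 60 * 10;
        return errorPerTick * ticks;
    }

    public static void main(String[] args) {
        double drift = clockDrift(100, 23);
        System.out.println("error per tick: " + (0.1 - choppedTenth(23)));
        System.out.println("drift (s):      " + drift);
        System.out.println("distance (m):   " + drift * 1676);
    }
}
```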