Data Structures and Algorithm Analysis

Dr. Malek Mouhoub

Computer Science Department

University of Regina

Fall 2010

Malek Mouhoub, CS340 Fall 2010 1

1. Algorithm Analysis

• 1.1 Mathematics Review

• 1.2 Introduction to Algorithm Analysis

• 1.3 Asymptotic notation and Growth of functions

• 1.4 Case Study

• 1.5 Data Structures and Algorithm Analysis


1.1 Mathematics Review

Exponents

$X^A X^B = X^{A+B}$

$\frac{X^A}{X^B} = X^{A-B}$

$(X^A)^B = X^{AB}$

$X^N + X^N = 2X^N \neq X^{2N}$

$2^N + 2^N = 2^{N+1}$


Logarithms

• By default, logarithms used in this course are to the base 2.
• $X^A = B \iff \log_X B = A$
• $\log_A B = \frac{\log_C B}{\log_C A}$, with $A, B, C > 0$, $A \neq 1$
• $\log AB = \log A + \log B$; $A, B > 0$
• $\log (A/B) = \log A - \log B$
• $\log (A^B) = B \log A$
• $\log X < X$ for all $X > 0$
• $\log 1 = 0$, $\log 2 = 1$, $\log 1{,}024 = 10$, $\log 1{,}048{,}576 = 20$


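Summations

• $\sum_{i=1}^{N} a_i = a_1 + a_2 + \cdots + a_N$
• $\lim_{N \to \infty} \sum_{i=1}^{N} a_i = \sum_{i=1}^{\infty} a_i = a_1 + a_2 + \cdots$ (infinite sum)
• Linearity
  – $\sum_{i=1}^{N} (c\,a_i + d\,b_i) = c \sum_{i=1}^{N} a_i + d \sum_{i=1}^{N} b_i$
  – $\sum_{i=1}^{N} \Theta(f(i)) = \Theta\left(\sum_{i=1}^{N} f(i)\right)$
• General algebraic manipulations:
  – $\sum_{i=1}^{N} f(N) = N f(N)$
  – $\sum_{i=n_0}^{N} f(i) = \sum_{i=1}^{N} f(i) - \sum_{i=1}^{n_0 - 1} f(i)$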


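• Geometric series:
  – $\sum_{i=0}^{N} A^i = \frac{A^{N+1} - 1}{A - 1}$
  – if $0 < A < 1$ then $\sum_{i=0}^{N} A^i \le \frac{1}{1 - A}$
• Arithmetic series:
  – $\sum_{i=1}^{N} i = \frac{N(N+1)}{2} = \frac{N^2 + N}{2} \approx \frac{N^2}{2}$
  – $\sum_{i=1}^{N} i^2 = \frac{N(N+1)(2N+1)}{6} \approx \frac{N^3}{3}$
  – $\sum_{i=1}^{N} i^k \approx \frac{N^{k+1}}{|k+1|}$, $k \neq -1$
    ∗ if $k = -1$ then $H_N = \sum_{i=1}^{N} \frac{1}{i} \approx \log_e N$
    ∗ error in approximation: $\gamma \approx 0.57721566$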


Products

• $\prod_{i=1}^{N} a_i = a_1 \times a_2 \times \cdots \times a_N$
• $\log\left(\prod_{i=1}^{N} a_i\right) = \sum_{i=1}^{N} \log a_i$


Proving statements

• Proof by induction
  1. Proving a base case: establishing that the theorem is true for some small values.
  2. Inductive hypothesis: the theorem is assumed to be true for all cases up to some limit k.
  3. Given this assumption, show that the theorem is true for k + 1.
• Proof by counterexample: find an example showing that the theorem is not true.
• Proof by contradiction: assume that the theorem is false and show that this assumption implies that some known property is false; hence the original assumption was erroneous.


1.2 Introduction to algorithm analysis

The boss gives the following problem to Mr Dupont, a freshly hired BSc in computer science (to test him ... or maybe just for fun):

T(1) = 3

T(2) = 10

T(n) = 2T(n − 1) − T(n − 2) for n ≥ 3

What is T(100)?


Mr Dupont decides to directly code a recursive function tfn, in Java, to solve the problem:

    if (n == 1) return 3;
    else if (n == 2) return 10;
    else return 2 * tfn(n-1) - tfn(n-2);

1. First mistake: no analysis of the problem ⇒ risk of being fired!

2. Second mistake: bad choice of programming language ⇒ increased risk of being fired!


n = 1 → 3
n = 2 → 10
n = 3 → 17
n = 35 → it takes 4.19 seconds
n = 100 → he waits ... and then kills the program!

Mr Dupont then decides to use C:

    if (n == 1) return 3;
    if (n == 2) return 10;
    return 2 * tfn(n-1) - tfn(n-2);

n = 35 → it takes only 1.25 seconds
n = 50 → he waits and then kills the program!


Finally, Mr Dupont decides to analyze the problem (experimentally): he times both programs and plots the results.

[Figure: running time in seconds (logarithmic axis, 0.1 to 100) against N (25 to 40) for the Java and C programs.]

It seems that each time n increases by 1, the time increases by a factor of ≈ 1.62. At n = 40 the C program took 13.79 seconds, so for n = 100 he estimates:

$13.79 \times 1.62^{60}$ seconds ≈ 1,627,995 years!!!


• Mr Dupont remembers he has seen this kind of problem in one of the courses he has taken (a data structures course).

• After consulting his course notes, Mr Dupont decides to use Dynamic Programming:

    int t[n+1];
    t[1] = 3;
    t[2] = 10;
    for (i = 3; i <= n; i++)
        t[i] = 2 * t[i-1] - t[i-2];
    return t[n];


This solution has a much better time complexity, but at the expense of space complexity:

• n = 100 takes only a fraction of a second,

• but for n = 10,000,000 (a test that may make the boss happy ... if it succeeds) a segmentation fault occurs: too much memory is required.


• Mr Dupont analyses the problem again:

  – there is no reason to keep all the values, only the last 2:

    if (n == 1) return 3;
    last = 3;
    current = 10;
    for (i = 3; i <= n; i++) {
        temp = current;
        current = 2 * current - last;
        last = temp;
    }
    return current;

  – At n = 100,000,000 it takes 3.00 seconds

  – At n = 200,000,000 it takes 5.99 seconds

  – At n = 300,000,000 it takes 8.99 seconds


How to solve such problems?

1. Analyze the problem on paper in order to find an efficient algorithm in terms of time and memory space complexity.

  (a) First look at the problem:
      T(1) = 3
      T(2) = 10
      T(3) = 17
      T(4) = 24
      T(5) = 31
      Each step increases the result by 7.

  (b) Guess: T(n) = 7n − 4

  (c) Proof by induction

2. Code: return 7*n - 4
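The induction in step (c) can be written out as follows. This is a sketch of the standard argument, using only the recurrence and the two base cases stated earlier; the strong form of the hypothesis (all smaller indices) is a choice made here for convenience.

```latex
\begin{align*}
\text{Base cases: } & T(1) = 7 \cdot 1 - 4 = 3, \qquad T(2) = 7 \cdot 2 - 4 = 10.\\
\text{Hypothesis: } & T(k) = 7k - 4 \text{ for all } 1 \le k < n, \text{ where } n \ge 3.\\
\text{Step: } & T(n) = 2\,T(n-1) - T(n-2)\\
 &= 2\bigl(7(n-1) - 4\bigr) - \bigl(7(n-2) - 4\bigr)\\
 &= (14n - 22) - (7n - 18) = 7n - 4.
\end{align*}
```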


An algorithm analyst might ask :

1. What makes the first program so slow ?

2. How fast are the 3 programs asymptotically ?

3. Is the last version really the ultimate solution ?


Let us look at the recursion tree for the first program at n = 4.

[Figure: recursion tree for tfn(4). The root 4 has children 3 and 2; node 3 in turn has children 2 and 1.]

Here each circle represents one call to the routine tfn. So, for n = 4 there are 5 such calls.

In general, a call to tfn(n) requires a recursive call to tfn(n-1) (represented by the shaded region on the left) and a call to tfn(n-2) (shaded region on the right).


If we let f(n) represent the number of calls to compute T(n), then:

f(n) = f(n − 1) + f(n − 2) + 1

f(1) = f(2) = 1

This is a version of the famous Fibonacci recurrence.

It is known that f(n) grows roughly as $1.618^n$.

This agrees very well with the times we presented earlier, where each increase of n by 1 increases the time by a factor of a little under 1.62.

We say such growth is exponential, with asymptotic growth rate $O(1.618^n)$.

This answers question (1).
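The call-count recurrence can be checked numerically. The sketch below is illustrative only: the names `countCalls` and `growthRatio` are hypothetical, and the count is computed with a loop (not by actually running the slow recursion) under the assumption that n stays small enough for a 64-bit counter.

```cpp
#include <cstdint>

// Number of calls made by the naive recursive tfn(n), following
// f(n) = f(n-1) + f(n-2) + 1 with f(1) = f(2) = 1.
// Evaluated iteratively so that obtaining the count is itself cheap.
std::uint64_t countCalls(int n) {
    if (n <= 2) return 1;
    std::uint64_t prev = 1, cur = 1;   // f(1), f(2)
    for (int i = 3; i <= n; ++i) {
        std::uint64_t next = cur + prev + 1;
        prev = cur;
        cur = next;
    }
    return cur;
}

// Ratio f(n+1) / f(n); for moderately large n it settles near 1.618.
double growthRatio(int n) {
    return static_cast<double>(countCalls(n + 1)) /
           static_cast<double>(countCalls(n));
}
```

For n = 4 the function returns 5, matching the five circles in the recursion tree.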


In the second and third programs there was a loop

for (i=3;i<=n;i++)

This loop contained two or three assignments, a multiplication and a subtraction.

We say such a loop takes O(n) time.

This means that running time is proportional to n.

Recall that increasing n from 100 million to 300 million increased the time from

approximately 3 to approximately 9 seconds.

The last program has one multiplication and one subtraction and takes O(1) or

constant time.

This answers question (2).


• The answer to the last question is also NO. If the boss asked for

T (123456789879876543215566340014733134213)

we would get integer overflow on most computers.

• Switching to a floating point representation would be of no value

since we need to maintain all the significant digits in our results.

• The only alternative is to use a method to represent and

manipulate large integers.


• A natural way to represent a large integer is to use an array of

integers, where each array slot stores one digit.

• The addition and subtraction require a linear-time algorithm.

• A simple algorithm for multiplication requires a quadratic-time

cost.
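A minimal sketch of the digit-array idea, showing why addition is linear in the number of digits. The function name `addDigits` and the least-significant-digit-first layout are assumptions made for this illustration, not something the slides prescribe.

```cpp
#include <cstddef>
#include <vector>

// Digits are stored least-significant first: 123 is {3, 2, 1}.
// (This layout is an assumption made for the sketch.)
std::vector<int> addDigits(const std::vector<int>& a, const std::vector<int>& b) {
    std::vector<int> sum;
    int carry = 0;
    for (std::size_t i = 0; i < a.size() || i < b.size() || carry != 0; ++i) {
        int d = carry;
        if (i < a.size()) d += a[i];
        if (i < b.size()) d += b[i];
        sum.push_back(d % 10);   // digit kept in this slot
        carry = d / 10;          // carry passed to the next slot
    }
    return sum;                  // one pass over the digits: linear time
}
```

Schoolbook multiplication would combine every digit of one operand with every digit of the other, which is where the quadratic cost comes from.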


The third question, “is the last program the ultimate solution”, is

more of a computer science question.

A Computer Scientist might ask :

1. How do you justify counting function calls in the first case,

counting array assignments in the second case, counting

variable assignments in the third, and counting arithmetic

operations in the last ?

2. Is it really true that you can multiply two arbitrarily large numbers together in constant time?

3. Is the last program really the ultimate one ?


CS questions with an engineering orientation :

• What general techniques can we use to solve computational

problems ?

• What data structures are best, and in what situations ?

• Which models should we use to analyze algorithms in practice ?

• When trying to improve the efficiency of a given program, which

aspects should we focus on first ?


1.3 Asymptotic notation and Growth of functions

• The running time of an algorithm almost always depends on the amount of input: more input means more time. Thus the running time T is a function of the amount of input N, or T(N) = f(N), where N is in general a natural number.

• The exact value of the function depends on :

– the speed of the machine;

– the quality of the compiler and optimizer;

– the quality of the program that implements the algorithm;

– the basic fundamentals of the algorithm

• Typically, the last item is most important.


Worst-case versus Average-case

• Worst-case running time is a bound over all inputs of a certain

size N . (Guarantee)

• Average-case running time is an average over all inputs of a

certain size N . (Prediction)


Θ-notation

For a given function g(n), we denote by Θ(g(n)) the set of functions

$\Theta(g(n)) = \{\, f(n) : \exists\ \text{positive constants } c_1, c_2 \text{ and } n_0 \text{ such that } 0 \le c_1 g(n) \le f(n) \le c_2 g(n) \text{ for all } n \ge n_0 \,\}$

We say that g(n) is an asymptotically tight bound for f(n).

Example: the worst-case running time of insertion sort is $T(n) = \Theta(n^2)$.


Ω-notation

For a given function g(n), we denote by Ω(g(n)) the set of functions:

$\Omega(g(n)) = \{\, f(n) : \exists\ \text{positive constants } c \text{ and } n_0 \text{ such that } 0 \le c\,g(n) \le f(n) \text{ for all } n \ge n_0 \,\}$

We say that g(n) is an asymptotic lower bound for f(n).


Big-Oh notation

For a given function g(n), we denote by O(g(n)) the set of functions:

$O(g(n)) = \{\, f(n) : \exists\ \text{positive constants } c \text{ and } n_0 \text{ such that } 0 \le f(n) \le c\,g(n) \text{ for all } n \ge n_0 \,\}$

We say that g(n) is an asymptotic upper bound for f(n).

Note that O-notation is, in general, used informally to describe asymptotically tight bounds (Θ-notation).
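As a worked instance of these definitions, consider the function $f(n) = 3n^2 + 10n$. This function is a hypothetical example chosen for illustration, not one taken from the course; the constants below are one valid choice among many.

```latex
\begin{align*}
3n^2 + 10n &\le 4n^2 \iff 10n \le n^2 \iff n \ge 10
  && \Rightarrow\ f(n) = O(n^2) \text{ with } c = 4,\ n_0 = 10,\\
3n^2 + 10n &\ge 3n^2 \text{ for all } n \ge 1
  && \Rightarrow\ f(n) = \Omega(n^2) \text{ with } c = 3,\ n_0 = 1.
\end{align*}
```

Since both bounds hold, $f(n) = \Theta(n^2)$ with $c_1 = 3$, $c_2 = 4$ and $n_0 = 10$.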


• Exponential: dominant term is some constant times $2^N$.

• Cubic: dominant term is some constant times $N^3$. We say $O(N^3)$.

• Quadratic: dominant term is some constant times $N^2$. We say $O(N^2)$.

• $O(N \log N)$: dominant term is some constant times $N \log N$.

• Linear: dominant term is some constant times N. We say O(N).

• Logarithmic: dominant term is some constant times log N.

• Constant: c.

Note: Big-Oh ignores leading constants.


Dominant Term Matters

• Suppose we estimate $35N^2 + N + N^3$ by its dominant term $N^3$.

• For N = 10,000:

  – Actual value is 1,003,500,010,000

  – Estimate is 1,000,000,000,000

  – Error in estimate is 0.35%, which is negligible.

• For large N , dominant term is usually indicative of algorithm’s

behavior.

• For small N , dominant term is not necessarily indicative of

behavior, BUT, typically programs on small inputs run so fast we

don’t care anyway.


Example 1 : Computing the Minimum

• Minimum item in an array

– Given an array of N items, find the smallest.

• Obvious algorithm is sequential scan.

• Running time is O(N) (linear) because we repeat a fixed

amount of work for each element in the array.

• A linear algorithm is as good as we can hope for because we have to examine every element in the array, a process that requires linear time.
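The sequential scan described above might look like the following sketch. The name `indexOfMin` and the choice to return an index (instead of the value) are assumptions for illustration; the vector is assumed to be non-empty.

```cpp
#include <cstddef>
#include <vector>

// Returns the index of the smallest element; the vector must be non-empty.
std::size_t indexOfMin(const std::vector<int>& a) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < a.size(); ++i) {
        if (a[i] < a[best]) best = i;   // constant work per element
    }
    return best;
}
```

Each element is inspected exactly once, so the loop body runs N − 1 times.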


Example 2 : Closest Points

• Closest Points in the Plane

– Given N points in a plane (that is, an x-y coordinate system), find the pair of points that are closest together.

• Fundamental problem in graphics.

• Solution : Calculate the distance between each pair of points,

and retain the minimum distance.

• N(N − 1)/2 pairs of points, so the algorithm is quadratic.

• Better algorithms that use more subtle observations are known.
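The quadratic all-pairs solution can be sketched as below. The `Point` struct and function names are hypothetical, and the sketch returns the squared distance, an assumption made here to avoid a square root while preserving which pair is closest.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

struct Point { double x, y; };

// Squared distance avoids a square root and preserves the ordering.
double squaredDistance(const Point& p, const Point& q) {
    double dx = p.x - q.x, dy = p.y - q.y;
    return dx * dx + dy * dy;
}

// Examines all N(N-1)/2 pairs; returns infinity if fewer than two points.
double closestPairSquared(const std::vector<Point>& pts) {
    double best = std::numeric_limits<double>::infinity();
    for (std::size_t i = 0; i < pts.size(); ++i)
        for (std::size_t j = i + 1; j < pts.size(); ++j) {
            double d = squaredDistance(pts[i], pts[j]);
            if (d < best) best = d;   // retain the minimum so far
        }
    return best;
}
```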


Example 3 : Co-linear Points in the Plane

• Co-linear points in the plane

– Given N points in the plane, determine if any three form a

straight line.

• Important in graphics : co-linear points introduce nasty

degenerate cases that require special handling.

• Solution: enumerate all groups of three points; for each triplet, check whether the points are co-linear. This is a cubic algorithm.
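A sketch of the cubic enumeration follows. The cross-product test, the `IPoint` struct and the function names are choices made for this illustration (the slides do not specify how co-linearity is checked); integer coordinates are assumed so that the test is exact.

```cpp
#include <cstddef>
#include <vector>

struct IPoint { long long x, y; };   // integer coordinates keep the test exact

// Three points are co-linear exactly when the cross product of
// (b - a) and (c - a) is zero, i.e. the triangle abc has zero area.
bool coLinear(const IPoint& a, const IPoint& b, const IPoint& c) {
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x) == 0;
}

// Enumerates all triples i < j < k: a cubic number of checks.
bool anyThreeCoLinear(const std::vector<IPoint>& pts) {
    for (std::size_t i = 0; i < pts.size(); ++i)
        for (std::size_t j = i + 1; j < pts.size(); ++j)
            for (std::size_t k = j + 1; k < pts.size(); ++k)
                if (coLinear(pts[i], pts[j], pts[k])) return true;
    return false;
}
```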


1.4 Case Study

• Examine a problem with several different solutions.

– Will look at four algorithms

– Some algorithms much easier to code than others.

– Some algorithms much easier to prove correct than others.

– Some algorithms much, much faster (or slower) than others.


The problem

• Maximum Contiguous Subsequence Sum Problem

– Given (possibly negative) integers $A_1, A_2, \ldots, A_N$, find (and identify the sequence corresponding to) the maximum value of $A_i + A_{i+1} + \cdots + A_j$.

• The maximum contiguous subsequence sum is zero if all the

integers are negative.

• Examples :

– -2, 11, -4, 13, -4, 2

– 1, -3, 4, -2, -1, 6


Brute Force Algorithm

    int MaxSubSum1(const vector<int> & A)
    {
        int MaxSum = 0;
        for (int i = 0; i < A.size(); i++)
            for (int j = i; j < A.size(); j++)
            {
                // Sum A[i..j] from scratch for every pair (i, j).
                int ThisSum = 0;
                for (int k = i; k <= j; k++)
                    ThisSum += A[k];
                if (ThisSum > MaxSum)
                    MaxSum = ThisSum;
            }
        return MaxSum;
    }


Analysis

• A loop of size N inside a loop of size N inside a loop of size N means $O(N^3)$, or a cubic algorithm.

• The slight over-estimate that results from some loops being of size less than N is not important.


Actual Running time

• For N = 100, actual time is 0.47 seconds on a particular computer.

• Can use this to estimate time for larger inputs:

  $T(N) = cN^3$

  $T(10N) = c(10N)^3 = 1000cN^3 = 1000\,T(N)$

• An increase in input size by a factor of 10 means that the running time increases by a factor of 1,000.

• For N = 1,000, estimate a time of 470 seconds. (Actual was 449 seconds.)

• For N = 10,000, estimate 449,000 seconds (about 5 days).


How to improve

• Remove a loop; not always possible.

• Here it is : innermost loop is unnecessary because it throws

away information.

• ThisSum for next j is easily obtained from old value of

ThisSum :

– Need $A_i + A_{i+1} + \cdots + A_{j-1} + A_j$

– Just computed $A_i + A_{i+1} + \cdots + A_{j-1}$

– What we need is what we just computed $+ A_j$.


The Better Algorithm

    int MaxSubSum2(const vector<int> & A)
    {
        int MaxSum = 0;
        for (int i = 0; i < A.size(); i++)
        {
            int ThisSum = 0;
            for (int j = i; j < A.size(); j++)
            {
                // Extend the previous sum A[i..j-1] by one element.
                ThisSum += A[j];
                if (ThisSum > MaxSum)
                    MaxSum = ThisSum;
            }
        }
        return MaxSum;
    }


Analysis

• Same logic as before: now the running time is quadratic, or $O(N^2)$.

• As we will see, this algorithm is still usable for inputs in the tens

of thousands.

• Recall that the cubic algorithm was not practical for this amount

of input.


Actual running time

• For N = 100, actual time is 0.0111 seconds on the same particular computer.

• Can use this to estimate time for larger inputs:

  $T(N) = cN^2$

  $T(10N) = c(10N)^2 = 100cN^2 = 100\,T(N)$

• An increase in input size by a factor of 10 means that the running time increases by a factor of 100.

• For N = 1,000, estimate a running time of 1.11 seconds. (Actual was 1.12 seconds.)

• For N = 10,000, estimate 111 seconds (= actual).


Linear Algorithms

• Linear algorithm would be best.

• Running time is proportional to amount of input. Hard to do

better for an algorithm.

• If the input size increases by a factor of ten, then so does the running time.


Recursive algorithm

• Use a divide-and-conquer approach.

• The maximum subsequence either

– lies entirely in the first half

– lies entirely in the second half

– starts somewhere in the first half, goes to the last element in

the first half, continues at the first element in the second half,

ends somewhere in the second half.

• Compute all three possibilities, and use the maximum.

• First two possibilities easily computed recursively.


Computing the third case

• Idea :

1. Find the largest sum in the first half, that includes the last element in the first half.

2. Find the largest sum in the second half that includes the first element in the second

half.

3. Add the 2 sums together.

• Implementation :

– Easily done with two loops.

– For maximum sum that starts in the first half and extends to the last element in the first

half, use a right-to-left scan starting at the last element in the first half.

– For the other maximum sum, do a left-to-right scan, starting at the first element in the second half.
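Putting the three cases together, the divide-and-conquer algorithm could be sketched as follows. The slides do not show this code; the names `maxSumRec` and `maxSubSum3` and the handling of the empty input are assumptions made for this illustration.

```cpp
#include <algorithm>
#include <vector>

// Maximum contiguous subsequence sum of a[left..right] (inclusive).
int maxSumRec(const std::vector<int>& a, int left, int right) {
    if (left == right)                       // base case: one element
        return a[left] > 0 ? a[left] : 0;

    int center = left + (right - left) / 2;
    int maxLeft = maxSumRec(a, left, center);          // case 1: first half
    int maxRight = maxSumRec(a, center + 1, right);    // case 2: second half

    // Case 3, left part: best sum ending at the last element of the
    // first half, found with a right-to-left scan.
    int bestLeftBorder = 0, sum = 0;
    for (int i = center; i >= left; --i) {
        sum += a[i];
        bestLeftBorder = std::max(bestLeftBorder, sum);
    }
    // Case 3, right part: best sum starting at the first element of the
    // second half, found with a left-to-right scan.
    int bestRightBorder = 0;
    sum = 0;
    for (int i = center + 1; i <= right; ++i) {
        sum += a[i];
        bestRightBorder = std::max(bestRightBorder, sum);
    }
    return std::max({maxLeft, maxRight, bestLeftBorder + bestRightBorder});
}

// Driver; an empty input is treated as having maximum sum 0.
int maxSubSum3(const std::vector<int>& a) {
    return a.empty() ? 0 : maxSumRec(a, 0, static_cast<int>(a.size()) - 1);
}
```

On the two example sequences from the problem statement this sketch returns 20 and 7 respectively.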


Analysis

• Let T (N) = the time for an algorithm to solve a problem of

size N .

• Then T (1) = 1 (1 will be the quantum time unit; constants

don’t matter).

• T (N) = 2T (N/2) + N

– Two recursive calls, each of size N/2. The time to solve

each recursive call is T (N/2) by the above definition.

– Case three takes O(N) time; we use N , because we will

throw out the constants eventually.


Bottom Line

$T(1) = 1 = 1 \cdot 1$

$T(2) = 2\,T(1) + 2 = 4 = 2 \cdot 2 = 2^1 \cdot 2$

$T(4) = 2\,T(2) + 4 = 12 = 4 \cdot 3 = 2^2 \cdot 3$

$T(8) = 2\,T(4) + 8 = 32 = 8 \cdot 4 = 2^3 \cdot 4$

$T(16) = 2\,T(8) + 16 = 80 = 16 \cdot 5 = 2^4 \cdot 5$

$T(32) = 2\,T(16) + 32 = 192 = 32 \cdot 6 = 2^5 \cdot 6$

$T(64) = 2\,T(32) + 64 = 448 = 64 \cdot 7 = 2^6 \cdot 7$

$T(N) = 2^k (k + 1) = N(1 + \log N) = O(N \log N)$, where $N = 2^k$
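The pattern in the table can also be derived directly. The telescoping argument below is a standard technique, sketched here under the assumption that N is a power of two; it starts from dividing the recurrence $T(N) = 2T(N/2) + N$ by N.

```latex
\begin{align*}
\frac{T(N)}{N} &= \frac{T(N/2)}{N/2} + 1\\
\frac{T(N/2)}{N/2} &= \frac{T(N/4)}{N/4} + 1\\
&\ \ \vdots\\
\frac{T(2)}{2} &= \frac{T(1)}{1} + 1
\end{align*}
```

There are $\log N$ such equations. Adding them up cancels every intermediate term and leaves $\frac{T(N)}{N} = T(1) + \log N = 1 + \log N$, so $T(N) = N + N \log N = O(N \log N)$.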


N log N

• Any recursive algorithm that solves two half-sized problems and

does linear non-recursive work to combine/split these solutions

will always take O(N log N) time because the above analysis

will always hold.

• This is a very significant improvement over quadratic.

• It is still not as good as O(N), but is not that far away either.

There is a linear-time algorithm for this problem. The running

time is clear, but the correctness is non-trivial.


The Linear-time algorithm

    /** Linear-time maximum contiguous subsequence sum algorithm. */
    int maxSubSum4( const vector<int> & a )
    {
        int maxSum = 0, thisSum = 0;
        for( int j = 0; j < a.size( ); j++ )
        {
            thisSum += a[ j ];
            if( thisSum > maxSum )
                maxSum = thisSum;
            else if( thisSum < 0 )
                thisSum = 0;    // a negative prefix can never help; restart
        }
        return maxSum;
    }


The Logarithm

• Formal Definition

  – For any $B, N > 0$ with $B \neq 1$, $\log_B N = K$ if $B^K = N$

  – If the base B is omitted, it defaults to 2 in computer science.

• Examples:

  – log 32 = 5 (because $2^5 = 32$)

  – log 1024 = 10

  – log 1048576 = 20

  – log 1 billion = about 30

• The logarithm grows much more slowly than N, and more slowly than $\sqrt{N}$.


Examples of the Logarithm

Bits in a binary number : how many bits are required to represent

N consecutive integers ?

Repeated doubling : starting from X = 1, how many times

should X be doubled before it is at least as large as N ?

Repeated halving: starting from X = N, if X is repeatedly halved, how many iterations must be applied to make X smaller than or equal to 1? (Halving rounds up.)

Answer to all of the above is log N (rounded up).
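The doubling and halving counts can be checked with a short sketch. The function names are hypothetical, and the code assumes 1 ≤ n ≤ 2^63 so that the unsigned arithmetic cannot wrap around.

```cpp
// Number of doublings of X = 1 needed to reach at least n.
// Assumes 1 <= n <= 2^63 so that x *= 2 cannot overflow.
int doublingsToReach(unsigned long long n) {
    int steps = 0;
    for (unsigned long long x = 1; x < n; x *= 2) ++steps;
    return steps;
}

// Number of halvings (rounding up) needed to bring n down to 1.
int halvingsToOne(unsigned long long n) {
    int steps = 0;
    while (n > 1) {
        n = (n + 1) / 2;   // halve, rounding up
        ++steps;
    }
    return steps;
}
```

For n = 1000 both functions return 10, which is log 1000 ≈ 9.97 rounded up.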


Why log N

• B bits represent $2^B$ integers. Thus $2^B$ is at least as big as N, so B is at least log N. Since B must be an integer, round up if needed.

• Same logic for the other examples.


Repeated Halving Principle

• An algorithm is O(log N) if it takes constant time to reduce the

problem size by a constant fraction (which is usually 1/2).

• Reason : there will be log N iterations of constant work.


Static Searching

• Given an integer X and an array A, return the position of X in

A or an indicator that it is not present. If X occurs more than

once, return any occurrence. The array A is not altered.

• If the input array is not sorted, the solution is to use a sequential

search. Running times :

– Unsuccessful search : O(N); every item is examined.

– Successful search :

∗ Worst case : O(N); every item is examined.

∗ Average case : O(N); on average N/2 items are examined, and the constant factor 1/2 is dropped in Big-Oh.

• Can we do better if we know the array is sorted ?
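A minimal sketch of the sequential search described above. The name `sequentialSearch` and the sentinel `NOT_FOUND` are assumptions made for illustration; the slides only ask for "an indicator that it is not present".

```cpp
#include <cstddef>
#include <vector>

// Hypothetical indicator returned when x is not in the array.
const int NOT_FOUND = -1;

// Examines the items one by one and returns the index of the first match.
// An unsuccessful search examines all N items, hence O(N).
int sequentialSearch(const std::vector<int>& a, int x) {
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a[i] == x) return static_cast<int>(i);
    }
    return NOT_FOUND;
}
```

The array is taken by const reference, which matches the requirement that A is not altered.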


Binary Search

• Yes ! If the array is sorted, use a binary search.

• Look in the middle :

Case 1: If X is less than the item in the middle, then look in the

subarray to the left of the middle.

Case 2: If X is greater than the item in the middle, then look in

the subarray to the right of the middle.

Case 3 : If X is equal to the item in the middle, then we have a

match.

Base Case : If the subarray is empty, X is not found.

• This is logarithmic by the repeated halving principle.
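The three cases and the base case can be sketched as an iterative function. This is one possible formulation, not necessarily the one used in the course; the names `binarySearch` and `NOT_FOUND` are assumptions. The `assert` is a debug-only check of the sortedness precondition; it costs O(N) and disappears when `NDEBUG` is defined.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical indicator returned when x is not in the array.
const int NOT_FOUND = -1;

// Precondition: a is sorted in ascending order.
int binarySearch(const std::vector<int>& a, int x) {
    assert(std::is_sorted(a.begin(), a.end()));
    int low = 0;
    int high = static_cast<int>(a.size()) - 1;
    while (low <= high) {                  // Base case: empty subarray ends the loop.
        int mid = low + (high - low) / 2;  // Avoids overflow of low + high.
        if (x < a[mid]) {
            high = mid - 1;                // Case 1: look to the left of the middle.
        } else if (x > a[mid]) {
            low = mid + 1;                 // Case 2: look to the right of the middle.
        } else {
            return mid;                    // Case 3: match.
        }
    }
    return NOT_FOUND;
}
```

Each iteration does constant work and discards at least half of the remaining subarray, which is exactly the situation covered by the repeated halving principle.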


Binary Search Continued

• A sorted array searched with binary search is an example of a data structure implementation :

– Insert : O(N) time per operation, because we must insert

and maintain the array in sorted order.

– Delete : O(N) time per operation, because we must slide

elements that are to the right of the deleted element over

one spot to maintain contiguity.

– Find : O(log N ) time per operation, via binary search.

• In this course we examine different data structures. Generally

we allow Insert, Delete, and Find, but Find and Delete are

usually restricted.
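The cost of each operation can be seen in a small sketch of a sorted array built on `std::vector`. The class name `SortedArray` and its member names are hypothetical. The standard algorithms `std::lower_bound` and `std::binary_search` perform the O(log N) search, while `insert` and `erase` on the vector shift elements and therefore take O(N).

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

class SortedArray {
public:
    // Insert: O(log N) to locate the slot, O(N) to shift later elements right.
    void insert(int x) {
        auto pos = std::lower_bound(items.begin(), items.end(), x);
        items.insert(pos, x);
    }

    // Delete: O(N), because elements to the right slide over one spot.
    // Returns false if x is not present.
    bool remove(int x) {
        auto pos = std::lower_bound(items.begin(), items.end(), x);
        if (pos == items.end() || *pos != x) return false;
        items.erase(pos);
        return true;
    }

    // Find: O(log N) via binary search.
    bool contains(int x) const {
        return std::binary_search(items.begin(), items.end(), x);
    }

    std::size_t size() const { return items.size(); }

private:
    std::vector<int> items;  // Always kept in ascending order.
};
```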


1.5 Data Structures and Algorithm Analysis


[Figure: examples of how data can be organized: a scalar item, a sequential vector, a linked list, a hierarchical tree, and an n-dimensional space.]


• The most important property to express of any entity in a system is its type.

• In this course we use entities that are structured objects (e.g., an object that is a collection

of other objects).

• When determining its type, the kinds of distinguishing properties include :

Ordering : are elements ordered or unordered ? If ordering matters, is the order partial or

total ? Are elements removed FIFO (queues), LIFO (stacks), or by priority (priority

queues) ?

Duplicates : are duplicates allowed ?

Boundedness : is the object bounded in size or unbounded ? Can the bound change, or is it

fixed at creation time ?

Associative access : are elements retrieved by an index or key ? Is the type of the index

built-in (e.g. as for sequences and arrays) or user-definable (e.g. as for symbol tables

and hash tables) ?

Shape : is the structure of the object linear, hierarchical, acyclic, n-dimensional, or

arbitrarily complex (e.g. graphs, forests) ?
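As an illustration of the ordering property, the sketch below inserts the same items into the standard library adapters `std::queue`, `std::stack`, and `std::priority_queue` and records the order in which they come out. The function names are hypothetical, the code assumes C++11, and the priority queue uses the default ordering (largest value first).

```cpp
#include <queue>
#include <stack>
#include <vector>

// FIFO (queue): items leave in the order they arrived.
std::vector<int> fifoOrder(const std::vector<int>& in) {
    std::queue<int> q;
    for (int x : in) q.push(x);
    std::vector<int> out;
    while (!q.empty()) { out.push_back(q.front()); q.pop(); }
    return out;
}

// LIFO (stack): the most recently inserted item leaves first.
std::vector<int> lifoOrder(const std::vector<int>& in) {
    std::stack<int> s;
    for (int x : in) s.push(x);
    std::vector<int> out;
    while (!s.empty()) { out.push_back(s.top()); s.pop(); }
    return out;
}

// By priority (priority queue): the largest item leaves first.
std::vector<int> priorityOrder(const std::vector<int>& in) {
    std::priority_queue<int> pq;
    for (int x : in) pq.push(x);
    std::vector<int> out;
    while (!pq.empty()) { out.push_back(pq.top()); pq.pop(); }
    return out;
}
```

Inserting 34, 12, 52 in that order yields 34, 12, 52 from the queue, 52, 12, 34 from the stack, and 52, 34, 12 from the priority queue.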


Abstract Data Type (ADT)

• Set of data together with a set of operations.

• Definition of an ADT : [what to do ?]

– Definition of data and the set of operations (functions).

• Implementation of an ADT : [how to do it ?]

– How the objects and operations are implemented.

⇒ use the C++ class.
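One way to picture the split between definition and implementation is an abstract class for the "what" and a derived class for the "how". The names `IntSet` and `VectorIntSet` are hypothetical, the code assumes C++11, and the unsorted-vector implementation is only one of many possible choices.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Definition of the ADT: which operations a set of integers supports.
class IntSet {
public:
    virtual ~IntSet() = default;
    virtual void insert(int x) = 0;
    virtual bool contains(int x) const = 0;
    virtual std::size_t size() const = 0;
};

// One implementation of the ADT: an unsorted vector with O(N) operations.
class VectorIntSet : public IntSet {
public:
    void insert(int x) override {
        if (!contains(x)) items.push_back(x);  // A set stores no duplicates.
    }
    bool contains(int x) const override {
        return std::find(items.begin(), items.end(), x) != items.end();
    }
    std::size_t size() const override { return items.size(); }

private:
    std::vector<int> items;
};
```

Code written against `IntSet` keeps working if the vector is later replaced by a faster structure, because only the implementation changes.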


Array Implementation of Lists

• Contiguous allocation of memory to store the elements of the

list.

• Estimation (overestimation) of the maximum size of the list is

required

⇒ waste of memory space.

• O(N) for find, constant time for findKth

• But O(N) is required for insertion and deletion in the worst

case.

⇒ building a list by N successive inserts would require O(N^2) time in the worst case.
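The costs listed above can be read off a minimal array-based list. This is a sketch under stated assumptions: the class name `ArrayList`, the fixed bound `CAPACITY`, and the 0-based positions are choices made here, not taken from the slides. The code assumes C++11.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical overestimate of the maximum list size.
const std::size_t CAPACITY = 100;

class ArrayList {
public:
    std::size_t size() const { return count; }

    // findKth: constant time, a single array access (k is 0-based).
    int findKth(std::size_t k) const {
        assert(k < count);
        return items[k];
    }

    // find: O(N) scan; returns the position of x, or -1 if absent.
    int find(int x) const {
        for (std::size_t i = 0; i < count; ++i)
            if (items[i] == x) return static_cast<int>(i);
        return -1;
    }

    // insert: O(N), elements from position k onward shift right by one.
    void insert(std::size_t k, int x) {
        assert(k <= count && count < CAPACITY);
        for (std::size_t i = count; i > k; --i) items[i] = items[i - 1];
        items[k] = x;
        ++count;
    }

    // removeKth: O(N), elements after position k shift left by one.
    void removeKth(std::size_t k) {
        assert(k < count);
        for (std::size_t i = k; i + 1 < count; ++i) items[i] = items[i + 1];
        --count;
    }

private:
    int items[CAPACITY];    // Contiguous storage, allocated up front.
    std::size_t count = 0;  // Number of slots currently in use.
};
```

Inserting at position 0 is the worst case: every existing element moves, which is why N successive front inserts cost O(N^2) in total.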


[Figure 1 shows the list 34, 12, 52, 16, 22 stored in positions 1 to 5 of an array, with the example calls findKth(3) = 52, find(52) = 3, remove(34), and insert(1, 34). Costs shown: find(X): O(n); remove(X): O(n); removeKth: O(n); insert(kth, X): O(n); findKth = List[Kth]: O(C), i.e. constant time.]

Figure 1: Contiguous allocation of memory to store the elements of the list


Linked Lists

• Noncontiguous allocation of memory : each element is stored in its own node, together with a link to the next node.

• O(N) for find

• O(N) for findKth (but better time in practice if the calls to

findKth are in sorted order by the argument).

• Constant time for insertion and deletion, once the position is known (no elements are shifted).
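A minimal singly linked list makes these costs concrete. The names `Node`, `LinkedList`, `insertFront`, `insertAfter`, and `removeFront` are assumptions for illustration, positions are 0-based, and the code assumes C++11.

```cpp
#include <cassert>
#include <cstddef>

struct Node {
    int data;
    Node* next;  // Link to the next node, or nullptr at the end of the list.
};

class LinkedList {
public:
    LinkedList() : head(nullptr) {}
    ~LinkedList() { while (head != nullptr) removeFront(); }
    LinkedList(const LinkedList&) = delete;             // Copying is left out
    LinkedList& operator=(const LinkedList&) = delete;  // to keep the sketch short.

    // Constant time: only the head pointer changes.
    void insertFront(int x) { head = new Node{x, head}; }

    // Constant time once the position p is known: two links change.
    void insertAfter(Node* p, int x) {
        assert(p != nullptr);
        p->next = new Node{x, p->next};
    }

    // Constant time: unlink and free the first node.
    void removeFront() {
        assert(head != nullptr);
        Node* old = head;
        head = head->next;
        delete old;
    }

    // O(N): follows links until x is found; returns nullptr if absent.
    Node* find(int x) const {
        for (Node* p = head; p != nullptr; p = p->next)
            if (p->data == x) return p;
        return nullptr;
    }

    // O(k): follows k links from the head; returns nullptr if k is out of range.
    Node* findKth(std::size_t k) const {
        Node* p = head;
        while (p != nullptr && k > 0) { p = p->next; --k; }
        return p;
    }

private:
    Node* head;
};
```

Unlike the array version, `findKth` must walk the links, but insertion and removal never move the other elements.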


[Figure 2 shows the list 34, 12, 52, 16, 22 stored as nodes with a Data field and a Link field, scattered over memory cells and reached from a Head pointer. Costs shown: printList(): O(n); find(x): O(n); findKth(i): O(i).]

Figure 2: Non contiguous allocation of memory
