Vishnu Kotrajaras, PhD.

Data Structures


Introduction

Why study data structures?

We can understand more code.

We can choose a suitable data structure for each task.


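Example: storing 5 numbers

[Figure: a linked list holding 1, 2, 3, 4, 5, reached through pointer P]

[Figure: a tree (binary search tree) reached through pointer P, with root 2, children 1 and 4, and 3 and 5 as the children of 4]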


Choosing how to store

[Figure: a heap with 5 at the root, 4 and 3 below it, and 2 and 1 on the bottom level]

If we always want to retrieve the maximum value, a heap is the best choice.
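As an illustration of retrieving the maximum from a heap, here is a minimal sketch that uses Java's standard java.util.PriorityQueue. The class name MaxHeapDemo and the method largest are made up for this example, and the list is assumed to be non-empty.

```java
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

class MaxHeapDemo {
    // Puts the values into a max-heap and returns the largest one.
    // Assumption: values is not empty.
    static int largest(List<Integer> values) {
        // PriorityQueue is a min-heap by default; reverseOrder() makes it a max-heap.
        PriorityQueue<Integer> heap = new PriorityQueue<>(Collections.reverseOrder());
        heap.addAll(values);
        return heap.peek();   // the maximum sits at the root, so this is O(1)
    }

    public static void main(String[] args) {
        System.out.println(largest(List.of(1, 2, 3, 4, 5)));   // prints 5
    }
}
```

With a linked list or an unsorted array, finding the maximum would need a scan over all stored numbers instead.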


Estimating the program speed

Big O

$T(N) = O(f(N))$ if $T(N) \le c \cdot f(N)$, where $c$ and $N_0$ are positive constants and $N \ge N_0$.

This tells us how the program's running time grows.


Big O example

If $T(N) = 339N$ and $f(N) = N \cdot N$:

Let $N_0 = 339$ and $c = 1$.

Then $T(N_0) \le c \cdot f(N_0)$, because $339 N_0 \le 1 \cdot (N_0 \cdot N_0)$, and the inequality keeps holding for every $N \ge N_0$.

Therefore $T(N) = O(f(N)) = O(N^2)$.

There are other possible answers. If we let $f(N) = 340N$, we have $T(N) \le 1 \cdot (340N) = c \cdot f(N)$ with $c = 1$. This also fits the definition.

Therefore $T(N) = O(340N)$ is also correct.


Big O example (cont.)

Therefore T(N) = O(N) is also correct. Which one should we use as the answer?

Normally, we choose the smallest one. Therefore O(N) is our answer.

How does this connect to program speed? Please read on.


Find the speed of the following code

int sigmaOfSquare(int n) {        // calculates 1*1 + 2*2 + ... + n*n
    int tempSum;                  // 1 unit (declaration only)
    tempSum = 0;                  // 1 unit (assignment)
    for (int i = 1; i <= n; i++)  // 1 unit (i = 1), n+1 units (i <= n), n units (i++)
        tempSum += i * i;         // 3n units
    return tempSum;               // 1 unit (return)
}

The loop body does a multiplication, an addition, and an assignment, each n times. Therefore it costs 3n units. The total time is 5n+5 units.


But counting in this much detail is impractical.

It is better to use an approximate time, that is, Big O.

In the example, the time can be estimated from the loop alone (the other running times become insignificant as n grows).

The loop runs n times. Therefore, Big O = O(n).

The detailed time is 5n+5, which matches O(n): 5n+5 <= 6n for every n >= 5.
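The unit count can be checked by instrumenting the loop. The following is only a sketch of the cost model used on the slide (one unit per declaration, assignment, test, increment, arithmetic operation, and return); the class and method names are illustrative.

```java
class StepCounter {
    // Runs the sigmaOfSquare loop and returns how many units the slide's
    // cost model charges for it.
    static long unitsForSigmaOfSquare(int n) {
        long units = 0;
        units += 1;                 // declaration of tempSum
        units += 1;                 // tempSum = 0
        int tempSum = 0;
        units += 1;                 // loop initialisation i = 1
        for (int i = 1; ; i++) {
            units += 1;             // loop test i <= n (executed n + 1 times)
            if (i > n) break;
            tempSum += i * i;
            units += 3;             // multiply, add, assign
            units += 1;             // increment i++
        }
        units += 1;                 // return
        return units;
    }

    public static void main(String[] args) {
        System.out.println(unitsForSigmaOfSquare(10));   // prints 55, which is 5*10 + 5
    }
}
```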


Finding Big O from various loops

For loop: its Big O is the number of repetitions (multiplied by the cost of the loop body).

Nested loop

for (i = 1; i <= n; i++)        // n times
    for (j = 1; j <= n; j++)    // n times
        statements;

Big O is O(n^2).


Finding Big O from various loops (cont.)

Here is the Big O rule for nested loops: if T1(N) = O(f(N)) and T2(N) = O(g(N)), then

T1(N) * T2(N) = O(f(N) * g(N))

From the last page, f(n) = g(n) = n. Therefore they multiply to O(n^2).


Finding Big O from various loops (cont. 2)

Consecutive statements

for (i = 0; i <= n; i++)            // O(n)
    statement1;

for (j = 0; j <= n; j++)            // O(n^2)
    for (k = 0; k <= n; k++)
        statement2;

The answer is their max: O(n^2).



Big O rule for consecutive statements: if T1(N) = O(f(N)) and T2(N) = O(g(N)), then

T1(N) + T2(N) = max(O(f(N)), O(g(N)))

From the last page, f(n) = n and g(n) = n^2.

The answer is therefore O(n^2).
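To see why the larger term dominates, we can count how often each statement runs. This is a small sketch with illustrative names; statement1 and statement2 are replaced by counters.

```java
class LoopCounter {
    // Returns {times statement1 runs, times statement2 runs} for the
    // consecutive loops shown on the slide.
    static long[] count(int n) {
        long first = 0, second = 0;
        for (int i = 0; i <= n; i++)
            first++;                          // single loop: n + 1 times
        for (int j = 0; j <= n; j++)
            for (int k = 0; k <= n; k++)
                second++;                     // nested loop: (n + 1) * (n + 1) times
        return new long[] {first, second};
    }

    public static void main(String[] args) {
        long[] c = count(100);
        System.out.println(c[0] + " " + c[1]);   // prints 101 10201
    }
}
```

For n = 100 the nested loop already accounts for about 99% of the work, which is why the sum is reported as O(n^2).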



Conditional statement

if (condition)
    Statement1      // O(f(n))
else
    Statement2      // O(g(n))

Use the max: max(O(f(n)), O(g(n)))


Finding Big O from recursion

int mymethod(int n) {
    if (n == 1) {
        return 1;
    } else {
        return 2 * mymethod(n - 1) + 1;
    }
}

The method is called n times, so Big O = O(n).
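The number of calls can be confirmed with a counter. This is a sketch only: the static field calls and the class name are additions for illustration, and n is assumed to be at least 1.

```java
class RecursionCounter {
    static int calls = 0;   // illustrative counter, not part of the original method

    static int mymethod(int n) {
        calls++;                            // one more invocation
        if (n == 1) {
            return 1;
        } else {
            return 2 * mymethod(n - 1) + 1;
        }
    }

    public static void main(String[] args) {
        int result = mymethod(5);
        System.out.println(result + " after " + calls + " calls");   // prints 31 after 5 calls
    }
}
```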


Maximum Subsequence Sum, choosing the best Big O

The Maximum Subsequence Sum problem:

For integers $A_1, A_2, \ldots, A_n$, the maximum subsequence sum is the largest value of $\sum_{k=i}^{j} A_k$ over all $1 \le i \le j \le n$. The subsequence must be consecutive.

Example: -2, 11, -6, 16, -5, 7

The sum of 11, -6, 16 is 21. But the max subsequence is 11, -6, 16, -5, 7, whose sum is 23.

23 is the maximum subsequence sum.


Solving max sub sum: 1st method

int maxSubSum01(int[] a) {
    int maxSum = 0;
    for (int i = 0; i < a.length; i++) {          // first index
        for (int j = i; j < a.length; j++) {      // last index
            int theSum = 0;
            for (int k = i; k <= j; k++) {        // sum from first to last
                theSum += a[k];
            }
            if (theSum > maxSum) {                // store the max value
                maxSum = theSum;
            }
        }
    }
    return maxSum;
}


Solving max sub sum: 1st method (cont.)

This first method has Big O = O(n^3).

That is not good enough: it does too many redundant calculations. If we have already added the elements from index 0 to 2, then adding the elements from index 0 to 3 should not start from scratch.


Solving max sub sum: 2nd method

int maxSubSum02(int[] a) {
    int maxSum = 0;
    for (int i = 0; i < a.length; i++) {          // starting position
        int theSum = 0;
        for (int j = i; j < a.length; j++) {
            theSum += a[j];                       // extend the running sum by one element
            if (theSum > maxSum) {
                maxSum = theSum;
            }
        }
    }
    return maxSum;
}

Do the addition from the starting position and keep the best result. Big O = O(n^2).


Solving max sub sum: 2nd method (cont.)

Array: -2 11 -6 4

when i=0, j=0: theSum = -2; maxSum = 0.

when i=0, j=1: theSum = -2 + 11 = 9; maxSum becomes 9.

when i=0, j=2: theSum = 9 + (-6) = 3; maxSum is still 9.

when i=0, j=3: theSum = 3 + 4 = 7; maxSum is still 9.


Solving max sub sum: 3rd method

Use divide and conquer. The result sequence may be in:

the left half of the array, or

the right half, or

across both halves (the sequence then contains the last element of the left half and the first element of the right half).


Solving max sub sum: 3rd method (cont.)

1 -2 7 -6 | 2 8 -5 4

Max sub sum on the left side is 7.

Max sub sum on the right side is 10.

The best sum on the left that ends with the last left element (-6) is 1. The best sum on the right that starts with the first right element (2) is 10.

The max sub sum that spans the left side and the right side is therefore 1 + 10 = 11 (this is the final answer).


Solving max sub sum: 3rd method (cont. 2)

int maxSumDivideConquer(int[] array, int leftindex, int rightindex) {   // T(n)
    // assume that the array can be divided evenly.
    if (leftindex == rightindex) {                 // base case
        if (array[leftindex] > 0)
            return array[leftindex];
        else
            return 0;                              // min value of maxSubSum
    }
    int centerindex = (leftindex + rightindex) / 2;
    int maxsumleft = maxSumDivideConquer(array, leftindex, centerindex);          // T(n/2)
    int maxsumright = maxSumDivideConquer(array, centerindex + 1, rightindex);    // T(n/2)


    int maxlefthalfSum = 0, lefthalfSum = 0;
    // max sum from the last element of the left side back to the first element: O(n/2)
    for (int i = centerindex; i >= leftindex; i--) {
        lefthalfSum = lefthalfSum + array[i];
        if (lefthalfSum > maxlefthalfSum) {
            maxlefthalfSum = lefthalfSum;
        }
    }


    int maxrighthalfSum = 0, righthalfSum = 0;
    // max sum from the first element of the right side to the last element: O(n/2)
    for (int i = centerindex + 1; i <= rightindex; i++) {
        righthalfSum = righthalfSum + array[i];
        if (righthalfSum > maxrighthalfSum) {
            maxrighthalfSum = righthalfSum;
        }
    }


    // finally, find the max of the three. This part takes constant time, so we can ignore it.
    return max3(maxsumleft, maxsumright, maxlefthalfSum + maxrighthalfSum);
}

Therefore the total time is T(n) = 2T(n/2) + 2O(n/2).
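The pieces of the 3rd method can be assembled into one runnable class. This is a sketch rather than the exact code from the slides: the class name, the one-argument wrapper maxSum(int[]), and the use of Math.max in place of the max3 helper are assumptions made for illustration.

```java
class MaxSubSumDivideConquer {
    // Maximum subsequence sum of a[left..right], using divide and conquer.
    static int maxSum(int[] a, int left, int right) {
        if (left == right) {                       // base case: one element
            return Math.max(a[left], 0);           // an empty subsequence counts as 0
        }
        int center = (left + right) / 2;
        int maxLeft = maxSum(a, left, center);            // best sum inside the left half
        int maxRight = maxSum(a, center + 1, right);      // best sum inside the right half

        // Best sum that ends at the last element of the left half.
        int bestLeftBorder = 0, sum = 0;
        for (int i = center; i >= left; i--) {
            sum += a[i];
            bestLeftBorder = Math.max(bestLeftBorder, sum);
        }
        // Best sum that starts at the first element of the right half.
        int bestRightBorder = 0;
        sum = 0;
        for (int i = center + 1; i <= right; i++) {
            sum += a[i];
            bestRightBorder = Math.max(bestRightBorder, sum);
        }
        // The answer is the largest of the three candidates.
        return Math.max(Math.max(maxLeft, maxRight), bestLeftBorder + bestRightBorder);
    }

    // Convenience wrapper (hypothetical): handles the whole array.
    static int maxSum(int[] a) {
        return a.length == 0 ? 0 : maxSum(a, 0, a.length - 1);
    }

    public static void main(String[] args) {
        System.out.println(maxSum(new int[] {1, -2, 7, -6, 2, 8, -5, 4}));   // prints 11
    }
}
```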


We find the total Big O:

$T(n) = 2T(n/2) + 2\,O(n/2) = 2T(n/2) + O(n) = 2T(n/2) + cn$

(By the definition, $O(n) \le c \cdot n$.)

Divide everything by $n$, and we get:

$\frac{T(n)}{n} = \frac{T(n/2)}{n/2} + c \quad (1)$


We can create a series of equations:

$\frac{T(n/2)}{n/2} = \frac{T(n/4)}{n/4} + c \quad (2)$

$\frac{T(n/4)}{n/4} = \frac{T(n/8)}{n/8} + c \quad (3)$

$\ldots$

$\frac{T(2)}{2} = \frac{T(1)}{1} + c \quad (X)$

Add (1) + (2) + (3) + ... + (X). The terms that appear on both the left-hand and right-hand sides cancel out, and c is added $\log_2 n$ times, so we get:

$\frac{T(n)}{n} = \frac{T(1)}{1} + c \cdot \log_2 n$

Multiply both sides by $n$, and we get:

$T(n) = n \cdot T(1) + c \cdot n \cdot \log_2 n$

Because T(1) is constant, we can conclude that Big O = O(n log n).
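The closed form can be checked numerically against the recurrence. The sketch below assumes T(1) = 1 and c = 1, and only considers n that is a power of two (matching the "divided evenly" assumption); the class and method names are illustrative.

```java
class RecurrenceCheck {
    // The recurrence T(n) = 2 T(n/2) + c n with T(1) = 1 and c = 1.
    static long t(long n) {
        return n == 1 ? 1 : 2 * t(n / 2) + n;
    }

    // The closed form n * T(1) + c * n * log2(n) with the same T(1) and c.
    static long closedForm(long n) {
        long log2 = 63 - Long.numberOfLeadingZeros(n);   // exact log2 for powers of two
        return n + n * log2;
    }

    public static void main(String[] args) {
        for (long n = 1; n <= 1024; n *= 2) {
            System.out.println(n + ": " + t(n) + " = " + closedForm(n));
        }
    }
}
```

For example, T(8) = 2 * T(4) + 8 = 2 * 12 + 8 = 32, and the closed form gives 8 + 8 * 3 = 32.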


Solving max sub sum: 4th method

We improve on the 2nd method, with two points to note:

First, the first element of a maximum subsequence cannot be negative. For example: 3, -5, 1, 4, 7, -4

-5 cannot be the first element of our result, because it only makes the total smaller. The same subsequence without it always gives a larger sum.


Solving max sub sum: 4th method (cont.)

Second, a subsequence whose sum is negative cannot be the beginning of the maximum subsequence.

Suppose we are in the middle of the loop. Let i be the index of the first element of a subsequence and j be the index of its last element.

Let a[j] be the element that first makes the sum a[i]+…+a[j] negative.

Let p be any index between i+1 and j.

3 4 1 -3 -9 1 5

[Figure: i marks the first element (3), j marks -9, and p marks an element between them]


Solving max sub sum: 4th method (cont. 2)

The next step of this loop would be to increment j by one.

•If the new a[j] is negative, we will not get a better max sub sum. The max sub sum value will not change.

•If the new a[j] is positive, a[i]+…+a[j] will be greater than a[i]+…+a[j-1]. However, because a[i]+…+a[j-1] is negative, the new sum is smaller than a[j] alone, so a subsequence that starts at j always does better.

•Therefore, if we have a negative subsequence, we should not keep extending it by moving j. We should move i instead.


Should we increment i by only one, or by more?

From our assumption, a[j] is the first element that makes a[i]+…+a[j] negative, so every shorter sum a[i]+…+a[p-1] is non-negative. Starting at p instead of i therefore removes a non-negative amount, so a[p]+…+a[j] is no larger than a[i]+…+a[j] and is also negative (p is any index between i+1 and j).

If we want to get a larger max sub sum, we must start our subsequence from position j+1. Therefore i should be incremented to j+1.

3 4 1 -3 -9 1 5

[Figure: the same array, with i on the first element, j on -9, and p between them]


int maxsubsumOptimum(int[] array) {
    int maxSum = 0, theSum = 0;
    for (int j = 0; j < array.length; j++) {
        theSum = theSum + array[j];
        if (theSum > maxSum) {
            maxSum = theSum;
        } else if (theSum < 0) {   // if array[j] makes the sequence negative,
            theSum = 0;            // start again from position j+1.
        }
    }
    return maxSum;
}

The array is scanned only once, so Big O = O(n).
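One way to gain confidence in the single-pass method is to compare it with the straightforward 1st method on the same input. The sketch below restates both methods under illustrative names (cubic and linear) and runs them on the example array used earlier.

```java
class MaxSubSumCompare {
    // 1st method: try every (i, j) pair and add from scratch, O(n^3).
    static int cubic(int[] a) {
        int maxSum = 0;
        for (int i = 0; i < a.length; i++) {
            for (int j = i; j < a.length; j++) {
                int theSum = 0;
                for (int k = i; k <= j; k++) {
                    theSum += a[k];
                }
                maxSum = Math.max(maxSum, theSum);
            }
        }
        return maxSum;
    }

    // 4th method: one pass, restarting whenever the running sum turns negative, O(n).
    static int linear(int[] a) {
        int maxSum = 0, theSum = 0;
        for (int x : a) {
            theSum += x;
            if (theSum > maxSum) {
                maxSum = theSum;
            } else if (theSum < 0) {
                theSum = 0;
            }
        }
        return maxSum;
    }

    public static void main(String[] args) {
        int[] example = {-2, 11, -6, 16, -5, 7};
        System.out.println(cubic(example) + " " + linear(example));   // prints 23 23
    }
}
```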



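Logarithm in big O

If an algorithm spends constant time (O(1)) to cut a problem down to a fraction of its size (usually half), it has Big O = O(log n). The 3rd method of the maximum subsequence sum problem halves the problem in the same way, but it solves both halves and does O(n) extra work at each level, which is why it is O(n log n) rather than O(log n).

Usually, we assume that all the data is already in the system. Otherwise, reading the data in takes O(n) by itself.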


Example: O(log n)

Finding 5 in a sorted array.

If we start from the first array member and scan, it takes O(n) to find a number.

But we know that the array is sorted, so we can look at the middle of the array and continue searching in either the left or the right half, depending on the value of that middle element.

We keep searching by looking at the middle element of the subarray that remains, and so on.

This is called binary search.


int binarySearch(int[] a, int x) {
    int left = 0, right = a.length - 1;
    while (left <= right) {
        int mid = (left + right) / 2;
        if (a[mid] < x) {
            left = mid + 1;
        } else if (a[mid] > x) {
            right = mid - 1;
        } else {
            return mid;
        }
    }
    return -1;   // reaching this point means x was not found.
}

Big O = O(log2 n)
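The logarithmic bound can be observed by counting loop iterations. In this sketch the static field steps and the class name are additions for illustration; the search itself follows the code above.

```java
class BinarySearchSteps {
    static int steps;   // illustrative counter: loop iterations of the last search

    static int binarySearch(int[] a, int x) {
        steps = 0;
        int left = 0, right = a.length - 1;
        while (left <= right) {
            steps++;
            int mid = (left + right) / 2;
            if (a[mid] < x) {
                left = mid + 1;
            } else if (a[mid] > x) {
                right = mid - 1;
            } else {
                return mid;
            }
        }
        return -1;   // not found
    }

    public static void main(String[] args) {
        int[] sorted = new int[1024];
        for (int i = 0; i < sorted.length; i++) {
            sorted[i] = i;                 // sorted array 0, 1, ..., 1023
        }
        int index = binarySearch(sorted, 5);
        System.out.println("index " + index + ", steps " + steps);
    }
}
```

For an array of 1024 elements the loop never needs more than 11 iterations (log2 1024 + 1), whereas a linear scan may need 1024.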


Example: O(log n) (cont.)

Greatest common divisor

long gcd(long m, long n) {
    while (n != 0) {
        long rem = m % n;
        m = n;
        n = rem;
    }
    return m;
}

How quickly the remainder shrinks tells us the Big O. In this program, the remainder decreases without any obvious pattern.

How do we find Big O?


Big O of gcd

We use the following property: if M > N, then M mod N < M/2.

Proof:

If N <= M/2: the remainder of M mod N must be less than N, so it must also be less than M/2.

If N > M/2: M divided by N gives quotient 1 and remainder M - N. Since N > M/2, the remainder M - N is less than M/2.

If we look at the code for gcd:

The remainder from the xth loop iteration is used as m in the (x+2)th iteration.

Therefore the remainder from the (x+2)th iteration must be less than half the remainder from the xth iteration.

This means that after every 2 iterations the remainder is reduced by at least half, so the loop runs O(log n) times.


Example: gcd(2564, 1988)
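The halving property can be observed on this example. The following minimal sketch records the remainder produced by each loop iteration; the class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

class GcdTrace {
    // Runs the gcd loop and returns the remainder computed in each iteration.
    static List<Long> remainders(long m, long n) {
        List<Long> rems = new ArrayList<>();
        while (n != 0) {
            long rem = m % n;
            rems.add(rem);
            m = n;
            n = rem;
        }
        return rems;
    }

    public static void main(String[] args) {
        System.out.println(remainders(2564, 1988));
        // prints [576, 260, 56, 36, 20, 16, 4, 0]
    }
}
```

The last non-zero remainder, 4, is the gcd. Each remainder is less than half of the one two iterations earlier: 576 to 56, 260 to 36, 56 to 20, and so on.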


Example: O(log n) (cont. 2)

Calculate x^n by divide and conquer.

long power(long x, int n) {
    if (n == 0)
        return 1;
    if (n % 2 == 0)                        // n is even
        return power(x * x, n / 2);
    else
        return power(x * x, n / 2) * x;
}

Big O = O(log2 n)

The original problem is halved in each method call.
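Counting the method calls shows the logarithmic behaviour. In this sketch the static field calls and the class name are additions for illustration, and the result is assumed to fit in a long.

```java
class PowerCalls {
    static int calls;   // illustrative counter of method invocations

    static long power(long x, int n) {
        calls++;
        if (n == 0) {
            return 1;
        }
        long half = power(x * x, n / 2);        // the problem size is halved here
        return (n % 2 == 0) ? half : half * x;  // odd n needs one extra factor of x
    }

    public static void main(String[] args) {
        calls = 0;
        long result = power(2, 10);
        System.out.println(result + " after " + calls + " calls");   // prints 1024 after 5 calls
    }
}
```

Computing 2^10 by repeated multiplication would take 10 multiplications; here n goes 10, 5, 2, 1, 0, so only 5 calls are made.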


O(log n) definition

log^k n = O(n) when k is a constant. This tells us that a logarithmic function has a small growth rate.

f(n) = log_a n has Big O = O(log_b n), where a and b are numbers greater than 1. Any two logarithmic functions have the same growth rate.


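Any two logarithmic functions have the same growth rate: a proof

Let $\log_a n = x$ and $\log_b n = y$.

$a^x = n, \quad b^y = n$

$a^x = b^y$

$x \ln a = y \ln b = \ln n$

$\log_a n \cdot \ln a = \log_b n \cdot \ln b$

$\log_a n = \log_b n \cdot \frac{\ln b}{\ln a} = c \cdot \log_b n$

$\log_a n = O(\log_b n)$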


Runtime: small (top) to large (bottom)

c

log n

log^k n

n

n log n

n^2

n^3

2^n


Definitions other than big O

Big Omega (Ω)

T(N) = Ω(g(N)) if there exist constants C and N0 such that T(N) >= C g(N), where N >= N0.

From the definition, if f(N) = Ω(N^2), then f(N) = Ω(N) = Ω(N^(1/2)). We should choose the most realistic (tightest) answer.


Definitions other than big O (cont.)

Big Theta (Θ)

T(N) = Θ(h(N)) if T(N) = O(h(N)) and T(N) = Ω(h(N)).

There exist c1, c2, and N0 that make c1*h(N) <= T(N) <= c2*h(N), where N >= N0.


Definitions other than big O (cont. 2)

Small o

T(N) = o(p(N)) if T(N) = O(p(N)) but T(N) ≠ Θ(p(N)).


Notes from the definitions

T(N) = O(f(N)) has the same meaning as f(N) = Ω(T(N)). We can say f(N) is an “upper bound” of T(N), and T(N) is a “lower bound” of f(N).

f(N) = N^2 and g(N) = 2N^2 have the same Big O and Big Ω. That is, f(N) = Θ(g(N)).

f(N) = N^2 can have several Big O (O(N^3), O(N^4)), but the best value is O(N^2).

We can write f(N) = Θ(N^2) to say that this value is the best Big O.


Thus, we have one more rule:

If T(N) is a polynomial of degree k, then T(N) = Θ(N^k).

From here, if T(N) = 5N^4 + 4N^3 + N, we know that T(N) = Θ(N^4).


Best case, Worst case, Average case

worst case = the maximum possible running time.

best case = the minimum possible running time.

average case?

For each input, see how long the program runs.

average case running time = total time over all inputs divided by the number of inputs.


Average case

The average case definition is based on the assumption that each input has an equal chance of occurring.

If we do not want that assumption, we must take the probability of each input into account:

Average case = $\sum_{i}$ (prob. of input_i * unit time when using input_i)


Example: Finding Average case

Let's say we want to find x in an array of size n.

Best case: x is found in the first array slot.

Worst case: x is in the array's last slot, or x is not in the array at all.

Average case:

Assume each array slot has an equal chance of holding x.

Therefore, the chance of x being in any given slot is 1/n.


Example: Finding Average case (cont.)

Average case running time = 1/n * (steps used when finding x in the first slot) + 1/n * (steps used when finding x in the second slot) + ... + 1/n * (steps used when finding x in the last slot, or not finding x at all)

= (1 + 2 + … + n) / n = (n+1)/2 = O(n) = Big O of the worst case
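The (n+1)/2 result can be reproduced by averaging the steps of a linear search over every possible position of x. This is a sketch under the same assumption as above (x is equally likely to be in any slot); the names are illustrative, and one step means one comparison.

```java
class AverageCase {
    // Linear search: returns the number of comparisons made before x is found
    // (or a.length if x is not in the array).
    static int stepsToFind(int[] a, int x) {
        for (int i = 0; i < a.length; i++) {
            if (a[i] == x) {
                return i + 1;
            }
        }
        return a.length;
    }

    // Average number of steps when x is equally likely to be in each of n slots.
    static double averageSteps(int n) {
        int[] a = new int[n];
        for (int i = 0; i < n; i++) {
            a[i] = i;                     // slot i holds the value i
        }
        long total = 0;
        for (int x = 0; x < n; x++) {
            total += stepsToFind(a, x);   // 1 + 2 + ... + n in total
        }
        return (double) total / n;
    }

    public static void main(String[] args) {
        System.out.println(averageSteps(100));   // prints 50.5, which is (100 + 1) / 2
    }
}
```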
