Rossella Lau Lecture 6, DCO20105, Semester A,2005-6 DCO 20105 Data structures and algorithms Lecture 6: Algorithms and performance analysis Algorithms

Rossella Lau Lecture 6, DCO20105, Semester A,2005-6

DCO20105 Data structures and algorithms

Lecture 6: Algorithms and performance analysis

• Algorithms
• Recursion
• Performance analysis
• Inline code expansion
• Big-O notation

-- By Rossella Lau


Algorithms

General features

Specific input and output descriptions

Clear, simple, straightforward process steps

Step-wise refinement that is easily translated to a computer program

• Pseudocode or Structured English, which resembles a programming language but has no syntax restrictions

• E.g., the resize() in Lecture 1 (slide 10)

Usually, no data structure is specified. An algorithm should terminate, be consistent, and be efficient in producing correct output


An example: merge()

// s1 and s2 must be in order
merge(Stream s1, Stream s2, Stream out) {
    while (s1 || s2) {
        if (!s1) {                      // s1 ends
            copy the rest of s2 to out
            return
        }
        if (!s2) {                      // s2 ends
            copy the rest of s1 to out
            return
        }
        if (s1.front() < s2.front()) {
            out.push_back(s1.front()); s1.pop();
        } else {
            out.push_back(s2.front()); s2.pop();
        }
    }
}
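The merge() pseudocode above can be sketched as runnable C++. This is a minimal sketch, assuming the streams are replaced by sorted `std::vector<int>` inputs; the name `mergeSorted` is hypothetical and does not come from the lecture.

```cpp
#include <cstddef>
#include <vector>

// Merge two vectors that are already in ascending order into one sorted vector.
// Assumption: vectors stand in for the Stream type used in the pseudocode.
std::vector<int> mergeSorted(const std::vector<int>& s1,
                             const std::vector<int>& s2) {
    std::vector<int> out;
    out.reserve(s1.size() + s2.size());
    std::size_t i = 0, j = 0;
    // While both inputs still have items, take the smaller front item.
    while (i < s1.size() && j < s2.size()) {
        if (s1[i] < s2[j]) out.push_back(s1[i++]);
        else               out.push_back(s2[j++]);
    }
    // One input has ended: copy the rest of the other one.
    while (i < s1.size()) out.push_back(s1[i++]);
    while (j < s2.size()) out.push_back(s2[j++]);
    return out;
}
```

Each item is copied exactly once, so the work grows linearly with the total number of items.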


Recursive Algorithms

An algorithm that makes use of itself as part of the solution is recursive; e.g.,

Factorial: n's factorial is the product of all integers between 1 and n. The definition can be written as follows:

n! = 1                              if n == 0
n! = n * (n-1) * (n-2) * … * 1      if n > 0

The second line can be rewritten as:

n! = n * (n-1)!                     if n > 0

By using the second definition, we may evaluate 5! as

5! = 5 * 4!
4! = 4 * 3!
3! = 3 * 2!
2! = 2 * 1!
1! = 1 * 0!
0! = 1


Recursive Process

From the evaluation of n's factorial according to the recursive definition, we may see

1. Each evaluation is reduced to a simpler case

2. The reduction is continued until a direct definition is given

3. Each result is substituted back into the previous reduction

5! = 5 * 24 = 120
4! = 4 * 6 = 24
3! = 3 * 2 = 6
2! = 2 * 1 = 2
1! = 1 * 1 = 1
0! = 1
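The recursive definition translates almost line for line into code. The following is a minimal sketch, assuming an unsigned 64-bit result; the result overflows for n greater than 20.

```cpp
// n! by the recursive definition: 0! = 1 and n! = n * (n-1)! for n > 0.
// Assumption: results fit in unsigned long long, which holds up to 20!.
unsigned long long factorial(unsigned int n) {
    if (n == 0) return 1;           // direct definition ends the reduction
    return n * factorial(n - 1);    // reduce to a simpler case, then substitute back
}
```

Calling `factorial(5)` performs the same chain of reductions and substitutions shown above.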


Recursive definition

Reconsider the data structure of a linked list:

template <class T>
class Node {
    T item;
    Node<T> *next;
};

A definition that defines an object in terms of itself is called a recursive definition

A recursive data structure lends itself to recursive algorithms. Printing a linked list: print the item, then print the rest of the list
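Printing a linked list recursively can be sketched as follows. This is only an illustration: `Node` is declared as a struct here so that its members are accessible, and the names `printList` and `listToString` are hypothetical.

```cpp
#include <iostream>
#include <sstream>
#include <string>

template <class T>
struct Node {               // struct (public members) is an assumption of this sketch
    T item;
    Node<T>* next;
};

// Print the item, then print the rest of the list.
template <class T>
void printList(const Node<T>* node, std::ostream& out) {
    if (node == nullptr) return;    // empty list: nothing left to print
    out << node->item << ' ';
    printList(node->next, out);     // the rest of the list is itself a list
}

// Convenience wrapper that collects the printed form in a string.
template <class T>
std::string listToString(const Node<T>* head) {
    std::ostringstream out;
    printList(head, out);
    return out.str();
}
```

Passing `std::cout` as the stream prints the list to the screen.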


The Tower of Hanoi Problem

Problem statement (Ford: 3-7)

Given three pegs, A, B, and C, and a number of disks of differing diameters. Initially, all the disks are placed on peg A. The problem is to move all disks from peg A to peg C under the following constraints:

1. Disks placed on a peg must follow an order: a larger disk is always below a smaller disk.

2. Only the top disk on any peg may be moved to any other peg.


The solution for the problem

The solution (Ford: prg3_3.cpp)

From the solution, it can be seen how the elegance of the recursive approach contributes to problem solving
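For readers without Ford's prg3_3.cpp at hand, the recursive idea can be sketched as below. This is not Ford's program; the function name, the parameter order, and the use of a vector of strings to record moves are assumptions made for illustration.

```cpp
#include <string>
#include <vector>

// Move n disks from peg `from` to peg `to`, using `via` as the spare peg.
// Each move is recorded as a string such as "A->C".
void hanoi(int n, char from, char to, char via,
           std::vector<std::string>& moves) {
    if (n == 0) return;                    // no disk to move
    hanoi(n - 1, from, via, to, moves);    // clear the n-1 smaller disks onto the spare peg
    moves.push_back(std::string(1, from) + "->" + std::string(1, to));  // move the largest disk
    hanoi(n - 1, via, to, from, moves);    // bring the smaller disks on top of it
}
```

Moving n disks this way records 2^n - 1 moves, e.g., 7 moves for 3 disks.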


More recursive functions I

int aFunction(int n, int m) {
    if (!n) return m;
    return aFunction(n-1, m+1);
}

int bFunction(int n, int m) {
    if (!n) return 0;
    return bFunction(n-1, m) + m;
}

int f(int n) {
    if (!n) return 0;
    if (!(n & 1)) return f(n/2);
    return f(n/2) + 1;
}

What are aFunction() and bFunction()?

What is the value of f(10)?


More recursive functions II

How would you compare the three solutions for the same simple problem?

int funny(int a, int b) {
    if (a == 1) return b;
    return a & 1 ? funny(a>>1, b<<1) + b
                 : funny(a>>1, b<<1);
}


The underlying algorithm of recursion

All recursive algorithms can be rewritten as non-recursive algorithms

Because a function call uses the concept of a stack, non-recursive algorithms can make use of a stack to rewrite the recursive algorithms

Some recursive algorithms don't even need a stack to be rewritten, e.g., the factorial function

Recursive algorithms which must use a stack to be rewritten as non-recursive algorithms are called natural recursive functions, e.g., the Hanoi problem
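Both cases can be sketched in code. In this illustrative sketch, factorial is rewritten as a plain loop with no stack, while the Tower of Hanoi is rewritten with an explicit `std::stack` of pending tasks. The names `factorialIterative`, `HanoiTask`, and `hanoiWithStack` are hypothetical.

```cpp
#include <stack>
#include <string>
#include <vector>

// Factorial needs no stack: a loop accumulates the product.
unsigned long long factorialIterative(unsigned int n) {
    unsigned long long result = 1;
    for (unsigned int i = 2; i <= n; ++i) result *= i;
    return result;
}

// One pending sub-problem: move n disks from `from` to `to` using `via`.
struct HanoiTask { int n; char from, to, via; };

// Tower of Hanoi without recursion: an explicit stack replaces the call stack.
std::vector<std::string> hanoiWithStack(int n, char from, char to, char via) {
    std::vector<std::string> moves;
    std::stack<HanoiTask> pending;
    if (n > 0) pending.push(HanoiTask{n, from, to, via});
    while (!pending.empty()) {
        HanoiTask t = pending.top();
        pending.pop();
        if (t.n == 1) {
            moves.push_back(std::string(1, t.from) + "->" + std::string(1, t.to));
        } else {
            // Pushed in reverse so that they are popped in the recursive order.
            pending.push(HanoiTask{t.n - 1, t.via, t.to, t.from});
            pending.push(HanoiTask{1, t.from, t.to, t.via});
            pending.push(HanoiTask{t.n - 1, t.from, t.via, t.to});
        }
    }
    return moves;
}
```

The stack version does the same work as a recursive one but is noticeably harder to read, which is the trade-off the slide describes.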


Efficiency of recursion

A recursive algorithm can always be converted to a non-recursive algorithm

The recursive approach is usually not as efficient as the non-recursive version, since additional operations and space are required for function calls

However, sometimes a recursive solution is the most natural and logical way of solving a problem

Conflict between machine efficiency and development efficiency

Usually, if an algorithm is a natural recursive solution, it is not worth a programmer's time to construct a non-recursive solution

For recursive algorithms that are used frequently, though, it may be worthwhile to rewrite them


Performance Analysis

To see if an algorithm is efficient, we measure

computer execution time (usually more critical)

memory required (for some cases)

For execution efficiency, complexity is measured by:

details of an algorithm: number of operations, cost of operations, functions used, etc.

scalability: how well the algorithm performs when the problem size (the size of the data being processed) is increased


Execution time measurement

Use system time to measure program execution time; e.g., the code segment in TimeSearch.cpp(v1.0):

startTime = clock();
linearSearch(forSearch, SIZE, TARGET);
endTime = clock();

The difference between endTime and startTime is the execution time in clock ticks (divide by CLOCKS_PER_SEC for seconds)

Alternatively, run the system date command before and after the program to time a whole run: date, program, date
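A self-contained version of the clock() measurement might look like the sketch below. It is not the code of TimeSearch.cpp: the `linearSearch` here is a hypothetical stand-in that takes a vector, and `timeLinearSearch` and its `repeats` parameter are assumptions added so that short runs become measurable.

```cpp
#include <cstddef>
#include <ctime>
#include <vector>

// Hypothetical stand-in for the linearSearch() of TimeSearch.cpp.
// Returns the index of target, or -1 if it is absent.
int linearSearch(const std::vector<int>& data, int target) {
    for (std::size_t i = 0; i < data.size(); ++i)
        if (data[i] == target) return static_cast<int>(i);
    return -1;
}

// Returns the CPU time, in seconds, taken by `repeats` searches.
double timeLinearSearch(const std::vector<int>& data, int target, int repeats) {
    volatile int sink = 0;  // keeps the optimizer from discarding the calls
    std::clock_t startTime = std::clock();
    for (int r = 0; r < repeats; ++r) sink = linearSearch(data, target);
    std::clock_t endTime = std::clock();
    (void)sink;
    return static_cast<double>(endTime - startTime) / CLOCKS_PER_SEC;
}
```

Note that no I/O happens between the two clock() calls, so only the search itself is timed.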


Notes on measuring execution time

Measurement should focus on the algorithm and avoid I/O inside

I/O may cause other system functions to be executed which are not needed every time; e.g., paging

I/O may cause waiting for user input

Usually, the system also runs other applications at the same time

Avoid other applications/programs which require a lot of I/O time running at the same time

Avoid other applications/programs which demand CPU time


Factors of efficiency

Number of operations

Cost of different operations, e.g.,

+ and - are much simpler than * and /

a constant value is more efficient than a variable's value

branch statements are more expensive than sequential statements

A function call is more expensive than executing the same statements in place

A system function is usually faster than a user-defined function


Contradiction of execution and development efficiency

Literal values in a statement may be faster than using an identifier but an identifier is more meaningful and maintainable

A function call causes system overhead, program pointer and parameter passing, but is more meaningful and maintainable

System solution: Preprocessor, Optimizer, and new features in C++

Preprocessor: macro substitution lets a name stand in for a value or a function

Optimizer: some optimization processes remove the overhead automatically


Optimizer

Many compilers have an optimization phase, called an optimizer, which transforms the code into an internal form to make the program more efficient

Popular optimizations:

common expression substitution

function inline expansion

• Instead of performing a call, the compiler replaces the function call with the function's code to reduce overhead

• Recursive functions generally cannot be fully inline expanded

redundant code removal

arithmetic expression optimization


Manual function inline expansion

Other than using an optimizer to perform function inline expansion, C++ provides a new feature to allow a programmer to specify a function which should be inline expanded

E.g., the code that outputs the timing messages can be rewritten as displayTime() in TimeSearch.cpp(v2.0) to avoid repeating similar bulky code (for better maintainability)

displayTime() can be further rewritten as follows for better execution efficiency, by avoiding the overhead generated for the call:

inline void displayTime(……)
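Since the parameter list of displayTime() is elided above, the following is only a sketch with a hypothetical signature (a label plus two clock() readings). The helper `elapsedSeconds` is likewise an assumption. Note that `inline` is a request: the compiler may still decide not to expand the call.

```cpp
#include <ctime>
#include <iostream>

// Convert a pair of clock() readings to seconds.
inline double elapsedSeconds(std::clock_t startTime, std::clock_t endTime) {
    return static_cast<double>(endTime - startTime) / CLOCKS_PER_SEC;
}

// Hypothetical signature; the lecture's actual parameters are not shown.
inline void displayTime(const char* label,
                        std::clock_t startTime, std::clock_t endTime) {
    std::cout << label << ": "
              << elapsedSeconds(startTime, endTime) << " seconds\n";
}
```

A call such as `displayTime("linear search", startTime, endTime);` keeps the readability of a function call, while the compiler is free to expand the body in place.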


Scalability

An algorithm which is efficient for a small problem size may not be efficient for a large problem size

E.g., the Traveling Salesman Problem: find the shortest route for a salesman to visit each of n destinations. Checking every route involves at least (n-1)! candidates. When n is 100, this is already an astronomical value!

To see if a program/function is scalable, big-O notation is used


Big O-notation

Count each line of code as one execution time unit. If the computation time for a problem of size n is, e.g., f(n) = 7n^2 + 2n + 8, we simply denote it as O(n^2).

Formal definition: f(n) is O(g(n)) if there exist positive numbers a and b such that f(n) < a*g(n) holds for all n >= b
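As a worked check of the definition, take the example f(n) = 7n^2 + 2n + 8 with g(n) = n^2. The constants a = 18 and b = 1 are one possible choice made here for illustration; many other pairs work equally well.

```latex
\begin{aligned}
f(n) &= 7n^2 + 2n + 8 \\
     &\le 7n^2 + 2n^2 + 8n^2 && \text{since } n \le n^2 \text{ and } 1 \le n^2 \text{ for } n \ge 1 \\
     &= 17n^2 < 18n^2 = a \cdot g(n) && \text{for all } n \ge b = 1
\end{aligned}
```

Hence f(n) is O(n^2), with the lower-order terms absorbed into the constant a.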


Examples of Big-O analysis

Ford’s exercises: 3.22

n + 5

n^2 + 6n + 7

n^(1/3) + n^(1/2) + 7

(n^3 + n^2) / (n + 1)

Ford’s exercises: 3.25

bool g(int a[], int n, int k) {
    int i;
    for (i = 0; i < n; i++)
        if (a[i] == k) return true;
    return false;
}

void h(int a[], int b[], int n) {
    int i, j;
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            a[i] += b[j];
}


Asymptotic Analysis and Big-O notation

To allow for Big-O measuring, the following asymptotic functions are used for categorizing algorithms:

O(1) -- constant time
O(log(n)) -- logarithmic time
O(n) -- linear time
O(n log(n)) -- n-log-n time
O(n^2) -- quadratic time
O(n^3) -- cubic time
O(2^n) -- exponential time


More examples of Big-O notation

Factorial

Linear search (TimeSearch.cpp)

Binary search (TimeSearch.cpp)

Tower of Hanoi
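For the two search examples, counting steps makes the difference in growth visible. The sketch below is not taken from TimeSearch.cpp; the function names are hypothetical, and a "step" is taken to mean one element examined.

```cpp
#include <cstddef>
#include <vector>

// Counts the elements examined by a linear search for `target`.
std::size_t linearSearchSteps(const std::vector<int>& data, int target) {
    std::size_t steps = 0;
    for (std::size_t i = 0; i < data.size(); ++i) {
        ++steps;
        if (data[i] == target) break;
    }
    return steps;
}

// Counts the elements examined by a binary search; `data` must be sorted.
std::size_t binarySearchSteps(const std::vector<int>& data, int target) {
    std::size_t steps = 0;
    std::size_t low = 0, high = data.size();    // search range is [low, high)
    while (low < high) {
        ++steps;
        std::size_t mid = low + (high - low) / 2;
        if (data[mid] == target) break;
        if (data[mid] < target) low = mid + 1;  // discard the lower half
        else                    high = mid;     // discard the upper half
    }
    return steps;
}
```

For a missing target in 1024 sorted items, the linear search examines all 1024 items, while the binary search halves the range each time and needs only about log2(1024) + 1 steps.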


Interpretation of Big O-notation

It is a simplified complexity measurement notation

It is a simple way to see the relationship between the growth of the execution time and the growth of the problem size

It ignores the coefficients and lower-order terms of f(n), the execution time as a function of the problem size: only the dominant term is kept, with its coefficient treated as 1


Typical meanings of Big O-notation

Algorithms with O(1) are ideal

Algorithms with O(f(n)) are near ideal when f(n) < n

Algorithms with O(f(n)) are acceptable when f(n) < n^c

Algorithms with O(f(n)) are impractical for large n when f(n) > c^n; the best known algorithms for NP-complete and NP-hard problems fall in this class

c is a small constant


Typical Growth of Various f(n) with n.

Table 1.2 in Smith’s reference

f(n)     n=3    n=10    n=30    n=100    n=300       n=1000
lg n     1.6    3.3     4.9     6.6      8.2         10
n        3      10      30      100      300         1000
n lg n   4.8    33      147     664      2469        9966
n^2      9      100     900     10000    90000       10^6
n^3      27     1000    27000   10^6     2.7*10^7    10^9
2^n      8      1024    10^9    10^30    10^90       10^300
10^n     1000   10^10   10^30   10^100   10^300      10^1000


Performance evaluation

Linear search: O(n)
A measure of t(na = 10,000) = 1.5 seconds
t(nb = 5,120,000) = t(512 na) = (512 na / na) * t(na) = 512 * 1.5 = 768 seconds

Binary search: O(log2 n)
A measure of t(na = 100,000) = 0.0001 second
t(nb = 100,000,000) = t(1000 na) = (log 100,000,000 / log 100,000) * t(na), about 0.0002 second

Tower of Hanoi: O(2^n)
A measure of t(na = 10) = 0.10 second
Evaluation:
t(nb = 15): 2^15 / 2^10 = 32, and 32 * 0.1 = 3.2 seconds
t(nc = 20): 2^20 / 2^10 = 1024, and 1024 * 0.1 = about 102.4 seconds
An actual run: t(10) = 0.10 sec, t(15) = 2.8 sec, t(20) = 126.78 sec
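The extrapolations above all follow one pattern: scale the measured time by the ratio of the growth function at the two problem sizes. A minimal sketch, with hypothetical function names:

```cpp
#include <cmath>

// Predict t(nb) from a measured t(na) = ta for an O(n) algorithm.
double predictLinear(double ta, double na, double nb) {
    return ta * (nb / na);
}

// Same for an O(log n) algorithm; the base of the logarithm cancels out.
double predictLogarithmic(double ta, double na, double nb) {
    return ta * (std::log(nb) / std::log(na));
}

// Same for an O(2^n) algorithm: 2^nb / 2^na = 2^(nb - na).
double predictExponential(double ta, double na, double nb) {
    return ta * std::pow(2.0, nb - na);
}
```

With the measurements above, `predictLinear(1.5, 10000, 5120000)` gives 768 seconds and `predictExponential(0.1, 10, 20)` gives 102.4 seconds. As the actual Hanoi run shows, such predictions are estimates rather than exact timings.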


Big-O for some basic functions

Vector

push_back(), searchBinary()

push_front(), insert(), searchLinear()

Linked List

push_back(), push_front()

Search() (only linear search)

insert()


Big-O for bookShop

Big-O notation for inserting n items with option 4 (add/modify)


More on performance evaluation

Execution time: O(n^2)

If t(na) = a, then t(10 na) = 10^2 a = 100a

template <class T>
void List<T>::printTail() {
    for (Node<T> *ptr = head; ptr; ptr = ptr->next) {
        cout << ptr->item;
        if (ptr->next)
            for (Node<T> *curr = ptr->next; curr; curr = curr->next)
                cout << " " << curr->item;
        cout << endl;
    }
}


Summary

Algorithms should clearly specify how a solution solves a problem

Recursive algorithms exhibit elegant solutions

The solution for Tower of Hanoi is a typical natural recursive algorithm

Program efficiency is usually measured by execution time and memory spaces

Inline expansion is one way to resolve the conflict between development efficiency and execution efficiency

Big-O notation is a popular method to measure and evaluate the performance of an algorithm


Reference

Ford: 3.3-4, 3.6-7

Data Structures: Form and Function by Harry Smith, Harcourt Brace Jovanovich, 1987

STL online references: http://www.sgi.com/tech/stl, http://www.cppreference.com/

Example programs: TimeSearch.cpp(v2.0), Ford: prg3_3.cpp

-- END --
