
Introduction, Searching and Sorting UNIT-1
Blog: anilkumarprathipati.wordpress.com

1. Introduction:
Algorithm: The word algorithm comes from the name of the Persian mathematician Abu Jafar Mohammed Ibn Musa Al Khowarizmi (ninth century). An algorithm is simply a set of rules used to perform some calculation, either by hand or, more usually, on a machine (computer).

Definition: An algorithm is a finite set of instructions that accomplishes a particular task. Another definition: a sequence of unambiguous instructions for solving a problem, i.e., for obtaining a required output for any legitimate (genuine) input in a finite amount of time.

In addition, all algorithms must satisfy the following properties (criteria / characteristics).

1. Input: Zero or more quantities are externally supplied as input. Consider a Fibonacci numbers program whose aim is to display the first ten Fibonacci numbers: no input is required, since the problem itself clearly fixes the count at ten, so zero input items are needed. Another problem is displaying a given number of even numbers: the program must accept how many evens are required and display that many based on the user's input, so one data item is required as input.

2. Output: At least one quantity is produced by the algorithm as output. In the case of the Fibonacci program, the first ten Fibonacci values are displayed as output after the program executes. In the second case, the program should display the requested number of evens based on the user's input; a negative number is a wrong input and should produce a proper error message as output. So the program displays at least one output: either an error message or the requested quantity of numbers.

3. Definiteness: Each instruction is clear and unambiguous, i.e., each step must be easy to understand and convey only a single meaning.

4. Effectiveness: Each instruction must be very basic, so that it can in principle be carried out by someone. This criterion applies to both the Fibonacci and the evens programs. For example, suppose the user enters a negative number as input to the evens program and the algorithm contains a step like

Step: If N < 0 then Go to ERROR

where ERROR is never defined. Such an instruction cannot be carried out, and instructions of that kind should not appear in an algorithm.

5. Finiteness: If we trace out the instructions of an algorithm, then for all cases the algorithm must terminate after a finite number of steps. Both the Fibonacci and the even-numbers problems must be solved in some finite number of steps; for example, displaying the Fibonacci series continuously without termination leads to abnormal termination and violates this property.

Criteria for Algorithms
Input: Zero or more inputs.
Output: At least one output.
Finiteness: Terminates after a finite number of steps.
Definiteness: Each step is clear and unambiguous.
Effectiveness: Each step can actually be carried out.


2. Process for Design and Analysis of Algorithms:

[Fig: Algorithm Design and Analysis Process: a flowchart. Understand the Problem, then decide on the computational means and the algorithm design technique, then Design an Algorithm, then Prove Correctness (No: go back and redesign; Yes: continue), then Analyze the Algorithm (No: go back and redesign; Yes: continue), then Code the Algorithm.]

1. Understand the problem: This is a very crucial phase. If we make any mistake in this step, the entire algorithm becomes wrong, so the first task before designing an algorithm is to understand the problem.
2. Solution as an algorithm (exact vs. approximate solving): Solve the problem exactly if possible. Some problems are solvable by an exact method, but the exact method may be slow compared with an approximation method; in that situation we use the approximation method.
3. Algorithm techniques: Here we use different design techniques, such as i) divide-and-conquer, ii) greedy method, iii) dynamic programming, iv) backtracking, v) branch and bound, etc.
4. Prove correctness: Once the algorithm has been specified, we have to prove its correctness. Usually testing is used for this purpose.
5. Analyze the algorithm: Analyzing an algorithm means studying its behavior, i.e., calculating its time complexity and space complexity. If the time complexity of the algorithm is too high, we use another design technique so that the time complexity is minimized.
6. Code the algorithm: After completing all phases successfully, we code the algorithm. The specification should not depend on any programming language; we use general notation (pseudo-code) and English statements. Ultimately, algorithms are implemented as computer programs.

3. Types of Algorithms: There are many types of algorithms, but the most fundamental types are:
1. Recursive algorithms
2. Dynamic programming algorithms
3. Backtracking algorithms
4. Divide and conquer algorithms
5. Greedy algorithms
6. Brute force algorithms
7. Randomized algorithms


1) Simple recursive algorithm: Solves the base case directly and then recurs with a simpler or easier input every time (a base value is set at the start, for which the algorithm terminates). It is used to solve problems which can be broken into simpler or smaller problems of the same type.
Example: to find a factorial using recursion, here is the pseudo code:

Fact(x)
    if x is 0                // 0 is the base value; x = 0 is the base case
        return 1
    return x * Fact(x - 1)   // breaks the problem into smaller problems

2) Dynamic programming algorithm: A dynamic programming algorithm (also known as a dynamic optimization algorithm)

remembers past results and uses them to find new results: it solves a complex problem by breaking it down into a collection of simpler sub-problems, solving each of those sub-problems only once, and storing their solutions for future use instead of re-computing them.

Example: Fibonacci sequence, here is the pseudo code:

Fib(n)
    if n = 0
        return 0
    prev_Fib = 0, curr_Fib = 1
    repeat n - 1 times            // if n = 1 the loop is skipped
        next_Fib = prev_Fib + curr_Fib
        prev_Fib = curr_Fib
        curr_Fib = next_Fib
    return curr_Fib

3) Backtracking algorithm: Let's say we have a problem A and we divide it into three smaller problems B, C and D.

Now it may be the case that the solution to A does not depend on all three sub-problems; in fact, we don't even know which one it depends on.

Let's take a situation. Suppose you are standing in front of three tunnels, one of which has a bag of gold at its end, but you don't know which one. So you try all three: first go into tunnel 1; if that is not the one, come out of it and go into tunnel 2; and again, if that is not the one, come out and go into tunnel 3.

So basically, in backtracking we attempt to solve a sub-problem, and if we don't reach the desired solution, we undo whatever we did while solving that sub-problem and try solving another sub-problem.

Example: the N-Queens problem (a small sketch is given below).
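As a concrete illustration of backtracking, here is a minimal C sketch of the 4-Queens problem (the board size N and the helper names are ours, not from the source): a queen is placed row by row, and when no column of the current row is safe, the algorithm undoes the previous placement and tries the next possibility.

#include <stdio.h>
#include <stdlib.h>

#define N 4
int col_of[N];                   /* col_of[r] = column of the queen in row r */

/* Is (row, col) safe given the queens already placed in rows 0..row-1? */
int safe(int row, int col) {
    for (int r = 0; r < row; r++)
        if (col_of[r] == col || abs(col_of[r] - col) == row - r)
            return 0;            /* same column or same diagonal */
    return 1;
}

/* Try to place queens in rows row..N-1; returns 1 on success. */
int place(int row) {
    if (row == N) return 1;      /* all queens placed */
    for (int c = 0; c < N; c++)
        if (safe(row, c)) {
            col_of[row] = c;                  /* attempt this sub-problem */
            if (place(row + 1)) return 1;
            /* that placement led nowhere: backtrack and try the next column */
        }
    return 0;
}

int main(void) {
    if (place(0))
        for (int r = 0; r < N; r++)
            printf("row %d -> column %d\n", r, col_of[r]);
    return 0;
}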

4) Divide and conquer algorithm: Divide and conquer consists of two parts: first, it divides the problem into smaller sub-problems of the same type and solves them recursively; then it combines the sub-solutions to form the solution of the original problem.

    Example: Quick sort, Merge sort


5) Greedy algorithm: A greedy algorithm solves a problem by taking the optimal choice at the local level (without regard for any consequences), in the hope of finding an optimal solution at the global level. A greedy algorithm is used to find an optimal solution, but it is not guaranteed that you will definitely find the optimal solution by following this strategy.
Example: Huffman tree, counting out money (a small sketch is given below).
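For the counting-money example, a minimal C sketch of the greedy choice: to pay an amount with the fewest coins, repeatedly take the largest denomination that still fits. The denominations and the amount below are assumptions for illustration; note that for some coin systems this locally optimal choice is not globally optimal, which is exactly the caveat stated above.

#include <stdio.h>

int main(void) {
    int denom[] = {25, 10, 5, 1};        /* assumed denominations, largest first */
    int n = sizeof denom / sizeof denom[0];
    int amount = 63;                     /* assumed amount to pay out */

    for (int i = 0; i < n; i++) {
        int count = amount / denom[i];   /* greedy: take as many as fit */
        amount -= count * denom[i];
        if (count > 0)
            printf("%d coin(s) of %d\n", count, denom[i]);
    }
    return 0;
}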

6) Brute force algorithm: A brute force algorithm simply tries all the possibilities until a satisfactory solution is found. Such an algorithm can be used to find the optimal (best) solution, since it checks every possible solution, or to find merely a satisfactory (not necessarily best) solution, stopping as soon as one is found.
Example: exact string matching.
7) Randomized algorithm: A randomized algorithm uses a random number at least once during the computation to make a decision.
Example: Quick sort, where a random number can be used to choose the pivot (a small sketch is given below).
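A minimal C sketch of that randomized decision (the function name is ours): before partitioning, a random index in the current range is chosen and swapped into the pivot position, after which an ordinary partition routine can run unchanged.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Move a randomly chosen element of a[low..high] into a[low],
   so it becomes the pivot for a subsequent partition step. */
void randomize_pivot(int a[], int low, int high) {
    int r = low + rand() % (high - low + 1);   /* random index in [low, high] */
    int tmp = a[low]; a[low] = a[r]; a[r] = tmp;
}

int main(void) {
    int a[] = {12, 6, 18, 4, 9, 8, 2, 15};
    srand((unsigned)time(NULL));    /* seed the random number generator */
    randomize_pivot(a, 0, 7);       /* a[0] is now a randomly chosen pivot */
    printf("pivot chosen: %d\n", a[0]);
    return 0;
}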

    4. Specification of Algorithm: There are various ways by which we can specify an algorithm.

[Fig: Ways to specify an algorithm: natural language, pseudo-code, flowchart, or program.]

It is very easy to specify an algorithm using natural language. But a natural-language specification is often not clear and may require further description; such a specification creates difficulty while actually implementing it (difficulty in converting it into source code).

Hence many programmers prefer to specify an algorithm by means of pseudo-code, which is very close to a programming language but is not easily understood by ordinary users.

Another way of representing the algorithm is by flowchart. A flowchart is a graphical representation of an algorithm, but the flowchart method works well only if the algorithm is small and simple.

One more specification is the program itself, which changes from one programming language to another.

5. Examples:
A) Air travel: We have an airline, Barbet Airlines, which serves several cities in the country. Although it serves several cities, it does not connect all of them directly: only some of the cities are connected by direct flights, and for other pairs of cities you have to take a hopping flight via an intermediate city. So our first goal is to compute every pair of cities which are actually connected by


this network served by this airline. So, how do we find all pairs of cities A, B such that A and B are connected by a sequence of flights?

There are about ten cities, from Delhi, Varanasi and Ahmadabad down to Trivandrum in the south, and arrows indicating the flights. Some of these flights are in one direction only: you can go from Delhi to Varanasi, but you cannot come back directly to Delhi; you must go via Ahmadabad. This is quite common.

Some pairs of important cities, like Mumbai and Delhi, or Mumbai and Calcutta, might be connected by flights in both directions. Given these ten cities, we want to know: is it really possible to go from, say, Varanasi to Trivandrum, or from Hyderabad to Delhi? What we really need is the structure of this network; the map itself is not relevant. We just need to know how many cities are on the network and how they are connected by flights. In the picture, the cities are circles and the flights are arrows, with the arrowheads indicating direction: an arrowhead at one end means a one-directional flight, and arrowheads at both ends mean a bidirectional flight. Ignore the actual names of the cities; we can call them 1, 2, 3, 4, 5 or a, b, c, d, e or whatever and solve the problem. This kind of picture is called a graph. We will study graphs more formally later in this course.
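As a sketch of how the connected-pairs question can be computed (this is one standard approach, not necessarily the one intended by the source): represent the flights as an adjacency matrix and take its transitive closure with Warshall's algorithm. The 4-city matrix below is an assumed toy example.

#include <stdio.h>

#define N 4   /* number of cities in the toy example */

int main(void) {
    /* flight[i][j] = 1 if there is a direct flight from city i to city j */
    int flight[N][N] = {
        {0, 1, 0, 0},
        {0, 0, 1, 0},
        {1, 0, 0, 1},
        {0, 0, 0, 0}
    };
    /* Warshall's algorithm: afterwards flight[i][j] = 1 exactly when
       city j is reachable from city i by some sequence of flights. */
    for (int k = 0; k < N; k++)
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                if (flight[i][k] && flight[k][j])
                    flight[i][j] = 1;

    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            if (i != j && flight[i][j])
                printf("city %d -> city %d is connected\n", i, j);
    return 0;
}

This triple loop takes on the order of N^3 steps, which already illustrates the point made next: the running time grows with the size of the network.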


Now, in terms of efficiency, we have to look at what determines the complexity of this problem. It is fairly obvious that if we have more cities, the problem is more complicated, so the number of cities, which we call N, is certainly one parameter that determines how long the algorithm is going to take to run. The other factor that determines how complex the network is, is how many direct flights F there are: with fewer flights, fewer places can be connected, and we have to explore fewer possibilities.

From this it follows that computing the paths depends on both N and F. We will not have an algorithm that always takes twenty steps; the number of steps will depend on N and F. What is this dependency, and how does it grow? If N doubles, does our algorithm take two times more time, or four times more time? If N grows by a factor of ten, does it take ten times or a hundred times more time?

A related question: given this dependency on N and F, what realistic sizes of network can we handle? If the airline grows to many more cities and flights, will we still be able to give our answer in reasonable time? Remember that this kind of answer is typically required when somebody is making an online booking: you are expected to reply within a few seconds, and it is not enough to come back after an hour and say "yes, there is a flight from Trivandrum to Calcutta". So what is the limit of our efficiency? Can we scale this algorithm to cover multiple airlines, say for a website that sells tickets across all airlines and can route you from place A to place B? That depends on how large a value of N and F we can handle.

B) Sorting phone numbers: Assume you want to sort a telephone directory of one billion phone numbers, and you have two kinds of algorithms:
1) Naive algorithm (brute force) ==> O(n^2)
2) Better algorithm ==> O(n log n)
As for system performance (speed), take a CPU that performs 10^9 operations per second; the problem is to sort around 10^9 phone numbers.
Using Algorithm 1, what is the total time it takes? Here n is the input size, 10^9, so the time is calculated from the O(n^2) count of operations:
Time = (total number of operations) / (CPU speed)
     ≈ (10^9)^2 operations / 10^9 operations per second
     ≈ 10^18 / 10^9 seconds ≈ 10^9 seconds
     ≈ 16,666,666.67 minutes ≈ 277,777.78 hours ≈ 11,574.07 days ≈ 385.80 months ≈ 32.15 years (approx.).
Using Algorithm 2, what is the total time it takes?
     ≈ 10^9 · log2(10^9) / 10^9 seconds ≈ 10^9 · 29.9 / 10^9 seconds ≈ 29.9 seconds.
Q: Which algorithm do you choose? Algorithm 2, by default.
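A small C program that recomputes these estimates (the 10^9 operations-per-second speed is the assumption stated above; it reports years using 365-day years, hence slightly different from the 30-day-month arithmetic; compile with -lm):

#include <stdio.h>
#include <math.h>

int main(void) {
    double n = 1e9;        /* number of phone numbers             */
    double speed = 1e9;    /* CPU operations per second (assumed) */

    double naive = n * n / speed;        /* O(n^2) algorithm     */
    double fast  = n * log2(n) / speed;  /* O(n log n) algorithm */

    printf("naive : %.2e seconds (about %.1f years)\n",
           naive, naive / (3600.0 * 24 * 365));
    printf("better: %.1f seconds\n", fast);
    return 0;
}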


C) Closest pair of objects (video game): Suppose we are playing a video game, one of those action games where objects move around the screen and we have to identify certain objects, shoot them down, capture them, and so on. Assume that, as part of the game, in order to compute the score it has to periodically find the closest pair of objects on the screen. Now, how do you find the closest pair among a group of objects? Of course, you can take every pair, find the distance between the two objects of each pair, and then take the smallest value: that is an n-squared algorithm. It turns out that there is a clever algorithm, again, which takes time n log n. So what is the distinction between n^2 and n log n in this context?

On a modern gaming computer with a large display, it is not unreasonable to assume a resolution of, say, 2500 by 1500 pixels; with a reasonably sized 20-inch monitor we could easily get this kind of resolution. So we have about 3.75 million points on the screen, and some objects placed at some of these points.

Let us assume we have five lakh (five hundred thousand) objects placed on the screen. If we now compute the closest pair among these five hundred thousand objects using the naive n-squared algorithm, we would expect it to take (5 × 10^5)^2 = 25 × 10^10 steps; at roughly 10^8 operations per second, that is about 2500 seconds, which is around 40 minutes.

Now, if you are playing an action game in which reflexes determine your score, obviously each update cannot take 40 minutes; that would not be an effective game. On the other hand, you can easily check that an n log n calculation for 5 × 10^5 objects takes something like 10^6 or 10^7 operations, which is a fraction of a second, a tenth or a hundredth of a second, well below human response time. So it is essentially instantaneous, and we move from a game that is hopelessly slow to one that can really test your reflexes as a human.

6. Performance Analysis: Performance analysis, or analysis of algorithms, refers to the task of determining the efficiency of an algorithm, i.e., how much computing time and storage it requires to run (or execute). This analysis helps in judging the value of one algorithm over another.

To judge an algorithm, two things in particular are taken into consideration: 1. Space complexity 2. Time complexity.
Space Complexity: The space complexity of an algorithm (program) is the amount of memory it needs to run to completion. The space needed by an algorithm has the following components:
1. Instruction space
2. Data space
3. Environment stack space
Instruction Space: Instruction space is the space needed to store the compiled version of the program instructions. The amount of instruction space needed depends on factors such as:
i) the compiler used to compile the program into machine code;
ii) the compiler options in effect at the time of compilation;
iii) the target computer, i.e., the computer on which the algorithm runs.
Note that one compiler may produce less code than another when the same program is compiled by the two.
Data Space: Data space is the space needed to store all constant and variable values. Data space has two components:


i) Space needed by constants, for example 0, 1, 2.134.
ii) Space needed by dynamically allocated objects such as arrays, structures, classes.
Environmental Stack Space: Environmental stack space is used during the execution of functions. Each time a function is invoked, the following data are saved on the environment stack:
i) the return address;
ii) the values of local variables;
iii) the values of the formal parameters in the function being invoked.
Environmental stack space is mainly used by recursive functions. Thus, the space requirement of any program P may be written as
Space complexity S(P) = C + S_P(instance characteristics).
This equation shows that the total space needed by a program is divided into two parts:
- A fixed space requirement C, independent of the instance characteristics of the inputs and outputs: instruction space, and space for simple variables, fixed-size structure variables and constants.
- A variable space requirement S_P(I), dependent on the instance characteristics I: this part includes dynamically allocated space and the recursion stack space.
Examples:
Example 1:

Algorithm REC(float X, float Y, float Z)
{
    return (X + Y + Y * Z + (X + Y + Z)) / (X + Y) + 4.0;
}

In the above algorithm there are no instance characteristics, and the space needed by X, Y and Z is independent of instance characteristics; therefore we can write
S(REC) = 3 + 0 = 3 (one word each for X, Y and Z).
Space complexity is O(1).
Example 2:

Algorithm ADD(float X[], int n)
{
    sum := 0.0;
    for i := 1 to n do
        sum := sum + X[i];
    return sum;
}

Here X must be large enough to hold the n elements to be summed, so the problem instance is characterized by n, the number of elements. We can write
S(ADD) = 3 + n (3: one word each for n, i and sum; n: for the array X[]).
Space complexity is O(n).
Time Complexity: The time complexity of an algorithm is the amount of computer time it needs to run to completion. We

can measure the time complexity of an algorithm using two approaches: 1. A priori analysis (at compile time) 2. A posteriori analysis (at run/execution time). In a priori analysis we analyze the behavior of the algorithm before it is executed; a priori analysis concentrates on determining the order of execution of statements.


In a posteriori analysis we measure the execution time while the algorithm is executed. A posteriori analysis gives accurate values, but it is very costly.

The compile time does not depend on the size of the input. Hence we confine ourselves to the run time, which does depend on the size of the input; this run time is denoted by t_P(n). Hence

Time complexity T(P) = C + t_P(n).

The time T(P) taken by a program P is the sum of the compile time and the execution time. The compile time does not depend on the instance characteristics, so we concentrate on the run time of the program, denoted t_P(instance characteristics).

The following equation counts the number of additions, subtractions, multiplications, divisions, compares, loads, stores and so on that would be made by the code for P:

t_P(n) = c_a·ADD(n) + c_s·SUB(n) + c_m·MUL(n) + c_d·DIV(n) + ...

where n denotes the instance characteristics; c_a, c_s, c_m, c_d, ... denote the time needed for an addition, subtraction, multiplication, division, and so on; and ADD, SUB, MUL, DIV, ... are functions whose values are the numbers of additions, subtractions, multiplications, divisions performed. In practice this method makes finding the time complexity an impossible task.

Another method is the step count. Using the step count, we can determine the number of steps needed by a program to solve a particular problem in two ways.

Method 1: Introduce a global variable count, initialized to zero. Each time a statement in the original program is executed, count is incremented by the step count of that statement.

Example:

Algorithm Sum(a, n)
{
    s := 0;
    for i := 1 to n do
    {
        s := s + a[i];
    }
    return s;
}

Algorithm Sum with count statements added:

count := 0;
Algorithm Sum(a, n)
{
    s := 0; count := count + 1;
    for i := 1 to n do
    {
        count := count + 1;                  // for the for statement
        s := s + a[i]; count := count + 1;   // for the assignment
    }
    count := count + 1;   // for the last execution of the for-loop test
    count := count + 1;   // for the return statement
    return s;
}

Thus the total number of steps is 2n + 3.

    Method 2: The second method to determine the step count of an algorithm is to build a table in which we list the total number of steps contributed by each statement.


Statement                  s/e    Frequency    Total steps
1. Algorithm Sum(a, n)      0        -             0
2. {                        0        -             0
3.   s := 0;                1        1             1
4.   for i := 1 to n do     1       n+1           n+1
5.     s := s + a[i];       1        n             n
6.   return s;              1        1             1
7. }                        0        -             0
                                    Total:        2n+3

The s/e (steps per execution) of a statement is the amount by which count changes as a result of the execution of that statement. The frequency is the total number of times each statement is executed.

    Complexity of Algorithms: 1. Best Case: Inputs are provided in such a way that the minimum time is required to process them. 2. Average Case: The amount of time the algorithm takes on an average set of inputs. 3. Worst Case: The amount of time the algorithm takes on the worst possible set of inputs.

Example: Linear Search

Index: 1  2  3  4  5  6  7  8  9
A:     3  4  5  6  7  9 10 12 15

Best Case: Suppose we search for the element 3, to see whether it is present in the array or not. First, A(1) is compared with 3 and a match occurs, so the number of comparisons is only one. The search takes the minimum number of comparisons, so this comes under the best case.
Time complexity is O(1).
Average Case: Suppose we search for the element 7. First, A(1) is compared with 7 (3 ≠ 7): no match. Next, A(2) is compared with 7: no match. Then A(3) and A(4) are compared with 7: no match. Up to now, 4 comparisons have taken place. Now A(5) is compared with 7 (7 = 7) and a match occurs. The number of comparisons is 5. The search takes an average number of comparisons, so this comes under the average case.

Note: If there are n elements, then on average we require n/2 comparisons.
∴ Time complexity is O(n/2) = O(n) (we neglect the constant).

Worst Case: Suppose we search for the element 15. First, A(1) is compared with 15 (3 ≠ 15): no match. This process continues until either the element is found or the list is exhausted. The element is found at the 9th comparison, so the number of comparisons is 9.
Time complexity is O(n).
Note: If the element is not found in the array, then we have to search the entire array, so that also comes under the worst case.
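A runnable C version of the linear search traced above (a sketch; C arrays are 0-based, so index 0 here corresponds to A(1) in the text):

#include <stdio.h>

/* Return the index of key in a[0..n-1], or -1 if it is absent. */
int linear_search(const int a[], int n, int key) {
    for (int i = 0; i < n; i++)
        if (a[i] == key)
            return i;
    return -1;
}

int main(void) {
    int a[] = {3, 4, 5, 6, 7, 9, 10, 12, 15};
    int n = sizeof a / sizeof a[0];
    printf("3  -> index %d\n", linear_search(a, n, 3));   /* best case    */
    printf("7  -> index %d\n", linear_search(a, n, 7));   /* average case */
    printf("15 -> index %d\n", linear_search(a, n, 15));  /* worst case   */
    return 0;
}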


7. Asymptotic Notation: Accurate measurement of time complexity is possible with asymptotic notation. Asymptotic complexity gives an idea of how rapidly the space or time requirement grows as the problem size increases: given a computing device that can execute, say, 1000 complex operations per second, the size of problem that can be solved in a second, a minute, or an hour depends on the asymptotic complexity of the algorithm used. In general, asymptotic complexity is a measure of the algorithm, not of the problem. Usually the complexity of an algorithm is a function relating the input length to the number of steps (time complexity) or storage locations (space complexity). For example, a running time may be expressed as a function of the input size n as follows:

f(n) = n^4 + 100n^2 + 10n + 50 (running time)

There are three important asymptotic notations:
1. Big oh notation (O)
2. Omega notation (Ω)
3. Theta notation (Θ)
Let f(n) and g(n) be two non-negative functions.
Big oh notation: Big oh notation is denoted by 'O'. It is used to describe the efficiency of an algorithm and to represent the upper bound of an algorithm's running time. Using big O notation, we can give the largest amount of time taken by the algorithm to complete.

Definition: Let f(n) and g(n) be two non-negative functions. We say that f(n) = O(g(n)) if and only if there exist a positive constant c and a positive integer n0 such that
f(n) ≤ c·g(n) for all n ≥ n0.
Here, g(n) is an upper bound for f(n).

Example: Let f(n) = 2n^4 + 5n^2 + 2n + 3. For n ≥ 1,

f(n) ≤ 2n^4 + 5n^4 + 2n^4 + 3n^4 = (2 + 5 + 2 + 3)n^4 = 12n^4.

This implies g(n) = n^4, with c = 12 and n0 = 1.
∴ f(n) = O(n^4).

[Fig: plot of f(n) and c·g(n) against n; for n ≥ n0, c·g(n) lies above f(n).]

The above definition states that the function f is at most c times the function g when n is greater than or equal to n0. This notation provides an upper bound for the function f, i.e., g(n) is an upper bound on the value of f(n) for all n ≥ n0.
Big omega notation: Big omega notation is denoted by 'Ω'. It is used to represent the lower bound of an algorithm's running time. Using big omega notation, we can give the shortest amount of time taken by the algorithm to complete.

Definition: The function f(n) = Ω(g(n)) (read as "f of n is omega of g of n") if and only if there exist positive constants c and n0 such that
f(n) ≥ c·g(n) for all n ≥ n0.


    Example:

    . .. >1

    g(n)=n4, c=2 and n =1

    f(n)= . ... ..

    0

    (n4)

    f(n) 2n4, n>

    c*g(n)

    n0

    f(n)Let f(n) = 2n4 + 5n2 + 2n +3

    2n4 (for example as n lower order oterms are insignificant)

    > ,

Big Theta notation: Big theta notation is denoted by 'Θ'. It captures both the upper bound and the lower bound of an algorithm's running time.
Definition: Let f(n) and g(n) be two non-negative functions. We say that f(n) = Θ(g(n)) if and only if there exist positive constants c1, c2 and n0 such that
c1·g(n) ≤ f(n) ≤ c2·g(n) for all n ≥ n0.
The above definition states that the function f(n) lies between c1 times the function g(n) and c2 times the function g(n), where c1 and c2 are positive constants. This notation provides both lower and upper bounds for the function f(n), i.e., g(n) is both a lower and an upper bound on the value of f(n) for large n. In other words, theta notation says that f(n) is both O(g(n)) and Ω(g(n)) for all n ≥ n0.

Thus f(n) = Θ(g(n)) iff g(n) is both an upper and a lower bound on f(n).

Example: Let f(n) = 2n^4 + 5n^2 + 2n + 3. For n ≥ 1,

2n^4 ≤ 2n^4 + 5n^2 + 2n + 3 ≤ 12n^4,

i.e., 2n^4 ≤ f(n) ≤ 12n^4 for n ≥ 1.
So g(n) = n^4, with c1 = 2, c2 = 12 and n0 = 1.
∴ f(n) = Θ(n^4).

[Fig: plot of c1·g(n), f(n) and c2·g(n) against n; for n ≥ n0, f(n) lies between c1·g(n) and c2·g(n).]

8. Insertion Sort

Why study sorting algorithms? There are a variety of situations that we can encounter:
  o Are all keys distinct?
  o How large is the set of keys to be ordered?
  o Do we need guaranteed performance?
Various algorithms are better suited to some of these situations.
Some important definitions:
Internal Sort - The data to be sorted is all stored in the computer's main memory.


External Sort - Some of the data to be sorted might be stored in some external, slower device.
In-Place Sort - The amount of extra space required to sort the data is constant with respect to the input size.
Stable Sort - Preserves the relative order of records with equal keys.
Insertion Sort:

In the insertion sort algorithm, each iteration moves an element from the unsorted portion to the sorted portion until all the elements in the list are sorted.

Step 1: Assume that the first element in the list is in the sorted portion and that all the remaining elements are in the unsorted portion.

Step 2: Take the first element from the unsorted portion and insert it into the sorted portion at the position specified by the ordering.

Step 3: Repeat the above process until all the elements from the unsorted portion are moved into the sorted portion.

Example: The list of unsorted elements is:

    15 20 10 30 50 18 5 45

Assume that the sorted portion of the list is empty and all the elements are in the unsorted portion, as shown below:

    sorted unsorted

    15 20 10 30 50 18 5 45

Move the first element 15 from the unsorted portion to the sorted portion of the list:

sorted unsorted

15 20 10 30 50 18 5 45

To move element 20 from the unsorted to the sorted portion, compare 20 with 15 and insert it at its correct position:

sorted unsorted

    15 20 10 30 50 18 5 45

To move element 10 from the unsorted to the sorted portion: compare 10 with 20; it is smaller, so swap. Then compare 10 with 15; it is smaller, so swap. Finally, 10 is inserted at its correct position in the sorted portion of the list.

    sorted unsorted

    10 15 20 30 50 18 5 45

To move element 30 from the unsorted to the sorted portion, compare 30 with 20, 15, 10; it is larger than all of these, so 30 is inserted directly at the last position in the sorted portion of the list.

    sorted unsorted

    10 15 20 30 50 18 5 45

To move element 50 from the unsorted to the sorted portion, compare 50 with 30, 20, 15, 10; it is larger than all of these, so 50 is inserted directly at the last position in the sorted portion of the list.

    sorted unsorted

    10 15 20 30 50 18 5 45

To move element 18 from the unsorted to the sorted portion, compare 18 with 50, 30, 20, 15, 10. Since 18 is larger than 15, move 20, 30, 50 one position to the right in the list and insert 18 after 15 in the sorted portion.

    sorted unsorted

    10 15 18 20 30 50 5 45



To move element 5 from the unsorted to the sorted portion, compare 5 with 50, 30, 20, 18, 15, 10. Since 5 is smaller than all of these elements, move 10, 15, 18, 20, 30, 50 one position to the right in the list and insert 5 at the first position in the sorted list.
sorted unsorted

    5 10 15 18 20 30 50 45

To move element 45 from the unsorted to the sorted portion, compare 45 with 50, then 30. Since 45 is larger than 30, move 50 one position to the right in the list and insert 45 after 30 in the sorted list.

    sorted unsorted

    5 10 15 18 20 30 45 50

Now the unsorted portion of the list is empty, so we stop the process. The final sorted list is:

    5 10 15 18 20 30 45 50

Algorithm:

INSERTION-SORT(A)
for j ← 2 to n do
    key ← A[j]
    // insert A[j] into the sorted sequence A[1 .. j-1]
    i ← j - 1
    while i > 0 and A[i] > key do
        A[i + 1] ← A[i]
        i ← i - 1
    A[i + 1] ← key

Time Complexity Equation:

INSERTION-SORT(A)                               cost    times
for j ← 2 to n                                  c1      n
    do key ← A[j]                               c2      n-1
       // insert A[j] into sorted A[1..j-1]     0       n-1
       i ← j - 1                                c4      n-1
       while i > 0 and A[i] > key               c5      Σ_{j=2..n} t_j
           do A[i + 1] ← A[i]                   c6      Σ_{j=2..n} (t_j - 1)
              i ← i - 1                         c7      Σ_{j=2..n} (t_j - 1)
       A[i + 1] ← key                           c8      n-1

where t_j is the number of times the while-statement test is executed at iteration j. Hence

T(n) = c1·n + c2(n-1) + c4(n-1) + c5·Σ_{j=2..n} t_j + c6·Σ_{j=2..n} (t_j - 1) + c7·Σ_{j=2..n} (t_j - 1) + c8(n-1).

Best Case: The array is already sorted. In the test "while i > 0 and A[i] > key", A[i] ≤ key holds the first time the test is run (when i = j - 1), so t_j = 1 and
T(n) = c1·n + c2(n-1) + c4(n-1) + c5(n-1) + c8(n-1)
     = (c1 + c2 + c4 + c5 + c8)·n - (c2 + c4 + c5 + c8)
     = an + b = Θ(n).


Worst Case: The array is in reverse sorted order. Then A[i] > key always holds in the test "while i > 0 and A[i] > key", and key has to be compared with all elements to the left of the j-th position, i.e., with j - 1 elements, so t_j = j. Using

Σ_{j=2..n} j = n(n+1)/2 - 1   and   Σ_{j=2..n} (j - 1) = n(n-1)/2,

we have

T(n) = c1·n + c2(n-1) + c4(n-1) + c5·(n(n+1)/2 - 1) + c6·(n(n-1)/2) + c7·(n(n-1)/2) + c8(n-1)
     = an^2 + bn + c,

a quadratic function of n. So T(n) = O(n^2): the order of growth is n^2.

Average Case: The average case also takes nearly Θ(n^2).
Overall time complexity:

Best Case    Average Case    Worst Case
Ω(n)         Θ(n^2)          O(n^2)

Space complexity is O(1) (constant; it does not depend on the input size).

Advantages: 1) Good running time for "almost sorted" arrays: Ω(n).
Disadvantages: 1) Θ(n^2) running time in the worst and average cases; 2) about n^2/2 comparisons and exchanges.
Insertion sort sorts the elements in place and is a stable algorithm.
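A runnable C version of the pseudocode above (a sketch; 0-based indices replace the 1-based pseudocode):

#include <stdio.h>

/* Insertion sort: grows a sorted prefix a[0..j-1] and inserts a[j] into it. */
void insertion_sort(int a[], int n) {
    for (int j = 1; j < n; j++) {
        int key = a[j];
        int i = j - 1;
        while (i >= 0 && a[i] > key) {  /* shift larger elements right */
            a[i + 1] = a[i];
            i--;
        }
        a[i + 1] = key;                 /* drop key into its position  */
    }
}

int main(void) {
    int a[] = {15, 20, 10, 30, 50, 18, 5, 45};
    int n = sizeof a / sizeof a[0];
    insertion_sort(a, n);
    for (int i = 0; i < n; i++) printf("%d ", a[i]);  /* 5 10 15 18 20 30 45 50 */
    printf("\n");
    return 0;
}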

9. Selection Sort: In selection sort, the first element in the list is selected and compared repeatedly with all the remaining elements in the list.
Step 1: Select the first element of the list (i.e., the element at the first position in the list).
Step 2: Compare the selected element with all the other elements in the list.
Step 3: In every comparison, if any element is found smaller than the selected element (for ascending order), the two are swapped.
Step 4: Repeat the same procedure with the element in the next position in the list until the entire list is sorted.
Example:

    15 20 10 30 50 18 5 45

Iteration 1: Select the element at the first position, compare it with all the other elements in the list, and whenever a smaller element is found, swap the two. List after the 1st iteration:

    5 20 15 30 50 18 10 45

Iteration 2: Select the element at the second position, compare it with all the elements after it, and whenever a smaller element is found, swap the two. List after the 2nd iteration:

5 10 20 30 50 18 15 45

Iteration 3: Repeat the same procedure with the element at the third position. List after the 3rd iteration:

5 10 15 30 50 20 18 45

Iteration 4: Repeat with the element at the fourth position. List after the 4th iteration:

5 10 15 18 50 30 20 45

Iteration 5: Repeat with the element at the fifth position. List after the 5th iteration:

5 10 15 18 20 50 30 45

Iteration 6: Repeat with the element at the sixth position. List after the 6th iteration:

5 10 15 18 20 30 50 45

Iteration 7: Repeat with the element at the seventh position. List after the 7th iteration:

5 10 15 18 20 30 45 50

Algorithm:

SELECTION-SORT(A)
n ← length[A]
for j ← 1 to n - 1 do
    smallest ← j
    for i ← j + 1 to n do
        if A[i] < A[smallest]
            then smallest ← i
    exchange A[j] ↔ A[smallest]


SELECTION-SORT(A)                       cost    times
n ← length[A]                           c1      1
for j ← 1 to n - 1                      c2      n
    do smallest ← j                     c3      n-1
       for i ← j + 1 to n               c4      Σ_{j=1..n-1} (n-j+1)
           do if A[i] < A[smallest]     c5      Σ_{j=1..n-1} (n-j)
                 then smallest ← i      c6      Σ_{j=1..n-1} (n-j)
    exchange A[j] ↔ A[smallest]         c7      n-1

T(n) = c1 + c2·n + c3(n-1) + c4·Σ_{j=1..n-1}(n-j+1) + c5·Σ_{j=1..n-1}(n-j) + c6·Σ_{j=1..n-1}(n-j) + c7(n-1) = Θ(n^2).

For each i from 1 to n-1, there is one exchange and n-i comparisons, so there is a total of n-1 exchanges and (n-1) + (n-2) + ... + 2 + 1 = n(n-1)/2 comparisons. These observations hold no matter what the input data is.

Overall time complexity:

Best Case    Average Case    Worst Case
Ω(n^2)       Θ(n^2)          O(n^2)

Space complexity is O(1) (constant; it does not depend on the input size). Selection sort sorts the elements in place and is not a stable algorithm. It takes about n^2/2 comparisons and n exchanges.
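A runnable C version of the SELECTION-SORT pseudocode above (a sketch with 0-based indices; note it performs one exchange per pass, like the pseudocode, rather than the swap-on-every-comparison variant traced in the iterations):

#include <stdio.h>

/* Selection sort: each pass finds the smallest remaining element
   and exchanges it into position j. */
void selection_sort(int a[], int n) {
    for (int j = 0; j < n - 1; j++) {
        int smallest = j;
        for (int i = j + 1; i < n; i++)
            if (a[i] < a[smallest])
                smallest = i;
        int tmp = a[j];               /* exchange A[j] <-> A[smallest] */
        a[j] = a[smallest];
        a[smallest] = tmp;
    }
}

int main(void) {
    int a[] = {15, 20, 10, 30, 50, 18, 5, 45};
    int n = sizeof a / sizeof a[0];
    selection_sort(a, n);
    for (int i = 0; i < n; i++) printf("%d ", a[i]);  /* 5 10 15 18 20 30 45 50 */
    printf("\n");
    return 0;
}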

10. Binary Search:

Divide-and-conquer method: Divide-and-conquer is probably the best-known general algorithm design technique. The principle behind the divide-and-conquer design technique is that it is easier to solve several smaller instances of a problem than one larger instance.

The divide-and-conquer technique solves a particular problem by dividing it into one or more sub-problems of smaller size, recursively solving each sub-problem, and then merging the sub-problem solutions to produce a solution to the original problem.

Divide-and-conquer algorithms work according to the following general plan:
1. Divide: Divide the problem into a number of smaller sub-problems, ideally of about the same size.
2. Conquer: Solve the smaller sub-problems, typically recursively. If the sub-problem sizes are small enough, just solve the sub-problems in a straightforward manner.
3. Combine: If necessary, combine the solutions obtained for the smaller problems to get the solution to the original problem.
The following figure shows this:


[Fig: Divide-and-conquer technique (typical case): a problem of size n is divided into sub-problem 1 of size n/2 and sub-problem 2 of size n/2; the solutions to the two sub-problems are combined into the solution to the original problem.]

Control abstraction for the divide-and-conquer technique:

Control abstraction means a procedure whose flow of control is clear but whose primary operations are specified by other procedures whose precise meanings are left undefined.

Algorithm DAndC(P)
{
    if Small(P) then
        return S(P);
    else
    {
        divide P into smaller instances P1, P2, P3, ..., Pk, k ≥ 1;
        apply DAndC to each of these sub-problems;
        return Combine(DAndC(P1), DAndC(P2), ..., DAndC(Pk));
    }
}

Algorithm: Control abstraction for divide-and-conquer.

DAndC(P) is the divide-and-conquer algorithm, where P is the problem to be solved. Small(P) is a Boolean-valued function (i.e., either true or false) that determines whether the input size is small enough that the answer can be computed without splitting; if so, the function S is invoked. Otherwise the problem P is divided into smaller sub-problems P1, P2, P3, ..., Pk, which are solved by recursive applications of DAndC.

Combine is a function that combines the solutions of the k sub-problems to get the solution of the original problem P.
Example: Specify an application where divide-and-conquer should not be applied.
Solution: Consider the problem of computing the sum of n numbers a0, a1, ..., a(n-1). If n > 1, we divide the problem into two instances of the same problem: compute the sum of the first ⌊n/2⌋ numbers and the sum of the remaining n - ⌊n/2⌋ numbers. Once each of these two sums is computed (by applying the same method recursively), we add their values to get the sum in question:
a0 + a1 + ... + a(n-1) = (a0 + ... + a(⌊n/2⌋-1)) + (a(⌊n/2⌋) + ... + a(n-1)).
For example, the sum of the numbers 1 to 10 is computed as follows:
(1+2+3+4+...+10) = (1+2+3+4+5) + (6+7+8+9+10) = [(1+2) + (3+4+5)] + [(6+7) + (8+9+10)] = .....


= ..... = (1) + (2) + ... + (10).
This is not an efficient way to compute the sum of n numbers using the divide-and-conquer technique; for this type of problem it is better to use the brute-force method.
A recurrence relation can be solved using one of the three methods below:
1) Substitution method 2) Recursion tree method 3) Master theorem
Substitution Method:

The substitution method works the way we solve simultaneous linear equations by substitution: the value given by one equation is substituted into another. Here, the recurrence is substituted into itself repeatedly until only the base case remains, leaving a single expression that can be solved. For an example, refer to the merge sort time complexity calculation later in this unit.
Recursion Tree Method:

A recursion tree models the costs (time) of a recursive execution of an algorithm. The recursion tree method is good for generating guesses for the substitution method, though it can be unreliable, just like any method that uses ellipses. Take the recurrence T(n) = T(n/4) + T(n/2) + n^2 as an example.

Solve T(n) = T(n/4) + T(n/2) + n^2 by a recursion tree:

Level 0: n^2
Level 1: (n/4)^2 + (n/2)^2 = (5/16)·n^2
Level 2: (n/16)^2 + (n/8)^2 + (n/8)^2 + (n/4)^2 = (25/256)·n^2 = (5/16)^2·n^2
...
(the leaves contribute Θ(1) each)

Total = n^2 · (1 + 5/16 + (5/16)^2 + (5/16)^3 + ...) = Θ(n^2),

since the geometric series converges to a constant.

Master Theorem: The master method applies to recurrences of the form

T(n) = a·T(n/b) + f(n)

where a ≥ 1, b > 1, and f is asymptotically positive and of the form f(n) = Θ(n^k · log^p n). Here:
n = size of the problem;
a = number of sub-problems in the recursion, a ≥ 1;
n/b = size of each sub-problem;
f(n) = cost of the work done outside the recursive calls, i.e., dividing into sub-problems and combining their solutions.


Case 1: If a > b^k (i.e., log_b a > k), then T(n) = Θ(n^(log_b a)).
Case 2: If a = b^k (i.e., log_b a = k), then T(n) = Θ(n^k · log^(p+1) n).
Case 3: If a < b^k (i.e., log_b a < k), then T(n) = Θ(n^k · log^p n) (for p ≥ 0).

[Fig: Recursion tree for T(n) = a·T(n/b) + f(n): level 0 costs f(n), level 1 costs a·f(n/b), level 2 costs a^2·f(n/b^2), and so on; the height is h = log_b n, so the number of leaves is a^h = a^(log_b n) = n^(log_b a), each contributing Θ(1).]
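As a worked application of the master theorem (a quick check against results derived elsewhere in this unit): for the merge sort recurrence T(n) = 2T(n/2) + n, we have a = 2, b = 2, f(n) = n, so k = 1 and p = 0; since a = b^k = 2, Case 2 gives T(n) = Θ(n^1 · log^(0+1) n) = Θ(n log n). Similarly, for the binary search recurrence T(n) = T(n/2) + 1, we have a = 1, b = 2, k = 0, p = 0; again a = b^k, so T(n) = Θ(log n).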

Applications of Divide-and-Conquer: The applications of the divide-and-conquer method are:
1. Binary search
2. Quick sort
3. Merge sort

    Binary Search:

Binary search is an efficient searching technique that works only with sorted lists, so the list must be sorted before applying binary search. Binary search is based on the divide-and-conquer technique.

The process of binary search is as follows: the method starts by looking at the middle element of the list. If it matches the key element, the search is complete. Otherwise, the key element may be in the first half or the second half of the list: if the key element is less than the middle element, the search continues with the first half of the list; if the key element is greater than the middle element, the search continues with the second half. This process continues until the key element is found or the search fails, indicating that the key is not in the list.

Consider the list of elements: -4, -1, 0, 5, 10, 18, 27, 32, 33, 98, 147, 154, 198, 250, 500. Trace the binary search algorithm searching for the element -1.


Sol: The given list of elements is:

Index:  0   1   2   3   4   5   6   7   8   9   10   11   12   13   14
Value: -4  -1   0   5  10  18  27  32  33  98  147  154  198  250  500
       (low = 0, high = 14)

Searching for key '-1': first calculate mid = (low + high)/2 = (0 + 14)/2 = 7.

-4  -1   0   5  10  18  27  [32]  33  98  147  154  198  250  500
(low = 0, mid = 7, high = 14; first half | second half)

The search key -1 is less than the middle element (32) in the list, so the search process continues with the first half of the list (low = 0, high = mid - 1 = 6).
Now mid = (0 + 6)/2 = 3.

-4  -1   0  [5]  10  18  27
(low = 0, mid = 3, high = 6; first half | second half)

The search key -1 is less than the middle element (5) in the list, so the search process continues with the first half of the list (low = 0, high = mid - 1 = 2).
Now mid = (0 + 2)/2 = 1.

-4  [-1]  0
(low = 0, mid = 1, high = 2)

Here, the search key -1 is found at position 1.

The following algorithm gives the iterative binary search:

Algorithm BinarySearch(a, n, key)
{
    // a is an array of n elements; key is the element to be searched.
    // If key is found in array a, return mid such that key = a[mid];


    // otherwise return -1.
    low := 0; high := n - 1;
    while (low ≤ high) do
    {
        mid := (low + high) / 2;
        if (key = a[mid]) then
            return mid;
        else if (key < a[mid]) then
            high := mid - 1;
        else  // key > a[mid]
            low := mid + 1;
    }
    return -1;
}

The following algorithm gives the recursive binary search:

Algorithm BinSearch(a, n, key, low, high)
{
    // a is an array of size n; key is the element to be searched.
    // If key is found, return mid such that key = a[mid]; otherwise return -1.
    if (low ≤ high) then
    {
        mid := (low + high) / 2;
        if (key = a[mid]) then
            return mid;
        else if (key < a[mid]) then
            return BinSearch(a, n, key, low, mid - 1);
        else  // key > a[mid]
            return BinSearch(a, n, key, mid + 1, high);
    }
    return -1;
}

Advantages of Binary Search: The main advantage of binary search is that it is faster than sequential (linear) search, because it takes fewer comparisons to determine whether the given key is in the list than the linear search method does.

Disadvantages of Binary Search: The disadvantage of binary search is that it can be applied only to a sorted list of elements; the binary search is unsuccessful if the list is unsorted.
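A runnable C version of the iterative algorithm (a sketch; mid is computed as low + (high - low)/2, an equivalent form that avoids integer overflow for very large indices):

#include <stdio.h>

/* Iterative binary search on a sorted array a[0..n-1].
   Returns the index of key, or -1 if it is absent. */
int binary_search(const int a[], int n, int key) {
    int low = 0, high = n - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;  /* middle of the current range */
        if (a[mid] == key)
            return mid;
        else if (key < a[mid])
            high = mid - 1;                /* search the first half  */
        else
            low = mid + 1;                 /* search the second half */
    }
    return -1;
}

int main(void) {
    int a[] = {-4, -1, 0, 5, 10, 18, 27, 32, 33, 98, 147, 154, 198, 250, 500};
    int n = sizeof a / sizeof a[0];
    printf("-1 found at index %d\n", binary_search(a, n, -1));  /* index 1 */
    return 0;
}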

Efficiency of Binary Search: To evaluate binary search, count the number of comparisons in the best case, average case, and worst case.

Best Case: The best case occurs if the middle element happens to be the key element; then only one comparison is needed to find it. Thus the best-case efficiency of binary search is O(1).

Example: Let the given list be 1, 5, 10, 11, 12.

 1    5   [10]   11   12
(low)     (mid)      (high)

Let key = 10. The key is the middle element, so it is found on the first attempt.


Worst Case: Assume that, in the worst case, the key element is not in the list. Then the process of dividing the list in half continues until there is only one item left to check.

Items left to search    Comparisons so far
16                      0
8                       1
4                       2
2                       3
1                       4

For a list of size 16, there are 4 comparisons to reach a list of size one, given that there is one comparison for each division and each division splits the list size in half. In general, if n is the size of the list and C is the number of comparisons, then C = log2 n.
∴ Efficiency in the worst case = O(log n).

Average Case: In binary search, the average-case efficiency is near the worst-case efficiency, so the average-case efficiency is taken as O(log n).

Efficiency in the average case = O(log n).

Binary Search
Best Case       O(1)
Average Case    O(log n)
Worst Case      O(log n)

The same result follows from the recurrence T(n) = T(n/2) + 1. Substituting T(n/2) = T(n/4) + 1 into it gives
T(n) = T(n/4) + 1 + 1
     = ...
     = T(n/2^k) + k.
Taking 2^k = n, i.e., k = log2 n, we get T(n) = T(1) + log2 n, so the time complexity is O(log n).


11. Merge Sort
Merge sort is based on the divide-and-conquer technique. The merge sort method is a two-phase process:

1. Dividing
2. Merging

Dividing phase: During the dividing phase, the given list of elements is repeatedly divided into two parts. This division process continues until the list is too small to divide further.
Merging phase: Merging is the process of combining two sorted lists so that the resulting list is also sorted. Suppose A is a sorted list with n1 elements and B is a sorted list with n2 elements; the operation that combines the elements of A and B into a single sorted list C with n = n1 + n2 elements is called merging.

Algorithm (divide):

Algorithm Divide(a, low, high)
{
    // a is an array; low is the starting index and high is the ending index of a
    if (low < high) then
    {
        mid := (low + high) / 2;
        Divide(a, low, mid);
        Divide(a, mid + 1, high);
        Merge(a, low, mid, high);
    }
}

The merging algorithm is as follows:

Algorithm Merge(a, low, mid, high)
{
    l := low;        // index into the first half
    j := mid + 1;    // index into the second half
    k := low;        // index into the auxiliary array b
    while (l ≤ mid and j ≤ high) do
    {
        if (a[l] ≤ a[j]) then
        {
            b[k] := a[l]; k := k + 1; l := l + 1;
        }
        else
        {
            b[k] := a[j]; k := k + 1; j := j + 1;
        }
    }
    while (l ≤ mid) do
    {
        b[k] := a[l]; k := k + 1; l := l + 1;
    }


    while (j ≤ high) do
    {
        b[k] := a[j]; k := k + 1; j := j + 1;
    }
    // copy the elements of b back to a
    for i := low to high do
    {
        a[i] := b[i];
    }
}

Example: Let the list be 500, 345, 13, 256, 98, 1, 12, 3, 34, 45, 78, 92.

Dividing:

500 345 13 256 98 1 12 3 34 45 78 92
500 345 13 256 98 1 | 12 3 34 45 78 92
500 345 13 | 256 98 1 | 12 3 34 | 45 78 92
500 345 | 13 | 256 98 | 1 | 12 3 | 34 | 45 78 | 92
500 | 345 | 13 | 256 | 98 | 1 | 12 | 3 | 34 | 45 | 78 | 92

Merging:

345 500 | 13 | 98 256 | 1 | 3 12 | 34 | 45 78 | 92
13 345 500 | 1 98 256 | 3 12 34 | 45 78 92
1 13 98 256 345 500 | 3 12 34 45 78 92
1 3 12 13 34 45 78 92 98 256 345 500   (sorted list)

The merge sort algorithm works as follows:
Step 1: If the length of the list is 0 or 1, it is already sorted; otherwise:
Step 2: Divide the unsorted list into two sub-lists of about half the size.
Step 3: Again sub-divide each sub-list into two parts, continuing until each element in the list becomes a single element.
Step 4: Apply merging to each pair of sub-lists and continue this process until we get one sorted list (a runnable sketch is given below).
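A runnable C version of the Divide and Merge algorithms above (a sketch; the auxiliary array b is a fixed-size global here purely for brevity):

#include <stdio.h>

#define MAX 100
static int b[MAX];   /* auxiliary array used by merge */

/* Merge the sorted runs a[low..mid] and a[mid+1..high]. */
void merge(int a[], int low, int mid, int high) {
    int l = low, j = mid + 1, k = low;
    while (l <= mid && j <= high)
        b[k++] = (a[l] <= a[j]) ? a[l++] : a[j++];
    while (l <= mid)  b[k++] = a[l++];   /* leftovers from the first half  */
    while (j <= high) b[k++] = a[j++];   /* leftovers from the second half */
    for (int i = low; i <= high; i++)
        a[i] = b[i];                     /* copy the merged run back */
}

void merge_sort(int a[], int low, int high) {
    if (low < high) {
        int mid = (low + high) / 2;
        merge_sort(a, low, mid);
        merge_sort(a, mid + 1, high);
        merge(a, low, mid, high);
    }
}

int main(void) {
    int a[] = {500, 345, 13, 256, 98, 1, 12, 3, 34, 45, 78, 92};
    int n = sizeof a / sizeof a[0];
    merge_sort(a, 0, n - 1);
    for (int i = 0; i < n; i++) printf("%d ", a[i]);
    printf("\n");
    return 0;
}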


Efficiency of merge sort: Let n be the size of the given list; the running time for merge sort is given by the recurrence relation

T(n) = a                 if n = 1 (a is a constant)
T(n) = 2T(n/2) + cn      if n > 1 (c is a constant)

Assume that n is a power of 2, i.e., n = 2^k; this can be rewritten as k = log2 n.
Let T(n) = 2T(n/2) + cn ...(1). We can solve this equation by successive substitution. Replacing n by n/2 in equation (1), we get T(n/2) = 2T(n/4) + cn/2. Thus

T(n) = 2[2T(n/4) + cn/2] + cn
     = 4T(n/4) + 2cn
     = 4[2T(n/8) + cn/4] + 2cn
     = 8T(n/8) + 3cn
     ...
     = 2^k·T(1) + k·cn
     = a·n + cn·log2 n        (∵ k = log2 n)

∴ T(n) = O(n log n).

Merge Sort
Best case           O(n log n)
Average case        O(n log n)
Worst case          O(n log n)
Space complexity    O(n)

(Note: merge sort performs the same Θ(n log n) amount of dividing and merging work regardless of the input order, which is why the best case is also O(n log n).)

12. Quick Sort

Quick sort is considered to be a fast method to sort the elements. It was developed by C. A. R. Hoare. This method is based on the divide-and-conquer technique: the entire list is divided into various partitions and sorting is applied again and again on these partitions. This method is also called the partition exchange sort. Quick sort can be illustrated by the following example:

12 6 18 4 9 8 2 15

The reduction step of the quick sort algorithm finds the final position of one of the numbers. In this example we use the first number, 12, which is called the pivot element. This is accomplished as follows. Let i be the position of the second element and j be the position of the last element, i.e., i = 2 and j = 8 in this example. Assume that a[n+1] = +∞, where a is the array of size n.

 12   6  18   4   9   8   2  15
[1] [2] [3] [4] [5] [6] [7] [8] [9]
    i=2                     j=8


First scan the list from left to right (from i towards j) and compare each element with the pivot. This process continues until an element greater than or equal to the pivot element is found; that element's position becomes the new value of i. Now scan the list from right to left (from j towards i) and compare each element with the pivot. This process continues until an element less than or equal to the pivot element is found; that element's position becomes the new value of j. Now compare i and j. If i < j, then interchange a[i] and a[j] and continue the two scans; otherwise (i ≥ j), interchange the pivot with a[j], which places the pivot at its final position and partitions the list into two sub-lists around it.

Now take each sub-list that has more than one element and follow the same process as above. At last, we get the sorted list, that is:

2 4 6 8 9 12 15 18

The following shows the quick sort algorithm:

Algorithm QuickSort(p, q)
{
    // sorts the array from a[p] through a[q]
    if (p < q) then
    {
        j := Partition(a, p, q + 1);   // j is the final position of the pivot
        QuickSort(p, j - 1);
        QuickSort(j + 1, q);


        // there is no need for a combining step
    }
}

Algorithm Partition(a, left, right)
{
    // The elements a[left..right-1] are rearranged so that, if initially pivot = a[left],
    // then after completion a[j] = pivot for some j; j is the position where the pivot
    // partitions the list into two partitions. Note that a[right] = +∞.
    pivot := a[left]; i := left; j := right;
    repeat
    {
        repeat i := i + 1; until (a[i] ≥ pivot);
        repeat j := j - 1; until (a[j] ≤ pivot);
        if (i < j) then Interchange(a, i, j);
    } until (i ≥ j);
    a[left] := a[j]; a[j] := pivot;   // place the pivot at its final position
    return j;
}
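A runnable C version of QuickSort and Partition (a sketch; an explicit bounds check on i replaces the a[right] = +∞ sentinel assumed by the pseudocode):

#include <stdio.h>

static void swap(int *x, int *y) { int t = *x; *x = *y; *y = t; }

/* Partition a[left..right] around pivot = a[left];
   returns the pivot's final position. */
int partition(int a[], int left, int right) {
    int pivot = a[left];
    int i = left, j = right + 1;
    do {
        do { i++; } while (i <= right && a[i] < pivot);  /* scan left to right */
        do { j--; } while (a[j] > pivot);                /* scan right to left */
        if (i < j) swap(&a[i], &a[j]);
    } while (i < j);
    swap(&a[left], &a[j]);   /* place the pivot at its final position */
    return j;
}

void quick_sort(int a[], int p, int q) {
    if (p < q) {
        int j = partition(a, p, q);
        quick_sort(a, p, j - 1);
        quick_sort(a, j + 1, q);
    }
}

int main(void) {
    int a[] = {12, 6, 18, 4, 9, 8, 2, 15};
    int n = sizeof a / sizeof a[0];
    quick_sort(a, 0, n - 1);
    for (int i = 0; i < n; i++) printf("%d ", a[i]);  /* 2 4 6 8 9 12 15 18 */
    printf("\n");
    return 0;
}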


Best Case: In the best case, the pivot divides the list into two equal halves every time, so a list of size n is halved repeatedly until sub-lists of size 1 remain; if y is the number of levels of division, then 2^y = n, which can be rewritten as y = log2 n. At each level, the partitioning work across all sub-lists is O(n), so the total number of comparisons is O(n) + O(n) + ... (y terms) = O(n · y).
∴ Efficiency in the best case is O(n log n) (∵ y = log2 n).

Worst Case: In the worst case, the pivot partitions the list into two parts such that one partition has no elements while the other has all the remaining elements.


The total number of comparisons will then be

n + (n-1) + (n-2) + ... + 2 = n(n+1)/2 - 1 = O(n^2).

Thus, the efficiency of quick sort in the worst case is O(n^2).
Average Case: Let C_A(n) be the average number of key comparisons made by quick sort on a list of size n. Assuming that the partition split can happen at each position k (1 ≤ k ≤ n) with the same probability 1/n, we get the recurrence relation

C_A(n) = (n + 1) + (1/n)·Σ_{k=1..n} [C_A(k - 1) + C_A(n - k)],

which solves to C_A(n) ≈ 2n·ln n ≈ 1.39·n·log2 n, i.e., O(n log n).

Another way to see the average-case behavior: suppose every partition splits the list in the ratio 1:3. The left child of each node then represents a sub-problem 1/4 as large, and the right child represents a sub-problem 3/4 as large.

There are log_{4/3} n levels, each with O(n) partitioning work, so the total partitioning time is O(n · log_{4/3} n). Now, there is a mathematical fact that

log_a n = log_b n / log_b a

for all positive numbers a, b and n. Letting a = 4/3 and b = 2, we get

log_{4/3} n = log n / log(4/3),

so O(n · log_{4/3} n) is still O(n log n).

Quick Sort
Best Case       O(n log n)
Average Case    O(n log n)
Worst Case      O(n^2)