Algorithms and Complexity, I

Anders Yeo

CS2860 September 2011

Department of Computer Science
Egham, Surrey TW20 0EX, England


Abstract

The aim of these notes is to provide the reader with a toolbox of useful algorithms and a comfortable familiarity with the kinds of data structures that they rely on. We do this by studying algorithms in isolation as small programs doing not-very-useful jobs. We establish some order within the presentation by looking at families of algorithms grouped firstly by their function (that is, what they are supposed to do) and then by their strategy (that is, how they do what they do). We introduce engineering techniques that provide analyses of running programs and measure the efficiency of a particular implementation running with a particular input data set. We also develop mathematical tools that allow us to say something about the general behaviour of an algorithm, independently of the kind of computer it is running on and in terms of the amount of data to be processed.

As well as the standard families of sorting and searching problems, we look at heuristic solutions to combinatorial search problems, and provide literature pointers to more advanced works dealing with topics such as probabilistic and parallel algorithms.

This document is © Anders Yeo 2005.

Several parts of this document are inspired by the CS2490 blue book by Adrian Johnstone and Elizabeth Scott.

Permission is given to freely distribute this document electronically and on paper. You may not change this document or incorporate parts of it in other documents: it must be distributed intact.

Please send errata to the author at the address on the title page or electronically to [email protected].


Contents

1 The efficiency of programs
1.1 The speed of computers
1.2 The study of algorithms
1.3 Is an algorithm a program?
1.4 The relationship between data structures and algorithms
1.5 Pseudo-code and the telephone book structure
1.6 Searching
1.6.1 Linear search
1.6.2 Binary search
1.7 Sorting
1.7.1 Bubble sort
1.7.2 Insertion sort
1.8 Algorithms based on indexable data structures compared to reference based structures
1.9 Concrete implementations
1.9.1 Bubble sort on an array
1.9.2 Insertion sort on an array
1.9.3 Bubble sort on the phone book
1.9.4 Insertion sort on the phone book
1.10 The performance of bubble sort
1.11 Rates of growth and tractability
1.12 Instrumenting a program – counting operations
1.13 Counting operations by hand

2 Divide and conquer?
2.1 Merge sorts
2.1.1 Sorting in two halves
2.1.2 Merge sort 0
2.1.3 Worst case analysis of merge sort 0
2.1.4 Merge sort 1
2.1.5 Worst case analysis of merge sort 1
2.1.6 Java code for merge sort 1
2.2 Quick sort
2.2.1 Partition algorithm
2.2.2 Worst case analysis of quick sort
2.2.3 Java code for quick sort
2.2.4 Choosing the pivot value
2.2.5 Worst case analysis
2.3 Heap sort
2.3.1 Heaps
2.3.2 Turning arrays into heaps
2.3.3 Sorting heaps
2.3.4 Complexity analysis of heap sort
2.3.5 Java code for heap sort

3 So what do we mean by complexity?
3.1 Big-O notation - O(f(n))
3.2 Omega notation - Ω(f(n))
3.3 Theta notation - Θ(f(n))
3.4 Small-o notation - o(f(n))
3.5 A hierarchy of orders
3.6 Worst case analysis
3.7 Best case analysis
3.7.1 Best case analysis of bubble sort
3.7.2 Best case analysis of merge sort 1
3.7.3 Best case analysis of quick sort
3.8 Average case analysis
3.8.1 Expected and typical values
3.8.2 Average case analysis of linear search
3.8.3 Average case analysis of merge sort
3.8.4 Average case analysis of quick sort

4 Searching algorithms
4.1 Overview
4.2 Doing better than linear search
4.2.1 Ordering the data?
4.2.2 Binary search
4.2.3 Interpolation search
4.3 Dealing with dynamic data structures
4.4 Hash coding
4.5 Binary search trees
4.5.1 Traversing binary trees
4.5.2 Structure of binary search trees
4.5.3 Optimum search trees
4.5.4 Balanced trees
4.5.5 Rotations in a binary search tree
4.5.6 Insertion in a balanced binary tree
4.5.7 Building balanced binary search trees
4.6 Multiway search trees
4.6.1 Insertion in multiway search trees
4.6.2 B-trees

5 String matching
5.1 The naive string matching algorithm
5.2 The Rabin-Karp algorithm

6 Game Theory
6.1 Drop-down tic-tac-toe
6.2 Evaluation functions
6.3 Thoughts on min-max trees
6.4 Alpha-Beta pruning and pseudo codes
6.5 Pseudocode for the min-max algorithm
6.6 Alpha-Beta pruning
6.6.1 Pseudocode for the Alpha-Beta pruning
6.6.2 General thoughts on Alpha-Beta pruning

A CS2860 questions

B Some useful inductive proofs
B.1 Examples


Chapter 1

The efficiency of programs

On first acquaintance, computers seem impossibly fast. Forty years ago, nearly all accounting was done by hand. Armies of clerks used to process payroll data for large companies, but within a very few years, between 1955 and 1965, most of this activity was displaced by the arrival of computer-based payroll systems. These could calculate and print pay slips at a rate that would be astounding if it had not become so familiar to us.¹ Neophyte computer users, that is, people who are new to computing, find it hard to believe that we find our machines too slow.

Amongst everyday computer users a different view prevails, which we might summarise as the ‘there’ll be a faster one along in a moment’ syndrome. These technology buffs believe that although they might find their machines a bit sluggish sometimes, the hardware manufacturers will soon be able to sell them a machine that is twice as fast for no extra money. There is good reason to accept this view. The performance of computers in the £1–2,000 bracket has shown spectacular improvements over the last twenty years. Imagine how it would be if motor cars showed similar improvements.

There is another commonly held view, which is that no matter how much faster the hardware becomes, the software vendors will load more and more functionality into their products in a way that effectively nullifies the improvement and forces everybody to continuously upgrade their computers. This is a natural thing for the computer industry to do: they exist to sell new machines and new software, so it is important to them that products should become rapidly obsolescent.

As academic computer scientists we need to recognise that there is some truth in the views of each of the neophytes, the technology buffs and the cynics. However, our task is to study the underlying engineering and theoretical realities. We are not interested here in why large software companies do what they do, but in what could be done in an ideal world. In this course, we study fundamentals, not the effects of marketing.

The purpose of this course is to study the art of the possible in computer programming. We do this by measuring the performance of computer programs and developing insights that allow us to design programs that are efficient.

¹ You can read the story of one of the pioneering commercial computers in [Bir94].


All real computer systems are finite: they have limited memory (both core and disk space); they have limited network bandwidth; and they have limited execution speed. Our aim is to write programs that deliver (provably) acceptable performance within those finite limitations.

1.1 The speed of computers

Real computer systems also cost real money, and at any given time a computer that is ten times as fast as another will cost considerably more than ten times as much. In practice, therefore, high performance computers have a high price/performance ratio compared to cheap, low performance computers, and buying extra computing resource becomes increasingly expensive as the demands increase. To make matters worse, at any given time in the history of computing the range of available computers has rarely displayed more than a factor of a few hundred in relative performance. If we need a computer that is millions of times faster than presently available technology, we are unlikely to be able to buy it at any price.

These facts of life have in recent years been obscured by the rapid improvement in technology with time. For many years during the 1950’s to 1970’s it turned out that execution speed and memory capacity of the best available computers tended to improve by roughly 20–25% per year. During the late 1970’s and 1980’s the emergence of the microprocessor fuelled a growth rate of around 30% per year which was, incidentally, coupled to an extraordinary plummeting in the cost of systems. This increase in the rate of improvement and drop in cost was caused by the shift to highly integrated systems which could immediately benefit from the improvements in technology provided by the integrated circuit processing engineers. In the late 1980’s and through to the present day the adoption of new ways of designing computers has provided architectural improvements that have, in 1994 for instance, generated a year-on-year performance improvement of 158%. (Source: [HP96, pp 1–3])

These engineering achievements, whilst impressive, still only yield a net performance improvement for the most powerful available computer of around 1,000 times over the last thirty years. The fact that prices have dropped at a much greater rate has had an enormous impact on society, in that computers have found many new cost-effective applications such as word-processors and spreadsheets. However, the absolute limit on what is feasible to computerise, as opposed to what is cost-effective to computerise, is not much affected by a factor of 1,000 performance increase. The harsh reality of life for the Computer Scientist is that many interesting and deceptively simple problems, such as finding the shortest route between fifty towns, require such astounding amounts of computation that they are infeasible on any realistic computer.²

² All is not lost! If we are prepared to accept a short route between fifty towns, rather than insisting on finding the guaranteed shortest route, then we do know how to write a program that finishes in a reasonable time. This will be discussed more in the second term.


1.2 The study of algorithms

In the pioneering days of computing (a period between 1945 and the arrival of the first commercial computers in the mid-1950’s) programmers and designers were for the most part wholly concerned with working out how to program solutions to individual practical problems rather than philosophising on the generalised aspects of computing. Efficiency was certainly a great concern given the low performance of those machines, but problems and their solutions were considered together as an amalgam of a particular technique and a particular machine. The generalised study of methods by which problems might be attacked came later, after considerable experience had been built up of real programs. These methods are called algorithms, to distinguish them from particular programs that implement the method specified by the algorithm.

By the early 1960’s books on the design and analysis of algorithms were beginning to appear which offered three things:

1. a compendium of algorithms for solving standard problems,

2. mathematical techniques for analysing the behaviour of algorithms in terms of their performance on input data sets of various sizes,

3. taxonomies (classifications by family) of algorithms that reveal their underlying strategies and illuminate the issues involved in the design of novel algorithms.

We will consider all three of these topics in this course.

1.3 Is an algorithm a program?

In short, no, although algorithms are the foundation of programs. A famous book by the designer of the Pascal language (Niklaus Wirth) is called Algorithms + Data Structures = Programs, which rather sums up the conventional view of computer programming. In this view, when programming a solution to a problem you first select relevant data structures with which to model the elements of your problem inside the computer, then you think of a broad brush solution to your problem which may be refined into individual operations such as sorting a set of numbers, locating a particular record and so on. Finally, these basic data level operations are programmed using a particular selected algorithm that is known to be efficient on the chosen data structure.

Becoming proficient in this style of program design requires a familiarity with elementary data structures such as arrays, queues, lists, stacks, trees and graphs, along with a working repertoire of efficient algorithms for completing common tasks. Real programs are often made up of a network of largely independent data structures and algorithms which are used as necessary to assemble an overall solution to some problem. A range of data structures and algorithms is deployed within each single program, and the programmer needs to be familiar with a rather wide range of techniques if they are to be effective.


One of the aims of this course, therefore, is to equip you with such a toolbox of algorithms and a comfortable familiarity with the kinds of data structures that you have already encountered as part of your elementary programming courses. We will do this by studying algorithms in isolation as small programs doing not-very-useful jobs. We will establish some order within the presentation by looking at families of algorithms grouped firstly by their function (that is, what they are supposed to do) and then by their strategy (that is, how they do what they do). Along the way, we will introduce engineering techniques that allow us to analyse running programs and measure the efficiency of a particular implementation running with a particular input data set, and we will also develop mathematical tools that allow us to say something about the general behaviour of an algorithm, independently of the kind of computer it is running on and in terms of the amount of data to be processed.

Occasionally, we shall stray away from the conventional doctrine of algorithms and data structures by looking at approaches to programming that represent the problem to be solved as a generalised cost that must be manipulated to find a solution, in a way that may involve some statistically random behaviour on the part of the computer. At the end of the course we will have a very brief look at some of the more intriguing aspects of program design that involve other unconventional approaches, and give some pointers into the relevant computing literature.

1.4 The relationship between data structures and algorithms

The independent selection of (a) data structures to represent the quantities in a program and (b) algorithms to operate on those structures is rarely helpful. The key observation is that different ways of representing data make explicit different relationships within the data. If the structure makes explicit a useful property of the data, then algorithms working with that property will be more efficient.

Perhaps the simplest example is the subscriber telephone directory. This is a data structure comprising a list of telephone users and their telephone numbers, in alphabetical order of the subscriber's name. To find the number for a particular subscriber we look up the subscriber's name, which can be done in a variety of reasonably efficient ways, as we shall see. Humans tend to use a rather informal algorithm which we might call flicking through the pages. We make a guess at where in the directory the name might be, based on our subscriber's surname. If we find we have opened the directory too far in, we jump backwards and try again. If we have found a page that comes before the one we are interested in, then we jump forwards and try again. This informal algorithm, which is based on triangulating down onto the result, is the basis of a very efficient search algorithm called binary search, which we shall examine in section 1.6.2.

Imagine, on the other hand, that we had been asked to find the subscriber belonging to a particular telephone number.


Since telephone numbers are allocated to names in a way that is, for all intents and purposes, random, the best we can do with a conventional telephone directory is to read it entry by entry, searching for the right number. The order in which we search doesn't matter, but we often call this kind of searching linear or sequential searching because we can't do any better than looking at the numbers one-by-one, starting at the beginning. In the worst case (being given a telephone number that does not appear in the particular directory we are searching) we shall have to check every single telephone number before getting a result. Doing this manually would be so time consuming that we might consider it intractable.

If, on the other hand, we had a telephone directory ordered by number, the lookup-by-number problem becomes essentially identical to the lookup-by-name problem and may be solved as easily. We can tell, as soon as we see a number greater than the one we are looking for, that we have gone too far. On the other hand, searching by name has become intractable unless we keep both copies of the telephone directory to hand.

A related problem is the location of an example of a particular kind of subscriber, such as bakers or car repair shops. The telephone companies do provide us with a data structure to make this kind of searching easier: it is called the Yellow Pages and comprises a classified list of subscribers by business type. Within each class of business the subscribers are again arranged in alphabetical order, and it is normal to scan these in turn to find a business which is in the right area. This is another linear search, but it is a much less intimidating one because the number of subscribers in each class is small – typically a few pages' worth at most.

The order in which data elements are held is only one aspect of the way in which choice of data structure affects the efficiency of algorithms. It turns out that representing a list of records as a sorted array allows some searching operations to be performed more efficiently than if the list is represented by a linked-list of records, regardless of whether they are sorted or not. On the other hand, putting new records into a linked-list representation is more efficient than putting new records into a large array of data. We shall have more to say on the distinction between data structures that allow direct access (indexable structures) such as arrays and data structures that force sequential access to records (non-indexable structures) such as linked lists in section 1.8. First, though, we shall look at some real algorithms.

1.5 Pseudo-code and the telephone book structure

We will use the metaphor of the telephone book to explore the properties of four simple algorithms. We shall illustrate most of the algorithms discussed in this section by implementing them for use with the telephone book, and we shall provide a copy of the telephone book and Java implementations of some of the algorithms for you to experiment with.

We are going to use an informal pseudo-code to describe most of the algorithms, which looks basically like a simple mixture of C++, Java and Pascal. Later on, we shall also give some real implementations of these algorithms in Java. In general we shall use the Pascal symbol, :=, for assignment and the Java


symbol, ==, for ‘equals’. If an element, array say, is an array, we shall write array[i] for the entry at the ith index in the array, and as in Java, array indexes will start at 0. If an element is a record, record say, with fields name1, name2 etc., then we shall write record.name2 for the entry in field name2 of record. We shall also use a generic print statement which takes as many arguments as you like, separated by commas. Arguments enclosed in quotes are treated as strings and printed literally; all other arguments are treated as identifiers and their values will be printed. We shall declare identifiers in C/Java style, for example

integer n /* declares an integer n */

integer a[6] /* declares an array of integers of size 6 */

phone_book_type        /* declares a new type which has a record structure */
    string name
    integer number

phone_book.name := "john" /* assigns the string john to the record */

1.6 Searching

Searching is perhaps the most common large-scale application for computer systems. All kinds of organisations now hold databases ranging in size from a few dozen to many millions of records. Of course, databases are not useful unless we can also provide efficient algorithms to retrieve information by searching for records with a particular property. We will now examine two algorithms with very different efficiencies: the linear and binary searches.

1.6.1 Linear search

For our first example, let us code up the steps that a human would have to go through if they had to perform the lookup of a person in a telephone directory by number. We will begin with perhaps the simplest search algorithm: the linear search. The idea is to simply scan up sequentially through the phone book records until we find a record that matches the target.

Since this is our first algorithm, we include the declarations and initialisations described above for completeness.


1   phone_book_type integer number,
                    string name
2
3   integer phone_book_size := 60_000_000
4   phone_book_type phone_book[phone_book_size]
5   integer index := 0
6
7   while (index < phone_book_size) &&
          (phone_book[index].number != target.number)
8     index := index + 1
9
10  if index == phone_book_size
11    print("Target ", target.number, " not found")
12  else
13    print("Target ", target.number,
                " belongs to ", phone_book[index].name)

In line 5 a variable called index is declared and initialised to zero, the index value for the first record in the array phone_book[]. (The code for loading this array has not been included.) The while loop in lines 7–8 then scans up through the list, comparing the number in the target record with the phone numbers in the array.

The while loop condition is in two parts, so there are two ways in which the loop might terminate: either a matching record might be found, or the end of the array might be encountered. The if statement in lines 10–13 checks to see whether the index has run off the end of the array, and if not it prints out the name of the corresponding subscriber.

In the worst case (a missing number) this algorithm will compare the target against every record in the phone book, so its execution time will grow in proportion to the length of the list. Can we do any better? Well, if nothing is known about the ordering of the numbers in the directory then no. If on the other hand we can preprocess the directory by sorting it into numerical order, then a very efficient algorithm called binary search may be used.
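For concreteness, here is a minimal Java sketch of the same linear search, written for a plain array of integers rather than the phone book (the class and method names here are ours, not part of the course code):

// Minimal linear search sketch: returns the index of target in numbers,
// or -1 if the target does not occur. Worst case: n comparisons.
public class LinearSearch {
    static int linearSearch(int[] numbers, int target) {
        for (int i = 0; i < numbers.length; i++)
            if (numbers[i] == target)
                return i;               // search succeeded
        return -1;                      // search failed
    }

    public static void main(String[] arg) {
        int[] numbers = {7843425, 3772980, 6297732, 5641863, 1737592, 4601007};
        System.out.println(linearSearch(numbers, 5641863)); // prints 3
        System.out.println(linearSearch(numbers, 1234567)); // prints -1
    }
}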

1.6.2 Binary search

The binary search is a distillation of the algorithm that people seem to use when looking up a name in the telephone directory. In section 1.4 above we noted that people first guess where in the telephone directory the name is, and then jump forwards or backwards as necessary, ‘homing in’ on the right answer.

The guesses that people make are based on their knowledge of the distribution of names in everyday life: it is pretty likely that a name beginning with the letter C will be near the front, so we guess a little way in. In general, when we are sorting arrays of data we do not know the distribution of the data, so we cannot exploit such knowledge. We want to balance the two probabilities that the record we want will be (a) before our first guess and (b) after our first guess. In the absence of any information on the distribution, the best we can do is to choose the record in the middle of the array. Over a large number of


random choices of target record, we would expect roughly half of them to be before our guess and half of them after.

Having made our initial guess, we can look to see if we have found the right record (in which case we stop) or if we have to guess again. If we must guess again, we can narrow the choice down to one half of the array. This is a powerful technique: simply by making one comparison between the target and the record in the middle of the array we have managed to rule out half of the records in our array.

For our second guess, we cannot, over the long run, do better than to guess the record that is in the middle of the half of the array containing our target. The guess may succeed, but if not we can simply throw away half of that segment and guess again in the middle of the other half of the segment. We continue in this way, discarding half of the remaining records at each step, until either we find a record that matches the target or we run out of records to check.

5   integer index
6
7   binary_search(integer low_index, high_index)
8
9     if low_index > high_index
10      index := -1    // search failed
11    else
12      integer mid_index := (low_index + high_index) / 2
13      if phone_book[mid_index].number == target.number
14        index := mid_index    // search succeeded
15      else    // search continues
16        if target.number < phone_book[mid_index].number
17          binary_search(low_index, mid_index - 1)
18        else
19          binary_search(mid_index + 1, high_index)
20
21
22
23
24  binary_search(0, phone_book_size - 1)
25
26  if index == -1
27    print("Target ", target.number, " not found")
28  else
29    print("Target ", target.number,
                " belongs to ", phone_book[index].name)

As in our linear search algorithm, we declare a variable called index in line 5, although in this case it is only used to hold the final result of the search rather than playing any role in the search algorithm itself. The printing of the results in lines 26–29 is made a little easier because this algorithm sets the value of index to -1 (an illegal array index value) if the search fails.

The body of the code in lines 7–22 defines a recursive function called binary_search that first checks the middle record, and if that is not the record


sought, calls itself again on one half of the array. The whole search process is started by the call to the function binary_search in line 24 (note that 0 is the first index of the array and phone_book_size - 1 is the last index of the array to be searched).

You will recall that the linear search required, in the worst case, every record to be compared against the target, and so execution times rose in proportion to the number of records in the phone book. The worst case for searching algorithms is in general triggered by being asked to search for a record that is not in the set being searched. What is the worst case for binary search? Well, at each stage of the process we discard (approximately) half of the remaining records. Eventually we have no records left, at which point the process must terminate. The worst case performance for n phone book records therefore amounts to the number of times we can apply the divide-by-two operation to n, ignoring fractions, until we have less than one left. This number is the logarithm to base 2 of n, written log₂ n. In this course, when we speak of logarithms they are usually to base 2, so we will simply write log n. We will insert an explicit base if necessary.

Let us compare the number of comparisons required by the linear and binary search algorithms for different sized phone books.

Linear search (n)    Binary search (approximately log n)
16                   4
256                  8
65,536               16
1,048,576            20
1,073,741,824        30

Note that the above values are approximate, and we will compute more accurate values later in this blue book.
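To make the comparison concrete, here is a minimal iterative Java sketch of binary search on a sorted integer array (the class and method names are ours; the course's own Java implementations appear in section 1.9):

// Minimal binary search sketch on a sorted int array.
// Returns the index of target, or -1 if absent; at most about log2(n) probes.
public class BinarySearch {
    static int binarySearch(int[] sorted, int target) {
        int low = 0, high = sorted.length - 1;
        while (low <= high) {
            int mid = (low + high) / 2;      // middle of the remaining segment
            if (sorted[mid] == target)
                return mid;                  // search succeeded
            else if (target < sorted[mid])
                high = mid - 1;              // discard the upper half
            else
                low = mid + 1;               // discard the lower half
        }
        return -1;                           // search failed
    }

    public static void main(String[] arg) {
        int[] sorted = {1737592, 3772980, 4601007, 5641863, 6297732, 7843425};
        System.out.println(binarySearch(sorted, 4601007)); // prints 2
        System.out.println(binarySearch(sorted, 999));     // prints -1
    }
}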

We can store a thousand million records (one billion in the US) and search them all in only 30 steps – a truly remarkable result. To achieve this, all we need to do is pre-sort the records by the field that we are interested in. Let us now look at some sorting algorithms.

1.7 Sorting

Watching people sort can give some insights. Given a stack of cards to sort, most people perform something close to what we call an insertion sort. They pick a card off the top and then try to place it within the body of the cards in a sequence with its neighbours. The choice of cards, and the places they are inserted into the deck, seem to depend on the cards as they present themselves, along with the whims of the (human) sorter. We shall look at a more methodical way of performing insertion sorts in the next section. First we look at what many people find to be the conceptually easiest sorting algorithm: the bubble sort.


1.7.1 Bubble sort

Imagine that our phone book is in random order and we want to sort it into order by number. We can examine the first two entries. If the lower entry rightfully belongs above the upper entry then we can exchange them, and they will now be in correct relative order. Now let us examine entries two and three. Entry two will be the result of our last comparison, but it might need to be above entry three, in which case we exchange again. We keep doing this, checking a pair, optionally exchanging, and then using the highest result as part of the next pair, until we have worked our way up to the top of the array. After this pass we can guarantee that the highest entry in the phone book really is the highest (largest) phone number. We say that this number has ‘bubbled’ up to the top, hence the name. This is all very well, but we can only guarantee that the highest element is now in its correctly sorted position. We need to repeat the process again, looking at every remaining pair of elements up to but not including the last one, to get the next sorted record in place. In fact, we need to make as many passes as there are records in the phone book, less one (why?)

Here is a worked example for a set of six telephone numbers.

index  initial   Pass 1    Pass 2    Pass 3    Pass 4    Pass 5
5      4601007   7843425   7843425   7843425   7843425   7843425
4      1737592   4601007   6297732   6297732   6297732   6297732
3      5641863   1737592   4601007   5641863   5641863   5641863
2      6297732   5641863   1737592   4601007   4601007   4601007
1      3772980   6297732   5641863   1737592   3772980   3772980
0      7843425   3772980   3772980   3772980   1737592   1737592

Here is some pseudo-code to bubble sort our pre-loaded phone_book array.

5   integer pass
6
7   for pass from 1 to phone_book_size - 1 do
8
9     integer element
10
11    for element from 0 to (phone_book_size - 1) - pass do
12      if (phone_book[element].number > phone_book[element + 1].number)
13        phone_book_type temp := phone_book[element]
14        phone_book[element] := phone_book[element + 1]
15        phone_book[element + 1] := temp
16
17
18

The bubble sort comprises two nested for loops: an outer one based on the induction variable pass, which scans across the entire phone book, and an inner one based on element, which scans across the unsorted part of the array.

For each value of element, we check the book entries phone_book[element] and phone_book[element + 1]. If they are out of order then we interchange them through a temporary record called temp.


It is easy to see that a bubble sort of n records requires n−1 comparisons on the first pass, n−2 on the second, and so on, down to 1 on the last of its n−1 passes. For the six-number example above this gives 5+4+3+2+1 = 15 comparisons. Its execution time is therefore proportional to the square of the number of records. We will discuss this more later in this chapter.

1.7.2 Insertion sort

Bubble sort is a type of exchange (or interchange) sort, so called because the dominant action is the swapping over of two elements. Exchanging is a natural and easy operation to implement because it is localised: when we exchange two elements, the rest of the array is unchanged.

As we have already noted, people tend to use a different style of sorting in which they make a small sorted sequence and then insert elements into it. The problem with this, from our point of view as computer programmers, is that the only way we can insert an element into an array is to make a gap at the required place by moving part of the array up one slot. We will now look at a version of the insertion sort which attempts to minimise this disruption to the array by inserting in a very methodical fashion.

A naive implementation of insertion sort might start searching our already-sorted section of the array at the first record and scan up until it finds an element that is greater than the target element, then continue the scan up to the end of the array, moving elements up so as to make a space for the element to be inserted. Actually, we can do a little better than this by recognising that we are going to have to move elements up anyway, so we might as well combine the searching and moving operations into a single scan of the top part of the array, in which elements are checked and moved at the same time. Here is an algorithm which takes an unsorted array phone_book[] and sorts it in place, using a single temporary phone book record.

5   integer pass
6
7   for pass from 1 to phone_book_size - 1 do
8     integer count
9     phone_book_type temp := phone_book[pass]
10
11    count := pass - 1
12    while (count >= 0) &&
              (temp.number < phone_book[count].number) do
13      phone_book[count + 1] := phone_book[count]
14      count := count - 1
15
16    phone_book[count + 1] := temp    // insert
17

This algorithm makes n − 1 passes over the phone book, controlled by the for pass ... do loop which starts at line 7. On each pass, it starts at the record indexed by pass, which it copies to a temporary variable temp. The while loop in lines 12–15 then runs down the array, moving each element up by one, until it finds an element that is less than or equal to temp. At this point,


all of the elements that are greater than temp have been moved up by one slot, and there is a ‘gap’ beneath them which in line 16 is filled by inserting temp.

Is this algorithm any faster than bubble sort? Well, in the worst case we might have to check every record on each pass, so there would be 1 + 2 + 3 + ... + (n − 2) + (n − 1) comparisons altogether. This is the same as for the bubble sort, so in this worst case there seems no advantage. We shall look at the algorithm's behaviour versus bubble sort again later and try them on partially sorted data.

It turns out that it is much easier to analyse the worst case behaviour of an algorithm than any form of ‘average’ case, because algorithm performance is usually data dependent, and so we need to know how well the data is ordered before we start sorting to say anything useful. These two algorithms tend to perform better on partially sorted data. Remarkably, one of the high-performance sorting algorithms that we shall look at later actually performs worse on ready-sorted data than on random data!

1.8 Algorithms based on indexable data structures compared to reference based structures

We have already seen that the order in which data is stored critically affects the efficiency of algorithms. Another fundamental aspect of our data structures is indexing. An indexable data structure, such as our array of telephone book records, can be interrogated by record number. A non-indexable structure, such as a linked list, does not allow this direct (or random) access.

If we put all of our records into a linked list then we have to access them sequentially, but not all non-indexable structures are as inefficient as that: if we were to load the records as leaves of a binary tree in some predefined order, then we might be able to access an individual record in fewer steps by tracing down tree nodes. We will examine this idea in detail in Chapter 4.

For now we note that some algorithms require indexable access (binary search, for instance) and so are unsuitable for implementation on a non-indexable structure. Some other algorithms make only sequential access to the data (such as insertion sort), and these can be implemented on both indexable and non-indexable structures. We also note that linked structures allow insertion in constant time, as sketched below, whereas arrays require all records to be moved during an insertion in the worst case. Finally, linked structures impose a space overhead which will be at least proportional to the number of records, and might be greater in the case of a tree structure.
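To illustrate the constant-time insertion claim, here is a minimal Java sketch of a singly linked list node (Node and insertAfter are our own illustrative names, not part of the course code):

// A singly linked list of integers: insertion after a known node is O(1).
public class Node {
    int value;
    Node next;

    Node(int value, Node next) {
        this.value = value;
        this.next = next;
    }

    // Insert a new value immediately after this node: two reference updates,
    // independent of how many records the list holds.
    void insertAfter(int newValue) {
        this.next = new Node(newValue, this.next);
    }

    public static void main(String[] arg) {
        Node head = new Node(1, new Node(3, null));
        head.insertAfter(2);    // list is now 1 -> 2 -> 3
        for (Node n = head; n != null; n = n.next)
            System.out.println(n.value);
    }
}

Note that inserting after a known node touches only two references, however long the list is; the cost of finding the insertion point is, of course, a separate (sequential) matter.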

1.9 Concrete implementations

We use pseudo-code so as to suppress some of the detail of our implementations, but we also need implementations in real languages. In this section we give representative implementations of bubble sort and insertion sort in Java.


1.9.1 Bubble sort on an array

In the following example Java program we declare a four element array of integers and write a Java function called Bubble() that performs the bubble sort.

// Bubble sort running on a 4-element integer array
public class Bubble {
    int s[] = {4, 3, 2, 1};
    private int n, i, j, temp, pass;

    public Bubble() {
        n = 4;
        System.out.println("The unsorted array: ");
        for (i = 0; i < n; i++)
            System.out.println(s[i]);
        for (pass = 1; pass < n; pass++)
            for (i = 0; i < n - pass; i++)
                if (s[i] > s[i+1]) {
                    temp = s[i];
                    s[i] = s[i+1];
                    s[i+1] = temp;
                }
        System.out.println("The sorted array: ");
        for (j = 0; j < n; j++)
            System.out.println(s[j]);
    }

    public static void main(String[] arg) {
        Bubble b = new Bubble();
    }
}

In this case, the translation between pseudo-code and Java is rather straightforward. Sometimes it is not quite as simple.

1.9.2 Insertion sort on an array

In the following example Java program we declare a four element array of integers and write a Java function called Insertion() that performs the insertion sort.


// Insertion sort running on a 4-element integer array
public class Insertion {
    int s[] = {4, 3, 2, 1};
    private int n, i, j, temp, flag;

    public Insertion() {
        n = 4;
        System.out.println("The unsorted array: ");
        for (i = 0; i < n; i++)
            System.out.println(s[i]);
        for (i = 1; i < n; i++) {
            temp = s[i];
            for (j = i-1; j >= 0 && temp < s[j]; j--)
                s[j+1] = s[j];
            s[j+1] = temp;
        }
        System.out.println("The sorted array: ");
        for (i = 0; i < n; i++)
            System.out.println(s[i]);
    }

    public static void main(String[] arg) {
        Insertion b = new Insertion();
    }
}

In this case, the translation between pseudo-code and Java is again straightforward.

1.9.3 Bubble sort on the phone book

The Java functions above work only on arrays of integers. In practice, we are usually sorting arrays of compound records, and we may be sorting on names, numbers or even combinations of the two. Before giving the Java code for this type of bubble sort, we need to define the phone book as follows.

public class PhoneBook {
    String number, name;

    public PhoneBook(String n, String s) {
        number = n;
        name = s;
    }

    public String getNumber() {
        return number;
    }

    public String getName() {
        return name;
    }
}

If the above code is saved in a file called "PhoneBook.java" and the code below is saved in a file called "Bubble_phone.java", then the first 1000 entries in the file "phone.ran" will be sorted (these files can be downloaded from the course webpage). The output will appear in the file "phone.bubble".

// ===================== CS2860 example of Bubble sort on a phone book ===========
import java.io.*;
import java.util.*;

class Bubble_phone {
    public static void main (String[] arg) throws IOException, FileNotFoundException {
        int size_of_phonebook = 1000;
        int i, k;
        PhoneBook temp;
        PhoneBook[] array = new PhoneBook[size_of_phonebook];
        String s, temp1, temp2;

        // ============ Read in phone book from "phone.ran" ===============================
        BufferedReader inFile = new BufferedReader(new FileReader("phone.ran"));
        for (i = 0; i < size_of_phonebook; i++) {
            s = inFile.readLine();
            StringTokenizer stok = new StringTokenizer(s);
            temp1 = stok.nextToken();
            temp2 = stok.nextToken();
            array[i] = new PhoneBook(temp1, temp2);
        }

        // ============ Sort the phone book =============================================
        for (i = 1; i < size_of_phonebook; i++)
            for (k = 0; k < size_of_phonebook - i; k++)
                if ((array[k].name).compareTo(array[k+1].name) > 0) {
                    temp = array[k];
                    array[k] = array[k+1];
                    array[k+1] = temp;
                }

        // ============== Save the sorted phone book in "phone.bubble" ====================
        PrintWriter outFile = new PrintWriter(new BufferedWriter(new FileWriter("phone.bubble")));
        for (i = 0; i < size_of_phonebook; i++)
            outFile.println(array[i].getNumber() + " " + array[i].getName());
        outFile.close();
    }
}

1.9.4 Insertion sort on the phone book

We will in this section give the Java code for insertion sort on the phone book. As in the previous section, you will need the files "PhoneBook.java" and "phone.ran". The following code (saved as "Insertion_phone.java") will sort the phone book in "phone.ran" and output the result in "phone.insertion".


// ===================== CS2860 example of Insertion sort on a phone book ===========
import java.io.*;
import java.util.*;

class Insertion_phone {
    public static void main (String[] arg) throws IOException, FileNotFoundException {
        int size_of_phonebook = 1000;
        int i, j;
        PhoneBook temp;
        PhoneBook[] array = new PhoneBook[size_of_phonebook];
        String s, temp1, temp2;

        // ============ Read in phone book from "phone.ran" ===============================
        BufferedReader inFile = new BufferedReader(new FileReader("phone.ran"));
        for (i = 0; i < size_of_phonebook; i++) {
            s = inFile.readLine();
            StringTokenizer stok = new StringTokenizer(s);
            temp1 = stok.nextToken();
            temp2 = stok.nextToken();
            array[i] = new PhoneBook(temp1, temp2);
        }

        // ============ Sort the phone book =============================================
        for (i = 1; i < size_of_phonebook; i++) {
            temp = array[i];
            for (j = i-1; j >= 0 && (temp.name).compareTo(array[j].name) < 0; j--)
                array[j+1] = array[j];
            array[j+1] = temp;
        }

        // ============== Save the sorted phone book in "phone.insertion" ====================
        PrintWriter outFile = new PrintWriter(new BufferedWriter(new FileWriter("phone.insertion")));
        for (i = 0; i < size_of_phonebook; i++)
            outFile.println(array[i].getNumber() + " " + array[i].getName());
        outFile.close();
    }
}

1.10 The performance of bubble sort

It is instructive to look at the behaviour of algorithms in terms of the way their performance changes as we present larger and larger input data sets. Table 1.1 shows the performance of a bubble sort algorithm working on a decreasing array and on a random array. The leftmost column shows the number of elements being sorted in each case. In the body of the table we show the number of copy and compare operations the algorithm uses (this will be explained later). We also show the same data normalised against n², that is, divided by n × n.

From the table we can see that both our measures grow roughly as n² once we get a large enough number of records.

number of   Decreasing sequence                  Random sequence
elements    copy & compare  (copy & comp)/n^2   copy & compare  (copy & comp)/n^2
10000       199980000       1.9998              124678258       1.24678
6666        88857780        1.9997              55199379        1.24223
4444        39489384        1.99955             24742702        1.25285
2962        17540964        1.99932             10969846        1.25035
1974        7789404         1.99899             4857715         1.24663
1316        3461080         1.99848             2150978         1.24201
877         1536504         1.99772             958627          1.24638
584         680944          1.99658             424991          1.2461
389         301864          1.99486             189112          1.24974
259         133644          1.99228             85462           1.27401
172         58824           1.98837             36148           1.22188
114         25764           1.98246             17068           1.31333
76          11400           1.97368             7476            1.29432
50          4900            1.96                2900            1.16
33          2112            1.93939             1335            1.22592
22          924             1.90909             514             1.06198
14          364             1.85714             217             1.10714
9           144             1.77778             96              1.18519
6           60              1.66667             40              1.11111
4           24              1.5                 12              0.75

Table 1.1 Performance of bubble sort

1.11 Rates of growth and tractability

Our sorting algorithms require work proportional to n², but the binary search algorithm requires computation proportional to log₂ n. What does this mean in practice? Table 1.2 shows the growth of some functions for values of n between 10 and 100. These show us that functions such as n² and n³ grow rapidly, but that they are completely swamped by the exponential and factorial functions. In practice, if an algorithm can be shown to be exponential or worse then we say that it is an infeasible algorithm, because even the fastest computer will quickly become overwhelmed by the required computation as we increase the size of the input data set. If we could show that a given problem admitted only exponential algorithms, then we would say that the problem was intractable.

1.12 Instrumenting a program – counting operations

How do we find out the behaviour of an algorithm? One way is to work through the algorithm by hand, counting the number of operations; we shall do this at the end of this section. Another method is to add some book-keeping code to the implementation and get the program to do the counting for us. We shall look at this approach below. It is also possible to get a debugger to step through an implementation, or to run a profiling program which reports the amount of time spent in the various sections of your code.

If we wanted to determine how long a program will take to complete its task, we would need to know exactly how long every possible command takes to execute, and then count how many times each command is performed. However, this would normally be too complicated and take far too long to do, so we simplify by counting only "important" operations.


n     log2(n)   n log2(n)   n^2      n^3         2^n        n!
10    3.32      33.22       100      1,000       1.02E+03   3.63E+06
20    4.32      86.44       400      8,000       1.05E+06   2.43E+18
30    4.91      147.21      900      27,000      1.07E+09   2.65E+32
40    5.32      212.88      1,600    64,000      1.10E+12   8.16E+47
50    5.64      282.19      2,500    125,000     1.13E+15   3.04E+64
60    5.91      354.41      3,600    216,000     1.15E+18   8.32E+81
70    6.13      429.05      4,900    343,000     1.18E+21   1.20E+100
80    6.32      505.75      6,400    512,000     1.21E+24   7.16E+118
90    6.49      584.27      8,100    729,000     1.24E+27   1.49E+138
100   6.64      664.39      10,000   1,000,000   1.27E+30   9.33E+157

Table 1.2 Growth of functions

An important operation might be an operation that takes a particularly long time to perform, or the operation that is performed the most times. If we choose the correct operation (or operations) to count, then the number of times they are performed gives a very good indication of how long a program will need to complete its task.

For the remainder of this course we will consider record comparisons and record copy operations as our important operations. For most searching, sorting and other data structure algorithms, these turn out to be the most "important" operations. There are several reasons for this, which we shall not go into here, but the most important reason is that copy and compare operations are the operations that are performed the most times in most of our algorithms.

A record comparison is any comparison of an element of our array (record) with some other value (which may also be an element of our array). A record copy operation is an operation where an element of our array is either assigned some value, or its value is assigned to some other variable.

If we consider the array s[], then statements of the following form are record comparisons.

⋄ if (s[i] == 7) j = i;

⋄ if (s[7] < s[j]) i = 10;

⋄ if ((s[21] > 9) && (s[i] < s[j])) ... (this is in fact two compare operations).

⋄ if (s[j] != 11) s[4] = 11; (this is also counted as a record copy).

⋄ for (i = 1; i < s[4]; i++) ... (this is counted every time the check "i < s[4]" is made, which may be many times).

The following statements are considered as record copy operations.

⋄ s[4] = 11;

⋄ s[i] = s[j];


⋄ s[2] = s[8] * s[11] + s[i]/2;

⋄ i = 27 + 22 * s[i];

⋄ j = s[3] + 1;

Even though several array entries are looked at in some of the above examples, they are (for simplicity) only counted as one operation each.

How many copy operations are performed in the code below? How many compare operations are performed?

s[3] = 4;
for (s[2] = 2; s[2] < s[3]; j = s[2])
    s[2] = s[2] + 1;

The answer is that the following copy and compare operations are performed.

s[3]=4 copy

s[2]=2 copy

s[2]<s[3] compare

s[2]=s[2]+1 copy (s[2] is now 3)

j=s[2] copy

s[2]<s[3] compare

s[2]=s[2]+1 copy (s[2] is now 4)

j=s[2] copy

s[2]<s[3] compare

So there are 6 copy operations and 3 compare operations. Now that we know what copy and compare operations are, let us return to our simplified example and add some code to instrument the program. We are interested in the contents of the sort array at each step, and in the number of record comparisons and record copies. We add output statements to print the contents of the array, and use a local variable ops to count the number of basic operations:


// Chatty Insertion sort running on a 4-element integer array
public class Insertion_chatty {
    int s[] = {4, 3, 2, 1};
    private int n, i, j, k, temp, flag, ops;

    public Insertion_chatty() {
        ops = 0;
        n = 4;
        System.out.print("Initial: s=");
        for (i = 0; i < n; i++)
            System.out.print(s[i]);
        System.out.println(", ops=" + ops);
        for (i = 1; i < n; i++) {
            temp = s[i]; ops++;                  // copy: temp = s[i]
            for (j = i-1; j >= 0 && temp < s[j]; j--) {
                ops++;                           // compare: temp < s[j]
                s[j+1] = s[j];
                System.out.print("i=" + i + " j=" + j + ", s=");
                for (k = 0; k < n; k++)
                    System.out.print(s[k]);
                System.out.println(", ops=" + ops);
                ops++;                           // copy: s[j+1] = s[j]
            }
            if (j >= 0)
                ops++;                           // the loop exited with j >= 0, so the
                                                 // comparison temp < s[j] was still made
            s[j+1] = temp; ops++;                // copy: insert temp
        }
        System.out.print("Final: s=");
        for (i = 0; i < n; i++)
            System.out.print(s[i]);
        System.out.println(", ops=" + ops);
    }

    public static void main(String[] arg) {
        Insertion_chatty ic = new Insertion_chatty();
    }
}

When we run the above program we get a good idea of how the algorithm works, as well as a count of the number of copy and compare operations. In fact, if we run the above code we get the following output:


Initial: s=4321, ops=0

i=1 j=0, s=4421, ops=2

i=2 j=1, s=3441, ops=6

i=2 j=0, s=3341, ops=8

i=3 j=2, s=2344, ops=12

i=3 j=1, s=2334, ops=14

i=3 j=0, s=2234, ops=16

Final: s=1234, ops=18

1.13 Counting operations by hand

In this section we are going to compare the bubble and insertion sorts by counting the numbers of compare and copy operations carried out in the worst case. Recall the bubble sort algorithm.

Bubble sort

for i from 1 to n-1 do
    for j from 0 to n-1-i do
        if ( S[j] > S[j+1] )
            x := S[j]
            S[j] := S[j+1]
            S[j+1] := x

If we count the compare operations carried out when we input the array [n, n-1, ..., 2, 1], we get the following.

i     ARRAY S at start of pass i    Compare operations
1     n, n-1, n-2, n-3, ..., 1      n-1
2     n-1, n-2, n-3, ..., 1, n      n-2
3     n-2, n-3, ..., 1, n-1, n      n-3
...   ...............               ...
n-1   ...............               1
      1, 2, 3, ..., n-1, n          (final array)

                                    TOTAL = n(n-1)/2

Note that when i = 1 the variable j takes on the values 0, 1, ..., n-1-1, which is n-1 distinct values, and for each value we perform one compare operation (S[j] > S[j+1]). When i = 2 the variable j takes on the values 0, 1, ..., n-1-2, which is n-2 distinct values. This continues as illustrated


above. Therefore we have 1 + 2 + 3 + ... + (n-1) compare operations. We will see in class that this is exactly n(n-1)/2 compare operations (see also Appendix B).

So how many copy operations do we use? By considering how the algorithm works, we note that if we start with a decreasing sequence, then every time we perform the comparison S[j] > S[j+1] it will be true. So every time we perform a comparison we will also perform 3 copy operations (x := S[j], S[j] := S[j+1] and S[j+1] := x). So the number of copy operations is 3n(n-1)/2.

Therefore bubble sort will use 2n(n-1) copy and compare operations in total on a decreasing sequence. How many operations does insertion sort use?

Insertion sort

for i from 1 to n-1 do
    x := S[i]
    j := i - 1
    while (j >= 0) && (x < S[j]) do
        S[j+1] := S[j]
        j := j - 1
    S[j+1] := x

If we count the operations carried out on the input array [n, n-1, ..., 2, 1] for this algorithm, we get (1+2+1) + (1+4+1) + ... + (1+(2n-2)+1) = (n+2)(n-1) copy and compare operations. We will see in class why the above equation is true, but can you already see why it holds? (As a check: for n = 4 this gives 6 × 3 = 18, exactly the final operation count printed by Insertion_chatty above.)

So in the worst case the insertion sort algorithm is slightly better than bubble sort; the sketch below checks both counts mechanically.
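As a sanity check on these hand counts, here is a small Java sketch (ours, not part of the course materials; OpCount and its method names are our own) that counts copy and compare operations for both sorts on a decreasing sequence and compares the totals with the formulas 2n(n-1) and (n+2)(n-1):

// Count copy and compare operations for bubble and insertion sort on
// a decreasing sequence, and check them against the worst-case formulas.
public class OpCount {
    static int[] decreasing(int n) {
        int[] s = new int[n];
        for (int i = 0; i < n; i++) s[i] = n - i;    // n, n-1, ..., 1
        return s;
    }

    static long bubbleOps(int[] s) {
        long ops = 0;
        for (int i = 1; i < s.length; i++)
            for (int j = 0; j < s.length - i; j++) {
                ops++;                               // compare: S[j] > S[j+1]
                if (s[j] > s[j+1]) {
                    int x = s[j]; s[j] = s[j+1]; s[j+1] = x;
                    ops += 3;                        // three record copies
                }
            }
        return ops;
    }

    static long insertionOps(int[] s) {
        long ops = 0;
        for (int i = 1; i < s.length; i++) {
            int x = s[i]; ops++;                     // copy: x := S[i]
            int j = i - 1;
            while (j >= 0) {
                ops++;                               // compare: x < S[j]
                if (!(x < s[j])) break;
                s[j+1] = s[j]; ops++;                // copy during the shift
                j--;
            }
            s[j+1] = x; ops++;                       // copy: final insert
        }
        return ops;
    }

    public static void main(String[] arg) {
        int n = 100;
        System.out.println(bubbleOps(decreasing(n)) + " vs 2n(n-1) = " + 2*n*(n-1));
        System.out.println(insertionOps(decreasing(n)) + " vs (n+2)(n-1) = " + (n+2)*(n-1));
    }
}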


Chapter 2

Divide and conquer?

In this chapter we shall begin by counting record copy and record compare operations separately, and consider the worst case performance of various algorithms according to these values.

Recall that in the worst case, sorting n elements using a bubble sort can take Copy^b_n = 3n(n-1)/2 record copy operations and Comp^b_n = n(n-1)/2 record compare operations (the superscript b stands for bubble sort).

Note that Copy^b_100 = 14850 and Comp^b_100 = 4950, while Copy^b_50 = 3675 and Comp^b_50 = 1225. Sorting half of the list only needs about a quarter of the number of operations, and sorting two lists of length 50 takes half as long as sorting one list of length 100.

Generally,

    Comp^b_{n/2} = ((n/2)((n/2) - 1))/2 = n(n-2)/8

    Copy^b_{n/2} = 3 × ((n/2)((n/2) - 1))/2 = 3n(n-2)/8

So sorting two equal halves takes about half the number of operations required for one full length list.

Perhaps we can improve efficiency by splitting the list into two parts, sorting each part and then recombining them.

(This chapter only contains outline material. You will need to take notes in lectures and possibly read up about the algorithms discussed in text books.)

2.1 Merge sorts

2.1.1 Sorting in two halves

We assume that we have an array S of size n with the elements up to position (n/2) - 1 sorted, and the elements from position n/2 onwards separately sorted (e.g. [2, 5, 7, 11, 1, 8, 12, 14]). The following pseudocode will merge the two halves into a sorted array, using an insertion-sort type approach.


int k := n/2
for i from k to n-1 do
    int temp := S[i]
    j := i-1
    while (j >= i-k) && (temp < S[j])
        S[j+1] := S[j]
        j := j-1
    S[j+1] := temp
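As a concrete illustration, here is a direct Java transcription of the pseudocode above, run on the example array from the text (MergeHalves and mergeHalves are our own names for it):

// Merge two sorted halves of S in place, insertion-sort style:
// S[0..n/2-1] and S[n/2..n-1] are each assumed sorted on entry.
public class MergeHalves {
    static void mergeHalves(int[] S) {
        int n = S.length;
        int k = n / 2;
        for (int i = k; i < n; i++) {
            int temp = S[i];
            int j = i - 1;
            // temp never needs to sink below index i-k: the i-k upper-half
            // elements already inserted are all <= temp.
            while (j >= i - k && temp < S[j]) {
                S[j+1] = S[j];
                j--;
            }
            S[j+1] = temp;
        }
    }

    public static void main(String[] arg) {
        int[] S = {2, 5, 7, 11, 1, 8, 12, 14};      // the example from the text
        mergeHalves(S);
        for (int x : S) System.out.print(x + " ");  // 1 2 5 7 8 11 12 14
    }
}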

To calculate the ‘order’ of this algorithm we use the following result (see Appendix B for a proof of this):

Theorem 1 For n ≥ 0, 1 + 2 + 3 + 4 + ... + n = n(n+1)/2.

In the worst case the algorithm uses the following numbers of record copy and compare operations (when n is even):

    Copy^m_n = (1 + k + 1)(n - 1 - k + 1) = n(n+4)/4

    Comp^m_n = k(n - 1 - k + 1) = n²/4

The above is computed by noting that the outer loop (i.e. "for i from k to n-1") has n - 1 - k + 1 = n/2 iterations and the inner loop (i.e. "while (j >= i-k) && (temp < S[j])") has at most k iterations. So if we sort the two halves of the array using bubble sort and then merge them using the above algorithm, the result is still of order n², but the sort cost is (see above for the number of copy and compare operations for bubble sort on an array of size n/2):

    Copy-operations = 3n(n-2)/8 + 3n(n-2)/8 + n(n+4)/4 = n² - n/2

    Compare-operations = n(n-2)/8 + n(n-2)/8 + n²/4 = (n² - n)/2

giving a total of

    3n²/2 - n

operations, instead of

    2n² - 2n

copy and compare operations for bubble sort. Thus the ‘constant of proportionality’ is lower and the sort will be faster.


2.1.2 Merge sort 0

If it is worth splitting the array in half to sort then it is worth splitting each part to sort it. We end up with a recursive algorithm.

We need to be able to merge part of the array, so we provide the first and last index of the region whose entries are to be merged.

merge0(int low, int high, int S[])

int k := (high - low + 1)/2

for i from low+k to high do

int temp := S[i]

j := i-1

while (j>=i-k) && (temp < S[j]) do

S[j+1] := S[j]

j:=j-1

S[j+1] := temp

which in worst case takes a total of n²/2 + n record copy and compare operations (see the previous page). We can write the sort routine recursively:

merge_sort0(int low, int high, int S[])

if low < high

int k := (high - low + 1)/2

merge_sort0(low, low+k-1, S)

merge_sort0(low+k, high, S)

merge0(low, high, S)

To save writing out two essentially equivalent calculations for each algorithm, from now on we shall count the total number of record compare and copy operations rather than counting the two numbers separately.

2.1.3 Worst case analysis of merge sort 0

We shall now show that in worst case merge sort 0 can take

W (n) = n(n− 1) + n log n

operations to sort an array of size n.

To calculate the order of merge sort 0 we use

Theorem 2 1. 1 + 2 + 2² + . . . + 2^r = 2^(r+1) − 1

2. 1 + 1/2 + 1/2² + . . . + 1/2^(r−1) = (2^r − 1)/2^(r−1)


Suppose that merge_sort0 takes W(n) record copy and compare operations in worst case. What is W(n)?

If n = 1 then low = high and so W(1) = 0. We have already seen that merge0(l, l+n-1, S) takes

n²/2 + 2n/2

operations in the worst case. So, if n is even (so high = n − 1 is odd),

W(n) = W(n/2) + W(n/2) + (n²/2 + 2n/2) = 2W(n/2) + (n²/2 + 2n/2).

If n/2 is even

W(n/2) = 2W(n/4) + (n²/8 + 2n/4)

and so

W(n) = 2²W(n/4) + (n²/4 + 2n/2) + (n²/2 + 2n/2).

If n = 2^r we have

W(n) = 2^r W(1) + (n²/2^r + 2n/2) + . . . + (n²/2 + 2n/2).

Recall that if n = 2^r then log₂ n = r, and that

log a^b = b log a,   2^(log₂ n) = n.

There are r = log₂ n terms in the equation for W(n) above and W(1) = 0, so we have

W(n) = (n²/2)(1/2^(r−1) + 1/2^(r−2) + . . . + 1/2 + 1) + (2n/2) log n

W(n) = (n²/2)((2^r − 1)/2^(r−1)) + n log n

W(n) = n(n − 1) + n log n

At the end of the day this sort is still of order n².

2.1.4 Merge sort 1

[TA, p.415] [N&N, p.52]

Part of the problem with the merge part of the sort as we have described it so far is that when we find where to insert the current element we have to move all of the elements above that point up one place. In the worst case the top element in the first half of the array gets moved n/2 times.

We now consider having a second array into which we put the elements once we have established their correct position (the details of this algorithm will be given in class).


int U[]

int k := n/2

int j := 0

int p := 0

int i := n/2

for q from 0 to n-1

U[q] := S[q]

while (j <= n/2-1) and (i <= n-1) do

if ( U[j] <= U[i] )

S[p] := U[j]

j := j+1

else

S[p] := U[i]

i := i+1

p := p+1

if (j <= n/2 -1)

for q from p to n-1 do

S[q] := U[j]

j := j+1

In worst case this takes 3n − 1 record copy and compare operations.

We now use the above merge as the basis for a recursive sort algorithm.

merge1(int low, int high, int S[], U[])

int k := (high - low + 1)/2

for q from low to high

U[q] := S[q]

int j := low

int p := low

int i := low + k

while (j <= low + k - 1) and (i <= high) do

if ( U[j] <= U[i] )

S[p] := U[j]

j := j+1

else

S[p] := U[i]

i := i+1

p := p+1

if (j <= low + k - 1)

for q from p to high do

S[q] := U[j]

j := j+1


merge_sort1(int low, int high, int S[], U[])

if low < high

int k := (high - low + 1)/2

merge_sort1(low, low+k-1, S, U)

merge_sort1(low+k, high, S, U)

merge1(low, high, S, U)

2.1.5 Worst case analysis of merge sort 1

Suppose that the worst case number of record copy and compare operations for merge sort 1 is W(n). Then we have

W (n) = 2W (n/2) + 3n− 1.

Using the same kind of calculations that we used in Section 2.1.3 (see also Appendix B) we can show that

W(n) = 3n log₂ n − n + 1.

This is a genuine improvement over the bubble and insertion sorts, but at the cost of doubling the data space required.
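For completeness, the omitted calculation can be done by unrolling the recurrence (our own sketch, assuming n = 2^r and W(1) = 0):

W(n) = 2W(n/2) + 3n − 1 = 2²W(n/4) + (3n − 2) + (3n − 1) = . . . = 2^r W(1) + (3n − 2^(r−1)) + . . . + (3n − 2) + (3n − 1)

Adding up the r = log₂ n bracketed terms, and using Theorem 2 for the powers of 2, gives

W(n) = 3nr − (2^r − 1) = 3n log₂ n − n + 1.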

2.1.6 Java code for merge sort 1

public class MergeSort1 {
    int S[] = {-1, 5, 1, 3, 2, 27, -3};
    int U[] = new int[7];
    private int i, j, k, p, q;

    public MergeSort1() {
        System.out.println("The original array: ");
        for (i = 0; i < 7; i++)
            System.out.println(S[i]);
        merge_sort1(0, 6, S, U);
        System.out.println("The sorted array: ");
        for (j = 0; j < 7; j++)
            System.out.println(S[j]);
    }

    public void merge1(int low, int high, int S[], int U[]) {
        k = (high - low + 1)/2;
        // copy the region to be merged into the auxiliary array U
        for (i = low; i <= high; i++)
            U[i] = S[i];
        j = low;        // index into the first half of U
        p = low;        // next free position in S
        i = low + k;    // index into the second half of U
        while (j <= low + k - 1 && i <= high) {
            if (U[j] <= U[i]) {
                S[p] = U[j];
                j = j + 1;
            } else {
                S[p] = U[i];
                i = i + 1;
            }
            p = p + 1;
        }
        // copy across any elements left over from the first half
        if (j <= low + k - 1)
            for (q = p; q <= high; q++) {
                S[q] = U[j];
                j = j + 1;
            }
    }

    public void merge_sort1(int low, int high, int S[], int U[]) {
        int k;
        if (low < high) {
            k = (high - low + 1)/2;
            merge_sort1(low, low + k - 1, S, U);
            merge_sort1(low + k, high, S, U);
            merge1(low, high, S, U);
        }
    }

    public static void main(String[] arg) {
        MergeSort1 m = new MergeSort1();
    }
}

2.2 Quick sort

[N&N p.59]

Suppose that we split the array into two parts so that all the elements in one part are smaller than any of the elements in the other part. Then when the two parts were sorted they would not need to be merged; they would automatically be in the correct order. Thus we would save, in the worst case, 3n − 1 merge operations.


2.2.1 Partition algorithm

The idea is to pick an element, then put all the smaller elements to the left and all the larger elements to the right. Initially we pick the first element.

int partition(int low, high, S[])

int pivot := S[low]

int k := low

for i from low + 1 to high do

if (S[i] < pivot)

S[k] := S[i]

S[i] := S[k+1]

k := k+1

S[k] := pivot

return k

In the worst case on an n element array this takes

W(n) = 1 + 3(n − 1) + 1 = 3n − 1

record copy and compare operations. We turn this into a sorting algorithm by partitioning and sorting the two parts. There is no need to merge afterwards because the two parts are already relatively ordered.

quick_sort(int low, high, S[])

if (low < high)

int k := partition(low, high, S)

quick_sort(low, k-1, S)

quick_sort(k+1, high, S)

The details of quick sort will be discussed in class.

2.2.2 Worst case analysis of quick sort

We shall show that in the worst case the number of operations required to perform a quick sort on an array of size n is

3n²/2 + n/2 − 2.

If W(n) is the worst case number of record compare and copy operations required for an n element array, then

W(n) = (3n − 1) + W(n − 1 − k) + W(k).


The minimum number of partition steps that we will need is r where n = 2^r, so r = log₂ n. But the maximum possible number of partitions is n − 1, which occurs when the array is already sorted. So in worst case we have

W(n) = (3n − 1) + W(n − 1) + W(0) = (3n − 1) + W(n − 1)

W(n) = (3n − 1) + (3(n − 1) − 1) + W(n − 2) = . . . = (3n − 1) + . . . + (3 × 2 − 1) + W(1)

W(n) = 3n(n + 1)/2 − 3 − (n − 1) = 3n²/2 + n/2 − 2

(which is actually the same order as bubble sort!).

2.2.3 Java code for quick sort

public class QuickSort {
    int n = 4;
    int S[] = {4, 3, 2, 1};
    private int i, j;

    public QuickSort() {
        System.out.println("The original array: ");
        for (i = 0; i < n; i++)
            System.out.println(S[i]);
        quick(0, n - 1, S);
        System.out.println("The sorted array: ");
        for (j = 0; j < n; j++)
            System.out.println(S[j]);
    }

    public int partition(int low, int high, int S[]) {
        int pivot = S[low];
        int k = low;      // current position of the pivot 'hole'
        for (i = low + 1; i <= high; i++)
            if (S[i] < pivot) {
                // move S[i] into the hole and shift the hole one place right
                S[k] = S[i];
                S[i] = S[k + 1];
                k = k + 1;
            }
        S[k] = pivot;     // pivot drops into its final position
        return k;
    }

    public void quick(int low, int high, int S[]) {
        if (low < high) {
            int k = partition(low, high, S);
            quick(low, k - 1, S);
            quick(k + 1, high, S);
        }
    }

    public static void main(String[] arg) {
        QuickSort q = new QuickSort();
    }
}

2.2.4 Choosing the pivot value

After an array has been ‘partitioned’ the pivot value ends up in its correct position. If we could choose the ‘mid-value’ (median) in the array to be the pivot then the two partitions would have the same size. If, at each stage in the sort, the pivot can be chosen to be the median value we will thus minimise the number of partitions, and hence recursive calls, required during the sorting process. This in turn makes the sort more efficient.

In general it is too expensive to calculate the median value in the array, but if the array is already sorted then the median will be at the mid point of the array. Thus, if we have reason to believe that the array is at least partially sorted, then eventually some of the partitions will be fully sorted, so it may be more efficient to use the mid point of the array as the pivot. This also has the comfortable consequence that sorting an already sorted array has best rather than worst case time complexity.

The following algorithm for partnMid() partitions an array on the basis of its mid point entry.

int partnMid(low, high, S)
    int p := low + (high - low)/2
    int pivot := S[p]
    int k := p
    for i from p-1 to low do
        if (S[i] > pivot)
            S[k] := S[i]
            S[i] := S[k-1]
            k := k-1
    for i from p+1 to high do
        if (S[i] < pivot)
            S[k] := S[i]
            S[i] := S[k+1]
            k := k+1
    S[k] := pivot
    return k
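A rough Java transcription of partnMid (ours, written in the style of the QuickSort class above; it could be dropped into that class and called from quick in place of partition):

public int partnMid(int low, int high, int S[]) {
    int p = low + (high - low)/2;       // mid point index
    int pivot = S[p];
    int k = p;                          // current position of the pivot 'hole'
    for (int i = p - 1; i >= low; i--)
        if (S[i] > pivot) {             // larger keys go right of the hole
            S[k] = S[i];
            S[i] = S[k - 1];
            k = k - 1;
        }
    for (int i = p + 1; i <= high; i++)
        if (S[i] < pivot) {             // smaller keys go left of the hole
            S[k] = S[i];
            S[i] = S[k + 1];
            k = k + 1;
        }
    S[k] = pivot;                       // pivot drops into its final place
    return k;
}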

For example, suppose we have a sorted array to which we then add another element, [a0 a1 a2 a3 a4 a5 b]. Suppose that b is smaller than all the keys already in the array. We sort the new array using the version of quick sort which employs the mid-point partition algorithm. This works as follows.


S = [2 3 5 7 9 11 0]

quick sort (0,6,S)

------------------

low=0 high=6 p=3 pivot=S[3]= 7 1 op

first loop

k=3 i=2 2 3 5 7 9 11 0 1 op

i=1 2 3 5 7 9 11 0 1 op

i=0 2 3 5 7 9 11 0 1 op

second loop

k=3 i=4 2 3 5 7 9 11 0 1 op

i=5 2 3 5 7 9 11 0 1 op

i=6 2 3 5 0 9 11 9 3 ops

k=4 2 3 5 0 7 11 9 1 op

quick sort (0,3,S)

------------------

low=0 high=3 p=1 pivot=S[1]=3 1 op

first loop

k=1 i=0 2 3 5 0 7 11 9 1 op

second loop

i=2 2 3 5 0 7 11 9 1 op

i=3 2 0 5 5 7 11 9 3 ops

k=2 2 0 3 5 7 11 9 1 op

quick sort (0,1,S)

------------------

low=0 high=1 p=0 pivot=S[0]=2 1 op

first loop

second loop

k=0 i=1 0 0 3 5 7 11 9 3 ops

k=1 0 2 3 5 7 11 9 1 op

quick sort (0,0,S)

------------------

quick sort (2,1,S)

------------------

quick sort (3,3,S)

------------------

quick sort (5,6,S)

------------------

low=5 high=6 p=5 pivot=S[5]=11 1 op

first loop

second loop

k=5 i=6 0 2 3 5 7 9 9 3 ops

k=6 0 2 3 5 7 9 11 1 op

quick sort (5,5,S)

------------------

quick sort (7,6,S)

------------------


Thus the array has been sorted using 27 record copy and compare operations.

2.2.5 Worst case analysis

In worst case partnMid takes 3n − 1 copy and compare operations. Thus, in this case, if W(n) is the worst case number of copy and compare operations required to sort an array with n elements using this partition algorithm, we have

W(n) = 3n − 1 + W(k) + W(n − 1 − k)

Thus, if we can produce equal partitions for the two subsorts, so that k = (n − 1)/2, then we have

WS(n) = 3n − 1 + 2WS((n − 1)/2).

(WS stands for ‘worst sorted’ because it corresponds to the behaviour of quick sort on an already sorted array.)

If sorting on each half again produces two equal partitions we get

WS(n) = 4WS((n − 3)/4) + (3(n − 1) − 2) + (3n − 1).

If we can produce equally sized partitions at every level of the sort then, using the same type of calculations as in Section 2.1.3, it can be shown that

WS(n) = 3(n + 1) log(n + 1) − 5n − 1
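One way to see this (our own sketch of the omitted calculation, assuming n = 2^m − 1, so that every partition splits exactly in half, and WS(1) = 0): at level i of the recursion there are 2^i subarrays, each of size 2^(m−i) − 1 and so each costing 3(2^(m−i) − 1) − 1 = 3 × 2^(m−i) − 4 operations to partition. Hence

WS(n) = Σ_{i=0}^{m−2} 2^i (3 × 2^(m−i) − 4) = 3(m − 1)2^m − 4(2^(m−1) − 1) = 3(n + 1) log(n + 1) − 5n − 1.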

However, in the worst case, where one of the partitions at each stage turns out to have size 1, the number of operations required is the same as for the original quick sort. So the full worst case for quick sort with mid-point pivot is no better than when the first element is used as the pivot.

2.3 Heap sort

Now we look at another type of sort which is n log n but which doesn't need to make a copy of the array. So far we have treated arrays in their ‘natural’ order, dealing with each element in turn. In this section we view arrays as binary trees and deal with the elements in an order which is natural for trees.

2.3.1 Heaps

We think of an array as a binary tree by taking the first entry to be the root, the next two entries to be its children, the next four entries to be the grandchildren, and so on. So we think of

[a0 a1 a2 a3 a4 a5 a6 a7 a8 a9 a10]

as the tree


a0

/ \

a1 a2

/ \ / \

a3 a4 a5 a6

/ \ / \

a7 a8 a9 a10

The top node is at index 0 in the array. The elements at indexes 1 and 2 are the next level nodes, the children of the node at index 0. The next 4 indexes are the nodes at the third level in the tree, and so on. It can be seen that the children of the node at index i are at indexes 2i + 1 and 2i + 2.

Consider the index (n/2) − 1 in the array. If n is odd then n/2 = (n − 1)/2 and the children of (n/2) − 1 are at indexes n − 2 and n − 1. If n is even there is only one child, at index n − 1. Any children of nodes to the right of (n/2) − 1 would be outside the range of the array, so all the indexes in the second, right hand, half of the array must be leaf nodes.

An array is a heap if, when it is written as a binary tree, every parent node is greater than or equal to each of its children.

For example,

33 [33 27 6 10 8 1 3 7 2 5 4]

/ \

27 6

/ \ / \

10 8 1 3

/ \ / \

7 2 5 4

is a heap, but

33 [33 27 2 10 8 3 1 7 6 5 4]

/ \

27 2

/ \ / \

10 8 3 1

/ \ / \

7 6 5 4

is not a heap.

We use this model as the basis for heap sort. The sort has two stages. First the array to be sorted is turned into a heap, then the heap is reordered so that the final array is in ascending order.

2.3.2 Turning arrays into heaps

In what follows we shall call the entries in the array keys.


We begin by describing the algorithm sift, which takes an index p (for place) and an array with the property that the keys to the right of p are in heaps. The algorithm returns the array with the keys to the right of p − 1 in heaps.

sift(int p, size, S[])

int siftkey := S[p]

int parent := p

bool notfound := true

int largerch

while ((2*parent+1 <= (size-1)) && (notfound))

if (2*parent+1 <(size-1) && S[2*parent+1]<S[2*parent + 2])

largerch := 2*parent + 2

else

largerch := 2*parent+1

if (siftkey < S[largerch])

S[parent] := S[largerch]

parent := largerch

else

notfound := false

S[parent] := siftkey

The input size is the number of elements in the input array and p is the current index to be put in heap order. This element is compared to its two children. If it is smaller than either of them then it is swapped with the larger of its children. This may have destroyed the heap order of the tree beneath the new position of the parent, so the process must be applied again until the parent is in the correct position.

For example, suppose S is the following array

[2 5 11 8 10 1 4 3 7 9 6] 2

/ \

5 11

/ \ /\

8 10 1 4

/\ /\

3 7 9 6

which has the property that the elements to the right of index 1 are in heaps. Calling sift(1,11,S) has the following effect:


siftkey = 5 so S[1] := S[4] 2

parent = 1 parent := 4 / \

S[3] < S[4] 10 11

largerch = 4 / \ /\

8 10 1 4

/\ /\

3 7 9 6

siftkey = 5 so S[4] := S[9] 2

parent = 4 parent := 9 / \

S[9] > S[10] 10 11

largerch = 9 / \ /\

8 9 1 4

/\ /\

3 7 9 6

siftkey = 5 so S[9] := siftkey 2

parent = 9 / \

2*parent+1 > 10 10 11

largerch = 9 / \ /\

8 9 1 4

/\ /\

3 7 5 6

We end up with the array [2 10 11 8 9 1 4 3 7 5 6] which has the property that entries to the right of index 0 form a forest of heaps.

We noted in the previous section that all the nodes at indexes to the right of (n/2) − 1 are leaf nodes, so the right hand half of an array is automatically a forest of heaps. We use an algorithm called makeHeap which calls sift on the indexes from (n/2) − 1 to 0 in turn; at the end of each call the array will have one extra index in heap order. At the end of makeHeap the array will be a heap.

makeHeap(int size, S[])

for i from ((size/2)-1) to 0 do

sift(i,size,S)

For example, calling sift(0,11,S), where S is the array [2 10 11 8 9 1 4 3 7 5 6] from the previous example, will turn S into a heap:

[11 10 4 8 9 1 2 3 7 5 6] 11

/ \

10 4

/ \ / \

8 9 1 2

/\ /\

3 7 5 6


2.3.3 Sorting heaps

To turn this heap based approach into a sorting algorithm we need to give an algorithm which takes a heap and returns the same elements sorted in increasing order.

The basis of such an algorithm is the observation that the largest element of a heap is always at index 0. We swap the first and last elements of the heap, so that the largest element is now in its correct position.

Using the example from the previous section we have

[11 10 4 8 9 1 2 3 7 5 6] becomes [6 10 4 8 9 1 2 3 7 5 11]

We then call sift(0,n-1,S) on the first n − 1 elements in the array, after which the second largest element of the array will be at index 0.

[6 10 4 8 9 1 2 3 7 5 11] becomes [10 9 4 8 6 1 2 3 7 5 11]

then we can put the key at index 0 in its correct place

[10 9 4 8 6 1 2 3 7 5 11] becomes [5 9 4 8 6 1 2 3 7 10 11].

We carry on calling sift and then putting the key at index 0 into its correct place until the array is sorted.

The following is the algorithm for heap sort.

heap(int size, S[])

makeHeap(size, S)

for i from size-1 to 1 do

int hold := S[i]

S[i] := S[0]

S[0] := hold

sift(0,i,S)

2.3.4 Complexity analysis of heap sort

Suppose that we have an array of size n, and that n = 2^r − 1 for some integer r. When p = 0, i.e. at depth 1, the while loop in sift(0,n,S) is executed for parent=0, parent=1 or 2, parent=3 or 4 or 5 or 6, etc. In other words the while loop is executed for one value of parent at each level in the tree except for the bottom level.

level

parent=0 1

/ \

parent=1 parent=2 2

/ \ / \

parent=3 parent=4 parent=5 parent=6 3

. . .

. . .

. . .

parent=2^(r-1)-1 ............. parent=2^r-2 r


In worst case the while loop in sift(p,n,S) takes 3 record copy and compare operations, and, when p = 0, is executed r − 1 times. Thus in worst case

sift(0,n,S) takes 2 + 3(r−1) record operations

where n = 2^r − 1.

When p is at level 2, i.e. when p = 1 or p = 2, the while loop can be executed r − 2 times. So in worst case

sift(1,n,S), sift(2,n,S) each take 2 + 3(r−2) record operations.

In general for the 2^(l−1) nodes p = 2^(l−1) − 1, 2^(l−1), 2^(l−1) + 1, . . . , 2^l − 2 at level l,

sift(p,n,S) takes 2 + 3(r−l) operations.

For p = (n/2) − 1 = 2^(r−1) − 2, p is the last node at level r − 1, so in makeHeap, sift(p,n,S) is executed for all values of p in levels 1, 2, 3, ..., r − 1. Thus the worst case number of operations in makeHeap(n,S) is

M(n) = (2 + 3(r−1)) + 2(2 + 3(r−2)) + 4(2 + 3(r−3)) + . . . + 2^(r−2)(2 + 3(r−(r−1)))

where n = 2^r − 1.

We shall not give the calculation here, but it is possible to re-arrange the expression for M(n) and show that

M(n) = 4n − 3 log(n + 1) − 1.

Finally, heap(n,S) executes makeHeap(n,S), then performs 3 record copies and executes sift(0,j,S) for every value of j from n − 1 down to 1. We have seen above that in the worst case the while loop in sift(0,j,S) executes for a node in all but the last level of the tree. If j is the size of the tree then its depth is s where 2^(s−1) ≤ j ≤ 2^s − 1, and the worst case number of record operations in sift(0,j,S) is 2 + 3(s−1).

Now, j starts at n − 1 = 2^r − 2, and for 2^r − 2 ≥ j ≥ 2^(r−1), sift(0,j,S) can execute 2 + 3(r−1) operations, giving

(5 + 3(r−1))(2^(r−1) − 1)

record operations in total. For 2^(r−1) − 1 ≥ j ≥ 2^(r−2), sift(0,j,S) can execute 2 + 3(r−2) operations, giving

(5 + 3(r−2))2^(r−2)

record operations. Carrying on in this way we get that heap takes

W(n) = M(n) + (5 + 3(r−1))(2^(r−1) − 1) + (5 + 3(r−2))2^(r−2) + . . . + (5 + 3(1))2 + (5 + 3(0))

operations. This can then be re-arranged to get

W(n) = 3(n − 1) log(n + 1) + 3n − 3


2.3.5 Java code for heap sort

public class HeapSort {
    final int size = 12;
    int S[] = {6, 10, 4, 8, 9, 1, 7, 3, 2, 5, 11, 5};
    private int i, j;

    public HeapSort() {
        System.out.println("The original array: ");
        for (i = 0; i < size; i++)
            System.out.println(S[i]);
        heap(size, S);
        System.out.println("The sorted array: ");
        for (j = 0; j < size; j++)
            System.out.println(S[j]);
    }

    public void sift(int p, int S[], int n) {
        int siftkey = S[p];
        int parent = p;
        int notfound = 1;
        int largerch;
        while (2*parent + 1 <= (n - 1) && notfound > 0) {
            // pick the larger of the two children (if a right child exists)
            if (2*parent + 1 < (n - 1) && S[2*parent + 1] < S[2*parent + 2])
                largerch = 2*parent + 2;
            else
                largerch = 2*parent + 1;
            if (siftkey < S[largerch]) {
                S[parent] = S[largerch];
                parent = largerch;
            } else
                notfound = 0;
        }
        S[parent] = siftkey;
    }

    public void makeHeap(int size, int S[]) {
        for (i = ((size/2) - 1); i >= 0; i--)
            sift(i, S, size);
    }

    public void heap(int size, int S[]) {
        makeHeap(size, S);
        for (i = size - 1; i >= 1; i--) {
            // swap the largest element to the end, then restore the heap
            int hold = S[i];
            S[i] = S[0];
            S[0] = hold;
            sift(0, S, i);
        }
    }

    public static void main(String[] arg) {
        HeapSort h = new HeapSort();
    }
}


Chapter 3

So what do we mean by complexity?

So far we have analysed algorithms by counting the compare and copy operations which can arise in the worst case. Rather than counting the exact number of operations, we can get a reasonable idea of the performance of an algorithm by approximating the number of operations. For example, we may say that insertion sort takes approximately n² operations on an input array of size n. We will now define formally what is meant by ‘approximately’ using the so-called ‘big-O’ notation. Also, rather than looking just at how an algorithm behaves in worst case, it is often useful to know what the expected performance is in an average (and hopefully typical) case.

3.1 Big-O notation - O(f(n))

We have seen in the first chapter that the time taken by an algorithm which performs n operations on input of size n increases much more slowly than one which performs n² operations. While n is small, 20n behaves more like n² than n, but once n gets large (over 40) 20n is closer to n than to n².

The set of all functions which are eventually dominated by a constant multiple of f is called ‘big-O of f’ and written O(f(n)). Because our functions are counting operations they never have negative values, so we are only interested in functions f with the property that if n ≥ 0 then f(n) ≥ 0. Thus we can use the following definition:

O(f(n)) = {g | for some c, N, g(i) ≤ cf(i), for all i ≥ N}.

For i ≥ 1 we have i ≤ i² so n ∈ O(n²).

For i ≥ 1 we have 20i ≤ 20i² so 20n ∈ O(n²).

For all i we have 379i² ≤ 379i² so 379n² ∈ O(n²).

For i ≥ 7 we have 3i² + 5i + 8 ≤ 4i² and so 3n² + 5n + 8 ∈ O(n²).

We have i ≤ 2^i and so log₂ i ≤ log₂ 2^i = i log₂ 2 = i. Thus n log₂ n ∈ O(n²).

{n, 20n, 3n² + 5n + 8, n log₂ n} ⊆ O(n²)

If g ∈ O(f(n)) then for some N, c we have that for i ≥ N, g(i) ≤ c.f(i). So for any m we have m.g(i) ≤ m.c.f(i) and hence m.g(n) ∈ O(f(n)).


If g, h ∈ O(f(n)) then for some M, d we have that for i ≥ M, h(i) ≤ d.f(i). So for i ≥ max(N, M), g(i) + h(i) ≤ (c + d)f(i) and hence g(n) + h(n) ∈ O(f(n)).

{n + 8, 79n − 4, 23n² − 6n + 11, 4n log₂(n + 3)} ⊆ O(n²)

If W(n) is the number of operations carried out in worst case using an algorithm and if W(n) ∈ O(f(n)) then we say the algorithm is order at most f(n).

Since

3n²/2 + n/2 − 2 ≤ 2n², for n ≥ 1,

we have that the version of quick sort described above is of order at most n².

3.2 Omega notation - Ω(f(n))

It is also useful to be able to talk about the set of functions which eventually become bigger than some function f.

The set of all functions which eventually dominate a positive constant multiple of f is called ‘omega of f’ and written Ω(f(n)).

Ω(f(n)) = {g | for some c > 0, N, g(i) ≥ cf(i), for all i ≥ N}.

We have 5i² ≥ i² so 5n² ∈ Ω(n²).

For i ≥ 1 we have 5i² ≥ i and 5i³ ≥ i², so 5n² ∈ Ω(n) and 5n³ ∈ Ω(n²).

We also have n² ≥ (1/5) × 5n², so n² ∈ Ω(5n²).

Since i² − 10i ≥ (1/2)i² for i ≥ 20, we have n² − 10n ∈ Ω(n²).

{5n², 5n³, n² − 10n, n!, 23n² − 6n + 11} ⊆ Ω(n²)

Since i ≥ i² fails for every i ≥ 2, we have that n does not dominate n².

Similarly, log₂ i ≥ i fails for i ≥ 4, so log₂ n does not dominate n and hence n log₂ n does not dominate n².

If W(n) is the number of operations carried out in worst case using an algorithm and if W(n) ∈ Ω(f(n)) then we say the algorithm is order at least f(n).

Since

3n²/2 + n/2 − 2 ≥ n², for n ≥ 2,

we have that the version of quick sort described above is of order at least n².

3.3 Theta notation - Θ(f(n))

We have seen that some functions both dominate and are dominated by n². Such functions essentially behave like n²; they don't grow very much faster or very much slower than n² as n increases.


5n², 23n² − 6n + 11 and 3n²/2 + n/2 − 2 all dominate and are dominated by n². They are all also in both O(n²) and Ω(n²).

The functions which both dominate and are dominated by f(n) are exactly the functions which are in both O(f(n)) and Ω(f(n)).

We say that a function g(n) has order f(n) if it dominates and is dominated by f(n), and we call the set of functions of order f(n), Θ(f(n)). So

Θ(f(n)) = O(f(n)) ∩ Ω(f(n)).

{5n², 23n² − 6n + 11, 3n²/2 + n/2 − 2} ⊆ Θ(n²).

So we now have a formal definition of what we mean when we say that the quick sort algorithm given in the previous section is, in worst case, of order n².

Similarly, since 3n log n − n + 1 ∈ Θ(n log n), we have that the second version of the merge sort algorithm above is of order n log n.

Of course, it is also true that

3n log n − n + 1 ∈ Θ(3n log n − n + 1)

and that

3n log n − n + 1 ∈ Θ(3n log n)

so that merge sort is also of order 3n log n − n + 1 and of order 3n log n. However, these are all considered to be the same order.

For a composite polynomial function the convention is to take its order to be its highest term, ignoring any coefficients. So, for example, we usually say that

6n + 28n³ − 11 + 28n⁵/(n − 1)

has order n⁴.

We have that g(n) ∈ Θ(f(n)) if and only if there exist c, d > 0 and an integer N such that

c.f(i) ≤ g(i) ≤ d.f(i), for all i ≥ N.

3.4 Small-o notation - o(f(n))

The asymptotic bound provided by the big-O notation may or may not be asymptotically tight. The bound 2n² ∈ O(n²) is asymptotically tight, but the bound 2n ∈ O(n²) is not. We use o-notation to denote an upper bound that is not asymptotically tight. We formally define o(f(n)) as the set

o(f(n)) = {g | for any c > 0, ∃N, g(i) < cf(i), for all i ≥ N}

For example 2n ∈ o(n²) but 2n² ∉ o(n²).


The definitions of big-O and small-o are similar. The main difference is that if g(n) ∈ O(f(n)), the bound g(n) ≤ cf(n) holds for some constant c, but if g(n) ∈ o(f(n)), the bound g(n) < cf(n) holds for all constants c > 0. Intuitively, in the small-o notation the function g(n) becomes insignificant relative to f(n) as n approaches infinity; that is,

lim_{n→∞} g(n)/f(n) = 0, (when g ∈ o(f)).

Some authors use this limit as the definition of the small-o notation.

Examples: 4n³ + 2n² ∈ o(n³ log(n)), 4n⁴ + 2n² ∉ o(n⁴) and 1/n ∈ o(1).

3.5 A hierarchy of orders

We have seen that 5n² ∈ Θ(n²), so 5n² and n² are of the same order.

We have also seen that n log n ∈ O(n²) but n² ∉ O(n log n), so n log n and n² are not of the same order. In fact, n log n is of order strictly less than n².

If g(n) ∈ O(f(n)) and f(n) ∈ O(h(n)) then there exist c, d, N, M such that g(i) ≤ c.f(i) for i ≥ N and f(i) ≤ d.h(i) for i ≥ M. So g(i) ≤ cd.h(i) for i ≥ max(N, M) and hence g(n) ∈ O(h(n)).

Thus we have that if f(n) ∈ O(h(n)) then O(f(n)) ⊆ O(h(n)).

We say that f(n) has order type less than or equal to h(n) if O(f(n)) ⊆ O(h(n)), and order type strictly less than h(n) if O(f(n)) ⊂ O(h(n)).

An algorithm A1 is an improvement over an equivalent algorithm A2 if the order type of the worst case number of operations required for A1 is strictly less than the order type for A2.

3.6 Worst case analysis

In most cases the number of operations performed by an algorithm depends not only on the size of the input but also on the particular contents of that input. This is not always the case. For example, an algorithm which counted the number of equal entries in two arrays of size n would need n compare operations regardless of the contents of the arrays. In every case this algorithm has order n.

For algorithms which use different numbers of operations on different inputs of the same size, we have concentrated on calculating the order in the worst case. To be sure that we have an exact answer we need to do two steps: first we analyse the algorithm to find an upper bound on the number of operations that can be carried out, then we find an example which actually takes this number of operations.

For example, for the merge sort 0 algorithm we showed that there could be at most n(n − 1) + n log n operations, but we didn't show that there actually was an array which required this number of operations.


Consider [2, 1]. This takes 4 = 2(2 − 1) + 2 × log 2 operations, so it is worst case for n = 2. Now let n = 2^r and suppose that merge sort 0 on [2^r′, 2^r′ − 1, . . . , 2, 1] uses

S(2^r′) = 2^r′(2^r′ − 1) + 2^r′ log 2^r′

operations for all r′ < r. This implies that the algorithm run on [2^r, 2^r − 1, . . . , 2, 1] uses

S(n/2) + S(n/2) + M(n)

operations, where M(n) is the number of merge0 operations. Calculating this number by stepping through the algorithm we get that

M(n) = n²/2 + n

So, adding up the formulae, we eventually see that [n, n − 1, . . . , 2, 1] takes

S(n) = 2 × ((n/2)((n/2) − 1) + (n/2) log(n/2)) + n²/2 + n = n²/2 − n + n(log(n) − 1) + n²/2 + n = n² − n + n log(n)

operations. So this is the worst case value for merge sort 0.

If we look at merge sort 1 on the array [2, 1] we find that this takes 5 = 3 × 2 log 2 − 2 + 1 operations. In general, we take the integers 2n, 2n − 2, . . . , 2 and order them into an array [a0, . . . , a_(n−1)] which takes the worst case number of operations to sort. Then we take the integers 2n − 1, 2n − 3, . . . , 1 and order them into an array [b0, . . . , b_(n−1)] which takes the worst case number of operations to sort. Then we form a new array [a0, . . . , a_(n−1), b0, . . . , b_(n−1)], and assume by induction that merge sort 1 takes

2(3n log n − n + 1)

operations to sort the two halves of this array into the form

[2, 4, . . . , 2n, 1, 3, . . . , 2n − 1]

and then 3(2n) − 1 operations to merge this into the final sorted array. This gives a total of

W(2n) = 6n log n − 2n + 2 + 6n − 1 = 3(2n) log 2n − (2n) + 1

so again we get an example matching the worst case.

This doesn't actually tell us what the worst case array looks like, but

[4, 2, 3, 1], [8, 4, 6, 2, 7, 3, 5, 1]

are worst case arrays for n = 4 and n = 8 respectively. The details of the above computations will be discussed in class.


3.7 Best case analysis

Looking at the number of operations required by an algorithm in the worst case does not give all the information that we may need. For example, it may be that the worst case only occurs very rarely. Also, how can we distinguish between two algorithms which perform in the same way on their worst cases?

One thing that we can do is to also look at the best possible performance of the algorithm.

3.7.1 Best case analysis of bubble sort

The bubble sort algorithm that we looked at in the first chapter is

for i from 1 to n-1 do

for j from 0 to n-1-i do

if ( S[j] > S[j+1] )

x := S[j]

S[j] := S[j+1]

S[j+1] := x

No matter what the actual values in the input array are, the outer and inner for loops are executed the full number of times. Also, in all cases the inner loop performs the test, so when this algorithm is executed there are at least

(n − 1) + (n − 2) + . . . + 1 = n(n − 1)/2

copy and compare operations carried out. This gives a lower bound on the best case complexity of bubble sort.

It is not hard to see that an already sorted array takes this number of operations to sort (for example, we can use induction). So it is in fact the best case.

We often use B(n) to denote the best case number of operations required for input of size n. For bubble sort we have

B(n) = n(n − 1)/2.

3.7.2 Best case analysis of merge sort1

In all cases the while loop in merge1 is executed at least n/2 times, each time involving 2 compare and copy operations. Thus, for n ≥ 2, merge1 always executes at least

n + 2(n/2) = 2n

operations.

Thus if B(m) denotes the best case number of operations carried out by merge sort 1 on an array of size m we have

B(n) ≥ 2B(n/2) + 2n.


Using the same techniques as for calculating the worst case, we get that

B(n) ≥ 2n log n.

It can be seen by induction that if S is an already sorted array of size n then merge sort 1 executes

2n log n

copy and compare operations. So the lower bound can be achieved and, for merge sort 1,

B(n) = 2n log n.

3.7.3 Best case analysis of quick sort

We analyse the quick sort algorithm described in Section 2.2.1 above.

The partition aspect of the quick sort algorithm is most efficient on already sorted arrays. But in these cases it partitions the array into two unequal parts, requiring, over the full execution of the sort, more recursive calls to the sort routine. The minimum number of subcalls to quick sort occurs when the partition always produces two equally sized parts, i.e. returns k = (n − 1)/2.

To ensure that k = (n − 1)/2 the ‘if’ statement inside partition has to be entered (n − 1)/2 times. Thus, in this case, partition executes

2 + 3(n − 1)/2 + (n − 1)/2 = 2 + 2(n − 1) = 2n

copy and compare operations.

copy and compare operations.Then a lower bound on the best case performance of quick sort is given

by

B(n) ≥ 2n+ 2B(n− 1

2)

We can calculate this value in the same way as for merge sort 1 and show that this has order n log n.

Actually running the quick sort algorithm on the array [2, 1, 3] we find that it takes

6 = 2(n + 1) log(n + 1) − 3n − 1

operations, where n = 3. So

B(n) ≤ 2(n + 1) log(n + 1) − 3n − 1, when n = 3

It is possible, using techniques similar to those in Chapter 2, to show, by induction, that

B(n) ≤ 2(n + 1) log(n + 1) − 3n − 1

and so in best case quick sort is no worse than n log n.

Although we shall not show it here, it turns out that this bound is in fact attained in the best case, and

B(n) = 2(n + 1) log(n + 1) − 3n − 1.


3.8 Average case analysis

3.8.1 Expected and typical values

Suppose that we have an algorithm with two classes, C1 and C7, of inputs of size n. Inputs from C1 cause the algorithm to execute n operations and then terminate, and inputs from C7 cause the algorithm to execute 7n operations and then terminate.

Suppose also that we have a six sided fair die with 1 on five of its sides and 7 on the other side.

We repeatedly run the algorithm using the die to determine which class to take the next input from. We select an input from C1 if we throw a 1 and from C7 if we throw a 7. Since the die is fair we expect to throw a 1 about five times as often as we throw a 7.

Thus we have an experiment, or a trial, whose possible outcomes are that n operations are executed and that 7n operations are executed. Recall from CS110 that the probability of the die landing with a given side up is 1/6, and so the probability of throwing a 7 is 1/6 and the probability of throwing a 1 is 5/6.

If a typical outcome of an experiment is one which is most likely to happen, then the typical number of operations executed by the algorithm is n. Unfortunately it is not always easy to determine the typical outcome of an experiment, and indeed there may not be one. For example, if the die had three 1's and three 7's then both outcomes would be equally likely.

Rather than talking about a typical case we talk about the expected or average case.

If we repeat the algorithm-running experiment often enough we expect that in roughly 5/6ths of the cases n operations are executed and in one case in six 7n operations are executed. Thus, if we run the algorithm 600 times we expect about 1200n operations to be executed.

(Remember, this is only an expectation, not a guarantee. It is possible that either 600n or 4200n operations will be executed.)

If we have an experiment whose possible outcomes (sample space) are n1, n2, . . . , nk, and if the probability of outcome ni is pi, then the expected or average value of the experiment is

n1p1 + n2p2 + . . . + nkpk.

So in the above example, the average number of operations executed is

n × 5/6 + 7n × 1/6 = 2n.

Note, no actual execution of the algorithm will use 2n operations; this is just an expected average over many executions.

When we talk about the average case complexity of an algorithm we shall mean the expected value, not a typical value. However, expected and typical values are similar if the behaviour of the algorithm is reasonably uniform.


For example, if we have three classes of input, C1, C4, and C7, which take n, 4n, and 7n operations respectively, and a die with 1, 4, 4, 4, 4, 7 on its sides, then the expected number of operations is given by

n × 1/6 + 4n × 4/6 + 7n × 1/6 = 4n

which we would also say is the typical, or most likely, value.
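The expected value can also be checked empirically. The following Java sketch (entirely our own; the class name and constants are invented) simulates the die-driven experiment from the two-class example above and prints the observed average number of operations, which should be close to 2n:

import java.util.Random;

public class AverageCaseDemo {
    public static void main(String[] args) {
        int n = 1000;        // input size
        int runs = 600000;   // number of simulated executions
        Random die = new Random();
        long total = 0;
        for (int r = 0; r < runs; r++) {
            // one face in six shows a 7, the other five show a 1
            boolean seven = die.nextInt(6) == 0;
            total += seven ? 7L * n : n; // C7 costs 7n operations, C1 costs n
        }
        // expected value is n(5/6) + 7n(1/6) = 2n
        System.out.println("observed average: " + (double) total / runs
                           + "  (expected " + 2 * n + ")");
    }
}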

3.8.2 Average case analysis of linear search

Suppose that we have an unordered array of n elements, and suppose that we have an algorithm which starts at the first entry and searches the array for a particular given value.

int linear_search(int value, int S[])
    int i := 0
    boolean Found := false
    while (i <= n-1) && (!Found) do
        if (S[i] == value)
            Found := true
        else
            i := i+1
    if (Found)
        return i
    else
        return -1

If the value being searched for is at index k then the algorithm must perform k + 1 record comparisons to find it. If the value is not in the array then the algorithm will perform n record comparisons.

If the array is unordered we assume that the element we are looking for is equally likely to be in any position.

If the element is in the array then the probability that it is at index k is 1/n. Thus the expected number of record comparisons performed by linear search in this case is

1 × 1/n + 2 × 1/n + . . . + n × 1/n = (n + 1)/2.

If the element is not in the array then the number of record comparisons executed is n.

Thus if the probability that the given element is in the array is p, then the expected (average) number of record comparisons carried out by linear search is

A(n) = p(n + 1)/2 + (1 − p)n = n(1 − p/2) + p/2.


3.8.3 Average case analysis of merge sort

We shall not calculate the average case number of compare and copy operations carried out by merge sort 1. But recall that in best case

B(n) = 2n log n − 2(n − 1)

so merge sort 1 is of order n log n in the best case.

In the worst case

W(n) = 3n log n − n + 1

so merge sort 1 is of order n log n in the worst case.

If A(n) is the average case number of copy and compare operations carried out by merge sort 1 we have

2n log n − 2(n − 1) = B(n) ≤ A(n) ≤ W(n) = 3n log n − n + 1.

So A(n) ∈ Θ(n log n) and hence merge sort 1 is of order n log n in the average case, and in fact in every case.

3.8.4 Average case analysis of quick sort

When we run the partition algorithm on an array of size n it will return an index k which is the correct position for the pivot value, the value originally at index 0. This pivot position is determined by the number of elements in the array that are smaller than the pivot value.

If there are k elements in S which are smaller than the pivot value, S[0], then partition performs

2 + 3k + (n − 1 − k) = n + 2k + 1

copy and compare operations.

There can be 0, or 1, or . . . , or n − 1 elements in the array that are less than the pivot value. If the array is unordered then each of these numbers is equally likely, so, for each k between 0 and n − 1, the probability that there are k elements less than the pivot value is 1/n.

Thus if A(n) is the average number of copy and compare operations carried out by quick sort then we have

A(n) = Σ_{k=0}^{n−1} (1/n)(2k + n + 1 + A(k) + A(n − 1 − k))

This can be added up to give

A(n) = 2n + (2/n) Σ_{i=0}^{n−1} A(i)

Substituting n − 1 for n we have

A(n − 1) = 2(n − 1) + (2/(n − 1)) Σ_{i=0}^{n−2} A(i)


and so

nA(n) − (n − 1)A(n − 1) = 4n − 2 + 2A(n − 1)

nA(n) = 4n − 2 + (n + 1)A(n − 1)

A(n) = 4 − 2/n + ((n + 1)/n)A(n − 1)

A(n) = 4 − 2/n + ((n + 1)/n)(4 − 2/(n − 1)) + ((n + 1)/(n − 1))(4 − 2/(n − 2)) + . . . + ((n + 1)/3)(4 − 2/2) + ((n + 1)/2)A(1)

A(n) = 4(n + 1)(1/(n + 1) + 1/n + . . . + 1/3) − 2(n + 1)(1/((n + 1)n) + 1/(n(n − 1)) + . . . + 1/(3 × 2)) + (n + 1)/2

Theorem 3

1/2 + . . . + 1/n < log_e n < 1 + 1/2 + . . . + 1/(n − 1)

Using this result and the fact that Θ(log_e n) = Θ(log₂ n), it can be shown that

A(n) ∈ Θ(n log₂ n).

Thus quick sort is of order n log n in average case.


Chapter 4

Searching algorithms

4.1 Overview

We shall assume that data items contain several data fields, one of which will be nominated as the key. In different contexts the same data may have different keys. For example, a telephone entry contains a name, an address and a telephone number. It is possible to use either the name or the number as the key.

The data is searched by searching for the specified key, and the type of search which can be deployed depends on whether the keys are sorted. In a telephone book entries are normally sorted by name. Thus if the name is taken as the key, so we are searching for an entry by name, then we can use an efficient binary search. However, if we attempt to search for an entry by number, so that the number is being used as the key, a linear search is the only alternative. In this section, with the help of example code, we shall look at ways of structuring data so that efficient search algorithms can be used.

This chapter contains only outline sketch notes to provide the basic motivation for this part of the course. You will need to make notes of your own from lectures and should also look in the recommended textbooks.

4.2 Doing better than linear search

[TA86, pp. 431 ff] [NN96, pp. 4 ff]

If the data to be searched is not ordered in any way then a linear search, in which each element in the data set is compared with the target, is the best we can do. A linear search has order Θ(n), where n is the size of the data set to be searched.

If the data is ordered then we may be able to search more efficiently, provided the data is also structured in an appropriate way. We shall see below that if the data is ordered and structured appropriately then there is a search algorithm, binary search, which is of order Θ(log n). But first we consider whether we should try to do better than a linear search.


4.2.1 Ordering the data?

Linear searches can be used on any data set, and they are of order Θ(n). Specialised searches require the data to be sorted, and even a good sorting algorithm has order Θ(n log n). The best possible search algorithms have order Θ(log n), so if the data is only to be searched once then it is not worth sorting it first. However, the assumption is that the data will be searched a large number, N say, of times, and one Θ(n log n) operation followed by N × Θ(log n) search operations will be more efficient than N × Θ(n) search operations when N is large.

4.2.2 Binary search

[TA86, pp. 441 ff] [NN96, pp. 10–11 (iterative version)] [NN96, pp. 47–52 (recursive version)]

This search assumes that the input is held in an array and sorted by key, with the lowest key first. The key to be found, target say, is compared with the key at the mid-point of the list. If they are the same then the entry has been found. If target is less than the mid key then the search is repeated on the lower half of the data; if target is greater than the mid key then the search is repeated on the upper half of the data. On a sorted array binary search has order Θ(log n).

Given the above description, it is natural to implement binary search as a recursive algorithm, as was done in Chapter 1. However, with any recursive algorithm there is at least a theoretical possibility of running out of stack space. On small machines and large sets of data this theoretical possibility can become a reality. It is possible to implement a binary search algorithm which uses iteration rather than recursion. You should attempt to re-write the algorithm so that the recursion is replaced with iteration; if you get stuck you can find out how to do it in [NN96].
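For reference, one possible iterative version is sketched below in Java (our own solution, not the one in [NN96] — try the exercise yourself before reading it):

// Iterative binary search: returns the index of target in the sorted
// array S, or -1 if it is not present.
public static int binarySearch(int[] S, int target) {
    int low = 0;
    int high = S.length - 1;
    while (low <= high) {
        int mid = low + (high - low)/2;  // mid-point of the current region
        if (S[mid] == target)
            return mid;
        else if (target < S[mid])
            high = mid - 1;              // repeat on the lower half
        else
            low = mid + 1;               // repeat on the upper half
    }
    return -1;
}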

4.2.3 Interpolation search

[TA86, pp. 443 ff] [NN96, pp. 318–320]

Interpolation search is a modification of binary search which attempts to exploit knowledge of the distribution of the data set. If we assume that the keys are evenly distributed then, instead of starting our search for the target at the mid-point of the data set, we begin at a point close to where we expect the target to be.

For example, suppose we believe that our phone book contains approximately the same number of names beginning with each letter of the alphabet. If the target key for which we are searching begins with B then instead of comparing first with the key in the middle of the set (which is likely to begin with J or K) we could begin by comparing with a key which is 1/13th of the way into the data set. (If you think about this, it is quite likely that this is what you actually do when you use a telephone directory.)

An algorithm for interpolation search, in which the start position depends on the value of the target key, is given in [NN96].
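For numeric keys the probe position is usually computed by linear interpolation between the smallest and largest keys in the current region. The following Java fragment is one standard formulation (our own sketch, not taken from [NN96]; it assumes S[high] > S[low]):

// Probe position for interpolation search on the region S[low..high]:
// lands proportionally to where target sits between S[low] and S[high].
static int interpolationProbe(int[] S, int low, int high, int target) {
    return low + (int) ((long) (target - S[low]) * (high - low)
                        / (S[high] - S[low]));
}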


4.3 Dealing with dynamic data structures

The discussion in the previous section applies realistically to telephone directories, which are printed once and then used repeatedly. However, data structures used in programs often change as the program is executed. Thus we have to have insertion operations which allow entries to be added to the data set at run time without upsetting the order of the data. The order of such insertion operations has to be included in the overall order of the searching algorithm.

The binary search algorithm as described above requires the data to be stored in a sorted array. If we wish to add or delete an entry from this array then we first of all have to find the insertion/deletion point and then move all the remaining entries up the array, for an insertion, or down the array for a deletion. This operation is ultimately linear, so, if the expected number of insertions and deletions is comparable with the expected number of searches, this is worse than just using linear search.

An alternative is to store the data in a linked list. In this case insertion and deletion are as efficient as searching because, once the insertion/deletion point has been found, the cost of inserting or deleting a link is constant. However, a binary search is not as efficient as a linear search on a linked list because the list has to be traversed to find its mid-point!

In the next two sections we shall consider methods of structuring the data so that it can be dynamically modified but more efficient searching can still be done. In the first case, hash coding, we ultimately use a linear search on a linked list, but we structure the data so that the length of the list is small. Hash coding was introduced in the course CS1211, so the material presented in this blue book shouldn't be completely new to you. In the second case, binary search trees, we structure the linked list into a tree to allow what is essentially a binary search to be carried out. Binary search trees were also introduced in CS1211, but will be explained in more detail here.

4.4 Hash coding

[TA86, pp. 521 ff] [CLR90, pp. 219–243] [NN96, pp. 326–332] [Knu73, 506–549]

Hash coding is a method of structuring data which is aimed at increasing the efficiency of a search. Instead of a single linked list of data we have several linked lists stored in an array (a hash table). A hash function is used to determine which list a given data element is stored on, and only that list has to be searched for that element.

Hash coded data is easier to maintain than a binary tree structure, in that insertion and deletion of elements is reasonably straightforward, but, in worst case, the order of a search on a hash coded structure is the same as for linear search. In practice, however, hash coding is usually much better than linear search and it is the method that most compilers use to keep track of the variables in a program. In this discussion we shall assume that our keys begin with letters, but it is clearly applicable to any situation.

We make one linked list for each letter of the alphabet, and insert all the keys that begin with a particular letter on a separate list. (The elements of the array which are the heads of each of these lists are often called buckets.)

[Figure: an array of buckets, one per initial letter, each heading a linked list of keys, e.g.]

a → adrian → angle
b → boing → bcount → beta
c → count
d → delta → delay → drain → dozy

This might improve search times by a factor of 26, whilst keeping insertion time small. The catch is that the keys may not be evenly divided by initial letter. In worst case all the keys may begin with the same letter (which is the case when the data is the list of all reserved identifier names in our compiler generator rdp) and there is no improvement in efficiency. So rather than assigning keys to buckets on the basis of the first letter of the key, we use a hash function.

A hash function is simply a calculation on a key that yields a random-looking number. Hash functions are deterministic – that is, they always yield the same number for a given key.

Perhaps the simplest hash function for a key is to add together the ASCII values of all of the characters in the string, and then take the modulus of the result with the number of sub-lists available, giving a hash value of n say. If we are adding a new element we then insert it at the top of the list at index n, and if we are searching for an element then we perform a linear search on the list at index n.

For example, if we have a hash table of size 7 and a hash function which takes the ASCII value of each character and adds them up modulo seven, then our hash table might look as follows.

0 → NULL

1 → bcb → NULL

2 → bcc → cic → NULL

3 → cid → NULL

4 → NULL

5 → aab → NULL

6 → NULL

In the above hash table we note that the ASCII values are 'b' = 98, 'c' = 99 and 'i' = 105, which implies that the hash function for "cic" gives us the value 'c' + 'i' + 'c' = 303 = 2 modulo 7. Therefore if we are searching for the string "cic" we only have to look at the linked list starting at index 2.

There are many other options for hash functions, some of which perform considerably better than the above, but we will not discuss these in this course.
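A minimal Java sketch of this scheme (our own illustration — the class and method names are invented; Java char values coincide with ASCII for such strings):

import java.util.LinkedList;

public class HashTable {
    static final int SIZE = 7;
    // one bucket (a linked list of keys) per index
    private LinkedList<String>[] buckets = new LinkedList[SIZE];

    public HashTable() {
        for (int i = 0; i < SIZE; i++)
            buckets[i] = new LinkedList<String>();
    }

    // sum of the character codes, modulo the table size
    private int hash(String key) {
        int sum = 0;
        for (int i = 0; i < key.length(); i++)
            sum += key.charAt(i);
        return sum % SIZE;
    }

    public void insert(String key) {
        buckets[hash(key)].addFirst(key);        // constant time insertion
    }

    public boolean search(String key) {
        return buckets[hash(key)].contains(key); // linear search of one list only
    }

    public static void main(String[] args) {
        HashTable t = new HashTable();
        t.insert("cic");
        System.out.println(t.search("cic"));     // true: "cic" hashes to bucket 2
    }
}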

4.5 Binary search trees

[TA86, pp. 448 ff] [CLR90, pp. 244–262] [NN96, pp. 321 ff] [Knu73, 422–450]

In the case of hash coded structures the ultimate order of the searching algorithm is still Θ(n); all that is done is that the constant of proportionality is reduced. In order to use a binary search on a dynamic data structure, and hence get worst case Θ(log n) search time, we structure the data as a binary tree. Recall that a binary tree is a rooted tree in which each node has at most two children.

4.5.1 Traversing binary trees

The following are the three standard tree-traversal algorithms for binary trees:

⋄ preorder visit the root, then traverse the left subtree in preorder, then traverse the right subtree in preorder,

⋄ inorder traverse the left subtree in inorder, then visit the root, then traverse the right subtree in inorder,

⋄ postorder traverse the left subtree in postorder, then traverse the right subtree in postorder, then visit the root.

[Figure: two example binary trees with their traversal orders.]

First tree:  Preorder: ABDGCEHIF   Inorder: DGBAHEICF   Postorder: GDBHIEFCA

Second tree: Preorder: ABCEIFJDGKHL   Inorder: EICFJBGKDHLA   Postorder: IEJFCKGLHDBA
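In Java the three traversals are direct recursions on a node structure (a minimal sketch of our own; the Node class below is reused in the later examples):

class Node {
    int key;
    Node left, right;
    Node(int key) { this.key = key; }
}

class Traversals {
    static void visit(Node t) { System.out.print(t.key + " "); }

    static void preorder(Node t) {
        if (t != null) { visit(t); preorder(t.left); preorder(t.right); }
    }

    static void inorder(Node t) {
        if (t != null) { inorder(t.left); visit(t); inorder(t.right); }
    }

    static void postorder(Node t) {
        if (t != null) { postorder(t.left); postorder(t.right); visit(t); }
    }
}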

4.5.2 Structure of binary search trees

In order to use a binary search, the keys in the binary tree have to be ordered.


In a binary search tree, all the left-hand descendants of a node with search key k have keys that are less than or equal to k, and all the right-hand descendants have keys that are greater than or equal to k. The inorder traversal of such a binary tree yields the records in ascending key order.

The following are two binary search trees which both correspond to the ordered array

30 47 86 95 115 130 138 159 166 184 206 212 219 224 237 258 296 307 314

[Figure: the first tree is balanced, with root 184 and depth 5; the second is a single right-leaning chain 30 → 47 → 86 → . . . → 307 → 314 of depth 19.]

To search for a key in a binary tree the tree is traversed, starting at the root, and the target is simply compared with the current node key. If the target is equal to the current key then the search is complete; if it is less than the current node then the left child is selected next, otherwise the right child is selected.

To delete a record with key k in the tree, we first find it, and then, if it has two children, we look for its inorder successor. By the definition of a binary search tree, its successor will be the next element found using an inorder traversal. The successor element cannot have a left subtree, since a left descendant would itself be an inorder successor of k. If the node containing k is a leaf (i.e. it has no children), it can be deleted immediately. If the node containing k has only one subtree, then its child node can be moved up to take its place.

The process of inserting a node in the tree is similar.
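The search, and the insertion walk just described, can be sketched as follows (our own code, using the Node class from the traversal sketch above; duplicate keys are sent left, matching the ‘less than or equal’ convention):

class BST {
    Node root;

    // returns the node holding key, or null if key is absent
    Node search(int key) {
        Node t = root;
        while (t != null && t.key != key)
            t = (key < t.key) ? t.left : t.right;
        return t;
    }

    // walk down as in search, then hang a new leaf off the last node visited
    void insert(int key) {
        Node n = new Node(key);
        if (root == null) { root = n; return; }
        Node t = root;
        while (true) {
            if (key <= t.key) {
                if (t.left == null) { t.left = n; return; }
                t = t.left;
            } else {
                if (t.right == null) { t.right = n; return; }
                t = t.right;
            }
        }
    }
}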

4.5.3 Optimum search trees

[TA86, pp. 459 ff]

The efficiency of search on a binary search tree depends on the structure of the tree. The worst case number of comparison operations that will be required is the length of the longest branch of the tree. This is called the depth of the tree. The two examples in the previous section are binary search trees for the same data; however, the maximum number of comparisons required is 5 for the first tree and 19 for the second tree. In fact we can see that in the worst case the binary search tree will just be a linked list and we will have made no search efficiency improvement at all. We obviously want to construct the trees which allow for the most efficient search, i.e. the ones with the lowest depth. In the case where the probabilities of different keys being required are different, an optimum search tree is one which minimises the expected number of comparisons for a given set of keys and probabilities. We shall only consider the case where each value is equally likely to be searched for, but the techniques below can be extended to non-evenly distributed values by including the probability of each value in the balance calculations.

For keys whose values are all equally likely, an optimum binary search tree has all of its branches essentially the same length. Of course, unless there are exactly a power of 2 data items it will not be possible for the branches to be exactly the same length, but the branches could be within one of being the same length. Constructing optimum trees is hard (costly); the fastest known algorithm to construct such a tree has order Θ(n²). But if we relax the requirement slightly and allow the branches to be close to optimum length then there are efficient algorithms for constructing and maintaining such trees.

4.5.4 Balanced trees

[TA86, pp. 461 ff] [Knu73, pp. 451–470]

The balance of a node in a binary tree is defined to be the depth of its left subtree minus the depth of its right subtree. A balanced binary tree is a binary tree in which the balance of every node is either 0, 1 or −1.

[Figure: two example trees annotated with the balance of each node. The left tree contains a node of balance 2 and is not balanced; the right tree has every balance in {−1, 0, 1} and is balanced.]

We need to ensure that when a node is inserted into a balanced tree the resulting tree is still balanced. This will not necessarily be the case if a simple insertion algorithm is used. We can add 39 in its ‘natural’ position in the above balanced tree without it becoming unbalanced, but not 4 or 101.

[Figure: the tree after inserting 4 and 101 in their natural positions, with the resulting node balances; the nodes on the paths to the new leaves now include balances of ±2, so the tree is no longer balanced.]

Fortunately there are efficient algorithms for inserting new nodes into a balanced tree without destroying the balance. These algorithms perform local rotations to ensure that the tree remains structured correctly.

4.5.5 Rotations in a binary search tree

There are two rotations that we can perform which change the balances of the nodes but have the property that the new tree is still a binary search tree, so the nodes in a left subtree all have values which are less than the parent's value and the nodes in a right subtree all have values which are greater than the parent's.
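In code each rotation is only a few pointer assignments. The following Java sketch, again using our illustrative Node class, returns the new root of the rotated subtree:

    // Right rotation at a: a's left child moves up, a becomes its right child,
    // and the child's old right subtree becomes a's new left subtree. The
    // inorder ordering of the keys is preserved.
    static Node rotateRight(Node a) {
        Node b = a.left;
        a.left = b.right;
        b.right = a;
        return b;
    }

    // Left rotation is the mirror image: a's right child moves up.
    static Node rotateLeft(Node a) {
        Node c = a.right;
        a.right = c.left;
        c.left = a;
        return c;
    }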

[Figure: the original tree with root A, children B and C and subtrees D, E, F and G, together with the trees produced by a left rotation and a right rotation at A.]

We shall assume that below A the tree is balanced, so all the balances are 0, 1 or −1, and that A is unbalanced by 1, so it has a balance of 2 or −2. We shall look at four cases where rotations can restore the balance of such a tree: these cases are when A has balance 2 and B has balance 1 or −1, and when A has balance −2 and C has balance 1 or −1.

Case 1: A has balance 2 and B has balance 1. Since A has balance 2 the subtree from B has depth two greater than the subtree from C, so we assume that the depth from C is n and the depth from B is n + 2. Then the depth from A is n + 3. Since B has balance 1 the depth from D is one greater than the depth from E, and since B has depth n + 2, the depth from D must be n + 1.

[Figure: the tree before the rotation (balance A = 2, balance B = 1) and after a right rotation on A (balance A = 0, balance B = 0), with the subtree depths marked.]

If we perform a right rotation on the node A then the depths of the trees from C, E and D remain unchanged, so the depth of the tree from A is now n + 1 and the depth of the whole tree (from B) is now one less, n + 2.

Case 2: A has balance 2 and B has balance −1. Again, since A has balance 2, we can assume that the depth from C is n and the depth from B is n + 2. Since B has balance −1, the depth from E must be n + 1 and the depth from D must be n. Since E has depth n + 1, one of the children of E must have depth n, and since E has balance 0, 1 or −1 the other child must have depth n or n − 1.


[Figure: balance A = 2, balance B = −1; a left rotation on B followed by a right rotation on A makes E the subtree root, with the depths of the subtrees C, D, H and I marked.]

If we perform a left rotation on the node B and then a right rotation on the node A, then the depths of the trees from C, D, H and I remain unchanged, so the depth of the tree from B is now n + 1, the depth from A is now n + 1 and the depth of the whole tree (from E) is again one less, n + 2. The balances of the nodes C, D, F, G, H and I remain unchanged, the balance of E becomes 0, the balance of B becomes 0 or 1 depending on the depth of H, and the balance of A becomes 0 or −1, depending on the depth of I.

Case 3: A has balance −2 and C has balance −1. Since A has balance −2 we can assume that the depth from C is n + 2 and the depth from B is n. Since C has balance −1 and depth n + 2, the depth from F must be n and the depth from G must be n + 1.

[Figure: balance A = −2, balance C = −1; a left rotation on A, with the subtree depths marked.]

If we perform a left rotation on the node A then the depths of the trees from B, F and G remain unchanged, so the depth of the tree from A is now n + 1 and the depth of the whole tree (from C) is now one less, n + 2.

Case 4: A has balance −2 and C has balance 1. Since A has balance −2, we can assume that the depth from C is n + 2 and the depth from B is n. Since C has balance 1, the depth from F must be n + 1 and the depth from G must be n. Since F has depth n + 1, one of the children of F must have depth n, and since F has balance 0, 1 or −1 the other child must have depth n or n − 1.

[Figure: balance A = −2, balance C = 1; a right rotation on C followed by a left rotation on A makes F the subtree root, with the depths of the subtrees B, G, H and I marked.]

If we perform a right rotation on the node C and then a left rotation on the node A, then the depths of the trees from B, G, H and I remain unchanged, so the depth of the tree from C is now n + 1, the depth from A is now n + 1 and the depth of the whole tree (from F) is again one less, n + 2. The balances of the nodes B, D, E, G, H and I remain unchanged, the balance of F becomes 0, the balance of C becomes 0 or −1 depending on the depth of I, and the balance of A becomes 0 or 1, depending on the depth of H.

4.5.6 Insertion in a balanced binary tree

We shall now show how to use the rotations described above as the basis of an insertion algorithm for balanced binary search trees.

⋄ Begin with a balanced binary search tree, so all the nodes in the left subtree of a node have value less than that node, all the nodes in the right subtree have value higher than that node, and the balance of all the nodes is 0, 1 or −1. Suppose that we wish to add the key N.

⋄ Traverse the tree until either a node labelled N is found, in which case the algorithm terminates, or until the insertion point for N is found, in which case make a new node labelled N and add it to the tree in the correct place.

⋄ Re-calculate the balances of the nodes in the tree. If the tree is still balanced then the algorithm terminates. If not then rebalance the tree as described below.

Before describing the balancing algorithm we consider the different ways in which the tree can become unbalanced, as these correspond to the special cases that the balancing algorithm has to deal with.

Suppose that A is a node in what was a balanced binary tree which has become unbalanced due to the insertion of one new node, and that none of the nodes below A have become unbalanced. Suppose that A has left child B and right child C. For A to have become unbalanced the new node must have been added at the end of the subtree under B or the subtree under C.

If the balance of A was 0 then the depths of its left and right subtrees were the same, and adding a node can only increase the depth by 1, so A would still be balanced. If the balance of A was 1 then the left subtree was deeper than the right one, so adding a node under C would not make A unbalanced. Similarly, if the balance of A was −1 then to make the node unbalanced we must add the node under C.

[Figure: the two starting positions, balance A = 1 (left subtree of depth n + 1, right of depth n) and balance A = −1 (left of depth n, right of depth n + 1).]

First suppose that A had balance 1 and that the new node was added under B. In order for the balance of A to change, the depth of B must change, and hence the new node must be added to the deepest subtree of B. If one subtree of B was deeper than the other before the new node was added then adding the new node to the deepest tree would make B unbalanced. We assumed that all the nodes below A were still balanced after the insertion; thus the only way that A can become unbalanced by adding a node under B is if the subtrees under B were the same length, i.e. if the balance of B was originally 0.

Thus there are two cases: the new node was added to the left subtree of B, the balance of B became 1, and the balance of A became 2; or the new node was added to the right subtree of B, the balance of B became −1, and the balance of A became 2.

[Figure: case 1 (balance B = 1) and case 2 (balance B = −1), with the subtree depths after the insertion marked.]

Now suppose that A had balance −1 and that the new node was added under C. As for the case when the node was added under B, the only way that A can become unbalanced by adding a node under C is if the balance of C was originally 0. Thus again there are two cases: the new node was added to the right subtree of C, the balance of C became −1, and the balance of A became −2; or the new node was added to the left subtree of C, the balance of C became 1, and the balance of A became −2.

[Figure: case 3 (balance C = −1) and case 4 (balance C = 1), with the subtree depths after the insertion marked.]

The four cases that we have identified correspond to the four types of rotation that we discussed in the previous section. If we are in case 1 after the new node has been added, so A has balance 2 and B has balance 1, we can apply a right rotation to A to get a new tree in which B has balance 0 and A has balance 0.


[Figure: the original subtree of depth n + 2, and the subtree after the new node has been added and the right rotation performed, again of depth n + 2.]

The relative order of the elements in the new tree is still the same: the value of a node is greater than the values of the nodes in its left subtree and less than the values in its right subtree. So the new tree is still a binary search tree. The nodes in the subtree beginning at B are now all balanced, and the depth of this tree is n + 2, the same as the depth of the original subtree it replaces. Thus the balance of the rest of the surrounding tree is unchanged by this operation.

If we are in case 2 after the new node has been added, so A has balance 2 and B has balance −1, then we can apply the rotations as in case 2, a left rotation on B and then a right rotation on A, to get a new subtree which has depth n + 2 and all of whose nodes are balanced. If A has balance −2 and C has balance −1 then we perform the rotation as in case 3 to balance the tree, and if A has balance −2 and C has balance 1 then we perform the rotations as in case 4 to balance the tree.

[Figure: the rebalanced subtrees for cases 2, 3 and 4, each of depth n + 2 with all nodes balanced.]

This gives the following algorithm to re-balance a tree which has had one node added to it (a sketch of the case analysis in code follows the list).

While (there are still unbalanced nodes in the tree)

1. Find a node A which is unbalanced but all of whose descendants are balanced.

2. If A has balance 2 and its left child B has balance 1, perform a right rotation on A.

3. If A has balance 2 and its left child B has balance −1, perform a left rotation on B and then perform a right rotation on A.

4. If A has balance −2 and its right child C has balance −1, perform a left rotation on A.

5. If A has balance −2 and its right child C has balance 1, perform a right rotation on C and then perform a left rotation on A.
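A Java sketch of this case analysis, reusing the balance and rotation helpers sketched earlier, might look as follows; it rebalances a single node A (step 1, finding such a node, is left out) and returns the new root of its subtree.

    // Restores the balance at a node a with balance 2 or -2, assuming all
    // of its descendants are balanced. The cases are as in the list above.
    static Node rebalance(Node a) {
        if (balance(a) == 2) {                 // left-heavy: cases 1 and 2
            if (balance(a.left) == -1)         // case 2: first rotate B left
                a.left = rotateLeft(a.left);
            return rotateRight(a);             // cases 1 and 2: rotate A right
        }
        if (balance(a) == -2) {                // right-heavy: cases 3 and 4
            if (balance(a.right) == 1)         // case 4: first rotate C right
                a.right = rotateRight(a.right);
            return rotateLeft(a);              // cases 3 and 4: rotate A left
        }
        return a;                              // already balanced
    }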


4.5.7 Building balanced binary search trees

Of course, we can use the insertion algorithm to construct balanced binary search trees. The tree constructed will depend on the order in which the elements were added.

The following two trees were constructed by inserting one element at a time and re-balancing if necessary. They both contain the same data, but in the first case the elements were added in the order 1, 2, 3, 4, 5, 6 and in the second case they were added in the order 6, 5, 4, 3, 2, 1.

[Figure: the balanced trees built from the insertion orders 1, 2, 3, 4, 5, 6 (root 4) and 6, 5, 4, 3, 2, 1 (root 3).]

4.6 Multiway search trees [TA86, pp. 473 ff] [Knu73, pp. 471–480]

In a multiway search tree the tree need not be binary, and several keys may be associated with the same node. If a node has m ≥ 2 children then it will have m − 1 associated keys. There is a fixed upper bound on the number of keys which can be in a given node. Thus a multiway tree of degree 3 can have at most 3 keys in any given node.

[Figure: a multiway search tree with root (4, 7, 21), children (1, 2), (5), (10) and (23, 50, 61), the last of which has a child (28, 33).]

Such trees are searched by proceeding down the (m+1)st child if the target is bigger than the mth key but smaller than the (m+1)st key in the current node.

4.6.1 Insertion in multiway search trees

A simple insertion algorithm for multiway trees is to search down the tree for the element to be inserted; if the search ends at a node which has less than its full complement of keys, the new element is just inserted in this node. If there is no such node, then a node with keys larger and/or smaller than the new entry is found and a new leaf is created with the new entry in it.

We insert −3, 3 and 56 in the above 3-tree as follows.


[Figure: the trees obtained after inserting −3 (into the node (1, 2), which becomes (−3, 1, 2)), then 3 (as a new leaf) and then 56 (as a new leaf).]

The problem with this method is that the trees can become unbalanced and hence searching is not optimal.

4.6.2 B-trees

[TA86, pp. 486 ff] [CLR90, pp. 381–399] [NN96, pp. 325–327]

B-trees are a particular form of multiway search tree for which there is an insertion algorithm which preserves the balance of the tree without having to rotate the nodes. Thus the levels of nodes in a tree are preserved after an insertion.

In a B-tree there is a given integer m and all the nodes have between 1 and m entries in them. Every node apart from leaf nodes (and some nodes from the penultimate level if the number of elements is not a power of 2) has r + 1 children, where r is the number of elements in the node.

We shall only look at 3, 2-trees, B-trees in which the nodes can contain at most two elements, but the generalisation to m + 1, m trees is not very different.

A 3, 2-tree is a tree, of depth d say, in which each node has one or two elements and every node of depth less than d − 1 has one more child than it has elements in it.

[Figure: a 3, 2-tree with root (20, 30), internal nodes (10), (24) and (50, 60), and leaves (1, 2), (12), (21), (27), (45), (56) and (70, 81).]

Insertion of an element k is carried out as follows:

⋄ Traverse the tree until either a node containing k is found, in which case stop, or until a node v at the bottom of the tree is reached.

⋄ If v is not a leaf node then we can make a new leaf node with label k, add it as a child of v and stop.

⋄ If v is a leaf node with only one element then add k to v and stop.

⋄ If v already contains two elements then add k to v and then split v as follows:


1. Suppose that v has elements k1 < k2 < k3 and parent u. Remove v from the tree and make two new nodes v1 and v2 which have elements k1 and k3 respectively. Make these children of u and add k2 to u.

[Figure: the node (k1, k2, k3) with parent (h1, h2) is split into two nodes (k1) and (k3); these become children of the parent, which becomes (h1, h2, k2).]

2. If the node u now has three elements l1 < l2 < l3 then we split this by creating two new nodes with labels l1 and l3; the first gets the first two subtrees of u and the second gets the other two subtrees, and l2 is added to the parent, w, of u.

[Figure: the split of u into (l1) and (l3), with l2 moving up to the parent w.]

3. If w now has three elements then repeat the splitting until all the tree nodes have at most two elements in them.

The following shows the process of adding a key 85 to the example above.

[Figure: the four stages of inserting 85 into the example tree: the leaf (70, 81) becomes (70, 81, 85) and splits, making its parent (50, 60, 81); that splits, making (20, 30, 60); that splits in turn, leaving a new root (30).]

You can read more about B-trees, and the slightly more efficient variations B∗- and B+-trees, in [TA86, pp. 486 ff].


Chapter 5

String matching

In this chapter we shall consider string matching problems. This normally means finding all (or the first) occurrences of a pattern in some text. For example, find all occurrences of the pattern "as" in the text below.

"No one would have believed in the last years of the nineteenth century that this world was being watched keenly and closely by intelligences greater than man's and yet as mortal as his own; that as men busied themselves about their various concerns they were scrutinised and studied, perhaps almost as narrowly as a man with a microscope might scrutinise the transient creatures that swarm and multiply in a drop of water."

There are in fact seven occurrences of "as" in the above text. Other applications of string matching arise in word processors and in finding patterns in DNA sequences, as well as in many other areas.

We formally define the string-matching problem as follows. Let T be a text of length n, where the i'th element of T is denoted by T[i]. For simplicity we will assume that each element in T is a text character. Let P denote the pattern which we are looking for in T. Let P have length m, and let the i'th element in P be denoted by P[i]. We will often picture T and P as in Figure 5.1. Note that traditionally the arrays considered in string matching start at index 1, not zero.

Figure 5.1 An example of how we picture a pattern P and a text string T: the text is drawn as a row of cells T[1], T[2], T[3], ..., T[n], and the pattern as a row of cells P[1], ..., P[m].

Figure 5.2 An example of a pattern P that occurs with shift equal to two in T: the pattern P = "cdab" aligned under T[3..6] of the text T = "abcdabcdabcd".

We will say that P occurs with shift s in T if, by deleting the first s elements of T, we would end up with a string starting with the pattern P. See an example of a pattern occurring with shift 2 in Figure 5.2.

In other words, if P occurs with shift s in T then P = T[s+1..s+m], where T[s+1..s+m] denotes the string with the following characters: T[s+1], T[s+2], T[s+3], ..., T[s+m]. We will give three different methods of finding all occurrences of a pattern P in a text T.

The naive string matching algorithm is the straightforward method that is easy to program and has a worst case time complexity of O(m(n − m + 1)). The Rabin-Karp algorithm is similar to the naive string-matching algorithm and also has a worst case time complexity of O(m(n − m + 1)); however, using an approach similar to hash tables, it turns out to be a very fast algorithm in most cases. The Knuth-Morris-Pratt algorithm is the fastest algorithm, with a time complexity of O(n + m); however, it is also the most complicated algorithm to program. We will now describe each of the above algorithms in more detail.

5.1 The naive string matching algorithm

The naive string matching algorithm is the straightforward way of finding all occurrences of P in T. We simply check if P occurs in T with shift s for all possible values of s. This is done in the pseudocode below.

1  naive_string_matching (string T, string P)
2  {
3      for s from 0 to n-m do
4          bool match:=true
5          for i from 1 to m do
6              if P[i]!=T[i+s]
7                  match=false
8          if (match)
9              print << "Pattern occurs with shift " << s
10 }

Note that the loop over s tries all possible shifts for P. Now convince yourself why s only goes up to n − m. For each value of s, we first set a boolean value to true.

    shift=0:  match=false
    shift=1:  match=true
    shift=2:  match=false
    shift=3:  match=false
    shift=4:  match=false
    shift=5:  match=true
    shift=6:  match=false

Figure 5.3 An example of naive string matching(T,P) on the strings T = "abaabbaab" and P = "baa": the pattern is aligned under the text at each shift in turn.

We then change this value to false if any of the characters P[1], P[2], ..., P[m] do not match the corresponding character in T (i.e. T[s+1], T[s+2], ..., T[s+m]). So if the boolean value remains true after the loop in lines 5-7, then P must occur with shift s in T.

We will not give the Java code for the naive string matching algorithm as it is quite simple to program. But look at Figure 5.3 for a very short illustration of how the algorithm works if T = "abaabbaab" and P = "baa". Hopefully you can see exactly which steps the algorithm carries out.

Of course, once the boolean value match becomes false we do not have to continue comparing characters from T and P for that specific value of shift. So the pseudocode below is a slight improvement on the previous code.

1  naive_string_matching2 (string T, string P)
2  {
3      for s from 0 to n-m do
4          bool match:=true
5          int i=1
6          while (match) and (i<=m) do
7              if P[i]!=T[i+s]
8                  match=false
9              i:=i+1
10         if (match)
11             print << "Pattern occurs with shift " << s
12 }

Try running this algorithm on the above example (where T = "abaabbaab" and P = "baa").

No matter which version of the naive string matching algorithm we use, we note that if T contains n "a"'s and P contains m "a"'s then the inner loop will always have to compare m values from P with m values from T. As the outer loop goes from 0 to n − m, this gives us a worst case analysis of O(m(n − m + 1)).

For our first implementation of the naive string matching algorithm we note that this is also our best case analysis, as the inner loop always compares m values from P with m values from T. However, in our second implementation (i.e. naive_string_matching2) we note that if T contains n "a"'s and P contains m "b"'s, then the algorithm would only use O(n − m + 1) time. Can you see why?
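Although the notes leave the Java version as an exercise, a minimal sketch of naive_string_matching2 might look like this; note that Java strings are indexed from 0, while the pseudocode indexes from 1.

    // Prints every shift at which pattern p occurs in text t.
    static void naiveStringMatching(String t, String p) {
        int n = t.length(), m = p.length();
        for (int s = 0; s <= n - m; s++) {
            boolean match = true;
            int i = 0;
            while (match && i < m) {           // stop early on a mismatch
                if (p.charAt(i) != t.charAt(i + s)) match = false;
                i++;
            }
            if (match) System.out.println("Pattern occurs with shift " + s);
        }
    }

Calling naiveStringMatching("abaabbaab", "baa") prints the shifts 1 and 5, matching Figure 5.3.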

5.2 The Rabin-Karp algorithm

Rabin and Karp have come up with a string matching algorithm which performs well in practice, even though its worst case running time is O((n − m + 1)m), just like the naive string matching algorithm. The algorithm uses a kind of hash function, just as we did for hash coding. To make things simpler, suppose that every character in both T and P belongs to the set {'0','1','2',...,'9'}. We can then view each string as a number. The string "27172" corresponds to the number 27,172 (note that the string "0027172" also corresponds to 27,172). With this interpretation the pattern P corresponds to a number, say p. Furthermore the strings T[1..m], T[2..m+1], ..., T[n−m+1..n] all correspond to numbers, say t1, t2, ..., tn−m+1, respectively. Now our problem is to decide which of the numbers t1, t2, ..., tn−m+1 are equal to p, as if p = ti then P[1..m] = T[i..i+m−1].

The problem is that the numbers p, t1, t2, ..., tn−m+1 may be very large and cannot be stored as a number in the computer. However, if p = ti then we know that p mod q is equal to ti mod q, for any positive integer q. So if we could compute all of the following numbers, then we would perhaps be able to rule out some shifts (as if p mod q is not equal to ti mod q then P does not occur in T with shift i − 1).

p′ = p mod q
t′1 = t1 mod q
t′2 = t2 mod q
...
t′n−m+1 = tn−m+1 mod q

As an example let P = "0302", let T = "4030201503022" and let q = 17. In this case we get the following values.

p = 302     p′ = 13

t1 = 4030   t′1 = 1
t2 = 302    t′2 = 13
t3 = 3020   t′3 = 11
t4 = 201    t′4 = 14
t5 = 2015   t′5 = 9
t6 = 150    t′6 = 14
t7 = 1503   t′7 = 7
t8 = 5030   t′8 = 15
t9 = 302    t′9 = 13
t10 = 3022  t′10 = 13

We can now see that we only have to check t2, t9 and t10 to see if they are actually matches. We see that t2 and t9 are matches, whereas t10 is not. We say that t10 is a spurious hit. So how is the above going to help us? Can we compute the above numbers quickly?

YES! We can. We use an approach which is similar to Horner's rule for hash functions.

1  compute_modular_values (string P)
2  {
3      p′ := 0
4      for j from 1 to m do
5          p′ := (10 × p′ + P[j]) mod q
6
7      t′1 := 0
8      for i from 1 to m do
9          t′1 := (10 × t′1 + T[i]) mod q
10
11     for i from 2 to n−m+1 do
12         t′i := (10 × (t′i−1 − 10^(m−1) × T[i−1]) + T[i+m−1]) mod q
13 }

In lines 3-5 we compute p′ by noting that p = 10^(m−1)P[1] + 10^(m−2)P[2] + ... + 10P[m−1] + P[m]. This is equal to the following, where we may take mod q after any intermediate calculation:

10(10(10(...(10P[1] + P[2]) + P[3])...) + P[m−1]) + P[m] mod q.

In lines 7-9 we compute t′1 in the same way as we computed p′. In lines 11-12 we compute t′i for all i = 2, 3, ..., n−m+1 using the formula below.

t′i = 10^(m−1)T[i] + 10^(m−2)T[i+1] + ... + 10T[i+m−2] + T[i+m−1]   (mod q)
    = 10(10^(m−1)T[i−1] + 10^(m−2)T[i] + ... + T[i+m−2] − 10^(m−1)T[i−1]) + T[i+m−1]
    = 10(t′i−1 − 10^(m−1)T[i−1]) + T[i+m−1]

Note that the function compute_modular_values only uses time O(m+n), as lines 3-5 take O(m) time while lines 7-12 take O(n) time. Now the Rabin-Karp algorithm can be written as follows:

1  rabin_karp (string T, string P)
2  {
3      compute_modular_values(P)
4      for i from 1 to n−m+1 do
5          if (p′ == t′i)
6              bool match := true
7              for j from 1 to m do
8                  if (P[j] != T[i+j−1])
9                      match := false
10             if (match)
11                 print << "Pattern occurs with shift " << i−1
12 }

The advantage of Rabin-Karp over the naive string matching algorithm is that if P does not occur in T with shift s, then p′ should only be equal to t′s+1 with probability 1/q. So if q is a large prime number (why prime?) then there should be very few spurious hits. This means that we would expect the following time complexity if P occurs in T h times.

O(n+m) + h × O(m) + (n−m+1−h) × ( (1/q) × O(m) + ((q−1)/q) × 1 )

This is bounded above by the following formula:

O(n + m + hm + m(n−m+1)/q)

We note that if h is of the same order as n − m, which is the case if P = "aaaaa..aaa" and T = "aaaa...aaaaa", then this is still O(m(n − m + 1)), so there is no improvement over the naive string matching algorithm. However, if h = 0, then we would expect to use at most O(n + m + m(n − m + 1)/q) time, which is a lot better than the naive string matching algorithm when q is large. Note that this is only on average, and some strings T and patterns P may use more operations than this (if the proportion of spurious hits is considerably larger than 1/q).

This completes the Rabin-Karp algorithm when all characters are in the set {'0','1',...,'9'}. What happens if the characters can be any ASCII value (i.e. they have a value between 0 and 255)? Then the following algorithm will work. You should convince yourself of this.

1  rabin_karp_ascii (string T, string P)
2  {
3      p′ := 0
4      for j from 1 to m do
5          p′ := (256 × p′ + P[j]) mod q
6
7      t′1 := 0
8      for i from 1 to m do
9          t′1 := (256 × t′1 + T[i]) mod q
10
11     for i from 2 to n−m+1 do
12         t′i := (256 × (t′i−1 − 256^(m−1) × T[i−1]) + T[i+m−1]) mod q
13
14     for i from 1 to n−m+1 do
15         if (p′ == t′i)
16             bool match := true
17             for j from 1 to m do
18                 if (P[j] != T[i+j−1])
19                     match := false
20             if (match)
21                 print << "Pattern occurs with shift " << i−1
22 }

Note that the value 256^(m−1) should be pre-computed, so that we do not have to compute it from scratch every time line 12 is performed.
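A possible Java rendering of rabin_karp_ascii is sketched below. The method name and the use of long arithmetic are our own choices; the extra "+ q" in the rolling update keeps the intermediate value non-negative, and q should be a large prime that is still small enough for the products below not to overflow a long.

    // Prints every shift at which pattern p occurs in text t, using a rolling
    // hash modulo q. In Java a 0-based index i is itself the shift.
    static void rabinKarp(String t, String p, long q) {
        int n = t.length(), m = p.length();
        long h = 1;                            // h = 256^(m-1) mod q, precomputed once
        for (int j = 1; j < m; j++) h = (h * 256) % q;

        long pp = 0, ts = 0;                   // p' and t'_1, via Horner's rule
        for (int j = 0; j < m; j++) {
            pp = (256 * pp + p.charAt(j)) % q;
            ts = (256 * ts + t.charAt(j)) % q;
        }

        for (int i = 0; i <= n - m; i++) {
            if (pp == ts) {                    // a hit: verify it character by character
                boolean match = true;
                for (int j = 0; j < m; j++)
                    if (p.charAt(j) != t.charAt(i + j)) match = false;
                if (match) System.out.println("Pattern occurs with shift " + i);
            }
            if (i < n - m) {                   // roll the window one character right
                long drop = (h * t.charAt(i)) % q;
                ts = ((ts - drop + q) * 256 + t.charAt(i + m)) % q;
            }
        }
    }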

For an algorithm with worst case time complexity O(n+m), see appendix C.


Chapter 6

Game Theory

Computers are getting better and better at playing games. The best chess computers can now compete with the best human players, and there are hardly any common games for which you cannot buy a computer version. So how do you write programs that are good at playing games?

We will look at the most common method, called the min-max algorithm, when considering games such as chess, tic-tac-toe, draughts, Chinese checkers, reversi, 4-in-a-row, 5-in-a-row, and many many more. As speed is a very important part of this kind of game theory, we will also look at ways of speeding up the algorithms. The most common way of increasing the speed is to use so-called α-β pruning.

In this chapter we restrict ourselves to two-person games, where the two players alternate making moves. We will furthermore assume that there are no elements of chance (unlike poker, backgammon, etc.). The first game we will look at is called drop-down tic-tac-toe.

6.1 Drop-down tic-tac-toe

Consider a version of tic-tac-toe where we can only place a piece in coordinate (i, j) (that is, in row i and column j) if i = 1 or if there is already a piece in coordinate (i − 1, j). See below for an example of how the game may develop.

[Figure: boards 1 to 8 of an example game, played on a 3 × 3 grid whose rows are numbered 1 to 3 from the bottom; 'x' and 'o' alternately add pieces, each of which must sit in row 1 or on top of an existing piece.]

It is called drop-down tic-tac-toe as you may think of each piece having to drop all the way to the bottom of the 3 × 3 grid when dropped from the top of a column. The reason we consider this game instead of 'normal' tic-tac-toe is that it actually requires (a little) more thought to play than tic-tac-toe, but more importantly there are fewer legal moves in each turn. Therefore we can actually analyse this game completely.

Could 'x' have won the match above, or got a draw, if he had played better? Before answering this question we will try to decide if 'x' could have won the match, or got a draw, if he had played better from board 2 onwards. We will give a value to each board according to the following rules.

 1 = 'o' will win if both players play optimally.
 0 = it will end in a draw if both players play optimally.
−1 = 'x' will win if both players play optimally.

It is easy to give a value to a board where the game is finished, as below.

[Figure: three finished boards, with values 1 ('o' has won), 0 (a draw) and −1 ('x' has won).]

But how do we decide the board value for other boards? If it is 'o's turn then we will assume that 'o' makes the move that is best for 'o'. So if we know the value for each board that can result after 'o's move then 'o' will choose the move resulting in the maximum possible value. Similarly, if it is 'x's turn we will assume that 'x' makes the move that is optimal for 'x'. In other words, 'x' will place a piece resulting in a board of minimum value. In Figure 6.1 we know the board values of the right-most boards, as they represent finished games. In the boards in the second right-most column the game is either finished or it is 'x's turn. If it is 'x's turn then he picks the move that minimizes the board value (the board value is the number to the right of the board). In the third right-most column the game is either finished or it is 'o's turn. If it is 'o's turn then he picks the move that maximizes the board value. By continuing this process we see that our left-most board gets the value 1. Therefore 'o' will always win if he always chooses the move that maximizes the board value (no matter what 'x' does).

The tree in Figure 6.1 is called a min-max tree, as we alternately take minimum values and maximum values when we move through the tree from the bottom up. So if 'x' always starts the game, do you think there is a winning strategy for 'x' (that is, can 'x' always win if he plays optimally)? Or is there a winning strategy for 'o'? Or will the game always end in a draw if both players play optimally? In order to decide this we need to build a min-max tree as before, but starting from the empty board. See Figure 6.2 for the first part of the tree.

We see that the board value of the empty board is 0, which implies that if both players play optimally then the game will always end in a draw.

So the min-max tree tells us who will win the game (if both players play optimally) and what moves they should make at any given board.


Figure 6.1 A min-max tree for the drop-down tic-tac-toe.

Figure 6.2 First part of the min-max tree for the drop-down tic-tac-toe.

As games such as chess, draughts and Chinese checkers can all be analysed using min-max trees, you may think that all of these games have been analysed in this way. However, to build the full min-max tree for chess we would need billions of years on the fastest computer in the world, so this is not feasible.

However, there are ways around the above time problems, which we will discuss in the following sections. We can speed up our algorithm using a technique called α-β pruning (or alpha-beta pruning) and we can use what is called an evaluation function instead of building the whole min-max tree.

6.2 Evaluation functions

Even though it would take billions of years to compute a move for the game of chess if we had to build the complete min-max tree, we can still use the min-max tree to decide on good moves in chess. We do this by not building the complete min-max tree, but only part of it. At the bottom of the tree we will then not have a finished game, but some unfinished board instead. So we will have to give a board value to such an unfinished board. For chess this is typically done by giving a value to each type of piece (e.g. pawn=1, bishop=5, knight=5, rook=10, ...) and summing up the value of all the white pieces and subtracting the value of all the black pieces. If either side is in checkmate then the value should be either ∞ or −∞ depending on who has won. This is a very simple evaluation function for chess, and more complicated ones take other factors such as double-pawns into account. The whole purpose of an evaluation function is to give a value to the board which indicates who is winning. For chess this will typically be a value which gets larger if the board gets better for white, and will be small (i.e. have a large negative value) if the board is good for black.

However, as chess is a very complicated game, we will illustrate the evaluation function using normal tic-tac-toe (not drop-down tic-tac-toe). We want to give a value to a board such that the higher the value the better the board is for 'o', and the smaller the value the better the board is for 'x'. We will give an example of such a function below.

Given a board in tic-tac-toe there are 3 rows, 3 columns and 2 diagonals. It is good for 'o' if many of these rows/columns/diagonals have lots of 'o's and no 'x's. If a row/column/diagonal contains both 'o's and 'x's then this is equally good/bad for both 'o' and 'x'. Therefore we can assign a value to each row/column/diagonal using the table below.

Pieces in the row/column/diagonal      Value of that row/column/diagonal
"o  ", " o ", "  o"                    1
"oo ", "o o", " oo"                    3
"ooo"                                  +∞
"x  ", " x ", "  x"                    −1
"xx ", "x x", " xx"                    −3
"xxx"                                  −∞
All other                              0

The evaluation function now sums up the 8 values we get by using the above table on the 3 rows, 3 columns and 2 diagonals in our board. So what value will our evaluation function return when used on the following board (where '.' marks an empty square)?

    o x .
    o x .
    x . o

We get the following values (where diagonal 1 runs from the bottom left to the top right).

Row  Pieces  Value      Column  Pieces  Value      Diagonal  Pieces  Value
1    "x o"   0          1       "oox"   0          1         "xx "   −3
2    "ox "   0          2       "xx "   −3         2         "oxo"   0
3    "ox "   0          3       "  o"   1

So the value of the board is 0 + 0 + 0 + 0 − 3 + 1 − 3 + 0 = −5. This indicates that the board is good for 'x', which is true. Verify that the following boards have the following values.

[Figure: four example boards with values +∞, 8, −∞ and 0 respectively.]

Note that the evaluation function is not always accurate, as whether a board is good or bad for 'o' also depends on whose turn it is. The board below is very good for 'o' if it is 'o's turn, even though it gets a negative value.

[Figure: a board with value −4 from which 'o' can nevertheless win immediately if it is 'o's turn.]

However, as all the boards we will use the evaluation function on will have the same person to move, we will not incorporate whose move it is into the evaluation function. We also note that we are not claiming that the evaluation function always tells us accurately who is leading or who will win the game; it is only an indication of who seems to be leading. As we normally have to compute the evaluation function for thousands or millions of boards, the main priority for the evaluation function is that we can compute it quickly. We will discuss what makes good evaluation functions later.

First we will see how we use the evaluation function together with the min-max tree. Instead of computing the whole min-max tree, which will normally be too time consuming, we will only compute the tree down to a given depth, denoted by D. The higher the depth, the slower the program will run, but the better the moves it will find. At the bottom of the tree (i.e. at depth D) we will then use our evaluation function. At all other depths we will use the normal min-max strategy, if the board has any children. If it has no children it means the game is finished, and we use our evaluation function to determine who has won.
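As an illustration, here is one possible Java sketch of the tic-tac-toe evaluation function described above. The board representation (a 3 × 3 char array of 'o', 'x' and ' ') and the large constant standing in for ±∞ are our own choices.

    static final int INF = 1000000;   // stands in for the +/- infinity of the table

    // Value of a single row, column or diagonal, as in the table above.
    static int lineValue(char a, char b, char c) {
        int os = 0, xs = 0;
        for (char ch : new char[] {a, b, c}) {
            if (ch == 'o') os++;
            else if (ch == 'x') xs++;
        }
        if (os > 0 && xs > 0) return 0;        // mixed lines are worth nothing
        if (os == 3) return INF;               // "ooo"
        if (xs == 3) return -INF;              // "xxx"
        if (os == 2) return 3;
        if (xs == 2) return -3;
        if (os == 1) return 1;
        if (xs == 1) return -1;
        return 0;                              // an empty line
    }

    // Sums the values of the 3 rows, 3 columns and 2 diagonals.
    static int evaluate(char[][] b) {
        int v = 0;
        for (int i = 0; i < 3; i++) {
            v += lineValue(b[i][0], b[i][1], b[i][2]);   // row i
            v += lineValue(b[0][i], b[1][i], b[2][i]);   // column i
        }
        v += lineValue(b[0][0], b[1][1], b[2][2]);       // one diagonal
        v += lineValue(b[2][0], b[1][1], b[0][2]);       // the other diagonal
        return v;
    }

On the worked example above this returns 0 + 0 + 0 + 0 − 3 + 1 − 3 + 0 = −5, as calculated by hand.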

Figure 6.3 A min-max tree for tic-tac-toe, with depth D = 3.

See Figure 6.3 for an example of the min-max tree with depth D = 3, using the evaluation function above and starting at the partly played board at its root. We see from the min-max tree in Figure 6.3 which move 'x' should make as his first move! He should make a move which results in a board with value −∞. If he does that then no matter what move 'o' makes it will result in a board of value −∞. Continuing this way 'x' can make sure that the last board will have value −∞, which means 'x' has won.

See Figure 6.4 for the first part of the min-max tree with depth 3, when starting from the empty board. See Figure 6.5 for the first part of the min-max tree with depth 4, when starting from the empty board. In both cases we see that the best move 'x' can make is to start in the middle of the board. The best move 'o' can then make is to put a piece in one of the corners.

6.3 Thoughts on min-max trees

There are many decisions to be made when you want to use the min-max algorithm. We will discuss a few of the common ones here.

What depth should you use? This normally depends on the speed of your computer, the complexity of your evaluation function, the patience of the user and many more factors. The deeper the depth, the better the computer will play, but the slower it will be. In many games there are also more possible moves at the beginning of the game, compared to the end of the game, so some programs use different depths depending on the state of the game. Another option is to use a certain (small) depth and, if that went very quickly, to try again with depth one larger, continuing like this until it becomes too slow.

What evaluation function should you use? This is often the most difficult part of producing a good game-playing computer program. It often requires a good knowledge of the game we are trying to program. So there is basically no easy answer to this question. This brings us to the next question.

Should you have an advanced evaluation function, or should you search to a deeper depth? This is again a difficult question to answer. In chess there are often around 30 possible moves at a given position (even though it varies a lot depending on the state of the game), so if the evaluation is made 30 times slower, we will have to search to a depth one less if the speed shouldn't suffer. Often experiments will be run to decide what is best. You may basically try the simple and fast evaluation function and then try the better but slower evaluation function, and see which performs best. You may even make the computer play itself using the two different strategies.

What if two moves are equally good? If the min-max tree indicates that two moves are equally good we may do several different things. Often we just pick the first move we found. We may also choose to pick a random move, so the computer doesn't get too predictable. Or we may try to obtain more information on the two moves.

There are many more decisions to be made when writing a games program. Many of these are also related to the α-β pruning which we will discuss in the next section.

Figure 6.4 Part of the min-max tree for tic-tac-toe, starting at the empty board, with depth D = 3.

Figure 6.5 Part of the min-max tree for tic-tac-toe, starting at the empty board, with depth D = 4.

6.4 Alpha-Beta pruning and pseudocode

We will now give a short overview of the pseudocode for the min-max algorithm and for Alpha-Beta pruning. These notes are only designed to give a rough overview of the area and you should be taking notes at the lectures.

6.5 Pseudocode for the min-max algorithm

The following is the pseudocode for the min-max algorithm.

MinMax(Board, depth, turn)
    If (depth=0) or (the game is over)
        return value of board
    If (turn='o')                  // Max-step
        val = -M
        for all possible moves, Q, that 'o' has
            Board' = Board after move Q
            val = max(val, MinMax(Board', depth-1, swap(turn)))
    else                           // Min-step
        val = M
        for all possible moves, Q, that 'x' has
            Board' = Board after move Q
            val = min(val, MinMax(Board', depth-1, swap(turn)))
    return val

The input contains a "Board", which is a representation of a given board for which we want to compute the value which the min-max tree gives us. It also contains a variable to indicate to what depth we shall search and a variable to indicate whose turn it is. We assume that 'o' wants to maximize the value of the board and 'x' wants to minimize the value of the board. If the "depth" is zero or if somebody has already won the game, then we just return the value which our evaluation function gives us. Otherwise we check whose turn it is. If it is 'o's turn then we need to find the maximum value of the children of the board we are considering. We use "val" to store the maximum value we have found so far. We then make recursive calls for all possible boards that can be obtained by placing an 'o', where each recursive call will be searched to a depth of one less and it will be the other player's turn.

If it is 'x's turn we need to find the minimum value of the children of the board we are considering, instead of the maximum.

It is a good idea to step through the above algorithm on some of the examples that are given in class.
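One possible Java rendering of the pseudocode is sketched below. The Board interface, with gameOver, evaluate and children methods, is our own invention standing in for whatever board representation a real program would use; M plays the role of the M in the pseudocode, a value larger than any evaluation.

    interface Board {
        boolean gameOver();                        // has somebody won, or is the board full?
        int evaluate();                            // the evaluation function of section 6.2
        java.util.List<Board> children(char turn); // all boards after one move by 'turn'
    }

    static final int M = 1000000;

    static char swap(char turn) { return turn == 'o' ? 'x' : 'o'; }

    static int minMax(Board board, int depth, char turn) {
        if (depth == 0 || board.gameOver())
            return board.evaluate();
        int val = (turn == 'o') ? -M : M;          // max-step starts low, min-step high
        for (Board child : board.children(turn)) {
            int v = minMax(child, depth - 1, swap(turn));
            val = (turn == 'o') ? Math.max(val, v) : Math.min(val, v);
        }
        return val;
    }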


6.6 Alpha-Beta pruning.

Alpha-Beta pruning is a method to delete large parts of the min-max tree without losing any information. For example, consider the situation:

                A
              /   \
     MAX     /     \
            B       C
           / \    / | \
     MIN  /   \  /  |  \
        10     8 6  ?   ?

We note that B will get the value 8, as this is the minimum of 10 and 8. When we consider C, its first child has the value 6. This means that the value of C is at most 6 (as C = min(6, ?, ?)) no matter what the values of the other children of C are. But this means that A will have the value 8 (as A = max(8, ≤6)), as the value of B is guaranteed to be larger than that of C (C ≤ 6 ≤ 8 = B). This means that we do not have to consider the other children of C at all, as we already know the value of A.

Alpha-Beta pruning deletes parts of the tree using the above approach. Of course, we do not have to examine the nodes marked "?" in the tree below either (if we examine nodes in a left-to-right order). You should convince yourself of this!

                __ S __
           ___/    |   \___
     MIN  /        |       \
         U         V        W
         |        / \     / | \
     MAX |       /   \   /  |  \
        10     12     ? 8  14   ?

In order to write a program that deletes parts of the tree according to the above approach, we will, for each board we consider, have two variables Alpha and Beta. We will only be interested in values that are larger than Alpha and smaller than Beta. If the value of a node is smaller than Alpha our function will just return Alpha instead of the actual value. Analogously, if the value of a node is greater than Beta then we will just return Beta instead of the actual value.

For the example above we will have the following function calls (which are explained below and will be further explained at the lecture):

MinMax(S): Alpha=-infinity, Beta=infinity
    MinMax(U)
        Alpha=-infinity, Beta=infinity
        MinMax(10)
            return 10
        Set Alpha=10
        return 10
    Set Beta=10
    MinMax(V)
        Alpha=-infinity, Beta=10
        MinMax(12)
            return 12
        As 12>Beta return 10
    MinMax(W)
        Alpha=-infinity, Beta=10
        MinMax(8)
            return 8
        Set Alpha=8
        MinMax(14)
            return 14
        As 14>Beta return 10
    Return 10
Return 10

So in order to compute the value of S we first compute the value of U, which is 10. Therefore we set Beta equal to 10, as for all the following children of S we are only interested in their value if it is less than 10 (i.e. greater than Alpha and less than Beta). If it is greater than 10 we will never use the value anyway! So as soon as a child of V has value greater than 10 we know that V will have value greater than 10, so we just return 10 (as we do not care what the actual value of V is). The same approach is followed when we consider W.

6.6.1 Pseudocode for the Alpha-Beta pruning

The following is the pseudocode for the min-max algorithm with Alpha-Beta pruning.

MinMax_AB(Board, depth, turn, A, B)
    If (depth=0) or (the game is over)
        return value of board
    If (turn='o')                  // Max-step
        for all possible moves, Q, that 'o' has
            Board' = Board after move Q
            A = max(A, MinMax_AB(Board', depth-1, swap(turn), A, B))
            If A>=B
                return B
        return A
    else                           // Min-step
        for all possible moves, Q, that 'x' has
            Board' = Board after move Q
            B = min(B, MinMax_AB(Board', depth-1, swap(turn), A, B))
            If A>=B
                return A
        return B

We will now show which steps are performed on the following example.

Page 95: Algorithms and Complexity I

Alpha-Beta pruning. 90

__ S __

MAX ___/ | \___

’o’ ___/ | \___

__/ | \__

U V W

| /| / | \

MIN | / | / | \

’x’ | / | / | \

C D E F G H

14 27 22 20 10 40

MinMax_AB(S,2,'o',-infinity,+infinity):
    A=-infinity, B=infinity
    Board' = U
    A=max(A,MinMax_AB(U,1,'x',-infinity,+infinity)):
        A=-infinity, B=infinity
        Board' = C
        B=min(B,MinMax_AB(C,0,'o',-infinity,+infinity)):
            return 14
        So B=min(infinity,14)=14
        return 14
    So A=max(-infinity,14)=14
    Board' = V
    A=max(A,MinMax_AB(V,1,'x',14,infinity)):
        A=14, B=infinity
        Board' = D
        B=min(B,MinMax_AB(D,0,'o',14,+infinity)):
            return 27
        So B=min(infinity,27)=27
        Board' = E
        B=min(B,MinMax_AB(E,0,'o',14,27)):
            return 22
        So B=min(27,22)=22
        return 22
    So A=max(14,22)=22
    Board' = W
    A=max(A,MinMax_AB(W,1,'x',22,infinity)):
        A=22, B=infinity
        Board' = F
        B=min(B,MinMax_AB(F,0,'o',22,+infinity)):
            return 20
        So B=min(infinity,20)=20
        As A=22>20=B we return 22
    So A=max(22,22)=22
    return A=22
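A Java sketch of MinMax_AB, using the same illustrative Board interface as the min-max sketch earlier; the initial call would be minMaxAB(start, depth, 'o', -M, M).

    static int minMaxAB(Board board, int depth, char turn, int a, int b) {
        if (depth == 0 || board.gameOver())
            return board.evaluate();
        if (turn == 'o') {                     // max-step: raise Alpha
            for (Board child : board.children(turn)) {
                a = Math.max(a, minMaxAB(child, depth - 1, swap(turn), a, b));
                if (a >= b) return b;          // cut-off: remaining children are pruned
            }
            return a;
        } else {                               // min-step: lower Beta
            for (Board child : board.children(turn)) {
                b = Math.min(b, minMaxAB(child, depth - 1, swap(turn), a, b));
                if (a >= b) return a;          // cut-off
            }
            return b;
        }
    }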

6.6.2 General thoughts on Alpha-Beta pruning

Alpha-Beta pruning makes no difference to the speed of the algorithm if at each step of the algorithm we consider the worse moves before the better moves. This is the case as we only delete a child of a node A from the tree if we know that a previous child of A is guaranteed to produce a better move.

On the other hand, if we at every step of the algorithm consider the best move first, then Alpha-Beta pruning may delete a huge part of the tree. In fact, if the min-max tree has N nodes we may have only O(√N) nodes in the tree if we use Alpha-Beta pruning. For example, if a chess game has approximately 35 legal moves at each stage of the game and we search to depth 8, then we will consider close to 2,300 billion nodes (i.e. 35^8 ≈ 2,300,000,000,000), whereas Alpha-Beta may (in the best case) reduce the tree to around 1.5 million nodes (i.e. √(35^8) ≈ 1,500,000).

So it is very important to try to consider better moves before worse moves! One common approach is to run the algorithm at a smaller depth in order to try to determine which moves are likely to be best. We could for example run the algorithm for depths 2, 3, 4, ... until we run out of time, but use the result of depth i − 1 in order to decide in which order to visit nodes when we run the algorithm with depth i.

There are other more advanced algorithms that can also speed up the min-max search, but Alpha-Beta pruning is the easiest to implement and very effective. It is used in most games involving two players where there is perfect information and no element of chance.


Bibliography

[Bir94] Peter J. Bird. LEO, the first business computer. Hasler Publishing, 1994.

[CLR90] Thomas H. Cormen, Charles E. Leiserson, and Ronald L. Rivest. Introduction to algorithms. MIT Press, 1990.

[HP96] John Hennessy and David A. Patterson. Computer architecture, a quantitative approach. Morgan Kaufmann, second edition, 1996.

[Knu73] Donald Knuth. The art of computer programming. Volume 3: sorting and searching. Addison Wesley, 1973.

[NN96] Richard Neapolitan and Kumarss Naimipour. Foundations of algorithms. D. C. Heath and Co., 1996.

[TA86] Aaron M. Tenenbaum and Moshe J. Augenstein. Data structures using Pascal. Prentice Hall, second edition, 1986.


Appendix A

CS2860 questions

1. Write out the insertion sort algorithm which sorts arrays of integer records into ascending order.

Count the number of record copy and compare operations required to sort the array [2,1,3,4].

2. Write out the bubble sort algorithm given in class which sorts arrays of integer records into ascending order.

Show all the steps carried out when this algorithm is applied to the array [5,4,3,1,2] (draw the array after every iteration of the INNER loop).

3. Describe the binary search algorithm.

Give the worst case number of compare operations required to find a given element in an array containing 128 elements using binary search, justifying your answer.

Give one reason why we might use

(a) binary search

(b) linear search

4. Write out the second version of the merge sort algorithm given in class, merge sort1, which uses a temporary array to hold values while the array is being sorted.

What is the worst case order of this algorithm?

Give an example of an array of four elements which requires the worst case number of operations to sort it using merge sort1.

5. Write a quick sort routine which sorts arrays of integers into increasing order, and whose pivot value is the value which is a third (1/3) of the way along the array.

6. Explain what is meant by a heap in the heap sort method discussed in class.

Give a brief description of the sift algorithm, and explain how it is used to create the heap sort algorithm.


Use the makeHeap algorithm to turn the array [3,4,6,1,2,4,7] into a heap, showing the array at each step.

7. On the same axes, sketch the graphs of log x, x, x log2 x, x^2 and 2^x.

Explain what is meant by an algorithm which has exponential order.

Discuss why algorithms of exponential order are unacceptable for implementation on any computer.

8. Explain what is meant by the order of an algorithm, including in your explanation an explanation of O(f(n)), Ω(f(n)), Θ(f(n)) and o(f(n)).

What are the worst case, best case and average case orders of

(a) bubble sort

(b) merge sort 1

(c) quick sort

You are given an array of 100 integers, randomly sorted. Suppose that you are given an integer n and told that there is a probability of 1/250 that the integer is in the array. Give the expected number of compare operations carried out by a linear search algorithm which returns the first, left-most, index which contains n, or −1 if no such index exists. (Justify your answer.)

9. What is a binary search tree? Write out an algorithm for searching a binary search tree for a given key.

For the array [23,14,10,18,18,32,1,45,56] draw the binary search tree that results from adding the elements to the tree in strict left-to-right order.

10. What is a balanced binary tree? Give a definition of the balance of a tree and specify which balance values a balanced tree may exhibit.

With the aid of a diagram, describe the left and right rotation operations used to maintain tree balance during insertion.

Write a sketch of an insertion routine for balanced binary trees, including a description of the balancing algorithm and the special cases it has to deal with.

11. Briefly explain what is meant by

(a) a multiway search tree

(b) a B-tree

Describe the insertion algorithm for a 3, 2-tree.

Use the insertion algorithm to build a 3, 2-tree from the input [1, 7, 9, 4, 36, 5, 6, 8], where the elements are input in strict left-to-right order (so 1 is input first and 8 is input last), showing the trees constructed at each step.


12. Write out the naive string matching algorithm given in class, naive_string_matching.

What is the worst case order of this algorithm?

How many compare operations are used if the text, T, has length 20 and the pattern, P, has length 5?

13. Explain how the Rabin-Karp algorithm works.

In which cases is Rabin-Karp considerably faster than the naive string matching algorithm?

What is the worst case order of Rabin-Karp?

14. Define the prefix function used in the Knuth-Morris-Pratt algorithm.

What is the prefix function for the pattern "aabaaabaab"?

Show how the Knuth-Morris-Pratt algorithm would run if we are looking for occurrences of the pattern P = "aabaaabaab" in the text T = "aabaaabaabaaabaab".

15. Explain in words what the min-max algorithm is.

Explain in words what α-β pruning is.

What is the main purpose of α-β pruning?


Appendix B

Some useful inductive proofs

We will throughout the course be giving simple proofs using induction. So here we will describe what induction is, before giving a couple of examples of inductive proofs.

If we want to prove that some statement holds for all integers n, then one way of doing this is to show that it holds for n = 1, and n = 2, and n = 3, etc. However, we would never finish! Therefore a better way is to show that it holds for n = 1 (or n = 0), and then show that if it holds for all smaller values of n then it also holds for n, where n is arbitrary. If we have shown the above, then clearly it holds for n = 1 (we have shown this), and for n = 2 (as we know it held for 1, which is all smaller values than 2), and for n = 3 (as we know it held for 1 and 2, which are all smaller values than 3), etc. So we have shown that it must hold for n = 1, 2, 3, 4, ..., which was the desired result.

B.1 Examples

Theorem 1: 1 + 2 + ... + k = k(k+1)/2

Proof, by induction on k: We consider the following steps.

Basis step, k = 1: 1 = (1 × 2)/2, so OK.

Induction step: Let k > 1. Assume that the statement holds for all k′ ∈ {1, 2, ..., k − 1}. Prove that it holds for k.

1 + 2 + ... + k = (1 + 2 + ... + (k−1)) + k
                = (k−1)k/2 + k        (by our assumption)
                = (k² − k + 2k)/2
                = (k² + k)/2
                = k(k+1)/2

So the statement holds for all k. Q.E.D.


Theorem 2: If W(n) = 2W(n/2) + 3n − 1 and W(1) = 0, then W(n) = 3n log(n) − n + 1 for all n = 2^r, where r is an integer.

Proof, by induction on r: We consider the following steps.

Basis step, r = 0 (i.e. n = 1): W(n) = 0 = 3 × 1 × 0 − 1 + 1, so OK.

Induction step: Assume that the theorem is true for all r′ with 0 ≤ r′ < r.

W(n) = 2W(n/2) + 3n − 1
     = 2(3(n/2) log(n/2) − n/2 + 1) + 3n − 1
     = 3n(log(n) − 1) − n + 2 + 3n − 1
     = 3n log(n) − n + 1

The above holds as n/2 = 2^(r−1) and n = 2^r, so we can use our inductive hypothesis on n/2, as r − 1 < r.

So the statement holds for all n. Q.E.D.