
Solving Problems: Object Oriented Design and Algorithms in Java

John Morris [1], Hanney Shabhan [2]

October 20, 2002

[1] Centre for Intelligent Information Processing Systems, Department of Electrical and Electronic Engineering, The University of Western Australia, Nedlands WA 6907, Australia. email: morris@ee.uwa.edu.au
[2] North Virginia Community College, Va. email: xx@yy.nvcc.edu

Contents

Preface
1 Introduction
    1.1 Good Programs
2 Design Strategies
    2.1 Design of classes
    2.2 Programming Languages
    2.3 Classes
        2.3.1 Operations or capabilities
        2.3.2 Methods
        2.3.3 Instance Variables
        2.3.4 Iterators
        2.3.5 Extending a class - Inheritance
        2.3.6 Inheritance with overriding
        2.3.7 Interfaces
        2.3.8 Destructors
    2.4 Java Collection API
        2.4.1 Interfaces
            Collection
            List extends Collection
            Set extends Collection
            SortedSet extends Set
            HashSet extends Collection
            LinkedHashSet extends HashSet
            RandomAccess
        2.4.2 Classes
            AbstractCollection implements Collection
            AbstractList extends AbstractCollection implements List
            AbstractSequentialList extends AbstractList
            LinkedList extends AbstractSequentialList
            ArrayList extends AbstractList implements RandomAccess
            Vector extends AbstractList implements RandomAccess
            Stack extends Vector
            AbstractSet extends AbstractCollection implements Set
            HashSet extends AbstractSet implements Set
            LinkedHashSet extends HashSet implements Set
            TreeSet extends AbstractSet implements SortedSet
            Mapping keys to values
            Map
            SortedMap
        2.4.3 Map classes
            AbstractMap
            HashMap
            Hashtable
            IdentityHashMap
            WeakHashMap
            TreeMap extends AbstractMap implements SortedMap
3 Data Structures
    3.1 Arrays
        3.1.1 What is an Array?
        3.1.2 Declaring Arrays in Java
        3.1.3 Static Arrays
        3.1.4 Dynamic Arrays
    3.2 Implementation of the Collection class
            Constructor
            Methods
        3.2.1 Performance
            add
            contains
            remove
        3.2.2 Direct addressing
        3.2.3 Resource Usage
        3.2.4 Conclusion
    3.3 Searching
        3.3.1 Sequential Searches
        3.3.2 A More Efficient Search
        3.3.3 Binary Search
        3.3.4 Analysis
        3.3.5 Implementation of the contains method
    3.4 Lists
        3.4.1 Linked lists
            Handle for the list
            Adding to a list
        3.4.2 List variants
            Circularly Linked Lists
            Doubly Linked Lists
            Lists in arrays
    3.5 Stacks
        3.5.1 Stack Frames
    3.6 Recursion
        3.6.1 Recursive functions
            Example: Factorial
4 Trees
    4.1 Binary Trees
            Data Structure
        4.1.1 Analysis
            Complete Trees
        4.1.2 General binary trees
5 Complexity
    5.1 The O() notation
        5.1.1 Properties of the O() notation
    5.2 Polynomial and Intractable Algorithms
        5.2.1 Polynomial time complexity
        5.2.2 Intractable problems
    5.3 Analysing an algorithm
        5.3.1 Simple Statement Sequence
        5.3.2 Simple Loops
6 Queues
    6.1 Performance
        6.1.1 Priority Queues
            Aside
    6.2 Heaps
        6.2.1 Heap Properties
            Extracting the highest priority item
        6.2.2 Addition to a heap
            Storage of complete trees
            Animation - Heap Insertion
7 Sorting
    7.1 Bubble, Selection, Insertion Sorts
            Animation - Insertion Sort
        7.1.1 Bubble Sort
        7.1.2 Analysis
    7.2 Heap Sort
            Animation - Heap Sort
    7.3 Quick Sort
            Animation - Quick sort
        7.3.1 Partition in place
        7.3.2 Analysis
            Animation - Quick sort
        7.3.3 Quick sort - The Facts!
            Median-of-3 Pivot
            Random pivot
            A Quicker Quick Sort
8 Bin Sort
    8.1 Bin Sort
        8.1.1 Constraints on bin sort
            Animation - Bin Sort
    8.2 Radix Sorting
        8.2.1 Generalised Radix Sorting
            Animation - Radix Sort
9 Searching Revisited
    9.1 Red-black Trees
        9.1.1 Definition of a red-black tree
        9.1.2 Rotations
        9.1.3 Insertion
        9.1.4 Red-Black Tree Operation
        9.1.5 Deletion
            Animation - Red-black tree
        9.1.6 Analysis
    9.2 AVL Trees
        9.2.1 Definition of an AVL tree
        9.2.2 Insertion
        9.2.3 General n-ary trees
    9.3 Hash Tables
        9.3.1 Direct Address Tables
        9.3.2 Mapping functions
        9.3.3 Handling the collisions
        9.3.4 Chaining
        9.3.5 Re-hashing
        9.3.6 Linear probing
        9.3.7 Clustering
        9.3.8 Quadratic Probing
        9.3.9 Overflow area
        9.3.10 Summary - Hash Table Organization
            Animation - hash table
        9.3.11 Hashing Functions
        9.3.12 Mapping keys to natural numbers
10 Dynamic Algorithms
    10.1 Fibonacci Numbers
        10.1.1 Analysis
        10.1.2 An Iterative Solution
        10.1.3 Free Lunch?
    10.2 Binomial Coefficients
    10.3 Optimal Binary Search Trees
            Animation - Optimal binary search tree
        10.3.1 Space complexity
        10.3.2 Optimal sub-structure
11 The Rest

Preface

Animations  The animations referred to in this text will eventually be available on a CD accompanying it. Currently, you can run them all by loading this page and jumping to the appropriate entry:

    http://ciips.ee.uwa.edu.au/~morris/Year2/PLDS210/alg_anim.html

Typographical conventions  Key terms are highlighted in the text when they first appear:

    In studying algorithms, we are usually concerned with the time complexity of an algorithm, but sometimes we will consider its space complexity.

These highlighted terms mostly represent key concepts which a professional software engineer should know well. Most of these key terms have been collected at the end of each chapter as a summary of important concepts introduced in that chapter.

Sections of actual code are printed in a Courier font:

    // This program may be copied verbatim
    // into a text editor
    public class Echo {
        public static void main( String arg[] ) {
            for( int j=0;j<arg.length;j++ ) {
                System.out.print( arg[j] );
            }
            System.out.println();
        }
    }

Similarly, references to actual names with program fragments in the body of the text use the same font:

    The Echo program just prints out a program's arguments from the main method.

Sometimes algorithms are represented in pseudo-code, a semi-formal notation which does not follow syntactical conventions for any particular language, but which is designed to be readily converted into any language:

    // Pseudo-code representation of the factorial
    // function
    factorial( n ) = if ( n = 0 ) then 1
                     else n * factorial( n-1 )
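As a small illustration of how readily such pseudo-code converts, the factorial definition above might be written in Java roughly as follows. This is only a sketch: the class name FactorialDemo, the use of long for the result and the check for negative arguments are assumptions added here, not part of the pseudo-code.

```java
class FactorialDemo {
    // Direct translation of the pseudo-code: factorial(0) = 1,
    // otherwise n * factorial(n - 1).
    // A long holds results only up to 20!; larger n would overflow.
    static long factorial(int n) {
        if (n < 0) {
            // Assumption: the pseudo-code does not say what to do for n < 0.
            throw new IllegalArgumentException("n must be non-negative");
        }
        return (n == 0) ? 1 : n * factorial(n - 1);
    }

    public static void main(String[] args) {
        for (int n = 0; n <= 5; n++) {
            System.out.println(n + "! = " + factorial(n));
        }
    }
}
```

The recursive call mirrors the "else" branch of the pseudo-code line for line, which is the property the notation is meant to have.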


Chapter 1

Introduction

In this text, we introduce you to the tools which will enable you to solve real world problems with computer programs. It assumes that you

• know the basics of programming,
• can write, debug and run simple programs and
• have some simple understanding of object-oriented design.

The last point is the least important, as, in chapter 2, we review the data modelling approach for writing object oriented programs. In this approach, objects in the real world of the problem you wish to solve directly correspond to classes in the program you write to solve that problem.

1.1 Good Programs

There are a number of facets to good programs: they must

• run correctly
• run efficiently
• be easy to read and understand
• be easy to debug and
• be easy to modify.

    What does correct mean? We need to have some formal notion of the meaning of correct: thus we define it to mean "run in accordance with the specifications".

The first of these is obvious - programs which don't run correctly are clearly of little use. 'Efficiently' is usually understood to mean in the minimum time - but occasionally there will be other constraints, such as memory use, which will be paramount. As will be demonstrated later, better running times will generally be obtained from use of the most appropriate data structures and algorithms, rather than through hacking, e.g. removing a few statements by some clever coding - or even worse, programming in assembler! This course will focus on solving problems efficiently: you will be introduced to a number of fundamental data structures and algorithms (or procedures) for manipulating them.

The importance of the other points is less obvious. The early history of many computer installations is, however, testimony to their importance. Many studies have quantified the enormous costs of failing to build software systems that had all the characteristics listed. A classic reference is Boehm's text [1]. Unfortunately, much recent evidence suggests that these principles are still not well understood! Any perusal of the ACM Software Engineering Notes Risks Forum [2] will soon convince you that there is an enormous amount of poor software in use.

The discipline of software engineering is concerned with building large software systems which perform as their users expect, are reliable and easy to maintain. This text will introduce some software engineering principles but we will concentrate on the creation of small programs only. By using well-known, efficient techniques for solving problems, not only do you produce correct and fast programs in the minimum time, but you make your programs easier to modify. Another software engineer will find it much simpler to work with a well-known solution than something that has been hacked together and "looks a bit like" some textbook algorithm.

Key Terms

correct
    A correct program runs in accordance with its specifications.
algorithm
    A precisely specified procedure for solving a problem.
hacking
    Producing a computer program rapidly, without thought and without any design methodology.

[1] B J Boehm, ..
[2] P Xx, ACM Software Engineering Notes, Risks Forum, appearing in most issues of SEN from 19xx to 19xx.

Chapter 2

Design Strategies

In this chapter, we briefly review some basic software design strategies, focussing on object oriented design. Whilst it is possible to 'cobble together' almost any sort of code to make effective small programs, any program of non-trivial size needs to be carefully designed. The design of large software systems is the domain of software engineering texts: here we will focus on the design of small software modules or classes (sometimes called components). Before we look at the way a class is written in some computer language, we need to examine the process by which we build [1] a class.

[1] We use the term 'build' consciously to indicate that construction of a class should follow careful design and not be an ad hoc process.

2.1 Design of classes

Computer programs model things: your bank's central computer models the old paper ledgers that the bank used before computers were invented; the weather bureau's computer models the atmosphere and the land and sea masses adjoining it; a 'Dungeons and Dragons' game program models a virtual world of dungeons and dragons. Within each large model, there are generally smaller models: the bank's program has models for individual accounts and transactions which are applied to those accounts; the weather bureau's program has models for clouds, storms, land masses, etc. and the game has models for dungeons, dragons, weapons, etc.

A large program models the behaviour of some large system containing many smaller objects (which may themselves be composed of other objects and so on). Within the large program, individual software classes model the behaviour of groups of objects in the real (or virtual) world of the program. By adopting this modeling viewpoint, it becomes obvious how the software program should be designed: the first task is to identify objects within the world that the whole program is attempting to model. These objects are potential software classes. However, before any individual class is designed, a software engineer will attempt to group objects in the program's world into classes of objects with similar properties or behaviours.

As an example of a class, consider containers for objects. Firstly, we need to consider the behaviour of containers: what can they do? Let's list some of the things that we might expect to be able to do with a container:

• Build or construct a new one
• Destroy one
• Add an object to it
• Remove an object from it or
• Find an object in it satisfying some property.

These are the primary behaviours or operations that can be performed on simple containers and will be the basis for our example. Of course, there are other things that we might like to do with containers: merge two containers together, divide one into two smaller containers, etc. You will find that, generally, if you have produced a good basic design, it is straightforward to add additional capabilities to the software which models our containers.

We can also imagine several variants of our basic container:

(i) A Set that requires each item in it to be unique,
(ii) A Bag that may contain duplicate items,
(iii) An Ordered Collection in which the items are stored in order so that it may be searched efficiently,
(iv) A Priority Queue from which an item with the highest priority may be immediately extracted,
etc.

These are specializations of a basic collection: all will have the five capabilities listed above, but may have additional properties or characteristics. For example, the add operation in variants (i) and (ii) behaves differently when an attempt is made to add an item that is a duplicate of one that already exists in the container. Variant (iv) will have an additional 'remove highest priority item' capability absent from the others. The specializations are said to inherit the five basic capabilities from a parent class. Thus we can define a hierarchy of classes in which all the children or specializations inherit some basic capabilities from a parent and add new capabilities or variations of basic ones. In our containers example, if we call the basic container a Collection, then we might have child or derived classes called Set, Bag, OrderedCollection and PriorityQueue.

The capabilities or operations which may be performed on a class are encapsulated in methods (Java) or member functions (C++) of the class. Thus our Collection class would have methods called:

    Collection     Constructor which constructs new objects of a class.
    ~Collection    Destructor which destroys objects when they are no longer needed. Note the naming convention follows C++ here: in Java, destructors are not needed.
    add            Add a new item to the collection.
    remove         Remove a specified item from the collection.
    contains       Does the collection contain a particular item?

The OrderedCollection class might [2] have an additional method:

    next           Return the next item in the collection

and the PriorityQueue one might have

    removeHighestPriority    Extract the highest priority item

[2] As we will see, the designers of Java's Collection API have adopted a slightly different approach - using an Iterator to provide this capability.

2.2 Programming Languages

A software engineer views a programming language as a tool to solve a problem. Some languages have excellent capabilities for particular types of problem: Prolog is good for logic problems which must handle large numbers of rules, SQL handles database transactions well, VHDL has facilities which allow it to be used for electronic circuit design, C can be used effectively on real time systems on small microprocessors, etc. Of course, some languages have multiple domains of applicability: C is also good for computationally intensive problems in which every cycle of the computer must be put to good use.

There is a group of languages which can implement object oriented designs well: C++, Java, Eiffel and Ada 95 are the most prominent. In this text, we will use Java for all the code examples (although translation of them to any of the other object-oriented languages is generally straightforward). Thus the next sections will review the way in which object oriented designs are implemented in Java. We start by showing how a class is written.

2.3 Classes

Classes are the elementary building blocks of any Java program. Before starting to implement a (large) software project, you should sit down and list all the important objects in the system that the software models. Each of these objects is likely to be a candidate for a class. Having identified all the key objects, you should determine how each object behaves or what functions it can perform. This determines the interface [3] of the object, i.e. the way it can be used. In this process, you should also determine how the different classes interact with each other. Finally, you can start to think how to implement the class.

[3] Note that here we use the term interface in its more general sense rather than the more specific sense that it has in Java.

A class declaration, which describes the class, may contain:

• Constructors: functions to create new objects of the class
• Destructor: a function to delete objects of the class
• Methods: functions that can be applied to an object of the class
• Instance variables: variables whose values capture the properties or characteristics of an object of the class.

Each of these elements is discussed in more detail below. We will illustrate the concepts with the implementation of a simple collection class. This example is illustrative: we will use it to design a simple class. The class will not be very versatile and if you need a collection class with a full set of capabilities, you should use one from the standard API.
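To give a feel for what using a collection from the standard API looks like, here is a small sketch with java.util.ArrayList, which offers the add, remove and contains operations discussed above. The class name StandardCollectionDemo, the helper method build and the sample strings are made up for illustration, and the generic type syntax is a later Java feature than the version this text was written against.

```java
import java.util.ArrayList;
import java.util.Collection;

class StandardCollectionDemo {
    // Builds a small collection using only standard API calls.
    static Collection<String> build() {
        Collection<String> colours = new ArrayList<>();
        colours.add("red");
        colours.add("green");
        colours.add("blue");
        colours.remove("green");   // remove a specified item
        return colours;
    }

    public static void main(String[] args) {
        Collection<String> colours = build();
        System.out.println(colours.contains("red"));
        System.out.println(colours.contains("green"));
        System.out.println(colours.size());
    }
}
```

The variable is declared with the Collection type rather than ArrayList, so the rest of the code depends only on the behaviour, not on the storage chosen.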

2.3.1 Operations or capabilities

Collections are containers for objects: we want to be able to manipulate these collections in various ways:

• create a new (empty) collection,
• add objects to the collection,
• remove objects from the collection and
• determine whether a collection contains an object.

There are other ways in which you may wish to manipulate collections: e.g. create a copy of a collection, merge two collections together, etc.

Exercise: Make a list of other operations that you may like to perform on a collection.

Once we have decided the basic operations that are needed, we need to define a formal interface for the class. This is a set of the names, arguments and return values for each operation. Let's name our collection SimpleCollection. This name is carefully chosen to distinguish it from the Collection interface in the standard Java API. As we extend the SimpleCollection, it will approach the (extensive) capabilities of the collections [4] defined in the standard API.

[4] The standard API Collection is actually a Java interface: thus it specifies the methods for a set of related classes, see Section 0. As we develop the SimpleCollection, we will add more features of the standard Collection.

2.3.2 Methods

For our first basic interface, we specify a method for each of the capabilities we have listed above:

    public class SimpleCollection {
        public SimpleCollection( int max_items ) { ... }
        public boolean add( Object a ) { ... }
        public boolean remove( Object a ) { ... }
        public boolean contains( Object a ) { ... }
    }

The implementation details are deliberately deferred: in fact, in the sections following, we will study a number of possible implementations. This will illustrate the proper separation between the interface presented here - which represents the general or abstract properties that we require our collection to possess - and any actual implementation of those properties. When implementing any class, you should follow the same approach in the design of the class. Start with the interface - which defines the behaviour of objects of the class or the capabilities they should have.
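The method bodies are deliberately left open at this point. Purely to make the interface concrete, the sketch below fills them in under one possible assumption: items are held in a fixed-size array. The field names, the private helper indexOf, the rejection of null items and the decision to fill a gap with the last item on removal are illustrative choices, not the implementations developed in later chapters.

```java
class SimpleCollection {
    private int count;        // number of items currently stored
    private Object[] items;   // fixed-size storage (an assumption of this sketch)

    public SimpleCollection(int maxItems) {
        items = new Object[maxItems];
        count = 0;
    }

    // Returns false, rather than throwing, when the item cannot be added.
    public boolean add(Object a) {
        if (a == null || count >= items.length) {
            return false;
        }
        items[count++] = a;
        return true;
    }

    public boolean remove(Object a) {
        int i = indexOf(a);
        if (i < 0) {
            return false;
        }
        // Order is not preserved: the last item moves into the gap.
        items[i] = items[count - 1];
        items[--count] = null;
        return true;
    }

    public boolean contains(Object a) {
        return indexOf(a) >= 0;
    }

    // Sequential search; returns -1 if the item is absent.
    private int indexOf(Object a) {
        for (int i = 0; i < count; i++) {
            if (items[i].equals(a)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        SimpleCollection c = new SimpleCollection(2);
        System.out.println(c.add("a"));
        System.out.println(c.add("b"));
        System.out.println(c.add("c"));      // collection is full
        System.out.println(c.remove("a"));
        System.out.println(c.contains("a"));
    }
}
```

Code that uses the class sees only the constructor and the three public methods, so the array could later be swapped for another structure without changing any caller.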

2.3.3 Instance Variables

The values of instance variables represent the characteristics or properties of an individual object. Whilst all objects of the same class have common capabilities or operations (represented by the methods that may be invoked on individual objects) and a set of instance variables with the same names and types, the values of the instance variables may be different, giving each object its individual characteristics or properties. In general, instance variables should be declared private: this hides them from code outside the class - implementing an information hiding policy - and leads to more robust, easily maintained code. In our SimpleCollection, the actual details of the instance variables to be used for the various possible collections are described in the following chapters; here we will give a simple example which uses an array to store objects.

    public class SimpleCollection {
        private int count;
        private Object items[];
        public SimpleCollection( int max_items ) { ... }
        public boolean add( Object a ) { ... }
        public boolean remove( Object a ) { ... }
        public boolean contains( Object a ) { ... }
    }

We have added two instance variables - the integer, count, and the array of Objects, items - as private variables to the class. These allow us to implement a collection in which items are stored in an array: see the next chapter for details. However, we emphasize again that the instance variables should be hidden from users of the class: the users are only concerned with the behaviour represented by the presence of methods which perform allowed operations - not with the details of the implementation. As we will see, there are many ways of implementing a basic collection - although they differ in performance and resource requirements, they all offer the basic capabilities of a collection.

2.3.4 Iterators

All the implementations of collections in the standard API provide iterators: this is guaranteed by the appearance of the

    Iterator iterator()

method in the Collection interface, which returns an iterator which will step through a collection. The use of iterators may be best illustrated by an example. Suppose we wish to have a collection from which we can produce a (rudimentary, i.e. not necessarily well formatted!) report. We can do this by extending any one of the Collection classes:

    public class ReportableCollection extends SimpleCollection {
        public ReportableCollection( int max_items ) { super( max_items ); }
        public void makeReport( PrintWriter pr ) {
            Iterator it = this.iterator();
            while ( it.hasNext() ) {
                Object a = it.next();
                pr.println( a.toString() );
            }
        }
        ...
    }

In line 4, we obtain an iterator by invoking the iterator() method (prescribed in the Collection interface) on this. Then in the while loop, lines 5-8, we repeatedly invoke the iterator's next() method to return the next object in the collection, as long as hasNext() returns true.

2.3.5 Extending a class - Inheritance

Note that our SimpleCollection's add, remove and contains methods take an Object as an argument. This means that any type of object may be added to a SimpleCollection: Object is the ultimate ancestor of all Java objects. A SimpleCollection may contain objects that cannot be arranged in any order: for example, in many applications, it may not make sense to arrange colours in order. Thus a program modelling the behaviour of a collection of coloured balls may simply use the colours to identify different balls - the balls being otherwise identical. Thus there is no need to arrange them in order. However, if the balls all had different masses, then there is a natural way to order them and we may wish to exploit this order. Since the items in the collection now have additional capabilities - the ability to sort them and extract them in order - we model this by extending the SimpleCollection class:

    public class OrderedCollection extends SimpleCollection {
        public OrderedCollection( int max_items ) { ... }
        public void sort() { ... }
        public boolean isSorted() { ... }
        public Object removeFirst() { ... }
    }

Note: The new class declares only three new methods: because it extends SimpleCollection, it automatically inherits the methods of SimpleCollection and these methods do not need to be implemented again in OrderedCollection.

We continue to say nothing about the implementation details: there is no specification of the way in which Java will store the ordered objects. We will see several ways of doing this in subsequent chapters. In fact, we will devote considerable time to discussing the tradeoffs associated with various strategies for organising the objects in ordered collections. This is the information hiding principle: the user of our OrderedCollection class should not be concerned with how the implementor has decided to implement the class - there are many ways which will produce the desired behaviour.
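The inheritance described in this section can be tried out with a small, self-contained sketch. Everything below the declarations themselves is an assumption made for illustration: the parent is a minimal array-backed stand-in for SimpleCollection, its fields are declared protected (rather than private) so that the child can reorder them, items are assumed to be mutually Comparable, and the wrapper class InheritanceDemo exists only to keep the example in one file.

```java
import java.util.Arrays;

class InheritanceDemo {
    // Stand-in parent: a tiny array-backed collection (an assumption of this sketch).
    static class SimpleCollection {
        protected Object[] items;   // protected so the subclass can reorder them
        protected int count;

        SimpleCollection(int maxItems) {
            items = new Object[maxItems];
        }

        public boolean add(Object a) {
            if (a == null || count == items.length) return false;
            items[count++] = a;
            return true;
        }

        public boolean contains(Object a) {
            for (int i = 0; i < count; i++) {
                if (items[i].equals(a)) return true;
            }
            return false;
        }
    }

    // Child: inherits add and contains, and adds the ordering capabilities.
    static class OrderedCollection extends SimpleCollection {
        OrderedCollection(int maxItems) {
            super(maxItems);
        }

        // Sorts the stored items by their natural order.
        public void sort() {
            Arrays.sort(items, 0, count);
        }

        @SuppressWarnings("unchecked")
        public boolean isSorted() {
            for (int i = 1; i < count; i++) {
                if (((Comparable<Object>) items[i - 1]).compareTo(items[i]) > 0) {
                    return false;
                }
            }
            return true;
        }

        // Removes and returns the item at the front, or null if empty.
        public Object removeFirst() {
            if (count == 0) return null;
            Object first = items[0];
            System.arraycopy(items, 1, items, 0, count - 1);
            items[--count] = null;
            return first;
        }
    }

    public static void main(String[] args) {
        OrderedCollection masses = new OrderedCollection(3);
        masses.add(30);   // add is inherited, not re-implemented
        masses.add(10);
        masses.add(20);
        masses.sort();
        System.out.println(masses.removeFirst());   // prints 10
    }
}
```

OrderedCollection contains no code for add or contains, yet both can be called on it, which is the point of the Note above.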

2.3. CLASSES 92.3.6 Inheritan e with overridingIt is easy to imagine that our OrderedColle tion might be more eÆ ient ifwe sorted the items as we added them. Card players usually do this as theypi k up the ards in their hand; they insert them in the orre t pla e in thehand to make it easier to assess the value of the hand and de ide strategieslater. (We will examine this type of sorting - alled insertion sorting - later,see Chapter 7.1.) In the SimpleColle tion, the add method an add a newitem to any part of the olle tion - to the beginning, the end or any pla e in themiddle. However, in the OrderedColle tion, we ould de ide to override theinherited method and supply one more appropriate to the sorted olle tion, sothe spe i� ation be omes:publi lass OrderedColle tionextends SimpleColle tion {publi OrderedColle tion( int max_items ) { ... }/* Override add of SimpleColle tion to insert itemsinto their proper pla e in the olle tion */publi boolean add( Obje t a ) { ... }/* Override remove of SimpleColle tion to rearrangeitems as ne essary following the deletion */publi boolean remove( Obje t a ) { ... }publi Obje t removeFirst() { ... }}Note that in most situations, the removemethod will also need to be overriddenin order to ensure that the gap left by the deleted item is losed properly topresent a ontinuously ordered olle tion. As the sort and isSorted methodsare now redundant, we omit them.2.3.7 Interfa esA lass whi h inherits from another spe ializes the parent - by adding eithermethods or attributes. Thus obje ts of the spe ialization (or inheriting) lasshave all the attributes of obje ts of the parent lass and (optionally) some ad-ditional ones. However, as we will see in later hapters, we need di�erent stru -tures - arrays, lists, trees, hash tables, et . - to a hieve the various apabilitiesthat we need - simpli ity, speed, guaranteed performan e, et .. 
Thus we can't easily make a single hierarchy in which all the classes inherit attributes and methods from their parents. However, all collections - whatever structures are used for storing their items - have certain common capabilities: add, remove, contains, etc. Java handles this situation by allowing you to specify an interface. An interface describes the behaviour that is required of all classes which implement the interface. Thus it is a set of method specifications. Remember that you specify the behaviour of objects of a class by defining the operations that can be performed on objects as specifications for methods that can be applied to objects of the class. Thus in the standard Java API, Collection is an interface which specifies the common behaviour of all collections. In Java, interfaces may also be specialized, by extending the interface with additional method specifications. The standard API uses this ability to define increasingly specialized families of classes:

    Collection   the ultimate parent of them all
    List         a specialization of Collection
    Set          a specialization of Collection
    SortedSet    a specialization of Set

In the following chapters, we will discuss the implementation details of several collections based on the following subset of the standard API's Collection interface. Defining this simple interface allows us to build complete, functioning classes which serve to illustrate the use of the structures discussed in each chapter without implementing all the methods in the Collection interface. For serious applications, we would generally recommend using the classes supplied with your Java system rather than 're-inventing wheels' - and possibly introducing subtle bugs.

    public interface SimpleCollection {
        // A Java interface cannot declare a constructor: each implementing
        // class is expected to supply one taking ( int max_items ).
        public boolean add( Object a );
        public boolean remove( Object a );
        public boolean contains( Object a );
        int size();
    }

For simplicity, implementations of SimpleCollection may also deviate slightly from the behaviour specified for Collection in some instances, e.g. we will be content to return false from an unsuccessful add operation rather than throwing an exception as specified in the standard documentation.

2.3.8 Destructors

Unlike C++ and some other object oriented languages, Java does not need explicit destructors. An independent thread, the garbage collector, is started up with every running Java program. The garbage collector reclaims objects which are no longer in use. This is a major benefit for programmers: it is no longer necessary to keep track of the 'life' of objects and delete them when they are no longer needed. The garbage collector takes care of this and removes a major source of programming errors.
However, it has some performance implications for Java programs: refer to a specialist Java text for discussions of this issue.

2.4 Java Collection API

Java has an extensive group of interfaces and classes which implement collections in a variety of forms. Later chapters will discuss the implementation of the most important members of this group. We will show ways to implement simplified versions of these classes and explain the different capabilities that they provide. We will also discuss the performance of these implementations. Here we provide an overview of the whole group of interfaces and classes provided in Java Version 2. In each section, we will point out which of the standard classes corresponds to the simplified example we discuss. The simplified examples are designed to be illustrative only: we would recommend that programmers use the standard classes for production software [5].

2.4.1 Interfaces

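[Figure: interface hierarchy. Collection is at the top; List and Set are derived from Collection; SortedSet is derived from Set.]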

Collection is an interface specifying a basic set of methods which must be implemented by all collections. Three specialization interfaces are derived from it: List, Set and SortedSet.

Interfaces only define methods: there must be some implementing classes. The hierarchy of collection classes is shown in Figure 2.1. Note that some of these classes are abstract classes: they are not full implementations of the interfaces that they implement, so there must be a concrete class which implements the remaining methods.

Collection
The basic interface which contains methods that a simple container class should provide. However, no assumption is made about the characteristics or structure of the container.

List extends Collection
As its name implies, items in a list are ordered in some way - usually based on the order in which they were added to the list. This ordering should be contrasted with an ordering determined by comparing items in the list based on some key for each item. Lists provide iterators, which are classes associated with a list that provide methods enabling you to traverse a list in order.

Set extends Collection
Sets are collections with no duplicate items in them: they model mathematical sets. The Set interface provides for an iterator which will return all the elements of the set one by one, but it is not required to return the elements of the set in any particular order.

SortedSet extends Set
Elements in a SortedSet have an ordering defined. The iterator for a SortedSet returns the elements based on this ordering. A Comparator (an object implementing the Comparator interface) returns the relative order of any two elements of the SortedSet.

[5] This recommendation is based on the assumption that the classes provided by any particular Java implementation are likely to be carefully written and thoroughly tested. We trust that suppliers of Java systems generally take sufficient care in their implementations that this assumption is a valid one!
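The differences between List, Set and SortedSet described above can be seen with a short sketch using standard `java.util` classes. It assumes a Java version with generics and lambdas, which is later than the version this text describes; the class name `InterfaceDemo`, the helper `fill` and the comparator `BY_LENGTH` are illustrative names only.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

class InterfaceDemo {
    // Adds the same four words (one of them a duplicate) to any collection.
    static <C extends Collection<String>> C fill(C c) {
        c.add("pear");
        c.add("apple");
        c.add("pear");
        c.add("fig");
        return c;
    }

    // Comparator that orders strings by length, then alphabetically.
    static final Comparator<String> BY_LENGTH = (x, y) ->
            x.length() != y.length() ? Integer.compare(x.length(), y.length())
                                     : x.compareTo(y);

    public static void main(String[] args) {
        List<String> list = fill(new ArrayList<String>());
        Set<String> set = fill(new HashSet<String>());
        SortedSet<String> sorted = fill(new TreeSet<String>(BY_LENGTH));

        System.out.println(list);       // [pear, apple, pear, fig]: insertion order, duplicate kept
        System.out.println(set.size()); // 3: duplicate dropped, iteration order unspecified
        System.out.println(sorted);     // [fig, pear, apple]: order defined by the Comparator
    }
}
```

The same `fill` method works on all three because each class implements Collection; only the behaviour promised by the more specialized interface differs.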

[Figure 2.1 diagram: AbstractCollection (implements Collection) is the root. AbstractList (implements List) and AbstractSet (implements Set) extend it. AbstractSequentialList, ArrayList (implements RandomAccess) and Vector (implements RandomAccess) extend AbstractList; LinkedList extends AbstractSequentialList. HashSet and TreeSet (implements SortedSet) extend AbstractSet; LinkedHashSet extends HashSet.]

Figure 2.1: Java's Collection classes, i.e. the classes which implement the Collection interface.

HashSet extends Collection
A HashSet is based on a hash table (see Chapter 9.3). As a consequence, the order in which the iterator returns elements is not defined and may change as the composition of the HashSet changes. This behaviour will become clear once hash tables have been examined.

LinkedHashSet extends HashSet
This variant of the HashSet provides an iterator that returns items from the set in a predictable order.

RandomAccess
This interface is a tag or marker interface: it has no methods and simply serves to 'tag' classes that provide the capability to efficiently access individual elements at random.

2.4.2 Classes

A hierarchy of classes implements the interfaces derived from Collection. Generally, an abstract class heads each group: the abstract class implements some of the methods required by the interfaces, reducing the effort needed to fully implement the interface.

AbstractCollection implements Collection
Skeletal implementation of the Collection interface [6] designed to reduce the effort needed to build a full implementation of the interface.

AbstractList extends AbstractCollection implements List
AbstractList is a skeletal implementation of the List interface which is fully implemented in ArrayList and Vector.

[6] Note that this and the following classes also implement other interfaces (commonly Cloneable and Serializable) which have been omitted here in order to focus on the Collection related interfaces.

AbstractSequentialList extends AbstractList
Skeletal implementation of a sequential list - designed to be implemented as a linked list (see Chapter 3.4.1). LinkedList completes the implementation.

LinkedList extends AbstractSequentialList
Linked lists are an important data structure providing fast addition and access to the ends of the list.

ArrayList extends AbstractList implements RandomAccess
Arrays provide fast access to randomly chosen elements: this class provides the basic list capabilities plus efficient access to random elements of the list.

Vector extends AbstractList implements RandomAccess
Vector provides an extensible array with direct access to any element of the array.

Stack extends Vector
The first element extracted from a stack (see Section 3.5) is always the last one added to it. This is known as Last-In-First-Out (LIFO) behaviour.

AbstractSet extends AbstractCollection implements Set
Skeletal implementation of the Set interface designed to simplify the task of implementing HashSet and TreeSet - the two standard classes derived from AbstractSet - as well as user classes with special capabilities.

HashSet extends AbstractSet implements Set
An implementation of a hash table (see Chapter 9.3). On average, hash tables provide very fast access to the items stored in them, but do not store the items in order.

LinkedHashSet extends HashSet implements Set
A hash table that remembers the order in which items were inserted into the collection: the iterator returns the items in this order.

TreeSet extends AbstractSet implements SortedSet
This class implements a sorted set using a tree structure (see Section 4). Tree structures enable O(log n) times to be guaranteed for add, remove and search operations.
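A short sketch can make some of these descriptions concrete: the insertion order kept by LinkedHashSet, the sorted order kept by TreeSet, and the LIFO behaviour of Stack. It uses the standard `java.util` classes with generics (a later Java feature than the version described here); `ClassesDemo`, `orderOf` and `firstPopped` are illustrative names.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.Stack;
import java.util.TreeSet;

class ClassesDemo {
    // Adds 3, 1, 2 to the given set and returns the order its iterator yields.
    static List<Integer> orderOf(Set<Integer> s) {
        s.add(3);
        s.add(1);
        s.add(2);
        return new ArrayList<Integer>(s);
    }

    // Pushes 1, 2, 3 and returns the first element popped.
    static int firstPopped() {
        Stack<Integer> st = new Stack<Integer>();
        st.push(1);
        st.push(2);
        st.push(3);
        return st.pop();
    }

    public static void main(String[] args) {
        System.out.println(orderOf(new LinkedHashSet<Integer>())); // [3, 1, 2]: insertion order
        System.out.println(orderOf(new TreeSet<Integer>()));       // [1, 2, 3]: sorted order
        System.out.println(firstPopped());                         // 3: last in, first out
    }
}
```

A plain HashSet is deliberately left out of the printed comparison, since its iteration order is not specified.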

Mapping keys to values

The idea of a key which identifies an item is represented by the Map interface and its extensions. We may say that a key maps to a 'value', which may be some function (usually composition or concatenation) of the values of the individual elements of any object.

Map
A map may not contain duplicate keys, i.e. each key must be unique. The order of items in a map may be defined by the ordering of the keys (as in a TreeMap) or not (as in a HashMap).

SortedMap
A Map that guarantees that the items contained in it will appear according to the natural ordering of its keys.

2.4.3 Map classes

AbstractMap
The usual skeletal implementation of the Map interface to aid implementation of specific Maps.

HashMap
HashMap and Hashtable are very similar: they provide implementations of hash tables (see Section 9.3). HashMap is not synchronized and permits null values and null keys. Hash tables can provide very fast average access times - often requiring only a single comparison.

Hashtable
Hashtable is similar to HashMap (qv).

IdentityHashMap
In an IdentityHashMap, keys must be the same object (their references must be equal) to be considered equal, i.e. the test is if( key1 == key2 ) ... In other situations, keys are tested with their equals method (i.e. if( key1.equals( key2 ) ) ...).

WeakHashMap
A hash table from which items may be automatically discarded (destroyed by the garbage collector) if their keys are no longer in use.

TreeMap extends AbstractMap implements SortedMap
An implementation of a red-black tree (see Chapter 9.1).

Key Terms

Software Engineering
    The discipline of building large software systems - the opposite of hacking (qv).

Methods
    Functions which operate on objects of a class. This term is used in Java and several other object oriented languages; in C++, the term member functions is usually preferred.

Member functions
    The term used in C++ to describe the functions which operate on objects of a class - synonymous with methods.

Constructor
    A special method which constructs new objects of a class.

Destructor
    A method which destroys objects when they are no longer needed: its primary purpose is to release the resources (principally memory) used by an object for use by other objects.

Interface
    In a general sense, the public methods of a target class which are able to be invoked by a class which manipulates objects of the target class. In Java, interface is a keyword and the formal term for an interface specification for a group of methods which must be implemented by an implementing class.

Information Hiding
    A software design strategy which exposes only needed information to the user of a class. Information which the user doesn't need is hidden: in Java, marking variables and methods private prevents other classes from accessing them directly or 'hides' them.

Garbage Collector
    A process which examines the memory used by a running program and determines which objects are no longer needed so that the memory they use may be reclaimed.

Tag or Marker Interface
    Java interface that serves to 'tag' or 'mark' a class as belonging to a group of classes which may be used in the same way or have some common properties; it contains no methods.

Iterator
    A class that enables a program to scan (or iterate) through all the items in a collection; it keeps track of the current location within the collection and ensures that all items are eventually returned.


Chapter 3
Data Structures

In this chapter, we will examine some fundamental data structures: arrays, lists, stacks and trees.

3.1 Arrays

3.1.1 What is an Array?

An array is a fundamental data structure used to store objects of a particular type in contiguous memory locations. Figure 3.1 illustrates an array of 10 32-bit objects located in a segment of word-aligned, byte-addressable memory starting at memory address 400.

You need three pieces of information to gain access to an array element:

• The starting address of the array or base address.
• The array element number or index of the element you wish to access.
• The size, in bytes, of a single object of the data type (or class) stored in the array. This size, along with the index, will be used to calculate the address of an individual element in the array from the address of the first element using this formula:

$$\text{address of array element} = \text{base address} + (\text{object size} \times \text{array index})$$

Here is an example: suppose you wanted to access the third element of the 10 element array depicted in Figure 3.1. The address of the first element (the array's base address) is 400. The data type size is 32 bits or 4 bytes. But what about the array index? In Java, following the convention in C, the index of the first element of an array is 0. This is consistent with the formula above. So the third element has an index of 2 and its address will be

$$400 + (4 \times 2) = 408$$

In general, to access the ith element of a Java array you'll use an index of i-1.

Figure 3.1: Layout of an array in memory
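The address formula can be written out as a small sketch. Java itself does not expose memory addresses, so this is plain arithmetic on illustrative numbers; `ArrayAddress` and `elementAddress` are hypothetical names.

```java
class ArrayAddress {
    // address of element = base address + (object size * array index), in bytes
    static long elementAddress(long baseAddress, int objectSize, int index) {
        return baseAddress + (long) objectSize * index;
    }

    public static void main(String[] args) {
        // Third element (index 2) of an array of 4-byte objects based at address 400
        System.out.println(elementAddress(400, 4, 2)); // 408
    }
}
```

Because the calculation is one multiplication and one addition whatever the index, reaching any element takes the same amount of work.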

3.1.2 Declaring Arrays in Java

There are two ways to set the size of an array in Java: statically, with a size fixed in the source code, and dynamically, with a size chosen at runtime. The method you choose in your program will depend on how you intend to use the array.

3.1.3 Static Arrays

As an example, we will build a simple class modelling a histogram. Recall that a histogram stores the counts (or frequencies) of various events or objects. We will use an array inside the Histogram class to store these counts:

    public class Histogram {
        private final int size = 10;
        private int counts[] = new int[size];
        ...
    }

In this case, a fixed number (size) of elements has been allocated in the array by the declaration in line 3. (Java has no C-style declaration such as int counts[10]: even an array of fixed size is created with new.) The problem with static arrays is that once you have declared the array size, it is fixed for the duration of the program. You have to know ahead of time how many elements you will need and reserve, in advance, sufficient memory. A more flexible alternative is dynamic memory allocation.

3.1.4 Dynamic Arrays

Whereas the size of a static array is fixed prior to runtime, the size of a dynamic array is chosen while the program runs. In languages such as C, static arrays are generally reserved in stack memory; in Java, every array is created in the heap [1]. This heap is an area of memory from which your program dynamically allocates memory when requested by the program with the new operator. A more flexible variant of the Histogram class doesn't specify the array size in the declaration, but allocates the array using the new operator at runtime - in this example, when the Histogram constructor is called:

    public class Histogram {
        private int size;
        private int counts[];
        public Histogram( int size ) {
            counts = new int[size];
            this.size = size;
            ...
        }
        ...
    }

The declaration int counts[] only declares the reference for the array; it doesn't allocate any space for the array. Attempts to reference elements of the array before space has been allocated with new will cause runtime errors. There is a possibility that there might not be enough memory left in the heap, in which case your program would have to properly handle such an error [2]. Assuming there is sufficient space, you can then use the dynamically declared array just like the static array. Dynamic array allocation is flexible because the size of the array can be determined at runtime and then used with the new operator to reserve the space in the heap. Our Histogram class becomes much more flexible because its user can determine how many frequencies or counts will be needed for the problem at hand after the program has started. The user then invokes the constructor Histogram(int size) with a parameter which sets the size of the array. The constructor allocates the array, sets size to the array size and uses the parameter in a for statement to initialize all elements of counts to 0. We can even add a changeSize() method to change the size of a Histogram object's array after the object has been created. To do this, changeSize() allocates a new array, copies any data from the old array into it and then adjusts the reference counts. We've added printHistogram(), which prints the contents of a Histogram object's array. Here is the code for the three class methods:

    public class Histogram {
        private int size;
        private int counts[];

        public Histogram(int size) {
            counts = new int[size];
            this.size = size;
            for(int j=0; j<size; j++) counts[j] = 0;
        }
        void changeSize(int new_size) {
            int new_array[] = new int[new_size];
            // Copy as many of the old counts as will fit
            for(int j=0; (j<size)&&(j<new_size); j++)
                new_array[j] = counts[j];
            // Zero any additional elements
            if( new_size > size ) {
                for(int k=size; k<new_size; k++ )
                    new_array[k] = 0;
            }
            size = new_size;
            counts = new_array;
        }
        void printHistogram() {
            for(int j=0; j<size; j++)
                System.out.print(" " + counts[j]);
            System.out.println();
        }
    }

[1] Do not confuse this heap with the heap data structure discussed later in section 6.2.
[2] By catching the runtime error OutOfMemoryError.

Histogram objects can now be tested using the following code fragment:

    public class TestHistogram {
        public static void main( String[] args ) {
            Histogram h1 = new Histogram(5),
                      h2 = new Histogram(10);
            h1.printHistogram();
            h2.printHistogram();
            h1.changeSize(25);
            h2.changeSize(30);
            h1.printHistogram();
            h2.printHistogram();
        }
    }

Here we have declared two Histogram objects, h1 and h2. The numbers in parentheses on the right side of h1 and h2 set the initial size of each array. After h1 and h2 are created, the printHistogram() function is called for each object. Next, the changeSize() function is called on each object with argument 25 for h1 and 30 for h2. The printHistogram() function is called once again to show that the array in each object is indeed larger. The runtime output is given in Figure 3.2 below.

    > java TestHistogram
    10 5 13 12 3
    8 7 43 23 78 9 7 8 0 34
    10 5 13 12 3 0 0 0 0 0 0 0 0 0 0
    8 7 43 23 78 9 7 8 0 34 0 0 0 0 0 0 0 0 0 0
    >

Figure 3.2: Output from program testing the Histogram class.

3.2 Implementation of the Collection class

Here we will implement the SimpleCollection interface using arrays. First define the attributes:

    public class ACollection implements SimpleCollection {
        private Object a[];
        private int count = 0;
        ...
    }

Remember that we are implementing a simplified version of the standard API Collection: Java's rules for implementation of an interface require all the methods of the interface to be implemented. We will initially keep this example simple by defining a simpler interface with fewer methods: completion of a full implementation of the Collection interface is left as an exercise.

Constructor

Now we add the class methods - starting with the constructors. At this point, we will only define one constructor: others could possibly be defined.

    public class ACollection implements SimpleCollection {
        private Object a[];
        private int count = 0;
        public ACollection( int max_items ) {
            a = new Object[ max_items ];
        }
    }

Note that the items in the collection are of type Object - the ancestor of all Java classes. Thus this class may contain objects of any type [3].

Methods

Now we need to define the methods themselves:

    public class ACollection implements SimpleCollection {
        private Object a[];
        private int count = 0;
        public ACollection( int max_items ) {
            ...
        }
        public boolean add( Object x ) {
            if( count > (a.length-1) ) {
                // trap error
                ...
            }
            a[ count++ ] = x;
            return true;
        }
        public boolean contains( Object x ) {
            for(int k=0;k<count;k++) {
                if( a[k].equals(x) ) return true;
            }
            return false;
        }
        public boolean remove( Object x ) {
            for(int k=0;k<count;k++) {
                if( a[k].equals(x) ) {
                    count--;
                    // Move remaining objects up
                    for(;k<count;k++) { a[k] = a[k+1]; }
                    return true;
                }
            }
            return false;
        }
        // Required by the SimpleCollection interface
        public int size() { return count; }
    }

[3] Although, if we wish to make collections of Java's primitive types: char, int, etc., we will need to use the 'wrapper' classes: Character, Integer, etc.

Note that in the add method, we have deferred implementation of the code which handles the case of adding more items than the array's capacity. One of our justifications for deferring this is that it is possible to handle this problem without causing an error - at the cost of some time. In fact, the Vector class in the standard API - which is essentially an array based collection - extends the array when necessary.

3.2.1 Performance

add
In 'normal' cases, i.e. no array overflow, it takes a fixed number of instructions to add a new item - irrespective of the current size of the collection. Thus the time required will be constant for any combination of machine + operating system + virtual machine (or run-time system in non-Java environments).

contains
In the worst case, you will have to search all n items currently in the collection. Thus the time required by an invocation of contains will be a linear function of n:

$$T_{contains}(n) = cn + b$$

where c and b are constants determined by your machine + operating system + virtual machine combination. The b term allows for the fixed cost of calling any method and the instructions for setting up the loop, etc. As we will see later in chapter 5, we will usually ignore this term because it will become negligible as n becomes very large.

remove
If the item to be removed is found in index position k - 1, then the time to remove an item from a collection containing n items will be

$$T_{remove}(n) = c_1 k + c_2 (n - k) + b$$

where $c_1$ is the time for a compare operation, and $c_2$ is the time for moving an item down one position in the array. As before, the b term represents fixed overheads of invoking a method, etc. In the two extreme cases, k = 0 and k = n, we have:

$$T_{remove}(n) = c_2 n + b$$

for k = 0 and

$$T_{remove}(n) = c_1 n + b$$

for k = n.

In both cases, we have $T_{remove}(n) = cn$ for some constant c (ignoring the b as $n \to \infty$) and we describe remove as taking time proportional to n.

3.2.2 Direct addressing

One operation that is particularly efficient with arrays is the ability to directly address any element if the index of that element is known. This efficiency arises from the ability to directly compute the memory address of the desired element. The List interface includes a method:

    public Object get( int index );

which works very efficiently with arrays (taking constant time independent of the collection size) because of the ability to directly compute the address of the desired element. This capability allows some operations on the Vector and ArrayList classes to be very efficient - compared to their implementations in other classes, such as lists (see section 3.4) or trees (see section 4).

3.2.3 Resource Usage

Arrays have low overheads for the storage of items. In arrays of primitive data types, the data items themselves can be stored directly in the array. This means that the only additional memory (or overhead) is for a reference to the array's base address and some information about the array dimensions. With arrays of objects in Java, the array will store a reference (pointer) to the actual object, resulting in an overhead of a pointer (usually 32 bits on a 32-bit machine) for each data item: this is still a relatively low overhead compared to that associated with some structures that we will study in further chapters.
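The cost model for remove given in Section 3.2.1 can be checked by counting operations. The sketch below is an instrumented stand-alone variant of the array-based remove, not the ACollection method itself; `RemoveCost` and `removeCounting` are hypothetical names. For an item found at index k - 1 it reports k compares and n - k moves.

```java
class RemoveCost {
    // Removes the first element equal to x from a[0..n-1] and
    // returns { number of compares, number of moves }.
    static int[] removeCounting(Object[] a, int n, Object x) {
        int compares = 0, moves = 0;
        for (int k = 0; k < n; k++) {
            compares++;
            if (a[k].equals(x)) {
                // Close the gap by moving each following item one position
                for (int j = k; j < n - 1; j++) {
                    a[j] = a[j + 1];
                    moves++;
                }
                a[n - 1] = null; // clear the now unused last slot
                break;
            }
        }
        return new int[] { compares, moves };
    }

    public static void main(String[] args) {
        Object[] first = { 10, 20, 30, 40, 50 };
        int[] c = removeCounting(first, 5, 10);   // item at index 0
        System.out.println(c[0] + " compares, " + c[1] + " moves"); // 1 compares, 4 moves

        Object[] last = { 10, 20, 30, 40, 50 };
        c = removeCounting(last, 5, 50);          // item at index 4
        System.out.println(c[0] + " compares, " + c[1] + " moves"); // 5 compares, 0 moves
    }
}
```

Wherever the item sits, compares plus moves add up to n, which is why remove is described as taking time proportional to n.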

Exercise:

1. Make a version of the add method which increases the array's capacity when necessary - mimicking the behaviour of the Vector class.
2. What is the cost of this capability? That is, how long will it take to increase the array size - performing any necessary housekeeping too?
3. Discuss alternative ways of handling the full array problem.

3.2.4 Conclusion

Arrays are fundamental data structures that give programmers the ability to manipulate objects located in contiguous memory space. Statically allocated arrays suffer the limitation of not being able to change size during the execution of a program. Dynamic array allocation can be used to defer setting the size of an array until program runtime. This offers greater flexibility and uses a computer's memory efficiently.

3.3 Searching

Computer systems are often used to store large amounts of data from which individual records must be retrieved according to some search criterion. Thus the efficient storage of data to facilitate fast searching is an important issue. In this section, we shall investigate the performance of some searching algorithms and the data structures which they use.

3.3.1 Sequential Searches

Let's examine how long it will take to find an item matching a key in the collections we have discussed so far. We're interested in:

• the average time
• the worst-case time and
• the best possible time.

However, we will generally be most concerned with the worst-case time as calculations based on worst-case times can lead to guaranteed performance predictions. Conveniently, the worst-case times are generally easier to calculate than average times. If there are n items in our collection - whether it is stored as an array or as a linked list - then it is obvious that in the worst case, when there is no item in the collection with the desired key, n comparisons of the key with keys of the items in the collection will have to be made.

To simplify analysis and comparison of algorithms, we look for a dominant operation and count the number of times that dominant operation is performed. For searching, we choose the comparison as the dominant operation. Since the search requires n comparisons in the worst case, we say this is an O(n) (pronounce this "big-Oh-n" or "Oh-n") algorithm. The best case - in which the first comparison returns a match - requires a single comparison and is O(1). The average time depends on the probability that the key will be found in the collection - this is something that we would not expect to know in the majority of cases. Thus in this case, as in most others, estimation of the average time is of little utility. If the performance of the system is vital, i.e. it's part of a life-critical system, then we must use the worst case in our design calculations as it represents the best guaranteed performance.

3.3.2 A More Efficient Search

However, if we place our items in an array and sort them in either ascending or descending order on the key first, then we can obtain much better performance with an algorithm called binary search.

In binary search, we first compare the key with the item in the middle position of the array. If there's a match, we can return immediately. If the key is less than the middle key, then the item sought must lie in the lower half of the array; if it's greater, then the item sought must lie in the upper half.
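As a baseline for comparison with binary search, the comparison counts for the sequential search of Section 3.3.1 can be checked with a small sketch. It assumes an unsorted array of ints in which each item is its own key; `SequentialSearch`, `search` and the `comparisons` counter are illustrative names.

```java
class SequentialSearch {
    // Number of key comparisons made by the most recent call to search
    static int comparisons;

    // Returns the index of key in a[0..n-1], or -1 if it is not present.
    static int search(int[] a, int n, int key) {
        comparisons = 0;
        for (int i = 0; i < n; i++) {
            comparisons++;              // the dominant operation
            if (a[i] == key) return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] a = { 7, 3, 9, 1 };
        search(a, 4, 7);
        System.out.println(comparisons); // 1: best case, match on the first item
        search(a, 4, 5);
        System.out.println(comparisons); // 4: worst case, key absent, n comparisons
    }
}
```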

What do we mean by key?

In order for items in a collection to be searchable, each data item must have a key. The key is the part of the item which identifies it for searching purposes. For simple data items, e.g. those in a collection of integers, the key is the data item itself. More complex data items will contain at least one key. For example, entries in an electronic telephone directory contain a name, an address and a telephone number. The directory is searched by name, thus the name is the key by which each entry is identified. Emergency services may want to look up the address using the telephone number: thus the telephone number may also be a key for each entry. Keys may be formed in complex ways: in the telephone directory example, the key used when looking up the directory by name will be formed from the family and given names (or company name) according to some rules devised by the directory providers. A collection of data items is searchable as long as there is some way of forming a key for each data item - it doesn't matter how that key is formed.

The association between a key and the data item is called a mapping - from the key to the data item. The classes in the standard API implementing the Map interface model containers in which items have keys that map to them.

3.3.3 Binary Search

Suppose we are looking for x (the search key) in an array a, ranging from a[0] through a[99], and sorted in ascending order. We start by comparing x with a[49], i.e. the item in the middle of the array [4]. If x < a[49], since the array is sorted and a[49] ≤ a[50], then we also have x < a[50]. x is also less than a[51], a[52] and so on. If x is in the array at all, it must be somewhere from a[0] through a[48]. By similar logic, if x ≥ a[49], it must be somewhere from a[49] through a[99], if it is in the array. Effectively, we have sliced the array in two and narrowed our search down to one of the slices. Binary searching proceeds by repeating this idea over and over. In general, if the current slice of the array extends from low to high, we divide the current slice in two parts (hence the name - binary search) by checking the search key against a[middle], where middle is the mean of low and high. If x < a[middle], we continue the search in the lower half of the array by setting the new high to middle-1. We then repeat the procedure on the lower (or upper) half of the array. In our example, initially low is 0 and high is 99. We set middle to 49; if x < a[49], we set the new high equal to 48. Note that now low is 0 and high is 48 and the slice extends from a[0] through a[48].

A BinarySearch function can now be implemented iteratively. In this simple example, our collection is an array containing size ints. The key is also an int.

[4] It doesn't matter whether we round up or down when computing middle - as long as middle remains within the array bounds.

    int BinarySearch( int a[], int size, int key ) {
        // Search the whole array: indices 0 .. size-1
        int low = 0, high = size - 1;
        while ( low <= high ) {
            int middle = (low + high)/2;
            if ( key < a[middle] )
                high = middle - 1;
            else if ( key > a[middle] )
                low = middle + 1;
            else
                return middle;
        }
        return -1;
    }

Our BinarySearch function can also be implemented recursively:

    int BinarySearch( int a[], int low, int high,
                      int key ) {
        if ( low > high ) {
            // Empty partition: the key is not present
            return -1;
        }
        else {
            int middle = (low + high)/2;
            if ( key == a[middle] ) return middle;
            else if ( key < a[middle] )
                return BinarySearch(a, low, middle-1, key);
            else
                return BinarySearch(a, middle+1, high, key);
        }
    }

Points to note:

1. There is a termination condition (two of them in fact!)
   • If low > high then the partition to be searched has no elements in it and
   • If there is a match with the element in the middle of the current partition, then we can return immediately.
2. add will need to be modified to ensure that each item added is placed in its correct place in the array. The procedure is simple:
   • Search the array until the correct spot to insert the new item is found,
   • Move all the following items up one position and
   • Insert the new item into the empty position thus created.
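The three-step insertion procedure in point 2 can be sketched as follows. This is a stand-alone illustration on an array of ints rather than the collection class's add method; `SortedInsert` and `insert` are hypothetical names, and the sketch assumes the array still has room for one more item.

```java
class SortedInsert {
    // Inserts x into the sorted items a[0..count-1] and returns the new count.
    // Assumes count < a.length (no capacity check in this sketch).
    static int insert(int[] a, int count, int x) {
        // 1. Search for the correct spot for the new item
        int pos = 0;
        while (pos < count && a[pos] <= x) pos++;
        // 2. Move all the following items up one position
        for (int k = count; k > pos; k--) a[k] = a[k - 1];
        // 3. Insert the new item into the empty position
        a[pos] = x;
        return count + 1;
    }

    public static void main(String[] args) {
        int[] a = new int[5];
        int count = 0;
        count = insert(a, count, 5);
        count = insert(a, count, 1);
        count = insert(a, count, 3);
        for (int i = 0; i < count; i++) System.out.print(a[i] + " "); // 1 3 5
        System.out.println();
    }
}
```

The search in step 1 is a simple linear scan here; it could be replaced by a binary search, but the shifting in step 2 still costs up to n moves.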

3.3.4 Analysis

Each step of the algorithm divides the block of items being searched in half. We can divide a set of n items in half at most $\log_2 n$ times. Thus the running time of a binary search is proportional to log n and we say this is an O(log n) algorithm.

[Figure: successive steps of a binary search. At each step the key is compared with item[mid] of the current low..high range: if key < item[mid] the search continues in the lower part, if key > item[mid] in the upper part, and if key = item[mid] the item has been found. The range shrinks from n items to about n/2 items, then about n/4 items, taking log2 n steps.]

Binary search requires a more complex program than our original search and thus for small n it may run slower than the simple linear search. However,

$$\lim_{n \to \infty} \frac{\log n}{n} = 0$$

Thus, for large n, log n is much smaller than n and an O(log n) algorithm is much faster than an O(n) one.

[Figure: Plot of n and log n vs n]

We will examine this behaviour more formally in a later section. First, let's see what we can do about the insertion (add) operation. In the worst case, insertion may require n operations to insert into a sorted list. We can find the place in the list where the new item belongs using binary search in O(log n) operations. However, we have to shuffle all the following items up one place to make way for the new one. In the worst case, the new item is the first in the list, requiring n move operations for the shuffle! A similar analysis will show that deletion is also an O(n) operation. If our collection is static, i.e. it doesn't change very often - if at all - then we may not be concerned with the time required to change its contents: we may be prepared for the initial build of the collection and the occasional insertion and deletion to take some time. In return, we will be able to use a simple data structure (an array) which has little memory overhead. However, if our collection is large and dynamic, i.e. items are being added and deleted continually, then we can obtain considerably better performance using a data structure called a tree (see Section 4).
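The logarithmic bound from the start of this analysis can be observed by counting the halving steps of a binary search. The sketch below is an instrumented iterative search over a sorted array of ints; `BinarySearchSteps` and `steps` are illustrative names. For an array of one million items it should never need more than 20 steps, since $\lfloor \log_2 10^6 \rfloor + 1 = 20$.

```java
class BinarySearchSteps {
    // Runs an iterative binary search for key in the sorted array a and
    // returns the number of loop iterations (halvings) it needed.
    static int steps(int[] a, int key) {
        int low = 0, high = a.length - 1, steps = 0;
        while (low <= high) {
            steps++;
            int middle = low + (high - low) / 2;
            if (key < a[middle]) high = middle - 1;
            else if (key > a[middle]) low = middle + 1;
            else break; // found
        }
        return steps;
    }

    public static void main(String[] args) {
        int n = 1000000;
        int[] a = new int[n];
        for (int i = 0; i < n; i++) a[i] = i; // sorted keys 0 .. n-1
        // Searching for a key that is absent exercises the worst case
        System.out.println("steps for n = " + n + ": " + steps(a, -1));
    }
}
```

A sequential search for the same absent key would need a million comparisons.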

3.3.5 Implementation of the contains method

In the ACollection class defined in the previous chapter, we have a method - contains - which searches the collection for a specified item. We can implement this efficiently if the items are sorted, so we modify the add method to search for the proper place to insert a new item and to shift the remaining items up one position before inserting it. Items added to the collection now must have some order, so we require that they implement the Comparable interface. This means that they must have a compareTo method, where a.compareTo( b ) returns a negative value, 0 or a positive value when a < b, a == b or a > b, respectively, i.e. compareTo defines the order of the items. To keep the example simple, code for checking that arguments to add and contains are in fact comparable objects has been omitted (see footnote 5) - as has a check on the collection's capacity before adding a new item.

    public class ACollection implements SimpleCollection {
        private Object a[];
        private int count = 0;

        public ACollection( int max_items ) { ... }

        public boolean add( Object x ) {
            // Add x so that a remains sorted
            Comparable cx = (Comparable)x;
            int j = 0;
            // Find the first position whose item is not smaller than x
            while ( j < count && cx.compareTo( a[j] ) > 0 ) j++;
            // Shift the following items up one position
            for( int k = count; k > j; k-- ) { a[k] = a[k-1]; }
            a[j] = x;
            count++;
            return true;
        }

5. Java will throw a ClassCastException when objects which aren't comparable are added, so this `short cut' is not unreasonable: the method's specification can easily be written to include this behaviour.

        private int BinSearch( int low, int high, Comparable key ) {
            // An empty partition cannot contain the key
            if ( low > high ) return -1;
            int middle = (low + high)/2;
            int k = key.compareTo( a[middle] );
            if ( k == 0 )
                return middle;
            else if ( k < 0 )
                return BinSearch(low, middle-1, key);
            else
                return BinSearch(middle+1, high, key);
        }

        public boolean contains( Object x ) {
            int pos = BinSearch( 0, count-1, (Comparable)x );
            return pos >= 0;
        }
    }

Notes:
1. Several changes were made to match the ACollection class environment: the binary search function was renamed to reflect the changes.
2. BinSearch should not be accessed directly (i.e. from outside the code of the ACollection class) so it is marked private.
3. When using a recursive version of BinSearch, we need to have a separate recursively called function.

Exercise:
1. Write the corresponding iterative version of the ACollection class. Compare its running time with that of the recursive version. Note that to see a significant difference, you will need to build quite a large collection - try one with $10^6$ elements (unless your computer is very slow!).
2. Plot the times for both versions against the size of the collection. Try a geometric sequence - $10^6$, $2 \times 10^6$, $4 \times 10^6$, $8 \times 10^6$, ... - to generate this graph quickly! What do you observe about the difference between the two versions?

3.4 Lists

The array implementation of our collection has one serious drawback: you must know the maximum number of items in your collection when you create it. This presents problems in programs in which this maximum number cannot be predicted accurately when the program starts up. Fortunately, we can use a structure called a linked list to overcome this limitation.

3.4.1 Linked lists

The linked list is a very flexible dynamic data structure: items may be added to it or deleted from it at will. A programmer need not worry about how many items a program will have to accommodate: this allows us to write robust programs which require much less maintenance. A very common source of problems in program maintenance is the need to increase the capacity of a program to handle larger collections: even the most generous allowance for growth tends to prove inadequate over time!

[Figure: a linked list of three nodes, each holding an item and a next pointer to the following node]

In a linked list, each item is allocated space as it is added to the list. A link to the next item in the list is kept with each item. Each node of the list has two elements:
• the item being stored in the list and
• a pointer (see footnote 6) to the next item in the list.

The last node in the list contains a null pointer to indicate that it is the end or tail of the list. As items are added to a list, memory for a node is dynamically allocated. Thus the number of items that may be added to a list is limited only by the amount of memory available.

6. Java attempts to avoid using the term pointer, preferring reference instead. We follow conventional usage here (and in subsequent chapters) and use the term pointer for references to other objects in linked data structures such as this one. The diagram shows that this matches our conventional view of such structures which are most often drawn with arrows pointing from one node to the next.

Handle for the list

The variable (or handle) which represents the list is simply a pointer to the node at the head of the list. Thus a collection which is implemented by a linked list first creates a separate class for the nodes of the list and has a single attribute - a reference to the node at the head of the list:

    public class LCollection implements SimpleCollection {
        class Node {
            Object item;
            Node next;
        }
        private Node head;
        private int count;

        public LCollection() {
            head = null;
            count = 0;
        }
        ...
    }

In this implementation, we have chosen to make Node an inner class. This has the advantage of hiding the Node objects altogether from the user of the LCollection class. However, there may be good reasons for making Node a normal external class, e.g. you may wish to use Node in another linked data structure.

Adding to a list

The simplest strategy for adding an item to a list is to:
(a) allocate space for a new node,
(b) copy the item into it,
(c) make the new node's next pointer point to the current head of the list and
(d) make the head of the list point to the newly allocated node.

This strategy is fast and efficient, but each item is added to the head of the list.

    public class LCollection implements SimpleCollection {
        class Node {
            Object item;
            Node next;
        }
        private Node head, tail;
        private int count;

        public LCollection() { head = tail = null; }

        // Add to the head of the list
        public boolean add( Object x ) {
            Node n = new Node();
            n.item = x;
            n.next = head;
            head = n;
            count++;
            return true;
        }
        ...
    }
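A stand-alone sketch makes the consequence of steps (a) to (d) visible: walking the list from its head yields the items in the reverse of the order in which they were added. The class HeadList and its contents method are hypothetical names used only for this illustration.

```java
// Hypothetical stand-alone list, mirroring the add-to-head strategy.
class HeadList {
    static class Node {
        Object item;
        Node next;
    }

    private Node head;
    private int count;

    // Add to the head of the list: steps (a) to (d)
    boolean add(Object x) {
        Node n = new Node();   // (a) allocate a new node
        n.item = x;            // (b) store the item in it
        n.next = head;         // (c) new node points at the current head
        head = n;              // (d) head now points at the new node
        count++;
        return true;
    }

    // Walk the chain of next pointers from the head to the null at the tail
    String contents() {
        StringBuilder sb = new StringBuilder();
        for (Node p = head; p != null; p = p.next) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(p.item);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        HeadList list = new HeadList();
        list.add("a");
        list.add("b");
        list.add("c");
        System.out.println(list.contents()); // prints "c b a"
    }
}
```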

An alternative is to add to the class attributes for both head and tail pointers:

    public class LCollection implements SimpleCollection {
        class Node {
            Object item;
            Node next;
        }
        private Node head, tail;
        private int count;

        public LCollection() { head = tail = null; }
        ...
    }

The code for add is now trivially modified to make a list in which the item most recently added to the list is the list's tail.

Exercise:
Write variants of the add method above to:
(a) Keep both head and tail pointers updated correctly and
(b) Add an item to the tail of a list.

Note that the only change to the public methods of LCollection (compared to ACollection) is that the LCollection constructor has no arguments. As a consequence, applications which need to change ACollection to LCollection will need only trivial changes. These changes could be avoided also by adding a second constructor to LCollection which requires, but ignores, the (now redundant) max_size parameter.

    public class LCollection implements SimpleCollection {
        class Node {
            Object item;
            Node next;
        }
        private Node head, tail;
        private int count;

        public LCollection() { head = tail = null; }

        // Constructor for compatibility with ACollection
        public LCollection( int max_size ) { this(); }
        ...
    }

Note that the second constructor simply calls the first, ignoring the max_size argument! The data structure is changed, but since the details (the attributes of the object or the elements of the structure) are hidden from the user, there is no impact on the user's program. With the exception of the added flexibility that any number of items may be added to our collection, this implementation provides exactly the same high level behaviour as the previous one. The ramifications for the cost of software maintenance are significant. The linked list

implementation has gained flexibility at the cost of efficiency - on most systems, the system call to allocate memory is relatively expensive. Pre-allocation in the array-based implementation is generally more efficient. More examples of such trade-offs will be found later.

Exercise:
Compare the time required to add a fixed number of items to an ACollection with that required for an LCollection. Even on a relatively fast computer, you can expect to measure significant differences with fewer than $10^6$ items.

The study of data structures and algorithms will enable you to make the implementation decision which most closely matches your users' specifications.

3.4.2 List variants

Circularly Linked Lists

By ensuring that the tail of the list is always pointing to the head, we can build a circularly linked list. If the node reference - the one called head in our implementation - points to the current "tail" of the list instead of the head, then the "head" is found trivially via tail.next, permitting us to have either LIFO or FIFO lists with only one reference. In modern processors, the few bytes of memory saved in this way would probably not be regarded as significant unless the collection was extremely large. A circularly linked list would more likely be used in an application which required "round-robin" scheduling or processing.

Doubly Linked Lists

[Figure: a doubly linked list of three nodes, each holding an item, a prev pointer and a next pointer]

Doubly linked lists have a pointer to the preceding item as well as one to the next. They permit scanning or searching of the list in both directions. (To go backwards in a simple list, it is necessary to go back to the start and scan forwards.)

Many applications require searching backwards and forwards through sections of a list: for example, searching for a common name like "Kim" in a Korean telephone directory would probably need much scanning backwards and forwards through a small region of the whole list, so the backward links become very useful. In this case, the node class is altered to have two links:

    public class DLCollection implements SimpleCollection {
        class Node {
            Object item;
            Node next, previous;
        }
        private Node head, tail;
        private int count;

        public DLCollection() { head = tail = null; }
        ...
    }

Exercise:
1. Complete the code for the add and remove methods for DLCollection.
2. What other method(s) would you add to DLCollection to take advantage of its additional capabilities?

Lists in arrays

Although this might seem pointless (Why impose a structure which has the overhead of the "next" pointers on an array?), this is just what memory allocators do to manage available space. Memory is just an array of words. After a series of memory allocations and de-allocations, there are blocks of free memory scattered throughout the available heap space. In order to be able to re-use this memory, memory allocators will usually link freed blocks together in a free list by writing pointers to the next free block in the block itself. An external free list pointer points to the first block in the free list. When a new block of memory is requested, the allocator will generally scan the free list looking for a freed block of suitable size and delete it from the free list (re-linking the free list around the deleted block). Many variations of memory allocators have been proposed: refer to a text on operating systems or implementation of functional languages for more details. The entry in the index under garbage collection will probably lead to a discussion of this topic.

3.5 Stacks

Another way of storing data is in a stack. A stack is often implemented with only two principal operations (apart from constructor and destructor methods):

    push    adds an item to a stack
    pop     extracts the most recently pushed item from the stack.

Other methods such as

    top       returns the item at the top without removing it [9]
    isEmpty   determines whether the stack has anything in it

are sometimes added.

A common model of a stack is a plate or coin stacker. Plates are "pushed" onto the top and "popped" off the top. Stacks form Last-In-First-Out (LIFO) queues and have many applications from the parsing of algebraic expressions to ...

[Figure: Push and Pop operations on a stack]

A formal specification of a stack class would look like:

    public class Stack extends ACollection {
        /* Construct a new stack
           Pre-condition: (max_items > 0)
        */
        public Stack( int max_items );

        /* Push an item onto a stack
           Pre-condition: (existing item count < max_items) &&
                          (item != null)
           Post-condition: item has been added to the top of the stack
        */
        public void push( Object item );

        /* Pop an item off a stack
           Pre-condition: (existing item count >= 1)
           Post-condition: top item has been removed from stack
        */
        public Object pop();
    }

Points to note: A stack is simply another collection of data items and thus it would be possible to use exactly the same specification as the one used for our general collection. However, collections with the LIFO semantics of stacks are so important in computer science that it is appropriate to set up a limited specification appropriate to stacks only. Although a linked list implementation of a stack is possible (adding and deleting from the head of a linked list produces exactly the LIFO semantics of a stack), the most common applications for stacks have a space constraint, so that an array implementation is a natural and efficient one. In most operating systems, allocation and de-allocation of memory is a relatively expensive operation, so there is a penalty for the flexibility of linked list implementations.
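One way the specification might be realised with an array is sketched below. It is an assumption-laden illustration rather than the text's implementation: the class name ArrayStack is hypothetical, it stands alone instead of extending ACollection, and violated pre-conditions are reported with standard runtime exceptions.

```java
// Hypothetical array-backed stack; names and error handling are illustrative.
class ArrayStack {
    private final Object[] items;
    private int count = 0;

    // Pre-condition: maxItems > 0
    ArrayStack(int maxItems) {
        if (maxItems <= 0) throw new IllegalArgumentException("max_items must be > 0");
        items = new Object[maxItems];
    }

    // Pre-condition: stack not full and item != null
    void push(Object item) {
        if (item == null || count >= items.length)
            throw new IllegalStateException("push pre-condition violated");
        items[count++] = item;
    }

    // Pre-condition: at least one item on the stack
    Object pop() {
        if (count < 1) throw new IllegalStateException("stack is empty");
        Object top = items[--count];
        items[count] = null;   // drop the reference so the item can be reclaimed
        return top;
    }

    boolean isEmpty() { return count == 0; }

    public static void main(String[] args) {
        ArrayStack s = new ArrayStack(4);
        s.push("first");
        s.push("second");
        System.out.println(s.pop()); // prints "second": last in, first out
    }
}
```

Both push and pop touch only the top slot of the pre-allocated array, so neither needs to allocate memory while the stack is in use.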

3.5.1 Stack Frames

Almost invariably, programs compiled from modern high level languages make use of a stack frame for the working memory of each procedure or function invocation. When any procedure or function is called, a number of words - the stack frame - is pushed onto a program stack. When the procedure or function returns, this frame of data is popped off the stack. As a function calls another function, first its arguments, then the return address and finally space for local variables is pushed onto the stack. Since each function runs in its own "environment" or context, it becomes possible for a function to call itself - a technique known as recursion. This capability is extremely useful and extensively used - because many problems are elegantly specified or solved in a recursive way.

In the example below, we have a pair of mutually recursive functions: function f (implemented as a static method inside some class) calls function g (another static method in the same class) which in turn calls function f. Space for local variables (a in function f and p and q in function g) must be allocated correctly - one instance of a for each invocation of f and one each of p and q for each invocation of g. By pushing a stack frame (containing the local variables) onto the program stack each time a function is invoked, space is created for copies of the local variables for each invocation. The frame is popped off the stack as each function returns.

    static int f(int x, int y) {
        int a;
        if ( term_cond ) return ...;
        a = .....;
        return g(a);
    }

    static int g(int z) {
        int p, q;
        p = ...; q = ...;
        return f(p,q);
    }

Note how all of function f and g's environment (their parameters and local variables) are found in the stack frame. When f is called a second time from g, a new frame for the second invocation of f is created.

[Figure: the program stack holding a stack frame for f (parameters x, y; return address; local variable a), a stack frame for g (parameter z; return address; local variables p, q) and a second stack frame for f (parameters x, y; return address; local variable a)]

3.6 Recursion

Recursion is a simple, but powerful, technique for defining algorithms in which a problem is divided into smaller parts or sub-problems and one or more of those parts is solved using the same algorithm. Thus the problems to be solved become progressively smaller until it is trivial to find a solution. Thus a recursive algorithm must have two parts:
• a base case or terminating condition - a problem which has a known solution
• recursive steps - in which the problem is divided into smaller parts and the recursive algorithm is applied to one or more of the smaller problems.

Many examples of recursion may be found: the technique is useful both for the definition of mathematical functions and for the definition of data structures. Naturally, if a data structure may be defined recursively, it may be processed by a recursive function!

recur
From the Latin, re- = back + currere = to run. To happen again, especially at repeated intervals.

3.6.1 Recursive functions

Many mathematical functions can be defined recursively:
• factorial
• Fibonacci numbers
• Euclid's GCD (greatest common divisor)
• Fourier Transform

Many problems can be solved recursively, e.g. games of all types from simple ones like the Towers of Hanoi problem to complex ones like chess. In games, the recursive solutions are particularly convenient because, having solved the problem by a series of recursive calls, you want to find out how you got to the solution. By keeping track of the move chosen at any point, the program call stack does this housekeeping for you! This is explained in more detail later.

Example: Factorial

The factorial function is one of the simplest examples of a recursive definition. We can calculate n! from this relation:

    $n! = n \times (n-1) \times (n-2) \times \dots \times 2 \times 1$

Observing that the $(n-1) \times (n-2) \times \dots$ is actually $(n-1)!$, we can define the factorial function this way:

    factorial( n ) = if ( n = 0 ) then 1
                     else n * factorial( n-1 )

A natural way to calculate factorials is to write a recursive function which matches this definition:

    int factorial( int n ) {
        if ( n == 0 ) return 1;
        else return n*factorial(n-1);
    }

Note how this function calls itself to evaluate the next term. The sequence of calls unfolds like this for n = 5:

    factorial(5) → 5 × factorial(4)
    factorial(4) → 4 × factorial(3)
    factorial(3) → 3 × factorial(2)
    factorial(2) → 2 × factorial(1)
    factorial(1) → 1 × factorial(0)
    factorial(0) → 1

Finally, the call factorial(0) returns 1 to the calling function, which completes its calculation and returns a result to the function that called it. This is known as unwinding the recursion.

    factorial(0) → 1
    factorial(1) → 1 × 1 = 1
    factorial(2) → 1 × 2 = 2
    factorial(3) → 2 × 3 = 6
    factorial(4) → 6 × 4 = 24
    factorial(5) → 24 × 5 = 120, which is the final result.

There's only one recursive call, so this is an example of linear recursion. Eventually the sequence of calls will reach the termination condition (n = 0

in this case), stop making further calls and start to unwind the recursion. However, before it reaches the termination condition, it will have pushed n stack frames onto the program's run-time stack. The termination condition is obviously extremely important when dealing with recursive functions. If it is omitted, then the function will continue to call itself until the program runs out of stack space - usually with moderately unpleasant results! Failure to include a correct termination condition in a recursive function is a recipe for disaster!

Another commonly used (and abused!) example of a recursive function is the calculation of Fibonacci numbers. Following the definition:

    fib( n ) = if ( n = 0 ) then 1
               if ( n = 1 ) then 1
               else fib( n-1 ) + fib( n-2 )

one can write:

    int fib( int n ) {
        if ( (n == 0) || (n == 1) ) return 1;
        else return fib(n-1) + fib(n-2);
    }

Short and elegant, it uses tree recursion (in which two or more recursive calls are made at each step) to provide a neat solution - that is actually a disaster! We shall re-visit this and show why it is such a disaster later in the chapter on dynamic algorithms (Chapter 10).

Data structures also may be recursively defined. One of the most important classes of structure - trees (see Chapter 4) - allows recursive definitions which lead to simple (and efficient) recursive functions for manipulating them. But in order to see why trees are valuable structures, let's first examine the problem of searching.

Exercise:
An important function in probability is the function that calculates binomial coefficients, written C(n, m), and defined as:

    $C(n, m) = \frac{n!}{(n-m)! \times m!}$

It can also be defined recursively:

    $C(n, 0) = 1$, for $n > 0$
    $C(n, n) = 1$, for $n > 0$
    $C(n, m) = C(n-1, m) + C(n-1, m-1)$, for $n > m > 0$

Write a function, int binomial( int n, int m ), that calculates C(n, m). However, be careful not to use this function for any real processing: like the recursive calculation of Fibonacci numbers, it is extremely inefficient (see Chapter 10).
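The cost of tree recursion can be made visible by counting how many times the tree-recursive fib function is invoked. The sketch below keeps the text's definition (fib(0) = fib(1) = 1) and adds a counter; the class name FibCalls and the static field calls are hypothetical additions for the experiment.

```java
// Hypothetical instrumented version of fib, for counting invocations.
class FibCalls {
    static long calls = 0;

    // The tree-recursive definition, with fib(0) = fib(1) = 1
    static int fib(int n) {
        calls++;
        if ((n == 0) || (n == 1)) return 1;
        else return fib(n - 1) + fib(n - 2);
    }

    public static void main(String[] args) {
        for (int n = 5; n <= 25; n += 5) {
            calls = 0;
            int f = fib(n);
            System.out.println("fib(" + n + ") = " + f + " after " + calls + " calls");
        }
    }
}
```

With this definition, evaluating fib(n) takes exactly 2 × fib(n) - 1 calls (177 calls for fib(10) = 89), so the work grows as fast as the Fibonacci numbers themselves even though the recursion is never deeper than n frames.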

Example code:
    Factorial.java
    Fibonacci.java

Key Terms

Heap
    An area of memory from which a program dynamically allocates memory for new objects while the program is running.
Dynamic (memory) allocation
    Allocation of memory while a program is running to allow the running program to adapt to the actual needs of a particular run.
Big Oh
    A notation formally describing the set of all functions which are bounded above by a nominated function.
Dominant Operation
    An operation chosen from among those in a sequence needed by an algorithm which can be used to represent a critical step in the algorithm.
Binary Search
    A technique for searching an ordered list in which we first check the middle item and - based on that comparison - "discard" half the data. The same procedure is then applied to the remaining half until a match is found or there are no more items left.
Linked List
    Data structure in which items are linked together in a continuous list.
Dynamic data structures
    Structures which grow or shrink as the data they hold changes. Lists, stacks and trees are all dynamic structures.
push, pop
    Generic terms for adding something to, or removing something from, a stack.
context
    The environment in which a function executes: includes argument values, local variables and global variables. All the context except the global variables is stored in a stack frame.
stack frames
    The data structure containing all the data (arguments, local variables, return address, etc.) needed each time a procedure or function is called.
Termination condition
    Condition which terminates a series of recursive calls - and prevents the program from running out of space for stack frames!


Chapter 4
Trees

4.1 Binary Trees

The simplest form of tree is a binary tree. A binary tree consists of
• a node (called the root node) and
• left and right sub-trees.

Both the sub-trees are themselves binary trees. You now have a recursively defined data structure.

Exercise:
It is also possible to define a list recursively: can you see how?

The nodes at the lowest levels of the tree (the ones with no sub-trees) are called leaves. In an ordered binary tree, the keys of all the nodes in the left sub-tree are less than that of the root, the keys of all the nodes in the right sub-tree are greater than that of the root, and the left and right sub-trees are themselves ordered binary trees.

[Figure 4.1: A binary tree, showing the root, its left and right sub-trees, the nodes (each holding an item with left and right pointers) and the leaves]

Data Structure

The data structure for the tree implementation simply adds left and right pointers in place of the next pointer of the linked list implementation.

    public class BinaryTree ... {
        private class Node {
            Node left, right;
            Comparable item;

            // Node constructor
            Node( Comparable x ) {
                item = x;
            }
            ...
        }

        private Node root;

        // Create an empty tree
        public BinaryTree() {
            root = null;
        }
        ...
    }

Note that, in order to create a binary search tree, items placed in the tree must have some natural ordering. So this implementation requires that they implement the Comparable interface. The add method in the following code segment takes an Object as its argument: it must cast this object to Comparable before calling the add method of the inner class, Node. Attempts to add objects which do not implement Comparable will cause a ClassCastException to be thrown at this point.

The add method is, naturally, recursive.

    public class BinaryTree ... {
        private class Node {
            Node left, right;
            Comparable item;

            // Add node n to the sub-tree rooted at this node
            void add( Node n ) {
                if ( (n.item).compareTo( this.item ) < 0 ) {
                    // n's key is smaller: it belongs in the left sub-tree
                    if ( left == null ) left = n;
                    else left.add( n );
                }
                else {
                    // otherwise it belongs in the right sub-tree
                    if ( right == null ) right = n;
                    else right.add( n );
                }
            }
            ...
        }

        private Node root;

        // Add x to the tree
        public boolean add( Object x ) {
            Node n = new Node( (Comparable)x );
            if ( root == null ) root = n;
            else root.add( n );
            return true;
        }
        ...
    }

Similarly, the contains method is recursive:

    public class BinaryTree ... {
        private class Node {
            Node left, right;
            Comparable item;
            ...
            Object get( Comparable x ) {
                Node child;
                int comp = x.compareTo( this.item );
                if ( comp == 0 ) return item;
                // Smaller keys are in the left sub-tree, larger ones in the right
                child = ( comp < 0 ) ? left : right;
                if ( child == null ) return null;
                else return child.get( x );
            }
        }

        private Node root;

        // Search for an object
        public boolean contains( Object x ) {
            if ( root == null ) return false;
            else return root.get( x ) != null;
        }
        ...
    }

4.1.1 Analysis

Complete Trees

Before we look at more general cases, let's make the optimistic assumption that we've managed to fill our tree neatly, i.e. that each leaf is the same `distance' from the root.

[Figure: A complete tree, with the heights 1, 2 and 3 marked]

This forms a complete tree, whose height is defined as the number of links from the root to the deepest leaf.

First, we need to work out how many nodes, n, we have in such a tree of height, h. Now,

    $n = 1 + 2 + 4 + \dots + 2^h = 2^0 + 2^1 + 2^2 + \dots + 2^h$

From which we have,

    $n = 2^{h+1} - 1$

and

    $h = \lfloor \log_2 n \rfloor$

Examination of the contains method shows that in the worst case, $h + 1$ or $\lceil \log_2 n \rceil$ comparisons are needed to find an item. This is the same as for binary search. However, add also requires $\lceil \log_2 n \rceil$ comparisons to determine where to add an item. Actually adding the item takes a constant number of operations, so we say that a binary tree requires O(log n) operations for both adding and finding an item - a considerable improvement over binary search for a dynamic structure which often requires addition of new items. Deletion is also an O(log n) operation.

4.1.2 General binary trees

However, in general addition of items to an ordered tree will not produce a complete tree. The worst case occurs if we add an ordered list of items to a tree.

Exercise:
What will happen? Think before you turn to the next page!

This problem is readily overcome: we use a structure known as a heap. However, before looking at heaps, we should formalise our ideas about the complexity of algorithms by defining carefully what O(f(n)) means.

Key Terms

Root Node
    Node at the "top" of a tree - the one from which all operations on the tree commence. The root node may not exist (a null tree with no nodes in it) or have 0, 1 or 2 children in a binary tree.
Leaf Node
    Node at the "bottom" of a tree - farthest from the root. Leaf nodes have no children.
Complete Tree
    Tree in which each leaf is at the same distance from the root. A more precise and formal definition of a complete tree is set out later.
Height
    Number of nodes which must be traversed from the root to reach a leaf of a tree.
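The relation between the height and the number of nodes of a complete tree (Section 4.1.1) can be checked on a small example. The sketch below is a simplified stand-in for the BinaryTree class: it stores int keys, the names TreeHeight, insert, height and size are hypothetical, and height is counted in links from the root to the deepest leaf.

```java
// Hypothetical miniature ordered binary tree with int keys.
class TreeHeight {
    static class Node {
        final int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // Ordered insertion: smaller keys go left, all others go right
    static Node insert(Node root, int key) {
        if (root == null) return new Node(key);
        if (key < root.key) root.left = insert(root.left, key);
        else root.right = insert(root.right, key);
        return root;
    }

    // Height in links: an empty tree is -1, a single node is 0
    static int height(Node t) {
        return t == null ? -1 : 1 + Math.max(height(t.left), height(t.right));
    }

    static int size(Node t) {
        return t == null ? 0 : 1 + size(t.left) + size(t.right);
    }

    public static void main(String[] args) {
        Node root = null;
        // This insertion order happens to fill every level completely
        for (int key : new int[] {4, 2, 6, 1, 3, 5, 7}) root = insert(root, key);
        int h = height(root);
        System.out.println("n = " + size(root) + ", h = " + h
                + ", 2^(h+1) - 1 = " + ((1 << (h + 1)) - 1));
    }
}
```

For these seven keys the tree has height 2, and 2^(2+1) - 1 = 7 matches the node count. Other insertion orders of the same keys can give a taller tree.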


Chapter 5
Complexity

We have already used the O() notation to denote the general behaviour of an algorithm as a function of the problem size. We have said that an algorithm is O(log n) if its running time, T(n), to solve a problem of size n is proportional to log n.

5.1 The O() notation

Formally, O(g(n)) is the set of functions, f, such that for some c > 0,

    $f(n) < c\,g(n)$

for all positive integers, n > N, i.e. for all sufficiently large n. Another way of writing this is:

    $\lim_{n \to \infty} \frac{f(n)}{g(n)} \le c$

Informally, we say that O(g) is the set of all functions which grow no faster than g. The function g is an upper bound to functions in O(g).

We are interested in the set of functions defined by the O() notation because we want to argue about the relative merits of algorithms - independent of their implementations. That is, we are not concerned with the language or machine used; we want a means of comparing algorithms which is relevant to any implementation.

We can define two other functions: $\Omega(g)$ and $\Theta(g)$. $\Omega(g)$ is the set of functions f(n) for which $f(n) \ge c\,g(n)$ for all positive integers, n > N, and

    $\Theta(g) = \Omega(g) \cap O(g)$

We can derive:

    $f \in \Theta(g)$ if $\lim_{n \to \infty} \frac{f(n)}{g(n)} = c$

Thus, $\Omega(g)$ is a lower bound - functions in $\Omega(g)$ grow at least as fast as g - and $\Theta(g)$ are functions that grow at the same rate as g. In these last two statements - as in most of the discussion on complexity theory - "within a constant factor" is understood. Different languages, compilers, machines, operating systems, etc. will produce different constant factors: we're concerned with the general behaviour of the running time as n increases to very large values.

5.1.1 Properties of the O() notation

The following general properties of O() notation expressions may be derived:
1. Constant factors may be ignored:
   For all k > 0, kf is O(f).
   e.g. $an^2$ and $bn^2$ are both $O(n^2)$.
2. Higher powers of n grow faster than lower powers:
   $n^r$ is $O(n^s)$ if $0 \le r \le s$.
3. The growth rate of a sum of terms is the growth rate of its fastest growing term:
   If f is O(g), then f + g is O(g).
   e.g. $an^3 + bn^2$ is $O(n^3)$.
4. The growth rate of a polynomial is given by the growth rate of its leading term (cf. (2), (3)):
   If f is a polynomial of degree d, then f is $O(n^d)$.
5. If f grows faster than g, which grows faster than h, then f grows faster than h.
6. The product of upper bounds of functions gives an upper bound for the product of the functions:
   If f is O(g) and h is O(r), then fh is O(gr).
   e.g. if f is $O(n^2)$ and g is $O(\log n)$, then fg is $O(n^2 \log n)$.
7. Exponential functions grow faster than powers:
   $n^k$ is $O(b^n)$, for all $b > 1$, $k \ge 0$,
   e.g. $n^4$ is $O(2^n)$ and $n^4$ is $O(e^n)$.
8. Logarithms grow more slowly than powers:
   $\log_b n$ is $O(n^k)$ for all $b > 1$, $k > 0$,
   e.g. $\log_2 n$ is $O(n^{0.5})$.
9. All logarithms grow at the same rate:
   $\log_b n$ is $\Theta(\log_d n)$ for all $b, d > 1$.
10. The sum of the first n r-th powers grows as the (r + 1)-th power:
    $\sum_{k=1}^{n} k^r$ is $\Theta(n^{r+1})$,
    e.g. $\sum_{k=1}^{n} k = \frac{n(n+1)}{2}$ is $\Theta(n^2)$.
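As a worked illustration of the formal definition and of properties 1 and 3, consider a hypothetical running time $f(n) = 3n^2 + 5n$ (the function is chosen only for this example). Explicit values for the constants c and N can be exhibited directly:

```latex
\begin{align*}
f(n) &= 3n^2 + 5n \\
5n &< n^2 && \text{for all } n > 5 \\
\Rightarrow\quad f(n) &< 3n^2 + n^2 = 4n^2 && \text{for all } n > 5 \\
f(n) &\ge 3n^2 && \text{for all } n > 0
\end{align*}
```

The third line shows that f is in $O(n^2)$ with c = 4 and N = 5. The fourth shows that f is in $\Omega(n^2)$ with c = 3, so f is also in $\Theta(n^2)$.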

5.2 Polynomial and Intractable Algorithms

5.2.1 Polynomial time complexity

An algorithm is said to have polynomial time complexity iff it is $O(n^d)$ for some integer d.

5.2.2 Intractable problems

A problem is said to be intractable if no algorithm with polynomial time complexity is known for it. Note carefully that we have written is known here - this is an important qualification! We will examine some intractable problems later in Section 11.

5.3 Analysing an algorithm

5.3.1 Simple Statement Sequence

First note that a sequence of statements which is executed once only is O(1). It doesn't matter how many statements are in the sequence - only that the number of statements (or the time that they take to execute) is constant for all problems.

5.3.2 Simple Loops

If a problem of size n can be solved with a simple loop:

    for(i=0;i<n;i++) {
        s;
    }

where s is an O(1) sequence of statements, then the time complexity is n × O(1) or O(n).

If we have two nested loops:

    for(j=0;j<n;j++)
        for(i=0;i<n;i++) {
            s;
        }

then we have n repetitions of an O(n) sequence, giving a complexity of: n × O(n) or $O(n^2)$.

Where the index 'jumps' by an increasing amount in each iteration, we might have a loop like:

    h = 1;
    while( h <= n ) {
        s;
        h = 2*h;
    }

in which h takes values 1, 2, 4, ... until it exceeds n. This sequence has $1 + \lfloor \log_2 n \rfloor$ values, so the complexity is $O(\log_2 n)$.
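The iteration count of the doubling loop can be confirmed by instrumenting it. In the sketch below the statement sequence s is replaced by a counter; the class DoublingLoop and its two methods are hypothetical names, and the index is held in a long to avoid overflow for n close to the largest int.

```java
// Hypothetical instrumented version of the doubling loop.
class DoublingLoop {
    // Counts how many times the body s would run for a problem of size n
    static int iterations(int n) {
        int count = 0;
        for (long h = 1; h <= n; h = 2 * h) count++;
        return count;
    }

    // floor(log2 n) for n >= 1, computed exactly with integer arithmetic
    static int floorLog2(int n) {
        return 31 - Integer.numberOfLeadingZeros(n);
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 2, 3, 1000, 1 << 20}) {
            System.out.println("n = " + n + ": " + iterations(n)
                    + " iterations, 1 + floor(log2 n) = " + (1 + floorLog2(n)));
        }
    }
}
```

For n = 1000, h takes the ten values 1, 2, 4, ..., 512, in agreement with 1 + ⌊log2 1000⌋ = 10.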

If the inner loop depends on an outer loop index:

    for(j=0;j<n;j++)
        for(i=0;i<j;i++) { s; }

The inner loop for(i=0; .. is executed j times on each pass of the outer loop, so the total is:

    $\sum_{j=0}^{n-1} j = \frac{n(n-1)}{2}$

and the complexity is $O(n^2)$. We see that this is the same as the result for two nested loops above, so the variable number of iterations of the inner loop does not affect the `big picture'.

However, if the number of iterations of one of the loops decreases by a constant factor with every iteration:

    h = n;
    while( h > 0 ) {
        for(i=0;i<n;i++) { s; }
        h = h/2;
    }

Then
• there are $\log_2 n$ iterations of the outer loop and
• the inner loop is O(n),
so the overall complexity is O(n log n). This is substantially better than the previous case in which the number of iterations of one of the loops decreased by a constant for each iteration!

Key Terms

Polynomial time complexity
    The class of problems which are practically solvable in reasonable time, even for very large problems.
Intractable problems
    The class of problems which cannot be solved (except for very small problem sizes) in reasonable times.
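The two nested-loop patterns analysed in Section 5.3.2 can be compared by counting executions of the body s. The sketch below is an illustration with hypothetical names (LoopCounts, triangular, halving); it uses exactly the loop bounds shown in the text.

```java
// Hypothetical instrumented versions of the two nested loops.
class LoopCounts {
    // Inner loop bounded by the outer index: 0 + 1 + ... + (n-1) executions
    static long triangular(int n) {
        long count = 0;
        for (int j = 0; j < n; j++)
            for (int i = 0; i < j; i++) count++;
        return count;
    }

    // Outer index halves on every pass; the inner loop always runs n times
    static long halving(int n) {
        long count = 0;
        for (int h = n; h > 0; h = h / 2)
            for (int i = 0; i < n; i++) count++;
        return count;
    }

    public static void main(String[] args) {
        for (int n = 1024; n <= 16384; n *= 4) {
            System.out.println("n = " + n + ": dependent loops " + triangular(n)
                    + ", halving loops " + halving(n));
        }
    }
}
```

For n = 1024 the dependent loops execute s 523776 times (1024 × 1023 / 2), while the halving loops execute it 11264 times (11 passes of 1024), which shows how much slower n log n grows than n squared.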

Chapter 6
Queues

Queues are dynamic collections which have some concept of order. This can be based on order of entry into the queue, giving us either First-In-First-Out (FIFO) or Last-In-First-Out (LIFO) queues. Both of these can be built with linked lists: the simplest "add-to-head" implementation of a linked list gives LIFO behaviour. A minor modification - adding a tail pointer and adjusting the addition method implementation - will produce a FIFO queue.

6.1 Performance

A straightforward analysis shows that for both these cases, the time needed to add or delete an item is constant and independent of the number of items in the queue. Thus we class both addition and deletion as an O(1) operation. For any given real machine+operating system+language combination, addition may take $c_1$ seconds and deletion $c_2$ seconds, but we aren't interested in the value of the constant: it will vary from machine to machine, language to language, etc. The key point is that the time is not dependent on n - producing O(1) algorithms.

Once we have written an O(1) method, there is generally little more that we can do from an algorithmic point of view. Occasionally, a better approach may produce a lower constant time. Often, enhancing our compiler, run-time system, machine, etc., will produce some significant improvement. However O(1) methods are already very fast, and it's unlikely that effort expended in improving such a method will produce much real gain!
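A FIFO queue built on a linked list with a tail pointer might look like the sketch below. The class LinkedQueue and its method names are hypothetical, and returning null from an empty queue is an assumption made for brevity. Neither operation contains a loop, which is why both are O(1).

```java
// Hypothetical FIFO queue: add at the tail, remove from the head.
class LinkedQueue {
    private static class Node {
        Object item;
        Node next;
    }

    private Node head, tail;

    // O(1): link the new node after the current tail
    void add(Object x) {
        Node n = new Node();
        n.item = x;
        if (tail == null) head = n;   // first item: it is also the head
        else tail.next = n;
        tail = n;
    }

    // O(1): unlink the node at the head; returns null if the queue is empty
    Object remove() {
        if (head == null) return null;
        Object x = head.item;
        head = head.next;
        if (head == null) tail = null;   // queue is now empty
        return x;
    }

    public static void main(String[] args) {
        LinkedQueue q = new LinkedQueue();
        q.add("a");
        q.add("b");
        q.add("c");
        System.out.println(q.remove() + " " + q.remove() + " " + q.remove()); // prints "a b c"
    }
}
```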

6.1.1 Priority Queues

Often the items added to a queue have a priority associated with them: this priority determines the order in which they exit the queue - highest priority items are removed first.

This situation arises often in process control systems. Imagine the operator's console in a large automated factory. It receives many routine messages from all parts of the system: they are assigned a low priority because they just report the normal functioning of the system - they update various parts of the operator's console display simply so that there is some confirmation that there are no problems. It will make little difference if they are delayed or lost. However, occasionally something breaks or fails and alarm messages are sent. These have high priority because some action is required to fix the problem (even if it is mass evacuation because nothing can stop the imminent explosion!).

Typically such a system will be composed of many small units, one of which will be a buffer for messages received by the operator's console. The communications system places messages in the buffer so that communications links can be freed for further messages while the console software is processing the message. The console software extracts messages from the buffer and updates appropriate parts of the display system. Obviously we want to sort messages on their priority so that we can ensure that the alarms are processed immediately and not delayed behind a few thousand routine messages while the plant is about to explode.

As we have seen, we could use a tree structure - which generally provides O(log n) performance for both insertion and deletion. Unfortunately, if the tree becomes unbalanced, performance will degrade to O(n) in pathological cases. This will probably not be acceptable when dealing with dangerous industrial processes, nuclear reactors, flight control systems and other life-critical systems.

Aside

The great majority of computer systems would fall into the broad class of information systems - which simply store and process information for the benefit of people who make decisions based on that information. Obviously, in such systems, it usually doesn't matter whether it takes 1 or 100 seconds to retrieve a piece of data - this simply determines whether you take your coffee break now or later. However, as we'll see, using the best known algorithms is usually easy and straightforward: if they're not already coded in libraries, they're in textbooks. You don't even have to work out how to code them! In such cases, it's just your reputation that's going to suffer if someone (who has studied his or her algorithms text!) comes along later and says "Why on earth did X (you!) use this O(n^2) method - there's a well known O(n) one!"

Of course, hardware manufacturers are very happy if you use inefficient algorithms - it drives the demand for new, faster hardware - and keeps their profits high!

There is a structure which will provide guaranteed O(log n) performance for both insertion and deletion: it's called a heap.

6.2 Heaps

Heaps are based on the notion of a complete tree, for which we gave an informal definition earlier. Formally:

A binary tree is completely full if it is of height, h, and has 2^(h+1) - 1 nodes. A binary tree of height, h, is complete iff
1. it is empty or
2. its left subtree is complete of height h-1 and its right subtree is completely full of height h-2 or
3. its left subtree is completely full of height h-1 and its right subtree is complete of height h-1.

A complete tree is filled from the left:
1. all the leaves are on the same level or two adjacent ones and
2. all nodes at the lowest level are as far to the left as possible.

6.2.1 Heap Properties

A binary tree has the heap property iff
1. it is empty or
2. the key in the root is at least as large as that in either child and both subtrees have the heap property.

Extracting the highest priority item

A heap can be used as a priority queue: the highest priority item is at the root and is trivially extracted. But if the root is deleted, we are left with two sub-trees and we must efficiently re-create a single tree with the heap property. The value of the heap structure is that we can both extract the highest priority item and insert a new one in O(log n) time.

How do we do this?

Let's start with this heap. Item T has the highest priority: the heap property guarantees that it's at the root of the tree. Remove the T at the root.

[Figure: the initial heap, with T at the root.]

To work out how we're going to maintain the heap property, use the fact that a complete tree is filled from the left. So the position which must become empty is the one occupied by the M. Put it in the vacant root position.

[Figure: the heap after T has been removed and M has been moved into the root position.]

This has violated the condition that the root must be greater than each of its children. So interchange the M with the larger of its children.

[Figure: the heap after M has been interchanged with the larger of its children.]

The left subtree has now lost the heap property. So again interchange the M with the larger of its children.

The tree is now a heap again, so we're finished.

[Figure: the final heap, after the second interchange.]

We need to make at most h interchanges of a root of a subtree with one of its children to fully restore the heap property. Thus deletion from a heap is O(h) or O(log n).

6.2.2 Addition to a heap

To add an item to a heap, we follow the reverse procedure. Place it in the next leaf position and move it up. Again, we require O(h) or O(log n) exchanges.

[Figure: a new item, X, is placed in the next leaf position and moved up the tree by successive exchanges with its parent.]

Storage of complete trees

The properties of a complete tree lead to a very efficient storage mechanism using n sequential locations in an array.

If we number the nodes from 1 at the root and place:
- the left child of node k at position 2k
- the right child of node k at position 2k + 1.

Then the 'fill from the left' nature of the complete tree ensures that the heap can be stored in consecutive locations in an array.

[Figure: a complete binary tree with its nodes numbered from 1 at the root.]

[Figure: the same tree stored in an array with n = 9; the node at index k has its left child at index 2k and its right child at index 2k+1.]

Viewed as an array, we can see that the nth node is always in index position n.
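The numbering above starts at 1, whereas Java arrays start at 0, so the listings in this chapter use left(k) = 2k + 1 rather than 2k. The sketch below places the two conventions side by side; the class name HeapIndex and its method names are invented for this illustration.

```java
// Index arithmetic for a complete binary tree stored in an array.
// HeapIndex and its method names are illustrative.
class HeapIndex {
    // 1-based numbering, as in the text: the root is node 1.
    static int left1(int k)   { return 2 * k; }
    static int right1(int k)  { return 2 * k + 1; }
    static int parent1(int k) { return k / 2; }

    // 0-based numbering, as needed for Java arrays: the root is at index 0.
    static int left0(int k)   { return 2 * k + 1; }
    static int right0(int k)  { return 2 * k + 2; }
    static int parent0(int k) { return (k - 1) / 2; }

    public static void main(String[] args) {
        for (int k = 1; k <= 4; k++)
            System.out.println("node " + k + ": left " + left1(k)
                               + ", right " + right1(k));
    }
}
```

Node 4 in the 1-based numbering has children 8 and 9; the same node sits at index 3 of a Java array, with its children at indices 7 and 8.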

Here is the outline of the implementation of a SimpleCollection with the properties of a heap:

    public class Heap implements SimpleCollection {
        Comparable t[];
        int size, n;

        /* ... Constructor, etc ... */

        private void move_down( int j ) {
            /* ... see move_down listing for details ... */
        }

        /* Extract the highest priority item from the heap. */
        public Comparable extract() {
            if ( n <= 0 ) return null;
            Comparable x = t[0];   // Highest priority item
            t[0] = t[n-1];         // Move 'last' item to replace it
            n--;                   // Decrement item count
            move_down( 0 );        // Move 'last' item down to its correct place
            return x;
        }
    }

The procedure for extracting the highest priority item from a heap is, naturally, recursive. Once we've extracted the root (highest priority) item and swapped the last item into its place, we simply apply move_down to successively smaller subtrees until we get to the bottom of the tree. (The move_down listing expresses this recursion as a loop.)

    /** Collection with Priority Queue capabilities **/
    public class Heap implements SimpleCollection {
        Comparable t[];
        int size, n;

        static int left( int k ) { return 2*k + 1; }

        /* Move the entry at j down to its correct place - called
           after a deletion when the last element is moved (temporarily)
           to the first position and needs to be moved down to its
           correct position */
        private void move_down( int j ) {
            int lc, rc, large_index;
            while( (lc = left(j)) < n ) {
                rc = lc + 1;
                // Find the larger child
                Comparable larger_child = t[lc];
                large_index = lc;
                if( rc < n ) { // Make sure that there is a right child!
                    if ( larger_child.compareTo( t[rc] ) < 0 ) {
                        larger_child = t[rc];
                        large_index = rc;
                    }
                }
                // larger_child has a reference to the larger child
                // large_index is the left or right index, depending on whether
                // left or right is larger
                if ( t[j].compareTo( larger_child ) < 0 ) { // Swap needed
                    Comparable tmp = t[j];
                    t[j] = t[large_index]; t[large_index] = tmp;
                    // Continue with the branch containing the larger value
                    j = large_index; continue;
                }
                else break; // No swaps, finished
            }
        }

        public Comparable extract() {
            // See previous listing for details
        }
    }

Note the static function, left(int k), which simply encodes the relation between the index of a node and its left child. The right child of node k is at left(k)+1.

Inserting into a heap follows a similar strategy, except that we use a move_up function to move the newly added item to its correct place. Here is the Heap class with the add method added:

    /** Collection with Priority Queue capabilities **/
    public class Heap implements SimpleCollection {
        Comparable t[];
        int size, n;

        private void move_up( int j ) {
            // See next listing for detail
        }

        /** Add an item to the heap. Returns true if the addition was
            successful, false if it failed. Failure will occur if the heap
            is full. **/
        public boolean add( Object data ) {
            if ( n < size ) {
                // Add new item at the end of the complete tree
                t[n] = (Comparable)data;
                n++;
                // Move it up to the correct place (if necessary)
                move_up( n-1 );
                return true;
            }
            else {
                System.out.println("add: Heap full: " + n + " items");
            }
            return false;
        }
    }

The implementation of the move_up function is straightforward: it uses parent(k), the inverse of the left(k) function, to find the parent of the current node and checks it:

    /** Collection with Priority Queue capabilities **/
    public class Heap implements SimpleCollection {
        Comparable t[];
        int size, n;

        static int parent( int k ) { return (k-1)/2; }

        /* Move the entry at j up to its correct place - called
           after an insertion to move the last element up to its
           correct position */
        private void move_up( int j ) {
            int par;
            while( j > 0 ) {   // The root (j == 0) has no parent
                par = parent(j);
                if ( t[j].compareTo( t[par] ) > 0 ) {
                    Comparable tmp = t[j];
                    t[j] = t[par]; t[par] = tmp;
                    j = par;
                }
                else break;
            }
        }

        public boolean add( Object data ) {
            if ( n < size ) {
                // Add new item at the end of the complete tree
                t[n] = (Comparable)data;
                n++;
                // Move it up to the correct place (if necessary)
                move_up( n-1 );
                return true;
            }
            else {
                System.out.println("add: Heap full: " + n + " items");
            }
            return false;
        }
    }

Heaps provide us with a method of sorting, known as heapsort. However, we will examine and analyse some simpler sorting methods first.

Animation - Heap Insertion

In the animation, note that both the array representation (used in the implementation of the algorithm) and the (logical) tree representation are shown. This is to demonstrate how the tree is restructured to make a heap again after every insertion or deletion.

Key Terms

FIFO queue
    A queue in which the first item added is always the first one out.
LIFO queue
    A queue in which the item most recently added is always the first one out.
Priority queue
    A queue in which the items are sorted so that the highest priority item is always the next one to be extracted.
Life critical systems
    Systems on which we depend for safety and which may result in death or injury if they fail: medical monitoring, industrial plant monitoring and control and aircraft control systems are examples of life critical systems.
Real time systems
    Systems in which time is a constraint. A system which must respond to some event (e.g. the change in attitude of an aircraft caused by some atmospheric event like wind-shear) within a fixed time to maintain stability or continue correct operation (e.g. the aircraft systems must make the necessary adjustments to the control surfaces before the aircraft falls out of the sky!).
Complete Tree
    A balanced tree in which the distance from the root to any leaf is either h or h-1.


Chapter 7
Sorting

Sorting is one of the most important operations performed by computers. In the days of magnetic tape storage - before modern disc-based databases - it was almost certainly the most common operation performed by computers as most "database" updating was done by sorting transactions and merging them with a master file. Although modern databases tend to maintain indices in order by inserting keys into the correct place (which can be done in O(log n) time, see Section 9.1), it's still important for presentation of data extracted from databases: most people prefer to have the flexibility of sorting the individual records into some relevant order (not necessarily the order of the keys!) before wading through pages of data. Some important algorithms, e.g. Dijkstra's algorithm (Section 11), also contain sorting steps.

7.1 Bubble, Selection, Insertion Sorts

There are a large number of variations of one basic strategy for sorting. It's the same strategy that you use for sorting your bridge hand. You pick up a card, start at the beginning of your hand and find the place to insert the new card, insert it and move all the others up one place.

We could make a sortable version of the ACollection class, which we built in Section 3.2, by adding a sort method:

    public class SortableCollection implements SimpleCollection {
        private Comparable a[];
        private int count = 0;

        // Constructor, other SimpleCollection methods ..

        public void sort( ) {
            // The first item, a[0], is considered sorted ...
            for(int i=1;i<count;i++) {
                /* Select the item at the beginning of the
                   as yet unsorted section */
                Comparable v = a[i];
                /* Work back through the array, finding where v
                   should go */
                int j = i;
                /* If this element is greater than v,
                   move it up one */
                while ( j > 0 && v.compareTo( a[j-1] ) < 0 ) {
                    a[j] = a[j-1]; j = j-1;
                }
                /* Stopped when j == 0 or a[j-1] <= v, put v at position j */
                a[j] = v;
            }
        }
    }

Note that, since we now need some way of defining the order of objects, we've required that objects in the collection be comparable, i.e. implement the Comparable interface, whose only method is compareTo.

Animation - Insertion Sort

The animation illustrates an insertion sort: note how the pointer moves the items already sorted up to make way for the new item.

7.1.1 Bubble Sort

Another variant of this procedure, called bubble sort, is commonly taught. The sort method in the previous example could be replaced with:

    /* Bubble sort for Comparable items */
    private void swap( int i, int j ) {
        // Swap the array elements themselves: exchanging two
        // parameter references would have no effect on the array
        Comparable t = a[i]; a[i] = a[j]; a[j] = t;
    }

    void sort( ) {
        int n = count;
        /* Make n passes through the array */
        for(int i=0;i<n;i++) {
            /* From the first element to the end
               of the unsorted section */
            for(int j=1;j<(n-i);j++) {
                /* If adjacent items are out of order, swap them */
                if( a[j-1].compareTo(a[j]) > 0 ) swap(j-1,j);
            }
        }
    }

7.1.2 Analysis

Both of these algorithms require n-1 passes: each pass places one item in its correct place. (The nth is then in the correct place also.) The ith pass makes either i or n-i comparisons and moves. So:

    $T(n) = 1 + 2 + \ldots + (n-1) = \sum_{i=1}^{n-1} i = \frac{n}{2}(n-1)$

or O(n^2) - but we already know we can use heaps to get an O(n log n) algorithm. Thus these algorithms are only suitable for small problems where their simple code makes them faster than the more complex code of the O(n log n) algorithm. As a rule of thumb, expect to find an O(n log n) algorithm faster for n > 10 - but the exact value depends very much on individual machines! These simple sorts can also be used to squeeze a little bit more performance out of fast sort algorithms - see section 7.3.3, where enhanced quick sorting algorithms are discussed.

7.2 Heap Sort

We noted earlier, when discussing heaps, that, as well as their use in priority queues, they provide a means of sorting:
1. construct a heap,
2. add each item to it (maintaining the heap property!),
3. when all items have been added, remove them one by one (restoring the heap property as each one is removed).
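The three steps above can be sketched with the heap that ships with the Java library, java.util.PriorityQueue, standing in for the Heap class of Chapter 6. This is an assumption made to keep the example self-contained; note also that PriorityQueue keeps the smallest item at the root, so items come out in ascending order. The class name HeapSortSketch is illustrative.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

// A sketch of heap sort using the library heap rather than the text's Heap class.
class HeapSortSketch {
    // Sorts a into ascending order.
    static void heapSort(int[] a) {
        // 1. Construct a heap (PriorityQueue keeps the smallest item at the root).
        PriorityQueue<Integer> heap = new PriorityQueue<Integer>();
        // 2. Add each item; the heap property is restored after every add.
        for (int x : a) heap.add(x);
        // 3. Remove the items one by one; poll() returns the smallest remaining.
        for (int i = 0; i < a.length; i++) a[i] = heap.poll();
    }

    public static void main(String[] args) {
        int[] a = { 36, 9, 0, 25, 1, 49, 64, 16, 81, 4 };
        heapSort(a);
        System.out.println(Arrays.toString(a));
    }
}
```

Unlike the in-place sorts of Section 7.1, this version copies every item into a separate heap, so it needs O(n) additional space.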

Addition and deletion are both O(log n) operations. We need to perform n additions and deletions, leading to an O(n log n) algorithm. We will look at another efficient sorting algorithm, quick sort (Section 7.3), and then compare it with heap sort.

Animation - Heap Sort

The following animation uses a slight modification of the above approach to sort directly using a heap. You will note that it places all the items into the array first, then takes items at the bottom of the heap and restores the heap property, rather than restoring the heap property as each item is entered as the algorithm above suggests. (This approach is described more fully in Cormen et al.) Note that the animation shows the data stored in an array (as it is in the implementation of the algorithm) and also in the tree form - so that the heap structure can be clearly seen. Both representations are, of course, equivalent.

7.3 Quick Sort

Quick sort is a very efficient sorting algorithm invented by C.A.R. Hoare. It has two phases:
- the partition phase and
- the sort phase.

As we will see, most of the work is done in the partition phase - it works out where to divide the work. The sort phase simply sorts the two smaller problems that are generated in the partition phase.

This makes quick sort a good example of the divide and conquer strategy for solving problems. (You've already seen an example of this approach in the binary search procedure.) In quick sort, we divide the array of items to be sorted into two partitions and then call the quick sort procedure recursively to sort the two partitions, i.e. we divide the problem into two smaller ones and conquer by solving the smaller ones. Again, we recode the sort method of the SortableCollection class. The conquer part of the quick sort routine looks like this:

    public class SortableCollection implements SimpleCollection {
        private Comparable a[];
        int count;

        private void quicksort( int low, int high ) {
            /* Termination condition! */
            if ( high > low ) {
                int pivot = partition( low, high );
                quicksort( low, pivot-1 );
                quicksort( pivot+1, high );
            }
        }

[Figure: quick sort. Initial step: partition the array a between low and high into the items < pivot, the pivot itself, and the items > pivot. Then (a) sort the left partition in the same way and (b) sort the right partition in the same way.]

For the strategy to be effective, the partition phase must ensure that all the items in one part (the lower part) are less than all those in the other (upper) part. To do this, we choose a pivot element and arrange that all the items in the lower part are less than the pivot and all those in the upper part greater than it. In the most general case, we don't know anything about the items to be sorted, so that any choice of the pivot element will do - the first element is a convenient one.

Animation - Quick sort

As an illustration of this idea, you can view this animation, which shows a partition algorithm in which items to be sorted are copied from the original array to a new one: items smaller than the pivot are placed to the left of the new array and items greater than the pivot are placed on the right. In the final step, the pivot is dropped into the remaining slot in the middle. Observe that the animation uses two arrays for the items being sorted: thus it requires O(n) additional space to operate. However, it's possible to partition the array in place. The next page shows a conventional implementation of the partition phase which swaps elements in the same array and thus avoids using extra space.

7.3.1 Partition in place

Most implementations of quick sort make use of the fact that you can partition in place by keeping two pointers: one moving in from the left and a second moving in from the right. They are moved towards the centre until the left pointer finds an element greater than the pivot and the right one finds an element less than the pivot.

[Figure: an array being partitioned in place, showing the pivot and the left and right pointers.]

These two elements are then swapped. The pointers are then moved inward again until they "cross over". The pivot is then swapped into the slot to which the right pointer points and the partition is complete. Here is an implementation of the internal (private) method, partition:

    private void swap( int low, int high ) {
        Comparable x = a[low];
        a[low] = a[high];
        a[high] = x;
    }

    private int partition( int low, int high ) {
        int left, right;
        Comparable pivot_item = a[low];
        left = low;
        right = high;
        while ( left < right ) {
            /* Move left while item <= pivot (without running past high) */
            while( left <= high && a[left].compareTo( pivot_item ) <= 0 ) left++;
            /* Move right while item > pivot */
            while( a[right].compareTo( pivot_item ) > 0 ) right--;
            if ( left < right ) swap( left, right );
        }
        /* right is final position for the pivot */
        a[low] = a[right];
        a[right] = pivot_item;
        return right;
    }

Note the check that left does not exceed the upper bound of the partition: without it, the left pointer runs off the end of the array whenever the pivot is the largest item. The right pointer needs no such check, because the pivot itself, at a[low], stops it. partition ensures that all items less than the pivot precede it and returns the position of the pivot. This meets our condition for dividing the problem: all the items in the lower half are known to be less than or equal to the pivot and all items in the upper half are known to be greater than it.

7.3.2 Analysis

The partition routine examines every item in the array at most once, so it is clearly O(n). Usually, the partition routine will divide the problem into two roughly equal sized partitions. We know that we can divide n items in half log_2 n times. This makes quick sort an O(n log n) algorithm - equivalent to heap sort.

Animation - Quick sort

This animation shows quick sort operating with an "in place" partition algorithm. However, we have made an unjustified assumption - see if you can identify it before you continue.

7.3.3 Quick sort - The Facts!

Quick sort is generally the best of the known sorting algorithms, but it has a serious limitation, which must be understood before using it in certain applications. What happens if we apply the sort method on the previous page to an already sorted array? This is certainly a case where we'd expect the performance to

be quite good! However, the first attempt to partition the problem into two problems will return an empty lower partition - the first element is the smallest. Thus the first partition call simply chops off one element and calls sort for a partition with n-1 items! This happens a further n-2 times! Each partition call still requires O(n) operations - and we have generated O(n) such calls. In the worst case, quick sort is an O(n^2) algorithm! Can we do anything about this?

A number of variations to the simple quick sort will generally produce better results: rather than choose the first item as the pivot, some other strategies work better.

Median-of-3 Pivot

For example, the median-of-3 pivot approach selects three candidate pivots and uses the median one. If the three pivots are chosen from the first, middle and last positions, then it is easy to see that for the already sorted array, this will produce an optimum result: each partition will be exactly half (to within one element) of the problem and we will need only $\lfloor \log_2 n \rfloor$ levels of recursive calls.

Random pivot

Some quick sorts simply use a randomly chosen pivot. This also works fine for sorted arrays - on average the pivot will produce two equal sized partitions and there will be O(log n) levels of them. However, whatever strategy we use for choosing the pivot, it is possible to find a pathological case in which the problem is not divided equally at any partition stage. Thus quick sort must always be treated as potentially O(n^2)!

Why bother with quick sort then? Heap sort is always O(n log n): why not just use it? Empirical studies show that generally quick sort is considerably faster than heap sort.
The following counts of compare and exchange operations were made for three different sorting algorithms running on the same data:

               Quick               Heap               Insert
      n    Compare   Swap     Compare   Swap     Compare    Swap
    100        712    148       2,842    581       2,595     899
    200      1,682    328       9,736  1,366      10,307   3,503
    500      5,102    919      53,113  4,042      62,746  21,083

Thus, when an occasional "blowout" to O(n^2) is tolerable, we can expect that, on average, quick sort will provide considerably better performance - especially if one of the modified pivot choice procedures is used. Most commercial applications would use quick sort for its better average performance: they can tolerate an occasional long run (which just means that a report takes slightly longer to produce on full moon days in leap years) in return for shorter runs most of the time. However, quick sort should never be used in applications which require a guarantee of response time, unless it is treated as an O(n^2) algorithm in calculating the worst-case response time. If you have to assume O(n^2) time, then - if n is small, you're better off using insertion sort - which has simpler code and therefore smaller constant factors. And if n is large, you should obviously be using heap sort, for its guaranteed O(n log n) time. Life-critical (medical monitoring, life support in aircraft and spacecraft) and mission-critical (monitoring

and control in industrial and research plants handling dangerous materials, control for aircraft, defence, etc.) software will generally have a response time as part of the system specifications. In all such systems, it is not acceptable to design based on average performance: you must always allow for the worst case, and thus treat quicksort as O(n^2).

So far, our best sorting algorithm has O(n log n) performance: can we do any better? In general, the answer is no. However, if we know something about the items to be sorted, then we may be able to do better. But first, we should look at squeezing the last drop of performance out of quick sort.

A Quicker Quick Sort

Two things can be done to eke a little more performance out of your processor when sorting:

1. Quick sort - in its usual recursive form - has a reasonably high constant factor relative to a simpler sort such as insertion sort. Thus, when the partitions become small (n of about 10 or less), a switch to insertion sort for the small partition will usually cause a measurable speed-up. (The point at which it becomes effective to switch to the insertion sort is extremely sensitive to architectural features and needs to be determined for any target processor: although a value of about 10 is a reasonable guess!)

2. Recursive algorithms normally suffer from a small (hidden) overhead associated with the function calls and management of stack frames. This can often be reduced by re-coding the algorithm in an iterative form.
The iterative version will perform a smaller number of memory allocations and copy fewer parameters, saving a measurable amount of time at the expense of simplicity and elegance in the code!

Exercise:

Write the whole algorithm in an iterative form.

For a quick sort that uses the first element in each partition as a pivot, which sorts faster: data which is already sorted or data which is sorted in reverse order?

Key Terms

Bubble, Insertion, Selection Sorts
    Simple sorting algorithms with O(n^2) complexity - suitable for sorting small numbers of items only.
Divide and Conquer Algorithms
    Algorithms that solve (conquer) problems by dividing them into smaller sub-problems until the problem is so small that it is trivially solved.
in place
    In place sorting algorithms don't require additional temporary space to store elements as they sort; they use the space originally occupied by the elements.
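Two of the refinements discussed in Section 7.3.3, the median-of-3 pivot and the switch to insertion sort for small partitions, can be combined as in the following sketch. It sorts a plain int array rather than the Comparable items of SortableCollection, and the class name TunedQuickSort, the helper medianOf3 and the CUTOFF value of 10 are all assumptions made for illustration; the best cutoff has to be measured on the target machine.

```java
// A sketch of quick sort with a median-of-3 pivot and an insertion sort
// cutoff. All names and the CUTOFF value are illustrative assumptions.
class TunedQuickSort {
    static final int CUTOFF = 10;   // assumed threshold; tune per machine

    static void sort(int[] a) { quicksort(a, 0, a.length - 1); }

    private static void quicksort(int[] a, int low, int high) {
        if (high - low < CUTOFF) {          // small partition: insertion sort
            insertionSort(a, low, high);
            return;
        }
        int p = partition(a, low, high);
        quicksort(a, low, p - 1);
        quicksort(a, p + 1, high);
    }

    private static void insertionSort(int[] a, int low, int high) {
        for (int i = low + 1; i <= high; i++) {
            int v = a[i];
            int j = i;
            while (j > low && a[j - 1] > v) { a[j] = a[j - 1]; j--; }
            a[j] = v;
        }
    }

    // Returns the index (low, middle or high) that holds the median value.
    static int medianOf3(int[] a, int low, int high) {
        int mid = low + (high - low) / 2;
        int x = a[low], y = a[mid], z = a[high];
        if ((x <= y && y <= z) || (z <= y && y <= x)) return mid;
        if ((y <= x && x <= z) || (z <= x && x <= y)) return low;
        return high;
    }

    private static int partition(int[] a, int low, int high) {
        swap(a, low, medianOf3(a, low, high));  // move the chosen pivot to a[low]
        int pivot = a[low];
        int left = low, right = high;
        while (left < right) {
            while (left <= high && a[left] <= pivot) left++;   // stay in bounds
            while (a[right] > pivot) right--;                  // a[low] stops this
            if (left < right) swap(a, left, right);
        }
        a[low] = a[right];      // right is the final position for the pivot
        a[right] = pivot;
        return right;
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }
}
```

On an already sorted array the median-of-3 choice picks the middle element, so each partition splits the problem roughly in half instead of chopping off one element at a time. The worst case is still O(n^2) for suitably constructed input.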

Chapter 8
Bin Sort

8.1 Bin Sort

Assume that the keys of the items that we wish to sort lie in a small fixed range and that there is only one item with each value of the key. Then we can sort with the following procedure:
- Set up an array of `bins' - one for each value of the key - in order,
- Examine each item and use the value of the key to place it in the appropriate bin,
- Copy all the items from the bins back into an array.

Now our collection is sorted and it only took n operations, so this is an O(n) operation. However, note that it will only work under very restricted conditions.

8.1.1 Constraints on bin sort

To understand these restrictions, let's be a little more precise about the specification of the problem and assume that there are m values of the key. To recover our sorted collection, we need to examine each bin. This means that the third step of the algorithm above must examine each bin to see whether there's an item in it, which requires m operations. So the algorithm's time becomes:

    $T(n) = c_1 n + c_2 m$

and it is strictly O(n + m). Now if m << n, this is clearly O(n). However if m >> n, then it is O(m). For example, if we wish to sort 10^4 32-bit integers, then m = 2^32 and we need 2^32 operations (and a rather large memory!). For n = 10^4:

    $n \log n \approx 10^4 \times 13 \approx 2^{13} \times 2^4 = 2^{17}$

So quick sort or heap sort would clearly be preferred. An implementation of bin sort might look like:

    public class SmallIntegerCollection ... {
        static final int EMPTY = -1; // Some convenient flag
        private int max_value;
        int a[], bins[], n;

        public SmallIntegerCollection( int max_size, int max_value ) {
            a = new int[ max_size ];
            bins = new int[ max_value + 1 ];
            this.max_value = max_value;
            n = 0;
        }

        void sort( ) {
            int i;
            /* Pre-condition: for 0<=i<n : 0 <= a[i] <= max_value */
            /* Mark all the bins empty */
            for(i=0;i<=max_value;i++) bins[i] = EMPTY;
            /* Place each item in its bin */
            for(i=0;i<n;i++)
                bins[ a[i] ] = a[i];
            /* Copy the items from the occupied bins back into a */
            int j = 0;
            for(i=0;i<=max_value;i++)
                if ( bins[i] != EMPTY ) a[j++] = bins[i];
        }
        ...
    }

If there are duplicates, then each bin can be replaced by a linked list. The third step then becomes:
- Link all the lists into one list.

We can add an item to a linked list in O(1) time. There are n items requiring O(n) time. Linking a list to another list simply involves making the tail of one list point to the other, so it is O(1). Linking m such lists obviously takes O(m) time, so the algorithm is still O(n + m). In contrast to the other sorts, which sort in place and don't require additional memory, bin sort requires additional memory for the bins and is a good example of trading space for performance.

Although memory tends to be cheap in modern processors - so that we would normally use memory rather profligately to obtain performance - memory consumes power and in some circumstances, e.g. computers in spacecraft, power might be a higher constraint than performance. Having highlighted this constraint, there is a version of bin sort which can sort in place:

    void sort( int a[], int n ) {
        /* Pre-condition: a holds n distinct keys, 0 <= a[i] < n */
        for( int i=0; i<n; i++ )
            /* Repeat until position i holds key i */
            while ( a[i] != i ) {
                int tmp = a[i];   // swap a[i] into its own bin, a[tmp]
                a[i] = a[tmp];
                a[tmp] = tmp;
            }
    }

However, this assumes that there are n distinct keys in the range 0..n-1. In addition to this restriction, the swap operation is relatively expensive, so that this version trades space for time. The bin sorting strategy may appear rather limited, but it can be generalised into a strategy known as Radix sorting.

Animation - Bin Sort

This animation shows a bin sort operating with ten bins into which it drops numbered balls. You can choose a set of randomly ordered data and also data which is already sorted - both in ascending and descending order. The O(n) time complexity should be clear from observing the animation: each ball is "touched" only twice - once when it is placed in its bin and once when it is extracted from the bin again. Sample (Java) source code may be viewed by selecting Source Code from the View menu.

8.2 Radix Sorting

The bin sorting approach can be generalised in a technique that is known as radix sorting. An example:

    Assume that we have n integers in the range [0, n^2) to be sorted. (For a bin sort, m = n^2, and we would have an O(n + m) = O(n^2) algorithm.) Sort them in two phases:
    - Using n bins, place a_i into bin a_i mod n,
    - Repeat the process using n bins, placing a_i into bin $\lfloor a_i/n \rfloor$, being careful to append to the end of each bin.
    This results in a sorted list.

As an example, consider the list of integers:

    36 9 0 25 1 49 64 16 81 4

n is 10 and the numbers all lie in [0, 99]. After the first phase, we will have:

    Bin      0   1      2   3   4      5    6       7   8   9
    Content  0   1, 81  -   -   64, 4  25   36, 16  -   -   9, 49

Note that in this phase, we placed each item in a bin indexed by the least significant decimal digit. Repeating the process will produce:

    Bin      0           1    2    3    4    5   6    7   8    9
    Content  0, 1, 4, 9  16   25   36   49   -   64   -   81   -

In this second phase, we used the leading decimal digit to allocate items to bins, being careful to add each item to the end of the bin. We can apply this process to numbers of any size expressed to any suitable base or radix.
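The two-phase example above can be reproduced with a short sketch. The class name TwoPhaseRadix and its methods are invented for this illustration, and each bin is a java.util.ArrayList so that items are appended to the end of the bin, as the procedure requires.

```java
import java.util.ArrayList;
import java.util.List;

// A sketch of the two-phase radix sort for keys in [0, n*n), using n bins.
// TwoPhaseRadix and its method names are illustrative.
class TwoPhaseRadix {
    // One phase: distribute the items into n bins, then concatenate the bins.
    // digit 0 selects the bin a mod n; digit 1 selects the bin floor(a / n).
    static int[] phase(int[] a, int n, int digit) {
        List<List<Integer>> bins = new ArrayList<List<Integer>>();
        for (int b = 0; b < n; b++) bins.add(new ArrayList<Integer>());
        for (int x : a) {
            int b = (digit == 0) ? x % n : x / n;
            bins.get(b).add(x);          // append to the END of the bin
        }
        int[] out = new int[a.length];
        int k = 0;
        for (List<Integer> bin : bins)
            for (int x : bin) out[k++] = x;
        return out;
    }

    // Least significant digit first, then the leading digit.
    static int[] sort(int[] a, int n) {
        return phase(phase(a, n, 0), n, 1);
    }
}
```

Calling phase on the list 36 9 0 25 1 49 64 16 81 4 with n = 10 and digit 0 gives 0 1 81 64 4 25 36 16 9 49, the first table read bin by bin. Appending to the end of each bin is what makes each phase stable, so the order established by the first phase survives within each bin of the second.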

8.2.1 Generalised Radix Sorting

We can further observe that it's not necessary to use the same radix in each phase.

Suppose that the sorting key is a sequence of fields, each with bounded ranges, e.g. the key is a date with these attributes:

    public class date {
        int day;
        int month;
        int year;
        ...
    }

If the ranges for day and month are limited in the obvious way and the range for year is suitably constrained, e.g. 1900 < year <= 2000, then we can apply the same procedure except that we'll employ a different number of bins in each phase. In all cases, we'll sort first using the least significant "digit" (where "digit" here means a field with a limited range), then sort using the next significant "digit", placing each item after all the items already in the bin, and so on.

Assume that the key of the item to be sorted has k fields, f_i, i = 0..k-1, and that each f_i has s_i discrete values, then a generalised radix sort procedure can be written:

    radixsort( A, n ) {
        for(i=0;i<k;i++) {
            for(j=0;j<s_i;j++) bin[j] = EMPTY;                        O(s_i)
            for(j=0;j<n;j++) {
                move A_j to the end of bin[A_j->f_i]
            }                                                          O(n)
            for(j=0;j<s_i;j++) concatenate bin[j] onto the end of A;  O(s_i)
        }
    }

    Total: $\sum_{i=0}^{k-1} O(s_i + n) = O(kn + \sum_{i=0}^{k-1} s_i) = O(n + \sum_{i=0}^{k-1} s_i)$

Now if, for example, the keys are integers in [0, b^k - 1], for some constant k, then the keys can be viewed as k-digit base-b integers. Thus, s_i = b for all i and the time complexity becomes O(n + kb) or O(n). This result depends on k being constant.

If k is allowed to increase with n, then we have a different picture. For example, it takes log_2 n binary digits to represent an integer < n. If the key length were allowed to increase with n, so that k = log n, then we would have:

    $\sum_{i=0}^{k-1} O(s_i + n) = O(n \log n + \sum_{i=0}^{\log n - 1} 2) = O(n \log n)$

Another way of looking at this is to note that if the range of the key is restricted to [0, b^k - 1], then we will be able to use the radix sort approach effectively if

8.2. RADIX SORTING 75we allow dupli ate keys when n > bk. However, if we need to have unique keys,then k must in rease to at least logb n. Thus, as n in reases, we need to havelogn phases, ea h taking O(n) time, and the radix sort is the same as qui ksort![Sample odeThis sample ode sorts arrays of integers on various radi es: the number of bitsused for ea h radix an be set with the all to SetRadi es. The Bins lass isused in ea h phase to olle t the items as they are sorted. ConsBins is alled toset up a set of bins: ea h bin must be large enough to a ommodate the wholearray, so RadixSort an be very expensive in its memory usage! Sample odeRadixSort.h RadixSort. Bins.h Bins. ℄Animation - Radix SortThis animation demonstrates the sorting of a set of numbers with 4 de imaldigits. Ea h phase sorts on one of the digits: so the sort requires 4 phases with10 bins in ea h phase.Exer ise:How many times is an obje t ompared with another one in� radix sort� qui k sort� bubble sort (worst ase)� bubble sort (best ase)How many times is an obje t 'examined' (i.e. a bin index is al ulated orit is ompared to another obje t) in� radix sort� qui k sort - for an item hosen as the pivot� qui k sort - for an item whi h is never a pivot� merge sortWhi h sorts are a�e ted by the order in whi h the data is supplied to them?(For example, will they be qui ker if the data is supplied in sorted, or almostsorted, order?) If the answer is no, an you make a simple modi� ation tothe basi algorithm to make a di�eren e. (All algorithms an be a�e ted bya pre-pro essor pass to he k whether the data is already in order, so ignorethis modi� ation!)


Chapter 9
Searching Revisited

9.1 Red-black Trees
A red-black tree is a binary search tree with one extra attribute for each node: the colour, which is either red or black. We also need to keep track of the parent of each node, so a red-black tree's node structure is:

    public class RBTree implements SimpleCollection {
        class RBNode {
            boolean isRed;
            Comparable item;
            RBNode left, right, parent;
        }
        private RBNode root;
        ...
    }

[Figure: A basic red-black tree, with keys 11, 2, 1, 7, 5, 8, 14 and 15.]

For the purpose of this discussion, the sentinel nodes which terminate the tree are considered to be the leaves and are coloured black.

9.1.1 Definition of a red-black tree
A red-black tree is a binary search tree which has the following red-black properties:
1. Every node is either red or black.
2. Every leaf (sentinel) is black.
3. If a node is red, then both its children are black.
4. Every simple path from a node to a descendant leaf contains the same number of black nodes.
Property 3 implies that on any path from the root to a leaf, red nodes must not be adjacent. However, any number of black nodes may appear in a sequence.
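The four properties can be checked mechanically. The following is a minimal sketch, assuming a simplified node class (RBCheckSketch.Node is hypothetical and is not the RBNode of the text) in which a null reference stands in for a black sentinel leaf. It checks properties 3 and 4 only; properties 1 and 2 hold by construction.

```java
class RBCheckSketch {
    static class Node {
        final int key;
        final boolean isRed;
        final Node left, right;
        Node(int key, boolean isRed, Node left, Node right) {
            this.key = key; this.isRed = isRed; this.left = left; this.right = right;
        }
    }

    static boolean isRed(Node n) { return n != null && n.isRed; }

    /**
     * Returns the number of black nodes on every path from n down to a
     * sentinel (counting n and the sentinel), or -1 if property 3 or 4 fails.
     */
    static int check(Node n) {
        if (n == null) return 1;                        // black sentinel leaf
        if (n.isRed && (isRed(n.left) || isRed(n.right)))
            return -1;                                  // property 3: red node with a red child
        int l = check(n.left);
        int r = check(n.right);
        if (l < 0 || r < 0 || l != r) return -1;        // property 4: unequal black counts
        return l + (n.isRed ? 0 : 1);
    }

    static boolean isRedBlack(Node root) { return check(root) > 0; }
}
```

A checker of this kind is convenient in tests of insertion and deletion code, since it can be run after every operation.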

[Figure: Basic red-black tree with the sentinel nodes added. Implementations of the red-black tree algorithms will usually include the sentinel nodes as a convenient means of flagging that you have reached a leaf node. They are the black nodes of property 2.]

The number of black nodes on any path from, but not including, a node x to a leaf is called the black-height of a node, denoted bh(x). We can prove the following lemma:

Lemma 9.1.1 A red-black tree with n internal nodes has height at most $2 \log(n + 1)$.
Proof (For a proof, see Cormen, p 264)

This demonstrates why the red-black tree is a good search tree: it can always be searched in O(log n) time. As with heaps, additions and deletions from red-black trees destroy the red-black property, so we need to restore it. To do this we need to look at some operations on red-black trees.

9.1.2 Rotations
A rotation is a local operation in a search tree that preserves in-order traversal key ordering. Note that in both trees, an in-order traversal yields:

    A x B y C

[Figure: left_rotate turns the tree rooted at x (left sub-tree A, right child y with sub-trees B and C) into the tree rooted at y (left child x with sub-trees A and B, right sub-tree C); right_rotate is the inverse.]

The rotateLeft operation may be encoded:

    private void rotateLeft( RBNode x ) {
        RBNode y;
        y = x.right;
        /* Turn y's left sub-tree into x's right sub-tree */
        x.right = y.left;
        if ( y.left != null )
            y.left.parent = x;
        /* y's new parent was x's parent */
        y.parent = x.parent;
        /* Set the parent to point to y instead of x */
        /* First see whether we're at the root */
        if ( x.parent == null ) root = y;
        else
            if ( x == (x.parent).left )
                /* x was on the left of its parent */
                x.parent.left = y;
            else
                /* x must have been on the right */
                x.parent.right = y;
        /* Finally, put x on y's left */
        y.left = x;
        x.parent = y;
    }
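The text gives only rotateLeft; rotateRight, which the insertion code also calls, is its mirror image with left and right exchanged. A self-contained sketch follows. The wrapper class RotateSketch and the int item are assumptions made so that the fragment compiles on its own; in the text's RBTree the method would sit beside rotateLeft.

```java
class RotateSketch {
    static class RBNode {
        int item;
        RBNode left, right, parent;
        RBNode(int item) { this.item = item; }
    }

    RBNode root;

    /** Mirror image of rotateLeft: y's left child x becomes the sub-tree root. */
    void rotateRight(RBNode y) {
        RBNode x = y.left;
        /* Turn x's right sub-tree into y's left sub-tree */
        y.left = x.right;
        if (x.right != null)
            x.right.parent = y;
        /* x's new parent was y's parent */
        x.parent = y.parent;
        if (y.parent == null) root = x;           /* y was the root */
        else if (y == y.parent.right)
            y.parent.right = x;                   /* y was on the right of its parent */
        else
            y.parent.left = x;                    /* y was on the left of its parent */
        /* Finally, put y on x's right */
        x.right = y;
        y.parent = x;
    }
}
```

As with rotateLeft, only a constant number of references change, so a rotation is an O(1) operation.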

9.1.3 Insertion
Insertion is somewhat complex and involves a number of cases. Note that we start by inserting the new node, x, in the tree just as we would for any other binary tree, using the addToTree method. This new node is labelled red, and possibly destroys the red-black property. The main loop moves up the tree, restoring the red-black property.

    public class RBTree implements SimpleCollection {
        ...
        private RBNode root;

        private void addRBTree( RBNode x ) {
            RBNode y;
            /* Insert in the tree in the usual way */
            add( root, x );
            /* Now restore the red-black property */
            x.isRed = true;
            while ( (x != root) && (x.parent.isRed) ) {
                if ( x.parent == x.parent.parent.left ) {
                    /* If x's parent is a left, y is x's right uncle */
                    /* (y is never null: leaves are black sentinel nodes) */
                    y = x.parent.parent.right;
                    if ( y.isRed ) {
                        /* case 1 - change the colours */
                        x.parent.isRed = false;
                        y.isRed = false;
                        x.parent.parent.isRed = true;
                        /* Move x up the tree */
                        x = x.parent.parent;
                    }
                    else {
                        /* y is a black node */
                        if ( x == x.parent.right ) {
                            /* and x is to the right */
                            /* case 2 - move x up and rotate */
                            x = x.parent;
                            rotateLeft( x );
                        }
                        /* case 3 */
                        x.parent.isRed = false;
                        x.parent.parent.isRed = true;
                        rotateRight( x.parent.parent );
                    }
                }
                else {
                    /* repeat the "if" part with right and left exchanged */
                }
            }
            /* Colour the root black */
            root.isRed = false;
        }

        public void add( Object y ) {
            RBNode x = new RBNode( y );
            if ( root == null ) root = x;
            else addRBTree( x );
        }
        ...
    }

9.1.4 Red-Black Tree Operation
Here's an example of insertion into a red-black tree (taken from Cormen, p 269).

[Figures: each step below was illustrated by a diagram of the tree with the nodes x and y marked.]

Here's the original tree, with keys 11, 2, 1, 7, 5, 8, 14 and 15. Note that in the following diagrams, the black sentinel nodes have been omitted to keep the diagrams simple.

The tree insert routine has just been called to insert node "4" into the tree. This is no longer a red-black tree - there are two successive red nodes on the path 11 → 2 → 7 → 5 → 4. Mark the new node, x, and its uncle, y. y is red, so we have case 1 ...

Change the colours of nodes 5, 7 and 8 (x's parent, grandparent and uncle).

Move x up to its grandparent, 7. x's parent (2) is still red, so this isn't a red-black tree yet. Mark the uncle, y. In this case, the uncle is black, so we have case 2 ...

Move x up and rotate left.

Still not a red-black tree .. the uncle is black, but x's parent is to the left ..

Change the colours of 7 and 11 and rotate right ..

This is now a red-black tree, so we're finished! O(log n) time!

9.1.5 Deletion
Deletion follows a similar strategy:
1. Delete a node using the same procedure used for deleting from a binary search tree.
2. Restore the red-black tree properties by working up the tree from the point at which a node was excised to be moved up to replace the deleted one.
The code is also similar to the addition code: a binary tree deletion method is called first, followed by execution of the code to restore the red-black property.

Animation - Red-black tree
This animation shows the insertion operation using four history panels - read them from left to right with the most recent step on the right. Before running the animation, select a data set with the select menu. Once a basic tree has been built from one of the supplied data sets, you can add or delete individual nodes using the pop-up dialogue boxes from the Action menu.

9.1.6 Analysis
Examination of the code reveals only one loop. In that loop, the node at the root of the sub-tree whose red-black property we are trying to restore, x, is moved up the tree at least one level in each iteration of the loop. Since the tree originally has O(log n) height, there are O(log n) iterations. The addToTree routine also has O(log n) complexity, so overall the addRBTree routine also has O(log n) complexity.

9.2 AVL Trees
An AVL tree is another balanced binary search tree. Named after their inventors, Adelson-Velskii and Landis, AVL trees were the first dynamically balanced trees to be proposed. Like red-black trees, they are not perfectly balanced, but pairs of sub-trees differ in height by at most 1, maintaining an O(log n) search time. Addition and deletion operations also take O(log n) time.

9.2.1 Definition of an AVL tree
An AVL tree is a binary search tree which has the following properties:
1. The sub-trees of every node differ in height by at most one.
2. Every sub-tree is an AVL tree.

[Figure: Balance requirement for an AVL tree: the left and right sub-trees differ by at most 1 in height (a tree of height h with sub-trees of height h-1 and h-2).]

You need to be careful with this definition: it permits some apparently unbalanced trees! For example, here are some trees:

    Binary Search Tree                          AVL tree?
    [tree with keys 12, 8, 18, 5, 11, 17, 4]    Yes. Examination shows that each left sub-tree has a
                                                height 1 greater than each right sub-tree.
    [the same tree with keys 7 and 2 added]     No. Sub-tree with root 8 has height 4 and sub-tree
                                                with root 18 has height 2.

9.2.2 Insertion
As with the red-black tree, insertion is somewhat complex and involves a number of cases. Implementations of AVL tree insertion may be found in many textbooks: they rely on adding an extra attribute, the balance factor, to each node. This factor indicates whether the tree is left-heavy (the height of the left sub-tree is 1 greater than the right sub-tree), balanced (both sub-trees are the same height) or right-heavy (the height of the right sub-tree is 1 greater than the left sub-tree). If the balance would be destroyed by an insertion, a rotation is performed to correct the balance.

[Figure: A new item has been added to the left sub-tree of node 1, causing its height to become 2 greater than 2's right sub-tree (shown in green). A right-rotation is performed to correct the imbalance.]
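The balance condition and the corrective right-rotation can be illustrated with a short sketch. It is an assumption-laden simplification: the class AvlSketch and its methods are hypothetical, and heights are recomputed recursively on every call rather than being kept as a stored balance factor, which is what a real implementation would do to stay within O(log n).

```java
class AvlSketch {
    static class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    /** Height of a sub-tree; an empty sub-tree has height 0. */
    static int height(Node n) {
        return n == null ? 0 : 1 + Math.max(height(n.left), height(n.right));
    }

    /** Positive when left-heavy, negative when right-heavy, 0 when balanced. */
    static int balance(Node n) {
        return height(n.left) - height(n.right);
    }

    /** Both parts of the definition: balanced here, and every sub-tree is an AVL tree. */
    static boolean isAvl(Node n) {
        return n == null
            || (Math.abs(balance(n)) <= 1 && isAvl(n.left) && isAvl(n.right));
    }

    /** Single right-rotation; returns the new root of the sub-tree. */
    static Node rotateRight(Node y) {
        Node x = y.left;
        y.left = x.right;
        x.right = y;
        return x;
    }
}
```

Inserting 3, 2, 1 in that order into a plain binary search tree produces a left-leaning chain with balance 2 at the root; one right-rotation at the root restores the AVL property.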

9.2.3 General n-ary trees
If we relax the restriction that each node can have only one key, we can reduce the height of the tree.

An m-way search tree
1. is empty or
2. consists of
   • a root containing j ($1 \le j < m$) keys, $k_j$, and
   • a set of sub-trees, $T_i$ ($i = 0..j$), such that
     (a) if k is a key in $T_0$, then $k \le k_1$
     (b) if k is a key in $T_i$ ($0 < i < j$), then $k_i \le k \le k_{i+1}$
     (c) if k is a key in $T_j$, then $k > k_j$ and
     (d) all $T_i$ are nonempty m-way search trees or all $T_i$ are empty

Or in plain English ..
A node generally has m-1 keys and m children. Each node has alternating sub-tree pointers and keys:

    sub-tree key sub-tree key ... key sub-tree

1. All keys in a sub-tree to the left of a key are smaller than it.
2. All keys in the sub-tree between two keys are between those two keys.
3. All keys in a sub-tree to the right of a key are greater than it.
4. This is the "standard" recursive part of the definition.

[Figure: A quaternary (4-way) tree; each node has the layout T0 k1 T1 k2 T2 k3 T3.]

The height of a complete m-ary tree with n nodes is $\lceil \log_m n \rceil$.

A B-tree of order m is an m-way tree in which
1. all leaves are on the same level and
2. all nodes except for the root and the leaves have at least m/2 children and at most m children.
3. The root has at least 2 children and at most m children.

A variation of the B-tree, known as a B+-tree, considers all the keys in nodes except the leaves as dummies. All keys are duplicated in the leaves. This has the advantage that if all the leaves are linked together sequentially, the entire tree may be scanned without visiting the higher nodes at all.

[Figure: A B+-tree; the internal nodes hold keys and sub-tree pointers, and the linked leaves hold the data items D0 to D8.]

Key Terms
Red-black trees
    Trees which remain balanced - and thus guarantee O(log n) search times - in a dynamic environment. More importantly, although any tree can be re-balanced at considerable cost, these trees can be re-balanced in O(log n) time.
AVL trees
    Trees which remain balanced - and thus guarantee O(log n) search times - in a dynamic environment. More importantly, although any tree can be re-balanced at considerable cost, these trees can be re-balanced in O(log n) time.
n-ary trees (or n-way trees)
    Trees in which each node may have up to n children.
B-tree
    Balanced variant of an n-way tree.
B+-tree
    B-tree in which all the leaves are linked to facilitate fast in-order traversal.

9.3 Hash Tables
9.3.1 Direct Address Tables
If we have a collection of n elements whose keys are unique integers in (0, m-1), where $m \ge n$, then we can store the items in a direct address table, T[m], where $T_i$ is either empty or contains one of the elements of our collection. Searching a direct address table is clearly an O(1) operation: for a key, k, we access $T_k$; if it contains an element, return it, if it doesn't then return a null. There are two constraints here:
• the keys must be unique, and
• the range of the key must be severely bounded.

[Figure: a collection of elements with keys i, j and k stored in the direct address table T; unused slots hold 0.]

If the keys are not unique, then we can simply construct a set of m lists and store the heads of these lists in the direct address table. The time to find an element matching an input key will still be O(1). However, if each element of the collection has some other distinguishing feature (other than its key), and if the maximum number of duplicates is $n_{dup}^{max}$, then searching for a specific element is $O(n_{dup}^{max})$. If duplicates are the exception rather than the rule, then $n_{dup}^{max}$ is much smaller than n and a direct address table will provide good performance. However if $n_{dup}^{max}$ approaches n, then the time to find a specific element is O(n) and a tree structure will be more efficient.

[Figure: a direct address table whose slots hold the heads of lists of elements with duplicate keys.]

The range of the key determines the size of the direct address table and may be too large to be practical. For instance it's not likely that you'll be able to use a direct address table to store elements which have arbitrary 32-bit integers as their keys for a few years yet! Direct addressing is easily generalised to the case where there is a function, $h(k) \to (1, m)$, which maps each value of the key, k, to the range (1, m). In this case, we place the element in T[h(k)] rather than T[k] and we can search in O(1) time as before.

9.3.2 Mapping functions
The direct address approach requires that the function, h(k), is a one-to-one mapping from each k to integers in (1, m). Such a function is known as a perfect hashing function: it maps each key to a distinct integer within some manageable range and enables us to trivially build an O(1) search time table. Unfortunately, finding a perfect hashing function is not always possible. Let's say that we can find a hash function, h(k), which maps most of the keys onto unique integers, but maps a small number of keys on to the same integer. If the number of collisions (cases where multiple keys map onto the same integer) is sufficiently small, then hash tables work quite well and give O(1) search times.

9.3.3 Handling the collisions
In the small number of cases where multiple keys map to the same integer, elements with different keys may be stored in the same "slot" of the hash table. It is clear that when the hash function is used to locate a potential match, it will be necessary to compare the key of that element with the search key. But there may be more than one element which should be stored in a single slot of the table. Various techniques are used to manage this problem:
• chaining,
• overflow areas,
• re-hashing,
• using neighbouring slots (linear probing),
• quadratic probing,
• random probing, ...

9.3.4 Chaining
One simple scheme is to chain all collisions in lists attached to the appropriate slot. This allows an unlimited number of collisions to be handled and doesn't require a priori knowledge of how many elements are contained in the collection. The tradeoff is the same as with linked lists versus array implementations of collections: linked list overhead in space and, to a lesser extent, in time.

[Figure: chaining - colliding elements are linked in a list attached to their slot.]
9.3.5 Re-hashing
[Figure: Re-hashing: h(j) = h(k), so the next hash function, h', is used. Note that the green arrow merely shows a logical link between keys hashing to the same slot: no actual pointer is needed.]

Re-hashing schemes use a second hashing operation when there is a collision. If there is a further collision, we re-hash until an empty slot in the table is found. The re-hashing function can either be a new function or a re-application of the original one. As long as the functions are applied to a key in the same order, then a sought key can always be located.

9.3.6 Linear probing
One of the simplest re-hashing functions is +1 (or -1), i.e. on a collision, look in the neighbouring slot in the table. It calculates the new address extremely quickly and may be extremely efficient on a modern RISC processor due to efficient cache utilisation (cf. the discussion of linked list efficiency). The animation gives you a practical demonstration of the effect of linear probing: it also implements a quadratic re-hash function so that you can compare the difference.

[Figure: Linear probing: h(j) = h(k), so try the next slot, h(j)+1. Note that the green arrow merely shows a logical link between keys hashing to the same slot: no actual pointer is needed.]

9.3.7 Clustering
Linear probing is subject to a clustering phenomenon. Re-hashes from one location occupy a block of slots in the table which "grows" towards slots to which other keys hash. This exacerbates the collision problem and the number of re-hashes can become large. The animation illustrates this quite well.

9.3.8 Quadratic Probing
Better behaviour is usually obtained with quadratic probing, where the secondary hash function depends on the re-hash index: for the ith re-hash:

    address = $h(key) + c\,i^2$

(A more complex function of i may also be used.) Since keys which are mapped to the same value by the primary hash function follow the same sequence of addresses, quadratic probing shows secondary clustering. However, secondary clustering is not nearly as severe as the clustering shown by linear probes.

Re-hashing schemes use the originally allocated table space and thus avoid linked list overhead, but require advance knowledge of the number of items to be stored. However, the collision elements are stored in slots to which other key values map directly, thus the potential for multiple collisions increases as the table becomes full.

9.3.9 Overflow area
Another scheme will divide the pre-allocated table into two sections: the primary area to which keys are mapped and an area for collisions, normally termed the overflow area.

When a collision occurs, a slot in the overflow area is used for the new element and a link from the primary slot established as in a chained system. This is essentially the same as chaining, except that the overflow area is pre-allocated and thus possibly faster to access. As with re-hashing, the maximum number of elements must be known in advance, but in this case, two parameters must be estimated: the optimum size of the primary and overflow areas. Of course, it is possible to design systems with multiple overflow tables, or with a mechanism for handling overflow out of the overflow area, which provide flexibility without losing the advantages of the overflow scheme.

[Figure: Using an overflow area: h(j) = h(k), put j in the first available slot in the overflow area. In this case, the green arrow shows the link that is needed in order to locate j again. Effectively a linked list is formed in the overflow area.]
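The difference between the linear probing of section 9.3.6 and the quadratic probing of section 9.3.8 can be seen in a small open-addressing table. The sketch below makes several assumptions that are not in the text: the class ProbingSketch is hypothetical, keys are non-negative integers, the primary hash is h(k) = k mod m, the quadratic constant c is 1, and deletion is not supported.

```java
class ProbingSketch {
    private final Integer[] table;        // null marks an empty slot
    private final boolean quadratic;

    ProbingSketch(int m, boolean quadratic) {
        this.table = new Integer[m];
        this.quadratic = quadratic;
    }

    /** Address tried on the i-th re-hash: h(k) + i (linear) or h(k) + i*i (quadratic). */
    private int slot(int key, int i) {
        int step = quadratic ? i * i : i;
        return Math.floorMod(key + step, table.length);
    }

    /** Stores the key and returns the number of re-hashes used, or -1 if no slot was found. */
    int insert(int key) {
        for (int i = 0; i < table.length; i++) {
            int s = slot(key, i);
            if (table[s] == null || table[s] == key) {
                table[s] = key;
                return i;
            }
        }
        return -1;
    }

    /** Follows the same probe sequence as insert until the key or an empty slot is met. */
    boolean contains(int key) {
        for (int i = 0; i < table.length; i++) {
            int s = slot(key, i);
            if (table[s] == null) return false;
            if (table[s] == key) return true;
        }
        return false;
    }
}
```

In a table of 11 slots, inserting 0, 11 and 22 (which all hash to slot 0) and then 1 shows the clustering effect: with linear probing the block 0..2 is full, so key 1, which does not collide with the others under h, still needs two re-hashes, whereas with quadratic probing it needs only one.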

9.3.10 Summary - Hash Table Organization

    Organization    Advantages                              Disadvantages
    Chaining        Unlimited number of elements            Overhead of multiple linked lists
                    Unlimited number of collisions
    Re-hashing      Fast re-hashing                         Maximum number of elements must be known
                    Fast access through use of              Multiple collisions may become probable
                    main table space
    Overflow area   Fast access                             Two parameters which govern performance
                    Collisions don't use primary            need to be estimated
                    table space

Animation - hash table
The animation illustrates forming a hash table from an input text stream: each incoming word is passed through a hash function and stored in the table. Histograms show the chain lengths (the number of comparisons that must be made before the correct word may be found) and the number of collisions during insertion. A map of the full table which shows the clustering behaviour is also shown in the bottom left corner.

9.3.11 Hashing Functions
Choosing a good hashing function, h(k), is essential for hash-table based searching. h should distribute the elements of our collection as uniformly as possible to the "slots" of the hash table. The key criterion is that there should be a minimum number of collisions. If the probability that a key, k, occurs in our collection is P(k), then if there are m slots in our hash table, a uniform hashing function, h(k), would ensure:

$\sum_{k|h(k)=0} P(k) = \sum_{k|h(k)=1} P(k) = \dots = \sum_{k|h(k)=m-1} P(k) = \frac{1}{m}$

Sometimes, this is easy to ensure. For example, if the keys are randomly distributed in (0, r], then $h(k) = \lfloor mk/r \rfloor$ will provide uniform hashing.

9.3.12 Mapping keys to natural numbers
Most hashing functions will first map the keys to some set of natural numbers, say (0, r]. There are many ways to do this, for example if the key is a string of ASCII characters,
• we can simply add the ASCII representations of the characters mod 255 to produce a number in (0, 255) or
• we could xor them, or
• we could add them in pairs mod $2^{16} - 1$, or ...

Having mapped the keys to a set of natural numbers, we then have a number of possibilities.

Use a mod function: $h(k) = k \bmod m$. When using this method, we usually avoid certain values of m. Powers of 2 are usually avoided, for $k \bmod 2^b$ simply selects the b low order bits of k. Unless we know that all the $2^b$ possible values of the lower order bits are equally likely, this will not be a good choice, because some bits of the key are not used in the hash function.
Prime numbers which are close to powers of 2 seem to be generally good choices for m. For example, if we have 4000 elements, and we have chosen an overflow table organization, but wish to have the probability of collisions quite low, then we might choose m = 4093. (4093 is the largest prime less than $4096 = 2^{12}$.)

Use the multiplication method: Multiply the key by a constant A, 0 < A < 1. Extract the fractional part of the product. Multiply this value by m. Thus the hash function is:

$h(k) = \lfloor m (kA - \lfloor kA \rfloor) \rfloor$

In this case, the value of m is not critical and we typically choose a power of 2 so that we can get the following efficient procedure on most digital computers:
1. Choose $m = 2^p$.
2. Multiply the w bits of k by $\lfloor A 2^w \rfloor$ to obtain a 2w bit product.
3. Extract the p most significant bits of the lower half of this product.
It seems that $A = (\sqrt{5} - 1)/2 = 0.6180339887$ is a good choice [1].

Use universal hashing: A malicious adversary can always choose the keys so that they all hash to the same slot, leading to an average O(n) retrieval time. Universal hashing seeks to avoid this by choosing the hashing function randomly from a collection of hash functions (cf. Cormen et al., p 229-). This makes the probability that the hash function will generate poor behaviour small and produces good average performance.

[1] See Knuth, "Sorting and Searching", v. 3 of "The Art of Computer Programming".
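The three-step multiplication procedure maps directly onto 32-bit integer arithmetic. The sketch below assumes w = 32 and 1 <= p <= 31; the class name MultiplicativeHashSketch and the floating-point reference method are illustrative additions, not part of the text. Java's int multiplication discards the upper half of the product, which leaves exactly the "lower half" that step 3 asks for.

```java
class MultiplicativeHashSketch {
    /** floor(A * 2^32) for A = (sqrt(5) - 1) / 2, stored in a 32-bit int. */
    static final int A_TIMES_2_32 = (int) 2654435769L;

    /** Integer form: the p most significant bits of the low 32 bits of k * floor(A * 2^32). */
    static int hash(int k, int p) {
        return (k * A_TIMES_2_32) >>> (32 - p);
    }

    /** Floating-point form of h(k) = floor(m * (kA - floor(kA))), for comparison on small k. */
    static int hashFloat(int k, int m) {
        double kA = k * 0.6180339887498949;
        return (int) Math.floor(m * (kA - Math.floor(kA)));
    }
}
```

The result always lies in 0..2^p - 1, so it can be used directly as a slot index in a table of $m = 2^p$ entries. The two forms can differ for large keys, because the integer version truncates A to w bits.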

Exercise:
1. I want to construct a spell checker from a list of words. I have obtained a set of approximate word frequencies by analyzing a large corpus of documents. Can I use the word frequencies to improve the performance of a spell checker based on a hash table? How?
2. A list of words was added to a hash table and the table's iterator then used to return each word in turn. Then a list of the same words in reverse order was added to a new table (with the same structure as the first one) and an iterator used to return words again. Here is the sequence of words returned:

    Words added in order    Words added in reverse order
    closely                 closely
    grocers                 grocers
    unmentionable           bookies
    bookies                 unmentionable
    egalitarian             egalitarian

   Explain the differences.
   The size of the table was increased and the words added again. Would you expect "closely" to be returned first again?

Key Terms
hash table
    Tables which can be searched for an item in O(1) time using a hash function to form an address from the key.
hash function
    Function which, when applied to the key, produces an integer which can be used as an address in a hash table.
collision
    When a hash function maps two different keys to the same table address, a collision is said to occur.
linear probing
    A simple re-hashing scheme in which the next slot in the table is checked on a collision.
quadratic probing
    A re-hashing scheme in which a higher (usually 2nd) order function of the hash index is used to calculate the address.
clustering
    Tendency for clusters of adjacent slots to be filled when linear probing is used.
secondary clustering
    Collision sequences generated by addresses calculated with quadratic probing.
perfect hash function
    Function which, when applied to all the members of the set of items to be stored in a hash table, produces a unique set of integers within some suitable range.

Chapter 10
Dynamic Algorithms

Sometimes, the divide and conquer approach seems appropriate but fails to produce an efficient algorithm.

10.1 Fibonacci Numbers
Many programming texts include this algorithm for calculating Fibonacci numbers:

    int fib( int n ) {
        if ( n < 2 ) return n;
        else
            return fib(n-1) + fib(n-2);
    }

This algorithm is commonly used as an example of the elegance of recursion as a programming technique. However, when we examine its time complexity, we find it's far from elegant!

10.1.1 Analysis
Let T(n) be the time required to calculate $f_n$, where $f_n$ is the nth Fibonacci number. Then, by examining the function above, it is clear that

$T(n) = T(n-1) + T(n-2)$

and $T(1) = T(2) = c$, where c is a constant. Therefore

$T(n) = c f_n$

Now it can be shown that:

$f_n = \frac{1}{\sqrt{5}} \left[ \left( \frac{1 + \sqrt{5}}{2} \right)^n - \left( \frac{1 - \sqrt{5}}{2} \right)^n \right]$

for all $n \ge 0$. Noting that the second term vanishes for large n, we can derive:

$\lim_{n \to \infty} \frac{f_{n+1}}{f_n} = \frac{\sqrt{5} + 1}{2}$

thus

$T(n) = O(f_n) = O(1.618^n)$

So this simple function will take exponential time! As we will see in more detail later, algorithms which run in exponential time are to be avoided at all costs!

10.1.2 An Iterative Solution
However, this simple alternative:

    int fib( int n ) {
        int k, f, f1, f2;
        if ( n < 2 ) return n;
        else {
            f = f1 = 1;   /* f1 holds f(k-1) */
            f2 = 0;       /* f2 holds f(k-2) */
            for(k=2;k<=n;k++) {
                f = f1 + f2;
                f2 = f1;
                f1 = f;
            }
            return f;
        }
    }

runs in O(n) time. This algorithm solves the problem of calculating $f_0$ and $f_1$ first, calculates $f_2$ from these, then $f_3$ from $f_2$ and $f_1$, and so on. Thus, instead of dividing the large problem into two (or more) smaller problems and solving those problems (as we did in the divide and conquer approach), we start with the simplest possible problems. We solve them (usually trivially) and save these results. These results are then used to solve slightly larger problems which are, in turn, saved and used to solve larger problems again.

10.1.3 Free Lunch?
As we know, there's never one! Dynamic algorithms obtain their efficiency by solving and storing the answers to small problems. Thus they usually trade space for increased speed. In the Fibonacci case, the extra space is insignificant - the two variables f1 and f2, but in some more complex dynamic algorithms, we'll see that the space used is significant.

10.2 Binomial Coefficients
As with the Fibonacci numbers, the binomial coefficients can be calculated recursively - making use of the relation:

${}^{n}C_{m} = {}^{n-1}C_{m-1} + {}^{n-1}C_{m}$

A similar analysis to that used for the Fibonacci numbers shows that the time complexity using this approach is also the binomial coefficient itself. However, we all know that if we construct Pascal's triangle, the nth row gives all the values ${}^{n}C_{m}$, $m = 0..n$.

    1
    1 1
    1 2 1
    1 3 3 1
    1 4 6 4 1
    1 5 10 10 5 1
    1 6 15 20 15 6 1
    1 7 21 35 35 21 7 1

    Figure 10.1: Pascal's triangle

Each entry takes O(1) time to calculate and there are $O(n^2)$ of them. So this calculation of the coefficients takes $O(n^2)$ time. However, it uses $O(n^2)$ space to store the coefficients.

Key Terms
Dynamic Algorithm
    A general class of algorithms which solve problems by solving smaller versions of the problem, saving the solutions to the small problems and then combining them to solve the larger problem.
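Building Pascal's triangle row by row is a direct application of the dynamic approach: each entry is the sum of two entries saved from the previous row. A minimal sketch follows; the class name BinomialSketch and the choice of long entries are assumptions, and overflow for large n is not handled.

```java
class BinomialSketch {
    /** Returns rows 0..n of Pascal's triangle; c[i][m] is the binomial coefficient iCm. */
    static long[][] pascal(int n) {
        long[][] c = new long[n + 1][];
        for (int i = 0; i <= n; i++) {
            c[i] = new long[i + 1];
            c[i][0] = 1;                          // iC0 = 1
            c[i][i] = 1;                          // iCi = 1
            for (int m = 1; m < i; m++)
                // Each interior entry re-uses two saved entries from the row above
                c[i][m] = c[i - 1][m - 1] + c[i - 1][m];
        }
        return c;
    }
}
```

If only the nth row is needed, keeping just the previous row reduces the space to O(n) while the time stays $O(n^2)$.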

10.3 Optimal Binary Search Trees
Up to this point, we have assumed that an optimal search tree is one in which the probability of occurrence of all keys is equal [1]. Thus we concentrated on balancing the tree so as to make the cost of finding any key O(log n). However, consider a dictionary of words used by a spelling checker for English language documents. It will be searched many more times for 'a', 'the', 'and', etc. than for the thousands of uncommon words which are in the dictionary just in case someone happens to use one of them. Such a dictionary needs to be large: the average educated person has a vocabulary of 30 000 words, so it needs about 100 000 words in it to be effective. It is also reasonably easy to produce a table of the frequency of occurrence of words: words are simply counted in any suitable collection of documents considered to be representative of those for which the spelling checker will be used. A balanced binary tree is likely to end up with a word such as 'miasma' at its root, guaranteeing that in 99.99+% of searches, at least one comparison is wasted!

If key, k, has relative frequency, $r_k$, then in an optimal tree,

$\sum_k d_k r_k$

where $d_k$ is the distance of the key, k, from the root (i.e. the number of comparisons which must be made before k is found), is minimised. We make use of the property:

Lemma 10.3.1 Sub-trees of optimal trees are themselves optimal trees.
Proof If a sub-tree of a search tree is not an optimal tree, then a better search tree will be produced if the sub-tree is replaced by an optimal tree.

Thus the problem is to determine which key should be placed at the root of the tree. Then the process can be repeated for the left- and right-sub-trees. However, a divide-and-conquer approach would choose each key as a candidate root and repeat the process for each sub-tree. Since there are n choices for the root and 2O(n) choices for roots of the two sub-trees, this leads to an $O(n^n)$ algorithm.

An efficient algorithm can be generated by the dynamic approach. We calculate the O(n) best trees consisting of just two elements (the neighbours in the sorted list of keys).

Setting up to create an optimal binary search tree
Initially, we have an array of keys and their relative frequencies.

[Figure: Items to be placed in the tree: an array of keys $k_i$ .. $k_j$ and an array of their relative frequencies $rf_i$ .. $rf_j$.]

[1] Or is unknown, in which case we assume it to be equal.

10.3. OPTIMAL BINARY SEARCH TREES 99We need two n�n arrays: one, , holdsthe osts of optimal subtrees and theother, best, holds the index of the rootof the optimal subtrees. In the ostmatrix, ij holds the ost of the opti-mal subtree ontaining keys i throughj. Initially, we an �ll in the diagonalelements, ii, with the relative frequen- ies, rfi, for ea h key. The diagonal ofthe best matrix, bestii, is �lled in withi.Then, we use the frequen ies to al u-late the optimal sub-trees for every pos-sible pair of adjoining entries in the ar-ray.

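[Figure: Cost matrix, with rows and columns labelled A to J. Only the main diagonal is filled in, with the relative frequencies: A = 23, B = 10, C = 8, D = 12, E = 30, F = 5, G = 14, H = 18, I = 20, J = 2.]

There are two possible arrangements for the tree containing F and G. The cost for (a) is

$$5 \times 1 + 14 \times 2 = 33$$

and for (b)

$$14 \times 1 + 5 \times 2 = 24$$

Thus (b) is the optimum tree and its cost is saved as $c_{f,g}$. We also store G as the root of the best F-G sub-tree in $best_{f,g}$. Similarly, we calculate the best cost for all $n - 1$ sub-trees with two elements, $c_{g,h}$, $c_{h,i}$, etc. and place these in the diagonal below the main diagonal.

[Figure: the two possible trees for F (5) and G (14): (a) F at the root with G as its child; (b) G at the root with F as its child.]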
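The two-element step can be checked with a short Java sketch. The class and method names (`TwoKeyTrees`, `pairCost`, `pairCosts`) are hypothetical, and the frequencies for A to J are taken to be 23, 10, 8, 12, 30, 5, 14, 18, 20, 2, the values on the main diagonal of the cost matrix.

```java
class TwoKeyTrees {

    // Cost of the better of the two trees that hold a pair of neighbouring keys:
    // the root costs 1 comparison and its single child costs 2.
    static int pairCost(int rfLeft, int rfRight) {
        int leftAsRoot = rfLeft * 1 + rfRight * 2;
        int rightAsRoot = rfRight * 1 + rfLeft * 2;
        return Math.min(leftAsRoot, rightAsRoot);
    }

    // Costs for every pair of adjoining keys: the diagonal below the main one.
    static int[] pairCosts(int[] rf) {
        int[] costs = new int[Math.max(0, rf.length - 1)];
        for (int i = 0; i + 1 < rf.length; i++) {
            costs[i] = pairCost(rf[i], rf[i + 1]);
        }
        return costs;
    }

    public static void main(String[] args) {
        int[] rf = {23, 10, 8, 12, 30, 5, 14, 18, 20, 2};   // keys A..J
        System.out.println(java.util.Arrays.toString(pairCosts(rf)));
        // prints [43, 26, 28, 54, 40, 24, 46, 56, 24]
    }
}
```

The sixth entry, 24, is the F-G cost worked out above; in each pair the key with the larger frequency ends up at the root.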

[Figure: Cost matrix after filling in the costs of two-element trees. The diagonal below the main one holds A-B = 43, B-C = 26, C-D = 28, D-E = 54, E-F = 40, F-G = 24, G-H = 46, H-I = 56, I-J = 24.]

The sub-trees containing two elements are then used to calculate the best costs for sub-trees of 3 elements.
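The step from smaller to larger sub-trees can be written as a recurrence. The form below is a sketch rather than the text's own notation: it assumes that an empty sub-tree costs zero and uses $r$ for the candidate root. The sum of frequencies appears because every key in the sub-tree moves one level further from the root when the sub-tree is hung below a new root.

```latex
c_{ij} \;=\; \sum_{k=i}^{j} rf_k \;+\; \min_{i \le r \le j} \bigl( c_{i,r-1} + c_{r+1,j} \bigr),
\qquad c_{ij} = 0 \ \text{for} \ i > j

% Worked example for the three keys F, G, H (frequencies 5, 14, 18):
c_{F,H} \;=\; (5 + 14 + 18) + \min\bigl( c_{G,H},\; c_{F,F} + c_{H,H},\; c_{F,G} \bigr)
        \;=\; 37 + \min(46,\; 5 + 18,\; 24) \;=\; 60
```

The minimum is reached with G as the root, so G would be stored as the root of the F-H sub-tree.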

This process is continued until we have calculated the cost and the root for the optimal search tree with $n$ elements.

There are $O(n^2)$ such sub-tree costs. Each one requires up to $n$ operations to determine, if the cost of the smaller sub-trees is known. Thus the overall algorithm is $O(n^3)$.
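A minimal Java sketch of the whole table-filling process follows. It is one possible arrangement, not a definitive implementation: the class name `OptimalBstTable` and the prefix-sum helper are assumptions, and the arrays are indexed as `cost[first][last]`, so the values land above the main diagonal rather than below it as in the figures. Ties between candidate roots are resolved in favour of the first one found.

```java
class OptimalBstTable {
    final int[][] cost;   // cost[i][j]: cost of the optimal sub-tree holding keys i..j
    final int[][] best;   // best[i][j]: index of the root of that sub-tree

    OptimalBstTable(int[] rf) {
        int n = rf.length;
        cost = new int[n][n];
        best = new int[n][n];
        int[] prefix = new int[n + 1];               // prefix[k] = rf[0] + ... + rf[k-1]
        for (int k = 0; k < n; k++) {
            prefix[k + 1] = prefix[k] + rf[k];
        }
        for (int i = 0; i < n; i++) {                // one-key trees: the main diagonal
            cost[i][i] = rf[i];
            best[i][i] = i;
        }
        for (int size = 2; size <= n; size++) {      // then trees of 2, 3, ..., n keys
            for (int i = 0; i + size - 1 < n; i++) {
                int j = i + size - 1;
                int bestSum = Integer.MAX_VALUE;
                for (int r = i; r <= j; r++) {       // try every key as the root
                    int left = (r > i) ? cost[i][r - 1] : 0;
                    int right = (r < j) ? cost[r + 1][j] : 0;
                    if (left + right < bestSum) {    // strict '<' keeps the first root on a tie
                        bestSum = left + right;
                        best[i][j] = r;
                    }
                }
                // Every key in i..j is one level deeper under the new root.
                cost[i][j] = bestSum + prefix[j + 1] - prefix[i];
            }
        }
    }

    public static void main(String[] args) {
        int[] rf = {23, 10, 8, 12, 30, 5, 14, 18, 20, 2};   // keys A..J
        OptimalBstTable t = new OptimalBstTable(rf);
        System.out.println("cost = " + t.cost[0][9] + ", root index = " + t.best[0][9]);
        // prints: cost = 353, root index = 4
    }
}
```

The three nested loops (size, start position, candidate root) correspond directly to the $O(n^3)$ bound.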

Final cost matrix (row = last key, column = first key of the sub-tree):

       A    B    C    D    E    F    G    H    I    J
  A   23
  B   43   10
  C   67   26    8
  D  104   52   28   12
  E  180  112   78   54   30
  F  195  122   88   64   40    5
  G  230  155  121   97   73   24   14
  H  284  209  175  151  125   60   46   18
  I  345  270  236  212  180  101   86   56   20
  J  353  278  244  220  186  107   92   60   24    2

Note that the $i$th diagonal below the main one contains the costs for optimal trees with $i + 1$ elements.

The second matrix, $best$, stores the index of the root of each optimal subtree. For example, G is the root of the optimal F-G subtree, so $best_{fg}$ contains 6 (= G).

[Figure: Best matrix with roots of optimal trees with two elements. The main diagonal holds 0 to 9 (A to J); the diagonal below it holds A-B = 0, B-C = 1, C-D = 3, D-E = 4, E-F = 4, F-G = 6, G-H = 7, H-I = 8, I-J = 8.]

In the final best matrix, the root of the optimal search tree is in the lower left corner: 4 = E. We then look in $best_{AD}$ for the root of the left subtree of E and find 1 (= B) and in $best_{FJ}$ for the root of the right subtree of E and find 7 (= H). We continue in this way until the complete optimal search tree has been built: the left subtree of B must be A and the root of the right subtree (containing C-D) is found in $best_{CD}$.

Final best matrix (row = last key, column = first key of the sub-tree):

       A    B    C    D    E    F    G    H    I    J
  A    0
  B    0    1
  C    0    1    2
  D    1    2    3    3
  E    2    4    4    4    4
  F    2    4    4    4    4    5
  G    4    4    4    4    4    6    6
  H    4    4    4    4    6    6    7    7
  I    4    4    4    4    7    7    7    8    8
  J    4    4    4    4    7    7    7    8    8    9
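The walk through the best matrix can be sketched recursively. In this illustration the class name `BestMatrixWalk`, the method `build` and the bracketed output format are hypothetical; the matrix is entered row by row in the layout used for the figures (row = last key, column = first key), and "-" marks an empty sub-tree.

```java
class BestMatrixWalk {
    // Final best matrix: BEST[last][first] is the root index of the sub-tree first..last.
    static final int[][] BEST = {
        {0},
        {0, 1},
        {0, 1, 2},
        {1, 2, 3, 3},
        {2, 4, 4, 4, 4},
        {2, 4, 4, 4, 4, 5},
        {4, 4, 4, 4, 4, 6, 6},
        {4, 4, 4, 4, 6, 6, 7, 7},
        {4, 4, 4, 4, 7, 7, 7, 8, 8},
        {4, 4, 4, 4, 7, 7, 7, 8, 8, 9},
    };

    // Returns the sub-tree over keys first..last written as "root(left,right)".
    static String build(int first, int last) {
        if (first > last) {
            return "-";                               // empty sub-tree
        }
        int root = BEST[last][first];
        String name = String.valueOf((char) ('A' + root));
        if (first == last) {
            return name;                              // a single key is a leaf
        }
        // Keys before the root form the left sub-tree, keys after it the right one.
        return name + "(" + build(first, root - 1) + "," + build(root + 1, last) + ")";
    }

    public static void main(String[] args) {
        System.out.println(build(0, 9));
        // prints E(B(A,D(C,-)),H(G(F,-),I(-,J)))
    }
}
```

Summing depth times frequency over this tree gives 353, the value in the lower left corner of the final cost matrix.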

Animation - Optimal binary search tree

The animation shows the two critical arrays in the algorithm: the costs of all the sub-trees and the index of the root of each sub-tree. Note how both are built from the diagonal (smallest sub-trees) downwards.

10.3.1 Space complexity

As with most dynamic algorithms, building an optimal binary search tree requires additional space to store the solutions to sub-problems. There are two $n \times n$ auxiliary matrices required, so the space complexity is $O(n^2)$.

10.3.2 Optimal sub-structure

This problem also shows a property commonly found in dynamic algorithms: optimal sub-structure. In this case, this is represented by Lemma 10.3.1, in which we observed that the solutions to the sub-problems are, in fact, part of the solution to the whole problem.


Chapter 11
The Rest
