796

Aho,Ullman Foundations of Computer Science - C Edition (Freeman)

  • Upload
    makrodr

  • View
    608

  • Download
    53

Embed Size (px)

Citation preview

TABLE OF CONTENTS v33 33 Table of ContentsPreface ixChapter 1. Computer Science: The Mechanization of Abstraction 11.1. What This Book Is About 31.2. What This Chapter Is About 61.3. Data Models 61.4. The C Data Model 131.5. Algorithms and the Design of Programs 201.6. Some C Conventions Used Throughout the Book 221.7. Summary of Chapter 1 231.8. Bibliographic Notes for Chapter 1 24Chapter 2. Iteration, Induction, and Recursion 252.1. What This Chapter Is About 272.2. Iteration 272.3. Inductive Proofs 342.4. Complete Induction 442.5. Proving Properties of Programs 522.6. Recursive Denitions 592.7. Recursive Functions 692.8. Merge Sort: A Recursive Sorting Algorithm 752.9. Proving Properties of Recursive Programs 842.10. Summary of Chapter 2 872.11. Bibliographic Notes for Chapter 2 88Chapter 3. The Running Time of Programs 893.1. What This Chapter Is About 893.2. Choosing an Algorithm 903.3. Measuring Running Time 913.4. Big-Oh and Approximate Running Time 963.5. Simplifying Big-Oh Expressions 1013.6. Analyzing the Running Time of a Program 1093.7. A Recursive Rule for Bounding Running Time 1163.8. Analyzing Programs with Function Calls 1273.9. Analyzing Recursive Functions 1323.10. Analysis of Merge Sort 1363.11. Solving Recurrence Relations 1443.12. Summary of Chapter 3 1543.13. Bibliographic Notes for Chapter 3 155Chapter 4. Combinatorics and Probability 1564.1. What This Chapter Is About 1564.2. Counting Assignments 1574.3. Counting Permutations 1604.4. Ordered Selections 167vi TABLE OF CONTENTS4.5. Unordered Selections 1704.6. Orderings With Identical Items 1784.7. Distribution of Objects to Bins 1814.8. Combining Counting Rules 1844.9. Introduction to Probability Theory 1874.10. Conditional Probability 1934.11. Probabilistic Reasoning 2034.12. Expected Value Calculations 2124.13. Some Programming Applications of Probability 2154.14. Summary of Chapter 4 2204.15. Bibliographic Notes for Chapter 4 221Chapter 5. The Tree Data Model 2235.1. What This Chapter Is About 2235.2. Basic Terminology 2245.3. Data Structures for Trees 2315.4. Recursions on Trees 2395.5. Structural Induction 2485.6. Binary Trees 2535.7. Binary Search Trees 2585.8. Eciency of Binary Search Tree Operations 2685.9. Priority Queues and Partially Ordered Trees 2715.10. Heapsort: Sorting with Balanced POTs 2805.11. Summary of Chapter 5 2845.12. Bibliographic Notes for Chapter 5 285Chapter 6. The List Data Model 2866.1. What This Chapter Is About 2866.2. Basic Terminology 2876.3. Operations on Lists 2916.4. The Linked-List Data Structure 2936.5. Array-Based Implementation of Lists 3016.6. Stacks 3066.7. Implementing Function Calls Using a Stack 3126.8. Queues 3186.9. Longest Common Subsequences 3216.10. Representing Character Strings 3276.11. Summary of Chapter 6 3346.12. Bibliographic Notes for Chapter 6 335Chapter 7. The Set Data Model 3377.1. What This Chapter Is About 3377.2. Basic Denitions 3387.3. Operations on Sets 3427.4. List Implementation of Sets 3517.5. Characteristic-Vector Implementation of Sets 3577.6. Hashing 3607.7. Relations and Functions 3667.8. Implementing Functions as Data 3737.9. Implementing Binary Relations 380TABLE OF CONTENTS vii7.10. Some Special Properties of Binary Relations 3867.11. Innite Sets 3967.12. Summary of Chapter 7 4017.13. Bibliographic Notes for Chapter 7 402Chapter 8. The Relational Data Model 4038.1. What This Chapter Is About 4038.2. Relations 4048.3. Keys 4118.4. Primary Storage Structures for Relations 4148.5. Secondary Index Structures 4198.6. Navigation among Relations 4238.7. An Algebra of Relations 4288.8. Implementing Relational Algebra Operations 4368.9. Algebraic Laws for Relations 4408.10. Summary of Chapter 8 4498.11. Bibliographic Notes for Chapter 8 450Chapter 9. The Graph Data Model 4519.1. What This Chapter Is About 4519.2. Basic Concepts 4529.3. Implementation of Graphs 4599.4. Connected Components of an Undirected Graph 4669.5. Minimal Spanning Trees 4789.6. Depth-First Search 4849.7. Some Uses of Depth-First Search 4959.8. Dijkstras Algorithm for Finding Shortest Paths 5029.9. Floyds Algorithm for Shortest Paths 5139.10. An Introduction to Graph Theory 5219.11. Summary of Chapter 9 5269.12. Bibliographic Notes for Chapter 9 527Chapter 10. Patterns, Automata, and Regular Expressions 52910.1. What This Chapter Is About 53010.2. State Machines and Automata 53010.3. Deterministic and Nondeterministic Automata 53610.4. From Nondeterminism to Determinism 54710.5. Regular Expressions 55610.6. The UNIX Extensions to Regular Expressions 56410.7. Algebraic Laws for Regular Expressions 56810.8. From Regular Expressions to Automata 57110.9. From Automata to Regular Expressions 58210.10. Summary of Chapter 10 58810.11. Bibliographic Notes for Chapter 10 589Chapter 11. Recursive Description of Patterns 59111.1. What This Chapter Is About 59111.2. Context-Free Grammars 59211.3. Languages from Grammars 59911.4. Parse Trees 602viii TABLE OF CONTENTS11.5. Ambiguity and the Design of Grammars 61011.6. Constructing Parse Trees 61711.7. A Table-Driven Parsing Algorithm 62511.8. Grammars Versus Regular Expressions 63411.9. Summary of Chapter 11 64011.10. Bibliographic Notes for Chapter 11 641Chapter 12. Propositional Logic 64212.1. What This Chapter Is About 64212.2. What Is Propositional Logic? 64312.3. Logical Expressions 64512.4. Truth Tables 64912.5. From Boolean Functions to Logical Expressions 65512.6. Designing Logical Expressions by Karnaugh Maps 66012.7. Tautologies 66912.8. Some Algebraic Laws for Logical Expressions 67412.9. Tautologies and Methods of Proof 68212.10. Deduction 68612.11. Proofs by Resolution 69212.12. Summary of Chapter 12 69712.13. Bibliographic Notes for Chapter 12 698Chapter 13. Using Logic to Design Computer Components 69913.1. What This Chapter is About 69913.2. Gates 70013.3. Circuits 70113.4. Logical Expressions and Circuits 70513.5. Some Physical Constraints on Circuits 71113.6. A Divide-and-Conquer Addition Circuit 71613.7. Design of a Multiplexer 72313.8. Memory Elements 73013.9. Summary of Chapter 13 73113.10. Bibliographic Notes for Chapter 13 732Chapter 14. Predicate Logic 73314.1. What This Chapter Is About 73314.2. Predicates 73414.3. Logical Expressions 73614.4. Quantiers 73914.5. Interpretations 74514.6. Tautologies 75114.7. Tautologies Involving Quantiers 75314.8. Proofs in Predicate Logic 75914.9. Proofs from Rules and Facts 76214.10. Truth and Provability 76814.11. Summary of Chapter 14 77414.12. Bibliographic Notes for Chapter 14 775Index 776PrefaceThis book was motivated by the desire we and others have had to further the evolu-tion of the core course in computer science. Many departments across the countryhave revised their curriculum in response to the introductory course in the scienceof computing discussed in the Denning Report, (Denning, P. J., D. E. Comer, D.Gries, M. C. Mulder, A. Tucker, J. Turner, and P. R. Young, Computing as a Dis-cipline, Comm. ACM 32:1, pp. 923, January 1989.). That report draws attentionto three working methodologies or processes theory, abstraction, and design as fundamental to all undergraduate programs in the discipline. More recently,the Computing Curricula 1991 report of the joint ACM/IEEE-CS Curriculum TaskForce echoes the Denning Report in identifying key recurring concepts which arefundamental to computing, especially: conceptual and formal models, eciency,and levels of abstraction. The themes of these two reports summarize what we havetried to oer the student in this book.This book developed from notes for a two-quarter course at Stanford calledCS109: Introduction to Computer Science that serves a number of goals. Therst goal is to give beginning computer science majors a solid foundation for fur-ther study. However, computing is becoming increasingly important in a muchwider range of scientic and engineering disciplines. Therefore, a second goal isto give those students who will not take advanced courses in computer science theconceptual tools that the eld provides. Finally, a more pervasive goal is to exposeall students not only to programming concepts but also to the intellectually richfoundations of the eld.Our rst version of this book was based on programming in Pascal and appearedin 1992. Our choice of Pascal as the language for example programs was motivatedby that languages use in the Computer Science Advanced Placement Test as wellas in a plurality of college introductions to programming. We were pleased to seethat since 1992 there has been a signicant trend toward C as the introductoryprogramming language, and we accordingly developed a new version of the bookusing C for programming examples. Our emphasis on abstraction and encapsulationshould provide a good foundation for subsequent courses covering object-orientedtechnology using C++.At the same time, we decided to make two signicant improvements in thecontent of the book. First, although it is useful to have a grounding in machinearchitecture to motivate how running time is measured, we found that almost allcurricula separate architecture into a separate course, so the chapter on that subjectwas not useful. Second, many introductory courses in the theory of computingemphasize combinatorics and probability, so we decided to increase the coverageand cluster the material into a chapter of its own.Foundations of Computer Science covers subjects that are often found splitbetween a discrete mathematics course and a sophomore-level sequence in computerscience in data structures. It has been our intention to select the mathematicalfoundations with an eye toward what the computer user really needs, rather thanwhat a mathematician might choose. We have tried to integrate eectively themathematical foundations with the computing. We thus hope to provide a betterfeel for the soul of computer science than might be found in a programming course,ixx PREFACEa discrete mathematics course, or a course in a computer science subspecialty. Webelieve that, as time goes on, all scientists and engineers will take a foundationalcourse similar to the one oered at Stanford upon which this book is based. Such acourse in computer science should become as standard as similar courses in calculusand physics.PrerequisitesStudents taking courses based on this book have ranged from rst-year undergrad-uates to graduate students. We assume only that students have had a solid coursein programming. They should be familiar with the programming language ANSIC to use this edition. In particular, we expect students to be comfortable with Cconstructs such as recursive functions, structures, pointers, and operators involvingpointers and structures such as dot, ->, and &.Suggested Outlines for Foundational Courses in CSIn terms of a traditional computer science curriculum, the book combines a rstcourse in data structures that is, a CS2 course with a course in discretemathematics. We believe that the integration of these subjects is extremely desir-able for two reasons:1. It helps motivate the mathematics by relating it more closely to the computing.2. Computing and mathematics can be mutually reinforcing. Some examplesare the way recursive programming and mathematical induction are related inChapter 2 and the way the free/bound variable distinction for logic is relatedto the scope of variables in programming languages in Chapter 14. Suggestionsfor instructive programming assignments are presented throughout the book.There are a number of ways in which this book can be used.A Two-Quarter or Two-Semester CourseThe CS109A-B sequence at Stanford is typical of what might be done in two quar-ters, although these courses are rather intensive, being 4-unit, 10-week courses each.These two courses cover the entire book, the rst seven chapters in CS109A andChapters 8 through 14 in CS109B.A One-Semester CS2 Type CourseIt is possible to use the book for a one-semester course covering a set of topics similarto what would appear in a CS2 course. Naturally, there is too much material inthe book to cover in one semester, and so we recommend the following:1. Recursive algorithms and programs in Sections 2.7 and 2.8.2. Big-oh analysis and running time of programs: all of Chapter 3 except forSection 3.11 on solving recurrence relations.3. Trees in Sections 5.2 through 5.10.PREFACE xi4. Lists: all of Chapter 6. Some may wish to cover lists before trees, whichis a more traditional treatment. We regard trees as the more fundamentalnotion, but there is little harm in switching the order. The only signicantdependency is that Chapter 6 talks about the dictionary abstract data type(set with operations insert, delete, and lookup), which is introduced in Section5.7 as a concept in connection with binary search trees.5. Sets and relations. Data structures for sets and relations are emphasized inSections 7.2 through 7.9 and 8.2 through 8.6.6. Graph algorithms are covered in Sections 9.2 through 9.9.A One-Semester Discrete Mathematics CourseFor a one-semester course emphasizing mathematical foundations, the instructorcould choose to cover:1. Mathematical induction and recursive programs in Chapter 2.2. Big-oh analysis, running time, and recurrence relations in Sections 3.4 through3.11.3. Combinatorics in Sections 4.2 through 4.8.4. Discrete probability in Sections 4.9 through 4.13.5. Mathematical aspects of trees in Sections 5.2 through 5.6.6. Mathematical aspects of sets in Sections 7.2, 7.3, 7.7, 7.10, and 7.11.7. The algebra of relations in Sections 8.2, 8.7, and 8.9.8. Graph algorithms and graph theory in Chapter 9.9. Automata and regular expressions in Chapter 10.10. Context-free grammars in Sections 11.2 through 11.4.11. Propositional and predicate logic in Chapters 12 and 14, respectively.Features of This BookTo help the student assimilate the material, we have added the following study aids:1. Each chapter has an outline section at the beginning and a summary sectionat the end highlighting the main points.2. Marginal notes mark important concepts and denitions. However, items men-tioned in section or subsection headlines are not repeated in the margin.3. Sidebars are separated from the text by double lines. These short notes serveseveral purposes:3 Some are elaborations on the text or make some ne points about programor algorithm design.3 Others are for summary or emphasis of points made in the text nearby.These include outlines of important kinds of proofs, such as the variousforms of proof by induction.3 A few are used to give examples of fallacious arguments, and we hope thatthe separation from the text in this way will eliminate possible miscon-struction of the point.xii PREFACE3 A few give very brief introductions to major topics like undecidability orthe history of computers to which we wish we could devote a full section.4. Most of the sections end with exercises. There are more than 1000 exercisesor parts spread among the sections. Of these roughly 30% are marked with asingle star, which indicates that they require more thought than the unstarredexercises. Approximately another 10% of the exercises are doubly starred, andthese are the most challenging.5. Chapters end with bibliographic notes. We have not attempted to be exhaus-tive, but oer suggestions for more advanced texts on the subject of the chapterand mention the relevant papers with the most historical signicance.About the CoverIt is a tradition for computer science texts to have a cover with a cartoon or drawingsymbolizing the content of the book. Here, we have drawn on the myth of the worldas the back of a turtle, but our world is populated with representatives of some ofthe other, more advanced texts in computer science that this book is intended tosupport. They are:The teddy bear: R. Sethi, Programming Languages: Concepts and Constructs,Addison-Wesley, Reading, Mass., 1989.The baseball player: J. D. Ullman, Principles of Database and Knowledge-BaseSystems, Computer Science Press, New York, 1988.The column: J. L. Hennessy and D. A. Patterson, Computer Architecture: a Quan-titative Approach, Morgan-Kaufmann, San Mateo, Calif., 1990.The dragon: A. V. Aho, R. Sethi, and J. D. Ullman, Compiler Design: Principles,Techniques, and Tools, Addison-Wesley, Reading, Mass., 1986.The triceratops: J. L. Peterson and A. Silberschatz, Operating Systems Concepts,second edition, Addison-Wesley, Reading, Mass., 1985.AcknowledgmentsWe are deeply indebted to a number of colleagues and students who have read thismaterial and given us many valuable suggestions for improving the presentation.We owe a special debt of gratitude to Brian Kernighan, Don Knuth, ApostolosLerios, and Bob Martin who read the original Pascal manuscript in detail andgave us many perceptive comments. We have received, and gratefully acknowledge,reports of course testing of the notes for the Pascal edition of this book by MichaelAnderson, Margaret Johnson, Udi Manber, Joseph Naor, Prabhakar Ragde, RockyRoss, and Shuky Sagiv.There are a number of other people who found errors in earlier editions, boththe original notes and the various printings of the Pascal edition. In this regard,we would like to thank: Susan Aho, Michael Anderson, Aaron Edsinger, LonnieEldridge, Todd Feldman, Steve Friedland, Christopher Fuselier, Mike Genstil, PaulGrubb III, Barry Hayes, John Hwang, Hakan Jakobsson, Arthur Keller, Dean Kelley,James Kuner Jr., Steve Lindell, Richard Long, Mark MacDonald, Simone Mar-tini, Hugh McGuire, Alan Morgan, Monnia Oropeza, Rodrigo Philander, AndrewQuan, Stuart Reges, John Stone, Keith Swanson, Steve Swenson, Sanjai Tiwari,Eric Traut, and Lynzi Ziegenhagen.PREFACE xiiiWe acknowledge helpful advice from Geo Clem, Jon Kettenring, and BrianKernighan during the preparation of the C edition of Foundations of ComputerScience.Peter Ullman produced a number of the gures used in this book. We are grate-ful to Dan Clayton, Anthony Dayao, Mat Howard, and Ron Underwood for helpwith TEX fonts, and to Hester Glynn and Anne Smith for help with the manuscriptpreparation.On-Line Access to Code, Errata, and NotesYou can obtain copies of the major programs in this book by anonymous ftp to hostftp-cs.stanford.edu. Login with user name anonymous and give your name andhost as a password. You may then executecd fcscwhere you will nd programs from this book. We also plan to keep in this directoryinformation about errata and what course notes we can provide.A. V. A.Chatham, NJJ. D. U.Stanford, CAJuly, 1994CHAPTER 133 33Computer Science:The Mechanizationof AbstractionThough it is a new eld, computer science already touches virtually every aspectof human endeavor. Its impact on society is seen in the proliferation of computers,information systems, text editors, spreadsheets, and all of the wonderful applicationprograms that have been developed to make computers easier to use and people moreproductive. An important part of the eld deals with how to make programmingeasier and software more reliable. But fundamentally, computer science is a scienceof abstraction creating the right model for thinking about a problem and devising Abstractionthe appropriate mechanizable techniques to solve it.Every other science deals with the universe as it is. The physicists job, forexample, is to understand how the world works, not to invent a world in whichphysical laws would be simpler or more pleasant to follow. Computer scientists,on the other hand, must create abstractions of real-world problems that can beunderstood by computer users and, at the same time, that can be represented andmanipulated inside a computer.Sometimes the process of abstraction is simple. For example, we can model thebehavior of the electronic circuits used to build computers quite well by an abstrac-tion called propositional logic. The modeling of circuits by logical expressions isnot exact; it simplies, or abstracts away, many details such as the time it takesfor electrons to ow through circuits and gates. Nevertheless, the propositionallogic model is good enough to help us design computer circuits well. We shall havemuch more to say about propositional logic in Chapter 12.As another example, suppose we are faced with the problem of scheduling nal Examscheduling examinations for courses. That is, we must assign course exams to time slots sothat two courses may have their exams scheduled in the same time slot only if thereis no student taking both. At rst, it may not be apparent how we should modelthis problem. One approach is to draw a circle called a node for each course anddraw a line called an edge connecting two nodes if the corresponding courses havea student in common. Figure 1.1 suggests a possible picture for ve courses; thepicture is called a course-conict graph.Given the course-conict graph, we can solve the exam-scheduling problem byrepeatedly nding and removing maximal independent sets from the graph. An12 COMPUTER SCIENCE: THE MECHANIZATION OF ABSTRACTIONCSEng MathEcon PhyFig. 1.1. Course-conict graph for ve courses. An edge between twocourses indicates that at least one student is taking both courses.independent set is a collection of nodes that have no connecting edges within thecollection. An independent set is maximal if no other node from the graph can be Maximalindependentsetadded without including an edge between two nodes of the set. In terms of courses,a maximal independent set is any maximal set of courses with no common students.In Fig. 1.1, {Econ, Eng, Phy} is one maximal independent set. The set of coursescorresponding to the selected maximal independent set is assigned to the rst timeslot.We remove from the graph the nodes in the rst maximal independent set,along with all incident edges, and then nd a maximal independent set among theremaining courses. One choice for the next maximal independent set is the singletonset {CS}. The course in this maximal independent set is assigned to the secondtime slot.We repeat this process of nding and deleting maximal independent sets untilno more nodes remain in the course-conict graph. At this point, all courses willhave been assigned to time slots. In our example, after two iterations, the onlyremaining node in the course-conict graph is Math, and this forms the nal maxi-mal independent set, which is assigned to the third time slot. The resulting examschedule is thusTIME SLOT COURSE EXAMS1 Econ, Eng, Phy2 CS3 MathThis algorithm does not necessarily partition the courses among the smallestpossible number of time slots, but it is simple and does tend to produce a schedulewith close to the smallest number of time slots. It is also one that can be readilyprogrammed using the techniques presented in Chapter 9.Notice that this approach abstracts away some details of the problem that maybe important. For example, it could cause one student to have ve exams in veconsecutive time slots. We could create a model that included limits on how manyexams in a row one student could take, but then both the model and the solutionSEC. 1.1 WHAT THIS BOOK IS ABOUT 3Abstraction: Not to Be FearedThe reader may cringe at the word abstraction, because we all have the intu-ition that abstract things are hard to understand; for example, abstract algebra(the study of groups, rings, and the like) is generally considered harder than thealgebra we learned in high school. However, abstraction in the sense we use it im-plies simplication, the replacement of a complex and detailed real-world situationby an understandable model within which we can solve a problem. That is, weabstract away the details whose eect on the solution to a problem is minimalor nonexistent, thereby creating a model that lets us deal with the essence of theproblem.to the exam-scheduling problem would be more complicated.Often, nding a good abstraction can be quite dicult because we are forcedto confront the fundamental limitations on the tasks computers can perform andthe speed with which computers can perform those tasks. In the early days of com-puter science, some optimists believed that robots would soon have the prodigiouscapability and versatility of the Star Wars robot C3PO. Since then we have learnedthat in order to have intelligent behavior on the part of a computer (or robot),we need to provide that computer with a model of the world that is essentiallyas detailed as that possessed by humans, including not only facts (Sallys phonenumber is 555-1234), but principles and relationships (If you drop something, itusually falls downward).We have made much progress on this problem of knowledge representation. Knowledgerepresentation We have devised abstractions that can be used to help build programs that docertain kinds of reasoning. One example of such an abstraction is the directedgraph, in which nodes represent entities (the species cat or Fluy) and arrows(called arcs) from one node to another represent relationships (Fluy is a cat,cats are animals, Fluy owns Fluys milk saucer); Figure 1.2 suggests such agraph.Another useful abstraction is formal logic, which allows us to manipulate factsby applying rules of inference, such as If X is a cat and Y is the mother of X, thenY is a cat. Nevertheless, progress on modeling, or abstracting, the real world orsignicant pieces thereof remains a fundamental challenge of computer science, onethat is not likely to be solved completely in the near future.33 33 1.1 What This Book Is AboutThis book will introduce the reader, who is assumed to have a working knowledge ofthe programming language ANSI C, to the principal ideas and concerns of computerscience. The book emphasizes three important problem-solving tools:1. Data models, the abstractions used to describe problems. We have already men-tioned two models: logic and graphs. We shall meet many others throughoutthis book.4 COMPUTER SCIENCE: THE MECHANIZATION OF ABSTRACTIONFluyCatAnimalFluysmilksaucerisisownsFig. 1.2. A graph representing knowledge about Fluy.2. Data structures, the programming-language constructs used to represent datamodels. For example, C provides built-in abstractions, such as structures andpointers, that allow us to construct data structures to represent complex ab-stractions such as graphs.3. Algorithms, the techniques used to obtain solutions by manipulating data asrepresented by the abstractions of a data model, by data structures, or by othermeans.Data ModelsWe meet data models in two contexts. Data models such as the graphs discussed inthe introduction to this chapter are abstractions frequently used to help formulatesolutions to problems. We shall learn about several such data models in this book:trees in Chapter 5, lists in Chapter 6, sets in Chapter 7, relations in Chapter 8,graphs in Chapter 9, nite automata in Chapter 10, grammars in Chapter 11, andlogic in Chapters 12 and 14.Data models are also associated with programming languages and computers.For example, C has a data model that includes abstractions such as characters,integers of several sizes, and oating-point numbers. Integers and oating-pointnumbers in C are only approximations of integers and reals in mathematics becauseof the limited precision of arithmetic available in computers. The C data model alsoincludes types such as structures, pointers, and functions, which we shall discuss inmore detail in Section 1.4.Data StructuresWhen the data model of the language in which we are writing a program lacks abuilt-in representation for the data model of the problem at hand, we must representthe needed data model using the abstractions supported by the language. For thispurpose, we study data structures, which are methods for representing in the datamodel of a programming language abstractions that are not an explicit part ofSEC. 1.1 WHAT THIS BOOK IS ABOUT 5that language. Dierent programming languages may have strikingly dierent datamodels. For example, unlike C, the language Lisp supports trees directly, and thelanguage Prolog has logic built into its data model.AlgorithmsAn algorithm is a precise and unambiguous specication of a sequence of steps thatcan be carried out mechanically. The notation in which an algorithm is expressedcan be any commonly understood language, but in computer science algorithms aremost often expressed formally as programs in a programming language, or in aninformal style as a sequence of programming language constructs intermingled withEnglish language statements. Most likely, you have already encountered several im-portant algorithms while studying programming. For example, there are a numberof algorithms for sorting the elements of an array, that is, putting the elements insmallest-rst order. There are clever searching algorithms such as binary search,which quickly nds a given element in a sorted array by repeatedly dividing in halfthe portion of the array in which the element could appear.These, and many other tricks for solving common problems, are among thetools the computer scientist uses when designing programs. We shall study manysuch techniques in this book, including the important methods for sorting andsearching. In addition, we shall learn what makes one algorithm better than another.Frequently, the running time, or time taken by an algorithm measured as a functionof the size of its input, is one important aspect of the quality of the algorithm;we discuss running time in Chapter 3.Other aspects of algorithms are also important, particularly their simplicity.Ideally, an algorithm should be easy to understand and easy to turn into a work-ing program. Also, the resulting program should be understandable by a personreading the code that implements the algorithm. Unfortunately, our desires for afast algorithm and a simple algorithm are often in conict, and we must choose ouralgorithm wisely.Underlying ThreadsAs we progress through this book, we shall encounter a number of important uni-fying principles. We alert the reader to two of these here:1. Design algebras. In certain elds in which the underlying models have becomewell understood, we can develop notations in which design trade-os can beexpressed and evaluated. Through this understanding, we can develop a theoryof design with which well-engineered systems can be constructed. Propositionallogic, with the associated notation called Boolean algebra that we encounter inChapter 12, is a good example of this kind of design algebra. With it, we candesign ecient circuits for subsystems of the kind found in digital computers.Other examples of algebras found in this book are the algebra of sets in Chapter7, the algebra of relations in Chapter 8, and the algebra of regular expressionsin Chapter 10.6 COMPUTER SCIENCE: THE MECHANIZATION OF ABSTRACTION2. Recursion is such a useful technique for dening concepts and solving problemsthat it deserves special mention. We discuss recursion in detail in Chapter 2and use it throughout the rest of the book. Whenever we need to dene anobject precisely or whenever we need to solve a problem, we should always ask,What does the recursive solution look like? Frequently that solution has asimplicity and eciency that makes it the method of choice.33 33 1.2 What This Chapter Is AboutThe remainder of this chapter sets the stage for the study of computer science. Theprimary concepts that will be covered are3 Data models (Section 1.3)3 The data model of the programming language C (Section 1.4)3 The principal steps in the software-creation process (Section 1.5)We shall give examples of several dierent ways in which abstractions and mod-els appear in computer systems. In particular, we mention the models found inprogramming languages, in certain kinds of systems programs, such as operatingsystems, and in the circuits from which computers are built. Since software is avital component of todays computer systems, we need to understand the software-creation process, the role played by models and algorithms, and the aspects ofsoftware creation that computer science can address only in limited ways.In Section 1.6 there are some conventional denitions that are used in C pro-grams throughout this book.33 33 1.3 Data ModelsAny mathematical concept can be termed a data model. In computer science, adata model normally has two aspects:1. The values that objects can assume. For example, many data models containobjects that have integer values. This aspect of the data model is static; it tellsus what values objects may take. The static part of a programming languagesdata model is often called the type system. Type system2. The operations on the data. For example, we normally apply operations suchas addition to integers. This aspect of the model is dynamic; it tells us theways in which we can change values and create new values.Programming Language Data ModelsEach programming language has its own data model, and these dier from oneanother, often in quite substantial ways. The basic principle under which mostprogramming languages deal with data is that each program has access to boxes,which we can think of as regions of storage. Each box has a type, such as int orchar. We may store in a box any value of the correct type for that box. We oftenrefer to the values that can be stored in boxes as data objects. Data objectSEC. 1.3 DATA MODELS 7We may also name boxes. In general, a name for a box is any expression thatdenotes that box. Often, we think of the names of boxes as the variables of theprogram, but that is not quite right. For example, if x is a variable local to a Namerecursive function F, then there may be many boxes named x, each associated witha dierent call to F. Then the true name of such a box is a combination of x andthe particular call to F.Most of the data types of C are familiar: integers, oating-point numbers,characters, arrays, structures, and pointers. These are all static notions.The operations permitted on data include the usual arithmetic operations onintegers and oating-point numbers, accessing operations for elements of arrays orstructures, and pointer dereferencing, that is, nding the element pointed to by a Dereferencingpointer. These operations are part of the dynamics of the C data model.In a programming course, we would see important data models that are not partof C, such as lists, trees, and graphs. In mathematical terms, a list is a sequence ofn elements, which we shall write as (a1, a2, . . . , an), where a1 is the rst element, a2The list datamodel the second, and so on. Operations on lists include inserting new elements, deletingelements, and concatenating lists (that is, appending one list to the end of another).3 Example 1.1. In C, a list of integers can be represented by a data structurecalled a linked list in which list elements are stored in cells. Lists and their cells canbe dened by a type declaration such astypedef struct CELL *LIST;struct CELL {int element;struct LIST next;};This declaration denes a self-referential structure CELL with two elds. The rstis element, which holds the value of an element of the list and is of type int.The second eld of each CELL is next, which holds a pointer to a cell. Notethat the type LIST is really a pointer to a CELL. Thus, structures of type CELL canbe linked together by their next elds to form what we usually think of as a linkedlist, as suggested in Fig. 1.3. The next eld can be thought of as either a pointerto the next cell or as representing the entire list that follows the cell in which itappears. Similarly, the entire list can be represented by a pointer, of type LIST, tothe rst cell on the list.a1 a2 an Fig. 1.3. A linked list representing the list (a1, a2, . . . , an).Cells are represented by rectangles, the left part of which is the element, andthe right part of which holds a pointer, shown as an arrow to the next cell pointedto. A dot in the box holding a pointer means that the pointer is NULL.1Lists willbe covered in more detail in Chapter 6. 31NULL is a symbolic constant dened in the standard header le stdio.h to be equal to a valuethat cannot be a pointer to anything. We shall use it to have this meaning throughout thebook.8 COMPUTER SCIENCE: THE MECHANIZATION OF ABSTRACTIONData Models Versus Data StructuresDespite their similar names, a list and a linked list are very dierent concepts. Alist is a mathematical abstraction, or data model. A linked list is a data structure. Inparticular, it is the data structure we normally use in C and many similar languagesto represent abstract lists in programs. There are other languages in which it isnot necessary to use a data structure to represent abstract lists. For example, thelist (a1, a2, . . . , an) could be represented directly in the language Lisp and in thelanguage Prolog similarly, as [a1, a2, . . . , an].Data Models of System SoftwareData models are found not only in programming languages but also in operating Operatingsystems systems and applications programs. You are probably familiar with an operatingsystem such as UNIX or MS-DOS (perhaps with Microsoft Windows).2The func-tion of an operating system is to manage and schedule the resources of a computer.The data model for an operating system like UNIX has concepts such as les, di-rectories, and processes./mnt usr bin ann boba1 a2 a3 b1 b2Fig. 1.4. A typical UNIX directory/le structure.1. The data itself is stored in les, which in the UNIX system are strings ofcharacters.Files2. Files are organized into directories, which are collections of les and/or other Directoriesdirectories. The directories and les form a tree with the les at the leaves.3Figure 1.4 suggests the tree that might represent the directory structure of atypical UNIX operating system. Directories are indicated by circles. The rootdirectory / contains directories called mnt, usr, bin, and so on. The directory/usr contains directories ann and bob; directory ann contains three les: a1,a2, and a3.2If you are unfamiliar with operating systems, you can skip the next paragraphs. However,most readers have probably encountered an operating system, perhaps under another name.For example, the Macintosh system is an operating system, although dierent terminologyis used. For example, a directory becomes a folder in Macintosh-ese.3However, links in directories may make it appear that a le or directory is part of severaldierent directories.SEC. 1.3 DATA MODELS 93. Processes are individual executions of programs. Processes take zero or more Processesstreams as input and produce zero or more streams as output. In the UNIXsystem, processes can be combined by pipes, where the output from one process Pipesis fed as input into the next process. The resulting composition of processescan be viewed as a single process with its own input and output.3 Example 1.2. Consider the UNIX command linebc | word | speakThe symbol | indicates a pipe, an operation that makes the output of the processon the left of this symbol be the input to the process on its right. The programbc is a desk calculator that takes arithmetic expressions, such as 2 + 3, as inputand produces the answer 5 as output. The program word translates numbers intowords; speak translates words into phoneme sequences, which are then uttered overa loudspeaker by a voice synthesizer. Connecting these three programs togetherby pipes turns this UNIX command line into a single process that behaves like atalking desk calculator. It takes as input arithmetic expressions and produces asoutput the spoken answers. This example also suggests that a complex task maybe implemented more easily as the composition of several simpler functions. 3There are many other aspects to an operating system, such as how it managessecurity of data and interaction with the user. However, even these few observationsshould make it apparent that the data model of an operating system is ratherdierent from the data model of a programming language.Another type of data model is found in text editors. Every text editors datamodel incorporates a notion of text strings and editing operations on text. The Text editorsdata model usually includes the notion of lines, which, like most les, are characterstrings. However, unlike les, lines may have associated line numbers. Lines mayalso be organized into larger units such as paragraphs, and operations on lines arenormally applicable anywhere within the line not just at the front, like the mostcommon le operations. The typical editor supports a notion of a current line(where the cursor is) and probably a current position within that line. Operationsperformed by the editor include various modications to lines, such as deletion orinsertion of characters within the line, deletion of lines, and creation of new lines.It is also possible in typical editors to search for features, such as specic characterstrings, among the lines of the le being edited.In fact, if you examine any other familiar piece of software, such as a spread-sheet or a video game, a pattern emerges. Each program that is designed to beused by others has its own data model, within which the user must work. Thedata models we meet are often radically dierent from one another, both in theprimitives they use to represent data and in the operations on that data that areoered to the user. Yet each data model is implemented, via data structures andthe programs that use them, in some programming language.The Data Model of CircuitsWe shall also meet in this book a data model for computer circuits. This model,called propositional logic, is most useful in the design of computers. Computersare composed of elementary components called gates. Each gate has one or more10 COMPUTER SCIENCE: THE MECHANIZATION OF ABSTRACTIONinputs and one output; the value of an input or output can be only 0 or 1. Agate performs a simple function such as AND, where the output is 1 if all theinputs are 1 and the output is 0 if one or more of the inputs are 0. At one levelof abstraction, computer design is the process of deciding how to connect gatesto perform the basic operations of a computer. There are many other levels ofabstraction associated with computer design as well.Figure 1.5 shows the usual symbol for an AND-gate, together with its truth table,which indicates the output value of the gate for each pair of input values.4Wediscuss truth tables in Chapter 12 and gates and their interconnections in Chapter13.xyAND zx0011y0101z0001Fig. 1.5. An AND-gate and its truth table.3 Example 1.3. To execute the C assignment statement a = b+c, a computerperforms the addition with an adder circuit. In the computer, all numbers arerepresented in binary notation using the two digits 0 and 1 (called binary digits, orbits for short). The familiar algorithm for adding decimal numbers, where we add Bitthe digits at the right end, generate a carry to the next place to the left, add thatcarry and the digits at that place, generate a carry to the next place to the left,and so on, works in binary as well.d czx yFig. 1.6. A one-bit adder: dz is the sum x + y + c.Out of a few gates, we can build a one-bit adder circuit, as suggested in Fig. One-bit adder1.6. Two input bits, x and y, and a carry-in bit c, are summed, resulting in a sumbit z and a carry-out bit d. To be precise, d is 1 if two or more of c, x, and y are 1,while z is 1 if an odd number (one or three) of c, x, and y are 1, as suggested by4Note that if we think of 1 as true and 0 as false, then the AND-gate performs the samelogical operation as the && operator of C.SEC. 1.3 DATA MODELS 11x y c d z0 0 0 0 00 0 1 0 10 1 0 0 10 1 1 1 01 0 0 0 11 0 1 1 01 1 0 1 01 1 1 1 1Fig. 1.7. Truth table for the one-bit adder.The Ripple-Carry Addition AlgorithmWe all have used the ripple-carry algorithm to add numbers in decimal. To add456 + 829, for example, one performs the steps suggested below:14 5 68 2 9504 5 68 2 98 54 5 68 2 91 2 8 5That is, at the rst step, we add the rightmost digits, 6+9 = 15. We write down the5 and carry the 1 to the second column. At the second step, we add the carry-in, 1,and the two digits in the second place from the right, to get 1+5+2 = 8. We writedown the 8, and the carry is 0. In the third step, we add the carry-in, 0, and thedigits in the third place from the right, to get 0 + 4 + 8 = 12. We write down the2, but since we are at the leftmost place, we do not carry the 1, but rather write itdown as the leftmost digit of the answer.Binary ripple-carry addition works the same way. However, at each place, thecarry and the digits being added are all either 0 or 1. The one-bit adder thusdescribes completely the addition table for a single place. That is, if all three bitsare 0, then the sum is 0, and so we write down 0 and carry 0. If one of the three is1, the sum is 1; we write down 1 and carry 0. If two of the three are 1, the sum is2, or 10 in binary; we write down 0 and carry 1. If all three are 1, then the sum is3, or 11 in binary, and so we write down 1 and carry 1. For example, to add 101 to111 using binary ripple-carry addition, the steps are11 0 11 1 1011 0 11 1 10 01 0 11 1 11 1 0 0the table of Fig. 1.7. The carry-out bit followed by the sum bit that is, dz forms a two-bit binary number, which is the total number of x, y, and c that are 1.In this sense, the one-bit adder adds its inputs.Many computers represent integers as 32-bit numbers. An adder circuit can12 COMPUTER SCIENCE: THE MECHANIZATION OF ABSTRACTIONthen be composed of 32 one-bit adders, as suggested in Fig. 1.8. This circuit is oftencalled a ripple-carry adder, because the carry ripples from right to left, one bit at a Ripple-carryadder time. Note that the carry into the rightmost (low-order bit) one-bit adder is always0. The sequence of bits x31x30 x0 represents the bits of the rst number beingadded, and y31y30 y0 is the second addend. The sum is dz31z30 z0; that is, theleading bit is the carry-out of the leftmost one-bit adder, and the following bits ofthe sum are the sum bits of the adders, from the left.0z0x0y0 z30x30y30z31x31y31dFig. 1.8. A ripple-carry adder: dz31z30 z0 = x31x30 x0 + y31y30 y0.The circuit of Fig. 1.8 is really an algorithm in the data model of bits and theprimitive operations of gates. However, it is not a particularly good algorithm. Thereason is that until we compute the carry-out of the rightmost place, we cannotcompute z1 or the carry-out of the second place. Until we compute the carry-out ofthe second place, we cannot compute z2 or the carry-out of the third place, and soon. Thus, the time taken by the circuit is the length of the numbers being added 32 in our case multiplied by the time needed by a one-bit adder.One might suspect that the need to ripple the carry through each of the one-bit adders, in turn, is inherent in the denition of addition. Thus, it may come asa surprise to the reader that computers have a much faster way of adding numbers.We shall cover such an improved algorithm for addition when we discuss the designof circuits in Chapter 13. 3EXERCISES1.3.1: Explain the dierence between the static and dynamic aspects of a datamodel.1.3.2: Describe the data model of your favorite video game. Distinguish betweenstatic and dynamic aspects of the model. Hint : The static parts are not just theparts of the game board that do not move. For example, in Pac Man, the staticpart includes not only the map, but the power pills, monsters, and so on.1.3.3: Describe the data model of your favorite text editor.1.3.4: Describe the data model of a spreadsheet program.SEC. 1.4 THE C DATA MODEL 1333 33 1.4 The C Data ModelIn this section we shall highlight important parts of the data model used by the Cprogramming language. As an example of a C program, consider the program inFig. 1.10 that uses the variable num to count the number of characters in its input.#include main(){int num;num = 0;while (getchar() != EOF)++num; /* add 1 to num */printf("%d\n", num);}Fig. 1.10. C program to count number of input characters.The rst line tells the C preprocessor to include as part of the source thestandard input/output le stdio.h, which contains the denitions of the functionsgetchar and printf, and the symbolic constant EOF, a value that represents theend of a le.A C program itself consists of a sequence of denitions, which can be eitherfunction denitions or data denitions. One must be a denition of a functioncalled main. The rst statement in the function body of the program in Fig. 1.10declares the variable num to be of type int. (All variables in a C program mustbe declared before their use.) The next statement initializes num to zero. Thefollowing while statement reads input characters one at a time using the libraryfunction getchar, incrementing num after each character read, until there are nomore input characters. The end of le is signaled by the special value EOF on the EOFinput. The printf statement prints the value of num as a decimal integer, followedby a newline character.The C Type SystemWe begin with the static part of the C data model, the type system, which describesthe values that data may have. We then discuss the dynamics of the C data model,that is, the operations that may be performed on data.In C, there is an innite set of types, any of which could be the type associatedwith a particular variable. These types, and the rules by which they are constructed,form the type system of C. The type system contains basic types such as integers,and a collection of type-formation rules with which we can construct progressivelymore complex types from types we already know. The basic types of C are1. Characters (char, signed char, unsigned char)2. Integers (int, short int, long int, unsigned)3. Floating-point numbers (float, double, long double)4. Enumerations (enum)14 COMPUTER SCIENCE: THE MECHANIZATION OF ABSTRACTIONIntegers and oating-point numbers are considered to be arithmetic types.The type-formation rules assume that we already have some types, which couldbe basic types or other types that we have already constructed using these rules.Here are some examples of the type formation rules in C:1. Array types. We can form an array whose elements are type T with the decla-rationT A[n]This statement declares an array A of n elements, each of type T. In C, arraysubscripts begin at 0, so the rst element is A[0] and the last element isA[n1]. Arrays can be constructed from characters, arithmetic types, pointers,structures, unions, or other arrays.2. Structure types. In C, a structure is a grouping of variables called members or Members of astructure elds. Within a structure dierent members can have dierent types, but eachmember must have elements of a single type. If T1, T2, . . . , Tn are types andM1, M2, . . . , Mn are member names, then the declarationstruct S {T1 M1;T2 M2; Tn Mn;}denes a structure whose tag (i.e., the name of its type) is S and that hasn members. The ith member has the name Mi and a value of type Ti, fori = 1, 2, . . . , n. Example 1.1 is an illustration of a structure. This structurehas tag CELL and two members. The rst member has name element and hasinteger type. The second has name next and its type is a pointer to a structureof the same type.The structure tag S is optional, but it provides a convenient shorthand forreferring to the type in later declarations. For example, the declarationstruct S myRecord;denes the variable myRecord to be a structure of type S.3. Union types. A union type allows a variable to have dierent types at dierenttimes during the execution of a program. The declarationunion {T1 M1;T2 M2; Tn Mn;} x;denes a variable x that can hold a value of any of the types T1, T2, . . . , Tn.The member names M1, M2, . . . , Mn help indicate which type the value of xshould be regarded as being. That is, x.Mi refers to the value of x treated asa value of type Ti.SEC. 1.4 THE C DATA MODEL 154. Pointer types. C is distinctive for its reliance on pointers. A variable of typepointer contains the address of a region of storage. We can access the value ofanother variable indirectly through a pointer. The declarationT *p;denes the variable p to be a pointer to a variable of type T. Thus p names abox of type pointer to T and the value in box p is a pointer. We often draw thevalue of p as an arrow, rather than as an object of type T itself, as shown inFig. 1.11. What really appears in the box named p is the address, or location,at which an object of type T is stored in the computer.Consider the declarationint x, *p;In C, the unary operator & is used to obtain the address of an object, so thestatementp = &x;assigns the address of x to p; that is, it makes p point to x.The unary operator * applied to p fetches the value of the box pointed to byp, so the statementy = *p;assigns to y the contents of whatever box p points to. If y is a variable of typeint, thenp = &x;y = *p;is equivalent to the assignmenty = x; p Object oftype TFig. 1.11. Variable p is of type pointer to T.3 Example 1.4. C has the typedef construct to create synonyms for type names.The declarationtypedef int Distance;16 COMPUTER SCIENCE: THE MECHANIZATION OF ABSTRACTIONtypedef int type1[10];typedef type1 *type2;typedef struct {int field1;type2 field2;} type3;typedef type3 type4[5];Fig. 1.12. Some C typedef declarations.09 field1field23(a) (b) (c) field1field2field1field23701234(d)Fig. 1.13. Visualization of type declarations in Fig. 1.12.SEC. 1.4 THE C DATA MODEL 17Types, Names, Variables, and IdentiersA number of terms associated with data objects have dierent meanings but areeasy to confuse. First, a type describes a shape for data objects. In C, a newname T may be dened for an existing type using the typedef constructtypedef THere the type descriptor is an expression that tells us the shape of objects of the Type descriptortype T.A typedef declaration for T does not actually create any objects of that type.An object of type T is created by a declaration of the formT x;Here, x is an identier, or variable name. Possibly, x is static (not local to anyfunction), in which case the box for x is created when the program starts. If x isnot static, then it is local to some function F. When F is called, a box whose nameis the x associated with this call to F is created. More precisely, the name of thebox is x, but only uses of the identier x during the execution of this call to F referto this box.As mentioned in the text, there can be many boxes each of whose name involvesthe identier x, since F may be recursive. There may even be other functions thatalso have used identier x to name one of their variables. Moreover, names aremore general than identiers, since there are many kinds of expressions that couldbe used to name boxes. For instance, we mentioned that *p could be the name ofan object pointed to by pointer p, and other names could be complex expressionssuch as (*p).f[2] or p->f[2]. The last two expressions are equivalent and referto the array element number 2 of the eld f of the structure pointed to by pointerp.allows the name Distance to be used in place of the type int.Consider the four typedef declarations in Fig. 1.12. In the conventional viewof data in C, an object of type type1 is an array with 10 slots, each holding aninteger, as suggested in Fig. 1.13(a). Likewise, objects of type type2 are pointers tosuch arrays, as in Fig. 1.13(b). Structures, like those of type3, are visualized as inFig. 1.13(c), with a slot for each eld; note that the name of the eld (e.g., field1)does not actually appear with the value of the eld. Finally, objects of the arraytype type4 would have ve slots, each of which holds an object of type type3, astructure we suggest in Fig. 1.13(d). 3FunctionsFunctions also have associated types, even though we do not associate boxes orvalues with functions, as we do with program variables. For any list of typesT1, T2, . . . , Tn, we can dene a function with n parameters consisting of those types,in order. This list of types followed by the type of the value returned by the function(the return-value) is the type of the function. If the function has no return value, Return-valueits type is void.In general, we can build types by applying the type-construction rules arbi-trarily, but there are a number of constraints. For example, we cannot construct an18 COMPUTER SCIENCE: THE MECHANIZATION OF ABSTRACTIONarray of functions, although we can construct an array of pointers to functions.The complete set of rules for constructing types in C can be found in the ANSIstandard.Operations in the C Data ModelThe operations on data in the C data model can be divided into three categories:1. Operations that create or destroy a data object.2. Operations that access and modify parts of a data object.3. Operations that combine the values of data objects to form a new value for adata object.Data Object Creation and DisposalFor data creation, C provides several rudimentary mechanisms. When a function iscalled, boxes for each of its local arguments (parameters) are created; these serveto hold the values of the arguments.Another mechanism for data creation is through the use of the library routinemalloc(n), which returns a pointer to n consecutive character positions of unusedstorage that can be used to store data by the caller of malloc. Data objects canthen be created in this storage region.C has the analogous methods for destroying data objects. Local parameters ofa function call cease to exist when the function returns. The routine free releasesthe storage created by malloc. In particular, the eect of calling free(p) is torelease the storage area pointed to by p. It is disastrous to use free to get rid ofan object that was not created by calling malloc.Data Access and ModicationC has mechanisms for accessing the components of objects. It uses a[i] to accessthe ith element of array a, x.m to access member m of a structure named x, and *pto access the object pointed to by pointer p.Modifying, or writing, values in C is done principally by the assignment oper-ators, which allow us to change the value of an object.3 Example 1.5. If a is a variable of type type4 dened in Example 1.4, thena[0].(*field2[3]) = 99;assigns the value 99 to the fourth element of the array pointed to by field2 in thestructure that is the rst element of the array a. 3Data CombinationC has a rich set of operators for manipulating and combining values. The principaloperators are1. Arithmetic operators. C provides:SEC. 1.4 THE C DATA MODEL 19a) The customary binary arithmetic operators +, , , / on integers andoating-point numbers. Integer division truncates (4/3 yields 1).b) There are the unary + and operators.c) The modulus operator i % j produces the remainder when i is dividedby j.d) The increment and decrement operators, ++ and --, applied to a singleinteger variable add or subtract 1 from that variable, respectively. Theseoperators can appear before or after their operand, depending on whetherwe wish the value of the expression to be computed before or after thechange in the variables value.2. Logical operators. C does not have a Boolean type; it uses zero to representthe logical value false, and nonzero to represent true.5C uses:a) && to represent AND. For example, the expression x && y returns 1 if bothoperands are nonzero, 0 otherwise. However, y is not evaluated if x hasthe value 0.b) || represents OR. The expression x || y returns 1 if either x or y isnonzero, and returns 0 otherwise. However, y is not evaluated if x isnonzero.c) The unary negation operator !x returns 0 if x is nonzero and returns 1 ifx = 0.d) The conditional operator is a ternary (3-argument) operator representedby a question mark and a colon. The expression x?y:z returns the valueof y if x is true (i.e., it is nonzero) and returns the value of z if x is false(i.e., 0).3. Comparison operators. The result of applying one of the six relational compar-ison operators (==, !=, , =) to integers or oating point numbersis 0 if the relation is false and 1 otherwise.4. Bitwise manipulation operators. C provides several useful bitwise logical op-erators, which treat integers as if they were bits strings equal to their binaryrepresentations. These include & for bitwise AND, | for bitwise inclusive-or, ^for bitwise exclusive-or, > for right shift, and a tilde for leftshift.5. Assignment operators. C uses = as the assignment operator. In addition, Callows expressions such asx = x + y;to be written in a shortened formx += y;Similar forms apply to the other binary arithmetic operators.5We shall use TRUE and FALSE as dened constants 1 and 0, respectively, to represent Booleanvalues; see Section 1.6.20 COMPUTER SCIENCE: THE MECHANIZATION OF ABSTRACTION6. Coercion operators. Coercion is the process of converting a value of one typeinto an equivalent value of another type. For example, if x is a oating-pointnumber and i is an integer, then x = i causes the integer value of i to beconverted to a oating-point number with the same value. Here, the coercionoperator is not shown explicitly, but the C compiler can deduce that conversionfrom integer to oat is necessary and inserts the required step.EXERCISES1.4.1: Explain the dierence between an identier of a C program and a name (fora box or data object).1.4.2: Give an example of a C data object that has more than one name.1.4.3: If you are familiar with another programming language besides C, describeits type system and operations.33 33 1.5 Algorithms and the Design of ProgramsThe study of data models, their properties, and their appropriate use is one pillar ofcomputer science. A second, equally important pillar is the study of algorithms andtheir associated data structures. We need to know the best ways to perform commontasks, and we need to learn the principal techniques for designing good algorithms.Further, we need to understand how the use of data structures and algorithms tsinto the process of creating useful programs. The themes of data models, algorithms,data structures, and their implementation in programs are interdependent, and eachappears many times throughout the book. In this section, we shall mention somegeneralities regarding the design and implementation of programs.The Creation of SoftwareIn a programming class, when you were given a programming problem, you probablydesigned an algorithm to solve the problem, implemented the algorithm in somelanguage, compiled and ran the program on some sample data, and then submittedthe program to be graded.In a commercial setting, programs are written under rather dierent circum-stances. Algorithms, at least those simple enough and common enough to havenames, are usually small parts of a complete program. Programs, in turn, are usu-ally components of a larger system, involving hardware as well as software. Boththe programs and the complete systems in which they are embedded are developedby teams of programmers and engineers; there could be hundreds of people on sucha team.The development of a software system typically spans several phases. Although Softwaredevelopmentprocessthese phases may supercially bear some resemblance to the steps involved in solvingthe classroom programming assignment, most of the eort in building a softwaresystem to solve a given problem is not concerned with programming. Here is anidealized scenario.SEC. 1.5 ALGORITHMS AND THE DESIGN OF PROGRAMS 21Problem denition and specication. The hardest, but most important, part ofthe task of creating a software system is dening what the problem really is andthen specifying what is needed to solve it. Usually, problem denition begins byanalyzing the users requirements, but these requirements are often imprecise andhard to write down. The system architect may have to consult with the future usersof the system and iterate the specication, until both the specier and the usersare satised that the specication denes and solves the problem at hand. In thespecication stage, it may be helpful to build a simple prototype or model of the Prototypingnal system, to gain insight into its behavior and intended use. Data modeling isalso an important tool in the problem-denition phase.Design. Once the specication is complete, a high-level design of the system iscreated, with the major components identied. A document outlining the high-leveldesign is prepared, and performance requirements for the system may be included.More detailed specications of some of the major components may also be includedduring this phase. A cost-eective design often calls for the reuse or modication of Software reusepreviously constructed components. Various software methodologies such as object-oriented technology facilitate the reuse of components.Implementation. Once the design is xed, implementation of the components canproceed. Many of the algorithms discussed in this book are useful in the implemen-tation of new components. Once a component has been implemented, it is subjectto a series of tests to make sure that it behaves as specied.Integration and system testing. When the components have been implemented andindividually tested, the entire system is assembled and tested.Installation and eld testing. Once the developer is satised that the systemwill work to the customers satisfaction, the system is installed on the customerspremises and the nal eld testing takes place.Maintenance. At this point, we might think that the bulk of the work has beendone. Maintenance remains, however, and in many situations maintenance canaccount for more than half the cost of system development. Maintenance mayinvolve modifying components to eliminate unforeseen side-eects, to correct orimprove system performance, or to add features. Because maintenance is such animportant part of software systems design, it is important to write programs thatare correct, rugged, ecient, modiable, and whenever possible portable fromone computer to another.It is very important to catch errors as early as possible, preferably during theproblem-denition phase. At each successive phase, the cost of xing a design erroror programming bug rises greatly. Independent reviews of requirements and designsare benecial in reducing downstream errors.Programming StyleAn individual programmer can ease the maintenance burden greatly by writingprograms that others can read and modify readily. Good programming style comesonly with practice, and we recommend that you begin at once to try writing pro-grams that are easy for others to understand. There is no magic formula that willguarantee readable programs, but there are several useful rules of thumb:22 COMPUTER SCIENCE: THE MECHANIZATION OF ABSTRACTION1. Modularize a program into coherent pieces.2. Lay out a program so that its structure is clear.3. Write intelligent comments to explain a program. Describe, clearly and pre-cisely, the underlying data models, the data structures selected to representthem, and the operation performed by each procedure. When describing a pro-cedure, state the assumptions made about its inputs, and tell how the outputrelates to the input.4. Use meaningful names for procedures and variables.5. Avoid explicit constants whenever possible. For example, do not use 7 for the Denedconstants number of dwarfs. Rather, use a dened constant such as NumberOfDwarfs, sothat you can easily change all uses of this constant to 8, if you decide to addanother dwarf.6. Avoid the use of global variables that is, variables dened for the program Global variablesas a whole unless the data represented by that variable really is used by mostof the procedures of the program.Another good programming practice is to maintain a test suite of inputs that Test suitewill try to exercise every line of code while you are developing a program. Whenevernew features are added to the program, the test suite can be run to make sure thatthe new program has the same behavior as the old on previously working inputs.33 33 1.6 Some C Conventions Used Throughout the BookThere are several denitions and conventions that we shall nd useful as we illustrateconcepts with C programs. Some of these are common conventions found in thestandard header le stdio.h, while others are dened specially for the purposes ofthis book and must be included with any C program that uses them.1. The identier NULL is a value that may appear anywhere a pointer can appear, NULLbut it is not a value that can ever point to anything. Thus, NULL in a eld suchas next in the cells of Example 1.1 can be used to indicate the end of a list.We shall see that NULL has a number of similar uses in other data structures.NULL is properly dened in stdio.h.2. The identiers TRUE and FALSE are dened by#define TRUE 1#define FALSE 0Thus, TRUE can be used anywhere a condition with logical value true is wanted, TRUE and FALSEand FALSE can be used for a condition whose value is false.3. The type BOOLEAN is dened as BOOLEANtypedef int BOOLEAN;We use BOOLEAN whenever we want to stress the fact that we are interested inthe logical rather than the numeric value of an expression.SEC. 1.7 SUMMARY OF CHAPTER 1 234. The identier EOF is a value that is returned by le-reading functions such as EOFgetchar() when there are no more bytes left to be read from the le. Anappropriate value for EOF is provided in stdio.h.5. We shall dene a macro that generates declarations of cells of the kind used inExample 1.1. An appropriate denition appears in Fig. 1.14. It declares cells Cell denitionwith two elds: an element eld whose type is given by the parameter Typeand a next eld to point to a cell with this structure. The macro providestwo external denitions: CellName is the name of structures of this type, andListName is a name for the type of pointers to these cells.#define DefCell(EltType, CellType, ListType) \typedef struct CellType *ListType; \struct CellType { \EltType element; \ListType next; \}Fig. 1.14. A macro for dening list cells.3 Example 1.6. We can dene cells of the type used in Example 1.1 by the macrouseDefCell(int, CELL, LIST);The macro then expands intotypedef struct CELL, LIST;struct CELL {int element;LIST next;}As a consequence, we can use CELL as the type of integer cells, and we can use LISTas the type of pointers to these cells. For example,CELL c;LIST L;denes c to be a cell and L to be a pointer to a cell. Note that the representationof a list of cells is normally a pointer to the rst cell on the list, or NULL if the listis empty. 333 33 1.7 Summary of Chapter 1At this point you should be aware of the following concepts:3 How data models, data structures, and algorithms are used to solve problems24 COMPUTER SCIENCE: THE MECHANIZATION OF ABSTRACTION3 The distinction between a list as a data model and a linked list as a datastructure3 The presence of some kind of data model in every software system, be it aprogramming language, an operating system, or an application program3 The key elements of the data model supported by the programming languageC3 The major steps in the development of a large software system33 33 1.8 Bibliographic Notes for Chapter 1Kernighan and Ritchie [1988] is the classic reference for the C programming lan-guage. Roberts [1994] is a good introduction to programming using C.Stroustrup [1991] has created an object-oriented extension of C called C++that is now widely used for implementing systems. Sethi [1989] provides an intro-duction to the data models of several major programming languages.Brooks [1974] eloquently describes the technical and managerial diculties indeveloping large software systems. Kernighan and Plauger [1978] provide soundadvice for improving your programming style.American National Standards Institute (ANSI) [1990]. Programming Language C,American National Standards Institute, New York.Brooks, F. P. [1974]. The Mythical Man Month, Addison-Wesley, Reading, Mass.Kernighan, B. W., and P. J. Plauger [1978]. The Elements of Programming Style,second edition, McGraw-Hill, New York.Kernighan, B. W., and D. M. Ritchie [1988]. The C Programming Language, secondedition, Prentice-Hall, Englewood Clis, New Jersey.Roberts, E. S. [1994]. A C-Based Introduction to Computer Science, Addison-Wesley, Reading, Mass.Sethi, R. [1989]. Programming Languages: Concepts and Constructs, Addison-Wesley, Reading, Mass.Stroustrup, B. [1991]. The C++ Programming Language, second edition, Addison-Wesley, Reading, Mass.CHAPTER 233 33Iteration,Induction,andRecursionThe power of computers comes from their ability to execute the same task, ordierent versions of the same task, repeatedly. In computing, the theme of iterationis met in a number of guises. Many concepts in data models, such as lists, are formsof repetition, as A list either is empty or is one element followed by another, thenanother, and so on. Programs and algorithms use iteration to perform repetitivejobs without requiring a large number of similar steps to be specied individually,as Do the next step 1000 times. Programming languages use looping constructs,like the while- and for-statements of C, to implement iterative algorithms.Closely related to repetition is recursion, a technique in which a concept isdened, directly or indirectly, in terms of itself. For example, we could have deneda list by saying A list either is empty or is an element followed by a list. Recursionis supported by many programming languages. In C, a function F can call itself,either directly from within the body of F itself, or indirectly by calling some otherfunction, which calls another, and another, and so on, until nally some functionin the sequence calls F. Another important idea, induction, is closely related torecursion and is used in many mathematical proofs.Iteration, induction, and recursion are fundamental concepts that appear inmany forms in data models, data structures, and algorithms. The following listgives some examples of uses of these concepts; each will be covered in some detailin this book.1. Iterative techniques. The simplest way to perform a sequence of operationsrepeatedly is to use an iterative construct such as the for-statement of C.2. Recursive programming. C and many other languages permit recursive func-tions, which call themselves either directly or indirectly. Often, beginning pro-grammers are more secure writing iterative programs than recursive ones, butan important goal of this book is to accustom the reader to thinking and pro-gramming recursively, when appropriate. Recursive programs can be simplerto write, analyze, and understand.2526 ITERATION, INDUCTION, AND RECURSIONNotation: The Summation and Product SymbolsAn oversized Greek capital letter sigma is often used to denote a summation, asin

ni=1 i. This particular expression represents the sum of the integers from 1 ton; that is, it stands for the sum 1 + 2 + 3 + + n. More generally, we can sumany function f(i) of the summation index i. (Of course, the index could be somesymbol other than i.) The expression

bi=a f(i) stands forf(a) + f(a + 1) + f(a + 2) + + f(b)For example,

mj=2 j2stands for the sum 4 +9 +16 + +m2. Here, the functionf is squaring, and we used index j instead of i.As a special case, if b < a, then there are no terms in the sum

bi=a f(i), andthe value of the expression, by convention, is taken to be 0. If b = a, then there isexactly one term, that for i = a. Thus, the value of the sum

ai=a f(i) is just f(a).The analogous notation for products uses an oversized capital pi. The expres-sion

bi=a f(i) stands for the product f(a) f(a + 1) f(a + 2) f(b); ifb < a, the product is taken to be 1.3. Proofs by induction. An important technique for showing that a statementis true is proof by induction. We shall cover inductive proofs extensively,starting in Section 2.3. The following is the simplest form of an inductiveproof. We begin with a statement S(n) involving a variable n; we wish toprove that S(n) is true. We prove S(n) by rst proving a basis, that is, the Basisstatement S(n) for a particular value of n. For example, we could let n = 0 andprove the statement S(0). Second, we must prove an inductive step, in which Inductive stepwe prove that the statement S, for one value of its argument, follows from thesame statement S for the previous values of its argument; that is, S(n) impliesS(n + 1) for all n 0. For example, S(n) might be the familiar summationformulan

i=1i = n(n + 1)/2 (2.1)which says that the sum of the integers from 1 to n equals n(n+1)/2. The basiscould be S(1) that is, Equation (2.1) with 1 in place of n which is just theequality 1 = 1 2/2. The inductive step is to show that

ni=1 i = n(n + 1)/2implies that

n+1i=1 i = (n +1)(n + 2)/2; the former is S(n), which is Equation(2.1) itself, while the latter is S(n + 1), which is Equation (2.1) with n + 1replacing n everywhere n appears. Section 2.3 will show us how to constructproofs such as this.4. Proofs of program correctness. In computer science, we often wish to prove,formally or informally, that a statement S(n) about a program is true. Thestatement S(n) might, for example, describe what is true on the nth iterationof some loop or what is true for the nth recursive call to some function. Proofsof this sort are generally inductive proofs.5. Inductive denitions. Many important concepts of computer science, especiallythose involving data models, are best dened by an induction in which we giveSEC. 2.2 ITERATION 27a basis rule dening the simplest example or examples of the concept, and aninductive rule or rules, where we build larger instances of the concept fromsmaller ones. For instance, we noted that a list can be dened by a basis rule(an empty list is a list) together with an inductive rule (an element followedby a list is also a list).6. Analysis of running time. An important criterion for the goodness of analgorithm is how long it takes to run on inputs of various sizes (its runningtime). When the algorithm involves recursion, we use a formula called arecurrence equation, which is an inductive denition that predicts how long thealgorithm takes to run on inputs of dierent sizes.Each of these subjects, except the last, is introduced in this chapter; the runningtime of a program is the topic of Chapter 3.33 33