Applied Mathematics: Body & Soul Vol I–III

Applied Mathematics: Body & Soul

Vol I–III

Kenneth Eriksson, Don Estep and Claes Johnson

July 29, 2003

To the students of Chemical Engineering at Chalmers during1998–2002, who enthusiastically participated in thedevelopment of the reform project behind this book.

Preface

I admit that each and every thing remains in its state until there is

reason for change. (Leibniz)

The Need of Reform of Mathematics Education

Mathematics education needs to be reformed as we now pass into the newmillennium. We share this conviction with a rapidly increasing number ofresearchers and teachers of both mathematics and topics of science andengineering based on mathematical modeling. The reason is of course thecomputer revolution, which has fundamentally changed the possibilities ofusing mathematical and computational techniques for modeling, simula-tion and control of real phenomena. New products and systems may bedeveloped and tested through computer simulation on time scales and atcosts which are orders of magnitude smaller than those using traditionaltechniques based on extensive laboratory testing, hand calculations andtrial and error.

At the heart of the new simulation techniques lie the new fields ofComputational Mathematical Modeling (CMM), including ComputationalMechanics, Physics, Fluid Dynamics, Electromagnetics and Chemistry, allbased on solving systems of differential equations using computers, com-bined with geometric modeling/Computer Aided Design (CAD). Compu-tational modeling is also finding revolutionary new applications in biology,medicine, environmental sciences, economy and financial markets.

VIII Preface

Education in mathematics forms the basis of science and engineeringeducation from undergraduate to graduate level, because engineering andscience are largely based on mathematical modeling. The level and thequality of mathematics education sets the level of the education as a whole.The new technology of CMM/CAD crosses borders between traditionalengineering disciplines and schools, and drives strong forces to modernizeengineering education in both content and form from basic to graduatelevel.

Our Reform Program

Our own reform work started some 20 years ago in courses in CMM atadvanced undergraduate level, and has through the years successively pen-etrated through the system to the basic education in calculus and linearalgebra. Our aim has become to develop a complete program for mathe-matics education in science and engineering from basic undergraduate tograduate education. As of now our program contains the series of books:

1. Computational Differential Equations, (CDE)

2. Applied Mathematics: Body & Soul I–III, (AM I–III)

3. Applied Mathematics: Body & Soul VI–, (AM IV–).

AM I–III is the present book in three volumes I–III covering the basicsof calculus and linear algebra. AM IV– offers a continuation with a seriesof volumes dedicated to specific areas of applications such as DynamicalSystems (IV), Fluid Mechanics (V), Solid Mechanics (VI) and Electromag-netics (VII), which will start appearing in 2003. CDE published in 1996may be be viewed as a first version of the whole Applied Mathematics: Body& Soul project.

Our program also contains a variety of software (collected in the Math-ematics Laboratory), and complementary material with step-by step in-structions for self-study, problems with solutions, and projects, all freelyavailable on-line from the web site of the book. Our ambition is to offera “box” containing a set of books, software and additional instructional ma-terial, which can serve as a basis for a full applied mathematics programin science and engineering from basic to graduate level. Of course, we hopethis to be an on-going project with new material being added gradually.

We have been running an applied mathematics program based on AMI–III from first year for the students of chemical engineering at Chalmerssince the Fall 99, and we have used parts of the material from AM IV– inadvanced undergraduate/beginning graduate courses.

Preface IX

Main Features of the Program:

The program is based on a synthesis of mathematics, computationand application.

The program is based on new literature, giving a new unified presen-tation from the start based on constructive mathematical methodsincluding a computational methodology for differential equations.

The program contains, as an integrated part, software at differentlevels of complexity.

The student acquires solid skills of implementing computational meth-ods and developing applications and software using Matlab.

The synthesis of mathematics and computation opens mathematicseducation to applications, and gives a basis for the effective use ofmodern mathematical methods in mechanics, physics, chemistry andapplied subjects.

The synthesis building on constructive mathematics gives a synergeticeffect allowing the study of complex systems already in the basic ed-ucation, including the basic models of mechanical systems, heat con-duction, wave propagation, elasticity, fluid flow, electro-magnetism,reaction-diffusion, molecular dynamics, as well as corresponding multi-physics problems.

The program increases the motivation of the student by applyingmathematical methods to interesting and important concrete prob-lems already from the start.

Emphasis may be put on problem solving, project work and presen-tation.

The program gives theoretical and computational tools and buildsconfidence.

The program contains most of the traditional material from basiccourses in analysis and linear algebra

The program includes much material often left out in traditional pro-grams such as constructive proofs of all the basic theorems in analysisand linear algebra and advanced topics such as nonlinear systems ofalgebraic/differential equations.

Emphasis is put on giving the student a solid understanding of basicmathematical concepts such as real numbers, Cauchy sequences, Lips-chitz continuity, and constructive tools for solving algebraic/differen-tial equations, together with an ability to utilize these tools in ad-vanced applications such as molecular dynamics.

X Preface

The program may be run at different levels of ambition concerningboth mathematical analysis and computation, while keeping a com-mon basic core.

AM I–III in Brief

Roughly speaking, AM I–III contains a synthesis of calculus and linearalgebra including computational methods and a variety of applications.Emphasis is put on constructive/computational methods with the doubleaim of making the mathematics both understandable and useful. Our am-bition is to introduce the student early (from the perspective of traditionaleducation) to both advanced mathematical concepts (such as Lipschitzcontinuity, Cauchy sequence, contraction mapping, initial-value problemfor systems of differential equations) and advanced applications such asLagrangian mechanics, n-body systems, population models, elasticity andelectrical circuits, with an approach based on constructive/computationalmethods.

Thus the idea is that making the student comfortable with both ad-vanced mathematical concepts and modern computational techniques, willopen a wealth of possibilities of applying mathematics to problems of realinterest. This is in contrast to traditional education where the emphasis isusually put on a set of analytical techniques within a conceptual frameworkof more limited scope. For example: we already lead the student in the sec-ond quarter to write (in Matlab) his/her own solver for general systems ofordinary differential equations based on mathematically sound principles(high conceptual and computational level), while traditional education atthe same time often focuses on training the student to master a bag oftricks for symbolic integration. We also teach the student some tricks tothat purpose, but our overall goal is different.

Constructive Mathematics: Body & Soul

In our work we have been led to the conviction that the constructive as-pects of calculus and linear algebra need to be strengthened. Of course,constructive and computational mathematics are closely related and thedevelopment of the computer has boosted computational mathematics inrecent years. Mathematical modeling has two basic dual aspects: one sym-bolic and the other constructive-numerical, which reflect the duality be-tween the infinite and the finite, or the continuous and the discrete. Thetwo aspects have been closely intertwined throughout the development ofmodern science from the development of calculus in the work of Euler, La-grange, Laplace and Gauss into the work of von Neumann in our time. For

Preface XI

example, Laplace’s monumental Mecanique Celeste in five volumes presentsa symbolic calculus for a mathematical model of gravitation taking the formof Laplace’s equation, together with massive numerical computations giv-ing concrete information concerning the motion of the planets in our solarsystem.

However, beginning with the search for rigor in the foundations of cal-culus in the 19th century, a split between the symbolic and construc-tive aspects gradually developed. The split accelerated with the inven-tion of the electronic computer in the 1940s, after which the construc-tive aspects were pursued in the new fields of numerical analysis andcomputing sciences, primarily developed outside departments of mathe-matics. The unfortunate result today is that symbolic mathematics andconstructive-numerical mathematics by and large are separate disciplinesand are rarely taught together. Typically, a student first meets calcu-lus restricted to its symbolic form and then much later, in a differentcontext, is confronted with the computational side. This state of affairslacks a sound scientific motivation and causes severe difficulties in coursesin physics, mechanics and applied sciences which build on mathematicalmodeling.

New possibilies are opened by creating from the start a synthesis ofconstructive and symbolic mathematics representing a synthesis of Body& Soul: with computational techniques available the students may becomefamiliar with nonlinear systems of differential equations already in earlycalculus, with a wealth of applications. Another consequence is that thebasics of calculus, including concepts like real number, Cauchy sequence,convergence, fixed point iteration, contraction mapping, is lifted out ofthe wardrobe of mathematical obscurities into the real world with directpractical importance. In one shot one can make mathematics educationboth deeper and broader and lift it to a higher level. This idea underlies thepresent book, which thus in the setting of a standard engineering program,contains all the basic theorems of calculus including the proofs normallytaught only in special honors courses, together with advanced applicationssuch as systems of nonlinear differential equations. We have found that thisseemingly impossible program indeed works surprisingly well. Admittedly,this is hard to believe without making real life experiments. We hope thereader will feel encouraged to do so.

Lipschitz Continuity and Cauchy Sequences

The usual definition of the basic concepts of continuity and derivative,which is presented in most Calculus text books today, build on the conceptof limit: a real valued function f(x) of a real variable x is said to be con-tinuous at x if limx→x f(x) = f(x), and f(x) is said to be differentiable at

XII Preface

x with derivative f ′(x) if

limx→x

f(x) − f(x)x− x

exists and equals f ′(x). We use different definitions, where the concept oflimit does not intervene: we say that a real-valued function f(x) is Lipschitzcontinuous with Lipschitz constant Lf on an interval [a, b] if for all x, x ∈[a, b], we have

|f(x) − f(x)| ≤ Lf |x− x|.

Further, we say that f(x) is differentiable at x with derivative f ′(x) if thereis a constant Kf (x) such that for all x close to x

|f(x) − f(x) − f ′(x)(x− x)| ≤ Kf(x)|x− x|2.

This means that we put somewhat more stringent requirements on theconcepts of continuity and differentiability than is done in the usual def-initions; more precisely, we impose quantitative measures in the form ofthe constants Lf and Kf (x), whereas the usual definitions using limits arepurely qualitative.

Using these more stringent definitions we avoid pathological situations,which can only be confusing to the student (in particular in the beginning)and, as indicated, we avoid using the (difficult) concept of limit in a settingwhere in fact no limit processes are really taking place. Thus, we do not leadthe student to definitions of continuity and differentiability suggesting thatall the time the variable x is tending to some value x, that is, all the timesome kind of (strange?) limit process is taking place. In fact, continuityexpresses that the difference f(x) − f(x) is small if x − x is small, anddifferentiability expresses that f(x) locally is close to a linear function, andto express these facts we do not have to invoke any limit processes.

These are examples of our leading philosophy of giving Calculus a quan-titative form, instead of the usual purely qualitative form, which we believehelps both understanding and precision. We believe the price to pay forthese advantages is usually well worth paying, and the loss in generalityare only some pathological cases of little interest. We can in a natural wayrelax our definitions, for example to Holder continuity, while still keepingthe quantitative aspect, and thereby increase the pathology of the excep-tional cases.

The usual definitions of continuity and differentiability strive for maximalgenerality, typically considered to be a virtue by a pure mathematician,which however has pathological side effects. With a constructive point ofview the interesting world is the constructible world and maximality is notan important issue in itself.

Of course, we do not stay away from limit processes, but we concen-trate on issues where the concept of limit really is central, most notably in

Preface XIII

defining the concept of a real number as the limit of a Cauchy sequence ofrational numbers, and a solution of an algebraic or differential equation asthe limit of a Cauchy sequence of approximate solutions. Thus, we give theconcept of Cauchy sequence a central role, while maintaining a constructiveapproach seeking constructive processes for generating Cauchy sequences.

In standard Calculus texts, the concepts of Cauchy sequence and Lip-schitz continuity are not used, believing them to be too difficult to bepresented to freshmen, while the concept of real number is left undefined(seemingly believing that a freshman is so familiar with this concept fromearly life that no further discussion is needed). In contrast, in our construc-tive approach these concepts play a central role already from start, and inparticular we give a good deal of attention to the fundamental aspect of theconstructibility of real numbers (viewed as possibly never-ending decimalexpansions).

We emphasize that taking a constructive approach does not make math-ematical life more difficult in any important way, as is often claimed by theruling mathematical school of formalists/logicists: All theorems of interestin Calculus and Linear Algebra survive, with possibly some small unessen-tial modifications to keep the quantitative aspect and make the proofs moreprecise. As a result we are able to present basic theorems such as Con-traction Mapping Principle, Implicit Function theorem, Inverse Functiontheorem, Convergence of Newton’s Method, in a setting of several variableswith complete proofs as a part of our basic Calculus, while these resultsin the standard curriculum are considered to be much too difficult for thislevel.

Proofs and Theorems

Most mathematics books including Calculus texts follow a theorem-proofstyle, where first a theorem is presented and then a corresponding proofis given. This is seldom appreciated very much by the students, who oftenhave difficulties with the role and nature of the proof concept.

We usually turn this around and first present a line of thought leading tosome result, and then we state a corresponding theorem as a summary ofthe hypothesis and the main result obtained. We thus rather use a proof-theorem format. We believe this is in fact often more natural than thetheorem-proof style, since by first presenting the line of thought the differ-ent ingredients, like hypotheses, may be introduced in a logical order. Theproof will then be just like any other line of thought, where one successivelyderives consequences from some starting point using different hypothesisas one goes along. We hope this will help to eliminate the often perceivedmystery of proofs, simply because the student will not be aware of the factthat a proof is being presented; it will just be a logical line of thought, like

XIV Preface

any logical line of thought in everyday life. Only when the line of thoughtis finished, one may go back and call it a proof, and in a theorem collectthe main result arrived at, including the required hypotheses. As a conse-quence, in the Latex version of the book we do use a theorem-environment,but not any proof-environment; the proof is just a logical line of thoughtpreceding a theorem collecting the hypothesis and the main result.

The Mathematics Laboratory

We have developed various pieces of software to support our program intowhat we refer to as the Mathematics Laboratory. Some of the softwareserves the purpose of illustrating mathematical concepts such as roots ofequations, Lipschitz continuity, fixed point iteration, differentiability, thedefinition of the integral and basic calculus for functions of several vari-ables; other pieces are supposed to be used as models for the students owncomputer realizations; finally some pieces are aimed at applications such assolvers for differential equations. New pieces are being added continuously.Our ambition is to also add different multi-media realizations of variousparts of the material.

In our program the students get a training from start in using Matlabas a tool for computation. The development of the constructive mathe-matical aspects of the basic topics of real numbers, functions, equations,derivatives and integrals, goes hand in hand with experience of solvingequations with fixed point iteration or Newton’s method, quadrature, andnumerical methods or differential equations. The students see from theirown experience that abstract symbolic concepts have roots deep down intoconstructive computation, which also gives a direct coupling to applicationsand physical reality.

Go to http://www.phi.chalmers.se/bodysoul/

The Applied Mathematics: Body & Soul project has a web site contain-ing additional instructional material and the Mathematics Laboratory. Wehope that the web site for the student will be a good friend helping to(independently) digest and progress through the material, and that for theteacher it may offer inspiration. We also hope the web site may serve asa forum for exchange of ideas and experience related the project, and wetherefore invite both students and teachers to submit material.

Preface XV

Acknowledgment

The authors of this book want to thank sincerely the following colleaguesand graduate students for contributing valuable material, corrections andsuggestions for improvement: Rickard Bergstrom, Niklas Eriksson, JohanHoffman, Mats Larson, Stig Larsson, Marten Levenstam, Anders Logg,Klas Samuelsson and Nils Svanstedt, all actively participating in the devel-opment of our reform project. And again, sincere thanks to all the studentsof chemical engineering at Chalmers who carried the burden of being ex-posed to new material often in incomplete form, and who have given muchenthusiastic criticism and feed-back.

The source of mathematicians pictures is the MacTutor History of Math-ematics archive, and some images are copied from old volumes of Deadalus,the yearly report from The Swedish Museum of Technology.

My heart is sad and lonelyfor you I sigh, dear, onlyWhy haven’t you seen it

I’m all for you body and soul(Green, Body and Soul)

Contents Volume 1

Derivatives and Geometry in R3 1

1 What is Mathematics? 31.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 31.2 The Modern World . . . . . . . . . . . . . . . . . . . . 31.3 The Role of Mathematics . . . . . . . . . . . . . . . . 61.4 Design and Production of Cars . . . . . . . . . . . . . 111.5 Navigation: From Stars to GPS . . . . . . . . . . . . . 111.6 Medical Tomography . . . . . . . . . . . . . . . . . . . 111.7 Molecular Dynamics and Medical Drug Design . . . . 121.8 Weather Prediction and Global Warming . . . . . . . . 131.9 Economy: Stocks and Options . . . . . . . . . . . . . . 131.10 Languages . . . . . . . . . . . . . . . . . . . . . . . . . 141.11 Mathematics as the Language of Science . . . . . . . . 151.12 The Basic Areas of Mathematics . . . . . . . . . . . . 161.13 What Is Science? . . . . . . . . . . . . . . . . . . . . . 171.14 What Is Conscience? . . . . . . . . . . . . . . . . . . . 171.15 How to View this Book as a Friend . . . . . . . . . . . 18

2 The Mathematics Laboratory 212.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 212.2 Math Experience . . . . . . . . . . . . . . . . . . . . . 22

XVIII Contents Volume 1

3 Introduction to Modeling 253.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 253.2 The Dinner Soup Model . . . . . . . . . . . . . . . . . 253.3 The Muddy Yard Model . . . . . . . . . . . . . . . . . 283.4 A System of Equations . . . . . . . . . . . . . . . . . . 293.5 Formulating and Solving Equations . . . . . . . . . . . 30

4 A Very Short Calculus Course 334.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 334.2 Algebraic Equations . . . . . . . . . . . . . . . . . . . 344.3 Differential Equations . . . . . . . . . . . . . . . . . . 344.4 Generalization . . . . . . . . . . . . . . . . . . . . . . . 394.5 Leibniz’ Teen-Age Dream . . . . . . . . . . . . . . . . 414.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . 434.7 Leibniz . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

5 Natural Numbers and Integers 475.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 475.2 The Natural Numbers . . . . . . . . . . . . . . . . . . 485.3 Is There a Largest Natural Number? . . . . . . . . . . 515.4 The Set N of All Natural Numbers . . . . . . . . . . . 525.5 Integers . . . . . . . . . . . . . . . . . . . . . . . . . . 535.6 Absolute Value and the Distance

Between Numbers . . . . . . . . . . . . . . . . . . . . . 565.7 Division with Remainder . . . . . . . . . . . . . . . . . 575.8 Factorization into Prime Factors . . . . . . . . . . . . 585.9 Computer Representation of Integers . . . . . . . . . . 59

6 Mathematical Induction 636.1 Induction . . . . . . . . . . . . . . . . . . . . . . . . . 636.2 Changes in a Population of Insects . . . . . . . . . . . 68

7 Rational Numbers 717.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 717.2 How to Construct the Rational Numbers . . . . . . . . 727.3 On the Need for Rational Numbers . . . . . . . . . . . 757.4 Decimal Expansions of Rational Numbers . . . . . . . 757.5 Periodic Decimal Expansions of Rational Numbers . . 767.6 Set Notation . . . . . . . . . . . . . . . . . . . . . . . . 807.7 The Set Q of All Rational Numbers . . . . . . . . . . . 817.8 The Rational Number Line and Intervals . . . . . . . . 827.9 Growth of Bacteria . . . . . . . . . . . . . . . . . . . . 837.10 Chemical Equilibrium . . . . . . . . . . . . . . . . . . 85

Contents Volume 1 XIX

8 Pythagoras and Euclid 878.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 878.2 Pythagoras Theorem . . . . . . . . . . . . . . . . . . . 878.3 The Sum of the Angles of a Triangle is 180 . . . . . . 898.4 Similar Triangles . . . . . . . . . . . . . . . . . . . . . 918.5 When Are Two Straight Lines Orthogonal? . . . . . . 918.6 The GPS Navigator . . . . . . . . . . . . . . . . . . . . 948.7 Geometric Definition of sin(v) and cos(v) . . . . . . . 968.8 Geometric Proof of Addition Formulas for cos(v) . . . 978.9 Remembering Some Area Formulas . . . . . . . . . . . 988.10 Greek Mathematics . . . . . . . . . . . . . . . . . . . . 988.11 The Euclidean Plane Q

2 . . . . . . . . . . . . . . . . . 998.12 From Pythagoras to Euclid to Descartes . . . . . . . . 1008.13 Non-Euclidean Geometry . . . . . . . . . . . . . . . . 101

9 What is a Function? 1039.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 1039.2 Functions in Daily Life . . . . . . . . . . . . . . . . . . 1069.3 Graphing Functions of Integers . . . . . . . . . . . . . 1099.4 Graphing Functions of Rational Numbers . . . . . . . 1129.5 A Function of Two Variables . . . . . . . . . . . . . . 1149.6 Functions of Several Variables . . . . . . . . . . . . . . 116

10 Polynomial functions 11910.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 11910.2 Linear Polynomials . . . . . . . . . . . . . . . . . . . . 12010.3 Parallel Lines . . . . . . . . . . . . . . . . . . . . . . . 12410.4 Orthogonal Lines . . . . . . . . . . . . . . . . . . . . . 12410.5 Quadratic Polynomials . . . . . . . . . . . . . . . . . . 12510.6 Arithmetic with Polynomials . . . . . . . . . . . . . . 12910.7 Graphs of General Polynomials . . . . . . . . . . . . . 13510.8 Piecewise Polynomial Functions . . . . . . . . . . . . . 137

11 Combinations of functions 14111.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 14111.2 Sum of Two Functions and Product

of a Function with a Number . . . . . . . . . . . . . . 14211.3 Linear Combinations of Functions . . . . . . . . . . . . 14211.4 Multiplication and Division of Functions . . . . . . . . 14311.5 Rational Functions . . . . . . . . . . . . . . . . . . . . 14311.6 The Composition of Functions . . . . . . . . . . . . . . 145

12 Lipschitz Continuity 14912.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 14912.2 The Lipschitz Continuity of a Linear Function . . . . . 150

XX Contents Volume 1

12.3 The Definition of Lipschitz Continuity . . . . . . . . . 15112.4 Monomials . . . . . . . . . . . . . . . . . . . . . . . . . 15412.5 Linear Combinations of Functions . . . . . . . . . . . . 15712.6 Bounded Functions . . . . . . . . . . . . . . . . . . . . 15812.7 The Product of Functions . . . . . . . . . . . . . . . . 15912.8 The Quotient of Functions . . . . . . . . . . . . . . . . 16012.9 The Composition of Functions . . . . . . . . . . . . . . 16112.10 Functions of Two Rational Variables . . . . . . . . . . 16212.11 Functions of Several Rational Variables . . . . . . . . . 163

13 Sequences and limits 16513.1 A First Encounter with Sequences and Limits . . . . . 16513.2 Socket Wrench Sets . . . . . . . . . . . . . . . . . . . . 16713.3 J.P. Johansson’s Adjustable Wrenches . . . . . . . . . 16913.4 The Power of Language:

From Infinitely Many to One . . . . . . . . . . . . . . 16913.5 The ε−N Definition of a Limit . . . . . . . . . . . . . 17013.6 A Converging Sequence Has a Unique Limit . . . . . . 17413.7 Lipschitz Continuous Functions and Sequences . . . . 17513.8 Generalization to Functions of Two Variables . . . . . 17613.9 Computing Limits . . . . . . . . . . . . . . . . . . . . 17713.10 Computer Representation of Rational Numbers . . . . 18013.11 Sonya Kovalevskaya . . . . . . . . . . . . . . . . . . . 181

14 The Square Root of Two 18514.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 18514.2

√2 Is Not a Rational Number! . . . . . . . . . . . . . 187

14.3 Computing√

2 by the Bisection Algorithm . . . . . . . 18814.4 The Bisection Algorithm Converges! . . . . . . . . . . 18914.5 First Encounters with Cauchy Sequences . . . . . . . . 19214.6 Computing

√2 by the Deca-section Algorithm . . . . . 192

15 Real numbers 19515.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 19515.2 Adding and Subtracting Real Numbers . . . . . . . . . 19715.3 Generalization to f(x, x) with f Lipschitz . . . . . . . 19915.4 Multiplying and Dividing Real Numbers . . . . . . . . 20015.5 The Absolute Value . . . . . . . . . . . . . . . . . . . . 20015.6 Comparing Two Real Numbers . . . . . . . . . . . . . 20015.7 Summary of Arithmetic with Real Numbers . . . . . . 20115.8 Why

√2√

2 Equals 2 . . . . . . . . . . . . . . . . . . . 20115.9 A Reflection on the Nature of

√2 . . . . . . . . . . . . 202

15.10 Cauchy Sequences of Real Numbers . . . . . . . . . . . 20315.11 Extension from f : Q → Q to f : R → R . . . . . . . . 20415.12 Lipschitz Continuity of Extended Functions . . . . . . 205

Contents Volume 1 XXI

15.13 Graphing Functions f : R → R . . . . . . . . . . . . . 20615.14 Extending a Lipschitz Continuous Function . . . . . . 20615.15 Intervals of Real Numbers . . . . . . . . . . . . . . . . 20715.16 What Is f(x) if x Is Irrational? . . . . . . . . . . . . . 20815.17 Continuity Versus Lipschitz Continuity . . . . . . . . . 211

16 The Bisection Algorithm for f(x) = 0 21516.1 Bisection . . . . . . . . . . . . . . . . . . . . . . . . . . 21516.2 An Example . . . . . . . . . . . . . . . . . . . . . . . . 21716.3 Computational Cost . . . . . . . . . . . . . . . . . . . 219

17 Do Mathematicians Quarrel?* 22117.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 22117.2 The Formalists . . . . . . . . . . . . . . . . . . . . . . 22417.3 The Logicists and Set Theory . . . . . . . . . . . . . . 22417.4 The Constructivists . . . . . . . . . . . . . . . . . . . . 22717.5 The Peano Axiom System for Natural Numbers . . . . 22917.6 Real Numbers . . . . . . . . . . . . . . . . . . . . . . . 22917.7 Cantor Versus Kronecker . . . . . . . . . . . . . . . . . 23017.8 Deciding Whether a Number is Rational or Irrational . 23217.9 The Set of All Possible Books . . . . . . . . . . . . . . 23317.10 Recipes and Good Food . . . . . . . . . . . . . . . . . 23417.11 The “New Math” in Elementary Education . . . . . . 23417.12 The Search for Rigor in Mathematics . . . . . . . . . . 23517.13 A Non-Constructive Proof . . . . . . . . . . . . . . . . 23617.14 Summary . . . . . . . . . . . . . . . . . . . . . . . . . 237

18 The Function y = xr 24118.1 The Function

√x . . . . . . . . . . . . . . . . . . . . . 241

18.2 Computing with the Function√x . . . . . . . . . . . . 242

18.3 Is√x Lipschitz Continuous on R

+? . . . . . . . . . . . 24218.4 The Function xr for Rational r = p

q . . . . . . . . . . . 24318.5 Computing with the Function xr . . . . . . . . . . . . 24318.6 Generalizing the Concept of Lipschitz Continuity . . . 24318.7 Turbulent Flow is Holder (Lipschitz) Continuous

with Exponent 13 . . . . . . . . . . . . . . . . . . . . . 244

19 Fixed Points and Contraction Mappings 24519.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 24519.2 Contraction Mappings . . . . . . . . . . . . . . . . . . 24619.3 Rewriting f(x) = 0 as x = g(x) . . . . . . . . . . . . . 24719.4 Card Sales Model . . . . . . . . . . . . . . . . . . . . . 24819.5 Private Economy Model . . . . . . . . . . . . . . . . . 24919.6 Fixed Point Iteration in the Card Sales Model . . . . . 25019.7 A Contraction Mapping Has a Unique Fixed Point . . 254

XXII Contents Volume 1

19.8 Generalization to g : [a, b] → [a, b] . . . . . . . . . . . . 25619.9 Linear Convergence in Fixed Point Iteration . . . . . . 25719.10 Quicker Convergence . . . . . . . . . . . . . . . . . . . 25819.11 Quadratic Convergence . . . . . . . . . . . . . . . . . . 259

20 Analytic Geometry in R2 265

20.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 26520.2 Descartes, Inventor of Analytic Geometry . . . . . . . 26620.3 Descartes: Dualism of Body and Soul . . . . . . . . . . 26620.4 The Euclidean Plane R

2 . . . . . . . . . . . . . . . . . 26720.5 Surveyors and Navigators . . . . . . . . . . . . . . . . 26920.6 A First Glimpse of Vectors . . . . . . . . . . . . . . . . 27020.7 Ordered Pairs as Points or Vectors/Arrows . . . . . . . 27120.8 Vector Addition . . . . . . . . . . . . . . . . . . . . . . 27220.9 Vector Addition and the Parallelogram Law . . . . . . 27320.10 Multiplication of a Vector by a Real Number . . . . . 27420.11 The Norm of a Vector . . . . . . . . . . . . . . . . . . 27520.12 Polar Representation of a Vector . . . . . . . . . . . . 27520.13 Standard Basis Vectors . . . . . . . . . . . . . . . . . . 27720.14 Scalar Product . . . . . . . . . . . . . . . . . . . . . . 27820.15 Properties of the Scalar Product . . . . . . . . . . . . 27820.16 Geometric Interpretation of the Scalar Product . . . . 27920.17 Orthogonality and Scalar Product . . . . . . . . . . . . 28020.18 Projection of a Vector onto a Vector . . . . . . . . . . 28120.19 Rotation by 90 . . . . . . . . . . . . . . . . . . . . . . 28320.20 Rotation by an Arbitrary Angle θ . . . . . . . . . . . . 28520.21 Rotation by θ Again! . . . . . . . . . . . . . . . . . . . 28620.22 Rotating a Coordinate System . . . . . . . . . . . . . . 28620.23 Vector Product . . . . . . . . . . . . . . . . . . . . . . 28720.24 The Area of a Triangle with a Corner at the Origin . . 29020.25 The Area of a General Triangle . . . . . . . . . . . . . 29020.26 The Area of a Parallelogram Spanned

by Two Vectors . . . . . . . . . . . . . . . . . . . . . . 29120.27 Straight Lines . . . . . . . . . . . . . . . . . . . . . . . 29220.28 Projection of a Point onto a Line . . . . . . . . . . . . 29420.29 When Are Two Lines Parallel? . . . . . . . . . . . . . 29420.30 A System of Two Linear Equations

in Two Unknowns . . . . . . . . . . . . . . . . . . . . 29520.31 Linear Independence and Basis . . . . . . . . . . . . . 29720.32 The Connection to Calculus in One Variable . . . . . . 29820.33 Linear Mappings f : R

2 → R . . . . . . . . . . . . . . . 29920.34 Linear Mappings f : R

2 → R2 . . . . . . . . . . . . . . 299

20.35 Linear Mappings and Linear Systems of Equations . . 30020.36 A First Encounter with Matrices . . . . . . . . . . . . 30020.37 First Applications of Matrix Notation . . . . . . . . . 302

Contents Volume 1 XXIII

20.38 Addition of Matrices . . . . . . . . . . . . . . . . . . . 30320.39 Multiplication of a Matrix by a Real Number . . . . . 30320.40 Multiplication of Two Matrices . . . . . . . . . . . . . 30320.41 The Transpose of a Matrix . . . . . . . . . . . . . . . . 30520.42 The Transpose of a 2-Column Vector . . . . . . . . . . 30520.43 The Identity Matrix . . . . . . . . . . . . . . . . . . . 30520.44 The Inverse of a Matrix . . . . . . . . . . . . . . . . . 30620.45 Rotation in Matrix Form Again! . . . . . . . . . . . . 30620.46 A Mirror in Matrix Form . . . . . . . . . . . . . . . . 30720.47 Change of Basis Again! . . . . . . . . . . . . . . . . . . 30820.48 Queen Christina . . . . . . . . . . . . . . . . . . . . . 309

21 Analytic Geometry in R3 313

21.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 31321.2 Vector Addition and Multiplication by a Scalar . . . . 31521.3 Scalar Product and Norm . . . . . . . . . . . . . . . . 31521.4 Projection of a Vector onto a Vector . . . . . . . . . . 31621.5 The Angle Between Two Vectors . . . . . . . . . . . . 31621.6 Vector Product . . . . . . . . . . . . . . . . . . . . . . 31721.7 Geometric Interpretation of the Vector Product . . . . 31921.8 Connection Between Vector Products in R

2 and R3 . . 320

21.9 Volume of a Parallelepiped Spannedby Three Vectors . . . . . . . . . . . . . . . . . . . . . 320

21.10 The Triple Product a · b× c . . . . . . . . . . . . . . . 32121.11 A Formula for the Volume Spanned

by Three Vectors . . . . . . . . . . . . . . . . . . . . . 32221.12 Lines . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32321.13 Projection of a Point onto a Line . . . . . . . . . . . . 32421.14 Planes . . . . . . . . . . . . . . . . . . . . . . . . . . . 32421.15 The Intersection of a Line and a Plane . . . . . . . . . 32621.16 Two Intersecting Planes Determine a Line . . . . . . . 32721.17 Projection of a Point onto a Plane . . . . . . . . . . . 32821.18 Distance from a Point to a Plane . . . . . . . . . . . . 32821.19 Rotation Around a Given Vector . . . . . . . . . . . . 32921.20 Lines and Planes Through the Origin Are Subspaces . 33021.21 Systems of 3 Linear Equations in 3 Unknowns . . . . . 33021.22 Solving a 3 × 3-System by Gaussian Elimination . . . 33221.23 3 × 3 Matrices: Sum, Product and Transpose . . . . . 33321.24 Ways of Viewing a System of Linear Equations . . . . 33521.25 Non-Singular Matrices . . . . . . . . . . . . . . . . . . 33621.26 The Inverse of a Matrix . . . . . . . . . . . . . . . . . 33621.27 Different Bases . . . . . . . . . . . . . . . . . . . . . . 33721.28 Linearly Independent Set of Vectors . . . . . . . . . . 33721.29 Orthogonal Matrices . . . . . . . . . . . . . . . . . . . 33821.30 Linear Transformations Versus Matrices . . . . . . . . 338

XXIV Contents Volume 1

21.31 The Scalar Product Is InvariantUnder Orthogonal Transformations . . . . . . . . . . . 339

21.32 Looking Ahead to Functions f : R3 → R

3 . . . . . . . 34021.34 Gosta Mittag-Leffler . . . . . . . . . . . . . . . . . . . 343

22 Complex Numbers 34522.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 34522.2 Addition and Multiplication . . . . . . . . . . . . . . . 34622.3 The Triangle Inequality . . . . . . . . . . . . . . . . . 34722.4 Open Domains . . . . . . . . . . . . . . . . . . . . . . 34822.5 Polar Representation of Complex Numbers . . . . . . . 34822.6 Geometrical Interpretation of Multiplication . . . . . . 34822.7 Complex Conjugation . . . . . . . . . . . . . . . . . . 34922.8 Division . . . . . . . . . . . . . . . . . . . . . . . . . . 35022.9 The Fundamental Theorem of Algebra . . . . . . . . . 35022.10 Roots . . . . . . . . . . . . . . . . . . . . . . . . . . . 35122.11 Solving a Quadratic Equation w2 + 2bw + c = 0 . . . . 351

23 The Derivative 35323.1 Rates of Change . . . . . . . . . . . . . . . . . . . . . 35323.2 Paying Taxes . . . . . . . . . . . . . . . . . . . . . . . 35423.3 Hiking . . . . . . . . . . . . . . . . . . . . . . . . . . . 35723.4 Definition of the Derivative . . . . . . . . . . . . . . . 35723.5 The Derivative of a Linear Function Is Constant . . . 36023.6 The Derivative of x2 Is 2x . . . . . . . . . . . . . . . . 36023.7 The Derivative of xn Is nxn−1 . . . . . . . . . . . . . . 36223.8 The Derivative of 1

x Is − 1x2 for x = 0 . . . . . . . . . . 363

23.9 The Derivative as a Function . . . . . . . . . . . . . . 36323.10 Denoting the Derivative of f(x) by Df(x) . . . . . . . 36323.11 Denoting the Derivative of f(x) by df

dx . . . . . . . . . 36523.12 The Derivative as a Limit of Difference Quotients . . . 36523.13 How to Compute a Derivative? . . . . . . . . . . . . . 36723.14 Uniform Differentiability on an Interval . . . . . . . . 36923.15 A Bounded Derivative Implies Lipschitz Continuity . . 37023.16 A Slightly Different Viewpoint . . . . . . . . . . . . . . 37223.17 Swedenborg . . . . . . . . . . . . . . . . . . . . . . . . 372

24 Differentiation Rules 37524.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 37524.2 The Linear Combination Rule . . . . . . . . . . . . . . 37624.3 The Product Rule . . . . . . . . . . . . . . . . . . . . 37724.4 The Chain Rule . . . . . . . . . . . . . . . . . . . . . . 37824.5 The Quotient Rule . . . . . . . . . . . . . . . . . . . . 37924.6 Derivatives of Derivatives: f (n) = Dnf = dnf

dxn . . . . . 38024.7 One-Sided Derivatives . . . . . . . . . . . . . . . . . . 381

Contents Volume 1 XXV

24.8 Quadratic Approximation . . . . . . . . . . . . . . . . 38224.9 The Derivative of an Inverse Function . . . . . . . . . 38524.10 Implicit Differentiation . . . . . . . . . . . . . . . . . . 38624.11 Partial Derivatives . . . . . . . . . . . . . . . . . . . . 38724.12 A Sum Up So Far . . . . . . . . . . . . . . . . . . . . . 388

25 Newton’s Method 39125.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 39125.2 Convergence of Fixed Point Iteration . . . . . . . . . . 39125.3 Newton’s Method . . . . . . . . . . . . . . . . . . . . . 39225.4 Newton’s Method Converges Quadratically . . . . . . . 39325.5 A Geometric Interpretation of Newton’s Method . . . 39425.6 What Is the Error of an Approximate Root? . . . . . . 39525.7 Stopping Criterion . . . . . . . . . . . . . . . . . . . . 39825.8 Globally Convergent Newton Methods . . . . . . . . . 398

26 Galileo, Newton, Hooke, Malthus and Fourier 40126.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 40126.2 Newton’s Law of Motion . . . . . . . . . . . . . . . . . 40226.3 Galileo’s Law of Motion . . . . . . . . . . . . . . . . . 40226.4 Hooke’s Law . . . . . . . . . . . . . . . . . . . . . . . . 40526.5 Newton’s Law plus Hooke’s Law . . . . . . . . . . . . 40626.6 Fourier’s Law for Heat Flow . . . . . . . . . . . . . . . 40726.7 Newton and Rocket Propulsion . . . . . . . . . . . . . 40826.8 Malthus and Population Growth . . . . . . . . . . . . 41026.9 Einstein’s Law of Motion . . . . . . . . . . . . . . . . . 41126.10 Summary . . . . . . . . . . . . . . . . . . . . . . . . . 412

References 415

Index 417

Contents Volume 2

Integrals and Geometry in Rn 425

27 The Integral 42727.1 Primitive Functions and Integrals . . . . . . . . . . . . 42727.2 Primitive Function of f(x) = xm for m = 0, 1, 2, . . . . . 43127.3 Primitive Function of f(x) = xm for m = −2,−3, . . . . 43227.4 Primitive Function of f(x) = xr for r = −1 . . . . . . 43227.5 A Quick Overview of the Progress So Far . . . . . . . 43327.6 A “Very Quick Proof” of the Fundamental Theorem . 43327.7 A “Quick Proof” of the Fundamental Theorem . . . . 43527.8 A Proof of the Fundamental Theorem of Calculus . . . 43627.9 Comments on the Notation . . . . . . . . . . . . . . . 44227.10 Alternative Computational Methods . . . . . . . . . . 44327.11 The Cyclist’s Speedometer . . . . . . . . . . . . . . . . 44327.12 Geometrical Interpretation of the Integral . . . . . . . 44427.13 The Integral as a Limit of Riemann Sums . . . . . . . 44627.14 An Analog Integrator . . . . . . . . . . . . . . . . . . . 447

28 Properties of the Integral 45128.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 45128.2 Reversing the Order of Upper and Lower Limits . . . . 45228.3 The Whole Is Equal to the Sum of the Parts . . . . . . 452

XXVIII Contents Volume 2

28.4 Integrating Piecewise LipschitzContinuous Functions . . . . . . . . . . . . . . . . . . 453

28.5 Linearity . . . . . . . . . . . . . . . . . . . . . . . . . . 45428.6 Monotonicity . . . . . . . . . . . . . . . . . . . . . . . 45528.7 The Triangle Inequality for Integrals . . . . . . . . . . 45528.8 Differentiation and Integration

are Inverse Operations . . . . . . . . . . . . . . . . . . 45628.9 Change of Variables or Substitution . . . . . . . . . . . 45728.10 Integration by Parts . . . . . . . . . . . . . . . . . . . 45928.11 The Mean Value Theorem . . . . . . . . . . . . . . . . 46028.12 Monotone Functions and the Sign of the Derivative . . 46228.13 A Function with Zero Derivative is Constant . . . . . . 46228.14 A Bounded Derivative Implies Lipschitz Continuity . . 46328.15 Taylor’s Theorem . . . . . . . . . . . . . . . . . . . . . 46328.16 October 29, 1675 . . . . . . . . . . . . . . . . . . . . . 46628.17 The Hodometer . . . . . . . . . . . . . . . . . . . . . . 467

29 The Logarithm log(x) 47129.1 The Definition of log(x) . . . . . . . . . . . . . . . . . 47129.2 The Importance of the Logarithm . . . . . . . . . . . . 47229.3 Important Properties of log(x) . . . . . . . . . . . . . 473

30 Numerical Quadrature 47730.1 Computing Integrals . . . . . . . . . . . . . . . . . . . 47730.2 The Integral as a Limit of Riemann Sums . . . . . . . 48130.3 The Midpoint Rule . . . . . . . . . . . . . . . . . . . . 48230.4 Adaptive Quadrature . . . . . . . . . . . . . . . . . . . 483

31 The Exponential Function exp(x) = ex 48931.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 48931.2 Construction of the Exponential exp(x) for x ≥ 0 . . . 49131.3 Extension of the Exponential exp(x) to x < 0 . . . . . 49631.4 The Exponential Function exp(x) for x ∈ R . . . . . . 49631.5 An Important Property of exp(x) . . . . . . . . . . . . 49731.6 The Inverse of the Exponential is the Logarithm . . . 49831.7 The Function ax with a > 0 and x ∈ R . . . . . . . . . 499

32 Trigonometric Functions 50332.1 The Defining Differential Equation . . . . . . . . . . . 50332.2 Trigonometric Identities . . . . . . . . . . . . . . . . . 50732.3 The Functions tan(x) and cot(x) and Their Derivatives 50832.4 Inverses of Trigonometric Functions . . . . . . . . . . . 50932.5 The Functions sinh(x) and cosh(x) . . . . . . . . . . . 51132.6 The Hanging Chain . . . . . . . . . . . . . . . . . . . . 51232.7 Comparing u′′ + k2u(x) = 0 and u′′ − k2u(x) = 0 . . . 513

Contents Volume 2 XXIX

33 The Functions exp(z), log(z), sin(z) and cos(z) for z ∈ C 51533.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 51533.2 Definition of exp(z) . . . . . . . . . . . . . . . . . . . . 51533.3 Definition of sin(z) and cos(z) . . . . . . . . . . . . . . 51633.4 de Moivres Formula . . . . . . . . . . . . . . . . . . . . 51633.5 Definition of log(z) . . . . . . . . . . . . . . . . . . . . 517

34 Techniques of Integration 51934.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 51934.2 Rational Functions: The Simple Cases . . . . . . . . . 52034.3 Rational Functions: Partial Fractions . . . . . . . . . . 52134.4 Products of Polynomial and Trigonometric

or Exponential Functions . . . . . . . . . . . . . . . . 52634.5 Combinations of Trigonometric and Root Functions . . 52634.6 Products of Exponential and Trigonometric Functions 52734.7 Products of Polynomials and Logarithm Functions . . 527

35 Solving Differential Equations Using the Exponential 52935.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 52935.2 Generalization to u′(x) = λ(x)u(x) + f(x) . . . . . . . 53035.3 The Differential Equation u′′(x) − u(x) = 0 . . . . . . 53435.4 The Differential Equation

∑nk=0 akD

ku(x) = 0 . . . . 53535.5 The Differential Equation

∑nk=0 akD

ku(x) = f(x) . . . 53635.6 Euler’s Differential Equation . . . . . . . . . . . . . . . 537

36 Improper Integrals 53936.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 53936.2 Integrals Over Unbounded Intervals . . . . . . . . . . . 53936.3 Integrals of Unbounded Functions . . . . . . . . . . . . 541

37 Series 54537.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 54537.2 Definition of Convergent Infinite Series . . . . . . . . . 54637.3 Positive Series . . . . . . . . . . . . . . . . . . . . . . . 54737.4 Absolutely Convergent Series . . . . . . . . . . . . . . 55037.5 Alternating Series . . . . . . . . . . . . . . . . . . . . . 55037.6 The Series

∑∞i=1

1i Theoretically Diverges! . . . . . . . 551

37.7 Abel . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55337.8 Galois . . . . . . . . . . . . . . . . . . . . . . . . . . . 554

38 Scalar Autonomous Initial Value Problems 55738.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 55738.2 An Analytical Solution Formula . . . . . . . . . . . . . 55838.3 Construction of the Solution . . . . . . . . . . . . . . . 561

XXX Contents Volume 2

39 Separable Scalar Initial Value Problems 56539.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 56539.2 An Analytical Solution Formula . . . . . . . . . . . . . 56639.3 Volterra-Lotka’s Predator-Prey Model . . . . . . . . . 56839.4 A Generalization . . . . . . . . . . . . . . . . . . . . . 569

40 The General Initial Value Problem 57340.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 57340.2 Determinism and Materialism . . . . . . . . . . . . . . 57540.3 Predictability and Computability . . . . . . . . . . . . 57540.4 Construction of the Solution . . . . . . . . . . . . . . . 57740.5 Computational Work . . . . . . . . . . . . . . . . . . . 57840.6 Extension to Second Order Initial Value Problems . . 57940.7 Numerical Methods . . . . . . . . . . . . . . . . . . . . 580

41 Calculus Tool Bag I 58341.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 58341.2 Rational Numbers . . . . . . . . . . . . . . . . . . . . 58341.3 Real Numbers. Sequences and Limits . . . . . . . . . . 58441.4 Polynomials and Rational Functions . . . . . . . . . . 58441.5 Lipschitz Continuity . . . . . . . . . . . . . . . . . . . 58541.6 Derivatives . . . . . . . . . . . . . . . . . . . . . . . . 58541.7 Differentiation Rules . . . . . . . . . . . . . . . . . . . 58541.8 Solving f(x) = 0 with f : R → R . . . . . . . . . . . . 58641.9 Integrals . . . . . . . . . . . . . . . . . . . . . . . . . . 58741.10 The Logarithm . . . . . . . . . . . . . . . . . . . . . . 58841.11 The Exponential . . . . . . . . . . . . . . . . . . . . . 58941.12 The Trigonometric Functions . . . . . . . . . . . . . . 58941.13 List of Primitive Functions . . . . . . . . . . . . . . . . 59241.14 Series . . . . . . . . . . . . . . . . . . . . . . . . . . . 59241.15 The Differential Equation u+ λ(x)u(x) = f(x) . . . . 59341.16 Separable Scalar Initial Value Problems . . . . . . . . 593

42 Analytic Geometry in Rn 595

42.1 Introduction and Survey of Basic Objectives . . . . . . 59542.2 Body/Soul and Artificial Intelligence . . . . . . . . . . 59842.3 The Vector Space Structure of R

n . . . . . . . . . . . . 59842.4 The Scalar Product and Orthogonality . . . . . . . . . 59942.5 Cauchy’s Inequality . . . . . . . . . . . . . . . . . . . . 60042.6 The Linear Combinations of a Set of Vectors . . . . . 60142.7 The Standard Basis . . . . . . . . . . . . . . . . . . . . 60242.8 Linear Independence . . . . . . . . . . . . . . . . . . . 60342.9 Reducing a Set of Vectors to Get a Basis . . . . . . . . 60442.10 Using Column Echelon Form to Obtain a Basis . . . . 60542.11 Using Column Echelon Form to Obtain R(A) . . . . . 606

Contents Volume 2 XXXI

42.12 Using Row Echelon Form to Obtain N(A) . . . . . . . 60842.13 Gaussian Elimination . . . . . . . . . . . . . . . . . . . 61042.14 A Basis for R

n Contains n Vectors . . . . . . . . . . . 61042.15 Coordinates in Different Bases . . . . . . . . . . . . . . 61242.16 Linear Functions f : R

n → R . . . . . . . . . . . . . . 61342.17 Linear Transformations f : R

n → Rm . . . . . . . . . . 613

42.18 Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . 61442.19 Matrix Calculus . . . . . . . . . . . . . . . . . . . . . . 61542.20 The Transpose of a Linear Transformation . . . . . . . 61742.21 Matrix Norms . . . . . . . . . . . . . . . . . . . . . . . 61842.22 The Lipschitz Constant of a Linear Transformation . . 61942.23 Volume in R

n: Determinants and Permutations . . . . 61942.24 Definition of the Volume V (a1, . . . , an) . . . . . . . . . 62142.25 The Volume V (a1, a2) in R

2 . . . . . . . . . . . . . . . 62242.26 The Volume V (a1, a2, a3) in R

3 . . . . . . . . . . . . . 62242.27 The Volume V (a1, a2, a3, a4) in R

4 . . . . . . . . . . . 62342.28 The Volume V (a1, . . . , an) in R

n . . . . . . . . . . . . 62342.29 The Determinant of a Triangular Matrix . . . . . . . . 62342.30 Using the Column Echelon Form to Compute detA . . 62342.31 The Magic Formula detAB = detAdetB . . . . . . . 62442.32 Test of Linear Independence . . . . . . . . . . . . . . . 62442.33 Cramer’s Solution for Non-Singular Systems . . . . . . 62642.34 The Inverse Matrix . . . . . . . . . . . . . . . . . . . . 62742.35 Projection onto a Subspace . . . . . . . . . . . . . . . 62842.36 An Equivalent Characterization of the Projection . . . 62942.37 Orthogonal Decomposition: Pythagoras Theorem . . . 63042.38 Properties of Projections . . . . . . . . . . . . . . . . . 63142.39 Orthogonalization: The Gram-Schmidt Procedure . . . 63142.40 Orthogonal Matrices . . . . . . . . . . . . . . . . . . . 63242.41 Invariance of the Scalar Product

Under Orthogonal Transformations . . . . . . . . . . . 63242.42 The QR-Decomposition . . . . . . . . . . . . . . . . . 63342.43 The Fundamental Theorem of Linear Algebra . . . . . 63342.44 Change of Basis: Coordinates and Matrices . . . . . . 63542.45 Least Squares Methods . . . . . . . . . . . . . . . . . . 636

43 The Spectral Theorem 63943.1 Eigenvalues and Eigenvectors . . . . . . . . . . . . . . 63943.2 Basis of Eigenvectors . . . . . . . . . . . . . . . . . . . 64143.3 An Easy Spectral Theorem for Symmetric Matrices . . 64243.4 Applying the Spectral Theorem to an IVP . . . . . . . 64343.5 The General Spectral Theorem

for Symmetric Matrices . . . . . . . . . . . . . . . . . 64443.6 The Norm of a Symmetric Matrix . . . . . . . . . . . . 64643.7 Extension to Non-Symmetric Real Matrices . . . . . . 647

XXXII Contents Volume 2

44 Solving Linear Algebraic Systems 64944.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 64944.2 Direct Methods . . . . . . . . . . . . . . . . . . . . . . 64944.3 Direct Methods for Special Systems . . . . . . . . . . . 65644.4 Iterative Methods . . . . . . . . . . . . . . . . . . . . . 65944.5 Estimating the Error of the Solution . . . . . . . . . . 66944.6 The Conjugate Gradient Method . . . . . . . . . . . . 67244.7 GMRES . . . . . . . . . . . . . . . . . . . . . . . . . . 674

45 Linear Algebra Tool Bag 68345.1 Linear Algebra in R

2 . . . . . . . . . . . . . . . . . . . 68345.2 Linear Algebra in R

3 . . . . . . . . . . . . . . . . . . . 68445.3 Linear Algebra in R

n . . . . . . . . . . . . . . . . . . . 68445.4 Linear Transformations and Matrices . . . . . . . . . . 68545.5 The Determinant and Volume . . . . . . . . . . . . . . 68645.6 Cramer’s Formula . . . . . . . . . . . . . . . . . . . . . 68645.7 Inverse . . . . . . . . . . . . . . . . . . . . . . . . . . . 68745.8 Projections . . . . . . . . . . . . . . . . . . . . . . . . 68745.9 The Fundamental Theorem of Linear Algebra . . . . . 68745.10 The QR-Decomposition . . . . . . . . . . . . . . . . . 68745.11 Change of Basis . . . . . . . . . . . . . . . . . . . . . . 68845.12 The Least Squares Method . . . . . . . . . . . . . . . 68845.13 Eigenvalues and Eigenvectors . . . . . . . . . . . . . . 68845.14 The Spectral Theorem . . . . . . . . . . . . . . . . . . 68845.15 The Conjugate Gradient Method for Ax = b . . . . . . 688

46 The Matrix Exponential exp(xA) 68946.1 Computation of exp(xA) when A Is Diagonalizable . . 69046.2 Properties of exp(Ax) . . . . . . . . . . . . . . . . . . 69246.3 Duhamel’s Formula . . . . . . . . . . . . . . . . . . . . 692

47 Lagrange and the Principle of Least Action* 69547.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 69547.2 A Mass-Spring System . . . . . . . . . . . . . . . . . . 69747.3 A Pendulum with Fixed Support . . . . . . . . . . . . 69847.4 A Pendulum with Moving Support . . . . . . . . . . . 69947.5 The Principle of Least Action . . . . . . . . . . . . . . 69947.6 Conservation of the Total Energy . . . . . . . . . . . . 70147.7 The Double Pendulum . . . . . . . . . . . . . . . . . . 70147.8 The Two-Body Problem . . . . . . . . . . . . . . . . . 70247.9 Stability of the Motion of a Pendulum . . . . . . . . . 703

Contents Volume 2 XXXIII

48 N-Body Systems* 70748.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 70748.2 Masses and Springs . . . . . . . . . . . . . . . . . . . . 70848.3 The N -Body Problem . . . . . . . . . . . . . . . . . . 71048.4 Masses, Springs and Dashpots:

Small Displacements . . . . . . . . . . . . . . . . . . . 71148.5 Adding Dashpots . . . . . . . . . . . . . . . . . . . . . 71248.6 A Cow Falling Down Stairs . . . . . . . . . . . . . . . 71348.7 The Linear Oscillator . . . . . . . . . . . . . . . . . . . 71448.8 The Damped Linear Oscillator . . . . . . . . . . . . . 71548.9 Extensions . . . . . . . . . . . . . . . . . . . . . . . . . 717

49 The Crash Model* 71949.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 71949.2 The Simplified Growth Model . . . . . . . . . . . . . . 72049.3 The Simplified Decay Model . . . . . . . . . . . . . . . 72249.4 The Full Model . . . . . . . . . . . . . . . . . . . . . . 723

50 Electrical Circuits* 72750.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 72750.2 Inductors, Resistors and Capacitors . . . . . . . . . . . 72850.3 Building Circuits: Kirchhoff’s Laws . . . . . . . . . . . 72950.4 Mutual Induction . . . . . . . . . . . . . . . . . . . . . 730

51 String Theory* 73351.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 73351.2 A Linear System . . . . . . . . . . . . . . . . . . . . . 73451.3 A Soft System . . . . . . . . . . . . . . . . . . . . . . . 73551.4 A Stiff System . . . . . . . . . . . . . . . . . . . . . . . 73551.5 Phase Plane Analysis . . . . . . . . . . . . . . . . . . . 736

52 Piecewise Linear Approximation 73952.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 73952.2 Linear Interpolation on [0, 1] . . . . . . . . . . . . . . . 74052.3 The Space of Piecewise Linear Continuous Functions . 74552.4 The L2 Projection into Vh . . . . . . . . . . . . . . . . 747

53 FEM for Two-Point Boundary Value Problems 75353.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 75353.2 Initial Boundary-Value Problems . . . . . . . . . . . . 75653.3 Stationary Boundary Value Problems . . . . . . . . . . 75753.4 The Finite Element Method . . . . . . . . . . . . . . . 757

XXXIV Contents Volume 2

53.5 The Discrete System of Equations . . . . . . . . . . . . 76053.6 Handling Different Boundary Conditions . . . . . . . . 76353.7 Error Estimates and Adaptive Error Control . . . . . . 76653.8 Discretization of Time-Dependent

Reaction-Diffusion-Convection Problems . . . . . . . . 77153.9 Non-Linear Reaction-Diffusion-Convection Problems . 771

References 775

Index 777

Contents Volume 3

Calculus in Several Dimensions 785

54 Vector-Valued Functions of Several Real Variables 78754.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 78754.2 Curves in R

n . . . . . . . . . . . . . . . . . . . . . . . 78854.3 Different Parameterizations of a Curve . . . . . . . . . 78954.4 Surfaces in R

n, n ≥ 3 . . . . . . . . . . . . . . . . . . . 79054.5 Lipschitz Continuity . . . . . . . . . . . . . . . . . . . 79054.6 Differentiability: Jacobian, Gradient and Tangent . . . 79254.7 The Chain Rule . . . . . . . . . . . . . . . . . . . . . . 79654.8 The Mean Value Theorem . . . . . . . . . . . . . . . . 79754.9 Direction of Steepest Descent and the Gradient . . . . 79854.10 A Minimum Point Is a Stationary Point . . . . . . . . 80054.11 The Method of Steepest Descent . . . . . . . . . . . . 80054.12 Directional Derivatives . . . . . . . . . . . . . . . . . . 80154.13 Higher Order Partial Derivatives . . . . . . . . . . . . 80254.14 Taylor’s Theorem . . . . . . . . . . . . . . . . . . . . . 80354.15 The Contraction Mapping Theorem . . . . . . . . . . . 80454.16 Solving f(x) = 0 with f : R

n → Rn . . . . . . . . . . . 806

54.17 The Inverse Function Theorem . . . . . . . . . . . . . 80754.18 The Implicit Function Theorem . . . . . . . . . . . . . 80854.19 Newton’s Method . . . . . . . . . . . . . . . . . . . . . 80954.20 Differentiation Under the Integral Sign . . . . . . . . . 810

XXXVI Contents Volume 3

55 Level Curves/Surfaces and the Gradient 81355.1 Level Curves . . . . . . . . . . . . . . . . . . . . . . . 81355.2 Local Existence of Level Curves . . . . . . . . . . . . . 81555.3 Level Curves and the Gradient . . . . . . . . . . . . . 81555.4 Level Surfaces . . . . . . . . . . . . . . . . . . . . . . . 81655.5 Local Existence of Level Surfaces . . . . . . . . . . . . 81755.6 Level Surfaces and the Gradient . . . . . . . . . . . . . 817

56 Linearization and Stability of Initial Value Problems 82156.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 82156.2 Stationary Solutions . . . . . . . . . . . . . . . . . . . 82256.3 Linearization at a Stationary Solution . . . . . . . . . 82256.4 Stability Analysis when f ′(u) Is Symmetric . . . . . . 82356.5 Stability Factors . . . . . . . . . . . . . . . . . . . . . 82456.6 Stability of Time-Dependent Solutions . . . . . . . . . 82756.7 Sum Up . . . . . . . . . . . . . . . . . . . . . . . . . . 827

57 Adaptive Solvers for IVPs 82957.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 82957.2 The cG(1) Method . . . . . . . . . . . . . . . . . . . . 83057.3 Adaptive Time Step Control for cG(1) . . . . . . . . . 83257.4 Analysis of cG(1) for a Linear Scalar IVP . . . . . . . 83257.5 Analysis of cG(1) for a General IVP . . . . . . . . . . 83557.6 Analysis of Backward Euler for a General IVP . . . . . 83657.7 Stiff Initial Value Problems . . . . . . . . . . . . . . . 83857.8 On Explicit Time-Stepping for Stiff Problems . . . . . 840

58 Lorenz and the Essence of Chaos* 84758.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 84758.2 The Lorenz System . . . . . . . . . . . . . . . . . . . . 84858.3 The Accuracy of the Computations . . . . . . . . . . . 85058.4 Computability of the Lorenz System . . . . . . . . . . 85258.5 The Lorenz Challenge . . . . . . . . . . . . . . . . . . 854

59 The Solar System* 85759.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 85759.2 Newton’s Equation . . . . . . . . . . . . . . . . . . . . 86059.3 Einstein’s Equation . . . . . . . . . . . . . . . . . . . . 86159.4 The Solar System as a System of ODEs . . . . . . . . 86259.5 Predictability and Computability . . . . . . . . . . . . 86559.6 Adaptive Time-Stepping . . . . . . . . . . . . . . . . . 86659.7 Limits of Computability and Predictability . . . . . . 867

Contents Volume 3 XXXVII

60 Optimization 86960.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 86960.2 Sorting if Ω Is Finite . . . . . . . . . . . . . . . . . . . 87060.3 What if Ω Is Not Finite? . . . . . . . . . . . . . . . . . 87160.4 Existence of a Minimum Point . . . . . . . . . . . . . . 87260.5 The Derivative Is Zero at an Interior Minimum Point . 87260.6 The Role of the Hessian . . . . . . . . . . . . . . . . . 87660.7 Minimization Algorithms: Steepest Descent . . . . . . 87660.8 Existence of a Minimum Value and Point . . . . . . . 87760.9 Existence of Greatest Lower Bound . . . . . . . . . . . 87960.10 Constructibility of a Minimum Value and Point . . . . 88060.11 A Decreasing Bounded Sequence Converges! . . . . . . 880

61 The Divergence, Rotation and Laplacian 88361.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 88361.2 The Case of R

2 . . . . . . . . . . . . . . . . . . . . . . 88461.3 The Laplacian in Polar Coordinates . . . . . . . . . . . 88561.4 Some Basic Examples . . . . . . . . . . . . . . . . . . 88661.5 The Laplacian Under Rigid Coordinate Transformations 88661.6 The Case of R

3 . . . . . . . . . . . . . . . . . . . . . . 88761.7 Basic Examples, Again . . . . . . . . . . . . . . . . . . 88861.8 The Laplacian in Spherical Coordinates . . . . . . . . 889

62 Meteorology and Coriolis Forces* 89162.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 89162.2 A Basic Meteorological Model . . . . . . . . . . . . . . 89262.3 Rotating Coordinate Systems

and Coriolis Acceleration . . . . . . . . . . . . . . . . . 893

63 Curve Integrals 89763.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 89763.2 The Length of a Curve in R

2 . . . . . . . . . . . . . . 89763.3 Curve Integral . . . . . . . . . . . . . . . . . . . . . . . 89963.4 Reparameterization . . . . . . . . . . . . . . . . . . . . 90063.5 Work and Line Integrals . . . . . . . . . . . . . . . . . 90163.6 Work and Gradient Fields . . . . . . . . . . . . . . . . 90263.7 Using the Arclength as a Parameter . . . . . . . . . . 90363.8 The Curvature of a Plane Curve . . . . . . . . . . . . 90463.9 Extension to Curves in R

n . . . . . . . . . . . . . . . . 905

64 Double Integrals 90964.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 90964.2 Double Integrals over the Unit Square . . . . . . . . . 91064.3 Double Integrals via One-Dimensional Integration . . . 91364.4 Generalization to an Arbitrary Rectangle . . . . . . . 916

XXXVIII Contents Volume 3

64.5 Interpreting the Double Integral as a Volume . . . . . 91664.6 Extension to General Domains . . . . . . . . . . . . . 91764.7 Iterated Integrals over General Domains . . . . . . . . 91964.8 The Area of a Two-Dimensional Domain . . . . . . . . 92064.9 The Integral as the Limit of a General Riemann Sum . 92064.10 Change of Variables in a Double Integral . . . . . . . . 921

65 Surface Integrals 92765.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 92765.2 Surface Area . . . . . . . . . . . . . . . . . . . . . . . 92765.3 The Surface Area of a the Graph of a Function

of Two Variables . . . . . . . . . . . . . . . . . . . . . 93065.4 Surfaces of Revolution . . . . . . . . . . . . . . . . . . 93065.5 Independence of Parameterization . . . . . . . . . . . . 93165.6 Surface Integrals . . . . . . . . . . . . . . . . . . . . . 93265.7 Moment of Inertia of a Thin Spherical Shell . . . . . . 933

66 Multiple Integrals 93766.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 93766.2 Triple Integrals over the Unit Cube . . . . . . . . . . . 93766.3 Triple Integrals over General Domains in R

3 . . . . . . 93866.4 The Volume of a Three-Dimensional Domain . . . . . 93966.5 Triple Integrals as Limits of Riemann Sums . . . . . . 94066.6 Change of Variables in a Triple Integral . . . . . . . . 94166.7 Solids of Revolution . . . . . . . . . . . . . . . . . . . 94366.8 Moment of Inertia of a Ball . . . . . . . . . . . . . . . 944

67 Gauss’ Theorem and Green’s Formula in R2 947

67.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 94767.2 The Special Case of a Square . . . . . . . . . . . . . . 94867.3 The General Case . . . . . . . . . . . . . . . . . . . . . 948

68 Gauss’ Theorem and Green’s Formula in R3 957

68.1 George Green (1793–1841) . . . . . . . . . . . . . . . . 960

69 Stokes’ Theorem 96369.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 96369.2 The Special Case of a Surface in a Plane . . . . . . . . 96569.3 Generalization to an Arbitrary Plane Surface . . . . . 96669.4 Generalization to a Surface Bounded by a Plane Curve 967

70 Potential Fields 97170.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 97170.2 An Irrotational Field Is a Potential Field . . . . . . . . 97270.3 A Counter-Example for a Non-Convex Ω . . . . . . . . 974

Contents Volume 3 XXXIX

71 Center of Mass and Archimedes’ Principle* 97571.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 97571.2 Center of Mass . . . . . . . . . . . . . . . . . . . . . . 97671.3 Archimedes’ Principle . . . . . . . . . . . . . . . . . . 97971.4 Stability of Floating Bodies . . . . . . . . . . . . . . . 981

72 Newton’s Nightmare* 985

73 Laplacian Models 99173.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 99173.2 Heat Conduction . . . . . . . . . . . . . . . . . . . . . 99173.3 The Heat Equation . . . . . . . . . . . . . . . . . . . . 99473.4 Stationary Heat Conduction: Poisson’s Equation . . . 99573.5 Convection-Diffusion-Reaction . . . . . . . . . . . . . . 99773.6 Elastic Membrane . . . . . . . . . . . . . . . . . . . . . 99773.7 Solving the Poisson Equation . . . . . . . . . . . . . . 99973.8 The Wave Equation: Vibrating Elastic Membrane . . . 100173.9 Fluid Mechanics . . . . . . . . . . . . . . . . . . . . . . 100173.10 Maxwell’s Equations . . . . . . . . . . . . . . . . . . . 100773.11 Gravitation . . . . . . . . . . . . . . . . . . . . . . . . 101173.12 The Eigenvalue Problem for the Laplacian . . . . . . . 101573.13 Quantum Mechanics . . . . . . . . . . . . . . . . . . . 1017

74 Chemical Reactions* 102374.1 Constant Temperature . . . . . . . . . . . . . . . . . . 102374.2 Variable Temperature . . . . . . . . . . . . . . . . . . 102674.3 Space Dependence . . . . . . . . . . . . . . . . . . . . 1026

75 Calculus Tool Bag II 102975.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 102975.2 Lipschitz Continuity . . . . . . . . . . . . . . . . . . . 102975.3 Differentiability . . . . . . . . . . . . . . . . . . . . . . 102975.4 The Chain Rule . . . . . . . . . . . . . . . . . . . . . . 103075.5 Mean Value Theorem for f : R

n → R . . . . . . . . . . 103075.6 A Minimum Point Is a Stationary Point . . . . . . . . 103075.7 Taylor’s Theorem . . . . . . . . . . . . . . . . . . . . . 103075.8 Contraction Mapping Theorem . . . . . . . . . . . . . 103175.9 Inverse Function Theorem . . . . . . . . . . . . . . . . 103175.10 Implicit Function Theorem . . . . . . . . . . . . . . . . 103175.11 Newton’s Method . . . . . . . . . . . . . . . . . . . . . 103175.12 Differential Operators . . . . . . . . . . . . . . . . . . 103175.13 Curve Integrals . . . . . . . . . . . . . . . . . . . . . . 103275.14 Multiple Integrals . . . . . . . . . . . . . . . . . . . . . 103375.15 Surface Integrals . . . . . . . . . . . . . . . . . . . . . 103375.16 Green’s and Gauss’ Formulas . . . . . . . . . . . . . . 103475.17 Stokes’ Theorem . . . . . . . . . . . . . . . . . . . . . 1034

XL Contents Volume 3

76 Piecewise Linear Polynomials in R2 and R

3 103576.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 103576.2 Triangulation of a Domain in R

2 . . . . . . . . . . . . 103676.3 Mesh Generation in R

3 . . . . . . . . . . . . . . . . . . 103976.4 Piecewise Linear Functions . . . . . . . . . . . . . . . 104076.5 Max-Norm Error Estimates . . . . . . . . . . . . . . . 104276.6 Sobolev and his Spaces . . . . . . . . . . . . . . . . . . 104576.7 Quadrature in R

2 . . . . . . . . . . . . . . . . . . . . . 1046

77 FEM for Boundary Value Problems in R2 and R

3 104977.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 104977.2 Richard Courant: Inventor of FEM . . . . . . . . . . . 105077.3 Variational Formulation . . . . . . . . . . . . . . . . . 105177.4 The cG(1) FEM . . . . . . . . . . . . . . . . . . . . . . 105177.5 Basic Data Structures . . . . . . . . . . . . . . . . . . 105777.6 Solving the Discrete System . . . . . . . . . . . . . . . 105877.7 An Equivalent Minimization Problem . . . . . . . . . . 105977.8 An Energy Norm a Priori Error Estimate . . . . . . . 106077.9 An Energy Norm a Posteriori Error Estimate . . . . . 106177.10 Adaptive Error Control . . . . . . . . . . . . . . . . . 106377.11 An Example . . . . . . . . . . . . . . . . . . . . . . . . 106577.12 Non-Homogeneous Dirichlet Boundary Conditions . . . 106677.13 An L-shaped Membrane . . . . . . . . . . . . . . . . . 106677.14 Robin and Neumann Boundary Conditions . . . . . . . 106877.15 Stationary Convection-Diffusion-Reaction . . . . . . . 107077.16 Time-Dependent Convection-Diffusion-Reaction . . . . 107177.17 The Wave Equation . . . . . . . . . . . . . . . . . . . 107277.18 Examples . . . . . . . . . . . . . . . . . . . . . . . . . 1072

78 Inverse Problems 107778.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 107778.2 An Inverse Problem for One-Dimensional Convection . 107978.3 An Inverse Problem for One-Dimensional Diffusion . . 108178.4 An Inverse Problem for Poisson’s Equation . . . . . . 108378.5 An Inverse Problem for Laplace’s Equation . . . . . . 108678.6 The Backward Heat Equation . . . . . . . . . . . . . . 1087

79 Optimal Control 109179.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 109179.2 The Connection Between dJ

dp and ∂L∂p . . . . . . . . . . 1093

80 Differential Equations Tool Bag 109580.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 109580.2 The Equation u′(x) = λ(x)u(x) . . . . . . . . . . . . . 109680.3 The Equation u′(x) = λ(x)u(x) + f(x) . . . . . . . . . 1096

Contents Volume 3 XLI

80.4 The Differential Equation∑n

k=0 akDku(x) = 0 . . . . 1096

80.5 The Damped Linear Oscillator . . . . . . . . . . . . . 109780.6 The Matrix Exponential . . . . . . . . . . . . . . . . . 109780.7 Fundamental Solutions of the Laplacian . . . . . . . . 109880.8 The Wave Equation in 1d . . . . . . . . . . . . . . . . 109880.9 Numerical Methods for IVPs . . . . . . . . . . . . . . 109880.10 cg(1) for Convection-Diffusion-Reaction . . . . . . . . 109980.11 Svensson’s Formula for Laplace’s Equation . . . . . . . 109980.12 Optimal Control . . . . . . . . . . . . . . . . . . . . . 1099

81 Applications Tool Bag 110181.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 110181.2 Malthus’ Population Model . . . . . . . . . . . . . . . 110181.3 The Logistics Equation . . . . . . . . . . . . . . . . . . 110181.4 Mass-Spring-Dashpot System . . . . . . . . . . . . . . 110181.5 LCR-Circuit . . . . . . . . . . . . . . . . . . . . . . . . 110281.6 Laplace’s Equation for Gravitation . . . . . . . . . . . 110281.7 The Heat Equation . . . . . . . . . . . . . . . . . . . . 110281.8 The Wave Equation . . . . . . . . . . . . . . . . . . . 110281.9 Convection-Diffusion-Reaction . . . . . . . . . . . . . . 110281.10 Maxwell’s Equations . . . . . . . . . . . . . . . . . . . 110381.11 The Incompressible Navier-Stokes Equations . . . . . . 110381.12 Schrodinger’s Equation . . . . . . . . . . . . . . . . . . 1103

82 Analytic Functions 110582.1 The Definition of an Analytic Function . . . . . . . . . 110582.2 The Derivative as a Limit of Difference Quotients . . . 110782.3 Linear Functions Are Analytic . . . . . . . . . . . . . . 110782.4 The Function f(z) = z2 Is Analytic . . . . . . . . . . . 110782.5 The Function f(z) = zn Is Analytic for n = 1, 2, . . . . . 110882.6 Rules of Differentiation . . . . . . . . . . . . . . . . . . 110882.7 The Function f(z) = z−n . . . . . . . . . . . . . . . . 110882.8 The Cauchy-Riemann Equations . . . . . . . . . . . . 110882.9 The Cauchy-Riemann Equations and the Derivative . . 111082.10 The Cauchy-Riemann Equations in Polar Coordinates 111182.11 The Real and Imaginary Parts of an Analytic Function 111182.12 Conjugate Harmonic Functions . . . . . . . . . . . . . 111182.13 The Derivative of an Analytic Function Is Analytic . . 111282.14 Curves in the Complex Plane . . . . . . . . . . . . . . 111282.15 Conformal Mappings . . . . . . . . . . . . . . . . . . . 111482.16 Translation-rotation-expansion/contraction . . . . . . 111582.17 Inversion . . . . . . . . . . . . . . . . . . . . . . . . . . 111582.18 Mobius Transformations . . . . . . . . . . . . . . . . . 111682.19 w = z1/2, w = ez, w = log(z) and w = sin(z) . . . . . . 111782.20 Complex Integrals: First Shot . . . . . . . . . . . . . . 1119

XLII Contents Volume 3

82.21 Complex Integrals: General Case . . . . . . . . . . . . 112082.22 Basic Properties of the Complex Integral . . . . . . . . 112182.23 Taylor’s Formula: First Shot . . . . . . . . . . . . . . . 112182.24 Cauchy’s Theorem . . . . . . . . . . . . . . . . . . . . 112282.25 Cauchy’s Representation Formula . . . . . . . . . . . . 112382.26 Taylor’s Formula: Second Shot . . . . . . . . . . . . . 112582.27 Power Series Representation of Analytic Functions . . 112682.28 Laurent Series . . . . . . . . . . . . . . . . . . . . . . . 112882.29 Residue Calculus: Simple Poles . . . . . . . . . . . . . 112982.30 Residue Calculus: Poles of Any Order . . . . . . . . . 113182.31 The Residue Theorem . . . . . . . . . . . . . . . . . . 113182.32 Computation of

∫ 2π

0 R(sin(t), cos(t)) dt . . . . . . . . . 113282.33 Computation of

∫∞−∞

p(x)q(x) dx . . . . . . . . . . . . . . . 1133

82.34 Applications to Potential Theory in R2 . . . . . . . . . 1134

83 Fourier Series 114183.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 114183.2 Warm Up I: Orthonormal Basis in C

n . . . . . . . . . 114483.3 Warm Up II: Series . . . . . . . . . . . . . . . . . . . . 114483.4 Complex Fourier Series . . . . . . . . . . . . . . . . . . 114583.5 Fourier Series as an Orthonormal Basis Expansion . . 114683.6 Truncated Fourier Series and Best L2-Approximation . 114783.7 Real Fourier Series . . . . . . . . . . . . . . . . . . . . 114783.8 Basic Properties of Fourier Coefficients . . . . . . . . . 115083.9 The Inversion Formula . . . . . . . . . . . . . . . . . . 115583.10 Parseval’s and Plancherel’s Formulas . . . . . . . . . . 115783.11 Space Versus Frequency Analysis . . . . . . . . . . . . 115883.12 Different Periods . . . . . . . . . . . . . . . . . . . . . 115983.13 Weierstrass Functions . . . . . . . . . . . . . . . . . . 115983.14 Solving the Heat Equation Using Fourier Series . . . . 116083.15 Computing Fourier Coefficients with Quadrature . . . 116283.16 The Discrete Fourier Transform . . . . . . . . . . . . . 1162

84 Fourier Transforms 116584.1 Basic Properties of the Fourier Transform . . . . . . . 116784.2 The Fourier Transform f(ξ) Tends to 0 as |ξ| → ∞ . . 116984.3 Convolution . . . . . . . . . . . . . . . . . . . . . . . . 116984.4 The Inversion Formula . . . . . . . . . . . . . . . . . . 116984.5 Parseval’s Formula . . . . . . . . . . . . . . . . . . . . 117184.6 Solving the Heat Equation Using the Fourier Transform 117184.7 Fourier Series and Fourier Transforms . . . . . . . . . 117284.8 The Sampling Theorem . . . . . . . . . . . . . . . . . 117384.9 The Laplace Transform . . . . . . . . . . . . . . . . . . 117484.10 Wavelets and the Haar Basis . . . . . . . . . . . . . . 1175

Contents Volume 3 XLIII

85 Analytic Functions Tool Bag 117985.1 Differentiability and analyticity . . . . . . . . . . . . . 117985.2 The Cauchy-Riemann Equations . . . . . . . . . . . . 117985.3 The Real and Imaginary Parts of an Analytic Function 118085.4 Conjugate Harmonic Functions . . . . . . . . . . . . . 118085.5 Curves in the Complex Plane . . . . . . . . . . . . . . 118085.6 An Analytic Function Defines a Conformal Mapping . 118185.7 Complex Integrals . . . . . . . . . . . . . . . . . . . . 118185.8 Cauchy’s Theorem . . . . . . . . . . . . . . . . . . . . 118185.9 Cauchy’s Representation Formula . . . . . . . . . . . . 118185.10 Taylor’s Formula . . . . . . . . . . . . . . . . . . . . . 118285.11 The Residue Theorem . . . . . . . . . . . . . . . . . . 1182

86 Fourier Analysis Tool Bag 118386.1 Properties of Fourier Coefficients . . . . . . . . . . . . 118386.2 Convolution . . . . . . . . . . . . . . . . . . . . . . . . 118386.3 Fourier Series Representation . . . . . . . . . . . . . . 118486.4 Parseval’s Formula . . . . . . . . . . . . . . . . . . . . 118486.5 Discrete Fourier Transforms . . . . . . . . . . . . . . . 118486.6 Fourier Transforms . . . . . . . . . . . . . . . . . . . . 118486.7 Properties of Fourier Transforms . . . . . . . . . . . . 118586.8 The Sampling Theorem . . . . . . . . . . . . . . . . . 1185

87 Incompressible Navier-Stokes: Quick and Easy 118787.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . 118787.2 The Incompressible Navier-Stokes Equations . . . . . . 118887.3 The Basic Energy Estimate for Navier-Stokes . . . . . 118987.4 Lions and his School . . . . . . . . . . . . . . . . . . . 119087.5 Turbulence: Lipschitz with Exponent 1/3? . . . . . . . 119187.6 Existence and Uniqueness of Solutions . . . . . . . . . 119287.7 Numerical Methods . . . . . . . . . . . . . . . . . . . . 119287.8 The Stabilized cG(1)dG(0) Method . . . . . . . . . . . 119387.9 The cG(1)cG(1) Method . . . . . . . . . . . . . . . . . 119487.10 The cG(1)dG(1) Method . . . . . . . . . . . . . . . . . 119587.11 Neumann Boundary Conditions . . . . . . . . . . . . . 119587.12 Computational Examples . . . . . . . . . . . . . . . . 1197

References 1203

Index 1205

Volume 1

Derivativesand

Geometry in R3

|u(xj) − u(xj−1)| ≤ Lu|xj − xj−1|

u(xj) − u(xj−1) ≈ u′(xj−1)(xj − xj−1)

a · b = a1b1 + a2b2 + a3b3

1What is Mathematics?

The question of the ultimate foundations and the ultimate meaningof mathematics remains open; we do not know in what direction itwill find its final solution or whether a final objective answer may beexpected at all. “Mathematizing” may well be a creative activity ofman, like language or music, of primary originality, whose historicaldecisions defy complete objective rationalization. (Weyl)

1.1 Introduction

We start out by giving a very brief idea of the nature of mathematics andthe role of mathematics in our society.

1.2 The Modern World: Automatized Productionand Computation

The mass consumption of the industrial society is made possible by the au-tomatized mass production of material goods such as food, clothes, housing,TV-sets, CD-players and cars. If these items had to be produced by hand,they would be the privileges of only a select few.

Analogously, the emerging information society is based on mass con-sumption of automatized computation by computers that is creating a new“virtual reality” and is revolutionizing technology, communication, admin-

4 1. What is Mathematics?

Fig. 1.1. First picture of book printing technique (from Danse Macabre, Lyon1499)

istration, economy, medicine, and the entertainment industry. The infor-mation society offers immaterial goods in the form of knowledge, infor-mation, fiction, movies, music, games and means of communication. Themodern PC or lap-top is a powerful computing device for mass produc-tion/consumption of information e.g. in the form of words, images, moviesand music.

Key steps in the automatization or mechanization of production were:Gutenbergs’s book printing technique (Germany, 1450), Christoffer Pol-hem’s automatic machine for clock gears (Sweden, 1700), The SpinnningJenny (England, 1764), Jacquard’s punched card controlled weaving loom(France, 1801), Ford’s production line (USA, 1913), see Fig. 1.1, Fig. 1.2,and Fig. 1.3.

Key steps in the automatization of computation were: Abacus (AncientGreece, Roman Empire), Slide Rule (England, 1620), Pascals MechanicalCalculator (France, 1650), Babbage’s Difference Machine (England, 1830),Scheutz’ Difference Machine (Sweden, 1850), ENIAC Electronic Numer-ical Integrator and Computer (USA, 1945), and the Personal ComputerPC (USA, 1980), see Fig. 1.5, Fig. 1.6, Fig. 1.7 and Fig. 1.8. The Dif-ference Machines could solve simple differential equations and were usedto compute tables of elementary functions such as the logarithm. ENIACwas one of the first modern computers (electronic and programmable),consisted of 18.000 vacuum tubes filling a room of 50 × 100 square feetwith a weight of 30 tons and energy consuming of 200 kilowatts, andwas used to solve the differential equations of ballistic firing tables asan important part of the Allied World War II effort. A modern laptopat a cost of $2000 with a processor speed of 2 GHz and internal mem-

1.2 The Modern World 5

Fig. 1.2. Christoffer Polhem’s machine for clock gears (1700), Spinning Jenny(1764) and Jaquard’s programmable loom (1801)


Fig. 1.3. Ford assembly line (1913)

ory of 512 Mb has the computational power of hundreds of thousands ofENIACs.

Automatization (or automation) is based on frequent repetition of a cer-tain algorithm or scheme with new data at each repetition. The algorithmmay consist of a sequence of relatively simple steps together creating a morecomplicated process. In automatized manufacturing, as in the productionline of a car factory, physical material is modified following a strict repeti-tive scheme, and in automatized computation, the 1s and 0s of the micro-processor are modified billions of times each second following the computerprogram. Similarly, a genetic code of an organism may be seen as an al-gorithm that generates a living organism when realized in interplay withthe environment. Realizing a genetic code many times (with small varia-tions) generates populations of organisms. Mass-production is the key toincreased complexity following the patterns of nature: elementary particle→ atom → molecule and molecule → cell → organism → population, orthe patterns of our society: individual → group → society or computer →computer network → global net.

1.3 The Role of Mathematics

Mathematics may be viewed as the language of computation and thus liesat the heart of the modern information society. Mathematics is also the lan-guage of science and thus lies at the heart of the industrial society that grewout of the scientific revolution in the 17th century that began when Leibnizand Newton created Calculus. Using Calculus, basic laws of mechanics andphysics, such as Newton’s law, could be formulated as mathematical mod-

1.3 The Role of Mathematics 7

Fig. 1.4. Computing device of the Inca Culture

els in the form of differential equations. Using the models, real phenomenacould be simulated and controlled (more or less) and industrial processescould be created.

The mass consumption of both material and immaterial goods, consid-ered to be a corner-stone of our modern democratic society, is made possiblethrough automatization of production and computation. Therefore, math-ematics forms a fundamental part of the technical basis of the modernsociety revolving around automatized production of material goods andautomatized computation of information.

The vision of virtual reality based on automatized computation was for-mulated by Leibniz already in the 17th century and was developed furtherby Babbage with his Analytical Engine in the 1830s. This vision is finallybeing realized in the modern computer age in a synthesis of Body & Soulof Mathematics.

We now give some examples of the use of mathematics today that areconnected to different forms of automatized computation.


Fig. 1.5. Classical computational tools: Abacus (300 B.C.-), Galileo’s Compass(1597) and Slide Rule (1620-)

1.3 The Role of Mathematics 9

Fig. 1.6. Napier’s Bones (1617), Pascals Calculator (1630), Babbage’s DifferenceMachine (1830) and Scheutz’ Swedish Difference Machine (1850)


Fig. 1.7. Odhner’s mechanical calculator made in Goteborg, Sweden, 1919–1950

Fig. 1.8. ENIAC Electronic Numerical Integrator and Calculator (1945)

1.4 Design and Production of Cars 11

1.4 Design and Production of Cars

In the car industry, a model of a component or complete car can be madeusing Computer Aided Design CAD. The CAD-model describes the ge-ometry of the car through mathematical expressions and the model canbe displayed on the computer screen. The performance of the componentcan then be tested in computer simulations, where differential equationsare solved through massive computation, and the CAD-model is used asinput of geometrical data. Further, the CAD data can be used in automa-tized production. The new technique is revolutionizing the whole industrialprocess from design to production.

1.5 Navigation: From Stars to GPS

A primary force behind the development of geometry and mathematicssince the Babylonians has been the need to navigate using information fromthe positions of the planets, stars, the Moon and the Sun. With a clock anda sextant and mathematical tables, the sea-farer of the 18th century coulddetermine his position more or less accurately. But the results dependedstrongly on the precision of clocks and observations and it was easy forlarge errors to creep in. Historically, navigation has not been an easy job.

During the last decade, the classical methods of navigation have beenreplaced by GPS, the Global Positioning System. With a GPS navigatorin hand, which we can buy for a couple of hundred dollars, we get ourcoordinates (latitude and longitude) with a precision of 50 meters at thepress of a button. GPS is based on a simple mathematical principle knownalready to the Greeks: if we know our distance to three point is space withknown coordinates then we can compute our position. The GPS uses thisprinciple by measuring its distance to three satellites with known positions,and then computes its own coordinates. To use this technique, we need todeploy satellites, keep track of them in space and time, and measure rele-vant distances, which became possible only in the last decades. Of course,computers are used to keep track of the satellites, and the microprocessorof a hand-held GPS measures distances and computes the current coordi-nates.

The GPS has opened the door to mass consumption in navigation, whichwas before the privilege of only a few.

1.6 Medical Tomography

The computer tomograph creates a pictures of the inside of a human bodyby solving a certain integral equation by massive computation, with data


Fig. 1.9. GPS-system with 4 satellites

coming from measuring the attenuation of very weak X-rays sent throughthe body from different directions. This technique offers mass consump-tion of medical imaging, which is radically changing medical research andpractice.

1.7 Molecular Dynamics and Medical Drug Design

The classic way in which new drugs are discovered is an expensive and time-consuming process. First, a physical search is conducted for new organicchemical compounds, for example among the rain forests in South America.Once a new organic molecule is discovered, drug and chemical companieslicense the molecule for use in a broad laboratory investigation to see if thecompound is useful. This search is conducted by expert organic chemistswho build up a vast experience with how compounds can interact and whichkind of interactions are likely to prove useful for the purpose of controllinga disease or fixing a physical condition. Such experience is needed to reducethe number of laboratory trials that are conducted, otherwise the vast rangeof possibilities is overwhelming.

The use of computers in the search for new drugs is rapidly increasing.One use is to makeup new compounds so as to reduce the need to makeexpensive searches in exotic locations like southern rain forests. As part ofthis search, the computer can also help classify possible configurations of

1.8 Weather Prediction and Global Warming 13

Fig. 1.10. Medical tomograph

molecules and provide likely ranges of interactions, thus greatly reducingthe amount of laboratory testing time that is needed.

1.8 Weather Prediction and Global Warming

Weather predictions are based on solving differential equations that de-scribe the evolution of the atmosphere using a super computer. Reasonablyreliable predictions of daily weather are routinely done for periods of a fewdays. For longer periods. the reliability of the simulation decreases rapidly,and with present day computers daily weather predictions for a period oftwo weeks are impossible.

However, forecasts over months of averages of temperature and rainfallare possible with present day computer power and are routinely performed.

Long-time simulations over periods of 20–50 years of yearly temperature-averages are done today to predict a possible global warming due to the useof fossil energy. The reliability of these simulations are debated.

1.9 Economy: Stocks and Options

The Black-Scholes model for pricing options has created a new market ofso called derivative trading as a complement to the stock market. To cor-rectly price options is a mathematically complicated and computationallyintensive task, and a stock broker with first class software for this purpose(which responds in a few seconds), has a clear trading advantage.


Fig. 1.11. The Valium molecule

1.10 Languages

Mathematics is a language. There are many different languages. Our mothertongue, whatever it happens to be, English, Swedish, Greek, et cetera, isour most important language, which a child masters quite well at the ageof three. To learn to write in our native language takes longer time andmore effort and occupies a large part of the early school years. To learnto speak and write a foreign language is an important part of secondaryeducation.

Language is used for communication with other people for purposes ofcooperation, exchange of ideas or control. Communication is becoming in-creasingly important in our society as the modern means of communicationdevelop.

Using a language we may create models of phenomena of interest, and byusing models, phenomena may be studied for purposes of understanding orprediction. Models may be used for analysis focussed on a close examinationof individual parts of the model and for synthesis aimed at understandingthe interplay of the parts that is understanding the model as a whole.A novel is like a model of the real world expressed in a written languagelike English. In a novel the characters of people in the novel may be analyzedand the interaction between people may be displayed and studied.

1.11 Mathematics as the Language of Science 15

The ants in a group of ants or bees in a bees hive also have a languagefor communication. In fact in modern biology, the interaction between cellsor proteins in a cell is often described in terms of entities ”talking to eachother”.

It appears that we as human beings use our language when we think. Wethen seem to use the language as a model in our head, where we try variouspossibilities in simulations of the real world: “If that happens, then I’ll dothis, and if instead that happens, then I will do so and so. . .”. Planning ourday and setting up our calender is also some type of modeling or simulationof events to come. Simulations by using our language thus seems to go onin our heads all the time.

There are also other languages like the language of musical notationwith its notes, bars, scores, et cetera. A musical score is like a model ofthe real music. For a trained composer, the model of the written scorecan be very close to the real music. For amateurs, the musical score maysay very little, because the score is like a foreign language which is notunderstood.

1.11 Mathematics as the Language of Science

Mathematics has been described as the language of science and technologyincluding mechanics, astronomy, physics, chemistry, and topics like fluidand solid mechanics, electromagnetics et cetera. The language of mathe-matics is used to deal with geometrical concepts like position and form andmechanical concepts like velocity, force and field. More generally, mathe-matics serves as a language in any area that includes quantitative aspectsdescribed in terms of numbers, such as economy, accounting, statistics etcetera. Mathematics serves as the basis for the modern means of electroniccommunication where information is coded as sequences of 0’s and 1’s andis transferred, manipulated or stored.

The words of the language of mathematics often are taken from our usuallanguage, like points, lines, circles, velocity, functions, relations, transfor-mations, sequences, equality, inequality et cetera.

A mathematical word, term or concept is supposed to have a specificmeaning defined using other words and concepts that are already defined.This is the same principle as is used in a Thesaurus, where relatively compli-cated words are described in terms of simpler words. To start the definitionprocess, certain fundamental concepts or words are used, which cannot bedefined in terms of already defined concepts. Basic relations between thefundamental concepts may be described in certain axioms. Fundamentalconcepts of Euclidean geometry are point and line, and a basic Euclideanaxiom states that through each pair of distinct points there is a uniqueline passing. A theorem is a statement derived from the axioms or other


theorems by using logical reasoning following certain rules of logic. Thederivation is called a proof of the theorem.

1.12 The Basic Areas of Mathematics

The basic areas of mathematics are

Geometry

Algebra

Analysis.

Geometry concerns objects like lines, triangles, circles. Algebra and Anal-ysis is based on numbers and functions. The basic areas of mathematicseducation in engineering or science education are

Calculus

Linear Algebra.

Calculus is a branch of analysis and concerns properties of functions such ascontinuity, and operations on functions such as differentiation and integra-tion. Calculus connects to Linear Algebra in the study of linear functionsor linear transformations and to analytical geometry, which describes ge-ometry in terms of numbers. The basic concepts of Calculus are

function

derivative

integral.

Linear Algebra combines Geometry and Algebra and connects to AnalyticalGeometry. The basic concepts of Linear Algebra are

vector

vector space

projection, orthogonality

linear transformation.

This book teaches the basics of Calculus and Linear Algebra, which arethe areas of mathematics underlying most applications.

1.13 What Is Science? 17

1.13 What Is Science?

The theoretical kernel of natural science may be viewed as having twocomponents

formulating equations (modeling),

solving equations (computation).

Together, these form the essence of mathematical modeling and computa-tional mathematical modeling. The first really great triumph of science andmathematical modeling is Newton’s model of our planetary system as a setof differential equations expressing Newton’s law connecting force, throughthe inverse square law, and acceleration. An algorithm may be seen asa strategy or constructive method to solve a given equation via computa-tion. By applying the algorithm and computing, it is possible to simulatereal phenomena and make predictions.

Traditional techniques of computing were based on symbolic or numer-ical computation with pen and paper, tables, slide ruler and mechanicalcalculator. Automatized computation with computers is now opening newpossibilities of simulation of real phenomena according to Natures ownprinciple of massive repetition of simple operations, and the areas of appli-cations are quickly growing in science, technology, medicine and economics.

Mathematics is basic for both steps (i) formulating and (ii) solving equa-tion. Mathematics is used as a language to formulate equations and as a setof tools to solve equations.

Fame in science can be reached by formulating or solving equations. Thesuccess is usually manifested by connecting the name of the inventor to theequation or solution method. Examples are legio: Newton’s method, Euler’sequations, Lagrange’s equations, Poisson’s equation, Laplace’s equation,Navier’s equation, Navier-Stokes’ equations, Boussinesq’s equation, Ein-stein’s equation, Schrodinger’s equation, Black-Scholes formula . . . , mostof which we will meet below.

1.14 What Is Conscience?

The activity of the brain is believed to consist of electrical/chemical sig-nals/waves connecting billions of synapses in some kind of large scale com-putation. The question of the nature of the conscience of human beings hasplayed a central role in the development of human culture since the earlyGreek civilization, and today computer scientists seek to capture its evasivenature in various forms of Artificial Intelligence AI. The idea of a divisionof the activity of the brain into a (small) conscious “rational” part anda (large) unconscious “irrational” part, is widely accepted since the days ofFreud. The rational part has the role of “analysis” and “control” towards


some “purpose” and thus has features of Soul, while the bulk of the “com-putation” is Body in the sense that it is “just” electrical/chemical waves.We meet the same aspects in numerical optimization, with the optimizationalgorithm itself playing the role of Soul directing the computational efforttowards the goal, and the underlying computation is Body.

We have been brought up with the idea that the conscious is in controlof the mental “computation”, but we know that this is often not the case.In fact, we seem to have developed strong skills in various kinds of after-rationalization: whatever happens, unless it is an “accident” or something“unexpected”, we see it as resulting from a rational plan of ours made upin advance, thus turning a posteriori observations into a priori predictions.

1.15 How to Come to Grips with the Difficultiesof Understanding the Material of this Bookand Eventually Viewing it as a Good Friend

We conclude this introductory chapter with some suggestions intended tohelp the reader through the most demanding first reading of the book andreach a state of mind viewing the book as a good helpful friend, rather thanthe opposite. From our experience of teaching the material of this book,we know that it may evoke quite a bit of frustration and negative feelings,which is not very productive.

Mathematics Is Difficult: Choose Your Own Level of Ambition

First, we have to admit that mathematics is a difficult subject, and we seeno way around this fact.Secondly, one should realize that it is perfectlypossible to live a happy life with a career in both academics and industrywith only elementary knowledge of mathematics. There are many examplesincluding Nobel Prize Winners. This means that it is advisable to set a levelof ambition in mathematics studies which is realistic and fits the interestprofile of the individual student. Many students of engineering have otherprime interests than mathematics, but there are also students who reallylike mathematics and theoretical engineering subjects using mathematics.The span of mathematical interest thus may be expected to be quite widein a group of students following a course based on this book, and it seemsreasonable that this would be reflected in the choice of level of ambition.

Advanced Material: Keep an Open Mind and Be Confident

The book contains quite a bit of material which is “advanced” and notusually met in undergraduate mathematics, and which one may bypass andstill be completely happy. It is probably better to be really familiar with

1.15 How to View this Book as a Friend 19

and understand a smaller set of mathematical tools and have the ability tomeet new challenges with some self-confidence, than repeatedly failing todigest too large portions. Mathematics is so rich, that even a life of fully-time study can only cover a very small part. The most important abilitymust be to meet new material with an open mind and some confidence!

Some Parts of Mathematics Are Easy

On the other hand, there are many aspects of mathematics which are not sodifficult, or even “simple”, once they have been properly understood. Thus,the book contains both difficult and simple material, and the first impres-sion from the student may give overwhelming weight to the former. Tohelp out we have collected the most essential nontrivial facts in short sum-maries in the form of Calculus Tool Bag I and II, Linear Algebra Tool Bag,Differential Equations Tool Bag, Applications Tool Bag, Fourier AnalysisTool Bag and Analytic Functions Tool Bag. The reader will find the toolbags surprisingly short: just a couple pages, altogether say 15–20 pages. Ifproperly understood, this material carries a long way and is “all” one needsto remember from the math studies for further studies and professional ac-tivities in other areas. Since the book contains about 1200 pages it means50–100 pages of book text for each one page of summary. This means thatthe book gives more than the absolute minimum of information and hasthe ambition to give the mathematical concepts a perspective concerningboth history and applicability today. So we hope the student does not getturned off by the quite a massive number of words, by remembering thatafter all 15–20 pages captures the essential facts. During a period of studyof say one year and a half of math studies, this effectively means about onethird of a page each week!

Increased/Decreased Importance of Mathematics

The book reflects both the increased importance of mathematics in theinformation society of today, and the decreased importance of much of theanalytical mathematics filling the traditional curriculum. The student thusshould be happy to know that many of the traditional formulas are nolonger such a must, and that a proper understanding of relatively few basicmathematical facts can help a lot in coping with modern life and science.

Which Chapters Can I Skip in a First Reading?

We indicate by * certain chapters directed to applications, which one mayby-pass in a first reading without loosing the main thread of the presenta-tion, and return to at a later stage if desired.


Chapter 1 Problems

1.1. Find out which Nobel Prize Winners got the prize for formulating or solvingequations.

1.2. Reflect about the nature of “thinking” and “computing”.

1.3. Find out more about the topics mentioned in the text.

1.4. (a) Do you like mathematics or hate mathematics, or something in between?Explain your standpoint. (b) Specify what you would like to get out of yourstudies of mathematics.

1.5. Present some basic aspects of science.

Fig. 1.12. Left person: “Isn’t it remarkable that one can compute the distanceto stars like Cassiopeja, Aldebaran and Sirius?”. Right person: “I find it evenmore remarkable that one may know their names!” (Assar by Ulf Lundquist)

2The Mathematics Laboratory

It is nothing short of a miracle that modern methods of instructionhave not yet entirely strangled the holy curiosity of inquiry.(Einstein)

2.1 Introduction

This book is complemented by various pieces of software collected into theMathematics Laboratory, freely available on the book web site. The Mathe-matics Laboratory contains different types of software organized under thefollowing headings: Math Experience, Tools, Applications and Students Lab.

Math Experience serves the purpose of illustrating mathematical con-cepts from analysis and linear algebra such as roots of equations, Lipschitzcontinuity, fixed point iteration, differentiability, the definition of the inte-gral, basis calculus for functions of several variables.

Tools contains (i) ready-mades such as different solvers for differentialequations and (ii) shells aimed at helping the student to make his owntools such as solvers of systems of equations and differential equations.

Applications contains more developed software like DOLFIN DynamicObject Oriented Library for FInite Elements, and Tanganyika for multi-adaptive solution of systems of ordinary differential equations.

Students Lab contains constributions from students project work and willhopefully serve as a source of inspiration transferring know how from oldto new students.

22 2. The Mathematics Laboratory

2.2 Math Experience

Math Experience is a collection of Matlab GUI software designed to of-fer a deeper understanding of important mathematical concepts and ideassuch as, for example, convergence, continuity, linearization, differentiation,Taylor polynomials, integration, etc. The idea is to provide on-screen com-puter “labs” in which the student, by himself guided by a number of welldesigned questions, can seek to fully understand (a) the concepts and ideasas such and (b) the mathematical formulas and equations describing theconcepts, by interacting with the lab environment in different ways. Forexample, in the Taylor lab (see Fig. 2.1) it is possible to give a function, orpick one from a gallery, and study its Taylor polynomial approximation ofdifferent degrees, how it depends on the point of focus by mouse-draggingthe point, how it depends on the distance to the point by zooming in andout etc. There is also a movie where the terms in the Taylor polynomial areadded one at a time. In the MultiD Calculus lab (see Fig. 2.2) it is possibleto define a function u(x1, x2) and compute its integral over a given curve ora given domain, to view its gradient field, contour plots, tangent planes etc.One may also study vector fields (u, v), view their divergence and rotation,compute the integrals of these quantities to verify the fundamental theo-rems of vector calculus, view the (u, v) mapped domain and the Jacobianof the map etc, etc.

The following labs are available from the book web page:

Func lab – about relations and functions, inverse function etc.

Graph Gallery – elementary functions and their parameter depen-dence.

Cauchy lab – about sequences & convergence

Lipschitz lab – the concept of continuity

Root lab – about bisection and fixed point iteration

Linearization and the derivative

Newtons lab – illustrating Newton’s method

Taylor lab – polynomials

Opti lab – elementary optimization

Piecewise polynomial lab – about piecewise polynomial approxima-tion

Integration lab – Euler and Riemann summation, adaptive integra-tion

2.2 Math Experience 23

Fig. 2.1. The Taylor lab

Dynamical system lab

Pendulum lab – the effect of linearization and approximation

Vector algebra – a graphical vector calculator

Analytic geometry – coupling geometry and linear algebra

MultiD Calculus – integration and vector Calculus

FE-lab – illustrating the finite element method

Adaptive FEM – illustrating adaptive finite element techniques

Poisson lab – fundamental solutions etc

Fourier lab

Wavelet lab

Optimal control lab – control problems related to differential equations

Archimedes lab – for experiments related to Archimedes principle

24 2. The Mathematics Laboratory

Fig. 2.2. The MultiD Calculus lab

3Introduction to Modeling

The best material model of a cat is another, or preferably the same,cat. (Rosenblueth/Wiener in Philosophy of Science 1945)

3.1 Introduction

We start by giving two basic examples of the use of mathematics for de-scribing practical situations. The first example is a problem in householdeconomy and the second is a problem in surveying, both of which havebeen important fields of application for mathematics since the time of theBabylonians. The models are very simple but illustrate fundamental ideas.

3.2 The Dinner Soup Model

You want to make a soup for dinner together with your roommate, andfollowing a recipe you ask your roommate to go to the grocery store and buy10 dollars worth of potatoes, carrots, and beef according to the proportions3:2:1 by weight. In other words, your roommate has 10 dollars to spend onthe ingredients, which should be bought in the amounts so that by weightthere are three times as much potatoes as beef and two times as muchcarrots as beef. At the grocery store, your roommate finds that potatoesare 1 dollar per pound, carrots are 2 dollars per pound, and beef is 8 dollarsper pound. Your roommate thus faces the problem of figuring out how muchof each ingredient to buy to use up the 10 dollars.

26 3. Introduction to Modeling

One way to solve the problem is by trial and error as follows: Yourroommate could take quantities of the ingredients to the cash register inthe proportions of 3:2:1 and let the clerk check the price, repeating untila total of 10 dollars is reached. Of course, both your roommate and theclerk could probably think of better ways to spend the afternoon. Anotherpossibility would be to make a mathematical model of the situation andthen seek to find the correct amounts to buy by doing some computations.The basic idea would be to use brains and pen and paper or a calculator,instead of labor intensive brute physical work.

The mathematical model may be set up as follows: Recalling that wewant to determine the amounts of ingredients to buy, we notice that it isenough to determine the amount of beef, since we’ll buy twice as muchcarrots as beef and three times as much potatoes as beef. Let’s give a nameto the quantity to determine. Let x denote the amount of meat in poundsto buy. The symbol x here represents an unknown quantity, or unknown,that we are seeking to determine by using available information.

If the amount of meat is x pounds, then the price of the meat to buy is8x dollars by the simple computation

cost of meat in dollars = xpounds × 8dollarspound

= 8xdollars.

Since there should be three times as much potatoes as meat by weight,the amount of potatoes in pounds is 3x and the cost of the potatoes is3x dollars since the price of potatoes is one dollar per pound. Finally, theamount of carrots to buy is 2x and the cost is 2 times 2x = 4x dollars,since the price is 2 dollars per pound. The total cost of meat, potatoes andcarrots is found by summing up the cost of each

8x+ 3x+ 4x = 15x.

Since we assume that we have 10 dollars to spend, we get the relation

15x = 10, (3.1)

which expresses the equality of total cost and available money. This is anequation involving the unknown x and data determined by the physicalsituation. From this equation, your roommate can figure out how muchbeef to buy. This is done by dividing both sides of (3.1) by 15, which givesx = 10/15 = 2/3 ≈ 0.67 pounds of meat. The amount of carrots shouldthen be 2×2/3 = 4/3 ≈ 1.33, and finally the amount of potatoes 3×2/3 = 2pounds.

The mathematical model for this situation is 15x = 10, where x is theamount of meat, 15x is the total cost and 10 is the available money. Themodeling consists in expressing the total cost of the ingredients 15x in termsof the amount of beef x. Note that in this model, we only take into account

3.2 The Dinner Soup Model 27

what is essential for the current purpose of buying potatoes, carrots andmeat for the Dinner Soup, and we did not bother to write down the pricesof other items, like ice cream or beer. Determining the useful information isan important, and sometimes difficult, part of the mathematical modeling.

A nice feature of mathematical models is that they can be reused tosimulate different situations. For example, if you have 15 dollars to spend,then the model 15x = 15 arises with solution x = 1. If you have 25 dollarsto spend, then the model is 15x = 25 with solution x = 25/15 = 5/3. Ingeneral, if the amount of money y is given, then the model is 15x = y. Inthis model we use the two symbols x and y, and assume that the amountof money y is given and the amount of beef x is an unknown quantity to bedetermined from the equation (15x = y) of the model. The roles could shiftaround: you may think of the amount of beef x as being given and the totalcost or expenditure y to be determined (according to the formula y = 15x).In the first case, we would think of the amount of beef x as a function ofthe expenditure y and in the second the expenditure y as a function of x.

Assigning symbols to relevant quantities, known or unknown, is an im-portant step in setting up a mathematical model of something. The idea ofassigning symbols for unknown quantities was used already by the Baby-lonians (who had frequent use of models like the Dinner Soup model inorganizing the feeding of the many people working on their irrigation sys-tems).

Suppose that we could not solve the equation 15x = 10, because of a lackof skill in solving equations (we may have forgotten the trick of dividing by15 that we learned in school). We could then try to get a solution by somekind of trial and error strategy as follows. First we assume that x = 1. Wethen find that the total cost is 15 dollars, which is too much. We then trywith a smaller quantity of meat, say x = 0.6, and compute the total costto 9 dollars, which is too little. We then try with something between 0.6and 1, say x = 0.7 and find that the cost would be 10.5 dollars, which isa little too much. We conclude that the right amount must be somewherebetween 0.6 and 0.7, probably closer 0.7. We can continue in the same wayto find as many decimals of x as we like. For instance we check next inthe same way that x must be some where between 0.66 and 0.67. In thiscase we know the exact answer x = 2

3 = 0.66666 . . . . The trial and errorstrategy just described is a model of the process of bringing food to thecounter and letting the cashier compute the total prize. In the model wecompute the prize ourselves without having to physically collect the itemsand bring them to the counter, which simplifies the trial and error process.


3.3 The Muddy Yard Model

One of the authors owns a house with a 100m×100m backyard that has theunfortunate tendency to form a muddy lake every time it rains. We showa perspective of the field on the left in Fig. 3.1. Because of the grading in

Fig. 3.1. Perspective of a field with poor drainage and a model describing thedimensions

the yard, the owner has had the idea for some time to fix this by digginga shallow ditch down the diagonal of the yard, laying some perforatedplastic drain pipe, then covering the pipe back up. He is then faced withthe problem of determining the amount of pipe that he needs to purchase.Since a survey of the property only provides the outside dimensions andthe locations of corners and physically measuring the diagonal through themud is not easy to do, he has decided to try to compute the distance usingmathematics. Can mathematics help him in this endeavor?

Inspection of the property and a map indicates that the yard can bemodeled as a horizontal square (the grading seems small), and we thusseek to compute the length of the diagonal of the square. We display themodel on the right in Fig. 3.1, where we change to units of 100m, so thefield is 1 × 1, and denote the length of the diagonal by x. We now recallPythagoras’ theorem, which states that x2 = 12 + 12 = 2. To find thelength x of the drain pipe, we are thus led to solve the equation

x2 = 2. (3.2)

Solving the equation x2 = 2 may seem to be deceptively simple at first;the positive solution is just x =

√2 after all. But walking into a store and

asking for√

2 units of pipe may not get a positive response. Precut pipesdo not come in lengths calibrated by

√2, neither do measure sticks indicate√

2, and a clerk is thus going to need some concrete information about thevalue of

√2 to be able to measure out a proper piece of pipe.

We can try to pin down the value of√

2 by using a trial and error strategy.We can check easily that 12 = 1 < 2 while 22 = 4 > 2. So we know that

3.4 A System of Equations 29

√2, whatever it is, is between 1 and 2. Next we can check 1.12 = 1.21,

1.22 = 1.44, 1.32 = 1.69, 1.42 = 1.96, 1.52 = 2.25, 1.62 = 2.56, 1.72 = 2.89,1.82 = 3.24, 1.92 = 3.61. Apparently

√2 is between 1.4 and 1.5. Next we

can try to fix the third decimal. Now we find that 1.412 = 1.9881 while1.422 = 2.0164. So apparently

√2 is between 1.41 and 1.42 and likely

closer to 1.41. It appears that proceeding in this way, we can determine asmany decimal places of

√2 as we like, and we may consider the problem of

computing how much drain pipe to buy to be solved!Below we will meet many equations that have to be solved by using

some variation of a trial and error strategy. In fact, most mathematicalequations cannot be solved exactly by some algebraic manipulations, as wecould do (if we were sufficiently clever) in the case of the Dinner Soup model(3.1). Consequently, the trial and error approach to solving mathematicalequations is fundamentally important in mathematics. We shall also seethat trying to solve equations such as x2 = 2 carries us directly into the veryheart of mathematics, from Pythagoras and Euclid through the quarrels onthe foundations of mathematics that peaked in the 1930s and on into thepresent day of the modern computer.

3.4 A System of Equations: The Dinner Soup/IceCream Model

Suppose you would like to finish off the Dinner Soup with some ice creamdessert at the cost of 3 dollars a pound, still at the total expense of 10dollars. How much of each item should now be bought?

Well, if the amount ice cream is y pounds, the total cost will be 15x+3yand thus we have the equation 15x+3y = 10 expressing that the total costis equal to the available money. We now have two unknowns x and y, andwe need one more equation. So far, we would be able to set x = 0 and solvefor y = 10

3 spending all the money on ice cream. This would go againstsome principle we learned as small kids. The second equation needed couldcome from some idea of balancing the amount of ice cream (junk food) tothe amount of carrots (healthy food), for example according to the formula2x = y + 1, or 2x − y = 1. Altogether, we would thus get the followingsystem of two equations in the two unknowns x and y:

15x+ 3y = 10,2x− y = 1.

Solving for y in the second equation, we get y = 2x−1, which inserted intothe first equation gives

15x+ 6x− 3 = 10, that is 21x = 13, that is, x =1321.


Finally, inserting the value of x = 1321 ≈ 0.60 into the equation 2x− y = 1,

we get y = 521 ≈ 0.24, and we have found the solution of the system of

equations modeling the present situation.

3.5 Formulating and Solving Equations

Let us put the Dinner Soup and Muddy Yard models into the perspectiveof formulating and solving equations presented above. Formulating equa-tions corresponds to collecting information in systematic form, and solvingequations corresponds to drawing conclusions from the collected informa-tion.

We began by describing the physical situations in the Dinner Soup andMuddy Yard models in terms of mathematical equations. This aspect isnot just mathematical but involves also whatever knowledge from physics,economy, history, psychology, etcetera that may be relevant to describe thesituation to be modeled. The equations we obtained in the Dinner Soupand the Muddy Yard models, namely 15x = 10 and x2 = 2, are examplesof algebraic equations in which the data and the unknown x are both num-bers. As we consider more complicated situations, we will often encountermodels in which the data and the unknown quantities are functions. Suchmodels typically contain derivatives and integrals and are then referred toas differential equations or integral equations.

The second aspect is solving the equations of the model to determinethe unknown and gain new information about the situation at hand. In thecase of the Dinner Soup model, we can solve the model equation 15x = 10exactly and express the solution x = 2/3 as a rational number. In theMuddy Yard model we resort to an iterative “trial and error” strategy tocompute as many digits of the decimal expansion of the solution x =

√2

as we may need. So it goes in general: once in a while, we can write downa solution of a model equation explicitly, but most often we have to becontent with an approximate solution, the digits of which we can determinethrough some iterative computational process.

Note that there is no reason to be disappointed over the fact that equa-tions representing mathematical models cannot be solved exactly, since themathematical model is an approximation anyway, in general. It is betterto have a complicated but accurate mathematical model equation, that ad-mits only approximate solutions, than to have a trivial inaccurate modelequation that can be solved exactly! The wonderful thing with computersis that they may compute accurate solutions also to complicated accuratemodel equations, which makes it possible to simulate real phenomena.

Chapter 3 Problems 31

Chapter 3 Problems

3.1. Suppose that the grocery store sells potatoes for 40 cents per pound, carrotsfor 80 cents per pound, and beef for 40 cents per ounce. Determine the modelrelation for the total price.

3.2. Suppose that you change the soup recipe to have equal amounts of carrotsand potatoes while the weight of these combined should be six times the weightof beef. Determine the model relation for the total price.

3.3. Suppose you go all out and add onions to the soup recipe in the proportion of2 : 1 to the amount of beef, while keeping the proportions of the other ingredientsthe same. The price of onions in the store is $1 per pound. Determine the modelrelation for the total price.

3.4. While flying directly over the airport in a holding pattern at an altitude of1 mile, you see your high rise condominium from the window. Knowing that theairport is 4 miles from your condominium and pretending that the condominiumhas height 0, how far are you from home and a cold beer?

3.5. Devise a model of the draining of a yard that has three sides of approximatelythe same length 2 assuming that we drain the yard by laying a pipe from onecorner to the midpoint of the opposite side. What quantity of pipe do we need?

3.6. A father and his child are playing with a teeter-totter which has a seatboard12 feet long. If the father weighs 170 pounds and the child weighs 45 pounds,construct a model for the location of the pivot point on the board in order forthe teeter-totter to be in perfect balance? Hint: recall the principle of the leverwhich says that the products of the distances from the fulcrum to the masses oneach end of a lever must be equal for the lever to be in equilibrium.

4A Very Short Calculus Course

Mathematics has the completely false reputation of yielding infallibleconclusions. Its infallibility is nothing but identity. Two times two isnot four, but it is just two times two, and that is what we call fourfor short. But four is nothing new at all. And thus it goes on in itsconclusions, except that in the height the identity fades out of sight.(Goethe)

4.1 Introduction

Following up on the general idea of science as a combination of formulatingand solving equations, we describe the bare elements of this picture froma mathematical point of view. We want to give a brief glimpse of the mainthemes of Calculus that will be discovered as we work through the volumesof this book. In particular, we will encounter the magical words of function,derivative, and integral. If you have some idea of these concepts already,you will understand some of the outline. If you have no prior acquaintancewith these concepts, you can use this section to just get a first taste ofwhat Calculus is all about without expecting to understand the details atthis point. Keep in mind that this is just a glimpse of the actors behindthe curtain before the play begins!

We hope the reader can use this chapter to get a grip on the essence ofCalculus by reading just a couple of pages. But this is really impossible insome sense because calculus contains so many formulas and details that itis easy to get overwhelmed and discouraged. Thus, we urge the reader to

34 4. A Very Short Calculus Course

browse through the following couple of pages to get a quick idea and thenreturn later and confirm with an “of course”.

On the other hand, the reader may be surprised that something that isseemingly explained so easily in a couple of pages, actually takes severalhundred pages to unwind in this book (and other books). We don’t seemto be able give a good explanation of this “contradiction” indicating that“what looks difficult may be easy” and vice versa. We also present shortsummaries of Calculus in Chapter Calculus Tool Bag I and Calculus ToolBag II, which support the idea that a distilled essence of Calculus indeedcan be given in a couple of pages.

4.2 Algebraic Equations

We will consider algebraic equations of the form: find x such that

f(x) = 0, (4.1)

where f(x) is a function of x. Recall that f(x) is said to be a function ofx if for each number x there is a number y = f(x) assigned. Often, f(x)is given by some algebraic formula: for example f(x) = 15x− 10 as in theDinner Soup model, or f(x) = x2 − 2 as in the Muddy Yard model.

We call x a root of the equation f(x) = 0 if f(x) = 0. The root ofthe equation 15x − 10 = 0 is x = 2

3 . The positive root x of the equationx2 − 2 = 0 is equal to

√2 ≈ 1.41. We will consider different methods to

compute a root x satisfying f(x) = 0, including the trial and error methodbriefly presented above in the context of the Muddy Yard Model.

We will also meet systems of algebraic equations, where we seek to de-termine several unknowns satisfying several equations, as for the DinnerSoup/Ice cream model above.

4.3 Differential Equations

We will also consider the following differential equation: find a function x(t)such that for all t

x′(t) = f(t), (4.2)

where f(t) is a given function, and x′(t) is the derivative of the function x(t).This equation has several new ingredients. First, we seek here a functionx(t) with a set of different values x(t) for different values of the variablet, and not just one single value of x like the root the algebraic equationx2 = 2 considered above. In fact, we met this already in the Dinner Soupproblem in case of a variable amount of money y to spend, leading to theequation 15x = y with solution x = y

15 depending on the variable y, that is,

4.3 Differential Equations 35

x = x(y) = y15 . Secondly, the equation x′(t) = f(t) involves the derivative

x′(t) of x(t), so we have to investigate derivatives.A basic part of Calculus is to (i) explain what a derivative is, and (ii)

solve the differential equation x′(t) = f(t), where f(t) is a given function.The solution x(t) of the differential equation x′(t) = f(t), is referred to asan integral of f(t), or alternatively as a primitive function of f(t). Thus,a basic problem of Calculus is to find a primitive function x(t) of a givenfunction f(t) corresponding to solving the differential equation x′(t) = f(t).

We now attempt to explain (i) the meaning of (4.2) including the meaningof the derivative x′(t) of the function x(t), and (ii) give a hint at how tofind the solution x(t) of the differential equation x′(t) = f(t) in terms ofthe given function f(t).

As a concrete illustration, let us imagine a car moving on a highway. Lett represent time, let x(t) be the distance traveled by the car at time t, andlet f(t) be the momentary velocity of the car at time t, see Fig. 4.1.

x(t)

f(t)

Fig. 4.1. Highway with car (Volvo?) with velocity f(t) and travelled distancex(t)

We choose a starting time, say t = 0 and a final time, say t = 1, and wewatch the car as it passes from its initial position with x(0) = 0 at timet = 0 through a sequence of increasing intermediate times t1, t2, . . ., withcorresponding distances x(t1), x(t2), . . ., to the final time t = 1 with totaldistance x(1). We thus assume that 0 = t0 < t1 < · · · < tn−1 < tn · · · <tN = 1 is a sequence of intermediate times with corresponding distancesx(tn) and velocities f(tn), see Fig. 4.2.

x(tn−1)

x(tn)

f(tn−1) f(tn)

Fig. 4.2. Distance and velocity at times tn−1 and tn

For two consecutive times tn−1 and tn, we expect to have

x(tn) ≈ x(tn−1) + f(tn−1)(tn − tn−1), (4.3)


which says that the distance x(tn) at time tn is obtained by adding to thedistance x(tn−1) at time tn−1 the quantity f(tn−1)(tn − tn−1), which isthe product of the velocity f(tn−1) at time tn−1 and the time incrementtn − tn−1. This is because

change in distance = average velocity × change in time,

or traveled distance between time tn−1 and tn equals the (average) velocitymultiplied by the time change tn − tn−1. Note that we may equally wellconnect x(tn) to x(tn−1) by the formula

x(tn) ≈ x(tn−1) + f(tn)(tn − tn−1), (4.4)

corresponding to replacing tn−1 by tn in the f(t)-term. We use the approx-imate equality ≈ because we use the velocity f(tn−1) or f(tn), which is notexactly the same as the average velocity over the time interval from tn−1

to tn, but should be close to the average if the time interval is short (andthe veolicity does not change very quickly).

Example 4.1. If x(t) = t2, then x(tn) − x(tn−1) = t2n − t2n−1 = (tn +tn−1)(tn − tn−1), and (4.3) and (4.4) correspond to approximating the av-erage velocity (tn + tn−1) with 2tn−1 or 2tn, respectively.

The formula (4.3) is at the heart of Calculus! It contains both the deriva-tive of x(t) and the integral of f(t). First, shifting x(tn−1) to the left andthen dividing by the time increment tn − tn−1, we get

x(tn) − x(tn−1)tn − tn−1

≈ f(tn−1). (4.5)

This is a counterpart to (4.2), which indicates how to define the derivativex′(tn−1) in order to have the equation x′(tn−1)) = f(tn−1) fulfilled:

x′(tn−1) ≈x(tn) − x(tn−1)

tn − tn−1. (4.6)

This formula says that the derivative x′(tn−1) is approximately equal tothe average velocity

x(tn) − x(tn−1)tn − tn−1

.

over the time interval between tn−1 and tn. Thus, we may expect thatthe equation x′(t) = f(t) just says that the derivative x′(t) of the traveleddistance x(t) with respect to time t, is equal to the momentary velocity f(t).The formula (4.6) then says that the velocity x′(tn−1) at time tn−1, that isthe momentary velocity at time tn−1, is approximately equal to the averagevelocity over the time interval (tn−1, tn). We have now uncovered some ofthe mystery of the derivative hidden in (4.3).

4.3 Differential Equations 37

Next, considering the formula corresponding to to (4.3) for the timeinstances tn−2 and tn−1, obtained by simply replacing n by n−1 everywherein (4.3), we have

x(tn−1) ≈ x(tn−2) + f(tn−2)(tn−1 − tn−2), (4.7)

and thus together with (4.3),

x(tn) ≈≈x(tn−1)

︷︸︸︷x(tn−2) + f(tn−2)(tn−1 − tn−2)+f(tn−1)(tn − tn−1). (4.8)

Repeating this process, and using that x(t0) = x(0) = 0, we get the formula

x(tn) ≈ Xn = f(t0)(t1 − t0) + f(t1)(t2 − t1) + · · ·+ f(tn−2)(tn−1 − tn−2) + f(tn−1)(tn − tn−1).

(4.9)

Example 4.2. Consider a velocity f(t) = t1+t increasing with time t from

zero for t = 0 towards one for large t. What is the travelled distance x(tn)at time tn in this case? To get an (approximate) answer we compute theapproximation Xn according to (4.9):

x(tn) ≈ Xn =t1

1 + t1(t2 − t1) +

t21 + t2

(t3 − t2) + · · ·

+tn−2

1 + tn−2(tn−1 − tn−2) +

tn−1

1 + tn−1(tn − tn−1).

With a “uniform” time step k = tj − tj−1 for all j, this reduces to

x(tn) ≈ Xn =k

1 + kk +

2k1 + 2k

k + · · ·

+(n− 2)k

1 + (n− 2)kk +

(n− 1)k1 + (n− 1)k

k.

We compute the sum for n = 1, 2, . . . , N choosing k = 0.05, and plot theresulting values of Xn approximating x(tn) in Fig. 4.3.

We now return to (4.9), and setting n = N we have in particular

x(1) = x(tN ) ≈ f(t0)(t1 − t0) + f(t1)(t2 − t1) + · · ·+f(tN−2)(tN−1 − tN−1) + f(tN−1)(tN − tN−1),

that is, x(1) is (approximately) the sum of the terms f(tn−1)(tn − tn−1)with n ranging from n = 1 up to n = N . We may write this in morecondensed form using the summation sign Σ as

x(1) ≈N∑

n=1

f(tn−1)(tn − tn−1), (4.10)


Fig. 4.3. Travelled distance Xn approximating x(t) for f(t) = t1+t

with timesteps k = 0.05

which expresses the total distance x(1) as the sum of all the increments ofdistance f(tn−1)(tn − tn−1) for n = 1, . . . , N . We can view this formula asa variant of the “telescoping” formula

x(1) = x(tN )

=0︷︸︸︷−x(tN−1) + x(tN−1)

= 0︷︸︸︷−x(tN−2) + x(tN−2) · · ·

+

=0︷︸︸︷−x(t1) + x(t1)−x(t0)

= x(tN ) − x(tN−1)︸︷︷︸≈f(tN−1)(tN−tN−1)

+ x(tN−1) − x(tN−2)︸︷︷︸≈f(tN−2)(tN−1−tN−2)

+ x(tN−2) · · · − x(t1)

+ x(t1) − x(t0)︸︷︷︸≈f(t0)(t1−t0)

expressing the total distance x(1) as a sum of all the increments x(tn) −x(tn−1) of distance (assuming x(0) = 0), and recalling that

x(tn) − x(tn−1) ≈ f(tn−1)(tn − tn−1).

In the telescoping formula, each value x(tn), except x(tN ) = x(1) andx(t0) = 0, occurs twice with different signs.

In the language of Calculus, the formula (4.10) will be written as

x(1) =∫ 1

0

f(t) dt, (4.11)

where ≈ has been replaced by =, the sum∑

has been replaced by theintegral

∫, the increments tn − tn−1 by dt, and the sequence of “discrete”

time instances tn, running (or rather “jumping” in small steps) from time 0

4.4 Generalization 39

to time 1 corresponds to the integration variable t running (“continuously”)from 0 to 1. We call the right hand side of (4.11) the integral of f(t) from 0to 1. The value x(1) of the function x(t) for t = 1, is the integral of f(t) from0 to 1. We have now uncovered some of the mystery of the integral hiddenin the formula (4.10) resulting from summing the basic formula (4.3).

The difficulties with Calculus, in short, are related to the fact that in(4.5) we divide with a small number, namely the time increment tn −tn−1, which is a tricky operation, and in (4.10) we sum a large numberof approximations, and the question is then if the approximate sum, thatis, the sum of approximations f(tn−1)(tn − tn−1) of x(tn) − x(tn−1), isa reasonable approximation of the “real” sum x(1). Note that a sum ofmany small errors very well could result in an accumulated large error.

We have now gotten a first glimpse of Calculus. We repeat: the heart isthe formula

x(tn) ≈ x(tn−1) + f(tn−1)(tn − tn−1)

or setting f(t) = x′(t),

x(tn) ≈ x(tn−1) + x′(tn−1)(tn − tn−1),

connecting increment in distance to velocity multiplied with increment intime. This formula contains both the definition of the integral reflecting(4.10) obtained after summation, and the definition of the derivative x′(t)according to (4.6) obtained by dividing by tn− tn−1. Below we will uncoverthe surprising strength of these seemingly simple relations.

4.4 Generalization

We shall also meet the following generalization of (4.2)

x′(t) = f(x(t), t) (4.12)

in which the function f on the right hand side depends not only on t butalso on the unknown solution x(t). The analog of formula (4.3) now maytake the form

x (tn) ≈ x (tn−1) + f (x(tn−1), tn−1) (tn − tn−1) , (4.13)

or changing from tn−1 to tn in the f -term and recalling (4.4),

x (tn) ≈ x (tn−1) + f (x(tn), tn) (tn − tn−1) , (4.14)

where as above, 0 = t0 < t1 < · · · < tn−1 < tn · · · < tN = 1 is a sequenceof time instances.

Using (4.13), we may successively determine approximations of x(tn)for n = 1, 2, . . . , N , assuming that x(t0) is a given initial value. If we use


instead (4.14), we obtain in each step an algebraic equation to determinex(tn) since the right hand side depends on x(tn).

In this way, solving the differential equation (4.12) approximately for0 < t < 1 is reduced to computing x(tn) for n = 1, . . . , N , using theexplicit formula (4.13) or solving the algebraic equation Xn = X(tn−1) +f(Xn, tn)(tn − tn−1) in the unknown Xn.

As a basic example, we will study the differential equation

x′(t) = x(t) for t > 0, (4.15)

corresponding to choosing f(x(t), t) = x(t). In this case (4.13) takes theform

x(tn) ≈ x(tn−1) + x(tn−1)(tn − tn−1) = (1 + (tn − tn−1))x(tn−1).

With (tn − tn−1) = 1N constant for n = 1, . . . , N , we get the formula

x(tn) ≈(

1 +1N

)

x(tn−1) for n = 1, . . . , N.

Repeating this formula, we get x(tn) ≈ (1 + 1N )(1 + 1

N )x(tn−2), and so on,which gives

x(1) ≈(

1 +1N

)N

x(0). (4.16)

Later, we will see that there is indeed an exact solution of the equationx′(t) = x(t) for t > 0 satisfying x(0) = 1, and we shall denote it x(t) =exp(t), and name it the exponential function. The formula (4.16) gives thefollowing approximate formula for exp(1), where exp(1) = e is commonlyreferred to as the base of the natural logarithm:

e ≈(

1 +1N

)N

. (4.17)

We give below values of (1 + 1N )N for different N :

N (1 + 1N )N

1 22 2.253 2.374 2.44145 2.48836 2.52167 2.546510 2.593720 2.6533100 2.70481000 2.716910000 2.7181

4.5 Leibniz’ Teen-Age Dream 41

The differential equation x′(t) = x(t) for t > 0, models the evolution offor example a population of bacteria which grows at a rate x′(t) equal tothe given amount of bacteria x(t) at each time instant t. After each onetime unit such a population has multiplied with the factor e ≈ 2.72.

Fig. 4.4. Graph of exp(t): Exponential growth

4.5 Leibniz’ Teen-Age Dream

A form of Calculus was envisioned by Leibniz already as a teen-ager. YoungLeibniz used to amuse himself with tables of the following form

n 1 2 3 4 5 6 7

n2 1 4 9 16 25 36 491 3 5 7 9 11 131 2 2 2 2 2 2

or

n 1 2 3 4 5

n3 1 8 27 64 1251 7 19 37 611 6 12 18 241 5 6 6 6


The pattern is that below each number, one puts the difference of thatnumber and the number to its left. From this construction, it follows thatany number in the table is equal to the sum of all the numbers in the nextrow below and to the left of the given number. For example for the squaresn2 in the first table we obtain the formula

n2 = (2n− 1) + (2(n− 1) − 1) + · · · + (2 · 2 − 1) + (2 · 1 − 1), (4.18)

which can also be written as

n2 + n = 2(n+ (n− 1) + · · · + 2 + 1) = 2n∑

k=1

k. (4.19)

This corresponds to the area of the “triangular” domain in Fig. 4.5, whereeach term in the sum (the factor 2 included) corresponds to the area of oneof the colons of squares.

Fig. 4.5.

The formula (4.19) is an analog of the formula

x2 = 2∫ x

0

y dy

with x corresponding to n, y to k, dy to 1, and∑n

k=1 to∫ n

0 . Note that forn large the n-term in (4.19) is vanishing in comparison with n2 in the sumn2 + n.

By dividing by n2, we can also write (4.18) as

1 = 2n∑

k=1

k

n

1n− 1n, (4.20)

which is an analog of

1 = 2∫ 1

0

y dy

4.6 Summary 43

with dy corresponding to 1n , y to k

n and∑n

k=0 to∫ 1

0 . Note that the term− 1

n in (4.20) acts as a small error term that gets smaller with increas-ing n.

From the second table with n3 we may similarly see that

n3 =n∑

k=1

(3k2 − 3k + 1), (4.21)

which is an analog of the formula

x3 =∫ x

0

3y2 dy

with x corresponding to n, y to k and dy = 1.By dividing by n3, we can also write (4.21) as

1 =n∑

k=0

3(k

n

)2 1n− 1n

n∑

k=0

3k

n

1n

+1n2,

which is an analog of

1 =∫ 1

0

3y2 dy

with dy corresponding to 1n , y to k

n , and∑n

k=0 to∫ 1

0. Again, the error

terms that appear get smaller with increasing n.Notice that repeated use of summation allows e.g. n3 to be computed

starting with the constant differences 6 and building the table frombelow.

4.6 Summary

We may think of Calculus as the science of solving differential equations.With a similar sweeping statement, we may view Linear Algebra as thescience of solving systems of algebraic equations. We may thus present thebasic subjects of our study of Linear Algebra and Calculus in the form ofthe following two problems:

Find x such that f(x) = 0 (algebraic equation) (4.22)

where f(x) is a given function of x, and

Find x(t) such that x′(t) = f(x(t), t)

for t ∈ (0, 1], x(0) = 0, (differential equation) (4.23)


where f(x, t) is a given function of x and t. Keeping this crude descrip-tion in mind when following this book may help to organize the jungleof mathematical notation and techniques inherent to Linear Algebra andCalculus.

We shall largely take a constructive approach to the problem of solvingequations, where we seek algorithms through which solutions may be de-termined or computed with more or less work. Algorithms are like recipesfor finding solutions in a step by step manner. In the process of construc-tively solving equations we will need numbers of different kinds, such asnatural numbers, integers, rational numbers. We will also need the conceptof real numbers, real variable, real-valued function, sequence of numbers,convergence, Cauchy sequence and Lipschitz continuous function.

These concepts are supposed to be our humble servants and not terror-izing masters, as is often the case in mathematics education. To reach thisposition we will seek to demystify the concepts by using the constructiveapproach as much as possible. We will thus seek to look behind the curtainon the theater scene of mathematics, where often very impressive lookingphenomena and tricks are presented by math teachers, and we will see thatas students we can very well make these standard tricks ourselves, and infact come up with some new tricks of our own which may even be betterthan the old ones.

4.7 Leibniz: Inventor of Calculusand Universal Genius

Gottfried Wilhelm von Leibniz (1646–1716) is maybe the most versatilescientist, mathematician and philosopher all times. Newton and Leibnizindependently developed different formulations of Calculus; Leibniz nota-tion and formalism quickly became popular and is the one used still todayand which we will meet below. Leibniz boldly tackled the basic problemin Physics/Philosophy/Psychology of Body and Soul in his treatise A NewSystem of Nature and the Communication of Substances as well as theUnion Existing between the Soul and the Body from 1695. In this workLeibniz presented his theory of Pre-established Harmony of Soul and Body;In the related Monadology he describes the World as consisting of somekind of elementary particles in the form of monads, each of which witha blurred incomplete perception of the rest of the World and thus in pos-session of some kind of primitive soul. The modern variant of Monadologyis QuantuumMechanics, one of the most spectacular scientific achievementsof the 20th century.

Here is a description of Leibniz from Encyclopedia Britannica: “Leibnizwas a man of medium height with a stoop, broad-shouldered but bandy-legged, as capable of thinking for several days sitting in the same chair as of


travelling the roads of Europe summer and winter. He was an indefatigableworker, a universal letter writer (he had more than 600 correspondents),a patriot and cosmopolitan, a great scientist, and one of the most powerfulspirits of Western civilization”.

Fig. 4.6. Leibniz, Inventor of Calculus: “Theoria cum praxis”. “When I set myselfto reflect on the Union of Soul with the Body, I seemed to be cast back again intothe open sea. For I could find no way of explaining how the Body causes somethingto happen in the Soul, or vice versa. . . Thus there remains only my hypothesis,that is to say the way of the pre-established harmony–pre-established, that is bya Divine anticipatory artifice, which is so formed each of theses substances fromthe beginning, that in merely following its own laws, which it received with itsbeing, it is yet in accord with the other, just as if they mutually influenced oneanother, or as if, over and above his general concourse, God were for ever puttingin his hands to set them right”

Chapter 4 Problems

4.1. Derive mathematical models of the form y = f(x) connecting the displace-ment x with the force y = f(x) for the following mechanical systems consistingof an elastic string in the first two cases and an elastic string coupled to an elas-tic spring in the third case. Find approximate models of the form y = xr withr = 1, 3, 1

2.


x

x

x

f(x)f(x)f(x)

Fig. 4.7.

4.2. Like Galileo solve the following differential equations: (a) x′(t) = v, (b)x′(t) = at, where v and a are constants. Interprets the results in a Pisa setting.

4.3. Consider the table

0 0 2 4 8 16 32 64

0 2 4 8 16 32 64 128

2 4 8 16 32 64 128 256

Do you see any connection to the exponential function?

4.4. The area of a disc of radius one is equal to π. Compute lower and upperbounds for π by comparing the area of the disc to the areas of inscribed andcircumscribed polygons, see Fig. 4.8.

Fig. 4.8.

4.5. Solve the Dinner Soup/Ice cream problem with the equation 2x = y + 1replaced by x = 2y + 1.

5Natural Numbers and Integers

“But”, you might say, “none of this shakes my belief that 2 and 2are 4”. You are right, except in marginal case. . . and it is only inmarginal cases that you are doubtful whether a certain animal isa dog or a certain length is less than a meter. Two must be two ofsomething, and the proposition “2 and 2 are 4” is useless unless itcan be applied. Two dogs and two dogs are certainly four dogs, butcases arrive in which you are doubtful whether two of them are dogs.“Well, at any rate there are four animals” you may say. But thereare microorganisms concerning which it is doubtful whether theyare animals or plants. “Well, then living organisms,” you may say.But there are things of which it is doubtful whether they are livingorganisms or not. You will be driven into saying: “Two entities andtwo entities are four entities”. When you have told me what youmean by “entity” I will resume the argument. (Russell)

5.1 Introduction

In this chapter, we recall how natural numbers and integers may be con-structively defined, and how to prove the basic rules of computation welearn in school. The purpose is to give a quick example of developinga mathematical theory from a set of very basic facts. The idea is to givethe reader the capability of explaining to her/his grandmother why, for ex-ample, 2 times 3 is equal to 3 times 2. Answering questions of this natureleads to a deeper understanding of the nature of integers and the rules forcomputing with integers, which goes beyond just accepting facts you learn

48 5. Natural Numbers and Integers

in school as something given once and for all. An important aspect of thisprocess is the very questioning of established facts that follows from posingthe why, which may lead to new insight and new truths replacing the oldones.

5.2 The Natural Numbers

The natural numbers such as 1, 2, 3, 4, . . ., are familiar from our experiencewith counting where we repeatedly add 1 starting with 1. So 2 = 1 + 1,3 = 2+1 = 1+1+1, 4 = 3+1 = 1+1+1+1, 5 = 4+1 = 1+1+1+1+1, andso on. Counting is a pervasive activity in human society: we count minuteswaiting for the bus to come and the years of our life; the clerk counts changein the store, the teacher counts exam points, Robinson Crusoe counted thedays by making cuts on a log. In each of these cases, the unit 1 representssomething different; minutes and years, cents, exam points, days; but theprocess of counting is the same for all the cases. Children learn to count atan early age and may count to 10 by the age of say 3. Clever chimpanzeesmay also be taught to count to 10. The ability to count to 100 may beachieved by children of the age of 5.

The sum n+m obtained by adding two natural numbers n and m, is thenatural number resulting from adding 1 first n times and then m times. Werefer to n andm as the terms of the sum n+m. The equality 2+3 = 5 = 3+2reflects that

(1 + 1) + (1 + 1 + 1) = 1 + 1 + 1 + 1 + 1 = (1 + 1 + 1) + (1 + 1),

which can be explained in words as observing that if we have 5 donutsin a box, then we can consume them by first eating 2 donuts and then 3donuts or equally well by first eating 3 donuts and then 2 donuts. By thesame argument we can prove the commutative rule for addition

m+ n = n+m,

and the associative rule for addition

m+ (n+ p) = (m+ n) + p,

where m, n, and p are natural numbers.The product m × n = mn obtained by multiplying two natural numbers

m and n, is the natural number resulting by adding n to itself m times.The numbers m and n of a product m×n are called factors of the product.The commutative rule for multiplication

m× n = n×m (5.1)

5.2 The Natural Numbers 49

expresses the fact that adding n to itself m times is equal to adding m toitself n times. This fact can be established by making a square array ofdots with m rows and n columns and counting the total number of dotsm× n in two ways: first by summing the m dots in each column and thensumming over the n columns and second by summing the n dots in eachrow and then summing over the m rows, see Fig. 5.1.

m

n

Fig. 5.1. Illustration of the commutative rule for multiplication m×n = n×m.We get the same sum if first add up the dots by counting across the rows or downthe columns

In a similar way we can prove the associative rule for multiplication

m× (n× p) = (m× n) × p (5.2)

and the distributive rule combining addition and multiplication,

m× (n+ p) = m× n+m× p, (5.3)

for natural numbers m, n, and p. Note that here we use the convention thatmultiplications are carried out first, then summations, unless otherwise isindicated. For example, 2+3×4 means 2+(3×4) = 24, not (2+3)×4 = 20.To overrule this convention we may use parentheses, as in (2+3)×4 = 5×4.From (5.3) (and (5.1)) we obtain the useful formula

(m+ n)(p+ q) = (m+ n)p+ (m+ n)q = mp+ np+mq + nq. (5.4)

We define n2 = n× n, n3 = n× n× n, and more generally

np = n× n× · · · × n( p factors)


for natural numbers n and p, and refer to np as n to the power p, or the“p-th power of n”. The basic properties

(np

)q = npq

np × nq = np+q

np ×mp = (nm)p,

follow directly from the definition, and from the associative and distributivelaws of multiplication.

We also have a clear idea of ranking natural numbers according to size.We consider m to be larger than n, written as m > n, if we can obtain mby adding 1 repeatedly to n. The inequality relation satisfies its own set ofrules including

m < n and n < p implies m < p

m < n implies m+ p < n+ p

m < n implies p×m < p× n

m < n and p < q implies m+ p < n+ q,

which hold for natural numbers n, m, p, and q. Of course, n > m is thesame as m < n, and writing m ≤ n means that m < n or m = n.

A way of representing the natural numbers is to use a horizontal lineextending to the right with the marks 1, 2, 3, spaced at a unit distanceconsecutively, see Fig. 5.2. This is called the natural number line. The lineserves like a ruler to keep the points lined up in ascending order to theright.

1 2 3 4

Fig. 5.2. The natural number line

We can interpret all of the arithmetic operations using the number line.For example, adding 1 to a natural number n means shifting one unit tothe right from the position of n to that of n + 1, and likewise adding pmeans shifting p units to the right.

We can also extend the natural number line one unit to the left and markthat point by 0, which we refer to as zero. We can use 0 as a starting pointfrom which we get to the point marked 1 by moving one unit to the right,We can interpret this operation as 0 + 1 = 1, and generally we have

0 + n = n+ 0 = n (5.5)

for n a natural number. We further define n× 0 = 0 × n = 0 and n0 = 1.

5.3 Is There a Largest Natural Number? 51

0 1 2 3 4

Fig. 5.3. The extended natural number line, including 0

Representing natural numbers as sums of ones like 1 + 1 + 1 + 1 + 1 or1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1, that is, as cuts on a log or as beads ona thread, quickly becomes impractical as the size of the number increases.To be able to express natural numbers of any size, it is convenient to usea positional system. In a positional system with base 10 we use the digits 0,1, 2, 3, 4, 5, 6, 7, 8, 9, and express each natural number uniquely as a sumof terms of the form

d× 10p (5.6)

where d is one of the digits 0, 1, 2, . . . , 9, and p is a natural number or 0.For example

4711 = 4 × 103 + 7 × 102 + 1 × 101 + 1 × 100.

We normally use the positional system with base 10, where the choice ofbase is of course connected to counting using our fingers.

One can use any natural number as the base in a positional system. Thecomputer normally uses the binary system with base 2, where a naturalnumber is expressed as a string of 0s and 1s. For example

1001 = 1 × 23 + 0 × 22 + 0 × 21 + 1 × 20, (5.7)

which equals the usual number 9. We will return to this topic below.

5.3 Is There a Largest Natural Number?

The insight that counting always can be continued by adding 1 yet anothertime, that is the insight that if n is a natural number, then n+1 is a naturalnumber, is an important step in the development of a child usually taken inearly school years. Whatever natural number I would assign as the largestnatural number, you could argue that the next natural number obtained byadding 1 is bigger, and I would probably have to admit that there cannotbe a largest natural number. The line of natural numbers extends for everto the right.

Of course, this is related to some kind of unlimited thought experiment. Inreality, time or space could set limits. Eventually, Robinson’s log would befilled with cuts, and a natural number with say 1050 digits would seem im-possible to store in a computer since the number of atoms in the Universe isestimated to be of this order. The number of stars in the Universe is proba-bly finite although we tend to think of this number as being without bound.


We may thus say that in principle there is no largest natural number,while in practice we will most likely never deal with natural numbers biggerthan 10100. Mathematicians are interested in principles and thus would liketo first get across what is true in principle, and then at a later stage whatmay be true in practice. Other people may prefer to go to realities directly.Of course, principles may be very important and useful, but one should notforget that there is a difference between what is true in principle and whatis really true.

The idea that, in principle, there cannot be a largest natural number,is intimately connected to the concept of infinity. We may say that thereare infinitely many natural numbers, or that the set of natural numbers isinfinite, in the sense that we can keep on counting or making cuts withoutever stopping; there is always possible to make another cut and add 1another time. With this view, the concept of infinity is not so difficult tograsp; it just means that we never come to an end. Infinitely many stepsmeans a potential to take yet another step independent of the number ofsteps we have taken. There is no limit or bound. To have infinitely manydonuts means that we can always take yet another donut whenever we wantindependent of how many we have already eaten. This potential seems morerealistic (and pleasant) than actually eating infinitely many donuts.

5.4 The Set N of All Natural Numbers

We may easily grasp the set 1, 2, 3, 4, 5 of the first 5 natural numbers1, 2, 3, 4, 5. This may be done by writing down the numbers 1, 2, 3, 4 and 5on a piece of paper and viewing the numbers as constituting one entity, likea telephone number. We may even grasp the set 1, 2, . . . , 100 of the first100 natural numbers 1, 2, 3, . . . , 99, 100 in the same way. We may also graspindividual very large numbers; for instance we might grasp the number1 000 000 000 by imagining what we could buy for 1 000 000 000 dollars. Wealso feel quite comfortable with the principle of being able to add 1 to anygiven natural number. We could even agree to denote by N all the naturalnumbers that we potentially could reach by repeatedly adding 1.

We can think of N as the set of possible natural numbers and it is clearthat this set is always under construction and can never actually be com-pleted. It is like a high rise, where continuously new stores can be addedon top without any limit set by city regulations or construction technique.We understand that N embodies a potential rather than an existing reality,as we discussed above.

The definition of N as the set of possible natural numbers is a bit vaguebecause the term “possible” is a bit vague. We are used to the fact thatwhat is possible for you may be impossible for me and vice versa. Whose“possible” should we use? With this perspective we leave the door a bit

5.5 Integers 53

open to everyone to have his own idea of N depending on the meaning of“possible natural number” for each individual.

If we are not happy with this idea of N as “the set of all possible naturalnumbers”, with its admitted vagueness, we may instead seek a definition of“the set of all natural numbers” which would be more universal. Of courseany attempt to display this set by writing down all natural numbers ona piece of paper, would be rudely interrupted by reality. Deprived of thispossibility, even in principle, it appears that we must seek guidelines fromsome Big Brother concerning the meaning of N as “the set of all naturalnumbers”.

The idea of a universal Big Brother definition of difficult mathematicalconcepts connected to infinity one way or the other, like N, grew strong dur-ing the late 19th century. The leader of this school was Cantor, who createda whole new theory dealing with infinite sets and infinite numbers. Cantorbelieved he could grasp the set of natural numbers as one completed entityand use this as a stepping stone to construct sets of even higher degreesof infinity. Cantors work had profound influence on the view of infinity inmathematics, but his theories about infinite sets were understood by fewand used by even fewer. What remains of Cantors work today is a firmbelief by a majority of mathematicians that the set of all natural num-bers may be viewed as a uniquely defined completed entity which may bedenoted by N. A minority of mathematicians, the so-called constructivistsled by Kronecker, have opposed Cantors ideas and would rather think of N

somewhat more vaguely defined as the set of possible natural numbers, aswe proposed above.

The net result appears to be that there is no consensus on the definitionof N. Whatever interpretation of N you prefer, and this is now open toyour individual choice just as religion is, there will always remain someambiguity to this notion. Of course, this reflects that we can give names tothings that we cannot fully grasp, like the world, soul, love, jazz music, ego,happiness et cetera. We all have individual ideas of what these words mean.

Personally, we tend to favor the idea of using N to denote the “set ofpossible natural numbers”. Admittedly this is a bit vague (but honest),and the vagueness does not appear to create any problems in our work.

5.5 Integers

If we associate addition by the natural number p as moving p units to theright on the natural number line, we can introduce the operation of sub-traction by p as moving p units to the left. In the setting of donuts in a box,we can think of addition as putting donuts into the box and subtractionas taking them out. For example, if we have 12 donuts in the box and eat7 of them, we know there will be 5 left. We originally got the 12 donuts


by adding individual donuts into a box, and we may take away donuts, orsubtract them, by taking them back out of the box. Mathematically, wewrite this as 12 − 7 = 5 which is just another way of saying 5 + 7 = 12.

We immediately run into a complication with subtraction that we didnot meet with addition. While the sum n + m of two natural numbers isalways a natural number, the difference n−m is a natural number only ifm < n. Moving m units to the left from n will take us outside the naturalnumber line if m > n. For example, the difference 12− 15 would arise if wewanted to take 15 donuts out of a box with 12 donuts. Similar situationsarise frequently. If we want to buy a titanium bike frame for $2500, whilewe only have $1500 in the bank, we know we have to borrow $1000. This$1000 is a debt and does not represent a positive amount in our savingsaccount, and thus does not correspond to a natural number.

To handle such situations, we extend the natural numbers 1, 2, 3, · · · by adjoining the negative numbers −1, −2, −3, · · · together with 0. Theresult is the set of integers

Z = · · · ,−3,−2,−1, 0, 1, 2, 3, · · · = 0,±1,±2,±3, · · ·.

We say that 1, 2, 3, · · · , are the positive integers while −1, −2, −3, · · · ,are the negative integers. Graphically, we think of extending the naturalnumber line to the left and then marking the point that is one unit distanceto the left of 0 as −1, and so on, to get the integer number line, see Fig. 5.4.

− −− 0 11 22 33

Fig. 5.4. The integer number line

We may define the sum n +m of two integers n and m as the result ofadding m to n as follows. If n and m are both natural numbers, or positiveintegers, then n+m is obtained the usual way by starting at 0, moving nunits to the right followed by m more units to the right. If n is positiveand m is negative, then n+m is obtained starting at 0, moving n units tothe right, and then m units back to the left. Likewise if n is negative andm is positive, then we obtain n+m by starting at 0, moving n units to theleft and then m units to the right. Finally, if both n and m are negative,then we obtain n+m by starting at 0, moving n units to the left and thenm more units to the left. Adding 0, we move neither right nor left, andthus n + 0 = n for all integers n. We have now extended the operation ofaddition from the natural numbers to the integers.

Next, to define the operation of subtraction, we first agree to denote by−n the integer with the opposite sign to the integer n. We then have forany integer n that −(−n) = n, reflecting that taking the opposite signtwice gives the original sign, and n+ (−n) = (−n) + n = 0, reflecting that

5.5 Integers 55

moving n units back and forth starting at 0 will end up at 0. We now definen−m = −m+ n = n+ (−m), which we refer to as subtracting m from n.We see that subtracting m from n is the same as adding −m to n.

Finally, we need to extend multiplication to integers. To see how to dothis, we seek guidance by formally multiplying the equality n+ (−n) = 0,where n a natural number, by the natural number m. We then obtainm × n + m × (−n) = 0, which suggest that m × (−n) = −(m × n), sincem× n+ (−(m× n)) = 0. We are thus led to define m× (−n) = −(m× n)for positive integers m and n, and likewise (−n) ×m = −(n × m). Notethat by this definition, −n×m may be interpreted both as (−n) ×m andas −(n ×m). In particular we have that (−1) × n = −n for n a positiveinteger. Finally, to see how to define (−n) × (−m) for n and m positiveintegers, we multiply the equalities n+(−n) = 0 and m+(−m) = 0 to getformally n×m+n× (−m)+(−n)×m+(−n)× (−m) = 0, which indicatesthat −n×m+ (−n)× (−m) = 0, that is (−n)× (−m) = n×m, which wenow take as a definition of the product of two negative numbers (−n) and(−m). In particular we have (−1) × (−1) = 1. We have now defined theproduct of two arbitrary integers (of course we set n × 0 = 0 × n = 0 forany integer n).

To sum up, we have defined the operations of addition and multiplica-tion of integers and we can now verify all the familiar rules for computingwith integers including the commutative, associative and distributive rulesstated above for natural numbers.

Note that we may say that we have constructed the negative integers−1,−2, . . . from the given natural numbers 1, 2, . . . through a processof reflection around 0, where each natural number n gets its mirror image−n. We thus may say that we construct the integer line from the naturalnumber line through a process of reflection around 0. Kronecker said thatthe natural numbers were given by God and that all other numbers, likethe negative integers, are invented or constructed by man.

Another way to define or construct −n for a natural number n is to thinkof −n as the solution x = −n of the equation n+x = 0 since n+(−n) = 0,or equally well as the solution of x+n = 0 since (−n)+n = 0. This idea iseasily extended from n to −n, i.e. to the negative integers, by considering−(−n) to be the solution of x+(−n) = 0. Since n+(−n) = 0, we concludethe familiar formula −(−n) = n. To sum up, we may view −n to be thesolution of the equation x+ n = 0 for any integer n.

We further extend the ordering of the natural numbers to all of Z bydefining m < n if m is to the left of n on the integer line, that is, if m isnegative and n positive, or zero, or if also n is negative but −m > −n. Thisordering is a little bit confusing, because we like to think of for example−1000 as a lot bigger number than −10. Yet we write −1000 < −10 sayingthat −1000 is smaller than −10. What we need is a measure of the size ofa number, disregarding its sign. This will be the topic next.


5.6 Absolute Value and the Distance BetweenNumbers

As just indicated, it is convenient to be able to discuss the size of numbersindependent of the sign of the number. For this purpose we define theabsolute value |p| of the number p by

|p| =

p, p ≥ 0−p, p < 0.

For example, |3| = 3 and | − 3| = 3. Thus |p| measures the size of thenumber p, disregarding its sign, as desired. For example |− 1000| > |− 10|.

Often we are interested in the difference between two numbers p and q,but are concerned primarily with the size of the difference and care lessabout its sign, that is we are interested in |p − q| corresponding to thedistance between the two numbers on the number line.

For example suppose we have to buy a piece of molding for a doorwayand when using a tape measure we position one side of the doorframe at2 inches and the opposite side at 32 inches. We would not go to the storeand ask the person for a piece of molding that begins at 2 inches and endsat 32 inches. Instead, we would only tell the clerk that we need 32−2 = 30inches. In this case, 30 is the distance between 32 and 2. We define thedistance between two integers p and q as |p− q|.

By using the absolute value, we insure that the distance between p and qis the same as the distance between q and p. For example, |5− 2| = |2− 5|.

In this book, we will be dealing with inequalities combined with theabsolute value frequently. We give an example close to every student’s heart.

Example 5.1. Suppose the scores on an exam that are within 5 of 79 outof 100 get a grade of B and we want to write down the list of scores thatget a B. This includes all scores x that are a distance of at most 5 from79, which can be written

|x− 79| ≤ 5. (5.8)

There are two possible cases: x < 79 and x ≥ 79. If x ≥ 79 then |x −79| = x − 79 and (5.8) becomes x − 79 ≤ 5 or x ≤ 84. If x < 79 then|x− 79| = −(x− 79) and (5.8) means that −(x− 79) ≤ 5 or (x− 79) ≥ −5or x ≥ 74. Combining these results we have 79 ≤ x ≤ 84 as one possibilityor 74 ≤ x < 79 as another possibility, or in other words, 74 ≤ x ≤ 84.

In general if |x| < b, then we have the two possibilities −b < x < 0 or0 ≤ x < b which means that −b < x < b. We can actually solve both casesat one time.

5.7 Division with Remainder 57

Example 5.2. |x− 79| ≤ 5 means that

−5 ≤ x− 79 ≤ 574 ≤ x ≤ 84

To solve |4 − x| ≤ 18, we write

−18 ≤ 4 − x ≤ 1818 ≥ x− 4 ≥ −18 (Note the changes!)

22 ≥ x ≥ −14

Example 5.3. To solve the following inequality in x:

|x− 79| ≥ 5. (5.9)

we first assume that x ≥ 79, in which case (5.9) becomes x − 79 ≥ 5 orx ≥ 84. Next, if x ≤ 79 then (5.9) becomes −(x− 79) ≥ 5 or (x− 79) ≤ −5or x ≤ −74. The answer is thus all x with x ≥ 84 or x ≤ −74.

Finally we recall that multiplying an inequality by a negative numberlike (−1) reverses the inequality:

m < n implies −m > −n.

5.7 Division with Remainder

We define division with remainder of a natural number n by another naturalnumber m, as the process of computing nonnegative integers p and r < msuch that n = pm+r. The existence of unique p and r follows by consideringthe sequence of natural numbersm, 2m, 3m, . . ., and noting that there mustbe a unique p such that pm ≤ n < (p+ 1)m, see Fig. 5.5.

(p− 1)m pm (p+ 1)mn

m = 5 and n = pm+ r with r = 2 < m

Fig. 5.5. Illustration of pm ≤ n < (p+ 1)m

Setting r = n − pm, we obtain the desired representation n = pm + rwith 0 ≤ r < m. We call r the remainder in division of n by m. Whenthe remainder r is zero, then we obtain a factorization n = pm of n asa product of the factors p and m.


We can find the proper p in division with remainder of n bym by repeatedsubtraction of m. For example, if n = 63 and m = 15, then we may write

63 = 15 + 4863 = 15 + 15 + 34 = 2 × 15 + 33

63 = 3 × 15 + 1863 = 4 × 15 + 3,

and thus find that in this case p = 4 and r = 3.A more systematic procedure for division with remainder is the long

division algorithm, which is taught in school. We give two examples (63 =4 × 15 + 3 again, and 2418610 = 19044× 127 + 22) in Fig. 5.5.TS

a

15 63

4

603

127 2418610127

11481143

561508

530508

22

19044

1×127

9×127

4×127

4×127

4×15

Fig. 5.6. Two examples of long division

5.8 Factorization into Prime Factors

A factor of a natural number n is a natural number m that divides into nwithout leaving a remainder, that is, n = pm for some natural number p.For example, 2 and 3 are both factors of 6. A natural number n always hasfactors 1 and n since 1×n = n. A natural number n is called a prime numberif the only factors of n are 1 and n. The first few prime numbers (excluding1 since such factors are not of much interest) are 2, 3, 5, 7, 11, · · ·. Theonly even prime number is 2. Suppose that we take the natural number nand try to find two factors n = pq. Now there are two possibilities: eitherthe only two factors are 1 and n, i.e. n is prime, or we find two factors pand q, neither of which are 1 or n. By the way, it is easy to write a programto search for all the factors of a given natural number n by systematicallydividing by all the natural numbers up to n. Now in the second case, bothp and q must be less than n. In fact p ≤ n/2 and q ≤ n/2 since thesmallest possible factor not equal to 1 is 2. Now we repeat by factoringp and q separately. In each case, we either find the number is prime or

TSa Please confirm this reference to Fig. 5.5.

Editor’s or typesetter’s annotations (will be removed before the final TEX run)

5.9 Computer Representation of Integers 59

we factor it into a product of smaller natural numbers. Then we continuewith the smaller factors. Eventually this process must stop since n is finiteand the factors at any stage are no larger than half the size of the factorsof the previous stage. When the process has stopped, we have factoredn into a product of prime numbers. This factorization is unique exceptfor order. One consequence of the factorization into prime numbers is thefollowing fact. Suppose that we know that 2 is a factor of n. If n = pq isany factorization of n, it follows that at least one of the factors p and qmust have a factor of 2. The same is true for prime number factors 3, 5, 7etc., that is for any prime number factor.

5.9 Computer Representation of Integers

Since we will be using the computer throughout this course, we have topoint out some properties of computer arithmetic. We are distinguishingarithmetic carried out on a computer from the “theoretical” arithmetic welearn about in school.

The fundamental issue that arises when using a computer stems from thephysical limitation on memory. A computer must store numbers on a physi-cal device which cannot be “infinite”. Hence, a computer can only representa finite number of numbers. Every computer language has a finite limit onthe numbers it can represent. It is quite common for a computer language tohave INTEGER and LONG INTEGER types of variables, where an INTE-GER variable is an integer in the range of −32768,−32767, . . . , 32767,which are the numbers that take two bytes of storage, and a long inte-ger variable is an integer in the range −2147483648, −2147483647, . . . ,2147483647, which are the integers requiring four bytes of storage (wherea “byte” of memory consists of 8 “bit-cells”, each capable of storing eithera zero or a one). This can have some serious consequences, as anyone whoprograms a loop using an integer index that goes above the appropriatelimit finds out. In particular, we cannot check whether some fact is truefor all integers using a computer to test each case.

Chapter 5 Problems

5.1. Identify five ways in your life in which you count and the unit “1” for eachcase.

5.2. Use the natural number line representation to interpret and verify theequalities: (a) x+ y = y + x and (b) x+ (y + z) = (x+ y) + z: that hold for anynatural numbers x, y, and z:TS

b

TSb Is there a figure missing here, or should the colon be changed to dot?



5.3. Use (two and three dimensional) arrays of dots to interpret and verify a)the distributive rule for multiplication m× (n+ p) = m× n+m× p and b) theassociative rule (m× n) × p = m× (n× p).

5.4. Use the definition of np for natural numbers n and p to verify that (a)(np

)q= npq and (b) np × nq = np+q for natural numbers n, p, q.

5.5. Prove that m × n = 0 if and only if m = 0 or n = 0, for integers m and n.What does or mean here? Prove that for p = 0, p × m = p × n if and only ifm = n. What can be said if p = 0?

5.6. Verify using (5.4) that for integers n and m,

(n+m)2 = n2 + 2nm+m2

(n+m)3 = n3 + 3n2m+ 3nm2 +m3

(n+m)(n−m) = n2 −m2.

(5.10)

5.7. Use the integer number line to illustrate the four possible cases in thedefinition of n+m for integers n and m.

5.8. Divide (a) 102 by 18, (b) −4301 by 63, and (c) 650912 by 309 using longdivision.

5.9. (a) Find all the natural numbers that divide into 40 with zero remainder.(b) Do the same for 80.

5.10. (Abstract) Use long division to show that

a3 + 3a2b+ 3ab2 + b3

a+ b= a2 + 2ab+ b2.

5.11. (a) Write a MATLAB routine that tests a given natural number n tosee if it is prime. Hint: systematically divide n by the smaller natural numbersfrom 2 to n/2 to check whether there are factors. Explain why it suffices to checkup to n/2. (b) Use this routine to write a MATLAB routine that finds all theprime numbers less than a given number n. (c) List all the prime numbers lessthan 1000.

5.12. Factor the following integers into a product of prime numbers; (a) 60, (b)96, (c) 112, (d) 129.

5.13. Find two natural numbers p and q such that pq contains a factor of 4 butneither p nor q contains a factor of 4. This means that the fact that some naturalnumber m is factor of a product n = pq does not imply that m must be a factorof either p or q. Why doesn’t this contradict the fact that if pq contains a factorof 2 then at least one of p or q contains a factor of 2?


5.14. Pick out the invalid rules from the following list

a < b implies a− c < b− c

(a+ b)2 = a2 + b2

(c(a+ b)

)2= c2(a+ b)2

ac < bc implies a < b

a− b < c implies a < c+ b

a+ bc = (a+ b)c

In each case, find numbers that show the rule is invalid.

5.15. Solve the following inequalities:

(a) |2x− 18| ≤ 22 (b) |14 − x| < 6

(c) |x− 6| > 19 (d) |2 − x| ≥ 1

5.16. Verify that the following is true for arbitrary integers a, b and c: (a) |a2| =a2 (b) |a|2 = a2 (c) |ab| = |a| |b| (d) |a+ b| ≤ |a| + |b| (e) |a− b| ≤ |a| + |b|(f) |a+ b− c| ≤ |a| + |b| + |c| (g) |a| ≤ |a− b| + |b| (h) ||a| − |b|| ≤ |a− b|

5.17. Show that the inequalities (e)-(h) of Problem 5.17 follow once you have (d)and the fact that |a| = | − a| for any integer a.

5.18. Write a little program in the computer language of your choice that findsthe largest integer that the language can represent. Hint: usually one of two thingshappen if you try to set an integer variable to a value that is too large: eitheryou get an error message or the computer gives the variable a negative value.

6Mathematical Induction

There is a tradition of opposition between adherents of inductionand deduction. In my view it would be just as sensible for the twoends of a worm to quarrel. (Whitehead)

6.1 Induction

Carl Friedrich Gauss (1777–1885), sometimes called the Prince of Mathe-matics, is one of the greatest mathematicians all times. In addition to anincredible ability to compute (especially important in the 1800s) and anunsurpassed talent for mathematical proof, Gauss had an inventive imagi-nation and a restless interest in nature and he made important discoveriesin a staggering range of pure and applied mathematics. He was also a pio-neer in the constructionist sense, digging deeply into many of the acceptedmathematical truths of his time in order to really understand what ev-eryone “knew” had to be true. Perhaps the only really unfortunate sideto Gauss is that he wrote about his work only very sparingly and manymathematicians that followed him were doomed to reinvent things that healready knew.

There is a story about Gauss in school at the age of ten which goes asfollows. His old-fashioned arithmetic teacher would like to show off to hisstudents by asking them to add a large number of sequential numbers byhand, something the teacher knew (from a book) could be done quickly

64 6. Mathematical Induction

and accurately by using the following neat formula:

1 + 2 + 3 + · · · + (n− 1) + n =n(n+ 1)

2. (6.1)

Note that the “· · · ” indicate that we add all the natural numbers between 1and n. Using the formula makes it possible to replace the n−1 additions onthe left by a multiplication and a division, which is a considerable reductionin work, especially if you are using a piece of chalk and a slate to do thesums.

The teacher posed the problem of computing the sum 1 + 2 + · · ·+ 99 tothe class, and almost immediately Gauss came up and laid his slate downon the desk with the correct answer (4950), while the rest of the class stillwere in the beginning of a long struggle. How did young Gauss manageto compute the sum so quickly? Did he already know the nice formula(6.1)? Of course not, but he immediately derived it using the followingclever argument: To sum 1 + 2 + · · · + 99, group the numbers two by twoas follows:

1 + · · · + 99= (1 + 99) + (2 + 98) + (3 + 97) + · · · (49 + 51) + 50= 49 × 100 + 50 = 49 × 2 × 50 + 50 = 99 × 50

which agrees with the formula (6.1) with n = 99. One can use this type ofargument to prove the validity of (6.1) for any natural number n.

A modern teacher without a need to show off, could state the formula(6.1) and ask the students to use it with n = 99, for example, and then gohome and play the trick with their parents.

Let’s now look at the problem of verifying that the formula (6.1) is truefor any natural number n, once this formula is given to us. Suppose, weare not as clever as Gauss and don’t find the nice way of grouping thenumbers two by two indicated above. We are thus asked just to check ifa given formula is correct, and not to first find the formula itself and thenprove its validity. This is like in a multiple choice test, where we may beasked if King Gustav Adolf II of Sweden died 1632 or 1932 or not at all, asopposed to a direct question what year this king died. Everyone knows thata multiple choice questions may be easier to answer than a direct question.

We are thus asked to prove that the formula (6.1) holds for any naturalnumber n. It is easy enough to verify that it is true for n = 1: 1 = 1× 2/2:for n = 2: 1 + 2 = 3 = 2 × 3/2: and for n = 3: 1 + 2 + 3 = 6 = 3 × 4/2.Checking the validity in this way for any natural number, one at a time, upto say n = 1000, would be very tiring. Of course we could try to get helpfrom a computer, but the computer would also get stuck if n is very large.We also know in the back of our minds that no matter how many naturalnumbers n we check, there will always be natural numbers left which wehave not yet checked.

6.1 Induction 65

Is there some other way of checking the validity of (6.1) for any natu-ral number n? Yes, we could try the principle of mathematical induction,based on the idea of showing that if the formula is valid for n, then it isautomatically valid also for n+ 1.

1 2 3 n n+1

Fig. 6.1. The principle of induction: If n falls then n + 1 will fall, so if 1 fallsthey all fall!

The first step is then to check that the formula is valid for n = 1. We al-ready took this easy step. The remaining step, which is called the inductivestep, is to show that if the formula holds for a certain natural number, thenit also holds for the next natural number. The principle of mathematicalinduction now states that the formula must be true for any natural num-ber n. You are probably ready to accept this principle on intuitive grounds:we know that the formula holds for n = 1. By the inductive step, it thenholds for the next number, that is for n = 2, and thus for n = 3 again bythe induction step, and then for n = 4, and so on. Since we will eventuallyreach any natural number this way, we may be pretty sure that the formulaholds for any natural number. Of course the principle of mathematical in-duction is based on the conviction that we will eventually reach any naturalnumber if we start with 1 and then add 1 sufficiently many times.

Let us now take the induction step in the attempted proof of (6.1). Wethus assume that the formula (6.1) is valid for n = m− 1, where m ≥ 2 isa natural number. In other words, we assume that

1 + 2 + 3 + · · · +m− 1 =(m− 1)m

2. (6.2)

We now want to prove that the formula holds for the next natural numbern = m. To do this for (6.1), we add m to both sides of (6.2) to get

1 + 2 + 3 + · · · +m− 1 +m =(m− 1)m

2+m

=m2 −m

2+

2m2

=m2 −m+ 2m

2

=m(m+ 1)

2

which shows the validity of the formula for n = m. We have thus verifiedthat if (6.1) is true for a certain natural number n = m− 1 then it is truefor the next natural number n = m, that is we have verified the inductive


step. Repeating the inductive step, we see that (6.1) holds for any naturalnumber n.

We may phrase the proof of the inductive step alternatively as follows,where we don’t bother to introduce the natural number m. In the refor-mulation we assume that (6.1) holds with n replaced by n− 1, that is, weassume that

1 + 2 + 3 + · · · + n− 1 =(n− 1)n

2.

Adding n to both sides, we get

1 + · · · + n− 1 + n =(n− 1)n

2+ n =

n(n+ 1)2

which is (6.1) as desired. We may thus take the inductive step by proving(6.1) assuming the validity of (6.1) with n replaced by n− 1.

Formulas like (6.1) have a close connection to the basic integration for-mulas of Calculus we meet later and they also occur, for example, whencomputing compound interest on a savings account or adding up popu-lations of animals. The formulas are thus useful not only for impressingpeople.

Many students may find that verifying a property like (6.1) for specificn like 1 or 2 or 100 is not so difficult to do. But the general inductive step:assuming the formula is true for some given natural number and showingthat it is then true for the next natural number, causes some uneasiness be-cause the given number is not specified concretely. Don’t let this feeling ofstrangeness stop you from trying out the problems: like much of mathemat-ics, actually doing the problems is not as bad as the anticipation of havingto do the problems (like going to the dentist). Working out some inductionproblems is good practice for getting used to some of the arguments thatwe encounter later.

The method of induction may be useful for showing the validity of a givenformula, but first you have to find the formula in some way, which mayrequire some good intuition, trial and error, or some other insight (like theclever idea of Gauss). Induction may thus help you because it gives yousome kind of method of approach (assume the validity for some naturalnumber and then prove validity for the next), but you must have a goodguess or conjecture to start from.

You could try to come up with the formula (6.1) by some trial and error,once you have the courage to start looking for a formula. You could argueroughly that the average size of the numbers 1 to n is n/2, and since thereare n numbers to add, their sum should be something like nn

2 , which ispretty close to the correct (n + 1)n

2 . You don’t need to be Gauss to seethis.

We now give three additional examples illustrating the use of mathemat-ical induction.

6.1 Induction 67

Example 6.1. First we show the following formula for the sum of a finitegeometric series:

1 + p+ p2 + p3 + · · · + pn =1 − pn+1

1 − p(6.3)

where p is the quotient, which we here assume to be a given natural number,and n is a natural number. Note that we think of p as being fixed and theinduction is on the number n with n+ 1 being the number of terms in theseries. The formula (6.3) holds for n = 1, since

1 + p =(1 − p)(1 + p)

1 − p=

1 − p2

1 − p,

where we use the formula a2 − b2 = (a− b)(a+ b). Assuming it is true withn replaced by n− 1, we have

1 + p+ p2 + p3 + · · · + pn−1 =1 − pn

1 − p.

We add pn to both sides to get

1 + p+ p2 + p3 + · · · + pn−1 + pn =1 − pn

1 − p+ pn

=1 − pn

1 − p+pn(1 − p)

1 − p

=1 − pn+1

1 − p

which shows the inductive step.

Of course, a much simpler way to verify (6.3) is to just note that (1−p)(1+p+p2+p3+· · ·+pn) = 1+p+p2+p3+· · ·+pn−p−p2−p3−· · ·−pn−pn+1 =1 − pn+1, and then simply divide by 1 − p.

Example 6.2. Induction can also be used to show properties that do notinvolve sums. For example, we show an inequality that is useful. For anyfixed natural number p,

(1 + p)n ≥ 1 + np (6.4)

for any natural number n. The inequality (6.4) is certainly valid for n = 1,since (1 + p)1 = 1 + 1 × p. Now assume it holds for n− 1,

(1 + p)n−1 ≥ 1 + (n− 1)p.

We multiply both sides by the positive number 1 + p,

(1 + p)n = (1 + p)n−1(1 + p) ≥ (1 + (n− 1)p)(1 + p)

≥ 1 + (n− 1)p+ p+ (n− 1)p2 ≥ 1 + np+ (n− 1)p2.

Since (n − 1)p2 is nonnegative, we can take it away from the right-handside and then obtain (6.4).


6.2 Changes in a Population of Insects

Induction is also often used to derive models. We now present an exampleinvolving the growth in a population of insects. We consider a simplifiedsituation of a population of an insect in which the adults breed duringthe first summer they are alive then die before the following summer. Wewant to figure out how the population of these insects changes year byyear. This is an important issue for instance if the insects carry diseaseor consume farm crops. In general, there are many factors that affect therate of reproduction; the food supply, the weather, pesticides, and even thepopulation itself. But in first phase of the modelling, we simplify all of thisby assuming that the number of offspring produced each breeding seasonis simply proportional to the number of insects alive during that season.Experimentally this is often a valid assumption if the population is not toolarge.

Because we are describing populations of the insects during differentyears, we need to introduce a notation that makes it easy to associate vari-able names with different years. We use the index notation to do this. Welet P0 denote the current or initial population and P1, P2, · · · , Pn, · · ·denote the populations during subsequent years number 1, 2, · · · , n, · · ·respectively. The index or subscript on Pn is a convenient way to denote theyear. Following our assumption, we know that Pn is always proportional toPn−1. Our modelling assumption is that the population Pn, after any num-ber of years n, is proportional to, that is a fixed multiple of, the populationPn−1 the year before. Using R to denote the constant of proportionality,we thus have

Pn = RPn−1. (6.5)

Assuming the initial population P0 is known, the problem is to figure outwhen the population reaches a specific level M . In other words, find thefirst n such that Pn ≥M .

In order to do this, we want to find a formula expressing the dependenceof Pn on n. We can do this by using induction on (6.5). Since (6.5) alsoholds for n− 1, i.e. Pn−1 = RPn−2. Substituting, we find

Pn = RPn−1 = R(RPn−2) = R2Pn−2.

Now we substitute for Pn−2 = RPn−3, Pn−3 = RPn−4, and so on. Aftern− 2 more substitutions, we find

Pn = RnP0. (6.6)

Since R and P0 are known, this gives an explicit formula for Pn in terms ofn. Note that the way we use induction in this example might seem differentthan the previous examples. But the difference is only superficial. To makethe induction argument look the same as for the previous examples, we


can assume that (6.6) holds for n − 1 and then use (6.5) to show that ittherefore holds for n.

Returning to the question of finding n such that Pn ≥ M , the modelproblem is to find n such that

Rn ≥M/P0. (6.7)

As long asR > 1,Rn eventually grows large enough to do this. For example,if R = 2, then Pn grows quickly with n. If P0 = 1 000, then P1 = 2 000,P5 = 32 000, and P10 = 1 024 000.

Chapter 6 Problems

6.1. Prove the following formulas:

(a) 12 + 22 + 32 + · · · + n2 =n(n+ 1)(2n+ 1)

6(6.8)

and

(b) 13 + 23 + 33 + · · · + n3 =

(n(n+ 1)

2

)2

: (6.9)

hold for all natural numbers n by using induction.

6.2. Using induction, show the following formula holds for all natural numbersn,

1

1 × 2+

1

2 × 3+

1

3 × 4+ · · · + 1

n(n+ 1)=

1

n+ 1.

6.3. Using induction, show the following inequalities hold for all natural numbersn:

(a) 3n2 ≥ 2n+ 1 (b) 4n ≥ n2

6.4. The problem is to model the population of a species of insects that hasa single breeding season during the summer. The adults breed during the firstsummer they are alive then die before the following summer. Assuming that thenumber of offspring born each breeding season is proportional to the square ofthe number of adults, express the population of the insects as a function of theyear.

6.5. The problem is to model the population of a species of insects that hasa single breeding season during the summer. The adults breed during the firstsummer they are alive then die before the following summer and moreover theadults kill and eat some of their offspring. Assuming that the number of offspringborn each breeding season is proportional to the number of adults and thata number of offspring are killed by the adults is proportional to the square of thenumber of adults, derive an equation relating the population of the insects in oneyear to the population in the previous year.


6.6. (Harder) The problem is to model the population of a species of insectsthat has a single breeding season during the summer. The adults breed duringthe first and second summers they are alive then die before their third summer.Assuming that the number of offspring born each breeding season is proportionalto the number of adults that are alive, derive an equation relating the populationof the insects in any year past the first to the population in the previous twoyears.

6.7. Derive the formula (6.1) by verifying the sum

1 + 2 + · · · + n− 1 + n+ n + n− 1 + · · · + 2 + 1

n+ 1 + n+ 1 + · · · + n+ 1 + n+ 1

and showing that this means that 2(1 + 2 + · · · + n) = n× (n+ 1).

6.8. (Harder) Prove the formula for the geometric sum (6.3) holds by usinginduction on long division on the expression

pn+1 − 1

p− 1.

Fig. 6.2. Gauss 1831: “I protest against the use of infinite magnitude as some-thing completed, which in mathematics is never permissible. Infinity is merelya facon parlerTS

a , the real meaning being a limit which certain ratios approachindefinitely near, while others are permitted to increase without restriction”

TSa Do you mean “facon de parler”?


7Rational Numbers

The chief aim of all investigations of the external world should beto discover the rational order and harmony which has been imposedon it by God and which He revealed to us in the language of math-ematics. (Kepler)

7.1 Introduction

We learn in school that a rational number r is a number of the form r =pq = p/q, where p and q are integers with q = 0. Such numbers are alsorefereed to as fractions or ratios or quotients. We call p the numerator and qthe denominator of the fraction or ratio. We know that p

1 = p, and thus therational numbers include the integers. A basic motivation for the inventionof rational numbers is that with them we can solve equations of the form

qx = p

with p and q = 0 integers. The solution is x = pq . In the Dinner Soup model

we met the equation 15x = 10 of this form with solution x = 1015 = 2

3 .Clearly, we could not solve the equation 15x = 10 if x was restricted tobe a natural number, so you and your roommate should be happy to haveaccess to the rational numbers.

If the natural number m is a factor of the natural number n so thatn = pm with p a natural number, then p = n

m , in which case thus nm is

a natural number. If division of n by m leaves a non-zero remainder r, so

72 7. Rational Numbers

that n = pm+ r with 0 < r < m, then nm = p+ r

m , which is not a naturalnumber.

7.2 How to Construct the Rational Numbers

Suppose now that your roommate has an unusual background and has neverheard about rational numbers, but fortunately is very familiar with integersand is more than willing to learn new things. How could you quickly explainto her/him what rational numbers are and how to compute with them? Inother words, how could you convey how to construct rational numbers fromintegers, and how to add, subtract, multiply and divide rational numbers?One possibility would be to simply say that x = p

q is “that thing” whichsolves the equation qx = p, with p and q = 0 integers. For example, a quickway to convey the meaning of 1

2 would be to say that it is the solutionof the equation 2x = 1, that is 1

2 is the quantity which when multipliedby 2 gives 1. We would then use the notation x = p

q to indicate that thenumerator p is the right hand side and the denominator q is the factor onthe left hand side in the equation qx = p. We could equally well think ofx = p

q as a pair, or more precisely as an ordered pair x = (p, q) with a firstcomponent p and a second component q representing the right hand sideand the left hand side factor of the equation qx = p respectively. Note thatthe notation p

q is nothing but an alternative way of ordering the pair ofintegers p and q with an “upper” p and a “lower” q; the horizontal bar inpq separating p and q is just a counterpart of the comma separating p andq in (p, q).

We could now directly identify some of these pairs (p, q) or “new things”with already known objects. Namely, a pair (p, q) with q = 1 would beidentified with the integer p since in this case the equation is 1x = p withsolution x = p. We could thus write (p, 1) = p corresponding to writingp1 = p, as we are used to do.

Suppose now you would like to teach your roommate how to operate withrational numbers using the rules that are familiar to us who know aboutrational numbers, once you have conveyed the idea that a rational numberis an ordered pair (p, q) with p and q = 0 integers. We could seek inspirationfrom the construction of the rational number (p, q) = p

q as that thing whichsolves the equation qx = p with p and q = 0 integers. For example, supposewe want to figure out how to multiply the rational number x = (p, q) = p

q

with the rational number y = (r, s) = rs . We then start from the defining

equations qx = p and sy = r. Multiplying both sides, using the fact thatxs = sx so that qxsy = qsxy = qs(xy), we find that

qs(xy) = pr,

7.2 How to Construct the Rational Numbers 73

from which we conclude that

xy = (pr, qs) =pr

qs,

since z = xy visibly solves the equation qsz = pr. We thus conclude thefamiliar rule

xy =p

q× r

s=pr

qsor (p, q) × (r, s) = (pr, qs), (7.1)

which says that numerators and denominators are multiplied separately.Similarly to get a clue how to add two rational numbers x = (p, q) = p

q

and y = (r, s) = rs , we again start from the defining equations qx = p and

sy = r. Multiplying both sides of qx = p by s, and both sides of sy = r byq, we find qsx = ps and qsy = qr. From these equations and the fact thatfor integers qs(x+ y) = qsx+ qsy, we find that

qs(x+ y) = ps+ qr,

which suggests that

x+ y =p

q+r

s=ps+ qr

qsor (p, q) + (r, s) = (ps+ qr, qs). (7.2)

This gives the familiar way of adding rational numbers by using a commondenominator.

We further note that for s = 0, qx = p if and only if sqx = sp, (seeProblem 5.5). Since the two equations qx = p and sqx = sp have the samesolution x,

p

q= x =

sp

sqor (p, q) = (sp, sq). (7.3)

This says that a common nonzero factor s in the numerator and the de-nominator may be cancelled out or, vice versa introduced.

With inspiration from the above calculations, we may now define the ra-tional numbers to be the ordered pairs (p, q) with p and q = 0 integers, andwe decide to write (p, q) = p

q . Inspired by (7.3), we define (p, q) = (sp, sq)for s = 0, thus considering (p, q) and (sp, sq) to be (two representatives of)one and the same rational number. For example, 6

4 = 32 .

We next define the operations of multiplication × and addition + ofrational numbers by (7.1) and (7.2). We may further identify the rationalnumber (p, 1) with the integer p, since p solves the equation 1x = p. We canthus view the rational numbers as an extension of the integers, in the sameway that the integers are an extension of the natural numbers. We note thatp+r = (p, 1)+(r, 1) = (p+r, 1) = p+r and pr = (p, 1)×(r, 1) = (pr, 1) = pr,and thus addition and multiplication of the rational numbers that can beidentified with integers is performed as before.


We can also define division (p, q)/(r, s) of the rational number (p, q) bythe rational number (r, s) with r = 0, as the solution x of the equation(r, s)x = (p, q). Since (r, s)(ps, qr) = (rps, sqr) = (p, q),

x = (p, q)/(r, s) =(p, q)(r, s)

= (ps, qr),

which we can also write aspqrs

=ps

qr.

Finally, we may order the rational numbers as follows. We define the ratio-nal number (p, q) (with q = 0) to be positive, writing (p, q) > 0 wheneverp and q have the same sign, and for two rational numbers (p, q) and (r, s)we write (p, q) < (r, s) whenever (r, s)− (p, q) > 0. Note the difference canbe computed as (r, s) − (p, q) = (qr − sp, sq) because −(p, q) is just a con-venient notation for (−p, q). Note also that −(p, q) = (−p, q) = (p,−q),which we recognize as

−pq

=−pq

=p

−q .

The absolute value |r| of a rational number r = (p, q) = pq is defined as

for natural numbers by

|r| =r if r ≥ 0,−r if r < 0. (7.4)

where as above −r = −(p, q) = − pq = −p

q = p−q .

We can now verify all the familiar rules for computing with rationalnumbers by using the rules for integers already established.

Of course we use xn with x rational and n a natural number to denotethe product of n factors x. We also write

x−n =1xn

for natural numbers n and x = 0. Defining x0 = 1 for x rational, we havedefined xn for x rational n integer, with x = 0 if n < 0.

We finally check that we can indeed solve equations of the form qx = p,or (q, 1)x = (p, 1), with q = 0 and p integers. The solution is x = (p, q)since (q, 1)(p, q) = (qp, q) = (p, 1).

So we have constructed the rational numbers from the integers in thesense that we view each rational number p

q as an ordered pair (p, q) ofintegers p and q = 0 and we have specified how to compute with rationalnumbers using the rules for computing with integers.

We note that any quantity computed using addition, subtraction, mul-tiplication, and division of rational numbers (avoiding division by zero)

7.3 On the Need for Rational Numbers 75

always produces another rational number. In the language of mathemati-cians, the set of rational numbers is “closed” under arithmetic operations,since these operations do not lead out of the set. Hopefully, your (receptive)roommate will now be satisfied.

7.3 On the Need for Rational Numbers

The need of using rational numbers is made clear in early school years. Theintegers alone are too crude an instrument and we need fractions to reacha satisfactory precision. One motivation comes from our daily experiencewith measuring quantities of various sorts. When creating a set of standardsfor measuring quantities, such as the English foot-pound system or themetric system, we choose some arbitrary quantities to mark as the unitmeasurement. For example, the meter or the yard for distance, the poundor the kilogram for weight, the minute or second for time. We measureeverything in reference to these units. But rarely does a quantity measureout to be an even number of units and so we are forced to deal with fractionsof the units. The only possible way to avoid this would be to pick extremelysmall units (like the Italian lire), but this is impractical. We even givenames to some particular units of fractions; centimeters are 1/100 of meters,millimeters are 1/1000 of a meter, inches are 1/12 of foot, ounces are 1/16of a pound, and so on.

Consider the problem of adding 76 cm to 5 m. We do this by changingthe meters into centimeters, 5 m = 500 cm, then adding to get 576 cm.But this is the same thing as finding a common denominator for the twodistances in terms of a centimeter, i.e. 1/100 of a meter, and adding theresult.

7.4 Decimal Expansions of Rational Numbers

The most useful way to represent a rational number is in the form of a dec-imal expansion, such as 1/2 = 0.5, 5/2 = 2.5, and 5/4 = 1.25. In general,a finite decimal expansion is a number of the form

±pmpm−1 · · · p2p1p0.q1q2 · · · qn, (7.5)

where the digits pm, pm−1, . . . , p0, q0, . . . , qn are each equal to one of thenatural numbers 0, 1, · · · , 9 while m and n are natural numbers. Thedecimal expansion (7.5) is a shorthand notation for the number

± pm10m + pm−110m−1 + · · · + p1101 + p0100

+ q110−1 + · · · + qn−110−(n−1) + qn10−n.


For example

432.576 = 4 × 102 + 3 × 101 + 2 × 100 + 5 × 10−1 + 7 × 10−2 + 6 × 10−3.

The integer part of the decimal number (7.5) is pmpm−1 · · · p1p0, while thedecimal or fractional part is 0.q1q2 · · · qn. For example, 432.576 = 432 +0.576.

The decimal expansion is computed by continuing the long division algo-rithm “past” the decimal point rather than stopping when the remainderis found. We illustrate in Fig. 7.1.

Fig. 7.1. Using long division to obtain a decimal expansion

A finite decimal expansion is necessarily a rational number because itis a sum of rational numbers. This can also be understood by writingpmpm−1 · · · p1p0.q1q2 · · · qn as the quotient of the integers:

pmpm−1 · · · p1p0.q1q2 · · · qn =pmpm−1 · · · p1p0.q1q2 · · · qn

10n,

like 432.576 = 432576/103.

7.5 Periodic Decimal Expansionsof Rational Numbers

Computing decimal expansions of rational numbers using long divisionleads immediately to an interesting observation: some decimal expansionsdo not “stop”. In other words, some decimal expansions are never-ending,that is contain an infinite number of nonzero decimal digits. For exam-ple, the solution to the equation 15x = 10 in the Dinner Soup model isx = 2/3 = .666 · · · . Further, 10/9 = 1.11111 · · · , as displayed in Fig. 7.2.The word “infinite” is here to indicate that the decimal expansion contin-ues without ever stopping. We can find many examples of infinite decimal

7.5 Periodic Decimal Expansions of Rational Numbers 77

Fig. 7.2. The decimal expansion of 10/9 never stops

expansions:

13

= .3333333333 · · ·

211

= .18181818181818 · · ·

47

= .571428571428571428571428 · · ·

We conclude that the system of rational numbers pq with p and q = 0

integers, and the decimal system, don’t fit completely. To express certainrational numbers decimally is impossible with only a finite number of dec-imals, unless we are prepared to accept some imprecision.

We note that in all the above examples of infinite decimal expansions,the digits in the decimal expansion begin to repeat after some point. Thedigits in 10/9 and 1/3 repeat in each entry, the digits in 2/11 repeat afterevery two entries, and the digits in 4/7 repeat after every six entries. Wesay that these decimal expansions are periodic.

In fact, if we consider the process of long division in computing thedecimal expansion of p/q, then we realize that the decimal expansion of anyrational number must either be finite (if the remainder eventually becomeszero), or periodic (if the remainder is never zero). To see that these are theonly alternatives, we assume that the expansion is not finite. At every stagein the division process the remainder will then be nonzero, and disregardingthe decimal point, the remainder will correspond to a natural number rsatisfying 0 < r < q. In other words, remainders can take at most q − 1different forms. Continuing long division at most q steps must thus leavea remainder, whose digits have come up at least once before. But after thatfirst repetition of remainder, the subsequent remainders will repeat in thesame way and thus the decimal expansion will eventually be periodic.


The periodic pattern of a rational number may take a long time to beginrepeating. We give an example:

1043439

= 2.37585421412300683371298405466970387243735763097

9498861047835990888382687927107061503416856492027334851936218678815489749430523917995444191343963553530751708428246013667425968109339407744874715261958997722095671981776765 375854214123006833712984054669703872437357630979498861047835990888382687927107061503416856492027334851936218678815489749430523917995444191343963553530751708428246013667425968109339407744874715261958997722095671981776765 · · ·

Once a periodic pattern of the decimal expansion of a rational numberhas developed, then we may consider the complete decimal expansion tobe known in the sense that we can give the value of any decimal of theexpansion without having to continue the long division algorithm to thatdecimal. For example, we are sure that the 231th digit of 10/9 = 1.111 · · ·is 1, and the 103th digit of .56565656 · · · is 5.

A rational number with an infinite decimal expansion cannot be exactlyrepresented using a finite decimal expansion. We now seek to consider theerror committed by truncating an infinite periodic expansion to a finiteone. Of course, the error must be equal to the number corresponding tothe decimals left out by truncating to a finite expansion. For example,truncating after 3 decimals, we would have

109

= 1.111 + 0.0001111 · · · ,

with the error equal to 0.0001111 · · · , which certainly must be less than10−3. Similarly, truncating after n decimals, the error would be less than10−n.

However, since this discussion directly involves the infinite decimal ex-pansion left out by truncation, and since we have so far not specified howto operate with infinite decimal expansions, let us approach the problemfrom a somewhat different angle. Denoting the decimal expansion of 10/9truncated after n decimals by 1.1 · · · 1n (that is with n decimals equal to 1after the point), we have

1.11 · · · 11n = 1 + 10−1 + 10−2 + · · · + 10−n+1 + 10−n.

7.5 Periodic Decimal Expansions of Rational Numbers 79

Computing the sum on the right hand side using the formula (6.3) fora geometric sum, we have

1.11 · · ·11n =1 − 10−n−1

1 − 0.1=

109

(1 − 10−n−1), (7.6)

and thus109

= 1.11 · · ·11n +10−n

9. (7.7)

The error committed by truncation is thus 10−n/9, which we can bound by10−n to simplify. The error 10−n/9 will get as small as we please by takingn large enough, and thus we can make 1.11 · · ·11n as close as we like to10/9 by taking n large enough. This leads us to interpreting

109

= 1.11111111 · · ·

as meaning that we can make the numbers 1.111 · · ·1n as close as we liketo 10/9 by taking n large. In particular, we would have

|109

− 1.11 · · ·11n| ≤ 10−n.

Taking sufficiently many decimals in the never ending decimal expansionof 10

9 makes the error smaller than any given positive number.We give another example before considering the general case. Computing

we find that 2/11 = .1818181818 · · · . Taking the first m pairs of the digits18, we get

.1818 · · ·18m =18100

+18

10000+

181000000

+ · · · + 18102m

=18100

(

1 +1

100+

11002

+ · · · + 1100m−1

)

=18100

1 − (100−1)m

1 − 100−1=

18100

10099

(1 − 100−m)

=211

(1 − 100−m).

that is211

= 0.1818 · · ·18m +211

100−m,

so that| 211

− 0.1818 · · ·18m| ≤ 100−m.

We thus interpret 2/11 = .1818181818 · · · as meaning that we can make thenumbers .1818 · · ·18m as close as we like to 2/11 by taking m sufficientlylarge.


We now consider the general case of an infinite periodic decimal expan-sion of the form

p = .q1q2 · · · qnq1q2 · · · qnq1q2 · · · qn · · · ,

where each period consists of the n digits q1 · · · qn. Truncating the decimalexpansion after m periods, we get using (6.3), as

pm =q1q2 · · · qn

10n+q1q2 · · · qn

10n2+ · · · + q1q2 · · · qn

10nm

=q1q2 · · · qn

10n

(

1 +1

10n+

1(10n)2

+ · · · + 1(10n)m−1

)

=q1q2 · · · qn

10n

1 − (10−n)m

1 − 10−n=q1q2 · · · qn10n − 1

(1 − (10−n)m

),

that isq1q2 · · · qn10n − 1

= pm +q1q2 · · · qn10n − 1

10−nm,

so that ∣∣∣∣q1q2 · · · qn10n − 1

− pm

∣∣∣∣ ≤ 10−nm.

We conclude that we may interpret

p =q1q2 · · · qn10n − 1

to mean that the difference between the truncated decimal expansion pm ofp and q1q2 · · · qn/(10n − 1) can be made smaller than any positive numberby taking the number of periods m large enough, that is by taking moredigits of p into account. Thus, we may view p to be equal to a rationalnumber, namely p = q1q2 · · · qn/(10n − 1).

Example 7.1. 0.123123123 · · · is the same as the rational number 12399 , and

4.121212 · · · is the same as 4 + 129 = 4×9+12

9 = 489 .

We conclude that each infinite periodic decimal expansion may be con-sidered to be equal to a rational number, and vice versa. We may thus sum-marize the discussion in this section as the following fundamental theorem.

Theorem 7.1 The decimal expansion of a rational number is periodic.A periodic decimal expansion is equal to a rational number.

7.6 Set Notation

We have already encountered several examples of sets, for example theset 1, 2, 3, 4, 5 of the first 5 natural numbers, and the (infinite) set N =

7.7 The Set Q of All Rational Numbers 81

1, 2, 3, 4, · · · of all (possible) natural numbers. A set is defined by itselements. For example, the set A = 1, 2, 3, 4, 5 consists of the elements1, 2, 3, 4 and 5. To denote that an object is an element of a set we use thesymbol ∈, for example 4 ∈ A. We further have that 7 ∈ N but 7 /∈ A.To define a set we have to somehow specify its elements. In the two givenexamples we could accomplish this by simply listing its elements within theembracing set indicators and . As we encounter more complicated sets wehave to somewhat develop our notation. One convenient way is to specifythe elements of a set through some relevant property. For example A =n ∈ N : n ≤ 5, to be interpreted as “the set of natural numbers n suchthat n ≤ 5”. For another example, the set of odd natural numbers could bespecified as n ∈ N : n odd or n ∈ N : n = 2j − 1 for some j ∈ N. Thecolon : is here interpreted as “such as”.

Given sets A and B, we may construct several new sets. In particular,we denote by A∪B the union of A and B consisting of all elements whichbelong to at least one of the sets A and B, and by A∩B the intersection ofA and B consisting of all elements which belong to both A and B. FurtherA\B denotes the set of elements in A which do not belong to B, which maybe interpreted as “subtracting” B (or rather B ∪A) from A, see Fig. 7.3.

Fig. 7.3. The sets A ∪B, A ∩ B and A\B

We further denote by A×B the product set of A and B which is the setof all possible ordered pairs (a, b) where a ∈ A and b ∈ B.

Example 7.2. If A = 1, 2, 3 and B = 3, 4, then A ∪ B = 1, 2, 3, 4,A ∩ B = 3, A\B = 1, 2 and A × B = (1, 3), (1, 4), (2, 3), (2, 4), (3, 3),(3, 4).

7.7 The Set Q of All Rational Numbers

It is common to use Q to denote the set of all possible rational numbers,that is, the set of numbers x of the form x = p/q = (p, q), where p and q = 0


are integers. We often omit the “possible” and just say that Q denotes theset of rational numbers, which we can write as

Q =

x =p

q: p, q ∈ Z, q = 0

.

We can also describe Q as the set of finite or periodic decimal expansions.

7.8 The Rational Number Line and Intervals

Recall that we represent the integers using the integer number line, whichconsists of a line on which we mark regularly spaced points. We can also usea line to represent the rational numbers. We begin with the integer numberline and then add the rational numbers that have one decimal place:

− · · · ,−1,−.9,−.8, · · · ,−.1, 0, .1, .2, · · · , .9, 1, · · · .

Then we add the rational numbers that have two decimal places:

− · · · ,−.99,−.98, · · · ,−.01, 0, .01, .02, · · · , 98, .99, 1, · · · .

Then onto the rational numbers with 3, 4, · · · decimal places. We illustratein Fig. 7.4.

Fig. 7.4. Filling in the rational number line between −4 and 4 starting withintegers, rationals with one digit, and rationals with two digits, and so on

We see that there are quickly so many points to plot that the numberline looks completely solid. A solid line would mean that every number isrational, something we discuss later. But in any case, a drawing of a numberline appears solid. We call this the rational number line.

For given rational numbers a and b with a ≤ b we say that the rationalnumbers x such that a ≤ x ≤ b is a closed interval and we denote theinterval by [a, b]. We also write

[a, b] = x ∈ Q : a ≤ x ≤ b

7.9 Growth of Bacteria 83

The points a and b are called the endpoints of the interval. Similarly wedefine open (a, b) and half-open intervals [a, b) and (a, b] by

(a, b) = x ∈ Q : a < x < b,

[a, b) = x ∈ Q : a ≤ x < b, and (a, b] = x ∈ Q : a < x ≤ b.In an analogous way, we write all the rational numbers larger than a num-ber a as

(a,∞) = x ∈ Q : a < x and [a,∞) = x ∈ Q : a ≤ x.

We write the set of numbers less than a in a similar way. We also representintervals graphically by marking the points on the rational line segment,as we show in Fig. 7.5. Note how we use an open circle or a closed circle tomark the endpoints of open and closed intervals.

.5 2

.5 < x < 2

.3 .4

.3 x .4

-8 4

-8 < x 4

.45 3

.45 x 3

Fig. 7.5. Various rational line intervals

7.9 Growth of Bacteria

We now present a model from biology related to population dynamics re-quiring the use of rational numbers.

Certain bacteria cannot produce some of the amino acids they need forthe production of protein and cell reproduction. When such bacteria arecultured in growth media containing sufficient amino acids, then the popu-lation doubles in size at a regular time interval, say on the order of an hour.If P0 is the initial population at the current time and Pn is the populationafter n hours, then we have

Pn = 2Pn−1 (7.8)

for n ≥ 1. This model is similar to the model (6.5) we used to describethe insect population in Model 6.2. If the bacteria can keep growing in this


way, then we know from that model that Pn = 2nP0. However if there isa limited amount of amino acid, then the bacteria begin to compete forthe resource. As a result, the population will no longer double every hour.The question is what happens to the bacteria population as time increases?Does it keep increasing, does it decrease to zero (die out), or does it tendto some constant value for example?

To model this, we allow the proportionality factor 2 in (7.8) to vary withthe population in such a way that it decreases as the population increases.For example, we assume there is a constant K > 0 such that the populationat hour n satisfies

Pn =2

1 + Pn−1/KPn−1. (7.9)

With this choice, the proportionality factor 2/(1 + Pn−1/K) is always lessthan 2 and clearly decreases as Pn−1 increases. We emphasize that thereare many other functions that have this behavior. The right choice is theone that gives results that match experimental data from the laboratory. Itturns out that the choice we have made does fit experimental data well and(7.9) has been used as a model not only for bacteria but also for certainhuman populations as well as for fisheries.

We now seek a formula expressing how Pn depends on n. We defineQn = 1/Pn, then (7.9) implies (check this!) that

Qn =Qn−1

2+

12K

.

Now we use induction as we did for the insect model:

Qn =12Qn−1 +

12K

=122Qn−2 +

12K

+1

4K

=123Qn−3 +

12K

+1

4K1

8K...

=12nQ0 +

12K

(

1 +12

+ · · · + 12n−1

)

With each hour that passes, we add another term onto the sum giving Qn

while we want to figure out what happens to Qn as n increases. Using theformula for the sum of the geometric series (6.3), which turns out to holdfor the sum of rational numbers as well as for integers, we find

Pn =1Qn

=1

12nQ0 + 1

K

(1 − 1

2n

) . (7.10)

7.10 Chemical Equilibrium 85

7.10 Chemical Equilibrium

The solubility of ionic precipitates is an important issue in analytical chem-istry. For the equilibrium

A x B y xA y+ + yB x− (7.11)

for a saturated solution of slightly soluble strong electrolytes, the solubilityproduct constant is given by

Ksp = [ A y+]x[ B x−]y. (7.12)

The solubility product constant is useful for predicting whether or nota precipitate can form in a given set of conditions and the solubility of anelectrolyte for example.

We will use it to determine the solubility of Ba(IO 3 ) 2 in a .020 mole/litersolution of KIO 3:

Ba(IO 3 ) 2 Ba 2+ + 2 IO−3

given that the Ksp for Ba(IO 3 ) 2 is 1.57 × 10−9. We let S denote thesolubility of Ba(IO 3 ) 2. By a mass law, we know that S = [ Ba 2+] whileiodate ions come from both the KIO 3 and the Ba(IO 3 ) 2. The total iodateconcentration is the sum of these contributions,

[ IO−3 ] = (.02 + 2S).

Substituting these into (7.12), we get the equation

S (.02 + 2S)2 = 1.57 × 10−9. (7.13)

Chapter 7 Problems

7.1. Explain to your roommate what rational numbers are and how to manipu-late them. Change roles in this game.

7.2. Prove the commutative, associative and distributive law for rational num-bers.

7.3. Verify the commutative and distributive rules for addition and multiplica-tion of rational numbers from the given definitions of addition and multiplication.

7.4. Using the usual definitions for multiplication and additions of rationalnumbers show that if r, s and t are rational numbers, then r(s+ t) = rs+ rt.


7.5. Determine the set of x satisfying the following inequalities:

(a) |3x − 4| ≤ 1 (b) |2 − 5x| < 6

(c) |14x − 6| > 7 (d) |2 − 8x| ≥ 3

7.6. Verify that for rational numbers r, s, and t

|s − t| ≤ |s| + |t|, (7.14)

|s − t| ≤ |s− u| + |t− u|, (7.15)

and|st| = |s| |t|. (7.16)

7.7. A person running on a large ship runs 8.8 feet/second while heading towardthe bow while the ship is moving at 16 miles/hour. What is the speed of therunner relative to a stationary observer? Interpret the computation giving thesolution as finding a common denominator.

7.8. Compute decimal expansions for (a) 3/7, (b) 2/13, and (c) 5/17.

7.9. Compute decimal expansions for (a) 432/125 and (b) 47.8/80.

7.10. Find rational numbers corresponding to the decimal expansions

(a) 42424242 · · · , (b) .881188118811 · · · , and (c) .4290542905 · · · .

7.11. Represent the following sets as parts of the rational number line:

(a) x ∈ Q : −3 < x(b) x ∈ Q : −1 < x ≤ 2 and 0 < x < 4(c) x ∈ Q : −1 ≤ x ≤ 3 or − 2 < x < 2(d) x ∈ Q : x ≤ 1 or x > 2.

7.12. Find an equation for the number of milligrams of Ba(IO 3)2 that can bedissolved in 150 ml of water at 25 C with Ksp = 1.57× 10−9 moles2/liter3. Thereaction is

Ba(IO 3 ) 2 Ba 2+ + 2 IO −3

7.13. You invest some money in a bond that yields 9% interest each year.Assuming that you invest any money you make from interest in more bonds foran initial investment of $C0, write down a model giving the amount of moneyyou have after n years. View the growth of your capital with n using MATLAB

for example.

8Pythagoras and Euclid

(1) At its deepest level, reality is mathematical in nature.(2) Philosophy can be used for spiritual purification.(3) The soul can rise to union with the divine.(4) Certain symbols have a mystical significance.(5) All brothers of the order should observe strict loyalty and secrecy.(Beliefs of Pythagoras)

8.1 Introduction

In this chapter, we discuss a couple of basic useful facts from geometry.In doing so, we make connections to the origins of mathematics in ancientGreece 2500 years ago and in particular to two heros; Pythagoras andEuclid. In their work, we find roots of most of the topics we will meetbelow.

8.2 Pythagoras Theorem

You have probably heard about Pythagoras theorem for a triangle witha right angle since it is a very fundamental and important result of ge-ometry. Recall, we used Pythagoras theorem in the Muddy Yard model forexample. Pythagoras theorem states that if the sides next to the right anglehave lengths a and b and the side opposite the right angle (the hypotenuse)has length c, then (see Fig. 8.1).

88 8. Pythagoras and Euclid

a

bc

c2 = a2 + b2

Fig. 8.1. Pythagoras’ theorem

Suppose that our roommate has never heard about this result, as unlikelyas it may be. How can we convince her/him that Pythagoras theoremis true? That is, how can we prove it? Well, we could argue as follows:Construct a rectangle surrounding the triangle by drawing a line parallelto the hypotenuse through the right angle corner and lines at right angles tothe hypotenuse through the other two corners, see Fig. 8.2. We have nowthree triangles, two new adjoined to the original one. All these triangleshave the same form, that is they have the same angles: one right angle of90, one angle of size α and one of size β, see Fig. 8.2. This is becauseα + β = 90, since the sum of the angles of a triangle is always 180,and a right angle is 90. Since the three triangles have the same angles,they have the same form, or in other words, the triangles are similar. Thismeans that the ratio of corresponding sides is equal for all three triangles.Using this fact twice, we see that x/a = a/c and y/b = b/c. We concludethat c = x + y = a2/c + b2/c or c2 = a2 + b2, which is the statement ofPythagoras theorem.

a

bc

x

y

α

α

α

β

β

β

Fig. 8.2. Proof of Pythagoras theorem

8.3 The Sum of the Angles of a Triangle is 180 89

That should convince our roommate granted that she/he is reasonablyfamiliar with (i) the fact that the sum of the angles of a triangle is 180,and (ii) the concept of similar triangles. If not, we may have to help ourroommate through the following two sections.

8.3 The Sum of the Angles of a Triangle is 180

Consider a triangle with corners A, B and C and angles α, β and γ ac-cording to Fig. 8.3. We recall that the angle between two lines (or linesegments) meeting at a point, measures how much one of the lines is ro-tated with respect to the other line. The most natural unit for this is “turn”or “revolution”. The arm of a clock makes a full turn from 12 to 12 anda half turn from 12 to 6. When we turn the first page of our newspaperover and all the way around, we have rotated it one full turn. If we haveplenty of space, like when sitting at the breakfast table alone, we just turnthe page over which makes an angle of half a turn. We commonly use a sys-tem of degrees to measure angles, where one full turn corresponds to 360degrees. Of course half a turn is the same as 180 degrees and quarter ofa turn is 90 degrees, which is a right angle.

A B

C

α

α

β

β

γ

Fig. 8.3. The angles of a triangle sum up to 180

Returning to the triangle in Fig. 8.3, we would like to understand whyα + β + γ = 180. To do this, draw a straight line parallel to the baseAB through the corner C opposite to the base. The angles formed at Care α, γ and β, and their sum thus is 180, which we wanted to show.We here use the fact that a line crossing two parallel lines crosses the twoparallel lines at equal angles, see Fig. 8.4. This statement is Euclid’s famousfifth axiom for geometry, which is called the parallel axiom. An axiom issomething we take for granted without asking for any motivation. We mayaccept the validity of an axiom on intuitive grounds, or just accept it asa rule of a game. Euclidean geometry is based on five axioms, the first ofwhich states that through two different points there is a unique straightline, see Fig. 8.5. In the axioms, undefined concepts like point and straightline appear. When we think of these concepts, we use our intuition from


α

α

Fig. 8.4. Two parallel lines intersected by a third one

A

B

Fig. 8.5. Euclids first axiom: through two different points there is a uniquestraight line

our experience of the real world. The second axiom states that a piece ofa straight line can be extended to a straight line. The third axiom statesthat one can construct a circle with a given center and given radius andthe fourth axiom states that all right angles are equal.

Euclidean geometry concerns plane geometry, which may be thought ofas the geometry on a flat surface. We may think of a soccer field as beingflat, but we know that very large pieces of surface of the Earth cannotbe considered to be fully flat. Geometry on curved surfaces is called non-Euclidean geometry. Non-Euclidean geometry has some surprising featuresand in particular there is no parallel axiom. Consequently, the sum of theangles of a triangle on the curved surface of the Earth may be different from180. See Fig. 8.6, where the sum of the angles of the indicated trianglewith three right angles (can you see it?) is 270, and not at all 180.

Fig. 8.6. A “triangle” on the Earth with one corner at the North Pole and twocorners at the Equator with longitude difference of 90

8.4 Similar Triangles 91

8.4 Similar Triangles

Two triangles with the same angles are said to be similar. Euclid says thatthe ratio of corresponding sides of two similar triangles are the same. Ifa triangle has sides of lengths a, b and c, then a similar triangle will haveside lengths a = ka, b = kb and c = kc, where k > 0 is a common scalefactor, in other words the ratio of the corresponding side lengths is thesame: ka

kb = ab , ka

kc = ac and kb

kc = bc . Another way of viewing this fact is

to say that the angles of a triangle do not change if we change the sizeof the triangle by changing the side lengths with a common factor. Thisis not true in non-Euclidean geometry. If we increase the size of a (large)triangle on the surface of the Earth, then the angles will increase. We poseas a challenge to the reader the problem of proving from Euclid’s axiomsthat similar triangles indeed have proportional sides, see Problem 8.4.

a

bc

a

bc

a/b = a/b

Fig. 8.7. The sides of similar triangles are proportional

8.5 When Are Two Straight Lines Orthogonal?

We continue with an application of Pythagoras theorem that is of fun-damental importance in both calculus and linear algebra, and which hasserved as the basic theoretical tool of a carpenter through the centuries.We ask the question: how can we determine if two intersecting straightlines of the Euclidean plane intersect under a right angle, that is if the twostraight lines are orthogonal or not, see Fig. 8.8. This question typicallycomes up when constructing the foundation of a rectangular building. As-sume that the two lines intersect at the point O and assume one of the linespasses through a point A and the other through a point B in the plane,see Fig. 8.8. We now consider the triangle AOB and ask if the angle AOBis equal to 90, that is if the side OA is orthogonal to the side OB, seeFig. 8.9 We could try to check this directly with a portable right angle,but that would always leave some room for incorrect decision if the dimen-


a

b

A

B

O

90?

Fig. 8.8. Are these two (pieces of) lines orthogonal?

aa

b b cc

A

B

O

90?Q D

Ec2 = a2 + b2

Fig. 8.9. Test of right-angledness

sion of the portable right angle is much smaller than the triangle AOBitself. This would normally be the case when seeking to determine whereto put the corners of a foundation of a building. Another possibility is touse Pythagoras theorem. We would expect that if

a2 + b2 = c2 (8.1)

where a is the length of the side OA, b is the length of side OB and cthat of the side AB, see Fig. 8.9, then the angle BOA would be 90. Weshall prove that this is so shortly, but let’s first see how a carpenter woulduse this result in practice. He would then cut three pieces of a string, oflength 3, 4 and 5 units. The beauty of these numbers is that they fit intothe equation (8.1):

32 + 42 = 52 (8.2)

If our conjecture is correct, then a triangle with sides 3, 4 and 5 will havea right angle. Assuming now that a = 3, b = 4 we could check if thetriangle AOB has a right angle by putting the string of length 5 along ABand check if c = 5. If so, then the angle AOB would be 90. Using threepieces of strings of length 3, 4 and 5 units, the carpenter can thus constructa right angle. The choice of units is important in practical applications ofthis idea. Very short strings would be cheap but the precision would suffer,and very long strings would be impractical to handle.

Let’s now go back to the conjecture that if a2 + b2 = c2 then the triangleAOB would have a right angle. To convince ourselves we could followingEuclid argue as follows: Construct a right-angled triangle DQE with the

8.5 When Are Two Straight Lines Orthogonal? 93

right angle at Q and with the length ofDQ equal to a and the length of EQequal to b. Then by Pythagoras theorem the length of DE squared wouldbe equal to a2 + b2. But we assume that c2 = a2 + b2 and thus the lengthof DE is c. This means that the triangles DQE and AOB would have thesame side lengths and thus would be similar. Hence, the angle AOB wouldbe equal to the angle DQE, which is equal to 90, and we have proved thatAOB is right angled.

We can use the equality a2 + b2 = c2 as a test of orthogonality of thesides OA and OB using the trick of the carpenter, but this could still leavesome uncertainty, for instance if we had to use strings of dimension verymuch smaller than the foundation.

Suppose now that we know the coordinates (a1, a2) of the corner A andthe coordinates (b1, b2) of corner B. This could be the case if the triangleis in fact defined by specifying these coordinates, which could happen if weuse a map to identify the triangle. Then we would have

a2 = a21 + a2

2, b2 = b21 + b22, c2 = (b1 − a1)2 + (b2 − a2)2

where the last equality follows from Fig. 8.10. We conclude that

c2 = b21 + a21 − 2b1a1 + b22 + a2

2 − 2b2a2 = a2 + b2 − 2(a1b1 + a2b2)

We see that a2 + b2 = c2 if and only if

a1b1 + a2b2 = 0. (8.3)

Thus, our test of orthogonality is reduced to checking the algebraic relationa1b1 + a2b2 = 0. If we know the coordinates (a1, a2) and (b1, b2), this con-dition can be checked by multiplying numbers and is therefore not subjectto taste or individual decision (up to round-off)

The magic formula (8.3) for checking orthogonality will play an impor-tant role below. It translates a geometric fact (orthogonality) into an arith-metic equality.

O a1

a2

b1

b2

c

A

B

Fig. 8.10.


8.6 The GPS Navigator

GPS (Global Positioning System) is a wonderful new invention. It was de-veloped by the U.S. Military in the 1980s, but now has important civiluse. You can buy at GPS receiver for less than 200 dollars and carry itin your pocket on your hiking tour in the mountains, in your sailing boatout on the ocean, or in your car driving through the desert or the cityof Los Angeles. At a press of a button on the receiver, it gives you yourpresent coordinates: latitude and longitude as an ordered pair of num-bers (57.25, 12.60), where the first number (57.25) is the latitude, andthe second number is the longitude (12.60). The precision may be about10 meters. More advanced use of GPS may give you a precision as good as1 millimeter.

Knowing your coordinates, you can identify your present position ona map and decide in what direction to proceed to come to your goal.GPS thus solves the main problem of navigation: to figure out your co-ordinates on a map. Before the GPS this could be very difficult and theresults could be very unreliable. Determining the latitude was relativelyeasy if you could see the sun and measure its height over the horizonat noon using a sextant. Determining the longitude was much more dif-ficult and would require a good clock. The main motivation for all thework that went into developing accurate clocks or chronometers in the18th century, was to determine your longitude at sea. The results be-fore the chronometer could be grossly inaccurate: you could believe, likeColumbus, that you had come to India, while in fact you were close toAmerica!

GPS is based on a simple (and clever) mathematical principle. We presentthe basic idea supposing that our world is a Euclidean plane equipped witha coordinate system. Our problem is to find our coordinates. Suppose wecan measure our distance to two points A and B with known positions,which we denote by dA and dB. With the available information, we can saythat we must be located at one of the points of intersection of the circlewith center A and radius dA and the circle with center B and radius dB ,according to Fig. 8.11. Note that Euclid’s third axiom tells us that the twocircles do exist.

To find the coordinates (x1, x2) of the intersection points, we have tosolve the following system of two equations in the two unknowns x1 andx2:

(x1 − a1)2 + (x2 − a2)2 = d2A (x1 − b1)2 + (x2 − b2)2 = d2

B .

If we can solve this system of equations, we can determine where we are,granted we have some extra information which tells us which of the twopossible solutions applies. In three dimensions, we would have to determineour distance to three points in space with known locations and solve a cor-

8.6 The GPS Navigator 95

AB

?

dAdB

Fig. 8.11. Positioning using GPS

responding system of three equations with three unknowns, correspondingto the intersection of three spheres.

GPS uses a system of 24 satellites deployed in groups of 4 in each of 6orbital planes. Each satellite has a circular orbit of radius about 26.000km and the orbital period is 12 hours. Each satellite has an accurateclock and there is a surveying system that keeps track of the positionsof the satellites at each time instant. A GPS receiver receives a signalfrom each visible satellite which encodes the position and clock time ofthe satellite. The receiver measures the time delay, by comparing the re-ceived time with its own time, and then computes the distance to thesatellite knowing the speed of light. Knowing the distance to three satel-lites and their positions, the GPS then computes its position as one ofthe two intersection points of three spheres. One of these points will beway out in space and can be eliminated if we are sure that we are onEarth. The result is our location in space specified with latitude, longi-tude and height above the sea. In practice, a 4th satellite is needed tocalibrate the clocks of the satellites and the receiver, which is necessaryto compute the distances correctly. If more than 4 satellites are visible,the GPS computes a least squares approximate solution to the resultingover-determined system of equations, and the precision increases with thenumber of satellites.

GPS gives a coupling of geometry to arithmetics or algebra. Each physicalpoint is space is connected to a triple of numbers representing latitude,longitude and height.


8.7 Geometric Definition of sin(v) and cos(v)

Consider a right-angled triangle with an angle v, and sides of length a, band c, as in Fig. 8.12. We recall the definitions

cos(v) =a

csin(v) =

b

c.

Note that the values of cos(v) and sin(v) do not depend of the particulartriangle we have used, only on v. This is because another right-angled tri-angle with the same angle v and sides a, b and c as to the right in Fig. 8.12would be similar to the one considered first, and ratios of correspondingpairs of sides in similar triangles are the same, as we have concluded be-fore. In other words, a/c = a/c and b/c = b/c. For example, we may usea triangle with c = 1 in which case cos(v) = a and sin(v) = b. If we imbedsuch a triangle in the unit circle x2

1 +x22 = 1 as illustrated in Fig. 8.13, then

a = x1 and b = x2. That is, cos(v) = x1 and sin(v) = x2. This imbeddingalso makes it possible to extend the definition of cos(v) and sin(v), and inparticular, to angles v with 90 ≤ v < 180.

8.8 Geometric Proof of Addition Formulas

a

bc

a

bc

vv

Fig. 8.12. cos(v) = a/c and sin(v) = b/c

vx1

x1

x2

x2

(x1, x2)

x21 + x2

2 = 1

1

Fig. 8.13. cos(v) = x1 and sin(v) = x2

8.8 Geometric Proof of Addition Formulas for cos(v) 97

for cos(v)

As an application of the definition of sin(v) and cos(v), we give a geometricproof of the following infamous formula,

cos(β − α) = cos(β) cos(α) + sin(β) sin(α) (8.4)

The proof is given in the following figure and is based on using the defini-tions of cos(v) and sin(v) with v = α, v = β and v = β − α, respectively.Consider the two right-angled triangles to the left in Fig. 8.14 with a com-mon side of length 1, and with the lengths of the other sides expressed interms of sines and cosines of the present angles. To the right in Fig. 8.14,we have formed two other right-angled triangles in the same figure, bothwith an angle β. From the definition of cos(β) and sin(β), respectively, wefind that the base in the larger left triangle is cos(β) cos(α) and the basein the smaller upper right triangle is sin(β) sin(α). We thus conclude thatcos(β − α) = cos(β) cos(α) + sin(β) sin(α).

1

α

β

ββ − α

cos(α)cos(α)

sin(α)sin(α)

cos(β−α) cos(β) cos(α)

sin(β) sin(α)

Fig. 8.14. Why cos(β − α) = cos(β) cos(α) + sin(β) sin(α)

From Fig. 8.14, we can similarly conclude that

sin(β − α) = sin(β) cos(α) − cos(β) sin(α). (8.5)

Below,, we shall redefine cos(v) and sin(v) as the solutions of certain funda-mental differential equations. In particular, this will define cos(v) and sin(v)for arbitrary “angles” v, including vs greater than 180 and less than 0, thelatter corresponding to angles obtained by moving the point (x1, x2) clock-wise from (1, 0) on the unit circle in Fig. 8.13, with sin(v) = − sin(−v)and cos(v) = cos(−v) in accordance with the formulas cos(v) = x1 andsin(v) = x2 above.


8.9 Remembering Some Area Formulas

We remind the reader about the following well-known formulas for the areasof rectangles, triangles, and circles, shown in Fig. 8.15. Below we will comeback to these basic formulas and motivate them more closely.

Fig. 8.15. Some common formulas for areas A

8.10 Greek Mathematics

Greek mathematics is dominated by the schools of Pythagoras and Euclid.Pythagoras’ school is based on numbers, that is arithmetic, and Euclid’son geometry. 2000 years later in the 17th century, Descartes fused the twoapproaches together by creating analytic geometry.

Mathematics was traditionally used in Babylonia and Greece for practicalcalculations in astronomy and navigation et cetera, while the Pythagoreans“transformed mathematics into an education for aristocrats (free men) frombeing a skill of slaves”. As a result experimental science and mechanicsbecame poorly developed in the classical Greek period. Instead the principleof logical deduction was dominating.

Pythagoras was born 585 B.C. on the island Samos off the coast of AsiaMinor and founded his own school on Croton, a Greek settlement in south-ern Italy. The Pythagorean school was a brotherhood that kept its deep-est knowledge secret. The Pythagoreans associated with aristocrats andPythagoras himself was murdered for political reasons about 497 B.C., af-ter which his followers spread over Greece.

The Platonic school, called the Academy, was founded in Athens 387B.C. by Plato (427–347 B.C), a great idealistic philosopher. Plato wasa follower of the Pythagoreans and said that “arithmetic has a very greatand elevating effect, compelling the soul to reason about abstract number,and rebelling against the introduction of visible or tangible objects into theargument. . .” Plato liked mathematics because it gave evidence of existence

8.11 The Euclidean Plane Q2 99

of that wonderful world of perfect objects like numbers, triangles, points, etcetera, with which Plato was so enthralled. Plate considered the real worldto be imperfect and corrupt.

The discovery (we’ll come back to this below) that the length of thediagonal of a square of side one, represented by the number

√2, could not

be expressed as the quotient of two natural numbers, that is that√

2 isnot a rational number, was a strong blow to Pythagorean’s belief that theuniverse could be understood through relations between natural numbers.Ultimately, this led to the development of the Euclidean school based ongeometry instead of arithmetic, where the irrational nature of

√2 could be

handled, or more precisely avoided. For Euclid, the diagonal of a square wasjust a geometric entity, which simply had a certain length. One did not haveto express it using numbers. The length was what it was, namely the lengthof the diagonal of a square of side one. Similarly, Euclid “geometrized”arithmetic. For example, the product ab of two numbers a and b can beviewed as the area of the rectangle with sides a and b.

The prime example of a system based on logical deduction is the Ele-ments, which is the monumental treatise on geometry in thirteen booksby Euclid. The development proceeds from axioms and definitions throughtheorems derived by using the rules of logic.

A particular language has developed for expressing logical dependencies.For example, A ⇒ B means “if A then B”, that is, B follows from A, sothat if A is valid then B is also valid, also expressed as “A implies B”.Similarly A⇐ B means that A follows from B. If A implies B and also Bimplies A, we say that A and B are equivalent, expressed as A⇔ B.

Example 8.1. x > 0 ⇒ x+ 1 > 0.

Book I begins with definitions and axioms, and continues with theoremson congruence, parallel lines, and proves the Pythagorean theorem. BookI-IV all treat rectilinear figures made up by straight line segments. Book Vconcerns proportions (!) and is considered as maybe the greatest achieve-ments of Euclidean geometry. Book VI treats similar figures (!!). BooksVII-IX concern natural numbers, and Book X goes into irrational numberslike

√2. Books XI, XII and XIII concern geometry in three dimensions and

the method of exhaustion connected to computing the area of curvilinearfigures or volumes.

8.11 The Euclidean Plane Q2

If we accept the parallel axiom, which in particular leads to the conclusionthat the sum of the angles of a triangle is 180 as we saw above, this meansthat effectively we are assuming that we are working on a flat surface ora Euclidean plane.


We thus assume the reader has an intuitive idea of a Euclidean plane,to be thought of as a very large flat surface extending in all directionswithout any borders. For instance, like an endless flat parking place withouta single car. In this Euclidean plane, which we imagine consists of points,we imagine a coordinate system consisting of two straight lines in the planemeeting at a right angle at a point which we call the origin. We imagineeach coordinate axis to be a copy of the line of rational numbers Q, seeFig. 8.16.

x1

x2

O

(x1, x2)

Fig. 8.16. Coordinate system of the Euclidean plane

We identify one coordinate axis as axis 1 and the other as coordinateaxis 2. We can then associate to each point in the plane two coordinates x1

and x2, where x1 represents the intersection with axis 1 of the line throughthe point parallel to axis 2 and vice versa. We write the coordinates as(x1, x2), which we think of as an ordered pair of rational numbers witha first component x1 ∈ Q and a second component x2 ∈ Q. We thus seekto represent the Euclidean plane as the set Q

2 = (x1, x2) : x1, x2 ∈ Q,that is, the set of ordered pairs (x1, x2) with x1 and x2 rational numbers.To each point in the Euclidean plane corresponds its coordinates, which isan ordered pair of rational numbers, and to each ordered pair of rationalnumbers corresponds the point in the plane with those coordinates.

8.12 From Pythagoras to Euclid to Descartes

The idea is thus to associate to each geometrical point of the Euclideanplane, its coordinates as an ordered pair of rational numbers. This cou-ples geometrical points to arithmetic, or algebra based on numbers, and is

8.13 Non-Euclidean Geometry 101

a most basic and fruitful connection to make. This was the idea of Descarteswho followed the idea of Pythagoras to base geometry on numbers. Euclidturned this around and based numbers on geometry.

We will come back to the coupling of geometry and algebra many timesbelow. It is probably the most basic idea of the whole of mathematics.The power comes from associating numbers to points in space, and thenworking with numbers instead of points. This is the “arithmetrization” ofgeometry of Descartes in the 17th century, which preceded Calculus andchanged the history of mankind.

However, following the idea of Pythagoras and Descartes to base geom-etry on numbers, we will run into a quite serious difficulty which will forceus to extend the rational numbers to real numbers including also so calledirrational numbers. This exciting story will be unraveled below.

8.13 Non-Euclidean Geometry

But, not all geometry that is useful is based on Euclid’s axioms. In this cen-tury, for example, physicists have used non-Euclidean geometry to explainhow the universe behaves. One of the first people to consider non-Euclideangeometry was the great mathematician Gauss.

Gauss’ interest in non-Euclidean geometry gives a good picture of howhis mind worked. By the time Gauss was sixteen, he had begun to se-riously question Euclidean geometry. At the time that Gauss lived, Eu-clidean geometry had obtained an almost holy status and was held bymany mathematicians and philosophers to be one of the higher truths thatcould never be questioned. Yet, Gauss was bothered by the fact that Eu-clidean geometry rested on postulates that apparently could not be proved,such as two parallel lines cannot meet. He went on to develop a theory ofnon-Euclidean geometry in which parallel lines can meet and this theoryseemed to be as good as Euclidean geometry for describing the world.Gauss did not publish his theory, fearing too much controversy, but hedecided that it should be tested. In Euclidean geometry, the sum of theangles in a triangle add up to 180 while in the non-Euclidean geome-try this is not true. So centuries before the age of modern physics, Gaussconducted an experiment to see if the universe is “curved” by measuringthe angles in the triangle made up by three mountain peaks. Unfortu-nately, the accuracy of his instruments was not good enough to settle thequestion.


Chapter 8 Problems

8.1. Which point has the coordinates (latitude-longitude) (57.25, 12.60)? Deter-mine your own coordinates.

8.2. Derive (8.5) from Fig. 8.14.

8.3. Give another proof of Pythagoras’ Theorem.

8.4. (a) Prove from Euclid’s axioms that the sides of two triangles with thesame angles are proportional (Hint: use the 5th axiom a lot). (b) Prove that thethree lines bisecting each angle of a triangle, intersect at a common point insidethe triangle. (Hint: Decompose the given triangle into three triangles joiningthe corners with the point intersection of two of the bisectors). (c) Prove thatthe three lines joining each corner of a given triangle with the mid-point of theopposite side, intersect at a common point inside the triangle.

Fig. 8.17. Two classical tools

9What is a Function?

He who loves practice without theory is like the sailor who boardsship without rudder and compass and never knows where he maycast. (Leonardo da Vinci)

All Bibles or sacred codes have been the causes of the followingErrors:1. That Man has two real existing principles, Viz: a Body & a Soul.2. That Energy, call’d Evil, is alone from the Body; & that Reason,call’d Good, is alone from the Soul.3. That God will torment Man in Eternity for following his Energies.But the following Contraries to these are True:1. Man has no Body distinct from his Soul; for that call’d Body isa portion of Soul discern’d by the five Senses, the chief inlets of Soulin this age.2. Energy is the only life and is from the Body: and Reason is thebound or outward circumference of Energy.3. Energy is Eternal Delight. (William Blake 1757–1827)

9.1 Introduction

The concept of a function is fundamental in mathematics. We already metthis concept in the context of the Dinner Soup model, where the total costwas 15x (dollars) if the amount of beef was x (pounds). For every amountof beef x, there is a corresponding total cost 15x. We say that the totalcost 15x is a function of, or depends on, the amount of beef x.

104 9. What is a Function?

The term function and the mathematical notation we use today wasintroduced by Leibniz (1646–1716), who said that f(x), which reads “f ofx”, is a function of x if for each value of x in some prescribed set of valuesover which x can vary, there is assigned a unique value f(x). In the DinnerSoup model f(x) = 15x. It is helpful to think of x as the input, while f(x)is the corresponding output, so that as the value of x varies, the value off(x) varies according to the assignment. Correspondingly, we often writex → f(x) to signify that x is mapped onto f(x). We also think of thefunction f as a “machine” that transforms x into f(x):

xf→ f(x),

see also Fig. 9.1.

f

xf(x)

Df Rf

Fig. 9.1. Illustration of f : Df → Rf

We refer to x as a variable since x can take different values, and x isalso called the argument of the function. The prescribed set of values overwhich x can vary is called the domain of the function f and is denoted byD(f). The set of values f(x) corresponding to the values of x in the domainD(f), is called the range R(f) of f(x). As x varies over the domain D(f),the corresponding function value f(x) varies over R(f). We often write thissymbolically as f : D(f) → R(f) indicating that for each x ∈ D(f) thereis a value f(x) ∈ R(f) assigned.

In the context of the Dinner Soup model with f(x) = 15x, we may chooseD(f) = [0, 1], if we decide that the amount of beef x can vary in the interval[0, 1], in which case R(f) = [0, 15]. For each amount x of beef in the interval[0, 1], there is a corresponding total cost f(x) = 15x in the interval [0, 15].Again: the total cost 15x is a function of the amount of beef x. We mayalso choose the domain D(f) to be some other set of possible values of theamount of beef x such as D(f) = [a, b], where a and b are positive rationalnumbers, with the corresponding range R(f) = [15a, 15b], or D(f) = Q

+

with the corresponding range R(f) = Q+, where Q

+ is the set of positiverational numbers. We may even consider the function x → f(x) = 15xwith D(f) = Q and the corresponding range R(f) = Q, which would lead

9.1 Introduction 105

outside the Dinner Soup model since there x is non-negative. For a givenassignment x→ f(x), that is, a given function f(x), we may thus associatedifferent domains D(f) and corresponding ranges R(f) depending on thesetting.

It is common to assign a variable name to the output of a function,for example we may write y = f(x). Thus, the value of the variable yis given by the value f(x) assigned to the variable x. We therefore callx the independent variable and y the dependent variable. The independentvariable x takes on values in the domainD(f), while the dependent variabley takes on values in the range R(f).

Note that the names we use for the independent variable and the depen-dent variable for a given function f can be changed. The names x and yare common, but there is nothing special about these letters. For example,z = f(u) denotes the same function if we do not change f , i.e. the functiony = 15x can just as well be written z = 15u. In both cases, to a givennumber x or u the function f assigns that number multiplied by 15, thatis 15x or 15u. Thus we refer to “the function f(x)” while in fact it wouldbe more correct to just say “the function f”, because f is the “name” ofthe function, while f(x) is more like a description or definition of the func-tion. Nevertheless we will often use the somewhat sloppy language “thefunction f(x)” because it identifies both the name of the function and itsdefinition/description.

Example 9.1. The function x → f(x) = x2, or in short the functionf(x) = x2, may be considered with domain D(f) = Q

+ and rangeR(f) = Q

+, but also with domain D(f) = Q and again R(f) = Q+, or

with D(f) = Z, and R(f) = 0,±1,±2,±4, . . .. We illustrate in Fig. 9.2.

−1 0 1 2 3 4 5 −1 0 1 2 3 4 5

Df Rf

f(x) = x2

Fig. 9.2. Illustration of f : Q → Q+ with f(x) = x2

Example 9.2. For the function f(z) = z + 3 we may choose, for example,D(f) = N and R(f) = 4, 5, 6, . . ., or D(f) = Z and R(f) = Z.

Example 9.3. We may consider the function f(n) = 2−n with D(f) = N

and R(f) = 12 ,

14 ,

18 , . . ..

Example 9.4. For the function x→ f(x) = 1/x we may choose D(f) = Q+

and R(f) = Q+. For any given x in Q

+, the value f(x) = 1/x is in Q+,

and thus R(f) is a subset of Q+. Correspondingly, for any given y in Q

+


there is an x in Q+ with y = 1/x, and thus R(f) = Q

+, that is R(f) fillsup the whole of Q

+.

While the domain D(f) of a function f(x) often is given by the contextor the nature of f(x), it is often difficult to exactly determine the corre-sponding range R(f). We therefore often interpret f : D(f) → B to meanthat for each x in D(f) there is an assigned value f(x) that belongs tothe set B. The range R(f) is thus included in B, but the set B may bebigger than R(f). This relieves us from figuring out exactly what set R(f)is, which would be required to give f : D(f) → R(f) substance. We saythat f maps D(f) onto R(f) since every element of the set R(f) is of theform f(x) for some x ∈ D(f), and writing f : D(f) → B we say that fmaps D(f) into the set B.

The notation f : D(f) → B then rather serves the purpose of describ-ing the nature or type of the function values f(x), than more preciselywhat function values are assumed as x varies over D(f) For example, writ-ing f : D(f) → N indicates that the function values f(x) are naturalnumbers. Below we will meet functions x → f(x), where the variable xdoes not represent just a single number, but something more general likea pair of numbers, and likewise f(x) may be a pair of numbers. Writingf : D(f) → B with the proper sets D(f) and B, may contain the informa-tion that x is a number and f(x) is pair of numbers. We will meet manyconcrete examples below.

Example 9.5. The function f(x) = x2 satisfies f : Q → [0,∞) withD(f) = Q and R(f) = [0,∞), but we can also write f : Q → Q, indi-cating that x2 is a rational number if x is, see Fig. 9.2.

Example 9.6. The function

f(x) =x3 − 4x2 + 1

(x − 4)(x− 2)(x+ 3)

is defined for all rational numbers x = 4, 2,−3, so it is natural to defineD(f) = x ∈ Q, x = 4, x = 2, x = −3. It is often the case that we takethe domain to be the largest set of numbers for which a function is defined.The range is hard to compute, but certainly we have f : D(f) → Q.

9.2 Functions in Daily Life

In daily life, we stumble over functions right and left. A car dealer assignsa price f(x), which is a number, to each car x in his lot. Here D(f) maybe a set of numbers if each car is identified by a number, or D(f) may besome other listing of the cars such as Chevy85blue, Olds93pink,. . . , andthe range R(f) is the set of all different prices of the cars in D(f). When

9.2 Functions in Daily Life 107

the government makes out our tax bill, it is assigning one number f(x),representing the amount we owe, to another number x, representing oursalary. Both the domain D(f) and the range R(f) in this example changea lot depending on the political winds.

Any quantity which varies over time may be viewed as a function of time.The daily maximum temperature in degrees Celsius in Stockholm during1999 is a certain function f(x) of the day x of the year, with D(f) =1, 2, . . . , 365 and R(f) normally a subset of [−30, 30]. The price f(x)of a stock during one day of trade at the Stockholm Stock Exchange isa function of the time x of the day, with D(f) = [10.00, 17.00] and R(f)the range of variation of the stock price during the day. The length ofwomen’s skirts varies over the years around the level of the knee, and issupposed to be a good indicator of the variation of the economical climate.The length of a human being varies over the life time, and the thickness ofthe ozone layer over years.

We may also simultaneously consider several quantities depending ontime, like for example the temperature t(x) in degrees Celsius and windvelocity w(x) in meter per second in Chicago as functions of time x, wherex ranges over the month of January, and we may combine the two valuest(x) and w(x) into a pair of numbers “t(x) and w(x)”, which we may writeas f(x) = “t(x) and w(x)” or in short-hand f(x) = (t(x), w(x)) with theparenthesis enclosing the pair. For example writing, f(10) = (−30, 20),would give the information that the 10th of January was a tough day withtemperature −30C and wind 20 meter per second. From this informationwe could compute the adjusted temperature -50C that day taking thewind factor into account.

Likewise the input variable x could represent a pair of numbers, likea temperature and a wind speed and the output could be the adjustedtemperature with the wind factor taken into account (Find the formula!).

We conclude that the input x of a function f(x) may be of many differenttypes, single numbers, pairs of numbers, triples of numbers, et cetera, aswell as the output f(x).

Example 9.7. A book may consist of a set of pages numbered from 1 toN . We may introduce the function f(n) defined on D(f) = 1, 2, · · · , N,with f(n) representing the physical page with number n. In this case therange R(f) is the collection of pages of the book.

Example 9.8. A movie consists of a sequence of pictures that are displayedat the rate of 16 pictures per second. We usually watch a movie from thefirst to the last picture. Afterwards we might talk about different scenes inthe movie, which corresponds to subsets of the totality of pictures. A veryfew people, like the film editor and director, might consider the movie onthe level of the individual elements in the domain, that is the pictures on thefilm. When editing the movie, they number the picture frames 1, 2, 3, · · · , N


where 1, 2, . . . 16 are the numbers of the pictures displayed sequentially dur-ing the first second, and N is the number of the last picture. We may thenconsider the movie as a function f(n) with D(f) = 1, 2, · · · , N, which toeach number n in D(f) associates the picture frame with number n.

Example 9.9. A telephone directory of the people living in a city likeGoteborg is simply a printed version of the function f(x) that to eachperson x in Goteborg with a listed number, assigns a telephone number.For example, if x = Anders Andersson then f(x) = 4631123456 which isthe telephone number of Anders Andersson. If we have to find a telephonelisting, our thought is first to get the telephone book, that is the printedrepresentation of the entire domain and range of the function f , and then todetermine the image, i.e. telephone number, of an individual in the domain.In this example, we arrange the domain of individual names of peopleliving in Goteborg in such a way that it is easy to search for a particularinput. That is we list the individuals alphabetically. We could use anotherarrangement, say by listing individuals in order of their social securitynumbers.

Example 9.10. The 1890 census (population count) in the US was per-formed using Herman Hollerith’s (1860–1829) punched card system, wherethe data for each person (sex, age, address, et cetera) was entered in theform of holes in certain positions on a dollar bill size card, which couldthen be read automatically by a machine using a system of pins connectingelectrical circuits through the holes, see Fig. 9.3. The total population wasfound to be 62.622.250 after a processing time of three months with theHollerith system instead of the projected 2 years. Evidently, we may viewthe Hollerith system as a function from the set of all 1890 US citizens to thedeck of punched cards. To further exploit his system Hollerith founded theTabulating Machine Company, which was renamed International BusinessMachines Corporation IBM in 1924.

There is one important aspect of all the three above examples, book,movie and directory, not captured viewing these objects as certain func-tions x → f(x) with a certain domain D(f) and range R(f), namely, theordering of D(f). The pages of a book, and pictures of a film are numberedconsecutively, and the domain of a directory is also ordered alphabetically.In the case of a book or film the ordering helps to make sense out of thematerial, and a dictionary without any order is almost useless. Of course,swapping through films has become a part of the life-style of to-day, butthe risk of a loss of understanding is obvious. To be able to catch the mainidea or plot of a book or film as a whole it is necessary to read the pages orview the pictures in order. The ordering helps us to get an overall meaning.

Similarly, it is useful to be able to catch the main properties of a func-tion f(x), and this can sometimes be done by graphing or visualizing thefunction using some suitable ordering of D(f). We now go into the topic of

9.3 Graphing Functions of Integers 109

Fig. 9.3. Hermann Hollerith, inventor of the punched card machine: “My friendDr. Billings one night at the pub suggested to me that there ought to be somemechanical way of doing the census, something on the principle of the Jacquardloom, whereby holes in a card regulate the pattern to be woven”

graphing functions f(x) with D(f) and R(f) subsets of Q, of course withthe usual ordering of Q, and with the purpose of trying to grasp the natureof a given function “as a whole”.

9.3 Graphing Functions of Integers

So far we have described a function both by listing all its values in a tablelike the phone book and by giving a formula like f(n) = n2 and indicatingthe domain. It is also useful to have a picture of the behavior of a function,or in other words, to represent a function geometrically. Graphing functionsis a way of visualizing a function so that we can grasp the nature of thefunction “in one shot” or as one object. For example, we can describe the


function as increasing in this region and decreasing in this other region,giving an idea of how it behaves without being specific.

We begin by describing the graphing of functions f : Z → Z. Recall thatintegers are represented geometrically using the integer line. To describe theinput and output to a function f : Z → Z, we therefore need two numberlines so that we can mark the points in D(f) on one and the points in R(f)on the other. A convenient way to arrange these two number lines is to placethem orthogonal to each other as in Fig. 9.4. If we mark the points obtainedby intersecting vertical lines through integer points on the horizontal axiswith the horizontal lines through integer points on the vertical axis, weget a grid of points like that shown in Fig. 9.4. This is called the integercoordinate plane. Each number line is called an axis of the coordinate planewhile the intersection point of the two number lines is called the origin andis denoted by 0.

Fig. 9.4. The integer coordinate plane

As we saw, a function f : Z → Z can be represented by making a list withthe inputs placed side-by-side with the corresponding outputs. We showsuch a table for f(n) = n2 in Fig. 9.5. We can represent such a table also inthe integer coordinate plane by marking only those points correspondingto an entry in the table, i.e. marking each intersection point of the linerising vertically from the input and the line extending horizontally fromthe corresponding output. We draw the plot corresponding to f(n) = n2

in Fig. 9.5.

Example 9.11. In Fig. 9.6, we plot n, n2, and 2n along the vertical axis withn = 1, 2, 3, . . . , 6 along the horizontal axis. The plot suggests 2n grows morequickly than both n and n2 as n increases. In Fig. 9.7, we plot n−1, n−2,and 2−n with n = 1, 2, .., 6, and we see that 2−n decreases most rapidlyand n−1 least rapidly. Compare these results to Fig. 9.6.

Instead of using a table to list the points for a function, we can representa point on the integer plane mathematically by means of an ordered pair of

9.3 Graphing Functions of Integers 111

n f(n)

01

-12-2

3-3

01

144

99

4-4

5-5

1616

2525

6-6

3636 2 4-2-4

5

10

15

Fig. 9.5. A tabular listing of f(n) = n2 and a graph of the points associatedwith the function f(n) = n2 with domain equal to the integers

2 4 6

20

40

60

20

40

60

0

10

30

50

Fig. 9.6. Plots of the functions f(n) = n, f(n) = n2, and • f(n) = 2n

with D(f) = N

numbers. To the point in the plane located at the intersection of the verticalline passing through n on the horizontal axis and the horizontal line passingthrough m on the vertical axis, we associate the pair of numbers (n,m).These are the coordinates of the point. Using this notation, we can describethe function f(n) = n2 as the set of ordered pairs

(0, 0), (1, 1), (−1, 1), (2, 4), (−2, 4), (3, 9), (−3, 9), · · · .


2 4 6

.5

1

0

.25

.75

1 3 5

Fig. 9.7. Plots of the functions f(n) = n−1, f(n) = n−2, and • f(n) = 2−n

with D(f) = N

Note that we always associate the first number in the ordered pair with thehorizontal location of the point and the second number with the verticallocation. This is an arbitrary choice.

We can illustrate the idea of a function giving a transformation of itsdomain into its range nicely using its graph. Consider Fig. 9.5. We startat a point in the domain on the horizontal axis and follow a line straightup to the point on the graph of the function. From this point, we followa line horizontally to the vertical axis. In other words, we can find theoutput associated to a given input by tracing first a vertical line and thena horizontal line.

Note also that for functions with D(f) = N or D(f) = Q it is onlypossible to graph part of the function, simply because we cannot in practiceextend the natural or integer number line all the way to “infinity”. Ofcourse, a table representation of such a function must also be limited toa finite range of argument values. Only a defining formula of the functionvalues, like f(n) = n2 (together with a specification of D(f)), can give thefull picture in this case.

9.4 Graphing Functions of Rational Numbers

Now we consider plotting a function f : Q → Q. Following the lead offunctions of integers, we plot functions of rational numbers on the rationalcoordinate plane which we construct by placing two rational number linescalled the axes at right angles and meeting at the origins and then markingevery point that has rational number coordinates. Of course consideringFig. 7.4, such a plane will appear to be solid even if it is not solid. Weavoid plotting an example!

If we begin the plot of a function of rational numbers as above by writingdown a list of values, we realize immediately that graphing a function of

9.4 Graphing Functions of Rational Numbers 113

rational numbers is more complicated than graphing a function of integers.When we compute values of a function of integers, we cannot compute allthe values because there are infinitely many integers. Instead we choosea smallest and largest integer and compute the values of the functionsfor those integers in between. For the same reason, we can not computeall the values of a function defined on the rational numbers. But now wehave to cut off the list also in another way: we have to choose a smallestand largest number for making the list as before, but we also have to de-cide how many points to use in between the low and high values. In otherwords, we cannot compute the values of the function at all the rationalnumbers in between two rational numbers. This means that a list of val-ues of a function of rational numbers always has “gaps” in between thepoints where we evaluate the function. We give an example to make thisclear.

Example 9.12. We list some values of the function f(x) = 12x+ 1

2 definedon the rational numbers:

x 12x+ 1

2

−5 −2−2.8 −.9−2 −.5−1.2 −.1−1 0

x 12x+ 1

2

−.6 .2.2 .61 13 25 3

and then plot the function values in Fig. 9.8.

Fig. 9.8. A plot of the function values of f(x) = 12x+ 1

2, and several functions

taking on the same values at the sample points

The values we list for this example suggest strongly that we should drawa straight line through the indicated points in order to plot the function.However, we cannot be sure that this is the correct graph because thereare many functions that agree with 1

2x+ 12 at the points we computed, for


example we show two of them in Fig. 9.8. Therefore, to graph a functionaccurately, we would need to evaluate it at many more points than we haveused in Fig. 9.8 in general. On the other hand we cannot possibly computethe values f(x) for all possible rational numbers x, so that in the end we stillhave to guess the values of the function in between the points we compute,assuming that the function does not do anything strange there. Matlabfor example fills the gaps between the computed points with straight linesegments when plotting.

Deciding whether or not we have evaluated a function defined on therational numbers enough times to be able to guess its behavior is an in-teresting and important problem. This is not just a theoretical problemby the way: if we have to measure some quantities during an experimentthat should theoretically lie on a line, we are very likely to get a plot ofa function that is close to a line, but that has little wiggles because ofexperimental error.

In fact we are able to use Calculus to help with this decision. For now, wewill assume that the functions we plot vary smoothly between the samplepoints, which is largely true for the functions we consider in this book.

We finish this chapter by giving another example of a plot. In the nextchapter, we spend a lot more time on plotting.

Example 9.13. We list some values of the function f(x) = x2 defined onthe rational numbers:

x x2

−4 16−3.5 12.25−3.1 9.618−2 4

−1.8 3.24−1.4 1.96−1 1

x x2

−.8 .64−.4 .16

0 0.2 .04

1.2 1.441.5 2.25

2.21 4.8841

x x2

2.3 5.292.4 5.76

3 93.1 9.613.6 12.963.7 13.69

4 16

and then plot the function values in Fig. 9.9.

9.5 A Function of Two Variables

We give an example of a function of two variables. The total cost in theDinner Soup/Ice Cream model was

15x+ 3y,

where x was the amount of beef and y that of ice cream. We may view thetotal cost 15x + 3y as a function f(x, y) = 15x + 3y of the two variablesx and y. For each value of x and y there is a corresponding function value

9.5 A Function of Two Variables 115

Fig. 9.9. A plot of some of the points given by f(x) = x2 and a smooth curvethat passes through the points

f(x, y) = 15x+3y representing the total cost. We think here of both x andy as independent variables which may vary freely, corresponding to anycombination of beef and ice cream, and the function value z = f(x, y) asa dependent variable. For each pair of values of x and y there is assigneda value of z = f(x, y) = 15x+3y. We may write (x, y) → f(x, y) = 15x+3y,denoting the pair of x and y by (x, y).

This represents a very natural and very important extension of the con-cept of a function considered so far: a function may depend on two indepen-dent variables. Assuming that for the function f(x, y) = 15x+ 3y we allowboth x and y to vary over [0,∞), we will write f : [0,∞)× [0,∞) → [0,∞)to denote that for each x ∈ [0,∞) and y ∈ [0,∞), that is for each pair(x, y) ∈ [0,∞)× [0,∞), there is a unique value f(x, y) = 15x+ 3y ∈ [0,∞)assigned.

Example 9.14. The prize your roommate has to pay for the x pounds ofbeef and y pounds of ice cream is p = 15x+ 3y, that is p = f(x, y) wheref(x, y) = 15x+ 3y.

Example 9.15. The time t required for a certain bike trip depends on the dis-tance s of the trip, and on the (mean) speed v as s

v , that is t = f(s, v) = sv .

Example 9.16. The pressure p in an ideal (thin) gas mixture depends on thetemperature T and volume V occupied of the gas as p = f(T, V ) = nRT

V ,where n is the number of moles of gas molecules and R is the universal gasconstant.


9.6 Functions of Several Variables

Of course we may go further and consider functions depending on severalindependent variables.

Example 9.17. Letting your roommate decide the amount of beef x, carrotsy and potatoes z in the Dinner Soup, the cost k of the soup will be k =8x+ 2y + z depending on the three variables x, y and z. The cost is thusgiven by k = f(x, y, z) where f(x, y, z) = 8x+ 2y + z.

Example 9.18. The temperature u at a certain position depends on thethree space coordinates x, y and z, as well as on time t, that is u =u(x, y, z, t).

As we come to consider situations with more than just a few indepen-dent variables, it soon becomes necessary to change notation and use somekind of indexation of the variables like for example denoting the spacialcoordinates x, y and z instead by x1, x2 and x3. For example, we may thenwrite the function u in the last example as u(x, t) where x = (x1, x2, x3)contains the three space coordinates.

Chapter 9 Problems

9.1. Identify four functions you encounter in your daily life and determine thedomain and range for each.

9.2. For the function f(x) = 4x − 2, determine the range corresponding to: (a)D(f) = (−2, 4], (b) D(f) = (3,∞), (c) D(f) = −3, 2, 6, 8.

9.3. Given that f(x) = 2 − 13x, find the domain D(f) corresponding to therange R(f) = [−1, 1] ∪ (2,∞).

9.4. Determine the domain and range of f(x) = x3/100 + 75 where f(x) isa function giving the temperature inside an elevator holding x people and witha maximum capacity of 9 people.

9.5. Determine the domain and range of H(t) = 50− t2 where H(t) is a functiongiving the height in meters of a ball dropped at time t = 0.

9.6. Find the range of the function f(n) = 1/n2 defined on D(f) = n ∈ N :n ≥ 1.

9.7. Find the domain and a set B containing the range of the function f(x) =1/(1 + x2).


9.8. Find the domain of the functions

(a)2 − x

(x+ 2)x(x− 4)(x− 5)(b)

x

4 − x2(c)

1

2x+ 1+

x2

x− 8

9.9. (Harder) Consider the function f(n) defined on the natural numbers wheref(n) is the remainder obtained by dividing n by 5 using long division. So forexample, f(1) = 1, f(6) = 1, f(12) = 2, etc. Determine R(f).

9.10. Illustrate the map f : N → Q using two intervals where f(n) = 2−n.

9.11. Plot the following functions f : Z → Z after making a list of at least 5values: (a) f(n) = 4 − n, (b) f(n) = 2n− n2, (c) f(n) = (n+ 1)3.

9.12. Draw three different curves that pass through the points

(−2,−1), (−1,−.5), (0, .25), (1, 1.5), (3, 4).

9.13. Plot the functions; (a) 2−n, (b) 5−n, and (c) 10−n; defined on the naturalnumbers n. Compare the plots.

9.14. Plot the function f(n) = 109

(1− 10−n−1) defined on the natural numbers.

9.15. Plot the function f : Q → Q with f(x) = x3 after making a table of values.

9.16. Write a MATLAB function that takes two rational arguments x and yand returns their sum x+ y.

9.17. Write a MATLAB function that takes two arguments x and y repre-senting two velocities, and returns the time gained per kilometer by raising thevelocity from x to y.

10Polynomial functions

Sometimes he thought to himself, “Why?” and sometimes he thought,“Wherefore?”, and sometimes he thought, “Inasmuch as which?”.(Winnie-the Pooh)

He was one of the most original and independent of men and neverdid anything or expressed himself like anybody else. The result wasthat it was very difficult to take notes at his lectures so that wehad to trust mainly to Rankine’s text books. Occasionally in thehigher classes he would forget all about having to lecture and, afterwaiting for ten minutes or so, we sent the janitor to tell him thatthe class was waiting. He would come rushing into the door, takinga volume of Rankine from the table, open it apparently at random,see some formula or other and say it was wrong. He then went up tothe blackboard to prove this. He wrote on the board with his backto us, talking to himself, and every now and then rubbed it all outand said it was wrong. He would then start afresh on a new line, andso on. Generally, towards the end of the lecture he would finish onewhich he did not rub out and say that this proved Rankine was rightafter all. (Rayleigh about Reynolds)

10.1 Introduction

We now proceed to study polynomial functions, which are fundamentalin Calculus and Linear Algebra. A polynomial function, or polynomial,

120 10. Polynomial functions

f(x) has the form

f(x) = a0 + a1x+ a2x2 + a3x

3 + · · · + anxn, (10.1)

where a0, a1, · · · , an, are given rational numbers called the coefficientsand the variable x varies over some set of rational numbers. The valueof a polynomial function f(x) can be directly computed by adding andmultiplying rational numbers. The Dinner Soup function f(x) = 15x is anexample of a linear polynomial with n = 1, a0 = 0 and a1 = 15, and theMuddy Yard function f(x) = x2 is an example of a quadratic function withn = 2, a0 = a1 = 0, and a2 = 1.

If all the coefficients ai are zero, then f(x) = 0 for all x and we say thatf(x) is the zero polynomial. If n denotes the largest subscript with an = 0,we say that the degree of f(x) is n. The simplest polynomials besides thezero polynomial are the constant polynomials f(x) = a0 of degree 0. Thenext simplest cases are the linear polynomials f(x) = a0 + a1x of degree 1and quadratic polynomials f(x) = a0 + a1x + a2x

2 of degree 2 (assuminga1 = 0 respectively a2 = 0), which we just gave examples of, and wemet a polynomial of degree 3 in the model of solubility of Ba(IO 3 ) 2 inProposition 7.10.

The polynomials are basic “building blocks” in the mathematics of func-tions, and spending some effort understanding polynomials and learningsome facts about them will be very useful to us later on. Below we will meetother functions such as the elementary functions including trigonometricfunctions like sin(x) and the exponential function exp(x). The elementaryfunctions are all solutions of certain fundamental differential equations,and evaluation of these functions requires solution of the correspondingdifferential equation. Thus, these functions are not called elementary be-cause they are elementary to evaluate, like a polynomial, but because theysatisfy fundamental “elementary” differential equations.

In the history of mathematics, there has been two grand attempts todescribe “general functions” in terms of (i) polynomial functions (powerseries) or (ii) trigonometric functions (Fourier series). In the finite elementmethod of our time, general functions are described using piecewise polyno-mials.

We start with linear and quadratic functions, before considering generalpolynomial functions.

10.2 Linear Polynomials

We start with the linear polynomial y = f(x) = mx, where m is a rationalnumber. We write here m instead of a1 because this notation is often used.We may choose D(f) = Q and if m = 0 then R(f) = Q because if y is anyrational number, then x = y/m inserted into f(x) = mx gives the value

10.2 Linear Polynomials 121

of f(x) = y. In other words the function f(x) = mx with m = 0 maps Q

onto Q.One way to view the set of (x, y) that satisfy y = mx is to realize that

such (x, y) also satisfy y/x = m. Suppose that (x0, y0) and (x1, y1) are twopoints satisfying y/x = m. If we draw a triangle with one corner at theorigin and with one side parallel to the x axis of length x0 and anotherside parallel to the y axis of length y0 then draw the corresponding trianglefor the other point with sides of length x1 and y1, see Fig. 10.1, then thecondition

y0x0

= m =y1x1

means that these two triangles are similar. In fact any point (x, y) satisfyingy/x = m must form a triangle similar to the triangle made by (x0, y0), seeFig. 10.1. This means that such points lie on a line that passes through theorigin as indicated.

Fig. 10.1. Points satisfying y = mx form similar triangles. In this figure,m = 3/2

In the language of architecture, m, or the ratio of y to x, is called therise over the run while mathematicians call m the slope of the line. Ifwe imagine standing on a straight road going up hill, then the slope tellshow much we have to climb for any horizontal distance we travel. In otherwords, the larger the slope m, the steeper the line. By the way, if the slopeis negative then the line slopes downwards. We show some different linesin Fig. 10.2. When the slope m = 0, then we get a horizontal line sittingon top of the x axis. A vertical line on the other hand is the set of points(x, y) where x = a for some constant a. Vertical lines do not have a welldefined slope.

Using the slope to describe how a line increases or decreases does notdepend on the line passing through the origin. We can start at any pointon a line and ask how much the line rises or lowers if we move horizontallya distance x, see Fig. 10.3. If the points are (x0, y0) and (x1, y1), then y1−y0is the amount of “rise” corresponding to the “run” of x1 − x0. Hence the


-2 2

y=2x

-4

-2

2

4

y=x/2

y=-x/3

y=-x

Fig. 10.2. Examples of lines

(x0,y0)

(x1,y1)

x1 - x0

y1 - y0

Fig. 10.3. The slope of any line is determined by the amount of rise over theamount of run between any two points on the line

slope of the line through the points (x0, y0) and (x1, y1) is

m =y1 − y0x1 − x0

.

If (x, y) is any other point on the line, then we know that

y − y0x− x0

= m =y1 − y0x1 − x0

or

(y − y0) = m(x− x0). (10.2)

This is called the point-slope equation for a line.

10.2 Linear Polynomials 123

Example 10.1. We find the equation of the line through (4,−5) and (2, 3).The slope is

m =3 − (−5)

2 − 4= 4

and the line is y − 3 = 4(x− 2).

We can rewrite (10.2) to resemble (10.1) by multiplying out the terms in(10.2) and solving for y. This yields the slope-intercept form:

y = mx+ b, (10.3)

with b = y1 −mx1. b is called the y-intercept of the line because the linecrosses the y axis at the point (0, b), i.e. at a height of b. The differencebetween the graphs of y = mx and y = mx + b is simply that every pointon y = mx+b is translated vertically a distance of b from the correspondingpoint on y = mx. In other words, we can graph y = mx+b by first graphingy = mx and the moving the line vertically by an amount of b. We illustratein Fig. 10.4. When b > 0 we move the line up and when b < 0 we movethe line down. Evidently, we can find the slope-intercept form directly fromknowing two points.

Fig. 10.4. The graph of y = mx+ b is found by translating the graph of y = mxvertically by an amount b. In this case b > 0

Example 10.2. We find the slope-intercept form of the line through (−3, 5)and (4, 1). The slope is

m =5 − 1−3 − 4

= −47.

To compute the y-intercept, we substitute either point into the equationy = − 4

7x+ b, for example,

5 = −47×−3 + b

so b = 23/7 and y = − 47x+ 23

7 .


The technique of translating a known graph can be very useful whengraphing. For example once we have plotted y = 4x, we can quickly ploty = 4x− 12, y = 4x− 1

5 , y = 4x+ 1, and y = 4x+ 113.45 by translation.

10.3 Parallel Lines

We now draw a connection to the parallel axiom of Euclidean geometry,which we discussed in chapter Euclid and Pythagoras. First, let y = mx+b1and y = mx+ b2 be two lines with same slope m, but different y-interceptsb1 and b2, so that the lines are not identical. These two lines cannot evercross, since there is no x for which mx + b1 = mx + b2, because b1 = b2.We conclude that two lines with the same slope are parallel in the sense ofEuclidean geometry.

On the other hand, if y = m1x+ b1 and y = m2x+ b2 are two lines withdifferent slopes m1 = m2, then the two lines will cross, since we can solvethe equation m1x+b1 = m2x+b2 uniquely, to get x = (b1−b2)/(m2−m1).We conclude that two lines corresponding to two linear polynomials y =m1x+ b1 and y = m2x+ b2 are parallel if and only if m1 = m2.

Example 10.3. We find the equation of the line that is parallel to the linethrough (2, 5) and (−11, 6) and passing through (1, 1). The slope of the linemust be m = (6 − 5)/(−11− 2) = −1/13. Therefore, 1 = −1/13× 1 + b orb = 14/13 and y = − 1

13x+ 1413 .

Example 10.4. We can find the point of intersection between the line y =2x + 3 and y = −7x − 4 by setting 2x + 3 = −7x − 4. Adding 7x andsubtracting 3 from both sides 2x + 3 + 7x − 3 = −7x − 4 + 7x − 3 gives9x = −7 or x = −7/9. We can get the value of y from either equation,y = 2x+ 3 = 2(−7

9 ) + 3 = 139 or y = −7x− 4 = −7(−7

9 ) − 4 = 139 .

10.4 Orthogonal Lines

Lets us next show that two lines corresponding to two linear polynomialsy = m1x + b1 and y = m2x + b2 are orthogonal, that is make an angle of90 or 270, if and only if m1m2 = −1.

Since the values of b1 and b2 can be changed without changing the direc-tions of the lines, it is sufficient to show that the statement is true for twolines that pass through the origin. Assume now that the lines are orthogo-nal. Then m1 and m2 must have different signs, since otherwise either bothof the lines are increasing or both are decreasing and then they cannot beperpendicular. Now consider the triangles drawn in Fig. 10.5. The lines areperpendicular only if the angles θ1 and θ2 that the lines make with the xaxis add up to 90. This can happen only if the triangles drawn are similar.

10.5 Quadratic Polynomials 125

θ1θ2

m1

m2

Line 1

Line 2

Fig. 10.5. Similar triangles defined by perpendicular lines with slope m1 andm2. The angles θ1 and θ2 add up to 90

This means that 1/|m1| = 1/|m2| or |m1| |m2| = 1. This shows the resultsince m1 and m2 have opposite signs or m1m2 < 0.

Finally, assuming that m1m2 = −1 shows that the two triangles aresimilar and the orthogonality follows.

Example 10.5. We find the equation of the line that is perpendicular tothe line through (2, 5) and (−11, 6) and passing through (1, 1). The slopeof the first line is m = (6 − 5)/(−11− 2) = −1/13, so the slope of the linewe compute is −1/(−1/13) = 13. Therefore, 1 = 13× 1 + b or b = −12 andy = 13x− 12.

We will return to the topic of parallel and orthogonal lines in a littlewider setting in chapter Analytic geometry in Q

2. In particular, the so farexcluded cases with vertical or horizontal lines, will then be included ina natural way.

10.5 Quadratic Polynomials

The general quadratic polynomial has the form

f(x) = a2x2 + a1x+ a0

for constants a2, a1, and a0, where we assume a2 = 0 (otherwise we goback to the linear case).

We show how to plot such a function by using the idea of plotting linesin the previous section starting with the simplest example of a quadratic


functiony = f(x) = x2.

The domain of f is the set of rational numbers while the range containssome of the nonnegative rational numbers. We list some of the values here:

x x2

−2 4−1 1−.5 .25

x x2

−.25 .125−.1 .01

0 0

x x2

.1 .01

.5 .251 1

x x2

2 43 94 16

We also observe that f(x) = x2 is increasing for x > 0, which means thatif 0 < x1 < x2 then f(x1) < f(x2). This follows because x1 < x2 meansthat x1 × x1 < x2 × x1 < x2 × x2. Likewise, we can show that f(x) = x2 isdecreasing for x < 0, which means that if x1 < x2 < 0 then f(x1) > f(x2).This means that the function at least cannot wiggle very much in betweenthe values we compute. We plot the values of f(x) = x2 in Fig. 10.6 for601 equally spaced points between x = −3 and x = 3.

-4 -3 -2 -1 1 2 3 4-2

1

4

7

10

Fig. 10.6. Plot of f(x) = x2. The function is decreasing for x < 0 and increasingfor x > 0

To draw the graph of a general quadratic function, we follow the ideabehind computing the graphs of lines by using translation. We start withf(x) = x2 and then change that graph to get the graph of any otherquadratic. There are two kinds of changes we make.

The first change is called scaling. Consider the plots of the quadraticfunctions in Fig. 10.7. Each of these functions has the form y = f(x) = a2x

2

for a constant a2. Their plots all have the same basic shape as y = x2.However the heights of the points on y = a2x

2 are a factor of |a2| higheror lower than the height of the corresponding point on y = x2: higher if|a2| > 1 and lower if |a2| < 1. If a2 < 0 then the plot is also “flipped” orreflected through the x-axis.

The second change we consider is translation. The two possibilities areto translate horizontally, or sideways, and vertically. We show examples of

10.5 Quadratic Polynomials 127

Fig. 10.7. Plots of y = x2 scaled four different ways

both in Fig. 10.8. Graphs of quadratic functions of the form f(x) = (x+x0)2

can be drawn by moving the graph of y = x2 sideways to the right a distanceof |x0| if x0 < 0 and to the left a distance of x0 if x0 > 0. The easiest wayto remember which direction to translate is to figure out the new positionof the vertex, which is the lowest or highest point of the quadratic. Fory = (x−1)2, the lowest point is x = 1 and the graph is obtained by movingthe graph of y = x2 so the vertex is now at x = 1. For y = (x + .5)2, thevertex is at x = −.5 and we get the graph by moving the graph of y = x2

to the left a distance of .5. On the other hand, the graph of a functiony = x2 + d can be obtained by translating the graph of y = x2 vertically,in a fashion similar to what we did for lines. Recall that d > 0 translatesthe graph upwards and d < 0 downwards.

Fig. 10.8. Plots of y = x2 translated four different ways

Now it is possible to put all of this together to plot the graph of thefunction y = f(x) = a(x− x0)2 + d by scaling and translating the graph ofy = x2. We perform each operation in the same order that we would use to


do the arithmetic in computing values of f(x); first translate horizontallyby x0, then scale by a, and finally translate vertically by d.

Example 10.6. We plot y = −2(x + 1)2 + 3 in Fig. 10.9 by starting withy = x2 in (a), translating horizontally to get y = (x + 1)2 in (b), scalingvertically to get y = −2(x+ 1)2 in (c), and finally translating vertically toget y = −2(x+ 1)2 + 3 in (d).

Fig. 10.9. Plotting y = −2(x+ 1)2 + 3 in a systematic way

The last step is to consider the plot of the quadratic y = ax2 + bx + c.The idea is to first rewrite this in the form y = a(x− x0)2 + d for some x0

and d, then we can draw the graph easily. To explain how to do this, wework backwards using the example y = −2(x + 1)2 + 3. Multiplying out,we get

y = −2(x2 + 2x+ 1) + 3 = −2x2 − 4x− 2 + 3 = −2x2 − 4x+ 1.

Now if we are given y = −2x2 − 4x+ 1, we can do the following steps

y = −2x2 − 4x+ 1

= −2(x2 + 2x) + 1

= −2(x2 + 2x+ 1 − 1) + 1

= −2(x2 + 2x+ 1) + 2 + 1

= −2(x+ 1)2 + 3.

This procedure is called completing the square. Given x2 + bx, the idea toadd the number m so that x2 + bx + m is the square (x − x0)2 for someappropriate x0. Of course we also have to subtract m so we don’t changethe function. Note that we added and subtracted 1 inside the parenthesis inthe example above! If we multiply out, we get

(x− x0)2 = x2 − 2x0x+ x20

10.6 Arithmetic with Polynomials 129

which is supposed to match

x2 + bx+m.

This means that x0 = −b/2 while m = x20 = b2/4. In the example above,

b = 2, x0 = −1, and m = 1.

Example 10.7. We complete the square on y2 − 3x + 7. Here b = −3,x0 = 3/2, and m = 9/4. So we write

y2 − 3x+ 7 = y2 − 3x+94− 9

4+ 7

=(

y − 32

)2

+194.

Example 10.8. We complete the square on 6y2 + 4y − 2. We first have towrite

6y2 + 4y − 2 = 6(

y2 +23y

)

− 2.

Now b = 2/3, x0 = −1/3, and m = 1/9. So we write

6y2 + 4y − 2 = 6(

y2 +23y +

19− 1

9

)

− 2

= 6(

y +13

)2

− 69− 2

= 6(

y +13

)2

− 83.

Example 10.9. We complete the square on y = 12x

2 − 2x+ 3.

12x2 − 2x+ 3 =

12(x2 − 4x) + 3

=12(x2 − 4x+ 4 − 4) + 3

=12(x− 2)2 − 2 + 3

=12(x− 2)2 + 1.

10.6 Arithmetic with Polynomials

We turn now to investigating properties of polynomials of general degree,beginning with arithmetic properties. Recall that if we add, subtract, ormultiply two rational numbers, then the result is another rational number.In this section, we show that the analogous property holds for polynomials.


The Σ Notation for Finite Sums

Before exploring arithmetic with polynomials, we introduce a convenientnotation for dealing with long finite sums using the Greek letter sigma Σ.Given any n + 1 quantities a0, a1, · · · , an indexed with subscripts, wewrite the sum

a0 + a1 + · · · + an =n∑

i=0

ai.

The index of the sum is i and it is assumed that it takes on all the integersbetween the lower limit, which is 0 here, and the upper limit, which is nhere, of the sum.

Example 10.10. The finite harmonic series of order n is

n∑

i=1

1i

= 1 +12

+13

+ · · · 1n

while the finite geometric series of order n with factor r is

1 + r + r2 + · · · + rn =n∑

i=0

ri.

Notice that the index i is a dummy variable in the sense that it can berenamed or the sum can be rewritten to start at another integer.

Example 10.11. The following sums are all the same:

n∑

i=1

1i

=n∑

z=1

1z

=n−1∑

i=0

1i+ 1

=n+3∑

i=4

1i− 3

.

Using the Σ notation, we can write the general polynomial (10.1) in themore condensed form:

f(x) =n∑

i=0

aixi = a0 + a1x

1 + · · · + anxn.

Example 10.12. We can write

1 + 2x+ 4x2 + 8x3 + · · · + 220x20 =20∑

i=0

2ixi

and

1 − x+ x2 − x3 − · · · − x99 =99∑

i=0

(−1)ixi.

since (−1)i = 1 if i is even and (−1)i = −1 if i is odd.


Addition of Polynomials

Given two polynomials

f(x) = a0 + a1x1 + a2x

2 + · · · + anxn

andg(x) = b0 + b1x

1 + b2x2 + · · · + bnx

n

we may define a new polynomial denoted by (f + g)(x), and referred to asthe sum of f(x) and g(x), by termwise addition of f(x) and g(x) as follows:

(f + g)(x) = (b0 + a0) + (b1 + a1)x1 + (b2 + a2)x2 + · · · (bn + an)xn.

Changing the order of summation, we see that

(f + g)(x) =n∑

i=0

(ai + bi)xi =n∑

i=0

aixi +

n∑

i=0

bixi = f(x) + g(x).

We can thus define the polynomial (f + g)(x) being the sum of f(x) andg(x) by the formula

(f + g)(x) = f(x) + g(x).

We will below extend this definition to general functions.

Example 10.13. If f(x) = 1 + x2 − x4 + 2x5 and g(x) = 33x+ 7x2 + 2x5,then

(f + g)(x) = 1 + 33x+ 8x2 − x4 + 4x5,

where of course we “fill in” the “missing” monomials, i.e. those with coef-ficients equal to zero in order to use the definition.

In general, to add the polynomials

f(x) =n∑

i=0

aixi

of degree n (assuming that an = 0) and the polynomial

g(x) =m∑

i=0

bixi

of degree m, where we assume that m ≤ n, we just fill in the “missing”coefficients in g by setting bm+1 = bm+2 = · · · bn = 0, and add using thedefinition.

Example 10.14.15∑

i=0

(i+ 1)xi +30∑

i=0

xi =30∑

i=0

aixi

with

ai =

i+ 2, 0 ≤ i ≤ 15i, 16 ≤ i ≤ 30


Multiplication of a Polynomial by a Number

Given a polynomial

f(x) =n∑

i=0

aixi,

and a number c ∈ Q we define a new polynomial denoted by (cf)(x), andreferred to as the product of f(x) by the number c, as follows:

(cf)(x) =n∑

i=0

caixi.

We note that we can equivalently define (cf)(x) by

(cf)(x) = cf(x) = c× f(x).

Example 10.15.

2.3(1 + 6x− x7) = 2.3 + 13.8x− 2.3x7.

Equality of Polynomials

Following these definitions, we say that two polynomials f(x) and g(x) areequal if (f − g)(x) is the zero polynomial with all coefficients equal to zero,that is the coefficients of f(x) and g(x) are the same. Two polynomials arenot necessarily equal because they happen to have the same value at justone point!

Example 10.16. f(x) = x2 − 4 and g(x) = 3x− 6 are both zero at x = 2but are not equal.

Linear Combinations of Polynomials

We may now combine polynomials by adding them and multiplying themby rational numbers, and thereby obtain new polynomials. Thus, if f1(x),f2(x), · · · , fn(x) are n given polynomials and c1, · · · , cn are n given num-bers, then

f(x) =n∑

m=1

cmfm(x)

is a new polynomial called the linear combination of the polynomials f1,· · · , fn with coefficients c1, · · · , cn.

Example 10.17. The linear combination of 2x2 and 4x−5 with coefficients1 and 2 is

1(2x2

)+ 2

(4x− 5

)= 2x2 + 8x− 10.


A general polynomial

f(x) =n∑

i=0

aixi

can be described as a linear combination of the particular polynomials 1, x,x2, · · · , xn, which are called the monomials, see Fig. 10.11 below, with thecoefficients a0, a1, . . . , an. To make the notation consistent, we set x0 = 1for all x.

We sum up:

Theorem 10.1 A linear combination of polynomials is a polynomial. A gen-eral polynomial is a linear combination of monomials.

As a consequence of the definitions made, we get a number of rules forlinear combinations of polynomials that reflect the corresponding rules forrational numbers. For example, if f , g and h are polynomials and c isrational number, then

f + g = g + f, (10.4)(f + g) + h = f + (g + h), (10.5)

c(f + g) = cf + cg, (10.6)

where the variable x was omitted for simplicity.

Multiplication of Polynomials

We now go into multiplication of polynomials. Given two polynomials f(x)=

∑ni=0 aix

i and g(x) =∑m

j=0 bjxj , we define a new polynomial denoted

by (fg)(x), and referred to as the product of f(x) and g(x), as follows

(fg)(x) = f(x)g(x).

To see that this is indeed a polynomial we consider first the product of twomonomials f(x) = xj and g(x) = xi:

(fg)(x) = f(x)g(x) = xjxi = xj × xi = xj+i.

We see that the degree of the product is the sum of the degrees of themonomials.

Next, if f(x) = xj and a polynomial g(x) =∑n

i=0 aixi, then by dis-

tributing xj , we get

(fg)(x) = xjg(x) = a0xj + a1x

j × x+ a2xj × x2 + · · · + anx

j × xn

= a0xj + a1x

1+j + a2x2+j + · · · + anx

n+j

=n∑

i=0

aixi+j ,

which is a polynomial of degree n+ j.


Example 10.18.

x3(2 − 3x+ x4 + 19x8) = 2x3 − 3x4 + x7 + 19x11.

Finally, for two general polynomials f(x) =∑n

i=0 aixi and g(x) =∑m

j=0 bjxj , we have

(fg)(x) = f(x)g(x) =

(n∑

i=0

aixi

)(m∑

i=0

bixi

)

=n∑

i=0

aixi

m∑

j=0

bjxj

=n∑

i=0

ai

m∑

j=0

bjxi+j

=n∑

i=0

m∑

j=0

aibjxi+j .

which is a polynomial of degree n+m. We consider an example

Example 10.19.

(1 + 2x+ 3x2)(x− x5) = 1(x− x5) + 2x(x− x5) + 3x2(x− x5)

= x− x5 + 2x2 − 2x6 + 3x3 − 3x7

= x+ 2x2 + 3x3 − x5 − 2x6 − 3x7

We sum up:

Theorem 10.2 The product of a polynomial of degree n and a polynomialof degree m is a polynomial of degree n+m.

The usual commutative, associative, and distributive laws hold for mul-tiplication of polynomials f , g, and h:

fg = gf, (10.7)(fg)h = f(gh), (10.8)

(f + g)h = fh+ gh, (10.9)

where we again left out the variable x.Products are tedious to compute but luckily it is not necessary very often

and if the polynomials are complicated, we can use MAPLE to computethem for example. There are a couple of examples that are good to keep inmind:

(x+ a)2 = (x+ a)(x+ a) = x2 + 2ax+ a2

(x + a)(x− a) = x2 − a2

(x+ a)3 = x3 + 3ax2 + 3a2x+ a3

10.7 Graphs of General Polynomials 135

10.7 Graphs of General Polynomials

A general polynomial of degree greater than 2 or 3 can be a quite compli-cated function and it is difficult to say much specific about their plots. Weshow an example in Fig. 10.10. When the degree of a polynomial is large,the tendency is for the plot to have large “wiggles” which makes it difficultto plot the function. The value of the polynomial shown in Fig. 10.10 is987940.8 at x = 3.

-1.05 -0.30 0.45 1.20

-8

-4

4

8

12

Fig. 10.10. A plot of y = 1.296 + 1.296x − 35.496x2 − 57.384x3 + 177.457x4

+203.889x5 − 368.554x6 − 211.266x7 + 313.197x8 + 70.965x9 − 97.9x10 − 7.5x11

+10x12

On the other hand, we can plot the monomials rather easily. It turnsout that once the degree n ≥ 2, the plots of the monomials with evendegree n all have a similar shape, as do the plots of all the monomialswith odd degree. We show some samples in Fig. 10.11. One of the most

-1.5 -0.5 0.5 1.5

-2

2

4

6

8

10

12

-1.5 -0.5 0.5 1.5

-18

-12

-6

6

12

18x6

x4

x2

x7

x5

x3

Fig. 10.11. Plots of some monomials


obvious feature of the graphs of the monomials are the symmetry in theplots. When the degree is even, the plots are symmetric across the y-axis,see Fig. 10.12. This means that the value of the monomial is the same forx and −x, or in other words xm = (−x)m for m even. When the degree isodd, the plots are symmetric through the origin. In other words, the valueof the function for x is the negative of the value of the function for −x or(−x)m = −xm for m odd.

x

y xeven

-x

x

y

-x

xodd

Fig. 10.12. The symmetries of the monomial functions of even and odd degree

We can use the ideas of scaling and translation to graph functions of theform y = a(x− x0)m + d.

Example 10.20. We plot y = −.5(x−1)3−6 in Fig. 10.13 by systematicallyusing translations and scaling. Luckily, however, there is no procedure likecompleting the square for monomials of higher degree.

-1

1 2

-27

-13.5

13.5

27

x3

(x-1)3

-.5(x-1)3

-.5(x-1)3-6

Fig. 10.13. The procedure for plotting y = −.5(x− 1)3 − 6

10.8 Piecewise Polynomial Functions 137

10.8 Piecewise Polynomial Functions

We started this chapter by declaring that polynomials are building blocksfor the mathematics of functions. An important class of functions con-structed using polynomials are the piecewise polynomials. These are func-tions that are equal to polynomials on intervals contained in the domain.

We have already met one example, namely

|x| =

x, x ≥ 0−x, x < 0

The function |x| looks like y = x for x ≥ 0 and y = −x for x < 0. We plotit in Fig. 10.14. The most interesting thing to note about the graph of |x|is the sharp corner at x = 0, which occurs right at the transition point ofthis piecewise polynomial.

Fig. 10.14. Plot of y = |x|

Fig. 10.15. Plot of a piecewise (quadratic) polynomial function


Chapter 10 Problems

10.1. Find the point-slope equations of the lines passing through the followingpairs of points. Plot the line in each case.

(a) (1, 3) & (2, 7) (b) (−4, 2) & (−6, 3)

(c) (3, 7) & (5, 7) (d) (3.5, 1.5) & (2.1, 11.8)

(e) (−3, 2) & (−3, 3) (f) (2,−1) & (4,−7).

10.2. Find the slope-intercept equations of the lines passing through the follow-ing pairs of points. Plot the line in each case.

(a) (4,−6) & (14, 2) (b) (3,−2) & (−1, 4)

(c) (13, 4) & (13, 89) (d) (4, 4) & (6, 4)

(e) (−.2, 9) & (−.4, 7) (f) (−1,−1) & (−4, 7).

10.3. Find a formula for the x-intercept of a line given in the form y = mx+ bin terms of m and b.

10.4. Plot the lines y = 12x, y = 1

2x−2, y = 1

2x+4, y = 1

2x+1 using translation.

10.5. Are the lines 2 − y = 7(4 − x) and y = 7x− 13 parallel?

10.6. Are the lines y = 311x− 4 and y = 13 − 11

3x perpendicular?

10.7. Find the point of intersection of the following pairs of lines:

(a) y = 3x+ 2 and y = −4x− 2,

(b) y − 5 = 7(x− 1) and y + 3 = −4(x− 9).

10.8. Find the lines that are (a) parallel and (b) perpendicular to the linethrough (9, 4) and (−1, 3) and passing through the point (3, 0).

10.9. Find the lines that are (a) parallel and (b) perpendicular to the linethrough (−2, 7) and (8, 8) and passing through the point (1, 2).

10.10. Show that f(x) = x2 is decreasing for x < 0.

10.11. Plot the following quadratic functions for −2 ≤ x ≤ 2: (a) 6x2, (b) − 14x2,

(c) 43x2.

10.12. Plot the following quadratic functions for −3 ≤ x ≤ 3: (a) (x− 2)2, (b)(x+ 1.5)2, (c) (x+ .5)2.

10.13. Plot the following quadratic functions for −2 ≤ x ≤ 2: (a) x2 − 3, (b)x2 + 2, (c) x2 − .5.


10.14. Plot the following quadratic functions for −3 ≤ x ≤ 3: (a) − 12(x−1)2+2,

(b) 2(x+ 2)2 − 5, (c) 13(x− 3)2 − 1.

10.15. Complete the square on the following quadratic functions then plot themfor −3 ≤ x ≤ 3: (a) x2 + 4x+ 5, (b) 2x2 − 2x− 1

2, (c) − 1

3x2 + 2x− 1.

10.16. Write the following finite sums using the summation notation. Be sureto get the starting and ending values for the index correct!

(a) 1 + 14

+ 19

+ 116

+ · · · + 1n2 (b) −1 + 1

4− 1

9+ 1

16− · · · ± 1

n2

(c) 1 + 12×3

+ 13×4

+ · · · + 1n×(n+1)

(d) 1 + 3 + 5 + 7 + · · · + 2n+ 1

(e) x4 + x5 + · · · + xn (f) 1 + x2 + x4 + x6 + · · · + x2n.

10.17. Write the finite sum

n∑

i=1

i2 so that: (a) i starts with −1, (b) i starts with

15, (c) the coefficent has the form (i+ 4)2, (d) i ends with n+ 7.

10.18. Given f1(x) = −4+6x+7x3, f2(x) = 2x2 −x3 +4x5 and f3(x) = 2−x4,compute the following polynomials: (a) f1 − 4f2, (b) 3f2 − 12f1, (c) f2 + f1 + f3,(d) f2f1, (e) f1f3, (f) f2f3, (g) f1f3 − f2, (h) (f1 + f2)f3, (i) f1f2f3.

10.19. For a equal to a constant, compute (a) (x+a)2, (b) (x+a)3, (c) (x−a)3,(d) (x+ a)4.

10.20. Compute f1f2 where f1(x) =

8∑

i=0

i2xi and f2(x) =

11∑

j=0

1

j + 1xj .

10.21. Plot the function

f(x) = 360x − 942x2 + 949x3 − 480x4 + 130x5 − 18x6 + x7

using Matlab or Maple. This takes some trial and error in choosing a good intervalon which to plot. You should make plots on several different intervals, startingwith −.5 ≤ x ≤ .5 then increasing the size.

10.22. (a) Show that the monomial x3 is increasing for all x. (b) Show themonomial x4 is decreasing for x < 0 and increasing for x > 0.

10.23. Plot the following monomial functions for −3 ≤ x ≤ 3: (a) x3, (b) x4,(c) x5.

10.24. Plot the following polynomials for −3 ≤ x ≤ 3:

(a) 13(x+ 2)3 − 2 (b) 2(x− 1)4 − 13 (c) (x+ 1)5 − 1.

10.25. Plot the following piecewise polynomials for −2 ≤ x ≤ 2

(a) f(x) =

1, −2 ≤ x ≤ −1,

x2, −1 < x < 1,

x, 1 ≤ x ≤ 2.

(b) f(x) =

−1 − x, −2 ≤ x ≤ −1,

1 + x, −1 < x ≤ 0,

1 − x, 0 < x ≤ 1,

−1 + x, 1 < x ≤ 2.

11Combinations of functions

And he gave a deep sigh, and tried very hard to listen to what Owlwas saying. (Winnie-the Pooh)

11.1 Introduction

In this chapter we consider different ways of creating new functions by com-bining old ones. We often seek to describe complicated functions as combi-nations of simpler functions that we know. In the last chapter, we saw howa general polynomials can be created adding up multiples of monomials,that is, as linear combinations of monomials. In this chapter, we considerfirst linear combinations of arbitrary functions, then multiplication anddivision, and finally composition of functions.

The idea of combining simple things to get complex ones is fundamentalin many different settings. Music is a good example: chords or harmoniesare formed by combining single tones, complex rhythmic patterns may beformed by overlaying simple basic rhythmic patterns, single instrumentsare combined to form an orchestra. Another example is a fancy dinner thatis made up of an entree, main dish, dessert, coffee, together with aperitif,wines and cognac, in endless combinations. Moreover, each dish is formedby combining ingredients like beef, carrots and potatoes.

142 11. Combinations of functions

11.2 Sum of Two Functions and Productof a Function with a Number

Given two functions f1 : Df1 → Q and f2 : Df2 → Q, we define a newfunction denoted by (f1 + f2)(x), and referred to as the sum of f1(x) andf2(x), as follows

(f1 + f2)(x) = f1(x) + f2(x), for x ∈ Df1 ∩Df2 .

Of course, we have to assume that x belongs to both Df1 and Df2 for bothf1(x) and f2(x) to be defined. We can thus write Df1+f2 = Df1 ∩Df2 .

Further, given a function f : Df → Q and a number c ∈ Q, we definea new function denoted by (cf)(x), and referred to as the product of f(x)with c, as follows

(cf)(x) = cf(x) for x ∈ Df .

The domain of cf is equal to the domain of f , that is, Dcf = Df .The definitions of sum of functions and product of a function with a num-

ber are consistent with the corresponding definitions for polynomials madeabove.

Example 11.1. The function f(x) = x3 + 1/x defined on Df = x ∈ Q :x = 0 is the sum of the functions f1(x) = x3 with domain Df1 = Q

and f2(x) = 1/x with domain Df2 = x in Q : x = 0. The functionf(x) = x2 + 2x defined on Z is the sum of x2 defined on Q and 2x definedon Z.

11.3 Linear Combinations of Functions

Given n functions f1 : Df1 → Q, . . . , fn : Dfn → Q, and numbers c1, . . . , cn,we define the linear combination of f1, . . . , fn with coefficients c1, . . . , cn,denoted by (c1f + · · · + cnfn)(x), as follows

(c1f + · · · + cnfn)(x) = c1f1(x) + · · · + cnfn(x)

The domain Dc1f+···+cnfn of the linear combination c1f + · · ·+ cnfn is theintersection of the domains Df1 , · · · , Dfn .

Example 11.2. The domain of the linear combination of

1x ,

x1+x ,

1+x2+x

given by

− 1x

+ 2x

1 + x+ 6

1 + x

2 + x

is x in Q : x = 0, x = −1, x = −2.

The sigma notation is useful for writing general linear combinations.

11.4 Multiplication and Division of Functions 143

Example 11.3. The linear combination of

1x , · · · ,

1xn

given by

2x

+4x

+8x

+ · · · + 2n

xn=

n∑

i=1

2i

xi

has domain x in Q : x = 0.

11.4 Multiplication and Division of Functions

We multiply functions using the same idea used to multiply polynomials.Given two functions f1 : Df1 → Q and f2 : Df2 → Q we define the productfunction (f1f2)(x) by

(f1f2)(x) = f1(x)f2(x) for x ∈ Df1 ∩Df2 ,

and quotient function by

(f1/f2)(x) =f1f2

(x) =f1(x)f2(x)

for x ∈ Df1 ∩Df2 ,

where we of course also assume that f2(x) = 0.


f(x) = (x2 − 3)3(

x6 − 1x− 3

)

with Df = x ∈ Q : x = 0 is the product of the functions f1(x) = (x2−3)3

and f2(x) = x6 − 1/x− 3. The function f(x) = x2 2x is the product of x2

and 2x.

Example 11.5. The domain of

1 + 1/(x+ 3)2x− 5

is the intersection of x in Q : x = −3 and x in Q excepting x = 5/2 orx in Q : x = −3, 5/2.

11.5 Rational Functions

The quotient f1/f2 of two polynomials f1(x) and f2(x) is called a rationalfunction. This is the analog of a rational number which is the quotient oftwo integers.


Example 11.6. The function f(x) = 1/x is a rational function defined forx in Q : x = 0. The function

f(x) =(x3 − 6x+ 1)(x11 − 5x6)(x4 − 1)(x+ 2)(x− 5)

is a rational function defined on x in Q : x = 1,−1,−2, 5.

In an example above, we saw that x− 3 divides into x2 − 2x− 3 exactlybecause x2 − 2x− 3 = (x− 3)(x+ 1) so

x2 − 2x− 3x− 3

= x+ 1.

In the same way, a rational number p/q sometimes simplifies to an inte-ger, in other words q divides into p exactly without a remainder. We candetermine if this is true by using long division. It turns out that long divi-sion also works for polynomials. Recall that in long division, we match theleading digit of the denominator with the remainder at each stage. Whendividing polynomials, we write them as a linear combination of monomialsstarting with the monomial of highest degree and then match coefficientsof the monomials one by one.

Example 11.7. We show a couple of examples of polynomial division. InFig. 11.1, we give an example where the remainder is zero. We concludethat

x3 + 4x2 − 2x+ 3x− 1

= x2 + 5x+ 3.

x3 + 4x2 - 2x - 3

x2 + 5x+ 3

x-1x3 - x2

5x2 - 2x

5x2 - 5x

3x2 - 3x3x2 - 3x

0

Fig. 11.1. An example of polynomial division with no remainder

In Fig. 11.2, we give an example in which there is a non-zero remainder,i.e. we carry out the division to the point where the remaining numeratorhas lower degree than the denominator. Note that in this example, the

11.6 The Composition of Functions 145

2x4 + 0x3 + 7x2 - 8x + 3

2x2 - 2x +15

x2+x-32x4 + 2x3 - 6x2

- 2x3+13x2 - 8x

15x2 - 14x + 3

-29x +48

- 2x3- 2x2 + 6x

15x2 + 15x -45

Fig. 11.2. An example of polynomial division with a remainder

numerator is “missing” a term so we fill in the missing term with a zerocoefficient to make the division easier. We conclude that

2x4 + 7x2 − 8x+ 3x2 + x− 3

= 2x2 − 2x+ 15 +−29x+ 48x2 + x− 3

.

We shall now consider polynomial division in the special case of a de-nominator of the form x − x of degree one, where x is considered fixed,resulting in

f(x) = (x− x) g(x) + r(x), (11.1)

where the reminder polynomial r(x) now must be of degree zero, that isa constant.

The following result is of particular interest. If f(x) is a polynomial ofdegree n with f(x) = 0, then x − x is a factor of f(x), that is, division off(x) with x− x gives

f(x) = (x− x) g(x) + r(x) (11.2)

with r(x) ≡ 0. Conversely, if r(x) ≡ 0, then obviously f(x) = 0. For theproof of this we note that the degree of r(x) is less than the degree of x− x,that is r(x) is in fact a constant. Further r(x) = 0 because f(x) = 0. Thatis r(x) is a constant which is zero, that is r(x) ≡ 0. We have thus proved

Theorem 11.1 If x is a root of a polynomial f(x), that is if f(x) = 0,then f(x) factors as f(x) = (x− x)g(x) for some polynomial g(x) of degreeone less than the degree of f(x). The factor g(x) can be found by polynomialdivision of f(x) by x− x.

11.6 The Composition of Functions

Given two functions f1 and f2, we can define a new function f by firstapplying f1 to an input and then applying f2 to the result, i.e.

f(x) = f2(f1(x))


We say that f is the composition of f2 and f1 and we write f = f2 f1,that is

(f2 f1)(x) = f2(f1(x)).

We illustrate this operation in Fig. 11.3.

x

f1 f2

Df1Df2

f1(x) f2(f1(x))

Fig. 11.3. Illustration of the composition f2 f1

Example 11.8. If f1(x) = x2 and f2(x) = x+1 then f1f2(x) = f1(f2(x)) =(x+ 1)2 while f2 f1 = f2(f1(x)) = x2 + 1.

This example illustrates the general fact that f2f1 = f1f2 in most cases.Determining the domain of the composition of f2 f1 can be compli-

cated. Certainly to compute f2(f1(x)), we have to make certain that x isin the domain of f1 otherwise f1(x) will be undefined. Next we apply f2 tothe result, therefore f1(x) must have a value that is in the domain of f2.Therefore the domain of f2 f1 is the set of points x in Df1 such that f1(x)is in Df2 .

Example 11.9. Let f1(x) = 3 + 1/x2 and f2(x) = 1/(x − 4). Then Df1 =x in Q : x = 0 while Df2 = x in Q : x = 4. Therefore to computef2 f1, we must avoid any points where 3 + 1/x2 = 4 or 1/x2 = 1 or x = 1and x = −1. We conclude that Df2f1 = x in Q : x = 0, 1,−1.

Chapter 11 Problems

11.1. Determine the domains of the following functions

(a) 3(x− 4)3 + 2x2 +4x

3x− 1+

6

(x− 1)2(d)

(2x− 3) 2x

4x+ 6

(b) 2 +4

x− 6x+ 4

(x− 2)(2x+ 1)(e)

6x− 1

(2 − 3x)(4 + x)

(c) x3

(

1 +1

x

)

(f)4

x+ 2+

6

x2 + 3x+ 2


11.2. Write the following linear combinations using the sigma notation anddetermine the domain of the result.

(a) 2x(x− 1) + 3x2(x− 1)2 + 4x3(x− 1)3 + · · · + 100x101(x− 1)101

(b)2

x− 1+

4

x− 2+

8

x− 3+ · · · + 8192

x− 13

11.3. (a) Let f(x) = ax + b, where a and b are numbers and show that f(x +y) = f(x) + f(y) for all numbers x and y. (b) Let g(x) = x2 and show thatg(x+ y) = g(x) + g(y) unless x and y have special values.

11.4. Use polynomial division on the following rational functions to show thatthe denominator divides the numerate exactly or to compute the remainder ifnot.

(a)x2 + 2x− 3

x− 1(b)

2x2 − 7x− 4

2x+ 1

(c)4x2 + 2x− 1

x+ 6(d)

x3 + 3x2 + 3x+ 2

x+ 2

(e)5x3 + 6x2 − 4

2x2 + 4x+ 1(f)

x4 − 4x2 − 5x− 4

x2 + x+ 1

(g)x8 − 1

x3 − 1(h)

xn − 1

x− 1, n in N

11.5. Given f1(x) = 3x−5, f2(x) = 2x2+1, and f3(x) = 4/x, write out formulasfor the following functions

(a) f1 f2 (b) f2 f3 (c) f3 f1 (d) f1 f2 f3

11.6. With f1(x) = 4x+ 2 and f2(x) = x/x2, show that f1 f2 = f2 f1.

11.7. Let f1(x) = ax + b and f2(x) = cx + d where a, b, c, and d are rationalnumbers. Find a condition on the numbers a, b, c, and d that implies that f1f2 =f2 f1 and produce an example that satisfies the condition.

11.8. For the given functions f1 and f2, determine the domain of f2 f1

(a) f1(x) = 4 − 1

xand f2(x) =

1

x2

(b) f1(x) =1

(x− 1)2− 4 and f2(x) =

x+ 1

x

12Lipschitz Continuity

Calculus required continuity, and continuity was supposed to requirethe infinitely little, but nobody could discover what the infinitelylittle might be. (Russell)

12.1 Introduction

When we graph a function f(x) of a rational variable x, we make a leap offaith and assume that the function values f(x) vary “smoothly” or “contin-uously” between the sample points x, so that we can draw the graph of thefunction without lifting the pen. In particular, we assume that the functionvalue f(x) does not make unknown sudden jumps for some values of x. Wethus assume that the function value f(x) changes by a small amount if wechange x by a small amount. A basic problem in Calculus is to measurehow much the function values f(x) may change when x changes, that is,to measure the “degree of continuity” of a function. In this chapter, we ap-proach this basic problem using the concept of Lipschitz continuity, whichplays a basic role in the version of Calculus presented in this book.

There will be a lot of inequalities (< and ≤) and absolute values (| · |)in this chapter, so it might be a good idea before you start to review therules for operating with these symbols from Chapter Rational numbers.

150 12. Lipschitz Continuity

Fig. 12.1. Rudolph Lipschitz (1832–1903), Inventor of Lipschitz continuity: “In-deed, I have found a very nice way of expressing continuity. . .”

12.2 The Lipschitz Continuity of a Linear Function

To start with we consider the behavior of a linear polynomial. The valueof a constant polynomial doesn’t change when we change the input, so thelinear polynomial is the first interesting example to consider. Suppose thelinear function is f(x) = mx + b, with m ∈ Q and b ∈ Q given, and letf(x1) = mx1 + b and f(x2) = mx2 + b to be the function values valuesfor x = x1 and x = x2. The change in the input is |x2 − x1| and for thecorresponding change in the output |f(x1) − f(x2)|, we have

|f(x2) − f(x1)| = |(mx2 + b) − (mx1 + b)| = |m(x2 − x1)| = |m||x2 − x1|.(12.1)

In other words, the absolute value of the change in the function values|f(x2) − f(x1)| is proportional to the absolute value of the change in theinput values |x2 − x1| with constant of proportionality equal to the slope|m|. In particular, this means that we can make the change in the outputarbitrarily small by making the change in the input small, which certainlyfits our intuition that a linear function varies continuously.

Example 12.1. Let f(x) = 2x give the total number of miles for an “outand back” bicycle ride that is x miles one way. To increase a given ride bya total of 4 miles, we increase the one way distance x by 4/2 = 2 mileswhile to increase a ride by a total of .01 miles, we increase the one waydistance x by .005 miles.

We now make an important observation: the slope m of the linear func-tion f(x) = mx + b determines how much the function values change as

12.3 The Definition of Lipschitz Continuity 151

the input value x changes. The larger |m| is, the steeper the line is, andthe more the function changes for a given change in input. We illustrate inFig. 12.2.

y = 3x

y = 1/2x

Fig. 12.2. These two linear functions which change a different amount for a givenchange in input

Example 12.2. Suppose that f1(x) = 4x + 1 while f2(x) = 100x − 5. Toincrease the value of f1(x) at x by an amount of .01, we change the valueof x by .01/4 = .0025. On the other hand, to change the value of f2(x) atx by an amount of .01, we change the value of x by .01/100 = .0001.

12.3 The Definition of Lipschitz Continuity

We are now prepared to introduce the concept of Lipschitz continuity,designed to measure change of function values versus change in the in-dependent variable for a general function f : I → Q where I is a setof rational numbers. Typically, I may be an interval of rational numbersx ∈ Q : a ≤ x ≤ b for some rational numbers a and b. If x1 and x2 are twonumbers in I, then |x2−x1| is the change in the input and |f(x2)−f(x1)| isthe corresponding change in the output. We say that f is Lipschitz contin-uous with Lipschitz constant Lf on I, if there is a (necessarily nonnegative)constant Lf such that

|f(x1) − f(x2)| ≤ Lf |x1 − x2| for all x1, x2 ∈ I. (12.2)

As indicated by the notation, the Lipschitz constant Lf depends on thefunction f , and thus may vary from being small for one function to belarge for another function. If Lf is small, then f(x) may change only a little


with a small change of x, while if Lf is large, then f(x) may change a lotunder only a small change of x. Again: Lf may vary from small to largedepending on the function f .

Example 12.3. A linear function f(x) = mx + b is Lipschitz continuouswith Lipschitz constant Lf = |m| on the entire set of rational numbers Q.

Example 12.4. We show that f(x) = x2 is Lipschitz continuous on theinterval I = [−2, 2] with Lipschitz constant Lf = 4. We choose two rationalnumbers x1 and x2 in [−2, 2]. The corresponding change in the functionvalues is

|f(x2) − f(x1)| = |x22 − x2

1|.The goal is to estimate this in terms of the difference in the input values|x2 −x1|. Using the identity for products of polynomials derived in Section10.6, we get

|f(x2) − f(x1)| = |x2 + x1| |x2 − x1|. (12.3)

We have the desired difference on the right, but it is multiplied by a factorthat depends on x1 and x2. In contrast, the analogous relationship (12.1)for the linear function has a factor that is constant, namely |m|. At thispoint, we have to use the fact that x1 and x2 are in the interval [−2, 2],which means that

|x2 + x1| ≤ |x2| + |x1| ≤ 2 + 2 = 4,

by the triangle inequality. We conclude that

|f(x2) − f(x1)| ≤ 4|x2 − x1|

for all x1 and x2 in [−2, 2].

Lipschitz continuity quantifies the idea of continuous behavior of a func-tion f(x) using the Lipschitz constant Lf . We repeat: If Lf is moderatelysized then small changes in input x yield small changes in the function’soutput f(x), but a large Lipschitz constant means that the function’s val-ues f(x) may make a large change when the input values x change by onlya small amount.

However it is important to note that there is a certain amount of impre-cision inherit to the definition of Lipschitz continuity (12.2) and we have tobe circumspect about drawing conclusions when the Lipschitz constant islarge. The reason is that (12.2) is only an upper estimate on how muchthe function changes and the actual change might be much smaller thanindicated by the constant.

Example 12.5. From Example 12.4, we know that f(x) = x2 is Lipschitzcontinuous on I = [−2, 2] with Lipschitz constant Lf = 4. It is also Lips-chitz constant on I with Lipschitz constant Lf = 121 since

|f(x2) − f(x1)| ≤ 4|x2 − x1| ≤ 121|x2 − x1|.

12.3 The Definition of Lipschitz Continuity 153

But the second value of Lf greatly overestimates the change in f , whereasthe value Lf = 4 is just about right when x1 and x2 are near 2 since22 − 1.92 = .39 = 3.9 × (2 − 1.9) and 3.9 ≈ 4.

To determine the Lipschitz constant, we have to make some estimates andthe result can vary greatly depending on how difficult the estimates are tocompute and our skill at making estimates.

It is also important to note that the size and location of the interval inthe definition is important and if we change the interval then we expect toget a different Lipschitz constant Lf .

Example 12.6. We show that f(x) = x2 is Lipschitz continuous on theinterval I = [2, 4], with Lipschitz constant Lf = 8. Starting with (12.3), forx1 and x2 in [2, 4] we have

|x2 + x1| ≤ |x2| + |x1| ≤ 4 + 4 = 8

so|f(x2) − f(x1)| ≤ 8|x2 − x1|

for all x1 and x2 in [2, 4].

The reason that the Lipschitz constant is bigger in the second example isclear from the graph, see Fig. 12.3, where we show the change in f corre-sponding to equal changes in x near x = 2 and x = 4. Because f(x) = x2 issteeper near x = 4, f changes more near x = 4 for a given change in input.

−4 −2 0 2 4

16

Fig. 12.3. The change in f(x) = x2 for equal changes in x near x = 2 and x = 4


Example 12.7. f(x) = x2 is Lipschitz continuous on I = [−8, 8] with Lips-chitz constant Lf = 16 and on I = [−400, 200] with Lf = 800.

In all of the examples involving f(x) = x2, we use the fact that theinterval under consideration is of finite size. A set of rational numbers Iis bounded with size a if |x| ≤ a for all x in I, for some (finite) rationalnumber a.

Example 12.8. The set of rational numbers I = [−1, 500] is bounded butthe set of even integers is not bounded.

While linear functions are Lipschitz continuous on the unbounded set Q,functions that are not linear are usually only Lipschitz continuous onbounded sets.

Example 12.9. The function f(x) = x2 is not Lipschitz continuous on theset Q of rational numbers. This follows from (12.3) because |x1 + x2| canbe made arbitrarily large by choosing x1 and x2 freely in Q, so it is notpossible to find a constant Lf such that

|f(x2) − f(x1)| = |x2 + x1||x2 − x1| ≤ Lf |x2 − x1|

for all x1 and x2 in Q.

The definition of Lipschitz continuity is due to the German mathemati-cian Rudolph Lipschitz (1832–1903), who used his concept of continuity toprove existence of solutions to some important differential equations. Thisis not the usual definition of continuity used in Calculus courses, whichis purely qualitative, while Lipschitz continuity is quantitative. Of coursethere is a strong connection, and a function which is Lipschitz continuousis also continuous according to the usual definition of continuity, while theopposite may not be true: Lipschitz continuity is a somewhat more de-manding property. However, quantifying continuous behavior in terms ofLipschitz continuity simplifies many aspects of mathematical analysis andthe use of Lipschitz continuity has become ubiquitous in engineering andapplied mathematics. It also has the benefit of eliminating some rathertechnical issues in defining continuity that are tricky yet unimportant inpractice.

12.4 Monomials

Continuing the investigation of continuous functions, we next show thatthe monomials are Lipschitz continuous on bounded intervals, as we expectbased on their graphs.

12.4 Monomials 155

Example 12.10. We show that the function f(x) = x4 is Lipschitz continu-ous on I = [−2, 2] with Lipschitz constant Lf = 32. We choose x1 and x2

in I and we want to estimate

|f(x2) − f(x1)| = |x42 − x4

1|

in terms of |x2 − x1|.To do this we first show that

x42 − x4

1 = (x2 − x1)(x32 + x2

2x1 + x2x21 + x3

1)

by multiplying out

(x2 − x1)(x32 + x2

2x1 + x2x21 + x3

1)

= x42 + x3

2x1 + x22x

21 + x2x

31 − x3

2x1 − x22x

21 − x2x

31 − x4

1

and then cancelling the terms in the middle to get x42 − x4

1.

This means that

|f(x2) − f(x1)| = |x32 + x2

2x1 + x2x21 + x3

1| |x2 − x1|.

We have the desired difference |x2 − x1| on the right and we just have tobound the factor |x3

2 + x22x1 + x2x

21 + x3

1|. By the triangle inequality

|x32 + x2

2x1 + x2x21 + x3

1| ≤ |x2|3 + |x2|2|x1| + |x2||x1|2 + |x1|3.

Now because x1 and x2 are in I, |x1| ≤ 2 and |x2| ≤ 2, so

|x32 + x2

2x1 + x2x21 + x3

1| ≤ 23 + 22 2 + 2 22 + 23 = 32

and|f(x2) − f(x1)| ≤ 32|x2 − x1|.

Recall that the Lipschitz constant of f(x) = x2 on I is Lf = 4. The factthat the Lipschitz constant of x4 is larger than the constant for x2 on [−2, 2]is not surprising considering the plots of the two functions, see Fig. 10.12.

We can use the same technique to show that the function f(x) = xm isLipschitz continuous where m is any natural number.

Example 12.11. The function f(x) = xm is Lipschitz continuous on anyinterval I = [−a, a], where a is a positive rational number, with Lipschitzconstant Lf = mam−1. Given x1 and x2 in I, we want to estimate

|f(x2) − f(x1)| = |xm2 − xm

1 |

in terms of |x2 − x1|. We can do this using the fact that

xm2 − xm

1 = (x2 − x1)(xm−12 + xm−2

2 x1 + · · · + x2xm−21 + xm−1

1 )

= (x2 − x1)m−1∑

i=0

xm−1−i2 xi

1.


We show this by first multiplying out

(x2 − x1)m−1∑

i=0

xm−1−i2 xi

1 =m−1∑

i=0

xm−i2 xi

1 −m−1∑

i=0

xm−1−i2 xi+1

1

To see that there is a lot of cancellation among the terms in the middle inthe two sums on the right, we separate the first term out of the first sumand the last term in the second sum

(x2 − x1)m−1∑

i=0

xm−1−i2 xi

1 = xm2 +

m−1∑

i=1

xm−i2 xi

1 −m−2∑

i=0

xm−1−i2 xi+1

1 − xm1

and then changing the index in the second sum to get

(x2 − x1)m−1∑

i=0

xm−1−i2 xi

1

= xm2 +

m−1∑

i=1

xm−i2 xi

1 −m−1∑

i=1

xm−i2 xi

1 − xm1 = xm

2 − xm1 .

This is tedious, but it is good practice to go through the details and makesure this argument is correct.

This means that

|f(x2) − f(x1)| =

∣∣∣∣∣

m−1∑

i=0

xm−1−i2 xi

1

∣∣∣∣∣|x2 − x1|.

We have the desired difference |x2 − x1| on the right and we just have tobound the factor ∣

∣∣∣∣

m−1∑

i=0

xm−1−i2 xi

1

∣∣∣∣∣.

By the triangle inequality∣∣∣∣∣

m−1∑

i=0

xm−1−i2 xi

1

∣∣∣∣∣≤

m−1∑

i=0

|x2|m−1−i|x1|i.

Now because x1 and x2 are in [−a, a], |x1| ≤ a and |x2| ≤ a. So∣∣∣∣∣

m−1∑

i=0

xm−1−i2 xi

1

∣∣∣∣∣≤

m−1∑

i=0

am−1−iai =m−1∑

i=0

am−1 = mam−1.

and|f(x2) − f(x1)| ≤ mam−1|x2 − x1|.

12.5 Linear Combinations of Functions 157

12.5 Linear Combinations of Functions

Now that we have seen that the monomials are Lipschitz continuous ona given bounded interval, it is a short step to show that any polynomialis Lipschitz continuous on a given bounded interval. But rather than justdo this for polynomials, we show that a linear combination of arbitraryLipschitz continuous functions is Lipschitz continuous

Suppose that f1 is Lipschitz continuous with constant L1 and f2 is Lip-schitz continuous with constant L2 on the interval I. Note that here (andbelow) we condense the notation and write e.g. L1 instead of Lf1 . Thenf1 + f2 is Lipschitz continuous with constant L1 + L2 on I, because if wechoose two points x and y in I, then

|(f1 + f2)(y) − (f1 + f2)(x)| = |(f1(y) − f1(x)) + (f2(y) − f2(x))|≤ |f1(y) − f1(x)| + |f2(y) − f2(x)|≤ L1|y − x| + L2|y − x|= (L1 + L2)|y − x|

by the triangle inequality. The same argument shows that f2−f1 is Lipschitzcontinuous with constant L1 +L2 as well (not L1−L2 of course!). It is eveneasier to show that if f(x) is Lipschitz continuous on an interval I withLipschitz constant L then cf(x) is Lipschitz continuous on I with Lipschitzconstant |c|L.

From these two facts, it is a short step to extend the result to any linearcombination of Lipschitz continuous functions. Suppose that f1, · · · , fn areLipschitz continuous on I with Lipschitz constants L1, · · · , Ln respectively.We use induction, so we begin by considering the linear combination of twofunctions. From the remarks above, it follows that c1f1 + c2f2 is Lipschitzcontinuous with constant |c1|L1 + |c2|L2. Next given i ≤ n, we assume thatc1f1 + · · · + ci−1fi−1 is Lipschitz continuous with constant |c1|L1 + · · · +|ci−1|Li−1. To prove the result for i, we write

c1f1 + · · · + cifi =(c1f1 + · · · + ci−1fi−1

)+ cnfn.

But the assumption on(c1f1 + · · ·+ ci−1fi−1

)means that we have written

c1f1 + · · · + cifi as the sum of two Lipschitz continuous functions, namely(c1f1 + · · · + ci−1fi−1

)and cnfn. The result follows by the result for the

linear combination of two functions. By induction, we have proved

Theorem 12.1 Suppose that f1, · · · , fn are Lipschitz continuous on I withLipschitz constants L1, · · · , Ln respectively. Then the linear combinationc1f1+· · ·+cnfn is Lipschitz continuous on I with Lipschitz constant |c1|L1+· · · + |cn|Ln.

Corollary 12.2 A polynomial is Lipschitz continuous on any boundedinterval.


Example 12.12. We show that the function f(x) = x4 − 3x2 is Lipschitzcontinuous on [−2, 2], with constant Lf = 44. For x1 and x2 in [−2, 2], wehave to estimate

|f (x2) − f (x1)| =∣∣(x4

2 − 3x22

)−(x4

1 − 3x21

)∣∣

=∣∣(x4

2 − x41

)−(3x2

2 − 3x21

)∣∣

≤∣∣x4

2 − x41

∣∣ + 3

∣∣x2

2 − x21

∣∣ .

From Example 12.11, we know that x4 is Lipschitz continuous on [−2, 2]with constant 32 while x2 is Lipschitz continuous on [−2, 2] with Lipschitzconstant 4. Therefore

|f(x2) − f(x1)| ≤ 32|x2 − x1| + 3 × 4|x2 − x1| = 44|x2 − x1|.

12.6 Bounded Functions

Lipschitz continuity is related to another important property of a functioncalled boundedness. A function f is bounded on a set of rational numbersI if there is a constant M such that, see Fig. 12.4

|f(x)| ≤M for all x in I.

In fact if we think about the estimates we have made to verify the definitionof Lipschitz continuity (12.2), we see that in every case these involvedshowing that some function is bounded on the given interval.

x

y

y = f(x)

y = M

y = −M

Fig. 12.4. A bounded function on I

Example 12.13. To show that f(x) = x2 is Lipschitz continuous on [−2, 2]in Example 12.4, we proved that |x1 + x2| ≤ 4 for x1 and x2 in [−2, 2].

12.7 The Product of Functions 159

It turns out that a function that is Lipschitz continuous on a boundeddomain is automatically bounded on that domain. To be more precise,suppose that a function f is Lipschitz continuous with Lipschitz constantLf on a bounded set I with size a and choose a point y in I. Then for anyother point x in I

|f(x) − f(y)| ≤ Lf |x− y|.First we know that |x− y| ≤ |x| + |y| ≤ 2a. Also, since |b+ c| ≤ |d| meansthat |b| ≤ |d| + |c| for any numbers a, b, c, we get

|f(x)| ≤ |f(y)| + Lf |x− y| ≤ |f(y)| + 2Lfa.

Even though we don’t know |f(y)|, we do know that it is finite. This showsthat |f(x)| is bounded by the constant M = |f(y)| + 2Lfa for any x in Q.We express this by saying that f(x) is bounded on I. We have thus proved

Theorem 12.3 A Lipschitz continuous function on a bounded set I isbounded on I.

Example 12.14. In Example 12.12, we showed that f(x) = x4 + 3x2 isLipschitz continuous on [−2, 2] with Lipschitz constant Lf = 44. Usingthis argument, we find that

|f(x)| ≤ |f(0)| + 44|x− 0| ≤ 0 + 44 × 2 = 88

for any x in [−2, 2]. Since x4 is increasing for 0 ≤ x, in fact we know that|f(x)| ≤ |f(2)| = 16 for any x in [−2, 2]. So the estimate on the size of |f |using the Lipschitz constant is not very accurate.

12.7 The Product of Functions

The next step in investigating which functions are Lipschitz continuous isto consider the product of two Lipschitz continuous functions on a boundedinterval I. We show that the product is also Lipschitz continuous on I. Moreprecisely, if f1 is Lipschitz continuous with constant L1 and f2 is Lipschitzcontinuous with constant L2 on a bounded interval I then f1f2 is Lipschitzcontinuous on I. We choose two points x and y in I and estimate by usingthe old trick of adding and subtracting the same quantity

|f1(y)f2(y) − f1(x)f2(x)|= |f1(y)f2(y) − f1(y)f2(x) + f1(y)f2(x) − f1(x)f2(x)|≤ |f1(y)f2(y) − f1(y)f2(x)| + |f1(y)f2(x) − f1(x)f2(x)|= |f1(y)| |f2(y) − ff2(x)| + |f2(x)| |f1(y) − f1(x)|

Now Theorem 12.3, which says that Lipschitz continuous functions arebounded, implies there is some constant M such that |f1(y)| ≤ M and


|f2(x)| ≤ M for x, y ∈ I. Using the Lipschitz continuity of f1 and f2 in I,we find

|f1(y)f2(y) − f1(x)f2(x)| ≤ML1|y − x| +ML2|y − x|= M(L1 + L2)|y − x|.

We summarize

Theorem 12.4 If f1 and f2 are Lipschitz continuous on a bounded intervalI then f1f2 is Lipschitz continuous on I.

Example 12.15. The function f(x) = (x2 + 5)10 is Lipschitz continuouson the set I = [−10, 10] because x2 + 5 is Lipschitz continuous on I andtherefore (x2+5)10 = (x2+5)(x2+5) · · · (x2+5) is as well by Theorem 12.4.

12.8 The Quotient of Functions

Continuing our investigation, we now consider the ratio of two Lipschitzcontinuous functions. In this case however, we require more informationabout the function in the denominator than just that it is Lipschitz con-tinuous. We also have to know that it does not become too small. Tounderstand this, we first consider an example.

Example 12.16. We show that f(x) = 1/x2 is Lipschitz continuous on theinterval [1/2, 2], with Lipschitz constant L = 64. We choose two points x1

and x2 in Q and we estimate the change

|f(x2) − f(x1)| =∣∣∣∣

1x2

2

− 1x2

1

∣∣∣∣

by first doing some algebra

1x2

2

− 1x2

1

=x2

1

x21x

22

− x22

x21x

22

=x2

1 − x22

x21x

22

=(x1 + x2)(x1 − x2)

x21x

22

.

This means that

|f(x2) − f(x1)| =∣∣∣∣x1 + x2

x21x

22

∣∣∣∣ |x2 − x1|.

Now we have the good difference on the right, we just have to bound thefactor. The numerator of the factor is the same as in Example 12.4, andwe know that

|x1 + x2| ≤ 4.

We also know that

x1 ≥ 12

implies1x1

≤ 2 implies1x2

1

≤ 4

12.9 The Composition of Functions 161

and likewise 1x22≤ 4. So we get

|f(x2) − f(x1)| ≤ 4 × 4 × 4 |x2 − x1| = 64|x2 − x1|.

In this example, we have to use the fact that the left-hand endpoint of theinterval I is 1/2. The closer the left-hand endpoint is to zero, the largerthe Lipschitz constant will be. In fact, 1/x2 is not Lipschitz continuous on[0, 2].

We mimic this example in the general case f1/f2 by assuming that thedenominator f2 is bounded below by a positive constant. We give the proofof the following theorem as an exercise.

Theorem 12.5 Assume that f1 and f2 are Lipschitz continuous functionson a bounded set I with constants L1 and L2 and moreover assume thereis a constant m > 0 such that |f2(x)| ≥ m for all x in I. Then f1/f2 isLipschitz continuous on I.

Example 12.17. The function 1/x2 does not satisfy the assumptions ofTheorem 12.5 on the interval [0, 2] and we know that it is not Lipschitzcontinuous on that interval.

12.9 The Composition of Functions

We conclude the investigation into Lipschitz continuity by considering thecomposition of Lipschitz continuous functions. This is actually easier thaneither products or ratios of functions. The only complication is that wehave to be careful about the domains and ranges of the functions. Considerthe composition f2(f1(x)). Presumably, we have to restrict x to an intervalon which f1 is Lipschitz continuous and we also have to make sure that thevalues of f1 are in a set on which f2 is Lipschitz continuous.

So we assume that f1 is Lipschitz continuous on I1 with constant L1 andthat f2 is Lipschitz continuous on I2 with constant L2. If x and y are pointsin I1 then as long as f1(x) and f1(y) are in I2 then

|f2(f1(y)) − f2(f1(x))| ≤ L2|f1(y) − f1(x)| ≤ L1L2|y − x|.

We summarize as a theorem.

Theorem 12.6 Let f1 be Lipschitz continuous on a set I1 with Lipschitzconstant L1 and f2 be Lipschitz continuous on I2 with Lipschitz constantL2 such that f1(I1) ⊂ I2. Then the composite function = f2f1 is Lipschitzcontinuous on I1 with Lipschitz constant L1L2.

Example 12.18. The function f(x) = (2x− 1)4 is Lipschitz continuous onany bounded interval since f1(x) = 2x − 1 and f2(x) = x4 are Lipschitz


continuous on any bounded interval. If we consider the interval [−.5, 1.5]then f1(I) ⊂ [−2, 2]. From Example 12.10, we know that x4 is Lipschitzcontinuous on [−2, 2] with Lipschitz constant 32 while the Lipschitz con-stant of 2x− 1 is 2. Therefore, f is Lipschitz continuous on [−.5, 1.5] withconstant 64.

Example 12.19. The function 1/(x2 − 4) is Lipschitz continuous on anyclosed interval that does not contain either 2 or −2. This follows becausef1(x) = x2 − 4 is Lipschitz continuous on any bounded interval whilef2(x) = 1/x is Lipschitz continuous on any closed interval that does notcontain 0. To avoid zero, we must avoid x2 = 4 or x = ±2.

12.10 Functions of Two Rational Variables

Until now, we have considered functions f(x) of one rational variable x.But of course, there are functions that depend on more than one input.Consider for example the function

f(x1, x2) = x1 + x2,

which to each pair of rational numbers x1 and x2 associates the sum x1+x2.We may write this as f : Q × Q → Q, meaning that to each x1 ∈ Q andx2 ∈ Q we associate a value f(x1, x2) ∈ Q. For example, f(x1, x2) = x1+x2.We say that f(x1, x2) is a function of two independent rational variables x1

and x2. Here, we think of Q×Q as the set of all pairs (x1, x2) with x1 ∈ Q

and x2 ∈ Q.We shall write Q

2 = Q×Q and consider f(x1, x2) = x1+x2 as a functionf : Q

2 → Q. We will also consider functions f : I × J → Q, where I andJ are subsets such as intervals, of Q. This just means that for each x1 ∈ Iand x2 ∈ J , we associate a value f(x1, x2) ∈ Q.

We may naturally extend the concept of Lipschitz continuity to functionsof two rational variables. We say that f : I×J → Q is Lipschitz continuouswith Lipschitz constant Lf if

|f(x1, y1) − f(x2, y2)| ≤ Lf(|x1 − x2| + |y1 − y2|)

for x1, x2 ∈ I and y1, y2 ∈ J .

Example 12.20. The function f : Q2 → Q defined by f(x1, x2) = x1 + x2

is Lipschitz continuous with Lipschitz constant Lf = 1.

Example 12.21. The function f : [0, 2] × [0, 2] → Q defined by f(x1, x2) =x1x2 is Lipschitz continuous with Lipschitz constant Lf = 2, since forx1, x2 ∈ [0, 1]

|x1x2 − y1y2| = |x1x2 − y1x2 + y1x2 − y1y2|≤ |x1 − y1|x2 + y1|x2 − y2| ≤ 2(|x1 − y1| + |x2 − y2|).

12.11 Functions of Several Rational Variables 163

12.11 Functions of Several Rational Variables

The concept of a function also extends to several variables, i.e. we con-sider functions f(x1, . . . , xd) of d rational variables. We write f : R

d → Q

if for given rational numbers x1, · · · , xd, a rational number denoted byf(x1, . . . , xd) is given.

The definition of Lipschitz continuity also directly extends. We say thatf : Q

d → Q is Lipschitz continuous with Lipschitz constant Lf if for allx1, · · · , xd ∈ Q and y1, · · · , yd ∈ Q,

|f(x1, . . . , xd) − f(y1, . . . , yd)| ≤ Lf (|x1 − y1| + · · · + |xd − yd|).

Example 12.22. The function f : Rd → Q defined by f(x1, . . . , xd) = x1 +

x2 + · · ·xd is Lipschitz continuous with Lipschitz constant Lf = 1.

Chapter 12 Problems

12.1. Verify the claims in Example 12.7.

12.2. Show that f(x) = x2 is Lipschitz continuous on [10, 13] directly andcompute a Lipschitz constant.

12.3. Show that f(x) = 4x− 2x2 is Lipschitz continuous on [−2, 2] directly andcompute a Lipschitz constant.

12.4. Show that f(x) = x3 is Lipschitz continuous on [−2, 2] directly and com-pute a Lipschitz constant.

12.5. Show that f(x) = |x| is Lipschitz continuous on Q directly and computea Lipschitz constant.

12.6. In Example 12.10, we show that x4 is Lipschitz continuous on [−2, 2]with Lipschitz constant L = 32. Explain why this is a reasonable value for theLipschitz constant.

12.7. Show that f(x) = 1/x2 is Lipschitz continuous on [1, 2] directly andcompute the Lipschitz constant.

12.8. Show that f(x) = 1/(x2 + 1) is Lipschitz continuous on [−2, 2] directlyand compute a Lipschitz constant.

12.9. Compute the Lipschitz constant of f(x) = 1/x on the intervals (a) [.1, 1],(b) [.01, 1], and [.001, 1].

12.10. Find the Lipschitz constant of the function f(x) =√x with D(f) =

(δ,∞) for given δ > 0.


12.11. Explain why f(x) = 1/x is not Lipschitz continuous on (0, 1].

12.12. (a) Explain why the function

f(x) =

1, x < 0

x2, x ≥ 0

is not Lipschitz continuous on [−1, 1]. (b) Is f Lipschitz continuous on [1, 4]?

12.13. Suppose the Lipschitz constant L of a function f is equal to L = 10100.Discuss the continuity properties of f(x) and in particular decide if f continuousfrom a practical point of view.

12.14. Assume that f1 is Lipschitz continuous with constant L1, f2 is Lipschitzcontinuous with constant L2 on a set I , and c is a number. Show that f1 − f2 isLipschitz continuous with constant L1 +L2 on I and cf1 is Lipschitz continuouswith constant cL1 on I .

12.15. Show that the Lipschitz constant of a polynomial f(x) =∑ni=0 aix

i onthe interval [−c, c] is

L =

n∑

i=1

|ai|ici−1 = |a1| + 2c|a2| + · · · + ncn−1|an|.

12.16. Explain why f(x) = 1/x is not bounded on [−1, 0].

12.17. Prove Theorem 12.5.

12.18. Use the theorems in this chapter to show that the following functionsare Lipschitz continuous on the given intervals and try to estimate a Lipschitzconstant or prove they are not Lipschitz continuous.

(a) f(x) = 2x4 − 16x2 + 5x on [−2, 2] (b)1

x2 − 1on

[

−1

2,1

2

]

(c)1

x2 − 2x− 3on [2, 3) (d)

(

1 +1

x

)4

on [1, 2]

12.19. Show the function

f(x) =1

c1x+ c2(1 − x)

where c1 > 0 and c2 > 0 is Lipschitz continuous on [0, 1].

13Sequences and limits

He sat down and thought, in the most thoughtful way he could think.(Winnie-the-Pooh)

13.1 A First Encounter with Sequences and Limits

The decimal expansions of rational numbers discussed in chapter RationalNumbers leads into the concepts sequence, converging sequence and limit ofa sequence, which play a fundamental role in mathematics. The develop-ment of calculus has largely been a struggle to come to grips with certainevasive aspects of these concepts. We will try to uncover the mysteries bybeing as concrete and down-to-earth as possible.

We begin recalling the decimal expansion 1.11 . . . of 109 , and that by (7.7)

109

= 1.11 · · ·11n +1910−n. (13.1)

Rewriting this equation and replacing for simplicity 1910−n by the up-

per bound 10−n, we get the following estimate for the difference between1.111 · · ·11n and 10/9,

∣∣∣∣109

− 1.11 · · ·11n

∣∣∣∣ ≤ 10−n. (13.2)

This estimate shows that we may consider 1.11 · · ·11n as an approximationof 10/9, which becomes increasingly accurate as the number of decimal

166 13. Sequences and limits

places n increases. In other words, the error |10/9 − 1.11 · · ·11n| can bemade as small as we please by taking n sufficiently large. If we want theerror to be smaller than or equal to 10−10, then we simply choose n ≥ 10.

We may view the successive approximations 1.1, 1.11, 1.111, 1.11 . . .11n,and so on, as a sequence of numbers an, with n = 1, 2, 3, . . ., where a1 = 1.1,a2 = 1.11, . . ., an = 1.11 . . .11n, . . ., are called the elements of the sequence.More generally, a sequence a1, a2, a3, . . ., is a never-ending list of elementsa1, a2, a3, . . ., where the index takes successively the values of the naturalnumbers 1, 2, 3, . . .. A sequence of rational numbers is a list a1, a2, a3, . . .,where each element an is a rational number. We will denote a sequence by

an∞n=1

which thus means the never ending list a1, a2, a3, . . ., of elements an,with the index n going through the natural numbers n = 1, 2, 3, . . .. Thesymbol ∞, called “infinity”, indicates that the list continues for ever in thesame sense that the natural numbers 1, 2, 3, . . . , continues for ever withoutcoming to an end.

We now return to the sequence of rational numbers an∞n=1, wherean = 1.11..11n, that is the sequence 1.11..11n∞n=1. The accuracy of ele-ment an = 1.11 . . . 11n, as an approximation of 10

9 , increases as the numberof decimals n increases. Each number in the sequence in turn is a betterapproximation to 10/9 than the preceding number and as we move from leftto right the numbers become ever closer to 10/9. An advantage of consider-ing the sequence 1.11..11n∞n=1 or never ending list 1.1, 1.11, 1.111,. . . , isthat we are ready to meet any accuracy requirement that could be posed.If we just consider one element, say 1.11..1110, then we could not meet anaccuracy requirement in the approximation of 10

9 of say 10−15. But if wehave the whole sequence at hand, then we can pick the element 1.11..1116

or 1.11..1117 or more generally any 1.11..11n with n ≥ 15, as a decimal ap-proximation of 10

9 with an error less than 10−15. The sequence thus givesus a whole “bag” of numbers, or a collection of approximations with whichwe can meet any accuracy requirement in the approximation of 10

9 . Thesequence 1.1, . . . , 1.11..11n, . . . , thus can be viewed as a collection of suc-cessively more accurate approximations of 10

9 , where we can satisfy anydesired accuracy.

We say that the sequence 1.11..11n∞n=1 converges to the value 109 , since

the difference between 109 and 1.11..11n becomes smaller than any given

positive number if only we take n large enough, as follows from (13.2).We say that 10

9 is the limit of the sequence an∞n=1=1, 11..11n∞n=1. Wewill express the convergence of the sequence an∞n=1 with elements an =1.11 . . .11n, as follows:

limn→∞

an =109

or limn→∞

1.11..11n =109.

13.2 Socket Wrench Sets 167

The limit 109 does not have a finite decimal expansion. The elements

1.11..11n of the converging sequence 1.11..11n∞n=1 are finite decimal ap-proximations of the limit 10

9 , with an error which is smaller than any givenpositive number if we only take n large enough.

Suppose that we restrict ourselves to work with finite decimal expansions,which is what a computer usually does. In this case we cannot exactlyexpress the value 10

9 with the available resources, because 109 does not have

a finite decimal expansion. As a substitute or approximation we may choosefor example 1.11..1110, but there is limit to the accuracy with this singleelement. It would not be entirely correct to say that 10

9 = 1.11..1110. Ifwe instead have the whole sequence 1.11..11n∞n=1 at hand, then we canmeet any accuracy by choosing the element 1.11..11n with n large enough.Choosing more and more decimals, we could increase the accuracy to anydesired degree.

The sequence 1.11..11n∞n=1 includes finite decimal approximations of109 satisfying any given positive tolerance or accuracy requirement. This issometimes expressed as

1.111 . . . =109,

where the three little dots are there to indicate that any precision couldbe attained by taking sufficiently many decimals (all equal to 1). Anotherway of writing this, would be

limn→∞

1.11..11n =109,

avoiding the possible ambiguity using the three little dots.

13.2 Socket Wrench Sets

To tighten or loosen a hex bolt with head diameter 2/3, a mechanic needs touse a socket wrench of a slightly bigger size. The tolerance on the differencebetween the sizes of the bolt and the wrench depend on the tightness, thematerial of the bolt and the wrench, and conditions such as whether thebolt threads are lubricated and whether the bolt is rusty or not. If thewrench is too large then the head of the bolt will simply be stripped beforethe bolt is tightened or loosened. We show two wrenches with differenttolerances in Fig. 13.1.

An amateur mechanic would have one socket, of say dimension 0.7.A pro mechanic would perhaps have 10 sockets of dimensions 0.7, 0.67,0.667,. . . ,0.66 · · ·66710. Both the amateur and pro would get stuck undersufficiently tough conditions because the socket would be too large to dothe job.


Fig. 13.1. Two socket wrenches with different tolerances

A ideal expert mechanic would have the whole sequence 0.66 · · ·67n∞n=1

at his/her disposal with the error of wrench number n being estimated by

∣∣∣∣0.66 · · ·67n − 2

3

∣∣∣∣ ≤ 10−n.

The ideal expert can thus reach into the tool chest and pull out a wrenchthat meets any accuracy requirement, and would thus be able to turn thebolt under arbitrarily tough conditions, or meet any crank torque specifiedby a bicycle manufacturer. More precisely, the ideal expert could be thoughtof as being able to construct a socket himself to meet any given tolerance oraccuracy. If necessary, the ideal mechanic could construct a wrench of forexample the dimension 0.66 · · ·6720, unless he already has such a wrenchin his (big) tool chest. The amateur and pro mechanic would not have thiscapability of constructing their own wrenches, but would have to be contentwith their ready-made wrench sets (which they could buy in the hard-warestore). We expect the cost to construct a wrench of dimension 0.6 · · ·67n

to increase (rapidly) with n, since the precision in the construction processhas to improve.

As a general point, computing the numbers 0.66 · · ·67n by long divisionof 2

3 , requires more work as n increases. What we gain from doing morework is better accuracy in using 0.66 · · ·67n as an approximation to 2

3 .Trading work for accuracy is the central idea behind solving equationsusing computation, especially on a computer. An estimate like (13.2) givesa quantitative measurement of how much accuracy we gain for each increasein work and so such estimates are useful not only to mathematicians butto engineers and scientists.

The need of approximating better and better in this case may be seen asan incompatibility of two systems: the bolt has dimension 2

3 in the systemof rational numbers, and the wrenches come in the decimal system 0.7,0.67, 0.667,. . . and there is no wrench of size exactly 2

3 .

13.3 J.P. Johansson’s Adjustable Wrenches 169

13.3 J.P. Johansson’s Adjustable Wrenches

The adjustable wrench is a Swedish invention created 1891 by the geniusJ.P. Johansson (1839–1924) see Fig. 13.2. In principle the adjustable wrenchis an analog device which fits a bolt of any size within a certain range. Everymechanic knows that an adjustable wrench may fail in cases when a prop-erly chosen fixed size wrench does not, because the size of the adjustablewrench is not completely stable under increasing torque.

Fig. 13.2. The Swedish inventor J.P. Johansson with two adjustable wrenchesof different design

13.4 The Power of Language:From Infinitely Many to One

The decimal expansion 0.6666 . . . of 23 contains infinitely many decimals.

The sequence 0.66 · · ·667n∞n=1 contains infinitely many elements, whichare increasingly accurate approximations of 2

3 . Talking or thinking of in-finitely many decimals or infinitely many elements, presents a serious dif-ficulty, which is handled by introducing the concept of a sequence. A se-quence has infinitely many elements, but the sequence itself is just oneentity. We thus group the infinitely many elements together to form onesequence, and thus pass from infinity to one. After this semantic construc-


tion, we are thus able to speak about one sequence and may momentarilyforget that the sequence in fact has infinitely many elements.

This would be like speaking about the expert mechanics tool chest con-taining the sequence 0.66 · · ·667n∞n=1 of infinitely many wrenches as oneentity. One tool chest with infinitely many wrenches. To call a tool chesta wrench seems strange initially, but we could get there by first callingthe tool chest something like a “super-wrench”, and then later omit the“super”.

Analogously, we could say that 0.6666 . . . is a “super-number” becauseit has infinitely many decimals, and then forget the “super” and say that0.6666 . . . , is a number. In fact, this makes complete sense since we identify0.6666 . . . with 2

3 , which is a number. Below, we shall meet non-periodicinfinite decimal expansions that do not correspond to rational numbers.Initially, we may think of these as some kind of “super-number”, and thenlater will refer such numbers as “real numbers”.

The discussion illustrates the usefulness of the concept of one set orsequence with infinitely many elements. Of course, we should be aware ofthe risk involved using the language to hide real facts. Political languageis often used this way, which is one reason for the eroding credibility ofpoliticians. As mathematicians, there is no reason that we should try to beas honest as possible, and use the language as clearly as possible.

13.5 The ε − N Definition of a Limit

The mathematical formulation of the idea of a limit says that the terms an

of a convergent sequence an∞n=1 differ from the limit A with as little aswe please if only the index n is large enough, and we decided to write thisas

limn→∞

an = A,

There is a mathematical jargon to express this fact that has become ex-tremely popular. It was developed by Karl Weierstrass (1815–97), seeFig. 13.3 and takes the following form: The limit of the sequence an∞n=1

equals A, which we write as

limn→∞

an = A,

if for any (rational) ε > 0 there is a natural number N such that

|an −A| ≤ ε for all n ≥ N

For example, we know that the value of 10/9 is approximated by the element1.11 · · ·1n from the sequence 1.11 · · ·1n to any specified accuracy (biggerthan zero) by taking n sufficiently large. We know from (13.2) that

∣∣∣∣109

− 1.11 · · ·1n

∣∣∣∣ ≤ 10−n,

13.5 The ε−N Definition of a Limit 171

Fig. 13.3. Weierstrass to Sonya Kovalevskaya: “. . . dreamed and been enrap-tured of so many riddles that remain for us to solve, on finite and infinite spaces,on the stability of the world system, and on all the other major problems of themathematics and the physics of the future. . . . you have been close . . . throughoutmy entire life . . . and never have I found anyone who could bring me such under-standing of the highest aims of science and such joyful accord with my intentionsand basic principles as you”

and thus ∣∣∣∣109

− 1.11 · · ·1n

∣∣∣∣ ≤ ε

if 10−n ≤ ε. We can phrase this as∣∣∣∣109

− 1.11 · · ·1n

∣∣∣∣ ≤ ε

if n ≥ N , where 10−N ≤ ε. If ε = .p1p2 · · · , where p1 = p2 = · · · = pm = 0,while pm+1 = 0, then we may choose any N such that N ≥ m. We see thatchoosing ε smaller, requires N to be bigger, and thus N depends on ε.

We emphasize that the ε − N definition of convergence is a fancy wayof saying that the difference |A− an| can be made smaller than any givenpositive number if only n is taken large enough.

There is a risk (and temptation) in using the ε−N definition of conver-gence, instead of the more pedestrian “as small as we please if only n islarge enough”. The statement “|A−an| can be made smaller than any givenpositive number if only n is large enough” is a very qualitative statement.Nothing is said about how large n has to be to reach a certain accuracy.A very qualitative statement is necessarily a bit vague. On the other hand,the statement “for any ε > 0 there is an N such that |A − an| ≤ ε ifn ≥ N” has the form of a very exact and precise statement, while in factit may be as qualitative as the first statement, unless the dependence of Non ε is made clear. The risk is thus that using the ε − N -jargon, we may


get confused and believe that something vague, in fact is very precise. Ofcourse there is also a temptation in this, which relates to the general ideaof mathematics as something being extremely precise. So be cautious anddon’t get fooled by simple tricks: the ε−N limit definition is vague to theextent the dependence of N on ε is vague.

The concept of a limit of a sequence of numbers is central to calculus.It is closely connected to never-ending decimal expansions, that is decimalexpansions with infinitely many non-zero decimals. The elements in thesequence with this connection are obtained by successively taking more andmore decimals into account. In fact, the fundamental reason for lookingat sequences comes form this connection. However, as happens, the ideaof a sequence and limit has taken on a life of its own, which has beenplaguing many students of calculus. We will try to refrain from excessesin this direction and keep a strong connection with the original motivationfor introducing the concepts of sequences and limits, namely describingsuccessively better and better approximations of solutions of equations.

We shall now practice the ε−N jargon in a couple of examples to showthat certain sequences have limits. The sequences we present are “artifi-cial”, that is given by cooked-up formulas, but we use them to illustratebasic aspects. After going through these examples, the reader should beable to look through the apparent mystery of the ε − N definition, andunderstand that it expresses something intuitively quite simple. But re-member: the ε − N definition of a limit is vague to the extent that thedependence of N on ε is vague.

Example 13.1. The limit of the sequence 1n∞n=1 equals 0, i.e.

limn→∞

1n

= 0.

Note that this is obvious simply because 1n can be made as close to 0 as

we please by taking n large enough. We shall now phrase this obvious (andtrivial fact) using the ε − N jargon. We thus have to satisfy the deviousmathematician who gives an ε > 0 and asks for a natural number N suchthat ∣

∣∣∣1n− 0

∣∣∣∣ ≤ ε (13.3)

for all n ≥ N . Well, to satisfy this request we choose N to be any naturalnumber larger than (or equal to) 1/ε, for instance the smallest naturalnumber larger than or equal to 1/ε. Then (13.3) holds for n ≥ N , andwe have satisfied the devious demand, which shows that limn→∞

1n = 0.

In this example, the connection between ε and N is very clear: we cantake N to be the smallest natural number larger than or equal to 1/ε. Forexample, if ε = 1/100, then N = 100. We hope the reader can make theconnection of the simple idea that 1/n gets as close to 0 as we please by

13.5 The ε−N Definition of a Limit 173

taking n sufficiently large, and the more pompous phrasing of this idea inthe ε−N -jargon.

Example 13.2. We next show that the limit of the sequence nn+1∞n=1 =

12 ,

23 , · · · equals 1, that is

limn→∞

n

n+ 1= 1. (13.4)

We compute∣∣∣∣1 − n

n+ 1

∣∣∣∣ =

∣∣∣∣n+ 1 − n

n+ 1

∣∣∣∣ =

1n+ 1

,

which shows that nn+1 is arbitrarily close to 1 if n is large enough, and thus

proves the claim. We now phrase this using the ε−N jargon. Let thus ε > 0be given. Now 1

n+1 ≤ ε provided that n ≥ 1/ε− 1. Hence∣∣∣1 − n

n+1

∣∣∣ ≤ ε for

all n ≥ N provided N is chosen so that N ≥ 1/ε− 1. Again this proves theclaim.

Example 13.3. The sum

1 + r + r2 + · · · + rn =n∑

i=0

ri = sn

is said to be a finite geometric series of order n with factor r, including thepowers ri of the factor r up to i = n. We considered this series above withr = 0.1 and sn = 1.11 · · ·11n. We now consider an arbitrary value of thefactor r in the range |r| < 1. We recall the formula

sn =n∑

i=0

ri =1 − rn+1

1 − r

valid for any r = 1. What happens as the number n of terms get biggerand bigger? To answer this it is natural to consider the sequence sn∞n=1.We shall prove that if |r| < 1, then

limn→∞

sn = limn→∞

(1 + r + r2 + · · · + rn) =1

1 − r, (13.5)

which we will write as∞∑

i=0

ri =1

1 − rif |r| < 1.

Intuitively, we feel that this is correct, because rn+1 gets as small as weplease by taking n large enough (remember that |r| < 1). We say that∑∞

i=0 ri is an infinite geometric series with factor r.


We now give an ε−N proof of (13.5). We need to show that for any ε > 0,there is an N such that

∣∣∣∣1 − rn+1

1 − r− 1

1 − r

∣∣∣∣ =

∣∣∣∣rn+1

1 − r

∣∣∣∣ ≤ ε

for all n ≥ N . To this end it is sufficient, since |r| < 1, to find N such that

|r|N+1 ≤ ε |1 − r| (13.6)

Since |r| < 1, we can make |r|N+1 as small as we please by taking Nsufficiently large, and thus we can also satisfy the inequality (13.6) by takingN sufficiently large. Below, we will define a function called the logarithmthat we can use to get a precise value for N as a function of ε from (13.6).

13.6 A Converging Sequence Has a Unique Limit

The limit of a converging sequence is uniquely defined. This should beself-evident from the fact that it is impossible to be arbitrarily close totwo different numbers at the same time. Try! We now also give a morelengthy proof using a type of argument often found in math books. Thereader could profit from going through this argument and understandingthat something seemingly difficult, in fact can hide a very simple idea.

We start from the following variation of the triangle inequality, see Prob-lem 7.15,

|a− b| ≤ |a− c| + |c− b| (13.7)

which holds for all a, b, and c. Suppose that the sequence an∞n=1 convergesto two possibly different numbers A1 and A2. Using (13.7) with a = A1,b = A2, and c = an, we get

|A1 −A2| ≤ |an −A1| + |an −A2|

for any n. Now because an converges to A1, we can make |an − A1| assmall as we like, and in particular smaller than 1

4 |A1 −A2| if A1 = A2, bytaking n large enough. Likewise we can make |an − A2| ≤ 1

4 |A1 − A2| bytaking n large enough. By (13.7), this means that |A1 −A2| ≤ 1

2 |A1 −A2|for n large, which can only hold if A1 = A2, and thus contradicts theobstructional assumption A1 = A2, which therefore must be rejected.

We note that if limn→∞ an = A then also, for example, limn→∞ an+1 = A,and limn→∞ an+7 = A. In other word, only the “very tail” of a sequencean matters to the limit limn→∞ an.

13.7 Lipschitz Continuous Functions and Sequences 175

13.7 Lipschitz Continuous Functionsand Sequences

A basic reason for introducing the concept of a Lipschitz continuous func-tion f : Q → Q is its relation to sequences of rational numbers. Thefundamental issue is the following. Let an be a converging sequence withrational limit limn→∞ an and let f : Q → Q be a Lipschitz continuousfunction with Lipschitz constant L. What can be said about the sequencef(an)? Does it converge and if so, to what?

The answer is easy to state: the sequence f(an) converges and

limn→∞

f(an) = f(

limn→∞

an

).

The proof is also easy. By the Lipschitz continuity of f : Q → Q, we have∣∣∣f(am) − f

(lim

n→∞an

)∣∣∣ ≤ L

∣∣∣am − lim

n→∞an

∣∣∣ .

Since an converges to A, the right-hand side can be made smaller thanany given positive number by taking m large enough, and thus we can alsomake the left hand side smaller than any positive number by choosing mlarge enough, which shows the desired result.

Note that since limn→∞ an is a rational number, the function valuef(limn→∞ an) is well defined since we assume that f : Q → Q.

We see that it is sufficient that f(x) is Lipschitz continuous on an inter-val I containing all the elements an as well as limn→∞ an. We have thusproved the following fundamental result.

Theorem 13.1 Let an be a sequence with rational limit limn→∞ an. Letf : I → Q be a Lipschitz continuous function, and assume that an ∈ I forall n and limn→∞ an ∈ I. Then,

limn→∞

f(an) = f(

limn→∞

an

). (13.8)

Note that choosing I to be a closed interval guarantees that limn→∞ an ∈ Iif an ∈ I for all n.

We now look at some examples.

Example 13.4. In the growth of bacteria model of Chapter Rational Num-bers, we need to compute

limn→∞

Pn = limn→∞

112nQ0 +

1K

(

1 − 12n

) .

The sequence Pn is obtained by applying the function

f(x) =1

Q0x+ 1K (1 − x)


to the terms in the sequence

12n

. Since limn→∞ 1/2n = 0, we have

limn→∞ Pn = f(0) = K, since f is Lipschitz continuous on for example[0, 1/2]. The Lipschitz continuity follows from the fact that f(x) is thecomposition of the function f2(x) = Q0x + 1

K (1 − x) and the functionf2(y) = 1/y.

Example 13.5. The function f(x) = x2 is Lipschitz continuous on boundedintervals. We conclude that if an∞n=1 converges to a rational limit A, then

limn→∞

(an)2 = A2.

In the next chapter, we will be interested in computing limn→∞(an)2 fora certain sequence an∞n=1 arising in connection with the Muddy Yardmodel, which will bring a surprise. Can you guess what it is?

Example 13.6. By Theorem 13.1, with appropriate choices (which?) of thefunction f(x):

limn→∞

(3 + 1

n

4 + 2n

)9

=(

limn→∞

3 + 1n

4 + 2n

)9

=(

limn→∞(3 + 1n )

limn→∞(4 + 2n )

)9

=(

34

)9

.

Example 13.7. By Theorem 13.1,

limn→∞

(2−n)7 + 14(2−n)4 − 3(2−n) + 2 = 07 + 14 × 04 − 3 × 0 + 2 = 2.

13.8 Generalization to Functions of Two Variables

We recall that a function f : I × J → Q of two rational variables, where Iand J are closed intervals of Q, is said to be Lipschitz continuous if thereis constant L such that

|f(x1, x2) − f(x1, x2)| ≤ L(|x1 − x1| + |x2 − x2|)

for x1, x1 ∈ I and x2, x2 ∈ J .Let now an and bn be two converging sequences of rational numbers

with an ∈ I and bn ∈ J . Then

f(

limn→∞

an, limn→∞

bn

))TS

b = limn→∞

f(an, bn). (13.9)

The proof is immediate:∣∣∣f

(lim

n→∞an, lim

n→∞bn

))TS

b − f (am, bm)∣∣∣ ≤ L

(∣∣∣ limn→∞

an − am

∣∣∣ +

∣∣∣ limn→∞

bn − bm

∣∣∣)

(13.10)

TSb Please check this closing parenthesis.


13.9 Computing Limits 177

where the right hand side can be made arbitrarily small by choosing mlarge enough.

We give a first application of this result with f(x1, x2) = x1+x2, which isLipschitz continuous on Q×Q with Lipschitz constant L = 1. We concludefrom (13.9) the natural formula

limn→∞

(an + bn) = limn→∞

an + limn→∞

bn (13.11)

stating that the limit of the sum is the sum of the limits.Similarly we have, of course, using the function f(x1, x2) = x1 − x2,

limn→∞

(an − bn) = limn→∞

an − limn→∞

bn.

As a special case choosing an = a for all n,

limn→∞

(a+ bn) = a+ limn→∞

bn.

Next, we consider the function f(x1, x2) = x1x2, which is Lipschitz con-tinuous on I×J , if I and J are closed bounded intervals of Q. Using (13.9)we find that if an and bn are two converging sequences of rationalnumbers, then

limn→∞

(an × bn) = limn→∞

an × limn→∞

bn,

stating that the limit of the products is the product of the limits.As a special case choosing an = a for all n, we have

limn→∞

(a× bn) = a limn→∞

bn.

We now consider the function f(x1, x2) = x1/x2, which is Lipschitz con-tinuous on I × J , if I and J are closed intervals of Q with J not including0. If bn ∈ J for all n and limn→∞ bn = 0, then

limn→∞

(an/bn) =limn→∞ an

limn→∞ bn,

stating that the limit of the quotient is the quotient of the limits if thelimit of the denominator is not zero.

13.9 Computing Limits

We now apply the above rules to compute some limits.

Example 13.8. Consider 2 + 3n−4 + (−1)nn−1∞n=1.

limn→∞

(2 + 3n−4 + (−1)nn−1

)

= limn→∞

2 + 3 limn→∞

n−4 + limn→∞

(−1)nn−1

= 2 + 3 × 0 + 0 = 2.


To do this example, we use (13.11) and the fact that

limn→∞

n−p =(

limn→∞

n−1)p

= 0p = 0

for any natural number p.

Another useful fact is

limn→∞

rn =

0 if |r| < 1,1 if r = 1,diverges to ∞ if r > 1,diverges otherwise.

We showed the case when r = 1/2 in Example 13.4 and you will show thegeneral result later as an exercise.

Example 13.9. Using 13.3, we can solve for the limiting behavior of thepopulation of bacteria described in Example 13.4. We have

limn→∞

Pn =1

limn→∞

12nQ0 + lim

n→∞

1K

(

1 − 12n

)

=1

0 +1K

(1 − 0)= K.

In words, the population of the bacteria growing under the limited resourcesas modeled by the Verhulst model tends to a constant population.

Example 13.10. Consider

41 + n−3

3 + n−2

∞

n=1

.

We compute the limit using the different rules:

limn→∞

41 + n−3

3 + n−2= 4

limn→∞(1 + n−3)limn→∞(3 + n−2)

= 41 + limn→∞ n−3

3 + limn→∞ n−2

= 41 + 03 + 0

=43.

Example 13.11. Consider

6n2 + 24n2 − n+ 1000

∞

n=1

.

13.9 Computing Limits 179

Before computing the limit, think about what is going on as n becomeslarge. In the numerator, 6n2 is much larger than 2 when n is large andlikewise in the denominator, 4n2 becomes much larger than −n + 1000 insize when n is large. So we might guess that for n large,

6n2 + 24n2 − n+ 1000

≈ 6n2

4n2=

64.

This would be a good guess for the limit. To see that this is true, we usea trick to put the sequence in a better form to compute the limit:

limn→∞

6n2 + 24n2 − n+ 1000

= limn→∞

(6n2 + 2)n−2

(4n2 − n+ 1000)n−2

= limn→∞

6 + 2n−2

4 − n−1 + 1000n−2

=64

where we finished the computation as in the previous example.

The trick of multiplying top and bottom of a ratio by a power can also beused to figure out when a sequence converges to zero or diverges to infinity.

Example 13.12.

limn→∞

n3 − 20n2 + 1n8 + 2n

= limn→∞

(n3 − 20n2 + 1)n−3

(n8 + 2n)n−3

= limn→∞

1 − 20n−1 + n−3

n5 + 2n−2.

From this we see that the numerator converges to 1 while the denominatorincreases without bound. Therefore

limn→∞

n3 − 20n2 + 1n8 + 2n

= 0.

Example 13.13.

limn→∞

−n6 + n+ 1080n4 + 7

= limn→∞

(−n6 + n+ 10)n−4

(80n4 + 7)n−4

= limn→∞

−n2 + n−3 + 10n−4

80 + 7n−4.

From this we see that the numerator grows in the negative direction withoutbound while the denominator tends towards 80. Therefore

−n6 + n+ 10

80n4 + 7

∞

n=1

diverges to −∞.


13.10 Computer Representationof Rational Numbers

The decimal expansion ±pmpm−1 · · · p1p0.q1q2 · · · qn uses the base 10 andconsequently each of the digits pi and qj may take on one of the 10 val-ues 0, 1, 2, . . .9. Of course, it is possible to use bases other than ten. Forexample, the Babylonians used the base sixty and thus their digits rangebetween 0 and 59. The computer operates with the base 2 and the twodigits 0 and 1. A base 2 number has the form

± pm2m + pm−12m−1 + . . .+ p222 + p121 + p020 + q12−1 + q22−2

+ . . .+ qn−12−(n−1) + qn2−n,

which we again may write in short hand

±pmpm−1 . . . p1p0.q1q2 . . . qn = pmpm−1 . . . p1p0 + 0.q1q2 . . . qn

where again n and m are natural numbers, and now each pi and qj takethe value 0 or 1. For example, in the base two

11.101 = 1 · 21 + 1 · 20 + 1 · 2−1 + 1 · 2−3.

In the floating point arithmetic of a computer using the standard 32 bits,numbers are represented in the form

±r2N ,

where 1 ≤ r ≤ 2 is the mantissa and the exponent N is an integer. Out ofthe 32 bits, 23 bits are used to store the mantissa, 8 bits are used to store theexponent and finally one bit is used to store the sign. Since 210 ≈ 10−3 thisgives 6 to 7 decimal digits for the mantissa while the exponent N may rangefrom −126 to 127, implying that the absolute value of numbers stored on thecomputer may range from approximately 10−40 to 1040. Numbers outsidethese ranges cannot be stored by a computer using 32 bits. Some languagespermit the use of double precision variables using 64 bits for storage with11 bits used to store the exponent, giving a range of −1022 ≤ n ≤ 1023,52 bits used to store the the mantissa, giving about 15 decimal places.

We point out that the finite storage capability of a computer has twoeffects when storing rational numbers. The first effect is similar to theeffect of finite storage on integers, namely only rational numbers withina finite range can be stored. The second effect is more subtle but actuallyhas more serious consequences. This is the fact that numbers are storedonly up to a specified number of digits. Any rational number that requiresmore than the finite number of digits in its decimal expansions, whichincluded all rational numbers with infinite periodic expansions for example,are therefore stored on a computer with an error. So for example 2/11 is

13.11 Sonya Kovalevskaya 181

stored as .1818181 or .1818182 depending on whether the computer roundsor not.

But this is not the end of the story. Introduction of an error in the7’th or 15’th digit would not be so serious except for the fact that suchround-off errors accumulate when arithmetic operations are performed. Inother words, if we add two numbers with a small error, the result may havea larger error being the sum of the individual errors (unless the errors haveopposite sign or even cancel).

We give below in Chapter Series an example showing a startling conse-quence of working with finite decimal representations with round off errors.

13.11 Sonya Kovalevskaya: The First WomanWith a Chair in Mathematics

Sonya Kovalevskaya (1850–91) was a student of Weierstrass and as thefirst woman ever got a position 1889 as Professor in Mathematics at theUniversity of Stockholm, see Fig. 13.4. Her mentor was Gosta Mittag-Leffler(1846–1927), famous Swedish mathematician and founder of the prestigousjournal Acta Mathematica, see Fig. 21.34.

Kovalevskaya was 1886 awarded the 5,000 francs Prix Bordin for herpaper Memoire sur un cas particulier du probleme de le rotation d’un corpspesant autour d’un point fixe, ou l’intgration s’effectue l’aide des fonctionsultraelliptiques du temps. At the height of her career, Kovalevskaya died ofinfluenza complicated by pneumonia, only 41 years old.

Fig. 13.4. Sonya Kovalevskaya, first woman as a Professor of Mathematics:“I be-gan to feel an attraction for my mathematics so intense that I started to neglectmy other studies” (at age 11)


Chapter 13 Problems

13.1. Plot the functions; (a) 2−n, (b) 5−n, and (c) 10−n; defined on the naturalnumbers n. Compare the plots.

13.2. Plot the function f(n) = 109

(1− 10−n−1) defined on the natural numbers.

13.3. Write the following sequences using the index notation:

(a) 1, 3, 9, 27, · · · (b) 16, 64, 256, · · ·

(c) 1,−1, 1,−1, 1, · · · (d) 4, 7, 10, 13, · · ·

(e) 2, 5, 8, 11, · · · (f) 125, 25, 5, 1, 15,

1

25,

1

125, · · · .

13.4. Show the following limits hold using the formal definition of the limit:

(a) limn→∞

8

3n+ 1= 0 (b) lim

n→∞4n+ 3

7n− 1=

4

7(c) lim

n→∞n2

n2 + 1= 1.

13.5. Show that limn→∞

rn = 0 for any r with |r| ≤ 1/2.

13.6. One of the classic paradoxes posed by the Greek philosophers can be solvedusing the geometric series. Suppose you are in Paulding county on your bike, 32miles from home. You break a spoke, you have no more food and you drank thelast of your water, you forgot to bring money and it starts to rain. While ridinghome, as wont to do, you begin to think about how far you have to ride. Thenyou have a depressing thought: you can never get home! You think to yourself:first I have to ride 16 miles, then 8 miles after that, then 4 miles, then 2, then 1,then 1/2, then 1/4, and so on. Apparently you always have a little way to go, nomatter how close you are, and you have to add up an infinite number of distancesto get anywhere! The Greek philosophers did not understand how to interpreta limit of a sequence, so this caused them a great deal of trouble. Explain whythere is no paradox involved here using the sum of a geometric series.

13.7. Show the following hold using the formal definition for divergence toinfinity:

(a) limn→∞

−4n+ 1 = −∞ (b) limn→∞

n3 + n2 = ∞.

13.8. Show that limn→∞

rn = ∞ for any r with |r| ≥ 2.

13.9. Find the values of

(a) 1 − .5 + .25 − .125 + · · ·

(b) 3 +3

4+

3

16+ · · ·

(c) 5−2 + 5−3 + 5−4 + · · ·


13.10. Find formulas for the sums of the following series by using the formulafor the sum of the geometric series assuming |r| < 1:

(a) 1 + r2 + r4 + · · ·

(b) 1 − r + r2 − r3 + r4 − r5 + · · ·

13.11. Determine the number of different sequences there are in the followinglist and identify the sequences that are equal.

(a)

4n/2

4 + (−1)n

∞

n=1

(b)

2n

4 + (−1)n

∞

n=1

(c)

2 car

4 + (−1) car

∞

car =1

(d)

2n−1

4 + (−1)n−1

∞

n=2

(e)

2n+2

4 + (−1)n+2

∞

n=0

(f)

82n

4 + (−1)n+3

∞

n=−2

.

13.12. Rewrite the sequence

2 + n2

9n

∞

n=1

so that: (a) the index n runs from

−4 to ∞, (b) the index n runs from 3 to ∞, (c) the index n runs from 2 to −∞.

13.13. Show that (7.14) holds by considering the different cases: a < 0, b < 0,a < 0, b > 0, a > 0, b < 0, a > 0, b > 0. Show that (13.7) holds using (7.14) andthe fact that a− c+ c− b = a− b.

13.14. Suppose that an∞n=1 converges to A and bn∞n=1 converges to B. Showthat an − bn∞n=1 converges to A−B.

13.15. (Harder) Suppose that an∞n=1 converges to A and bn∞n=1 convergesto B. Show that if bn = 0 for all n and B = 0, then an/bn∞n=1 converges toA/B. Hint: write

anbn

− A

B=anbn

− anB

+anB

− A

B

and the fact that for n large enough, |bn| ≥ B/2. Be sure to say why the last factis true!

13.16. Compute the limits of the sequences an∞n=1 with the indicated termsor show they diverge.


(a) an = 1 +7

n(b) an = 4n2 − 6n

(c) an =(−1)n

n2(d) an =

2n2 + 9n+ 3

6n2 + 2

(e) an =(−1)nn2

7n2 + 1(f) an =

(2

3

)n+ 2

(g) an =(n− 1)2 − (n+ 1)2

n(h) an =

1 − 5n8

4 + 51n3 + 8n8

(i) an =2n3 + n+ 1

6n2 − 5(j) an =

(78

)n − 1(

78

)n+ 1

.

13.17. Compute the following limits

(a) limn→∞

(n+ 3

2n+ 8

)37

(b) limn→∞

(31

n2+

2

n+ 7

)4

(c) limn→∞

1(2 + 1

n

)8 (d) limn→∞

(((

1 +2

n

)2)3)4

5

.

13.18. Rewrite the following sequences as a function applied to another sequencethree different ways:

(a)

(n2 + 2

n2 + 1

)3∞

n=1

. (b)(n2)4 + (n2)2 + 1

∞n=1

13.19. Show that the infinite decimal expansion 0.9999 . . . is equal to 1. In otherwords, show that

limn→∞

0.99 · · · 99n = 1,

where 0.99 · · · 99n contains n decimals all equal to 9.

13.20. Determine the number of digits used to store rational numbers in the pro-gramming language that you use and whether the language truncates or rounds.

13.21. The machine number u is the smallest positive number u stored in a com-puter that satisfies 1 + u > 1. Note that u is not zero! For example in a singleprecision language 1+ .00000000001 = 1, explain why. Write a little program thatcomputes the u for your computer and programming language. Hint: 1 + .5 > 1in any programming language. Also 1 + .25 > 1. Continue. . .

14The Square Root of Two

He is unworthy of the name man who is ignorant of the fact that thediagonal of a square is incommensurable with its side. (Plato)

Just as the introduction of the irrational number is a convenientmyth which simplifies the laws of arithmetics. . . so physical objectsare postulated entities which round out and simplify our account ofthe flux of existence. . . The conceptual scheme of physical objects islikewise a convenient myth, simpler than the literal truth and yetcontaining that literal truth as a scattered part. (Quine)

14.1 Introduction

We met the equation x2 = 2 in the context of the Muddy Yard model,trying to determine the length of the diagonal of a square with side length1. We have learned in school that the (positive) solution of the equationx2 = 2 is x =

√2. But, honestly speaking, what is in fact

√2? To simply

say that it is the solution of the equation x2 = 2, or “that number whichwhen squared is equal to 2”, leads to circular reasoning, and would nothelp much when trying to by a pipe of length

√2.

We then may recall again from school that√

2 ≈ 1.41, but computing1.412 = 1.9881, we see that

√2 is not exactly equal to 1.41. A better guess

is 1.414, but then we get 1.4142 = 1.999386. We use MAPLE to compute

186 14. The Square Root of Two

the decimal expansion of√

2 to 415 places:

x = 1.414213562373095048801688724209698078569671875376948073176679737990732478462107038850387534327641572735013846230912297024924836055850737212644121497099935831413222665927505592755799950501152782060571470109559971605970274534596862014728517418640889198609552329230484308714321450839762603627995251407989687253396546331808829640620615258352395054745750287759961729835575220337531857011354374603408498847160386899970699

Computing x2 again using MAPLE, we find that

x2 = 1.9999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999863810370027903935475449214815675207193643367223922486271791890987870158099602326405972613126407604056912999503092957478318885969500708874056058336501652271573809445593320690045817264222173935969533242515158760233604272994889141803598971038204956184812333321625160160972831371230644994979436534796986297766833340665770240318513306002427232125175273043547767486608089987807935797774759645877082503170068870585486010

The number x = 1.4142 · · ·699 satisfies the equation x2 = 2 to a highdegree of precision but not exactly. In fact, it turns out that no matter howmany digits we take in a guess of

√2 with a finite decimal expansion, we

never get a number which squared gives exactly 2. So it seems that we havenot yet really caught the exact value of

√2. So what is it?

14.2√

2 Is Not a Rational Number! 187

To get a clue, we may try to examine the decimal expansion of√

2, butwe will not find any pattern. In particular, the first 415 places show noperiodic pattern.

14.2√

2 Is Not a Rational Number!

In this section, we show that√

2 cannot be a rational number of the formp/q with p and q natural numbers, and thus the decimal expansion of

√2

cannot be periodic. In the proof we use the fact that a natural number canbe uniquely factored into prime factors. We showed this in chapter NaturalNumbers and Integers. One consequence of the factorization into primenumbers is the following fact: Suppose that we know that 2 is a factor of n.If n = pq is a factorization of n into integers p and q, if follows that at leastone of the factors p and q must have a factor of 2.

We argue by contradiction. Thus we shall show that assuming that√

2 isrational leads to a contradiction, and thus

√2 cannot be rational. We thus

assume that√

2 = p/q, where all common factors in the natural numbers pand q have been divided out. For example if p and q both have the factor 3,then we replace p by p/3 and q by q/3, which does not change the quotientp/q. We write this as

√2q = p where p and q have no common factors, or

squaring both sides, 2q2 = p2. Since the left hand side contains the factor 2,the right hand side p2 must contain the factor 2, which means that p mustcontain the factor 2. Thus we can write p = 2× p with p a natural number.We conclude that 2q2 = 4× p2, that is q2 = 2× p2. But the same argumentimplies that q must also contain a factor of 2. Thus both p and q containthe factor 2 which contradicts the original assumption that p and q hadno common factors. Assuming

√2 to be rational number thus leads to

a contradiction and therefore√

2 cannot be a rational number.The argument just given was known to the Pythagoreans, who thus knew

that√

2 is not a rational number. This knowledge caused a lot of trouble.On one hand,

√2 represents the diagonal of a square of side one, so it

seemed that√

2 had to exist. On the other hand, the Pythagorean school ofphilosophy was based on the principle that everything could be describedin terms of natural numbers. The discovery that

√2 was not a rational

number, that is that√

2 could not be viewed as a pair of natural numbers,came as a shock! Legend says that the person who discovered the proofwas punished by the gods for revealing an imperfection in the universe.The Pythagoreans tried to keep the discovery secret by teaching it onlyto a select few, but eventually the discovery was revealed and after thatthe Pythagorean school quickly fell apart. At the same time, the Euclideanschool, which was based on geometry instead of numbers, became moreinfluential. Considered from the point of view of geometry, the difficultywith

√2 seems to “disappear”, because no one would question that a square


of side length 1 will have a diagonal of a certain length, and we could thensimply define

√2 to be that length. The Euclidean geometric school took

over and ruled all through the Dark Ages until the time of Descartes in the17th century who resurrected the Pythagorean school based on numbers,in the form of analytical geometry. Since the digital computer of todayis based on natural numbers, or rather sequences of 0s and 1s, we maysay that Pythagoras ideas are very much alive today: everything can bedescribed in terms of natural numbers. Other Pythagorean dogmas like“never eat beans” and “never pick up anything that has fallen down” havenot survived equally well.

14.3 Computing√

2 by the Bisection Algorithm

We now present an algorithm for computing a sequence of rational numbersthat satisfy the equation x2 = 2 more and more accurately. That is, weconstruct a sequence of rational number approximations of a solution ofthe equation

f(x) = 0 (14.1)

with f(x) = x2 − 2. The algorithm uses a trial and error strategy thatchecks whether a given number r satisfies f(r) < 0 or f(r) > 0, i.e. ifr2 < 2 or r2 > 2. All of the numbers r constructed during this process arerational, so none of them can ever actually equal

√2.

We begin by noting that f(1) < 0 since 12 < 2 and f(2) > 0 since 22 > 2.Now since 0 < x < y means that x2 < xy < y2, we know that f(x) < 0 forall 0 < x ≤ 1 and f(x) > 0 for all x ≥ 2. So any solution of (14.1) must liebetween 1 and 2. Hence we choose a point between 1 and 2 and check thesign of f at that point. For the sake of symmetry, we choose the halfwaypoint 1.5 = (1 + 2)/2 of 1 and 2. We find that f(1.5) > 0. Rememberingthat f(1) < 0, we conclude that a (positive) solution of (14.1) must liebetween 1 and 1.5.

We continue, next checking the mean value 1.25 of 1 and 1.5 to find thatf(1.25) < 0. This means that a solution of (14.1) must lie between 1.25 and1.5. Next we choose the point halfway between these two, 1.375, and findthat f(1.375) < 0, implying that any solution of (14.1) lies between 1.375and 1.5. We can continue to search in this way as long as we like, each timedetermining two rational numbers that “trap” any solution of (14.1). Thisprocess is called the Bisection algorithm.

1. Choose the initial values x0 and X0 so that f(x0) < 0 and f(X0) > 0.Set i = 1.

2. Given two rational numbers xi−1 and Xi−1 with the property thatf(xi−1) < 0 and f(Xi−1) > 0, set xi = (xi−1 +Xi−1)/2.

14.4 The Bisection Algorithm Converges! 189

If f(xi) = 0, then stop.

If f(xi) < 0, then set xi = xi and Xi = Xi−1.

If f(xi) > 0, then set xi = xi−1 and Xi = xi.

3. Increase i by 1 and go back to step 2.

We list the output for 20 steps from a MATLABm-file implementingthis algorithm in Fig. 14.1 with x0 = 1 and X0 = 2.

i xi Xi

0 1.00000000000000 2.000000000000001 1.00000000000000 1.500000000000002 1.25000000000000 1.500000000000003 1.37500000000000 1.500000000000004 1.37500000000000 1.437500000000005 1.40625000000000 1.437500000000006 1.40625000000000 1.421875000000007 1.41406250000000 1.421875000000008 1.41406250000000 1.417968750000009 1.41406250000000 1.4160156250000010 1.41406250000000 1.4150390625000011 1.41406250000000 1.4145507812500012 1.41406250000000 1.4143066406250013 1.41418457031250 1.4143066406250014 1.41418457031250 1.4142456054687515 1.41418457031250 1.4142150878906216 1.41419982910156 1.4142150878906217 1.41420745849609 1.4142150878906218 1.41421127319336 1.4142150878906219 1.41421318054199 1.4142150878906220 1.41421318054199 1.41421413421631

Fig. 14.1. 20 steps of the Bisection algorithm

14.4 The Bisection Algorithm Converges!

By continuing the Bisection algorithm without stopping, we generate twosequences of rational numbers xi∞i=0 and Xi∞i=0. By construction,

x0 ≤ x1 ≤ x2 ≤ · · · and X0 ≥ X1 ≥ X2 ≥ · · ·xi < Xj for all i, j = 0, 1, 2, . . .

In other words, the terms xi either increase or stay constant while theXi always decrease or remain constant as i increases, and any xi is smaller


than any Xj. Moreover, the choice of the midpoint means that the distancebetween Xi and xi is always strictly decreasing as i increases. In fact,

0 ≤ Xi − xi ≤ 2−i for i = 0, 1, 2, · · · , (14.2)

i.e. the difference between the value xi for which f(xi) < 0 and the valueXi for which f(Xi) > 0 is halved for each step increase i by 1. This meansthat as i increases, more and more digits in the decimal expansions of xi

and Xi agree. Since 2−10 ≈ 10−3, we gain approximately 3 decimal placesfor every 10 successive steps of the bisection algorithm. We can see this inFig. 14.1.

The estimate (14.2) on the difference of Xi − xi also implies that theterms in the sequence xi∞i=0 become closer as the index increase. Thisfollows because xi ≤ xj < Xj ≤ Xi if j > i so (14.2) implies

|xi − xj | ≤ |xi −Xi| ≤ 2−i if j ≥ i.

that is|xi − xj | ≤ 2−i if j ≥ i. (14.3)

We illustrate in Fig. 14.2. In particular, this means that when 2−i ≤10−N−1, the first N decimals of xj are the same as the first N decimals inxi for any j ≥ i.

In other words, as we compute more and more numbers xi, more andmore leading decimals of the numbers xi agree. We conclude that the se-quence xi∞i=0 determines a specific (infinite) decimal expansion. To getthe first N digits of this expansion, we simply take the first N digits of anynumber xj in the sequence with 2−j ≤ 10−N−1. By the inequality (14.3),all such xj agree in the first N digits.

If this infinite decimal expansion was the decimal expansion of a rationalnumber x, then we would of course have

x = limi→∞

xi.

xi Xixj Xj

|xi − xj |

|xi −Xi|

Fig. 14.2. |xi − xj | ≤ |Xi − xi|

14.4 The Bisection Algorithm Converges! 191

However, we showed above that the decimal expansion defined by the se-quence xi∞i=0 cannot be periodic. So there is no rational number x thatcan be the limit of the sequence xi.

We have now come to the point where the Pythagoreans got stuck 2.500years ago. The sequence xi “tries to converge” to a limit, but the limitis not a number of the type we already know, that is a rational number.To avoid the fate of the Pythagoreans, we have to find a way out of thisdilemma. The limit appears to be a number of a new kind and thus it ap-pears that we have to somehow extend the rational numbers. The extensionwill be accomplished by viewing any infinite decimal expansion, periodicor not, as some kind of number, more precisely as a real number. In thisway, we will clearly get an extension of the set of rational numbers sincethe rational numbers correspond to periodic decimal expansions. We willrefer to non-periodic decimal expansions as irrational numbers.

For the extension from rational to real numbers to make sense, we mustshow that we can compute with irrational numbers in pretty much the sameway as with rational numbers. We shall see this is indeed possible and weshall see that the basic idea when computing with irrational numbers isthe natural one: compute with truncated decimal expansions! We give thedetails in the next chapter devoted to a study of real numbers.

Let us now summarize and see where we stand: the Bisection algorithmapplied to the equation x2 − 2 = 0 generates a sequence xi∞i=1 satisfying

|xi − xj | ≤ 2−i if j ≥ i. (14.4)

The sequence xi∞i=1 defines an infinite non-periodic decimal expansion,which we will view as an irrational number. We will give this irrationalnumber the name

√2. We thus use

√2 as a symbol to denote a certain

infinite decimal expansion determined by the Bisection algorithm appliedto the equation x2 − 2 = 0.

We now need to specify how to compute with irrational numbers. Oncewe have done this, it remains to show that the particular irrational numbernamed

√2 constructed above indeed does satisfy the equation x2 = 2. That

is after defining multiplication of irrational numbers like√

2, we need toshow that √

2√

2 = 2. (14.5)

Note that this equality does not follow directly by definition, as it wouldif we had defined

√2 as “that thing” which multiplied with itself equals 2

(which doesn’t make sense since we don’t know that “that thing” exists).Instead, we have now defined

√2 as the infinite decimal expansion defined

by the Bisection algorithm applied to x2−2 = 0, and it is a non-trivial stepto first define what we mean by multiplying

√2 by

√2, and then to show

that indeed√

2√

2 = 2. This is what the Pythagoreans could not manageto do, which had devastating effects on their society.


We return to verifying (14.5) after showing in the next chapter how tocompute with real numbers, so that in particular we know how to multiplythe irrational number

√2 with itself!

14.5 First Encounters with Cauchy Sequences

We recall that the sequence xi defined by the Bisection algorithm forsolving the equation x2 = 2, satisfies

|xi − xj | ≤ 2−i if j ≥ i, (14.6)

from which we concluded that the sequence xi∞i=1 specifies a certain infi-nite decimal expansion. To get the firstN decimals of the expansion we takethe first N decimals of any number xj in the sequence with 2−j ≤ 10−N−1.Any two such xj will agree to N decimals in the sense that their differenceis a most 1 in decimal place N + 1.

The sequence xi satisfying (14.6) is an example of a Cauchy sequenceof rational numbers. More generally, a sequence yi of rational numbers issaid to be a Cauchy sequence if for any ε > 0 there is a natural number Nsuch that

|yi − yj | ≤ ε if i, j ≥ N.

To show that the sequence xi satisfying (14.6) is indeed a Cauchy se-quence, we first choose ε > 0 and then we choose N so that 2−N ≤ ε.

As a basic example let us prove that the sequence xi∞i=1 with xi = i−1i

is a Cauchy sequence. We have for j > i∣∣∣∣i− 1i

− j − 1j

∣∣∣∣ =

∣∣∣∣(i− 1)j − i(j − 1)

ij

∣∣∣∣ =

∣∣∣∣i− j

ij

∣∣∣∣ ≤

1i.

For a given ε > 0, we now choose the natural number N ≥ 1/ε, so that1

N+1 ≤ ε, in which case we have∣∣∣∣i− 1i

− j − 1j

∣∣∣∣ ≤ ε if i, j ≥ N.

This shows that xi with xi = i−1i is a Cauchy sequence, and thus con-

verges to a limit limi→∞ xi. We proved above that limi→∞ xi = 1.

14.6 Computing√

2 by the Deca-section Algorithm

We now describe a variation of the Bisection algorithm for x2−2 = 0 calledthe Deca-section algorithm. Like the Bisection algorithm, the Deca-sectionalgorithm produces a sequence of numbers xi∞i=0 that converges to

√2.

14.6 Computing√

2 by the Deca-section Algorithm 193

In the Deca-section algorithm, the element xi agrees with√

2 to i decimalplaces, and thus the rate of convergence is easy to grip.

The Deca-section algorithm looks the same as the Bisection algorithmexcept that at each step the current interval is divided into 10 subintervalsinstead of 2. We start again with f(x) = x2 − 2 and x0 = 1 and X0 = 2so that f(x0) < 0 and f(X0) > 0. Now we compute the value of f at theintermediate rational points 1.1, 1.2, · · · , 1.9 and then choose two consec-utive numbers x1 and X1 with f(x1) < 0 and f(X1) > 0. There has to betwo such consecutive points because we know that f(x0) = f(1) < 0 andthen either f(y) < 0 for all y = 1.1, 1.2, · · · , 1.9 at which point f(2) > 0,so we set x1 = 1.9 and X1 = 2, or f(y) > 0 at some intermediate point. Wefind that this gives x1 = 1.4 and X1 = 1.5. Now we continue the process byevaluating f at the rational numbers 1.41, 1.42, · · · , 1.49, and then choosingtwo consecutive numbers x2 and X2 with f(x2) < 0 and f(X2) > 0. Thisgives x2 = 1.41 and X2 = 1.42. Then we work on the third, fourth, fifth, · · ·decimal places in order, obtaining two sequences xi∞i=0 and Xi∞i=0 bothconverging to

√2. We show the first 14 steps computed using a MATLAB

m-file implementation of this algorithm in Fig. 14.3.

i xi Xi

0 1.00000000000000 2.000000000000001 1.40000000000000 1.500000000000002 1.41000000000000 1.420000000000003 1.41400000000000 1.415000000000004 1.41420000000000 1.414300000000005 1.41421000000000 1.414220000000006 1.41421300000000 1.414214000000007 1.41421350000000 1.414213600000008 1.41421356000000 1.414213570000009 1.41421356200000 1.4142135630000010 1.41421356230000 1.4142135624000011 1.41421356237000 1.4142135623800012 1.41421356237300 1.4142135623740013 1.41421356237300 1.4142135623731014 1.41421356237309 1.41421356237310

Fig. 14.3. 14 steps of the deca-section algorithm

By construction|xi −Xi| ≤ 10−i,

and also|xi − xj | ≤ 10−i for j ≥ i. (14.7)

The inequality (14.7) implies that xi is a Cauchy sequence and thus de-termines an infinite decimal expansion. Since in the Deca-section algorithm,


we gain one decimal per step, we may identify element xi of the sequencewith the truncated decimal expansion with i decimals. In this case thereis thus a very simple connection between the Cauchy sequence and thedecimal expansion.

Chapter 14 Problems

14.1. Use the evalf function in MAPLE to compute√

2 to 1000 places andthen square the result and compare to 2.

14.2. (a) Show that√

3 (see Problem 3.5) is irrational. Hint: use a powerfulmathematical technique: try to copy a proof you already know. (b) Do the samefor

√a where a is any prime number.

14.3. Specify three different irrational numbers using the digits 3 and 4.

14.4. Program the Bisection algorithm. Write down the output for 30 stepsstarting with: (a) x0 = 1 and X0 = 2, (b) x0 = 0 and X0 = 2, (c) x0 = 1 andX0 = 3, (d) x0 = 1 and X0 = 20. Compare the accuracy of the methods at eachstep by comparing the values of xi versus the decimal expansion of

√2 given

above. Explain why there is a difference in accuracy resulting from the differentinitial values.

14.5. (a) Use the program in Problem 14.4 and write down the output for 40steps using x0 = 1 and X0 = 2. (b) Describe anything you notice about the last10 values xi and Xi. (c) Explain what you see. (Hint: consider floating pointrepresentation on the computer you use.)

14.6. Using the results in Problem 14.4(a), make plots of: (a) |Xi − xi| versusi (b) |xi − xi−1| versus i; and (c) |f(xi)| versus i. In each case, determine if thequantity decreases by a factor of 1/2 after each step.

14.7. Solve the equations x2 = 3 and x3 = 2 using the Bisection algorithm.Also, make the algorithm find the negative root of x2 = 3.

14.8. Show that if a < 0 and b > 0 then b− a < c implies |b| < c and |a| < c.

14.9. (a) Write down an algorithm for Deca-section. (b) Program the algorithmin (a) and then compute 16 steps using x0 = 0 and X0 = 2.

14.10. (a) Construct a “trisection” algorithm; (b) implement the trisection al-gorithm and compute 30 steps using x0 = 0 and X0 = 2; (c) show that thetridiagonal algorithm determines a decimal expansion and call this x; (d) Showthat x =

√2; (e) get an estimate on |xi − x|.

14.11. Compute the cost of the tridiagonal algorithm from Problem 14.10 andcompare to the costs of the Bisection and Deca-section methods.

14.12. Use the Bisection code from Problem 14.4 to compute√

3 (recall Prob-lem 3.5). Hint: 1 <

√3 < 2.

15Real numbers

I often say that when you can measure what you are speaking about,and express it in numbers, you know something about it; but whenyou cannot express it in numbers, your knowledge is of a meagreand unsatisfactory kind: it may be the beginning of knowledge, butyou have scarcely, in your thoughts, advanced to the stage of science,whatever the matter may be. (Kelvin 1889)

Vattnet drar sig tillbakastenarna blir synliga.

Det var lange sen sist.De har egentligen inte forandrats.

De gamla stenarna.

(Brunnen, Lars Gustafsson, 1977)

15.1 Introduction

We are now ready to introduce the concept of a real number. We shall viewa real number as being specified by an infinite decimal expansion of theform

±pm · · · p0.q1q2q3 · · ·

with a never ending list of decimals q1, q2, . . ., where each one of the pi

and qj are one of the 10 digits 0, 1, . . . , 9. We met the decimal expansion

196 15. Real numbers

1.4142135623 . . . of√

2 above. The corresponding sequence xi∞i=1 of trun-cated decimal expansions is given by the rational numbers

xi = ±pm · · · p0.q1 · · · qi = ±(pm10m + · · · + qi10−i).

We have for j > i,

|xi − xj | = |0.0 · · ·0qi+1 · · · qj | ≤ 10−i. (15.1)

We conclude that the sequence xi∞i=1 of truncated decimal expansions ofthe infinite decimal expansion ±pm · · · p0.q1q2q3 · · · , is a Cauchy sequenceof rational numbers.

More generally, we know from the discussion in Chapter Sequences andLimits, that any Cauchy sequence of rational numbers specifies an infinitedecimal expansion and thus a Cauchy sequence of rational numbers specifiesa real number. We may thus view a real number as being specified by aninfinite decimal expansion, or by a Cauchy sequence of rational numbers.Note that we use the semantic trick of referring to an infinite decimalexpansion as one real number.

We divide real numbers into two types: rational numbers with periodicdecimal expansions and irrational numbers with non-periodic decimal ex-pansions. Note that we may naturally include rational numbers with finitelymany nonzero decimals, like 0.25, as particular periodic infinite decimal ex-pansions with all the decimals qi = 0 for i sufficiently large.

We say that the infinite decimal expansion ±pm · · · p0.q1q2q3 · · · specifiesthe real number x = ±pm · · · p0.q1q2q3 · · · , and we agree to write

limi→∞

xi = x, (15.2)

where xi∞i=1 is the corresponding sequence of truncated decimal expan-sions of x. If x = ±pm · · · p0.q1q2q3 · · · is a periodic expansion, that is ifx is a rational number, this agrees with our earlier definition from Chap-ter Sequences and Limits of the limit of the sequence xi∞i=1 of truncateddecimal expansions of x. For example, we recall that

109

= limi→∞

xi, where xi = 1.11 · · ·1i.

If x = ±pm · · · p0.q1q2q3 · · · is non-periodic, that is, if x is an irrationalnumber, then (15.2) serves as a definition, where the real number is specifiedby the decimal expansion x = ±pm · · · p0.q1q2q3 · · · , that is the real numberx specified by the Cauchy sequence xi∞i=1 of truncated decimal expansionsof x, is denoted by limi→∞ xi. Alternatively, (15.2) serves to denote the limitlimi→∞ xi by ±pm · · · p0.q1q2q3 · · · .

For the sequence xi∞i=1 generated by the Bisection algorithm appliedto the equation x2 − 2 = 0, we decided to write

√2 = limi→∞ xi, and thus

15.2 Adding and Subtracting Real Numbers 197

we may write√

2 = 1.412 . . . , with 1.412 . . . , denoting the infinite decimalexpansion given by the Bisection algorithm.

We shall now specify how to compute with real numbers defined in thisway. In particular, we shall specify how to add, subtract, multiply anddivide real numbers. Of course we will do this so that it extends our expe-rience in computing with rational numbers. This will complete our processof extending the natural numbers to obtain first the integers and then therational numbers, and finally the real numbers.

We denote by R the set of all possible real numbers, that is the set of allpossible infinite decimal expansions. We discuss this definition in ChapterDo Mathematicians Quarrel? below.

15.2 Adding and Subtracting Real Numbers

To exhibit the main concern, consider the problem of adding two real num-bers x and x specified by the decimal expansions

x = ±pm · · · p0.q1q2q3 · · · = limi→∞

xi,

x = ±pm · · · p0.q1q2q3 · · · = limi→∞

xi,

with corresponding truncated decimal expansions

xi = ±pm · · · p0.q1 · · · qi,xi = ±pm · · · p0.q1 · · · qi.

We know how to add xi and xi: we then start from the right and add thedecimals qi and qi, and get a new ith decimal and possibly a carry-overdigit to be added to the sum of the next digits qi−1 and qi−1, and so on.The important thing to notice is that we start from the right (smallestdecimal) and move to the left (larger decimals).

Now, trying to add the two infinite sequences x = ±pm · · · p0.q1q2q3 · · ·and x = ±pm · · · p0.q1q2q3 · · · in the same way by starting from the right,we run into a difficulty because there is no far right decimal to start with.So, what can we do?

Well, the natural way out is of course to consider the sequence yigenerated by yi = xi + xi. Since both xi and xi are Cauchy sequences,it follows that yi is also a Cauchy sequence, and thus defines a decimalexpansion and thus defines a real number. Of course, the right thing is thento define

x+ x = limi→∞

yi = limi→∞

(xi + xi).

This corresponds to the formula

limi→∞

xi + limi→∞

xi = limi→∞

(xi + xi).


We give a concrete example: To compute the sum of

x =√

2 = 1.4142135623730950488 · · ·

and

x =1043439

= 2.3758542141230068337 · · · ,

we compute yi = xi+xi for i = 1, 2, . . ., which defines the decimal expansionof x+ x, see Fig. 15.1. We may notice that occasionally adding two digits

i xi + xi

1 32 3.73 3.784 3.7895 3.7900∗

6 3.790067 3.7900678 3.79006779 3.79006777

10 3.79006777611 3.790067776412 3.7900677764913 3.79006777649614 3.790067776496015 3.7900677764960916 3.790067776496101∗

17 3.790067776496101818 3.7900677764961018719 3.790067776496101881∗

20 3.7900677764961018825∗...

...

Fig. 15.1. Computing the decimal expansion of√

2 + 1043/439 by using thetruncated decimal sequences. Note the changes in the digits marked by the ∗where adding the new digits affects previous digits

affects the digits to the left, as in 0.9999 + 0.0001 = 1.000.Similarly, the difference x − x of two real numbers x = limi→∞ xi and

x = limi→∞ xi is of course defined by

x− x = limi→∞

(xi − xi).

15.3 Generalization to f(x, x) with f Lipschitz 199

15.3 Generalization to f(x, x) with f Lipschitz

We now generalize to other combinations of real numbers than addition.Suppose we want to combine x and x to a certain quantity f(x, x) depend-ing on x and x, where x and x are real numbers. For example, we maychoose f(x, x) = x + x, corresponding to determining the sum x + x oftwo real numbers x and x or f(x, x) = xx corresponding to multiplying xand x.

To be able to define f(x, x) following the idea used in the case f(x, x) =x+x, we suppose that f : Q×Q → Q is Lipschitz continuous. This is a verycrucial assumption and our focus on the concept of Lipschitz continuity islargely motivated by its use in the present context.

We know from Chapter Sequences and Limits that if x = limi→∞ xi andx = limi→∞ xi are rational, then

f(x, x) = f( limi→∞

xi, limi→∞

xi) = limi→∞

f(xi, xi)

If x = limi→∞ xi and x = limi→∞ xi are irrational, we simply decide touse this formula to define the real number f(x, x). This is possible, be-cause f(xi, xi) is a Cauchy sequence and thus defines a real number.Note that f(xi, xi) is a Cauchy sequence because xi and xi are bothCauchy sequences and f : Q×Q → Q is Lipschitz continuous. The formulacontaining this crucial information is

|f(xi, xi) − f(xj , xj)| ≤ L(|xi − xj | + |xi − xj |)

where L is the Lipschitz constant of f .Applying this reasoning to the case f : Q×Q → Q with f(x, x) = x+ x,

which is Lipschitz continuous with Lipschitz constant L = 1, we define thesum x+ x of two real numbers x = limi→∞ xi and x = limi→∞ xi by

x+ x = limi→∞

(xi + xi),

that islim

i→∞xi + lim

i→∞xi = lim

i→∞(xi + xi). (15.3)

This is exactly what we did above.We repeat, the important formula is

f( limi→∞

xi, limi→∞

xi) = limi→∞

f(xi, xi),

which we already know for x = limi→∞ xi and x = limi→∞ xi rational andwhich defines f(x, x) for x or x irrational. We also repeat that the Lipschitzcontinuity of f is crucial.

We may directly extend to Lipschitz functions f : I × J → Q, where Iand J are intervals of Q, under the assumption that xi ∈ I and xi ∈ J fori = 1, 2, . . .


15.4 Multiplying and Dividing Real Numbers

The function f(x, x) = xx is Lipschitz continuous for x ∈ I and x ∈ J ,where I and J are bounded intervals of Q. We may thus define the productxx of two real numbers x = limi→∞ xi and x = limi→∞ xi as follows:

xx = limi→∞

xixi.

The function f(x, x) = xx is Lipschitz continuous for x ∈ I and x ∈ J ,

if I and J are bounded intervals of Q and J is bounded away from 0. Wemay thus define the quotient x

x of two real numbers x = limi→∞ xi andx = limi→∞ xi with x = 0 by

x

x= lim

i→∞

xi

xi.

15.5 The Absolute Value

The function f(x) = |x| is Lipschitz continuous on Q. We may thus definethe absolute value |x| of a real number x = limi→∞ xi by

|x| = limi→∞

|xi|.

If xi is the sequence of truncated decimal expansions of x = limi→∞ xi,then by (15.1) we have |xj −xi| ≤ 10−i for j > i, and thus taking the limitas j tends to infinity,

|x− xi| ≤ 10−i for i = 1, 2, . . . (15.4)

15.6 Comparing Two Real Numbers

Let x = limi→∞ xi and x = limi→∞ xi be two real numbers with corre-sponding sequences of truncated decimal expansions xi∞i=1 and xi∞i=1.How can we tell if x = x? Is it necessary that xi = xi for all i? Not quite.For example, consider the two numbers x = 0.99999 · · · and x = 1.0000 · · · .In fact, it is natural to give a little more freedom and say that x = x if andonly if

|xi − xi| ≤ 10−i for i = 1, 2, . . . (15.5)

15.7 Summary of Arithmetic with Real Numbers 201

This condition is clearly sufficient to motivate to write x = x, since thedifference |xi − xi| becomes as small as we please by taking i large enough.In other words, we have

|x− x| = limi→∞

|xi − xi| = 0,

so x = x.Conversely if (15.5) does not hold, then there is a positive ε and i such

thatxi − xi > 10−i + ε or xi − xi < 10−i − ε.

Since |xi − xj | ≤ 10−i for j > i, we must then have

xj − xj > ε or xj − xj < −ε for j > i

and thus taking the limit as j tends to infinity

x− x ≥ ε or x− x ≤ −ε.

We conclude that two real numbers x and x either satisfy x = x, or x > xor x < x.

This conclusion, however, hides a subtle point. To know if two real num-bers are equal or not, may require a complete knowledge of the decimal ex-pansions, which may not be realistic. For example, suppose we set x = 10−p,where p is the decimal position of the start of the first sequence of 59 dec-imals all equal to 1 in the decimal expansion of

√2. To complete the defi-

nition of x, we set x = 0 if there is no such p. How are we to know if x > 0or x = 0, unless we happen to find that sequence of 59 decimals all equalto 1 among say the first 1050 decimals, or whatever number of decimals of√

2 we can think of possibly computing. In a case like this, it seems morereasonable to say that we cannot know if x = 0 or x > 0.

15.7 Summary of Arithmetic with Real Numbers

With these definitions, we can easily show that the usual commutative,distributive, and associative rules for rational numbers all hold for realnumbers. For example, addition is commutative since

x+ x = limi→∞

(xi + xi) = limi→∞

(xi + xi) = x+ x.

15.8 Why√

2√

2 Equals 2

Let xi and Xi be the sequences given by the Bisection algorithm ap-plied to the equation x2 = 2 constructed above. We have defined

√2 = lim

i→∞xi, (15.6)


that is, we denote by√

2 the infinite non-periodic decimal expansion givenby the Bisection algorithm applied to the equation x2 = 2.

We now verify that√

2√

2 = 2, which we left open above. By the defini-tion of multiplication of real numbers, we have

√2√

2 = limi→∞

x2i , (15.7)

and we thus need to show that

limi→∞

x2i = 2 (15.8)

To prove this fact, we use the Lipschitz continuity of the function x → x2

on [0, 2] with Lipschitz constant L = 4, to see that

|(xi)2 − (Xi)2| ≤ 4|xi −Xi| ≤ 2−i+2.

where we use the inequality |xi −Xi| ≤ 2−i. By construction x2i < 2 < X2

i ,and thus in fact

|x2i − 2| ≤ 2−i+2

which shows thatlim

i→∞(xi)2 = 2

and (15.8) follows.We summarize the approach used to compute and define

√2 as follows:

We use the Bisection Algorithm applied to the equation x2 = 2 to de-fine a sequence of rational numbers xi∞i=0 that converges to a limit,which we denote by

√2 = limi→∞ xi.

We define√

2√

2 = limi→∞(xi)2.

We show that limi→∞(xi)2 = 2.

We conclude that√

2√

2 = 2 which means that√

2 solves the equationx2 = 2.

15.9 A Reflection on the Nature of√

2

We may now return to comparing the following two definitions of√

2:

1.√

2 is “that thing” which when squared is equal to 2

2.√

2 is the name of the decimal expansion given by the sequencexi∞i=1 generated by the Bisection algorithm for the equation x2 = 2,which with a suitable definition of multiplication satisfies

√2√

2 = 2.

15.10 Cauchy Sequences of Real Numbers 203

This is analogous to the following two definitions of 12 :

1. 12 is “that thing” which when multiplied by 2 equals 1

2. 12 is the ordered pair (1, 2) which with a suitable definition of multi-plication satisfies the equation (2, 1) × (1, 2) = (1, 1).

We conclude that in both cases the meaning 1. could be criticized forbeing unclear in the sense that no clue is given to what “that thing” isin terms of already known things, and that the definition appears circularand eventually seems to be just a play with words. We conclude that onlythe definition 2. has a solid constructive basis, although we may intuitivelyuse 1. when we think.

Occasionally, we can do computations including√

2, where we only needto use that (

√2)2 = 2, and we do not need the decimal expansion of

√2.

For example, we can verify that (√

2)4 = 4 by only using that (√

2)2 = 2without knowing a single decimal of

√2. In this case we just use

√2 as

a symbol for “that thing which squared equals 2”. It is rare that this kindof symbolic manipulation only, leads to the end and gives a definite answer.

We note that the fact that√

2 solves the equation x2 = 2 includes somekind of convention or agreement or definition. What we actually did wasto show that the truncated decimal expansions of

√2 when squared could

be made arbitrarily close to 2. We took this as a definition, or agreement,that (

√2)2 = 2. Doing this, solved the dilemma of the Pythagoreans, and

thus we may (almost) say that we solved the problem by agreeing that theproblem did not exist. This may be the only way out in some (difficult)cases.

In fact, the standpoint of the famous philosopher Wittgenstein was thatthe only way to solve philosophical problems was to show (after much work)that in fact the problem at hand does not exist. The net result of this kind ofreasoning would appear to be zero: first posing a problem and then showingthat the problem does not exist. However, the process itself of coming to thisconclusion would be considered as important by giving added insight, notso much the result. This approach also could be fruitful outside philosophyor mathematics.

15.10 Cauchy Sequences of Real Numbers

We may extend the notion of sequence and Cauchy sequence to real num-bers. We say that xi∞i=1 is a sequence of real numbers if the elements xi

are real numbers. The definition of convergence is the same as for sequencesof rational numbers. A sequence xi∞i=1 of real numbers converges to a realnumber x if for any ε > 0 there is a natural number N such that |xi−x| < εfor i ≥ N and we write x = limi→∞ xi.


We say that a sequence xi∞i=1 of real numbers is a Cauchy sequence iffor all ε > 0 there is a natural number N such that

|xi − xj | ≤ ε for i, j ≥ N. (15.9)

If xi∞i=1 is a converging sequence of real numbers with limit x =limi→∞ xi, then by the Triangle Inequality,

|xi − xj | ≤ |x− xi| + |x− xj |,

where we wrote xi − xj = xi − x + x − xj . This proves that xi∞i=1 isa Cauchy sequence. We state this (obvious) result as a theorem.

Theorem 15.1 A converging sequence of real numbers is a Cauchy se-quence of real numbers.

A Cauchy sequence of real numbers determines a decimal expansion justin the same way as a sequence of rational numbers does. We may assume,possibly by deleting elements and changing the indexing, that a Cauchysequence of real numbers satisfies |xi − xj | ≤ 2−i for j ≥ i.

We conclude that a Cauchy sequence of real numbers converges to a realnumber. This is a fundamental result about real numbers which we stateas a theorem.

Theorem 15.2 A Cauchy sequence of real numbers converges to a uniquereal number.

The use of Cauchy sequences has been popular in mathematics since thedays of the great mathematician Cauchy in the first half of the 19th century.Cauchy was a teacher at Ecole Polytechnique in Paris, which was createdby Napoleon and became a model for technical universities all over Europe(Chalmers 1829, Helsinki 1849, . . . ). Cauchy’s reform of the engineeringCalculus course including his famous Cours d’Analyse also became a model,which permeates much of the Calculus teaching still today.

15.11 Extension from f : Q → Q to f : R → R

In this section, we show how to extend a given Lipschitz continuous functionf : Q → Q, to a function f : R → R. We thus assume that f(x) is defined forx rational, and that f(x) is a rational number, and we shall now show howto define f(x) for x irrational. We shall see that the Lipschitz continuityis crucial in this extension process. In fact, much of the motivation forintroducing the concept of Lipschitz continuity, comes from its use in thiscontext.

We have already met the basic issues when defining how to compute withreal numbers, and we follow the same idea for a general function f : Q → Q.

15.12 Lipschitz Continuity of Extended Functions 205

If x = limi→∞ xi is an irrational real number, with the sequence xi∞i=1

being the truncated decimal expansions of x, we define f(x) to be the realnumber defined by

f(x) = limi→∞

f(xi). (15.10)

Note that by the Lipschitz continuity of f(x) with Lipschitz constant L,we have

|f(xi) − f(xj)| ≤ L|xi − xj |,

which shows that the sequence f(xi)∞i=1 is a Cauchy sequence, becausexi∞i=1 is a Cauchy sequence Thus limi→∞ f(xi) exists and defines a realnumber f(x). This defines f : R → R, and we say that this function isthe extension of f : Q → Q, from the rational numbers Q to the realnumbers R.

Similarly, we can generalize and extend a Lipschitz continuous functionf : I → Q, where I = x ∈ Q : a ≤ x ≤ b is an interval of rationalnumbers, to a function f : J → R, where J = x ∈ R : a ≤ x ≤ b is thecorresponding interval of real numbers. Evidently, the extended functionf : J → R satisfies:

f( limi→∞

xi) = limi→∞

f(xi), (15.11)

for any convergent sequence xi in J (with automatically limi→∞ xi ∈ Jbecause J is closed).

15.12 Lipschitz Continuity of Extended Functions

If f : Q → Q is Lipschitz continuous with Lipschitz constant Lf , then itsextension f : R → R is also Lipschitz continuous with the same Lipschitzconstant Lf . This is because if x = limi→∞ xi and y = limi→∞ yi, then

|f(x) − f(y)| =∣∣∣ limi→∞

(f(xi) − f(yi))∣∣∣ ≤ L lim

i→∞|xi − yi| = L|x− y|.

It is now straightforward to show that the properties of Lipschitz contin-uous functions f : Q → Q stated above hold for the corresponding extendedfunctions f : R → R. We summarize in the following theorem

Theorem 15.3 A Lipschitz continuous function f : I → R, where I =[a, b] is an interval of real numbers, satisfies:

f( limi→∞

xi) = limi→∞

f(xi), (15.12)

for any convergent sequence xi in I. If f : I → R and g : I → R

are Lipschitz continuous, and α and β are real numbers, then the linearcombination αf(x) + βg(x) is Lipschitz continuous on I. If the interval


I is bounded, then f(x) and g(x) are bounded and f(x)g(x) is Lipschitzcontinuous on I. If I is bounded and moreover |g(x)| ≥ c > 0 for all x inI, where c is some constant, then f(x)/g(x) is Lipschitz continuous on I.

Example 15.1. We can extend any polynomial to be defined on the realnumbers. This is possible because a polynomial is Lipschitz continuous onany bounded interval of rational numbers.

Example 15.2. The previous example means that we can extend f(x) = xn

to the real numbers for any integer n. We can also show that f(x) = x−n isLipschitz continuous on any closed interval of rational numbers that doesnot contain 0. Therefore f(x) = xn can be extended to the real numbers,where n is any integer provided that when n < 0, x = 0.

15.13 Graphing Functions f : R → R

Graphing a function f : R → R follows the same principles as graphinga function f : Q → Q.

15.14 Extending a Lipschitz Continuous Function

Suppose f : (a, b] → R is Lipschitz continuous with Lipschitz constantLf on the half-open interval (a, b], but that the value of f(a) has notbeen defined. Is there a way to define f(a) so that the extended functionf : [a, b] → R is Lipschitz continuous? Yes, there is. To see this we letxi∞i=1 be a sequence of real numbers in (a, b] converging to a, that islimi→∞ xi = a. The sequence xi∞i=1 is a Cauchy sequence, and becausef(x) is Lipschitz continuous on (a, b], so is the sequence f(xi)∞i=1, andthus limi→∞ f(xi) exists and we may then define f(a) = limi→∞ f(xi).It follows readily that the extended function f : [a, b] → R is Lipschitzcontinuous with the same Lipschitz constant.

We give an application of this idea arising when considering quotientsof two functions. Clearly, we must avoid points at which the denominatoris zero and the numerator is nonzero. However, if both the numerator anddenominator are zero at a point, the function can be extended to that pointif the quotient function is Lipschitz continuous off the point. We give firsta “trivial” example.

Example 15.3. Consider the quotient

x− 1x− 1

with domain x ∈ R : x = 1. Since

x− 1 = 1 × (x− 1) (15.13)

15.15 Intervals of Real Numbers 207

for all x, it is natural to “divide” the polynomials to get

x− 1x− 1

= 1. (15.14)

However, the domain of the constant function 1 is R so the left- and right-hand sides of (15.14) have different domains and therefore must representdifferent functions. We plot the two functions in Fig. 15.2. We see that thetwo functions agree at every point except for the “missing” point x = 1.

11

Fig. 15.2. Plots of (x− 1)/(x− 1) on the left and 1 on the right

Example 15.4. Since x2 − 2x− 3 = (x− 3)(x+ 1), we have for x = 3 that

x2 − 2x− 3x− 3

= x+ 1.

The function (x2 − 2x − 3)/(x − 3) defined for x ∈ R : x = 3 may beextended to the function x+ 1 defined for all x in R.

Note that the fact that two functions f1 and f2 are zero at the same pointsdoes not mean that we can automatically replace their quotient by a func-tion defined at all points.


x− 1(x− 1)2

,

defined for x ∈ R : x = 1, is equal to the function 1/(x− 1) also definedon x ∈ R : x = 1, which cannot be extended to x = 1.

15.15 Intervals of Real Numbers

Let a and b be two real numbers with a < b. The set of real numbersx such that x > a and x < b, that is x ∈ R : a < x < b, is calledthe open interval between a and b and is denoted by (a, b). Graphically


we draw a thick line on the number line connecting little circles drawn atpositions a and b. We illustrate in Fig. 15.3. The word “open” refers tothe strict inequality defining (a, b) and we use the curved parentheses “(”and the open circle on the number line to mark this. a and b are called theendpoints of the interval. An open interval does not contain its endpoints.The closed interval [a, b] is the set x : a ≤ x ≤ b and is denoted on thenumber line using solid circles. Note the use of square parentheses “[” whenthe inequalities are not strict. A closed interval does contain its endpoints.Finally, we can have half-open intervals with one end open and the otherclosed, such as (a, b] = x : a < x ≤ b. See Fig. 15.3.

a b

a < x < b

a b

a ≤ x < b

a b

a < x ≤ b

a b

a ≤ x ≤ b

Fig. 15.3. Intervals corresponding to the real numbers between two real numbersa and b. Note the use of a solid and closed circles in the four cases

We also have “infinite” intervals such as (−∞, a) = x : x < a and[b,∞) = x : b ≤ x. We illustrate these in Fig. 15.4. With this notation,we denote the set of real numbers by R = (−∞,∞).

Clearly, we can now consider Lipschitz continuous functions f : I → R

defined on intervals I of R.

a

x < a

b

b ≤ x

Fig. 15.4. Infinite intervals (−∞, a) and [b,∞)

15.16 What Is f(x) if x Is Irrational?

Note that if x is irrational, then the process of determining the sequence oftruncated decimal expansions of x and f(x) is carried out in parallel. Themore decimals we have of x, the more decimals we get of f(x). This is sim-ply because f(x) = limi→∞ f(xi) with xi∞i=1 the sequence of truncateddecimal expansions of x. This is obvious from Fig. 15.5. This means thatthe conventional idea of viewing f(x) as a function of x comes into a new

15.16 What Is f(x) if x Is Irrational? 209

light. In the traditional way of thinking of a function f(x), we think of xas given and then associating the value f(x) to x. We may even write thisas x→ f(x) indicating that we go from x to f(x).

However, we just noticed that when x is irrational, we cannot start fromknowing all the decimals of x, and then determine f(x). Instead, we deter-mine successively the decimal expansions xi and the corresponding functionvalues f(xi), that is, we may write xi → f(xi) for i = 1, 2, . . ., but not re-ally x→ f(x). We rather jump back and forth between approximations xi

of x and approximations f(xi) of f(x). This means that we do not haveexact knowledge of x when we compute f(x). In order to make this processto be meaningful, we need the function f(x) to be Lipschitz continuous. Inthis case, small changes in x cause small changes in f(x), and the extensionprocess is possible.

Example 15.6. We evaluate f(x) = .4x3−x for x =√

2 using the truncateddecimal sequence xi in Fig. 15.5.

i xi .4x3i − xi

1 1 −.62 1.4 .09763 1.41 .12128844 1.414 .13085837765 1.4142 .13133830051526 1.41421 .13136230022458447 1.414213 .13136950020358463888 1.4142135 .131370700203054524159 1.41421356 .1313708442030479314744064

10 1.414213562 .1313708490030479221535281312...

......

Fig. 15.5. Computing the decimal expansion of f(√

2) for f(x) = .4x3 − x byusing the truncated decimal sequence

This leads to the idea that we can only talk about Lipschitz continuousfunctions. If some association of x-values to values f(x) is not Lipschitzcontinuous, this association should not deserve to be called a function. Weare thus led to the conclusion that all functions are Lipschitz continuous(more or less).

This statement would be shocking to many mathematicians, who areused to work with discontinuous functions day and night. In fact, in largeparts of mathematics (e.g. integration theory), a lot of attention is payedto extremely discontinuous “functions”, like the following popular one

f(x) = 0 if x is rational,f(x) = 1 if x is irrational.


Whatever this is, it is not a Lipschitz function, and thus from our perspec-tive, we would not consider it to be a function at all. This is because forsome arguments x it may be extremely difficult to know if x is rationalor irrational, and then we would not know which of the vastly differentfunction values f(x) = 0 or f(x) = 1 to choose. To be able to determine ifx is rational or not, we may have to know the infinite decimal expansionof x, which may be impossible to us as human beings. For example, if wedidn’t know the smart argument showing that x =

√2 cant be rational,

we would not be able to tell from any truncated decimal expansion of√

2whether f(x) = 0 or f(x) = 1.

We would even get into trouble trying to define the following “func-tion” f(x)

f(x) = a if x < x,

f(x) = b if x ≥ x,

with a “jump” at x from a value a to a different value b. If x is irrational,we may lack complete knowledge of all the decimals of x, and it may bevirtually impossible to determine for a given x if x < x or x ≥ x. It would bemore natural to view the “function with a jump” as two functions composedof one Lipschitz function

f(x) = a if x ≤ x,

and another Lipschitz function

f(x) = b if x ≥ x,

with two possible values a = b for x = x: the value a from the left (x ≤ x),and the value b from the right (x ≥ x), see Fig. 15.6.

"+" "="

xxx

Fig. 15.6. A “jump function” viewed as two functions

It thus seems that we have to reject the very idea that a function f(x) canbe discontinuous. This is because we cannot assume that we know x exactly,and thus we can only handle a situation where small changes in x causessmall changes in f(x), which is the essence of Lipschitz continuity. Insteadwe are led to handle functions with jumps as combinations of Lipschitzcontinuous functions with two possible values at the jumps, one value fromthe right and another value from the left.

15.17 Continuity Versus Lipschitz Continuity 211

15.17 Continuity Versus Lipschitz Continuity

As indicated, we use a definition of continuity (Lipschitz continuity), whichdiffers from the usual definition met in most Calculus texts. We recall thebasic property of a Lipschitz continuous function f : I → R:

f(

limi→∞

xi

)= lim

i→∞f(xi), (15.15)

for any convergent sequence xi in I with limi→∞ xi ∈ I. Now, the stan-dard definition of continuity of a function f : I → R starts at the relation(15.15), and reads as follows: The function f : I → R is said to be con-tinuous on I (according to the standard definition) if (15.15) holds forany convergent sequence xi in I with limi→∞ xi ∈ I. Apparently, a Lips-chitz continuous function is continuous according to the standard definition,while the opposite implication does not have to be true. In other words, weuse a somewhat more stringent definition than the standard one.

The standard definition satisfies a condition of maximality (attractive tomany pure mathematicians), but suffers from an (often confusing) use oflimits. In fact, the intuitive idea of “continuous dependence” of functionvalues f(x) of a real variable x, can be expressed as “f(x) is close to f(y)whenever x is close to y”, of which Lipschitz continuity gives a quantitativeprecise formulation, while the connection in the standard definition is morefarfetched. Right?

Chapter 15 Problems

15.1. Define a “sentence” to be any combination of 500 characters consisting of26 letters and spaces lined up in a row. Compute (approximately) the number ofpossible sentences.

15.2. Suppose that x and y are two real numbers and xi and yi are thesequences generated by truncating their decimal expansions. Using (7.14) and(15.4), obtain estimates on (a) |(x+ y)− (xi + yi)| and (b) |xy − xiyi|. Hint: for(b), use that xy − xiyi = (x− xi)y + xi(y − yi), and explain why (15.4) impliesthat for i sufficiently big, |xi| ≤ |x| + 1.

15.3. Find i as small as possible such that |xy − xiyi| ≤ 10−6 if x ≈ 100 andy ≈ 1. Find i and j as small as possible such that |xy − xiyj | ≤ 10−6

15.4. Let x = .37373737 · · · and y =√

2 and xi and yi be the sequencesgenerated by truncating their decimal expansions. Compute the first 10 terms ofthe sequences defining x + y and y − x and the first 5 terms of the sequencesdefining xy and x/y. Hint: follow the example in Fig. 15.1.

15.5. Let x be the limit of the sequence

i

i+ 1

. Is x < 1?. Give a reason for

your answer.


15.6. Let x be the limit of the sequence of rational numbers xi where thefirst i− 1 decimal places of xi agree with the first i− 1 decimal places of

√2, the

i’th decimal place is equal to 3, and the rest of the decimal places are zero. Isx =

√2? Give a reason for your answer.

15.7. Let x, y, and z be real numbers. Show the following properties hold.

(a) x < y and y < z implies x < z.

(b) x < y implies x+ z < y + z.

(c) x < y implies −x > −y.

15.8. Find the set of x that satisfies (a) |√

2x− 3| ≤ 7 and (b) |3x − 6√

2| > 2.

15.9. Verify that the triangle inequality (7.14) extends to real numbers s and t.

15.10. (Harder) (a) If p is a rational number, x is a real number, and xi isany sequence of rational numbers that converges to x, show that p < x impliesthat p < xi for all i sufficiently large. (b) If x and y are real numbers and yiis any sequence that converges to y, show that x < y implies x < yi for all isufficiently large.

15.11. Show that the following sequences are Cauchy sequences.

(a)

1

(i+ 1)2

(b)

4 − 1

2i

(c)

i

3i+ 1

15.12. Show that the sequence i2 is not a Cauchy sequence.

15.13. Let xi denote the sequence of real numbers defined by

x1 = .373373337 · · ·x2 = .337733377333377 · · ·x3 = .333777333377733333777 · · ·x4 = .333377773333377773333337777 · · ·

...

(a) Show that the sequence is a Cauchy sequence and (b) find limi→∞

xi. This shows

that a sequence of irrational numbers can converge to a rational number.

15.14. Can a number of the form sx+ t, with s and t rational and x irrational,be rational?

15.15. Let xi and yi be Cauchy sequences with limits x and y respectively.(a) Show that xi−yi is a Cauchy sequence and compute its limit. (b) Assuming

there is a constant c such that yi ≥ c > 0 for all i, show that

xiyi

is a Cauchy

sequence and compute its limit.


15.16. Show that a sequence that converges is a Cauchy sequence. Hint: if xiconverges to x, write xi−xj = (xi−x)+(x−xj) and use the triangle inequality.

15.17. (Harder) Let xi be an increasing sequence, xi−1 ≤ xi, which is boundedabove, i.e. there is a number c such that xi ≤ c for all i. Prove that xi con-verges. Hint: Use a variation of the argument for the convergence of the bisectionalgorithm.

15.18. Compute the first 5 terms of the sequence that defines the value of the

function f(x) =x

x+ 2at x =

√2. Hint: follow Fig. 15.5 and use the evalf function

of MAPLE in order to determine all the digits.

15.19. Let xi be the sequence with xi = 3 − 2

iand f(x) = x2 − x. What is

the limit of the sequence f(xi)?

15.20. Show that |x| is Lipschitz continuous on the real numbers R.

15.21. Let n be a natural number. Show that1

xnis Lipschitz continuous on

the set of rational numbers Q = x : .01 ≤ x ≤ 1 and find a Lipschitz constantwithout using Theorem 15.3. Hint: Use the identity

xn2 − xn1 = (x2 − x1)(xn−1

2 + xn−22 x1 + xn−3

2 x21

+ · · · + x22xn−31 + x2x

n−21 + xn−1

1

)

= (x2 − x1)

n−1∑

j=0

xn−1−j2 xj1

after showing that it is true. Note there are n terms in the last sum, the Lipschitzconstant definitely depends on n.

15.22. Show Theorem 15.3 is true.

15.23. Write each of the following sets using the interval notation and thenmark the sets on a number line.

(a) x : −2 < x ≤ 4 (b) x : −3 < x < −1 ∪ x : −1 < x ≤ 2(c) x : x = −2, 0 ≤ x (b) x : x < 0 ∪ x : x > 1

15.24. Produce an interval that contains all the points 3 − 10−j for j ≥ 0 butdoes not contain 3.

15.25. Using MATLAB or MAPLE, graph the following functions on onegraph: y = 1 × x, y = 1.4 × x, y = 1.41 × x, y = 1.414 × x, y = 1.4142 × x,y = 1.41421 × x. Use your results to explain how you could graph the functiony =

√2 × x.

15.26. (a) Give a definition of an interval (a, b) where a and b are real numbersin terms of intervals with rational endpoints. (b) Do the same for [a, b].


15.27. Explain why there are infinitely many real numbers between any twodistinct real numbers by giving a systematic way to write them down. Hint: firstconsider the case when the two distinct numbers are integers and work one digitat a time.

15.28. Find the Lipschitz constant of the function f(x) =√x with D(f) =

(δ,∞) for given δ > 0.

The aim of Book X of Euclid’s treatise on the “Elements” is to in-vestigate the commensurable and the incommensurable, the rationaland irrational continuous quantities. This science has its origin inthe school of Pythagoras, but underwent an important developmentin the hands of the Athenian, Theaetetus, who is justly admired forhis natural aptitude in this as in other branches of mathematics.One of the most gifted of men, he patiently pursued the investiga-tion of truth contained in these branches of science . . . and was inmy opinion the chief means of establishing exact distinctions andirrefutable proofs with respect to the above mentioned quantities.(Pappus 290–350 (about))

16The Bisection Algorithm for f(x) = 0

Divide ut regnes (divide and conquer). (Machiavelli 1469–1527)

16.1 Bisection

We now generalize the Bisection algorithm used above to compute thepositive root of the equation x2 − 2 = 0, to compute roots of the equation

f(x) = 0 (16.1)

where f : R → R is a Lipschitz continuous function. The Bisection algo-rithm reads as follows, where TOL is a given positive tolerance:

1. Choose initial values x0 andX0 with x0 < X0 so that f(x0)f(X0) < 0.Set i = 1.

2. Given two rational numbers xi−1 < Xi−1 with the property thatf(xi−1)f(Xi−1) < 0, set xi = (xi−1 +Xi−1)/2.

If f(xi) = 0, then stop.

If f(xi)f(Xi−1) < 0, then set xi = xi and Xi = Xi.

If f(xi)f(xi−1) < 0, then set xi = xi and Xi = xi.

3. Stop if Xi − xi ≤ TOL.

4. Increase i by 1 and go back to step 2.

216 16. The Bisection Algorithm for f(x) = 0

The equation f(x) = 0 may have many roots, and the choice of initialapproximations x0 and X0 such that f(x0)f(X0) ≤ 0 restricts the searchfor one or more roots to the interval [x0, X0]. To find all roots of an equationf(x) it may be necessary to systematically search for all the possible startintervals [x0, X0].

The proof that the Bisection algorithm converges is the same as thatgiven above in the special case when f(x) = x2 − 0. By construction, wehave after i steps, assuming that we don’t stop because f(xi) = 0 andX0 − x0 = 1, that

0 ≤ Xi − xi ≤ 2−i,

and as before that|xi − xj | ≤ 2−i if j ≥ i.

Again, xi∞i=1 is a Cauchy sequence and thus converges to a unique realnumber x, and by construction

|xi − x| ≤ 2−i and |Xi − x| ≤ 2−i.

It remains to show that x is a root of f(x) = 0, that is, we have to showthat f(x) = 0. By definition f(x) = f(limi→∞ xi) = limi→∞ f(xi) and thuswe need to show that limi→∞ f(xi) = 0. To this end we use the Lipschitzcontinuity to see that

|f(xi) − f(Xi)| ≤ L|xi −Xi| ≤ L2−i.

Since f(xi)f(Xi) < 0, that is the signs of f(xi) and f(Xi) are different,this proves that in fact

|f(xi)| ≤ L2−i (and also |f(Xi)| ≤ L2−i),

which proves that limi→∞ f(xi) = 0, and thus f(x) = limi→∞ f(xi) = 0 aswe wanted to show.

We summarize this as a theorem, which is known as Bolzano’s Theoremafter the Catholic priest B. Bolzano (1781–1848), who was one of the firstpeople to work out analytic proofs of properties of continuous functions.

Theorem 16.1 (Bolzano’s Theorem) If f : [a, b] → R is Lipschitzcontinuous and f(a)f(b) < 0, then there is a real number x ∈ [a, b] suchthat f(x) = 0.

One consequence of this theorem is called the Intermediate Value The-orem, which states that if g(x) is Lipschitz continuous on an interval [a, b]then g(x) takes on every value between g(a) and g(b) at least once as xvaries over [a, b]. This follows applying Bolzanos theorem to the functionf(x) = g(x) − y = 0, where y lies between f(a) and f(b).

Theorem 16.2 (The Intermediate Value Theorem) If f : [a, b] → R

is Lipschitz continuous then for any real number y in the interval betweenf(a) and f(b), there is a real number x ∈ [a, b] such that f(x) = y.

16.2 An Example 217

Fig. 16.1. Bernard Placidus Johann Nepomuk Bolzano 1781–1848, Czech math-ematician, philosopher and catholic priest: “My special pleasure in mathemat-ics rested therefore particularly on its purely speculative parts, in other wordsI prized only that part of mathematics which was at the same time philosophy”

16.2 An Example

As an application of the Bisection algorithm, we compute the roots of thechemical equilibrium equation (7.13) in Chapter Rational Numbers,

S (.02 + 2S)2 − 1.57 × 10−9 = 0. (16.2)

We show a plot of the function involved in Fig. 16.2. Apparently thereare roots near −.01 and 0, but to compute them it seems advisable to first

-.02 .02

-0.000008

0.000032

0.000072

Fig. 16.2. A plot of the function S (.02 + 2S)2 − 1.57 × 10−9


rescale the equation. We then first multiply both sides of (16.2) by 109 toget

109 × S (.02 + 2S)2 − 1.57 = 0,

and write

109 × S (.02 + 2S)2 = 103 × S × 106 × (.02 + 2S)2

= 103 × S ×(103

)2 × (.02 + 2S)2 = 103 × S ×(103 × (.02 + 2S)

)2

= 103 × S ×(20 + 2 × 103 × S

)2.

If we name a new variable x = 103S, then we obtain the following equationto solve

f(x) = x(20 + 2x)2 − 1.57 = 0. (16.3)

The polynomial f(x) has more reasonable coefficients and the roots arenot nearly as small as in the original formulation. If we find a root x off(x) = 0, then we can find the physical variable S = 10−3x. We note thatonly positive roots can have any meaning in this model, since we cannothave “negative” solubility.

The function f(x) is a polynomial and thus is Lipschitz continuous onany bounded interval, and thus the Bisection algorithm can be used tocompute its roots. We plot f(x) in Fig. 16.3. It appears that f(x) = 0might have one root near 0 and another root near −10.

-11 -5.3 .4

-600

-250

100

Fig. 16.3. A plot of the function f(x) = x(20 + 2x)2 − 1.57

To compute a positive root, we now choose x0 = −.1 and X0 = .1 andapply the Bisection algorithm for 20 steps. We show the results in Fig. 16.4.This suggests that the root of (16.3) is x ≈ .00392 or S ≈ 3.92 × 10−6.

16.3 Computational Cost 219

i xi Xi

0 −0.10000000000000 0.100000000000001 0.00000000000000 0.100000000000002 0.00000000000000 0.050000000000003 0.00000000000000 0.025000000000004 0.00000000000000 0.012500000000005 0.00000000000000 0.006250000000006 0.00312500000000 0.006250000000007 0.00312500000000 0.004687500000008 0.00390625000000 0.004687500000009 0.00390625000000 0.0042968750000010 0.00390625000000 0.0041015625000011 0.00390625000000 0.0040039062500012 0.00390625000000 0.0039550781250013 0.00390625000000 0.0039306640625014 0.00391845703125 0.0039306640625015 0.00391845703125 0.0039245605468816 0.00392150878906 0.0039245605468817 0.00392150878906 0.0039230346679718 0.00392150878906 0.0039222717285219 0.00392189025879 0.0039222717285220 0.00392189025879 0.00392208099365

Fig. 16.4. 20 steps of the Bisection algorithm applied to (16.3) using x0 = −.1and X0 = .1

16.3 Computational Cost

We applied the Deca-section to compute√

2 above. Of course we can usethis method also for computing the root of a general equation. Once wehave more than one method to compute a root of a equation, it is natural toask which method is “best”. We have to decide what we mean by “best” ofcourse. For this problem, best might mean “most accurate” or “cheapest”for example. The exact criteria depends on our needs.

The criteria may depend on many things, such as the the level of accu-racy to try to achieve. Of course, this depends on the application and thecomputational cost. In the Muddy Yard Model, a couple of decimal placesis certainly sufficient from a practical point of view. If we actually tried tomeasure the diagonal using a tape measure for example, we would only getto within a few centimeters of the true value even neglecting the difficultyof measuring along a straight line. For more accuracy, we could use a laserand measure the distance to within a couple of wavelengths, and thus wemight want to compute with a corresponding precision of many decimals.This would of course be overkill in the present case, but could be neces-sary in applications to e.g. astronomy or geodesic (for instance continental


drift). In physics there is a strong need to compute certain quantities withmany digits. For example one would like to know the mass of the electronvery accurately. In applications of mechanics, a couple of decimals in thefinal answer may often be enough.

For the Deca-section and Bisection algorithms, accuracy is apparentlynot an issue, since both algorithms can be executed until we get 16 places orwhatever number of digits is used for floating point representation. There-fore the way to compare the methods is by the amount of computing timeit takes to achieve a given level of accuracy. This computing time is oftencalled the cost of the computation, a left-over from the days when computertime was actually purchased by the second.

The cost involved in one of these algorithms can be determined by fig-uring out the cost per iteration step and then multiplying by the totalnumber of steps we need to reach the desired accuracy. In one step of theBisection Algorithm, the computer must compute the midpoint betweentwo points, evaluate the function f at that point and store the value tem-porarily, check the sign of the function value, and then store the new xi

and Xi. We assume that the time it takes for the computer to do each ofthese operations can be measured and we define

Cm = cost of computing the midpointCf = cost of evaluating f at a point

C± = cost of checking the sign of a variableCs = cost of storing a variable.

The total cost of one step of the bisection algorithm is Cm +Cf +C±+4Cs,and the cost after Nb steps is

Nb(Cm + Cf + C± + 4Cs). (16.4)

One step of the Deca-section algorithm has a considerably higher cost be-cause there are 9 intermediate points to check. The total cost after Nd stepsof the Deca-section algorithm is

Nd(9Cm + 9Cf + 9C± + 20Cs). (16.5)

On the other hand, the difference |xi − x| decreases by a factor of 1/10after each step of the Deca-section algorithm as compared to a factor of1/2 after each step of the Bisection algorithm. Since 1/23 > 1/10 > 1/24,this means that the Bisection algorithm requires between 3 and 4 times asmany steps as the Deca-section algorithm in order to reduce the initial size|x0− x| by a given factor. So Nb ≈ 4Nd. This gives the cost of the BisectionAlgorithm as

4Nd(Cm + Cf + C± + 4Cs) = Nd(4Cm + 4Cf + 4C± + 16Cs)

as compared to (16.5). This means that the Bisection algorithm is cheaperto use than the Deca-section algorithm.

17Do Mathematicians Quarrel?*

The proofs of Bolzano’s and Weierstrass theorems have a decidedlynon-constructive character. They do not provide a method for actu-ally finding the location of a zero or the greatest or smallest value ofa function with a prescribed degree of precision in a finite number ofsteps. Only the mere existence, or rather the absurdity of the non-existence, of the desired value is proved. This is another importantinstance where the “intuitionists” have raised objections; some haveeven insisted that such theorems be eliminated from mathematics.The student of mathematics should take this no more seriously thandid most of the critics. (Courant)

I know that the great Hilbert said “We will not be driven out fromthe paradise Cantor has created for us”, and I reply “I see no reasonto walking in”. (R. Hamming)

There is a concept which corrupts and upsets all others. I refer not tothe Evil, whose limited realm is that of ethics; I refer to the infinite.(Borges).

Either mathematics is too big for the human mind or the humanmind is more than a machine. (Godel)

17.1 Introduction

Mathematics is often taught as an “absolute science” where there is a cleardistinction between true and false or right and wrong, which should be uni-versally accepted by all professional mathematicians and every enlightened

222 17. Do Mathematicians Quarrel?*

layman. This is true to a large extent, but there are important aspects ofmathematics where agreement has been lacking and still is lacking. Thedevelopment of mathematics in fact includes as fierce quarrels as any otherscience. In the beginning of the 20th century, the very foundations of math-ematics were under intense discussion. In parallel, a split between “pure”and “applied” mathematics developed, which had never existed before. Tra-ditionally, mathematicians were generalists combining theoretical mathe-matical work with applications of mathematics and even work in mechanics,physics and other disciplines. Leibniz, Lagrange, Gauss, Poincare and vonNeumann all worked with concrete problems from mechanics, physics anda variety of applications, as well as with theoretical mathematical questions.

In terms of the foundations of mathematics, there are different “math-ematical schools” that view the basic concepts and axioms somewhat dif-ferently and that use somewhat different types of arguments in their work.The three principal schools are the formalists, the logicists and finally theintuitionists, also known as the constructivists.

As we explain below, we group both the formalists and the logiciststogether under an idealistic tradition and the the constructivists undera realistic tradition. It is possible to associate the idealistic tradition to an“aristocratic” standpoint and the realistic tradition to a “democratic” one.The history of the Western World can largely be be viewed as a battle be-tween an idealistic/aristochratic and a realistic/democratic tradition. TheGreek philosopher Plato is the portal figure of the idealistic/aristocratictradition, while along with the scientific revolution initiated in the 16thcentury, the realistic/democratic tradition has taken a leading role in oursociety.

The debate between the formalists/logicists and the constructivists cul-minated in the 1930s, when the program put forward by the formalists andlogicists suffered a strong blow from the logician Kurt Godel. Godel showed,to the surprise of world including great mathematicians like Hilbert, thatin any axiomatic mathematical theory containing the axioms for the natu-ral numbers, there are true facts which cannot be proved from the axioms.This is Godel’s famous incompleteness theorem.

Alan Turing (1912–54, dissertation at Kings College, Cambridge 1935)took up a similar line of thought in the form of computability of real num-bers in his famous 1936 article On Computable Numbers, with an appli-cation to the Entscheidungsproblem. In this paper Turing introduced anabstract machine, now called a Turing machine, which became the proto-type of the modern programmable computer. Turing defined a computablenumber as real number whose decimal expansion could be produced bya Turing machine. He showed that π was computable, but claimed thatmost real numbers are not computable. He gave gave examples of “unde-cidable problems” formulated as the problem if the Turing machine wouldcome to a halt or not, seeTS

c Fig. 17.2. Turing laid out plans for an elec-tronic computer named Analytical Computing Engine ACE, with reference

TSc Figure 1 is not referred to in the text.


17.1 Introduction 223

Fig. 17.1. Kurt Godel (with Einstein 1950): “Every formal system is incomplete”

Fig. 17.2. Alan Turing: “I wonder if my machine will come to a halt?”

to Babbages’ Analytical Engine, at the same time as the ENIAC was de-signed in the US.


Godel’s and Turing’s work signified a clear defeat for the formalists/logicists and a corresponding victory for the constructivists. Paradoxically,soon after the defeat the formalists/logicists gained control of the mathe-matics departments and the constructivists left to create new departmentsof computer science and numerical analysis based on constructive mathe-matics. It appears that the trauma generated by Godel’s and Turing’s find-ings on the incompleteness of axiomatic methods and un-computability,was so strong that the earlier co-existence of the formalists/logicists andconstructivists was no longer possible. Even today, the world of mathemat-ics is heavily influenced by this split.

We will come back to the dispute between the formalists/logicists andconstructivists below, and use it to illustrate fundamental aspects of math-ematics which hopefully can help us to understand our subject better.

17.2 The Formalists

The formalist school says that it does not matter what the basic conceptsactually mean, because in mathematics we are just concerned with relationsbetween the basic concepts whatever the meaning may be. Thus, we donot have to (and cannot) explain or define the basic concepts and can viewmathematics as some kind of “game”. However, a formalist would be veryanxious to demonstrate that in his formal system it would not be possibleto arrive at contradictions, in which case his game would be at risk ofbreaking down. A formalist would thus like to be absolutely sure about theconsistency of his formal system. Further, a formalist would like to knowthat, at least in principle, he would be able to understand his own gamefully, that is that he would in principle be able to give a mathematicalexplanation or proof of any true property of his game. The mathematicianHilbert was the leader of the formalist school. Hilbert was shocked by theresults by Godel.

17.3 The Logicists and Set Theory

The logicists try to base mathematics on logic and set theory. Set theorywas developed during the second half of the 19th century and the languageof set theory has become a part of our every day language and is verymuch appreciated by both the formalist and logicist schools, while theconstructivists have a more reserved attitude. A set is a collection of items,which are the elements of the set. An element of the set is said to belong tothe set. For example, a dinner may be viewed as a set consisting of variousdishes (entree, main course, dessert, coffee). A family (the Wilsons) may beviewed as a set consisting of a father (Mr. Wilson), a mother (Mrs. Wilson)

17.3 The Logicists and Set Theory 225

and two kids (Tom and Mary). A soccer team (IFK Goteborg for example)consists of the set of players of the team. Humanity may be said to be setof all human beings.

Set theory makes it possible to speak about collections of objects asif they were single objects. This is very attractive in both science andpolitics, since it gives the possibility of forming new concepts and groupsin hierarchical structures. Out of old sets, one may form new sets whoseelements are the old sets. Mathematicians like to speak about the set of allreal numbers, denoted by R, the set of all positive real numbers, the set of allprime numbers, et cetera, and a politician planning a campaign may thinkof the set of democratic voters, the set of auto workers, the set of femalefirst time voters, or the set of all poor, jobless, male criminals. Further,a workers union may be thought of as a set of workers in a particularfactory or field, and workers unions may come together into unions or setsof workers unions.

A set may be described by listing all the elements of the set. This maybe very demanding if the set contains many elements (for example if theset is humanity). An alternative is to describe the set through a propertyshared by all the elements of the set, e.g. the set of all people who have theproperties of being poor, jobless, male, and criminal at the same time. Todescribe humanity as the set of beings which share the property of beinghuman, however seems to more of a play with words than something veryuseful.

The leader of the logicist school was the philosopher and peace activistBertrand Russell (1872–1970). Russell discovered that building sets freelycan lead into contradictions that threaten the credibility of the whole

Fig. 17.3. Bertrand Russell: “I am protesting”


logicist system. Russell created variants of the old liars paradox and barbersparadox, which we now recall. Godel’s theorem may be viewed to a variantof this paradox.

The Liars Paradox

The liars paradox goes as follows: A person says “I am lying”. How shouldyou interpret this sentence? If you assume that what the person says isindeed true, then it means that he is lying and then what he says is nottrue. On the other hand, if you assume that what he says is not true, thismeans that he is not lying and thus telling the truth, which means thatwhat he says is true. In either case, you seem to be led to a contradiction,right? Compare Fig. 17.4.

Fig. 17.4. “I am (not) lying”

The Barbers Paradox

The barbers paradox goes as follows: The barber in the village has decidedto cut the hair of everyone in the village who does not cut his own hair.What shall the barber do himself? If he decides to cut his own hair, he willbelong to the group of people who cut their own hair and then according tohis decision, he should not cut his own hair, which leads to a contradiction.On the other hand, if he decides not to cut his own hair, then he wouldbelong to the group of people not cutting their own hair and then accord-ing to his decision, he should cut his hair, which is again a contradiction.Compare Fig. 17.5.

17.4 The Constructivists 227

Fig. 17.5. Attitudes to the “barbers paradox”: one relaxed and one very con-cerned

17.4 The Constructivists

The intuitionist/constructivist view is to consider the basic concepts tohave a meaning which may be directly “intuitively” understood by ourbrains and bodies through experience, without any further explanation.Furthermore, the intuitionists would like to use as concrete or “construc-tive” arguments as possible, in order for their mathematics always to havean intuitive “real” meaning and not just be a formality like a game.

An intuitionist may say that the natural numbers 1, 2, 3, . . ., are obtainedby repeatedly adding 1 starting at 1. We took this standpoint when intro-ducing the natural numbers. We know that from the constructivist pointof view, the natural numbers are something in the state of being created ina process without end. Given a natural number n, there is always a nextnatural number n+ 1 and the process never stops. A constructivist wouldnot speak of the set of all natural numbers as something having been com-pleted and constituting an entity in itself, like the set of all natural numbersas a formalist or logicist would be willing to do. Gauss pointed out that“the set of natural numbers” rather would reflect a “mode of speaking”than existence as a set.

An intuitionist would not feel a need of “justification” or a proof of con-sistency through some extra arguments, but would say that the justificationis built into the very process of developing mathematics using constructiveprocesses. A constructivist would so to speak build a machine that couldfly (an airplane) and that very constructive process would itself be a proofof the claim that building an airplane would be possible. A constructivist isthus in spirit close to a practicing engineer. A formalist would not actuallybuild an airplane, rather make some model of an airplane, and would thenneed some type of argument to convince investors and passengers that his


airplane would actually be able to fly, at least in principle. The leader of theintuitionist school was Brouwer (1881–1967), see Fig. 17.6. Hard-core con-structivism makes life very difficult (like strong vegetarianism), and becausethe Brouwer school of constructivists were rather fundamentalist in theirspirit, they were quickly marginalized and lost influence in the 1930s. Thequote by Courant given above shows the strong feelings involved related tothe fact that very fundamental dogmas were at stake, and the general lackof rational arguments to meet the criticism from the intuitionists, whichwas often replaced by ridicule and oppression.

Fig. 17.6. Luitzen Egbertus Jan Brouwer 1881–1966: “One cannot inquire intothe foundations and nature of mathematics without delving into the question ofthe operations by which mathematical activity of the mind is conducted. If onefailed to take that into account, then one would be left studying only the languagein which mathematics is represented rather than the essence of mathematics”

Van der Waerden, mathematician who studied at Amsterdam from 1919to 1923 wrote: “Brouwer came [to the university] to give his courses butlived in Laren. He came only once a week. In general that would havenot been permitted – he should have lived in Amsterdam – but for himan exception was made.. . . I once interrupted him during a lecture to aska question. Before the next week’s lesson, his assistant came to me to saythat Brouwer did not want questions put to him in class. He just did notwant them, he was always looking at the blackboard, never towards thestudents. . . . Even though his most important research contributions werein topology, Brouwer never gave courses on topology, but always on – andonly on – the foundations of intuitionism. It seemed that he was no longerconvinced of his results in topology because they were not correct fromthe point of view of intuitionism, and he judged everything he had donebefore, his greatest output, false according to his philosophy. He was a verystrange person, crazy in love with his philosophy”.

17.5 The Peano Axiom System for Natural Numbers 229

17.5 The Peano Axiom System for NaturalNumbers

The Italian mathematician Peano (1858–1932) set up an axiom systemfor the natural numbers using as undefined concepts “natural number”,“successor”, “belong to”, “set” and “equal to”. His five axioms are

1. 1 is a natural number

2. 1 is not the successor of any other natural number

3. Each natural number n has a successor

4. If the successors of n and m are equal then so are n and m

There is a fifth axiom which is the axiom of mathematical induction statingthat if a property holds for any natural number n, whenever it holds forthe natural number preceding n and it holds for n = 1, then it holds forall natural numbers. Starting with these five axioms, one can derive all thebasic properties of real numbers indicated above.

We see that the Peano axiom system tries to catch the essence of ourintuitive feeling of natural numbers as resulting from successively adding1 without ever stopping. The question is if we get a more clear idea of thenatural numbers from the Peano axiom system than from our intuitive feel-ing. Maybe the Peano axiom system helps to identify the basic propertiesof natural numbers, but it is not so clear what the improved insight reallyconsists of.

The logicist Russell proposed in Principia Mathematica to define thenatural numbers using set theory and logic. For instance, the number 1would be defined roughly speaking as the set of all singletons, the numbertwo the set of all dyads or pairs, the number three as the set of all triples,et cetera. Again the question is if this adds insight to our conception ofnatural numbers?

17.6 Real Numbers

Many textbooks in calculus start with the assumption that the reader isalready familiar with real numbers and quickly introduce the notation R

to denote the set of all real numbers. The reader is usually reminded thatthe real numbers may be represented as points on the real line depictedas a horizontal (thin straight black) line with marks indicating 1, 2, andmaybe numbers like 1.1, 1.2,

√2, π et cetera. This idea of basing arith-

metic, that is numbers, on geometry goes back to Euclid, who took thisroute to get around the difficulties of irrational numbers discovered by thePythagoreans. However, relying solely on arguments from geometry is very


impractical and Descartes turned the picture around in the 17th century bybasing geometry on arithmetic, which opened the way to the revolution ofCalculus. The difficulties related to the evasive nature of irrational numbersencountered by the Pythagoreans, then of course reappeared, and the re-lated questions concerning the very foundations of mathematics graduallydeveloped into a quarrel with fierce participation of many of the greatestmathematicians which culminated in the 1930s, and which has shaped themathematical world of today.

We have come to the standpoint above that a real number may be de-fined through its decimal expansion. A rational real number has a decimalexpansion that eventually becomes periodic. An irrational real number hasan expansion which is infinite and is not periodic. We have defined R asthe set of all possible infinite decimal expansions, with the agreement thatthis definition is a bit vague because the meaning of “possible” is vague.We may say that we use a constructivist/intuitionist definition of R.

The formalist/logicist would rather like to define R as the set of all infinitedecimal expansions, or set of all Cauchy sequences of rational numbers, inwhat we called a universal Big Brother style above.

The set of real numbers is often referred to as the “continuum” of realnumbers. The idea of a “continuum” is basic in classical mechanics whereboth space and time is supposed to be “continuous” rather than “discrete”.On the other hand, in quantum mechanics, which is the modern versionof mechanics on the scales of atoms and molecules, matter starts to showfeatures of being discrete rather than continuous. This reflects the famousparticle-wave duality in quantum mechanics with the particle being dis-crete and the wave being continuous. Depending on what glasses we use,phenomena may appear to be more or less discrete or continuous and nosingle mode of description seems to suffice. The discussions on the nature ofreal numbers may be rooted in this dilemma, which may never be resolved.

17.7 Cantor Versus Kronecker

Let us give a glimpse of the discussion on the nature of real numbersthrough two of the key personalities, namely Cantor (1845–1918) in the for-malist corner and Kronecker (1823–91), in the constructivist corner. Thesetwo mathematicians were during the late half of the 19th century involvedin a bitter academic fight through their professional lives (which eventuallyled Cantor into a tragic mental disorder). Cantor created set theory and inparticular a theory about sets with infinitely many elements, such as the setof natural numbers or the set of real numbers. Cantors theory was criticizedby Kronecker, and many others, who simply could not believe in Cantorsmental constructions or consider them to be really interesting. Kroneckertook a down-to-earth approach and said that only sets with finitely many

17.7 Cantor Versus Kronecker 231

elements can be properly understood by human brains (“God created theintegers, all else is the work of man”). Alternatively, Kronecker said thatonly mathematical objects that can be “constructed” in a finite number ofsteps actually “exist”, while Cantor allowed infinitely many steps in a “con-struction”. Cantor would say that the set of all natural numbers that is theset with the elements 1, 2, 3, 4, . . ., would “exist” as an object in itself as theset of all natural numbers which could be grasped by human brains, whileKronecker would deny such a possibility and reserve it to a higher being. Ofcourse, Kronecker did not claim that there are only finitely many naturalnumbers or that there is a largest natural number, but he would (followingAristotle) say that the existence of arbitrarily large natural numbers is likea “potential” rather than an actual reality.

Fig. 17.7. Cantor (left): “I realize that in this undertaking I place myself ina certain opposition to views widely held concerning the mathematical infiniteand to opinions frequently defended on the nature of numbers”. Kronecker (right):“God created the integers, all else is the work of man”

In the first round, Kronecker won since Cantor’s theories about the infi-nite was rejected by many mathematicians in the late 19th and beginning20th century. But in the next round, the influential mathematician Hilbert,the leader of the formalist school, joined on the side of Cantor. BertrandRussell and Norbert Whitehead tried to give mathematics a foundationbased on logic and set theory in their monumental Principia Mathematica(1910–13) and may also be viewed as supporters of Cantor. Thus, despitethe strong blow from Godel in the 1930’s, the formalist/logicist schoolstook over the scene and have been dominating mathematics education intoour time. Today, the development of the computer as is again starting toshift the weight to the side of the constructivists, simply because no com-puter is able to perform infinitely many operations nor store infinitely manynumbers, and so the old battle may come alive again.

Cantor’s theories about infinite numbers have mostly been forgotten, butthere is one reminiscence in most presentations of the basics of Calculus,


namely Cantors’s argument that the degree of infinity of the real numbersis strictly larger than that of the rational or natural numbers. Cantor ar-gued as follows: suppose we try to enumerate the real numbers in a listwith a first real number r1, a second real number r2 and so on. Cantorclaimed that in any such list there must be some real numbers missing, forexample any real number that differs from r1 in the first decimal, from r2in the second decimal and so on. Right? Kronecker would argue againstthis construction simply by asking full information about for example r1,that is, full information about all the digits of r1. OK, if r1 was rationalthen this could be given, but if r1 was irrational, then the mere listing of allthe decimals of r1 would never come to an end, and so the idea of a list ofreal numbers would not be very convincing. So what do you think? Cantoror Kronecker?

Cantor not only speculated about different degrees of infinities, but alsocleared out more concrete questions about e.g. convergence of trigonometricseries viewing real numbers as limits of of Cauchy sequences of rationalnumbers in pretty much the same we have presented.

17.8 Deciding Whether a Number is Rationalor Irrational

We dwell a bit more on the nature of real numbers. Suppose x is a real num-ber, the decimals of which can be determined one by one by using a certainalgorithm. How can we tell if x is rational or irrational? Theoretically, if thedecimal expansion is periodic then x is rational otherwise it is irrational.There is a practical problem with this answer however because we can onlycompute a finite number of digits, say never more than 10100. How can webe sure that the decimal expansion does not start repeating after that? Tobe honest, this question seems very difficult to answer. Indeed it appearsto be impossible to tell what happens in the complete decimal expansionby looking at a finite number of decimals. The only way to decide if a num-ber x is rational or irrational is figure out a clever argument like the onethe Pythagoreans used to show that

√2 is irrational. Figuring out such

arguments for different specific numbers like π and e is an activity that hasinterested a lot of mathematicians over the years.

On the other hand, the computer can only compute rational numbers andmoreover only rational numbers with finite decimal expansions. If irrationalnumbers do not exist in practical computations, it is reasonable to wonder ifthey truly exist. Constructive mathematicians like Kronecker and Brouwerwould not claim that irrational numbers really exist.

17.9 The Set of All Possible Books 233

17.9 The Set of All Possible Books

We suggest it is reasonable to define the set of all real numbers R as theset of all possible decimal expansions or equivalently the set of all pos-sible Cauchy sequences of rational numbers. Periodic decimal expansionscorrespond to rational numbers and non-periodic expansions to irrationalnumbers. The set R thus consists of the set of all rational numbers togetherwith the set of all irrational numbers. We know that it is common to omitthe word “possible” in the suggested definition of R and define R as “theset of all real numbers”, or “the set of all infinite decimal expansions”.

Let’s see if this hides some tricky point by way of an analogy. Supposewe define a “book” to be any finite sequence of letters. There are specificbooks such as “The Old Man and the Sea” by Hemingway, “The Authoras a Young Dog” by Thomas, “Alice in Wonderland” by Lewis Carrol, and“1984” by Orwell, that we could talk about. We could then introduce B as“the set of all possible books”, which would consist of all the books thathave been and will be written purposely, together with many more “books”that consist of random sequences of letters. These would include those fa-mous books that are written or could be written by chimpanzees playingwith typewriters. We could probably handle this kind of terminology with-out too much difficulty, and we would agree that 1984 is an element of B.More generally, we would be able to say that any given book is a memberof B. Although this statement is difficult to deny, it is also hard to say thatthis ability is very useful.

Suppose now we omit the word possible and start to speak of B as “theset of all books”. This could give the impression that in some sense B is anexisting reality, rather than some kind of potential as when we speak about“possible books”. The set B could then be viewed as a library containingall books. This library would have to be enormously large and most of the“books” would be of no interest to anyone. Believing that the set of allbooks “exists” as a reality would not be very natural for most people.

The set of real numbers R has the same flavor as the set of all books B.It must be a very large set of numbers of which only a relative few, suchas the rational numbers and a few specific irrational numbers, are everencountered in practice. Yet, it is traditional to define R as the set of realnumbers, rather than as “set of all possible real numbers”. The reader maychoose the interpretation of R according to his own taste. A true idealistwould claim that the set of all real numbers “exists”, while a down-to-earthperson would more likely speak about the set of possible real numbers.Eventually, this may come down a personal religious feeling; some peopleappear to believe that Heaven actually exists, and while others might viewas a potential or as a poetic way of describing something which is difficultto grasp.

Whatever interpretation you choose, you will certainly agree that somereal numbers are more clearly specified than others, and that to specify


a real number, you need to give some algorithm allowing you to determineas many digits of the real number as would be possible (or reasonable) toask for.

17.10 Recipes and Good Food

Using the Bisection algorithm, we can compute any number of decimals of√2 if we have enough computational power. Using an algorithm to specify

a number is analogous to using a recipe to specify for example Grandpa’sChocolate Cake. By following the recipe, we can bake a cake that is a moreor less accurate approximation of the ideal cake (which only Grandpa canmake) depending on our skill, energy, equipment and ingredients. There isa clear difference between the recipe and cakes made from the recipe, sinceafter all we can eat a cake with pleasure but not a recipe. The recipe is likean algorithm or scheme telling us how to proceed, how many eggs to usefor example, while cakes are the result of actually applying the algorithmwith real eggs.

Of course, there are people who seem to enjoy reading recipes, or evenjust looking at pictures of food in magazines and talking about it. But ifthey never actually do cook anything, their friends are likely to lose interestin this activity. Similarly, you may enjoy looking at the symbols π,

√2 et

cetera, and talking about them, or writing them on pieces of paper, but ifyou you never actually compute them, you may come to wonder what youare actually doing.

In this book, we will see that there are many mathematical quantitiesthat can only be determined approximately using a computational algo-rithm. Examples of such quantities are

√2, π, and the base e of the natural

logarithm. Later we will find that there are also functions, even elementaryfunctions like sin(x) and exp(x) that need to be computed for differentvalues of x. Just as we first need to bake a cake in order to enjoy it, wemay need to compute such ideal mathematical quantities using certain al-gorithms before using them for other purposes.

17.11 The “New Math” in Elementary Education

After the defeat of formalists in the 1930s by the arguments of Godel, para-doxically the formalist school took over and set theory got a new chance.A wave generated by this development struck the elementary mathematicseducation in the 1960s in the form of the “new math”. The idea was toexplain numbers using set theory, just as Russell and Whitehead had triedto do 60 years earlier in their Principia. Thus a kid would learn that a setconsisting of one cow, two cups, a piece of chocolate and an orange, would

17.12 The Search for Rigor in Mathematics 235

have five elements. The idea was to explain the nature of the number 5this way rather than counting to five on the fingers or pick out 5 orangesfrom a heap of oranges. This type of “new math” confused the kids, andthe parents and teachers even more, and was abandoned after some yearsof turbulence.

17.12 The Search for Rigor in Mathematics

The formalists tried to give mathematics a rigorous basis. The search forrigor was started by Cauchy and Weierstrass who tried to give precise def-initions of the concepts of limit, derivative and integral, and was continuedby Cantor and Dedekind who tried to clarify the precise meaning of con-cepts such as continuum, real number, the set of real numbers et cetera.Eventually this effort of giving mathematics a fully rational basis collapsed,as we have indicated above.

We may identify two types of rigor:

constructive rigor

formal rigor.

Constructive rigor is necessary to accomplish difficult tasks like carryingout a heart operation, sending a man to the moon, building a tall sus-pension bridge, climbing Mount Everest, or writing a long computer pro-gram that works properly. In each case, every little detail may count andif the whole enterprize is not characterized by extreme rigor, it will mostlikely fail. Eventually this is a rigor that concerns material things, or realevents.

Formal rigor is of a different nature and does not have a direct con-crete objective like the ones suggested above. Formal rigor may be ex-ercised at a royal court or in diplomacy, for example. It is a rigor thatconcerns language (words), or manners. The Scholastic philosophers dur-ing the Medieval time, were formalists who loved formal rigor and coulddiscuss through very complicated arguments for example the question howmany Angels could fit onto the edge of a knife. Some people use a very ed-ucated formally correct language which may be viewed as expressing a for-mal rigor. Authors pay a lot of attention to the formalities of language, andmay spend hour after hour polishing on just one sentence until it gets justthe right form. More generally, formal aspects may be very important inArts and Aesthetics. Formal rigor may be thus very important, but servesa different purpose than constructive rigor. Constructive rigor is there toguarantee that something will actually function as desired. Formal rigormay serve the purpose of controlling people or impressing people, or justmake people feel good, or to carry out a diplomatic negotiation. Formalrigor may be exercised in a game or play with certain very specific rules,


that may be very strict, but do not serve a direct practical purpose outsidethe game.

Also in mathematics, one may distinguish between concrete and formalerror. A computation, like multiplication of two natural numbers, is a con-crete task and rigor simply means that the computation is carried out ina correct way. This may be very important in economics or engineering. Itis not difficult to explain the usefulness of this type of constructive rigor,and the student has no difficulty in formulating himself what the criteriaof constructive rigor might be in different contexts.

Formal rigor in calculus was promoted by Weierstrass with the objectiveof making basic concepts and arguments like the continuum of real num-bers or limit processes more “formally correct”. The idea of formal rigor isstill alive very much in mathematics education dominated by the formalistschool. Usually, students cannot understand the meaning of this type of“formally rigorous reasoning”, and very seldom can exercise this type ofrigor without much direction from the teacher.

We shall follow an approach where we try to reach constructive rigor toa degree which can be clearly motivated, and we shall seek to make theconcept of formal rigor somewhat understandable and explain some of itsvirtues.

17.13 A Non-Constructive Proof

We now give an example of a proof with non-constructive aspects thatplays an important role in many Calculus books. Although because of thenon-constructive aspects, the proof is considered to be so difficult that itcan only by appreciated by selected math majors.

The setting is the following: We consider a bounded increasing sequencean∞1 of real numbers, that is an ≤ an+1 for n = 1, 2, . . ., and there is sTS

d

constantC such that an ≤ C for n = 1, 2, . . .. The claim is that the sequencean∞1 converges to a limit A. The proof goes as follows: all the numbersan clearly belong to the interval I = [a1, C]. For simplicity suppose a1 = 0and C = 1. Divide now the interval [0, 1] into the two intervals [0, 1/2] and[1/2, 1]. and make the following choice: if there is a real number an suchthat an ∈ [1/2, 1], then choose the right interval [1/2, 1] and if not choosethe left interval [0, 1/2]. Then repeat the subdivision into a left and a rightinterval, choose one of the intervals following the same principle: if there isa real number an in the right interval, then choose this interval, and if notchoose the left interval. We then get a nested sequence of intervals withlength tending to zero defining a unique real number that is easily seen tobe the limit of the sequence an∞1 . Are you convinced? If not, you mustbe a constructivist.

TSd Please check “s”.


17.14 Summary 237

So where is the hook of non-constructiveness in this proof? Of course,it concerns the choice of interval: in order to choose the correct intervalyou must be able to check if there is some an that belongs to the rightinterval, that is you must check if an belongs to the right interval for allsufficiently large n. The question from a constructivist point of view isif we can perform each check in a finite number of steps. Well, this maydepend on the particular sequence an ≤ an+1 under consideration. Let’sfirst consider a sequence which is so simple that we may say that we knoweverything of interest: for example the sequence an∞1 with an = 1 −2−n, that is the sequence 1/2, 3/4, 7/8, 15/16, 31/32, . . ., which is a boundedincreasing sequence clearly converging to 1. For this sequence, we would beable to always choose the correct interval (the right one) because of itssimplicity.

We now consider the sequence an∞1 with an =∑n

11k2 , which is clearly

an increasing sequence, and one can also quite easily show that the se-quence is bounded. In this case the choice of interval is much more tricky,and it is not clear how to make the choice constructively without actuallyconstructing the limit. So there we stand, and we may question the valueof the non-constructive proof of existence of a limit, if we anyway have toconstruct the limit.

At any rate we sum up in the following result that we will use a a coupleof times below.

Theorem 17.1 (non-constructive!) A bounded increasing sequence con-verges.

17.14 Summary

The viewpoint of Plato was to say that ideal points and lines exist in someHeaven above, while the points and lines which we as human beings candeal with, are some more or less incomplete copies or shades or imagesof the ideals. This is Plato’s idealistic approach, which is related to theformalistic school. An intuitionist would say that we can never be sure ofthe existence of the ideals, and that we should concentrate on the moreor less incomplete copies we can construct ourselves as human beings. Thequestion of the actual existence of the ideals thus becomes a question ofmetaphysics or religion, to which there probably is no definite answer. Fol-lowing our own feelings, we may choose to be either a idealist/formalist oran intuitionist/contructivist, or something in between.

The authors of this book have chosen such a middle way between theconstructivist and formalist schools, trying always to be as constructive as ispossible from a practical point of view, but often using a formalist languagefor reasons of convenience. The constructive approach puts emphasis on theconcrete aspects of mathematics and brings it close to engineering and


“body”. This reduces the mystical character of mathematics and helpsunderstanding. On the other hand, mathematics is not equal to engineeringor only “body”, and also the less concrete aspects or “soul” are useful forour thinking and in modeling the world around us. We thus seek a goodsynthesis of constructive and formalistic mathematics, or a synthesis ofBody & Soul.

Going back to the start of our little discussion, we thus associate the logi-cist and formalistic schools with the idealistic/aristochratic tradition andthe constructivists with the constructive/democratic tradition. As students,we would probably appreciate a constructive/democratic approach, since itaids the understanding and gives the student an active role. On the otherhand, certain things indeed are very difficult to understand or construct,and then the idealistic/arisochratic approach opens a possible attitude tohandle this dilemma.

The constructivist approach, whenever feasible, is appealing from educa-tional point of view, since it gives the student an active role. The studentis invited to construct himself, and not just watch an omnipotent teacherpick ready-made examples from Heaven.

Of course, the development of the modern computer has meant a tremen-dous boost of constructive mathematics, because what the computer doesis constructive. Mathematics education is still dominated by the formalistschool, and the most of the problems today afflicting mathematics educa-tion can be related to the over-emphasis of the idealistic school in timeswhen constructive mathematics is dominating in applications.

Turing’s principle of a “universal computing machine” directly connectsthe work on the foundations of mathematics in the 1930s (with Computablenumbers as a key article), with the development of the modern computer inthe 1940s (with ACE as a key example), and thus very concretely illustratesthe power of (constructive!) mathematics.


Chapter 17 Problems

17.1. Can you figure out how the barber’s paradox is constructed? Suppose thebarber comes from another village. Does this resolve the paradox?

17.2. Another paradox of a similar kind goes as follows: Consider all the naturalnumbers which you can describe using at most 100 words or letters. For instance,you can describe the number 10 000 by the words “ten thousand” or “a onefollowed by four zeros”. Describe now a number by specifying it as the smallestnatural number which can not be described in at most one hundred words. Butthe sentence “the smallest natural number which can not be described in at mostone hundred words” is a description of a certain number with fewer than 100words (15 to be exact), which contradicts the very definition of the number asthe number which could not be described with less than 100 words. Can youfigure out how the paradox arises?

17.3. Describe as closely as you can what you mean by a point or line. Aska friend to do the same, and try to figure out if your concepts are the same.

17.4. Study how the concept of real numbers is introduced by browsing throughthe first pages of some calculus books in your nearest library or on your bookshelf.

17.5. Define the number ω ∈ (0, 1) as follows: let the first digit of ω be equal toone if there are exactly 10 digits in a row equal to one in the decimal expansionof

√2 and zero else, let the second be equal to one if there are exactly 20 digits in

a row equal to one in the decimal expansion of√

2 and zero else, and so on. Is ωa well defined real number? How many digits of ω could you think to be possibleto compute?

17.6. Make a poll about what people think a real number is, from friends,relatives, politicians, rock musicians, to physics and mathematics professors.

Some distinguished mathematicians have recently advocated the moreor less complete banishment from mathematics of all non-constructiveproofs. Even if such a program were desirable, it would involve tremen-dous complications and even the partial destruction of the body ofliving mathematics. For this reason it is no wonder that the school of“intuitionism”, which has adopted this program, has met with strongresistance, and that even the most thoroughgoing intuitionists can-not always live up to their convictions. (Courant)

The composition of vast books is a laborious and impoverishingextravagance. To go on for five hundred pages developing an ideawhose perfect oral exposition is possible in a few minutes! A bettercourse of procedure is to pretend that these books already exist, andthen to offer a resume, a commentary. . .More reasonable, more in-ept, more indolent, I have preferred to write notes upon imaginarybooks. (Borges, 1941)


I have always imagined that Paradise will be kind of a library.(Borges)

My prize book at Sherbourne School (von Neumann’s MathematischeGrundlagen der Quantenmechanik) is turning out very interesting,and not at all difficult reading, although the applied mathematiciansseem to find it rather strong. (Turing, age 21)

Fig. 17.8. View of the river Cam at Cambridge 2003 with ACE in the fore-ground(and “UNTHINKABLE” in the background to the right)

18The Function y = xr

With equal passion I have sought knowledge. I have wished to un-derstand the secrets of men. I have wished to know why the starsshine. And I have tried to apprehend the Pythagorean power bywhich numbers hold sway about the flux. A little of this, but notmuch, I have achieved. (Bertrand Russell 1872–1970).

18.1 The Function√

x

We showed above that we can solve the equation x2 = a for any positiverational number a using the Bisection algorithm. The unique positive so-lution is a real number denoted by

√a. We can view

√a as a function of

a defined for a ∈ Q+. Of course, we can extend the function√a to [0,∞)

since 02 = 0 or√

0 = 0.Changing names from a to x, we now consider the function f(x) =

√x

with D(f) = Q+ and f : Q+ → R+. As explained in the Chapter Realnumbers, we can extend this into a function f : R+ → R+ with f(x) =

√x,

using the Lipschitz continuity of√x on intervals (δ,∞) with δ > 0 as

discussed below. Since by definition√x is the solution to the equation

y2 = x with y as unknown, we have for x ∈ R+,

(√x)2 = x. (18.1)

We plot the function√x in Fig. 18.1.

The function y =√x is increasing: if x > x, then

√x >

√x. Further,

if xi is a sequence of positive real numbers with limi→∞ xi = 0, then

242 18. The Function y = xr

0 2 4 6 8 10 12 14 16 180

0.5

1

1.5

2

2.5

3

3.5

4

4.5

x

the

square

root

ofx

Fig. 18.1. The function√x of x

obviously limi→∞√xi = 0, that is

limx→0+

√x = 0. (18.2)

18.2 Computing with the Function√

x

If x2 = a and y2 = b, then (xy)2 = ab. This gives the following property ofthe square root function, √

a√b =

√ab. (18.3)

We find similarly that √a√b

=√a

b. (18.4)

18.3 Is√

x Lipschitz Continuous on R+?

To check if the function f(x) =√x is Lipschitz continuous on R+, we note

that since (√x−

√x)(

√x+

√x) = x− x, we have

f(x) − f(x) =√x−

√x =

1√x+

√x

(x − x).

Since1√

x+√x,

18.4 The Function xr for Rational r = pq

243

can be arbitrarily large by making x and x small positive, the functionf(x) =

√x does not have a bounded Lipschitz constant on R+ and f(x) =√

x is not Lipschitz continuous on R+. This reflects the observation thatthe “slope” of

√x seems to increase without bound as x approaches zero.

However, f(x) =√x is Lipschitz continuous on any interval (δ,∞) where

δ is a fixed positive number, since we may then choose the Lipschitz con-stant Lf equal to 1

2δ .

18.4 The Function xr for Rational r = pq

Consider the equation yq = xp in the unknown y, where p and q are givenintegers and x is a given positive real number. Using the Bisection algo-rithm, we can prove that this equation has a unique solution y for any givenpositive x. We call the solution y = x

pq = xr, where r = p

q . In this way, wedefine a function f(x) = xr on R

+ known as “x to the power r”. Unique-ness follows from realizing that y = xr is increasing with x. Apparently,√x = x

12 .

18.5 Computing with the Function xr

Using the defining equation yq = xp as above, we find that for x ∈ R+ andr, s ∈ Q,

xrxs = xr+s,xr

xs= xr−s. (18.5)

18.6 Generalizing the Conceptof Lipschitz Continuity

There is a natural generalization of the concept of Lipschitz continuity thatgoes as follows. Let 0 < θ ≤ 1 be a given number and L a positive constant,and suppose the function f : R → R satisfies

|f(x) − f(y)| ≤ L|x− y|θ for all x, y ∈ R.

We say that f : R → R is Lipschitz continuous with exponent θ andLipschitz constant L (or Holder continuous with exponent θ and constant Lwith a common terminology).

This generalizes the previous notion of Lipschitz continuity that corre-sponds to θ = 1. Since θ can be smaller than one, we thus consider a largerclass of functions. For example, we show that the function f(x) =

√x is

244 18. The Function y = xr

Lipschitz continuous on (0,∞) with exponent θ = 1/2 and Lipschitz con-stant L = 1, that is

|√x−

√x| ≤ |x− x|1/2. (18.6)

To prove this estimate, we assume that x > x and compute backwards,starting with x ≤

√x√x to get x+ x− 2

√x√x ≤ x− x = |x − x| , which

can be written(√x−

√x)2 ≤ (|x− x|1/2)2

from which the desired estimate follows by taking the square root. The casex > x is the same.

Functions that are Lipschitz continuous with Lipschitz exponent θ < 1may be quite “wild”. In the “worst case”, they may behave “everywhere” as“badly” as

√x does at x = 0. An example is given by Weierstrass function

presented in the Chapter Fourier series. Take a look!

18.7 Turbulent Flow is Holder (Lipschitz)Continuous with Exponent 1

3

In Chapter Navier-Stokes, Quick and Easy we give an argument indicatingthat turbulent flow is Holder (Lipschitz) continuous with exponent 1

3 sothat a turbulent velocity u(x) would satisfy

|u(x) − u(y)| ∼ L|x− y| 13 .

Such a turbulent velocity is a quite “wild” function which varies veryquickly. Thus, Nature is not unfamiliar with Holder (Lipschitz) continu-ity with exponent θ < 1.

Chapter 18 Problems

18.1. Let x, y ∈ R and r, s ∈ Q. Verify the following computing rules: (a)xr+s = xrxs (b) xr−s = xr/xs (c) xrs = (xr)s (d) (xy)r = xrys

18.2. Is f(x) = 3√x, Lipschitz continuous on (0,∞) in the generalized sense? If

yes give then the Lipschitz constant and exponent.

18.3. A Lipschitz continuous function with a Lipschitz constant L with 0 ≤L < 1 is also called a contraction mapping. Which of the following functionsare contraction mappings on R? (a) f(x) = sin x (b) f(x) = 1

1+x2 (c)

f(x) = (1 + x2)−1/2 (d) f(x) = x3

18.4. Let f(x) = 1, for x ≤ 0, and f(x) =√

1 + x2, for x > 0. Is f a contractionmapping?

19Fixed Points and ContractionMappings

Give me one fixed point on which to stand, and I will move the Earth.(Archimedes)

19.1 Introduction

A special case of the basic problem of solving an algebraic equation f(x) = 0takes the form: find x such that

x = g(x), (19.1)

where g : R → R is a given Lipschitz continuous function. The equation(19.1) says that x is a fixed point of the function y = g(x), that is theoutput value g(x) is the same as the input value x. Graphically, we seekthe intersection of the graphs of the line y = x and the curve y = g(x), seeFig. 19.1.

To solve the equation x = g(x), we could rewrite it as f(x) = 0 with(for example) f(x) = x − g(x) and then apply the Bisection (or Deca-section) algorithm to f(x) = 0. Note that the two equations f(x) = 0 withf(x) = x − g(x) and x = g(x) have exactly the same solutions, that is thetwo equations are equivalent.

In this chapter we consider a different algorithm for solving the equation(19.1) that is of central importance in mathematics. This is the Fixed PointIteration algorithm, which takes the following form: Starting with some x0,for i = 1, 2, . . . , compute

xi = g(xi−1) for i = 1, 2, 3, . . . . (19.2)

246 19. Fixed Points and Contraction Mappings

y = x

y = g(x)

x

x = g(x)

Fig. 19.1. Illustration of a fixed point problem g(x) = x

In words, we start with an initial approximation x0 then compute x1 =g(x0), x2 = g(x1), x3 = g(x2), and so on. Stepwise, given a current valuexi−1, we compute the corresponding output g(xi−1), and then choose asnew input xi = g(xi−1). Repeating this procedure, we will generate a se-quence xi∞i=1.

We shall below study the following basic questions related to the sequencexi∞i=1 generated by Fixed Point Iteration:

Does xi∞i=1 converge, that is does x = limi→∞ xi exist?

Is x = limi→∞ xi a fixed point of y = g(x), that is x = g(x)?

We shall also investigate whether or not a fixed point x is uniquely deter-mined.

19.2 Contraction Mappings

We shall prove in this chapter that both the above questions have affirma-tive answers if g(x) is Lipschitz continuous with Lipschitz constant L < 1,i.e.

|g(x) − g(y)| ≤ L|x− y| for all x, y ∈ R, (19.3)

with L < 1. We shall also see that the smaller L is, the quicker the con-vergence of the sequence xi to a fixed point, and the happier we willbe.

A function g : R → R satisfying (19.3) with L < 1 is said to be a con-traction mapping. We may summarize the basic result of this chapter asfollows: A contraction mapping has a unique fixed point that is the limit ofa sequence generated by Fixed Point Iteration. This is a most fundamentalresult of mathematics with a large number of applications. Sometimes it

19.3 Rewriting f(x) = 0 as x = g(x) 247

is referred to as Banach’s contraction mapping theorem. Banach was a fa-mous Polish mathematician, who created much of the field of FunctionalAnalysis, which is a generalization of Calculus and Linear Algebra.

19.3 Rewriting f(x) = 0 as x = g(x)

Fixed Point Iteration is an algorithm for computing roots of equations ofthe form x = g(x). If we are given an equation of the form f(x) = 0, wemay want to rewrite this equation in the form of a fixed point equationx = g(x). This can be done in many ways, for example by setting

g(x) = x+ αf(x),

where α is a nonzero real number to be chosen. Clearly, we have x = g(x)if and only if f(x) = 0. To obtain quick convergence, one would try tochoose α so that the Lipschitz constant of the corresponding function g(x)is small. We shall see that trying to find such values of α leads to thewonderful world of Newton methods for solving equations, which is a veryimportant part of mathematics.

A preliminary computation to find a good value of α to make g(x) =x + αf(x) have a small Lipschitz constant could go as follows. Assumingx > y,

g(x) − g(y) = x+ αf(x) − (y + αf(y)) = x− y + α(f(x) − f(y))

=(

1 + αf(x) − f(y)

x− y

)

|x− y|,

which suggests choosing α to satisfy

− 1α

=f(x) − f(y)

x− y.

We arrive at the same formula for x < y. We will return to this formulabelow. We note in particular the appearance of the quotient

f(x) − f(y)x− y

,

which represents the slope of the corda or secant connecting the points(x, f(x)) and (y, f(y)) in R

2, see Fig. 19.2.We now consider two models from everyday life leading to fixed point

problems and apply the Fixed Point Iteration to solve them. In each case,the fixed point represents a balance or break-even of income and spend-ing, with input equal to output. We then prove the contraction mappingtheorem.


(x, f(x))

(y, f(y))

Fig. 19.2. Corda connecting the points (x, f(x)) and (y, f(y)) in R2

19.4 Card Sales Model

A door-to-door salesman selling greeting cards has a franchise with a greet-ing card company with the following price arrangement. For each shipmentof cards, she pays a flat delivery fee of $25 dollars and on top of this forsales of x, where x is measured in units of a hundred dollars, she pays anadditional fee of 25% to the company. In mathematical terms, for sales ofx hundreds of dollars, she pays

g(x) =14

+14x (19.4)

where g is also given in units of a hundred dollars. The problem is to findthe “break-even point”, i.e. the amount of sales x where the money thatshe takes in (= x) exactly balances the money she has to pay out (g(x)),that is, her problem is to find the fixed point x satisfying x = g(x). Ofcourse, she hopes to see that she clears a profit with each additional saleafter this point.

We display the problem graphically in Fig. 19.3 in terms of two lines.The first line y = x represents the amount of money collected for sales ofx. In this problem, we measure sales in units of dollars, rather than sayin numbers of cards sold, so we just get y = x for this curve. The secondline y = g(x) = 1

4x + 14 represents the amount of money that has to be

paid to the greeting card company. Because of the initial flat fee of $25,the salesman starts with a loss. Then as sales increase, she reaches thebreak-even point x and finally begins to see a profit.

In this problem, it is easy to analytically compute the break-even point,that is, the fixed point x because we can solve the equation

x = g(x) =14x+

14

to get x = 1/3.

TSe Figures 19.3 and 19.4 differ from printout, is this ok?


19.5 Private Economy Model 249

money collected

money paid to company

x

profitloss

sales

y = ¼ x + ¼

y = x

¼

Fig. 19.3. Illustration of the problem of determining the break-even point for sell-ing greeting cards door-to-door. Sales above the break-even point x give a profitto the salesman, but sales below this point mean a lossTS

e

19.5 Private Economy Model

Your roommate has formulated the following model for her/his privateeconomy: denote the net income by x that is variable including contri-butions from family, fellowship and a temporary job at McDonalds. Thespending consists a fixed amount of 1 unit (of say 500 dollars per month) forrent and insurance, the variable amount of x/2 units for good food, goodbooks and intellectual movies, and a variable amount of 1/x units for junkfood, cigarettes and bad movies. This model is based on the observationthat the more money your roommate has, the more educated a life she/hewill live. The total spending is thus

g(x) =x

2+ 1 +

1x

and the pertinent question is to find a balance of income and spending,that is to find the income x such that x = g(x) where the spending is thesame as the income. If the income is bigger than x, then your roommatewill not use up all the money, which is against her/his nature, and if theincome is less than x, then your roommate’s father will get upset, becausehe will have to pay the resulting debt.

Also in this case, we can directly find the fixed point x by solving theequation

x =x

2+ 1 +

1x

analytically and we then find that x = 1 +√

3 ≈ 2.73.If we don’t have enough motivation to go through the details of this

calculation, we could instead try the Fixed Point Iteration. We would thenstart with an income x0 = 1 say and compute the spending g(1) = 2.5,


then choose the new income x1 = 2.5, and compute the spending g(x1) =g(2.5) = 2.65, and then set x2 = 2.65 and compute the spending g(x2) = . . .and so on. Of course, we expect that limi xi = x = 1 +

√3. Below, we will

prove that this is indeed true!

19.6 Fixed Point Iteration in the Card Sales Model

We now apply Fixed Point Iteration to the Card Sales Model. In Fig. 19.4,we plot the function g(x) = 1

4x + 14 along with y = x and the fixed point

x. We also plot the value of x1 = g(x0) for some initial approximation x0.

xx

g(x)

x

x

y=x

y=g(x)

Fig. 19.4. The first step of Fixed Point Iteration in Card Sales model: g(x) iscloser to x than xTS

e

We choose x0 < x because the sales start at zero and then increase. Fromthe plot, we can see that x1 = g(x0) is closer to x than x0, i.e.

|g(x0) − x| < |x0 − x|.

In fact, we can compute the difference exactly since x = 1/3,

|g(x0) − x| =∣∣∣∣14x0 +

14− 1

3

∣∣∣∣ =

∣∣∣∣14

(

x0 −13

)∣∣∣∣ =

14|x0 − x|.

So the distance from x1 = g(x0) to x is exactly 1/4 times the distance fromx0 to x. The same argument shows that the distance from x2 = g(x1) to xwill be 1/4 of the distance from x1 to x and thus 1/16 of the distance fromx0 to x. In other words,

|x2 − x| =14|x1 − x| =

116

|x0 − x|

We illustrate this in Fig. 19.5. Generally, we have

19.6 Fixed Point Iteration in the Card Sales Model 251

xx

g(x)

g(x)g(g(x))

xg(g(x))

y=x

y=g(x)

Fig. 19.5. Two steps of the contraction map algorithm applied to the fixed pointproblem in Model 19.4. The distance of g(g(x)) to x is 1/4 the distance from g(x)to x and 1/16 the distance from x to x

|xi − x| =14|xi−1 − x|,

and thus for i = 1, 2, . . . ,

|xi − x| = 4−i|x0 − x|.

Since 4−i gets as small as we please if i is sufficiently large, this estimateshows that Fixed Point Iteration applied to the Card Sales model converges,that is limi→∞ xi = x.

We consider some more examples before getting into the question ofconvergence of Fixed Point Iteration in a more general case.

Example 19.1. For the sake of comparison, we show the results for thefixed point problem in Model 19.4 computed by applying the fixed pointiteration to g(x) = 1

4x + 14 and the bisection algorithm to the equivalent

root problem for f(x) = − 34x + 1

4 . To make the comparison fair, we usethe initial value x0 = 1 for the fixed point iteration and x0 = 0 andX0 = 1 for the bisection algorithm and compare the values of Xi fromthe bisection algorithm to xi from the fixed point iteration in Fig. 19.6.The error of the fixed point iteration decreases by a factor of 1/4 for eachiteration as opposed to the error of the bisection algorithm which decreasesby a factor of 1/2. This is clear in the table of results. Moreover, since bothmethods require one function evaluation and one storage per iteration butthe bisection algorithm requires an additional sign check, the fixed pointiteration costs less per iteration. We conclude that the fixed point iterationis truly “faster” than the bisection algorithm for this problem.


Bisection Algorithm Fixed Point Iterationi Xi xi

0 1.00000000000000 1.000000000000001 0.50000000000000 0.500000000000002 0.50000000000000 0.375000000000003 0.37500000000000 0.343750000000004 0.37500000000000 0.335937500000005 0.34375000000000 0.333984375000006 0.34375000000000 0.333496093750007 0.33593750000000 0.333374023437508 0.335937500000009 0.3339843750000010 0.3339843750000011 0.3334960937500012 0.3334960937500013 0.33337402343750

Fig. 19.6. Results of the bisection algorithm and the fixed point iteration used tosolve the fixed point problem in Model 19.4. The error of the fixed point iterationdecreases more for each iteration

Example 19.2. In solving for the solubility of Ba(IO 3 ) 2 in Model 7.10, wesolved the root problem (16.3)

x(20 + 2x)2 − 1.57 = 0

using the bisection algorithm. The results are in Fig. 16.4. In this example,we use the fixed point iteration to solve the equivalent fixed point problem

g(x) =1.57

(20 + 2x)2= x. (19.5)

We know that g is Lipschitz continuous on any interval that avoids x = 10(and we also know that the fixed point/root is close to 0). We start off theiteration with x0 = 1 and show the results in Fig. 19.7.

i xi

0 1.000000000000001 0.004845679012352 0.003928806624653 0.003928085931694 0.003928085365275 0.00392808536483

Fig. 19.7. Results of the fixed point iteration applied to (19.5)

19.6 Fixed Point Iteration in the Card Sales Model 253

Example 19.3. In the case of the fixed point iteration applied to the CardSales model, we can compute the iterates explicitly:

x1 =14x0 +

14

and

x2 =14x1 +

14

=14

(14x0 +

14

)

+14

=142x0 +

142

+14

Likewise, we find

x3 =143x0 +

143

+142

+14

and after n steps

xn =14nx0 +

n∑

i=1

14i. (19.6)

The first term on the right-hand side of (19.6), 14n x0 converges to 0 as n

increases to infinity. The second term is equal to

n∑

i=1

14i

=14×

n−1∑

i=0

14i

=14×

1 − 14n

1 − 14

=1 − 1

4n

3

using the formula for the geometric sum. The second term therefore con-verges to 1/3, which is precisely the fixed point for (19.4), as n increasesto infinity.

An important observation about the last example is that the iterationconverges because the slope of g(x) = 1

4x + 14 is 1/4 < 1. This produces

a factor of 1/4 for each iteration, forcing the right-hand side of (19.6)to have a limit as n tends to infinity. Recalling that the slope of a linearfunction is the same thing as its Lipschitz constant, we can say this exampleworked because the Lipschitz constant of g is L = 1/4 < 1.

In contrast if the Lipschitz constant, or slope, of g is larger than 1 thenthe analog of (19.6) will not converge. We demonstrate this graphically inFig. 19.8 using the function g(x) = 2x+ 1

4 . The difference between successiveiterates increases with each iteration and the fixed point iteration does notconverge. It is clear from the plot that there is no positive fixed point. Onthe other hand, the fixed point iteration will converge when applied to anylinear function with Lipschitz constant L < 1. We illustrate the convergencefor g(x) = 3

4x+ 14 in Fig. 19.8. Thinking about (19.6), the reason is simply

that the geometric series with factor L converges when L < 1.


y = x

y = x

x1x0 x2 x3

g(x) = 2 x + ¼

g(x) = ¾ x + ¼

x0 x1 x2 x3 x

Fig. 19.8. On the left, we plot the first three fixed point iterates for g(x) = 2x+ 14.

The iterates increase without bound as the iteration proceeds. On the right, weplot the first three fixed point iterates for g(x) = 3

4x+ 1

4. The iteration converges

to the fixed point in this case

19.7 A Contraction Mapping Hasa Unique Fixed Point

We now go back to the general case presented in the introductory overview.We shall prove that a contraction mapping g : R → R has a unique fixedpoint x ∈ R given as the limit of a sequence generated by Fixed PointIteration. We recall that a contraction mapping g : R → R is a Lipschitzcontinuous function on R with Lipschitz constant L < 1. We organize theproof as follows:

1. Proof that xi∞i=1 is a Cauchy sequence.

2. Proof that x = limi→∞ xi is a fixed point.

3. Proof that x is unique.

Proof that xi∞i=1 is a Cauchy Sequence

To estimate |xi − xj | for j > i, we shall first prove an estimate for twoconsecutive indices, that is an estimate for |xk+1 − xk|. To this end, wesubtract the equation xk = g(xk−1) from xk+1 = g(xk) to get

xk+1 − xk = g(xk) − g(xk−1).

Using the Lipschitz continuity of g(x), we thus have

|xk+1 − xk| ≤ L|xk − xk−1|. (19.7)

Similarly,|xk − xk−1| ≤ L|xk−1 − xk−2|,

19.7 A Contraction Mapping Has a Unique Fixed Point 255

and thus|xk+1 − xk| ≤ L2|xk−1 − xk−2|.

Repeating the argument, we find that

|xk+1 − xk| ≤ Lk|x1 − x0|. (19.8)

We now proceed to use this estimate to estimate |xi − xj | for j > i. Wehave

|xi − xj | = |xi − xi+1 + xi+1 − xi+2 + xi+2 − · · · + xj−1 − xj |,

so that by the triangle inequality,

|xi − xj | ≤ |xi − xi+1|+ |xi+1 − xi+2|+ · · ·+ |xj−1 − xj | =j−1∑

k=i

|xk − xk+1|.

We now use (19.8) on each term |xk − xk+1| in the sum to get

|xi − xj | ≤j−1∑

k=i

Lk |x1 − x0| = |x1 − x0|j−1∑

k=i

Lk.

We compute

j−1∑

k=i

Lk = Li(1 + L+ L2 + · · · + Lj−i−1

)= Li 1 − Lj−i

1 − L,

using the formula for the sum of a geometric series. We now use the as-sumption that L < 1, to conclude that 0 ≤ 1 − Lj−i ≤ 1 and therefore forj > i,

|xi − xj | ≤Li

1 − L|x1 − x0|.

Since L < 1, the factor Li can be made as small as we please by tak-ing i large enough, and thus xi∞i=1 is a Cauchy sequence and thereforeconverges to a limit x = limi→∞ xi.

Note that the idea of estimating |xi − xj | for j > i by estimating |xk −xk+1| and using the formula for a geometric sum is fundamental and willbe used repeatedly below.

Proof that x = limi xi is a Fixed Point

Since g(x) is Lipschitz continuous, we have

g (x) = g(

limi→∞

xi

)= lim

i→∞g (xi) .


By the nature of the Fixed Point Iteration with xi = g(xi−1), we have

limi→∞

g(xi−1) = limi→∞

xi = x.

Since of course

limi→∞

g(xi−1) = limi→∞

g(xi),

we thus see that g(x) = x as desired. We conclude that the limit limi xi = xis a fixed point.

Proof of Uniqueness

Suppose that x and y are two fixed points, that is x = g(x) and y = g(y).Since g : R → R is a contraction mapping,

|x− y| = |g(x) − g(y)| ≤ L|x− y|

which is possible only if x = y since L < 1. This completes the proof.We have now proved that a contraction mapping g : R → R has a unique

fixed point given by Fixed Point Iteration. We summarize in the followingbasic theorem.

Theorem 19.1 A contraction mapping g : R → R has a unique fixedpoint x ∈ R, and any sequence xi∞i=1 generated by Fixed Point Iterationconverges to x.

19.8 Generalization to g : [a, b] → [a, b]

We may directly generalize this result by replacing R by any closed interval[a, b] of R. Taking the interval [a, b] to be closed guarantees that limi xi ∈[a, b] if xi ∈ [a, b]. It is critical that g maps the interval [a, b] into itself.

Theorem 19.2 A contraction mapping g : [a, b] → [a, b] has a unique fixedpoint x ∈ [a, b] and a sequence xi∞i=1 generated by Fixed Point Iterationstarting with a point x0 in [a, b] converges to x.

Example 19.4. We apply this theorem to g(x) = x4/(10−x)2. We can showthat g is Lipschitz continuous on [−1, 1] with L = .053 and the fixed pointiteration started with any x0 in [−1, 1] converges rapidly to the fixed pointx = 0. However, the Lipschitz constant of g on [−9.9, 9.9] is about 20× 106

and the fixed point iteration diverges rapidly if x0 = 9.9.

19.9 Linear Convergence in Fixed Point Iteration 257

19.9 Linear Convergence in Fixed Point Iteration

Let x = g(x) be the fixed point of a contraction mapping g : R → R andxi∞i=1 a sequence generated by Fixed Point Iteration. We can easily get anestimate on howquickly the error of the fixed point iteratexi decreases as i in-creases, that is the speed of convergence, as follows. Since x = g(x), we have

|xi − x| = |g(xi−1) − g(x)| ≤ L|xi−1 − x|, (19.9)

which shows that the error decreases by at least a factor of L < 1 duringeach iteration. The smaller L is the faster the convergence!

The error may actually decrease by exactly a factor of L, as in the CardSales model with g(x) = 1

4x + 14 , where the error decreases by exactly

a factor of L = 1/4 in each iteration.When the error decreases by (at least) a constant factor θ < 1 in each

step, we say that the convergence is linear with convergence factor θ. TheFixed Point Iteration applied to a contraction mapping g(x) with Lipschitzconstant L < 1 converges linearly with convergence factor L.

We compare in Fig. 19.9. the speed of convergence of Fixed Point Itera-tion applied to g(x) = 1

9x+ 34 and g(x) = 1

5x+ 2. The iteration for 19x+ 3

4

i xi for 19x+ 3

4 xi for 15x+ 2

0 1.00000000000000 1.000000000000001 0.86111111111111 2.200000000000002 0.84567901234568 2.440000000000003 0.84396433470508 2.488000000000004 0.84377381496723 2.497600000000005 0.84375264610747 2.499520000000006 0.84375029401194 2.499904000000007 0.84375003266799 2.499980800000008 0.84375000362978 2.499996160000009 0.84375000040331 2.4999992320000010 0.84375000004481 2.4999998464000011 0.84375000000498 2.4999999692800012 0.84375000000055 2.4999999938560013 0.84375000000006 2.4999999987712014 0.84375000000001 2.4999999997542415 0.84375000000000 2.4999999999508516 0.84375000000000 2.4999999999901717 0.84375000000000 2.4999999999980318 0.84375000000000 2.4999999999996119 0.84375000000000 2.4999999999999220 0.84375000000000 2.49999999999998

Fig. 19.9. Results of the fixed point iterations for 19x+ 3

4and 1

5x+ 2


reaches 15 places of accuracy within 15 iterations while the iteration for15x+ 2 has only 14 places of accuracy after 20 iterations.

19.10 Quicker Convergence

The functions 12x and 1

2x2 are both Lipschitz continuous on [−1/2, 1/2]

with Lipschitz constant L = 1/2, and have a unique fixed point x = 0. Theestimate (19.9) suggests the fixed point iteration for both should convergeto x = 0 at the same rate. We show the results of the fixed point iterationapplied to both in Fig. 19.10. We see that Fixed Point Iteration converges

i xi for 12x xi for 1

2x2

0 0.50000000000000 0.500000000000001 0.25000000000000 0.250000000000002 0.12500000000000 0.062500000000003 0.06250000000000 0.003906250000004 0.03125000000000 0.000015258789065 0.01562500000000 0.000000000232836 0.00781250000000 0.00000000000000

Fig. 19.10. Results of the fixed point iterations for 12x and 1

2x2

much more quickly for 12x

2, reaching 15 places of accuracy after 7 iterations.The estimate (19.9) thus does not give the full picture.

We now take a closer look into the argument behind (19.9) for the par-ticular function g(x) = 1

2x2. As above we have with x = 0,

xi − 0 =12x2

i−1 −1202 =

12(xi−1 + 0)(xi−1 − 0),

and thus|xi − 0| =

12|xi−1| |xi−1 − 0|.

We conclude that the error of Fixed Point Iteration for 12x

2 decreases bya factor of 1

2 |xi−1| during the i’th iteration. In other words,

for i = 1 the factor is 12 |x0|,



and so on. We see that the reduction factor depends on the value of thecurrent iterate.

Now consider what happens as the iteration proceeds and the iteratesxi−1 become closer to zero. The factor by which the error in each step

19.11 Quadratic Convergence 259

decreases becomes smaller as i increases! In other words, the closer the it-erates get to zero, the faster they get close to zero. The estimate in (19.9)significantly overestimates the error of the fixed point iteration for 1

2x2 be-

cause it treats the error as if it decreases by a fixed factor each time. Thusit cannot be used to predict the rapid convergence for this function. Fora function g, the first part of (19.9) tells the same story:

|xi − x| = |g(xi−1) − g(x)|.

The error of xi is determined by the change in g in going from x to theprevious iterate xi−1. This change can depend on xi−1 and when it does,the fixed point iteration does not converge linearly.

19.11 Quadratic Convergence

We now consider a second basic example, where we establish quadraticconvergence. We know that the Bisection algorithm for computing the rootof f(x) = x2 − 2 converges linearly with convergence factor 1/2: the errorgets reduced by the factor 1

2 after each step. We can write the equationx2 − 2 = 0 as the following fixed point equation

x = g(x) =1x

+x

2. (19.10)

To see this, it suffices to multiply the equation (19.10) by x. We now applyFixed Point Iteration to (19.10) to compute

√2 and show the result in

Fig. 19.11. We note that it only takes 5 iterations to reach 15 places ofaccuracy. The convergence appears to be very quick.

i xi

0 1.000000000000001 1.500000000000002 1.416666666666673 1.414215686274514 1.414213562374695 1.414213562373106 1.41421356237310

Fig. 19.11. The fixed point iteration for (19.10)


To see how quick the convergence in fact is, we seek a relation betweenthe error in two consecutive steps. Computing as in (19.9), we find that

∣∣∣xi −

√2∣∣∣ =

∣∣∣g (xi−1) − g

(√2)∣∣∣

=

∣∣∣∣∣

xi−1

2+

1xi−1

−(√

22

+1√2

)∣∣∣∣∣

=∣∣∣∣x2

i−1 + 22xi−1

−√

2∣∣∣∣ .

Now we find a common denominator for the fractions on the right and thenuse the fact that

(xi−1 −

√2)2

= x2i−1 − 2

√2xi−1 + 2

to get

∣∣∣xi −

√2∣∣∣ =

(xi−1 −

√2)2

2xi−1≈ 1

2√

2

(xi−1 −

√2)2

. (19.11)

We conclude that the error in xi is the square of the error of xi−1 up tothe factor 1

2√

2. This is quadratic convergence, which is very quick. In each

step of the iteration, the number of correct decimals doubles!

Fig. 19.12. Archimedes moving the Earth with a lever and a fixed point


Chapter 19 Problems

19.1. A salesman selling vacuum cleaners door-to-door has a franchise with thefollowing payment scheme. For each delivery of vacuum cleaners, the salesmanpays a fee of $100 and then a percentage of the sales, measured in units ofhundreds of dollars, that increases as the sales increases. For sales of x, thepercentage is 20x%. Show that this model gives a fixed point problem and makea plot of the fixed point problem that shows the location of the fixed point.

19.2. Rewrite the following fixed point problems as root problems three differentways each.

(a)x3 − 1

x+ 2= x (b) x5 − x3 + 4 = x

19.3. Rewrite the following root problems as fixed point problems three differentways each.

(a) 7x5 − 4x3 + 2 = 0 (b) x3 − 2

x= 0

19.4. (a) Draw a Lipschitz continuous function g on the interval [0, 1] thathas three fixed points such that g(0) > 0 and g(1) < 1. (b) Draw a Lipschitzcontinuous function g on the interval [0, 1] that has three fixed points such thatg(0) > 0 and g(1) > 1.

19.5. Write a program that implements Algorithm 19.2. The program shouldemploy two methods for stopping the iteration: (1) when the number of iterationsis larger than a user-input number and (2) when the difference between successiveiterates |xi − xi−1| is smaller than a user-input tolerance. Test the program byreproducing the results in Fig. 19.9 that were computed using MATLAB.

19.6. In Section 7.10, suppose that Ksp for Ba(IO 3 ) 2 is 1.8 × 10−5. Find thesolubility S to 10 decimal places using the program from Proposition 19.5 afterwriting the problem as a suitable fixed point problem. Hint: 1.8×10−5 = 18×10−6

and 10−6 = 10−2 × 10−4.

19.7. In Section 7.10, determine the solubility of Ba(IO 3 ) 2 in a .037 mole/litersolution of KIO 3 to 10 decimal places using the program from Proposition 19.5after writing the problem as a suitable fixed point problem.

19.8. The power P delivered into a load R of a simple class A amplifier of outputresistance Q and output voltage E is

P =E2R

(Q+R)2.

Find all possible solutions R for P = 1, Q = 3, and E = 4 to 10 decimal placesusing the program from Proposition 19.5 after writing the problem as a fixedpoint problem.


19.9. Van der Waal’s model for one mole of an ideal gas including the effects ofthe size of the molecules and the mutual attractive forces is

(P +

a

V 2

)(V − b) = RT,

where P is the pressure, V is the volume of the gas, T is the temperature, R is theideal gas constant, a is a constant depending on the size of the molecules and the at-tractive forces, and b is a constant depending on the volume of all the molecules inone mole. Find all possible volumes V of the gas corresponding to P = 2, T = 15,R = 3, a = 50, and b = .011 to 10 decimal places using the program from Proposi-tion 19.5 after writing the problem as a fixed point problem.

19.10. Verify that (19.6) is true.

19.11. (a) Find an explicit formula (similar to (19.6)) for the n’th fixed pointiterate xn for the function g(x) = 2x + 1

4. (b) Prove that xn diverges to ∞ as n

increases to ∞.

19.12. (a) Find an explicit formula (similar to (19.6)) for the n’th fixed pointiterate xn for the function g(x) = 3

4x + 1

4. (b) Prove that xn converges as n

increases to ∞ and compute the limit.

19.13. (a) Find an explicit formula (similar to (19.6)) for the n’th fixed pointiterate xn for the function g(x) = mx + b. (b) Prove that xn converges as nincreases to ∞ provided that L = |m| < 1 and compute the limit.

19.14. Draw a Lipschitz continuous function g that does not have the propertythat x in [0, 1] means that g(x) is in [0, 1].

19.15. (a) If possible, find intervals suitable for application of the fixed pointiteration to each of the three fixed point problems found in Problem 19.3(a). (b)If possible, find intervals suitable for application of the fixed point iteration toeach of the three fixed point problems found in Problem 19.3(b). In each case,a suitable interval is one on which the function is a contraction map.

19.16. Harder Apply Theorem 19.2 to the function g(x) = 1/(1 + x2) to showthat the fixed point iteration converges on any interval [a, b].

19.17. Given the following results of the fixed point iteration applied to a func-tion g(x),

i xi

0 14.000000000000001 14.250000000000002 14.468750000000003 14.660156250000004 14.827636718750005 14.97418212890625

compute the Lipschitz constant L for g. Hint: consider (19.8).


19.18. Verify the details of Example 19.4.

19.19. (a) Show that g(x) = 23x3 is Lipschitz continuous on [−1/2, 1/2] with

Lipschitz constant L = 1/2. (b) Use the program from Problem 19.5 to compute6 fixed point iterations starting with x0 = .5 and compare to the results inFig. 19.10. (c) Show that the error of xi is approximately the cube of the errorof xi−1 for any i.

19.20. Verify that (19.11) is true.

19.21. (a) Show the root problem f(x) = x2 + x− 6 can be written as the fixedpoint problem g(x) = x with g(x) = 6

x+1. Show that the error of xi decreases at

a linear rate to the fixed point x = 2 when the fixed point iteration convergesto 2 and estimate the convergence factor for xi close to 2. (b) Show the rootproblem f(x) = x2 + x − 6 can be written as the fixed point problem g(x) = x

with g(x) = x2+62x+1

. Show that the error of xi decreases at a quadratic rate to thefixed point x = 2 when the fixed point iteration converges to 2.

19.22. Given the following results of the fixed point iteration applied to a func-tion g(x),

i xi

0 0.500000000000001 0.707106781186552 0.840896415253713 0.917004043204674 0.957603280698575 0.97857206208770

decide if the convergence rate is linear or not.

19.23. The Regula Falsi Method is a variation of the bisection method for com-puting a root of f(x) = 0. For i ≥ 1, assuming f(xi−1) and f(xi) have the oppo-site signs, define xi+1 as the point where the straight line through (xi−1, f(xi−1))and (xi, f(xi)) intersects the x-axis. Write this method as fixed point iteration bygiving an appropriate g(x) and estimate the corresponding convergence factor.

20Analytic Geometry in R

2

Philosophy is written in the great book (by which I mean the Uni-verse) which stands always open to our view, but it cannot be under-stood unless one first learns how to comprehend the language andinterpret the symbols in which it is written, and its symbols are tri-angles, circles, and other geometric figures, without which it is nothumanly possible to comprehend even one word of it; without theseone wanders in a dark labyrinth. (Galileo)

20.1 Introduction

We give a brief introduction to analytic geometry in two dimensions, thatis the linear algebra of the Euclidean plane. Our common school experiencehas given us an intuitive geometric idea of the Euclidean plane as an infiniteflat surface without borders consisting of points, and we also have an in-tuitive geometric idea of geometric objects like straight lines, triangles andcircles in the plane. We brushed up our knowledge and intuition in geometrysomewhat in Chapter Pythagoras and Euclid. We also presented the ideaof using a coordinate system in the Euclidean plane consisting of two per-pendicular copies of Q, where each point in the plane has two coordinates(a1, a2) and we view Q

2 as the set of ordered pairs of rational numbers. Withonly the rational numbers Q at our disposal, we quickly run into troublebecause we cannot compute distances between points in Q

2. For example,the distance between the points (0, 0) and (1, 1), the length of the diago-

266 20. Analytic Geometry in R2

nal of a unit square, is equal to√

2, which is not a rational number. Thetroubles are resolved by using real numbers, that is by extending Q

2 to R2.

In this chapter, we present basic aspects of analytic geometry in theEuclidean plane using a coordinate system identified with R

2, followingthe fundamental idea of Descartes to describe geometry in terms of num-bers. Below, we extend to analytic geometry in three-dimensional Euclideanspace identified with R

3 and we finally generalize to analytic geometry inR

n, where the dimension n can be any natural number. Considering Rn with

n ≥ 4 leads to linear algebra with a wealth of applications outside Euclideangeometry, which we will meet below. The concepts and tools we develop inthis chapter focussed on Euclidean geometry in R

2 will be of fundamentaluse in the generalizations to geometry in R

3 and Rn and linear algebra.

The tools of the geometry of Euclid is the ruler and the compasses, whilethe tool of analytic geometry is a calculator for computing with numbers.Thus we may say that Euclid represents a form of analog technique, whileanalytic geometry is a digital technique based on numbers. Today, the useof digital techniques is exploding in communication and music and all sortsof virtual reality.

20.2 Descartes, Inventor of Analytic Geometry

The foundation of modern science was laid by Rene Descartes (1596–1650)in Discours de la methode pour bien conduire sa raison et chercher la veritedans les sciences from 1637. The Method contained as an appendix LaGeometrie with the first treatment of Analytic Geometry. Descartes be-lieved that only mathematics may be certain, so all must be based onmathematics, the foundation of the Cartesian view of the World.

In 1649 Queen Christina of Sweden persuaded Descartes to go to Stock-holm to teach her mathematics. However the Queen wanted to draw tan-gents at 5 a.m. and Descartes broke the habit of his lifetime of getting upat 11 o’clock, c.f. Fig. 20.1. After only a few months in the cold North-ern climate, walking to the palace at 5 o’clock every morning, he died ofpneumonia.

20.3 Descartes: Dualism of Body and Soul

Descartes set the standard for studies of Body and Soul for a long time withhis De homine completed in 1633, where Descartes proposed a mechanismfor automatic reaction in response to external events through nerve fibrils,see Fig. 20.2. In Descartes’ conception, the rational Soul, an entity distinctfrom the Body and making contact with the body at the pineal gland,might or might not become aware of the differential outflow of animal

20.4 The Euclidean Plane R2 267

Fig. 20.1. Descartes: “The Principle which I have always observed in my studiesand which I believe has helped me the most to gain what knowledge I have,has been never to spend beyond a few hours daily in thoughts which occupy theimagination, and a few hours yearly in those which occupy the understanding,and to give all the rest of my time to the relaxation of the senses and the reposeof the mind”

spirits brought about though the nerve fibrils. When such awareness didoccur, the result was conscious sensation – Body affecting Soul. In turn,in voluntary action, the Soul might itself initiate a differential outflow ofanimal spirits. Soul, in other words, could also affect Body.

In 1649 Descartes completed Les passions de l’ame, with an account ofcausal Soul/Body interaction and the conjecture of the localization of theSoul’s contact with the Body to the pineal gland. Descartes chose the pinealgland because it appeared to him to be the only organ in the brain thatwas not bilaterally duplicated and because he believed, erroneously, thatit was uniquely human; Descartes considered animals as purely physicalautomata devoid of mental states.

20.4 The Euclidean Plane R2

We choose a coordinate system for the Euclidean plane consisting of twostraight lines intersecting at a 90 angle at a point referred to as the origin.One of the lines is called the x1-axis and the other the x2-axis, and eachline is a copy of the real line R. The coordinates of a given point a in theplane is the ordered pair of real numbers (a1, a2), where a1 corresponds tothe intersection of the x1-axis with a line through a parallel to the x2-axis,and a2 corresponds to the intersection of the x2-axis with a line through aparallel to the x1-axis, see Fig. 20.3. The coordinates of the origin are (0, 0).

In this way, we identify each point a in the plane with its coordinates(a1, a2), and we may thus represent the Euclidean plane as R

2, where R2


Fig. 20.2. Automatic reaction in response to external stimulation from DescartesDe homine 1662

is the set of ordered pairs (a1, a2) of real numbers a1 and a2. That is

R2 = (a1, a2) : a1, a2 ∈ R.

We have already used R2 as a coordinate system above when plotting

a function f : R → R, where pairs of real numbers (x, f(x)) are representedas geometrical points in a Euclidean plane on a book-page.

x1

x2

a1

a2(a1, a2)

Fig. 20.3. Coordinate system for R2

To be more precise, we can identify the Euclidean plane with R2, once

we have chosen the (i) origin, and the (ii) direction (iii) scaling of the co-ordinate axes. There are many possible coordinate systems with differentorigins and orientations/scalings of the coordinate axes, and the coordi-nates of a geometrical point depend on the choice of coordinate system.

20.5 Surveyors and Navigators 269

The need to change coordinates from one system to another thus quicklyarises, and will be an important topic below.

Often, we orient the axes so that the x1-axis is horizontal and increas-ing to the right, and the x2-axis is obtained rotating the x1 axis by 90,or a quarter of a complete revolution counter-clockwise, see Fig. 20.3 orFig. 20.4 displaying MATLAB’s view of a coordinate system. The posi-tive direction of each coordinate axis may be indicated by an arrow in thedirection of increasing coordinates.

However, this is just one possibility. For example, to describe the positionof points on a computer screen or a window on such a screen, it is notuncommon to use coordinate systems with the origin at the upper leftcorner and counting the a2 coordinate positive down, negative up.

−0.4 −0.2 0 0.2 0.4 0.6 0.8 1 1.2 1.4−0.8

−0.6

−0.4

−0.2

0

0.2

0.4

0.6

0.8

x1

x1

(.5, .2)

Fig. 20.4. Matlabs way of visualizing a coordinate system for a plane

20.5 Surveyors and Navigators

Recall our friends the Surveyor in charge of dividing land into properties,and the Navigator in charge of steering a ship. In both cases we assume thatthe distances involved are sufficiently small to make the curvature of theEarth negligible, so that we may view the world as R

2. Basic problems facedby a Surveyor are (s1) to locate points in Nature with given coordinateson a map and (s2) to compute the area of a property knowing its corners.Basic problems of a Navigator are (n1) to find the coordinates on a map ofhis present position in Nature and (n2) to determine the present directionto follow to reach a point of destiny.

We know from Chapter 2 that problem (n1) may be solved using a GPSnavigator, which gives the coordinates (a1, a2) of the current position ofthe GPS-navigator at a press of a button. Also problem (s1) may be solvedusing a GPS-navigator iteratively in an ‘inverse” manner: press the buttonand check where we are and move appropriately if our coordinates are not


the desired ones. In practice, the precision of the GPS-system determinesits usefulness and increasing the precision normally opens a new area ofapplication. The standard GPS with a precision of 10 meters may be OKfor a navigator, but not for a surveyor, who would like to get down tometers or centimeters depending on the scale of the property. Scientistsmeasuring continental drift or beginning landslides, use an advanced formof GPS with a precision of millimeters.

Having solved the problems (s1) and (n1) of finding the coordinates ofa given point in Nature or vice versa, there are many related problems oftype (s2) or (n2) that can be solved using mathematics, such as computingthe area of pieces of land with given coordinates or computing the directionof a piece of a straight line with given start and end points. These areexamples of basic problems of geometry, which we now approach to solveusing tools of analytic geometry or linear algebra.

20.6 A First Glimpse of Vectors

Before entering into analytic geometry, we observe that R2, viewed as the

set of ordered pairs of real numbers, can be used for other purposes thanrepresenting positions of geometric points. For example to describe thecurrent weather, we could agree to write (27, 1013) to describe that thetemperature is 27 C and the air pressure 1013 millibar. We then de-scribe a certain weather situation as an ordered pair of numbers, suchas (27, 1013). Of course the order of the two numbers is critical for theinterpretation. A weather situation described by the pair (1013, 27) withtemperature 1013 and pressure 27, is certainly very different from thatdescribed by (27, 1013) with temperature 27 and pressure 1013.

Having liberated ourselves from the idea that a pair of numbers mustrepresent the coordinates of a point in a Euclidean plane, there are end-less possibilities of forming pairs of numbers with the numbers representingdifferent things. Each new interpretation may be viewed as a new interpre-tation of R

2.In another example related to the weather, we could agree to write

(8, NNE) to describe that the current wind is 8 m/s and headed North-North-East (and coming from South-South-East. Now, NNE is not a realnumber, so in order to couple to R

2, we replace NNE by the correspondingangle, that is by 22.5 counted positive clockwise starting from the Northdirection. We could thus indicate a particular wind speed and directionby the ordered pair (8, 22.5). You are no doubt familiar with the weatherman’s way of visualizing such a wind on the weather map using an arrow.

The wind arrow could also be described in terms of another pair ofparameters, namely by how much it extends to the East and to the Northrespectively, that is by the pair (8 sin(22.5), 8 cos(22.5)) ≈ (3.06, 7.39).

20.7 Ordered Pairs as Points or Vectors/Arrows 271

We could say that 3.06 is the “amount of East”, and 7.39 is the “amountof North” of the wind velocity, while we may say that the wind speed is 8,where we think of the speed as the “absolute value” of the wind velocity(3.06, 7.39). We thus think of the wind velocity as having both a direction,and an “absolute value” or “length”. In this case, we view an ordered pair(a1, a2) as a vector, rather than as a point, and we can then represent thevector by an arrow.

We will soon see that ordered pairs viewed as vectors may be scaledthrough multiplication by a real number and two vectors may also be added.

Addition of velocity vectors can be experienced on a bike where the windvelocity and our own velocity relative to the ground add together to formthe total velocity relative to the surrounding atmosphere, which is reflectedin the air resistance we feel. To compute the total flight time across theAtlantic, the airplane pilot adds the velocity vector of the airplane versusthe atmosphere and the velocity of the jet-stream together to obtain thevelocity of the airplane vs the ground. We will return below to applicationsof analytic geometry to mechanics, including these examples.

20.7 Ordered Pairs as Points or Vectors/Arrows

We have seen that we may interpret an ordered pair of real numbers (a1, a2)as a point a in R

2 with coordinates a1 and a2. We may write a = (a1, a2)for short, and say that a1 is the first coordinate of the point a and a2 thesecond coordinate of a.

We shall also interpret an ordered pair (a1, a2) ∈ R2 in a alternative

way, namely as an arrow with tail at the origin and the head at the pointa = (a1, a2), see Fig. 20.5. With the arrow interpretation of (a1, a2), werefer to (a1, a2) as a vector. Again, we agree to write a = (a1, a2), and wesay that a1 and a2 are the components of the arrow/vector a = (a1, a2).We say that a1 is the first component, occurring in the first place and a2

the second component occurring in the second place.We thus may interpret an ordered pair (a1, a2) in R

2 in two ways: asa point with coordinates (a1, a2), or as an arrow/vector with components(a1, a2) starting at the origin and ending at the point (a1, a2). Evidently,there is a very strong connection between the point and arrow interpreta-tions, since the head of the arrow is located at the point (and assuming thatthe arrow tail is at the origin). In applications, positions will be connectedto the point interpretation and velocities and forces will be connected tothe arrow/vector interpretation. We will below generalize the arrow/vectorinterpretation to include arrows with tails also at other points than theorigin. The context will indicate which interpretation is most appropriatefor a given situation. Often the interpretation of a = (a1, a2) as a pointor as an arrow, changes without notice. So we have to be flexible and use


x1

x2

a1

a2

a

(a1, a2)

Fig. 20.5. A vector with tail at the origin and the head at the point a = (a1, a2)

whatever interpretation is most convenient or appropriate. We will needeven more fantasy when we go into applications to mechanics below.

Sometimes vectors like a = (a1, a2) are marked by boldface or an arrow,like a or a or a, or double script or some other notation. We prefer notto use this more elaborate notation, which makes the writing simpler, butrequires fantasy from the user to make the proper interpretation of forexample the letter a as a scalar number or vector a = (a1, a2) or somethingelse.

20.8 Vector Addition

We now proceed to define addition of vectors in R2, and multiplication of

vectors in R2 by real numbers. In this context, we interpret R

2 as a set ofvectors represented by arrows with tail at the origin.

Given two vectors a = (a1, a2) and b = (b1, b2) in R2, we use a + b to

denote the vector (a1+b1, a2+b2) in R2 obtained by adding the components

separately. We call a+b the sum of a and b obtained through vector addition.Thus if a = (a1, a2) and b = (b1, b2) are given vectors in R

2, then

(a1, a2) + (b1, b2) = (a1 + b1, a2 + b2), (20.1)

which says that vector addition is carried out by adding components sepa-rately. We note that a+b = b+a since a1+b1 = b1+a1 and a2+b2 = b2+a2.We say that 0 = (0, 0) is the zero vector since a + 0 = 0 + a = a for anyvector a. Note the difference between the vector zero and its two zero com-ponents, which are usually scalars.

Example 20.1. We have (2, 5) + (7, 1) = (9, 6) and (2.1, 5.3) + (7.6, 1.9) =(9.7, 7.2).

20.9 Vector Addition and the Parallelogram Law 273

20.9 Vector Addition and the Parallelogram Law

We may represent vector addition geometrically using the ParallelogramLaw as follows. The vector a+b corresponds to the arrow along the diagonalin the parallelogram with two sides formed by the arrows a and b displayedin Fig. 20.6. This follows by noting that the coordinates of the head of a+bis obtained by adding the coordinates of the points a and b separately. Thisis illustrated in Fig. 20.6.

This definition of vector addition implies that we may reach the point(a1 + b1, a2 + b2) by walking along arrows in two different ways. First,we simply follow the arrow (a1 + b1, a2 + b2) to its head, correspondingto walking along the diagonal of the parallelogram formed by a and b.Secondly, we could follow the arrow a from the origin to its head at thepoint (a1, a2) and then continue to the head of the arrow b parallel to b andof equal length as b with tail at (a1, a2). Alternative, we may follow thearrow b from the origin to its head at the point (b1, b2) and then continueto the head of the arrow a parallel to a and of equal length as a with tailat (b1, b2). The three different routes to the point (a1 + b1, a2 + b2) aredisplayed in Fig. 20.6.

a1 b1

a

b

a

b

a+b

Fig. 20.6. Vector addition using the Parallelogram Law

We sum up in the following theorem:

Theorem 20.1 Adding two vectors a = (a1, a2) and b = (b1, b2) in R2 to

get the sum a + b = (a1 + b1, a2 + b2) corresponds to adding the arrows aand b using the Parallelogram Law.

In particular, we can write a vector as the sum of its components in thecoordinate directions as follows, see Fig. 20.7.

(a1, a2) = (a1, 0) + (0, a2). (20.2)


a

(a1, 0)

(0, a2)

Fig. 20.7. A vector a represented as the sum of two vectors parallel with thecoordinate axes

20.10 Multiplication of a Vector by a Real Number

Given a real number λ and a vector a = (a1, a2) ∈ R2, we define a new

vector λa ∈ R2 by

λa = λ(a1, a2) = (λa1, λa2). (20.3)

For example, 3 (1.1, 2.3) = (3.3, 6.9). We say that λa is obtained by multi-plying the vector a = (a1, a2) by the real number λ and call this operationmultiplication of a vector by a scalar. Below we will meet other types ofmultiplication connected with scalar product of vectors and vector productof vectors, both being different from multiplication of a vector by a scalar.

We define −a = (−1)a = (−a1,−a2) and a− b = a+(−b). We note thata− a = a+ (−a) = (a1 − a1, a2 − a2) = (0, 0) = 0. We give an example inFig. 20.8.

a

b

−b

0.7a− b

Fig. 20.8. The sum 0.7a − b of the multiples 0.7a and (−1)b of a and b

20.11 The Norm of a Vector 275

20.11 The Norm of a Vector

We define the Euclidean norm |a| of a vector a = (a1, a2) ∈ R2 as

|a| =(a21 + a2

2

)1/2. (20.4)

By Pythagoras theorem and Fig. 20.9, the Euclidean norm |a| of the vectora = (a1, a2) is equal to the length of the hypothenuse of the right angledtriangle with sides a1 and a2. In other words, the Euclidean norm of thevector a = (a1, a2) is equal to the distance from the origin to the pointa = (a1, a2), or simply the length of the arrow (a1, a2). We have |λa| = |λ||a|if λ ∈ R and a ∈ R

2; multiplying a vector by the real number λ changes thenorm of the vector by the factor |λ|. The zero vector (0, 0) has Euclideannorm 0 and if a vector has Euclidean norm 0 then it must be the zerovector.

a

a1

a2(a1, a2)

|a| = (a21 + a2

2)1/2

Fig. 20.9. The norm |a| of a vector a = (a1, a2) is |a| = (a21 + a2

2)1/2

The Euclidean norm of a vector measures the “length” or “size” of thevector. There are many possible ways to measure the “size” of a vectorcorresponding to using different norms. We will meet several alternativenorms of a vector a = (a1, a2) below, such as |a1| + |a2| or max(|a1|, |a2|).We used |a1| + |a2| in the definition of Lipschitz continuity of f : R

2 → R

above.

Example 20.2. If a = (3, 4) then |a| =√

9 + 16 = 5, and |2a| = 10.

20.12 Polar Representation of a Vector

The points a = (a1, a2) in R2 with |a| = 1, corresponding to the vectors a

of Euclidean norm equal to 1, form a circle with radius equal to 1 centeredat the origin which we call the unit circle, see Fig. 20.10.


Each point a on the unit circle can be written a = (cos(θ), sin(θ)) forsome angle θ, which we refer to as the angle of direction or direction of thevector a. This follows from the definition of cos(θ) and sin(θ) in ChapterPythagoras and Euclid, see Fig. 20.10

1

1

a

x1

x2

a1

a2

θ

|a| = 1a1 = cos(θ)a2 = sin(θ)

Fig. 20.10. Vectors of length one are given by (cos(θ), sin(θ))

Any vector a = (a1, a2) = (0, 0) can be expressed as

a = |a|a = r(cos(θ), sin(θ)), (20.5)

where r = |a| is the norm of a, a = (a1/|a|, a2/|a|) is a vector of length one,and θ is the angle of direction of a, see Fig. 20.11. We call (20.5) the polarrepresentation of a. We call θ the direction of a and r the length of a, seeFig. 20.11.

r

r

a

x1

x2

a1

a2

θ

|a| = ra1 = r cos(θ)a2 = r sin(θ)

Fig. 20.11. Vectors of length r are given by a = r(cos(θ), sin(θ)) =(r cos(θ), r sin(θ)) where r = |a|

20.13 Standard Basis Vectors 277

We see that if b = λa, where λ > 0 and a = 0, then b has the samedirection as a. If λ < 0 then b has the opposite direction. In both cases,the norms change with the factor |λ|; we have |b| = |λ||a|.

If b = λa, where λ = 0 and a = 0, then we say that the vector b is parallelto a. Two parallel vectors have the same or opposite directions.

Example 20.3. We have

(1, 1) =√

2(cos(45), sin(45)) and (−1, 1) =√

2(cos(135), sin(135)).

20.13 Standard Basis Vectors

We refer to the vectors e1 = (1, 0) and e2 = (0, 1) as the standard basisvectors in R

2. A vector a = (a1, a2) can be expressed in term of the basisvectors e1 and e2 as

a = a1e1 + a2e2,

since

a1e1 + a2e2 = a1(1, 0) + a2(0, 1) = (a1, 0) + (0, a2) = (a1, a2) = a.

a

e1

e2

(a1, a2)

a = a1e1 + a2e2

Fig. 20.12. The standard basis vectors e1 and e2 and a linear combinationa = (a1, a2) = a1e1 + a2e2 of e1 and e2

We say that a1e1 + a2e2 is a linear combination of e1 and e2 with coef-ficients a1 and a2. Any vector a = (a1, a2) in R

2 can thus be expressed asa linear combination of the basis vectors e1 and e2 with the coordinates a1

and a2 as coefficients, see Fig. 20.12.

Example 20.4. We have (3, 7) = 3 (1, 0) + 7 (0, 1) = 3e1 + 7e2.


20.14 Scalar Product

While adding vectors to each other and scaling a vector by a real numbermultiplication have natural interpretations, we shall now introduce a (first)product of two vectors that is less motivated at first sight.

Given two vectors a = (a1, a2) and b = (b1, b2) in R2, we define their

scalar product a · b by

a · b = a1b1 + a2b2. (20.6)

We note, as the terminology suggests, that the scalar product a · b of twovectors a and b in R

2 is a scalar, that is a number in R, while the factors aand b are vectors in R

2. Note also that forming the scalar product of twovectors involves not only multiplication, but also a summation!

We note the following connection between the scalar product and thenorm:

|a| = (a · a) 12 . (20.7)

Below we shall define another type of product of vectors where also theproduct is a vector. We shall thus consider two different types of productsof two vectors, which we will refer to as the scalar product and the vectorproduct respectively. At first when limiting our study to vectors in R

2, wemay also view the vector product to be a single real number. However,the vector product in R

3 is indeed a vector in R3. (Of course, there is also

the (trivial) “componentwise” vector product like MATLAB’s a. ∗ b =(a1b1, a2b2).)

We may view the scalar product as a function f : R2 × R

2 → R wheref(a, b) = a · b. To each pair of vectors a ∈ R

2 and b ∈ R2, we associate

the number f(a, b) = a · b ∈ R. Similarly we may view summation of twovectors as a function f : R

2×R2 → R

2. Here, R2×R

2 denotes the set of allordered pairs (a, b) of vectors a = (a1, a2) and b = (b1, b2) in R

2 of course.

Example 20.5. We have (3, 7) · (5, 2) = 15 + 14 = 29, and (3, 7) · (3, 7) =9 + 49 = 58 so that |(3, 7)| =

√58.

20.15 Properties of the Scalar Product

The scalar product a · b in R2 is linear in each of the arguments a and b,

that is

a · (b+ c) = a · b+ a · c,(a+ b) · c = a · c+ b · c,

(λa) · b = λ a · b, a · (λb) = λ a · b,

20.16 Geometric Interpretation of the Scalar Product 279

for all a, b ∈ R2 and λ ∈ R. This follows directly from the definition (20.6).

For example, we have

a · (b+ c) = a1(b1 + c1) + a2(b2 + c2)= a1b1 + a2b2 + a1c1 + a2c2 = a · b+ a · c.

Using the notation f(a, b) = a · b, the linearity properties may be writtenas

f(a, b+ c) = f(a, b) + f(a, c), f(a+ b, c) = f(a, c) + f(b, c),f(λa, b) = λf(a, b) f(a, λb) = λf(a, b).

We also say that the scalar product a · b = f(a, b) is a bilinear form onR

2 ×R2, that is a function from R

2 ×R2 to R, since a · b = f(a, b) is a real

number for each pair of vectors a and b in R2 and a · b = f(a, b) is linear

both in the variable (or argument) a and the variable b. Furthermore, thescalar product a · b = f(a, b) is symmetric in the sense that

a · b = b · a or f(a, b) = f(b, a),

and positive definite, that is

a · a = |a|2 > 0 for a = 0 = (0, 0).

We may summarize by saying that the scalar product a · b = f(a, b) isa bilinear symmetric positive definite form on R

2 × R2.

We notice that for the basis vectors e1 = (1, 0) and e2 = (0, 1), we have

e1 · e2 = 0, e1 · e1 = 1, e2 · e2 = 1.

Using these relations, we can compute the scalar product of two arbitraryvectors a = (a1, a2) and b = (b1, b2) in R

2 using the linearity as follows:

a · b = (a1e1 + a2e2) · (b1e1 + b2e2)= a1b1 e1 · e1 + a1b2 e1 · e2 + a2b1 e2 · e1 + a2b2 e2 · e2 = a1b1 + a2b2.

We may thus define the scalar product by its action on the basis vectorsand then extend it to arbitrary vectors using the linearity in each variable.

20.16 Geometric Interpretationof the Scalar Product

We shall now prove that the scalar product a · b of two vectors a and b inR

2 can be expressed asa · b = |a||b| cos(θ), (20.8)


where θ is the angle between the vectors a and b, see Fig. 20.13. This formulahas a geometric interpretation. Assuming that |θ| ≤ 90 so that cos(θ) ispositive, consider the right-angled triangle OAC shown in Fig. 20.13. Thelength of the side OC is |a| cos(θ) and thus a·b is equal to the product of thelengths of sides OC and OB. We will refer to OC as the projection of OAonto OB, considered as vectors, and thus we may say that a · b is equal tothe product of the length of the projection of OA onto OB and the lengthof OB. Because of the symmetry, we may also relate a · b to the projectionof OB onto OA, and conclude that a · b is also equal to the product of thelength of the projection of OB onto OA and the length of OA.

ab

O

AB

Cθ

Fig. 20.13. a · b = |a| |b| cos(θ)

To prove (20.8), we write using the polar representation

a = (a1, a2) = |a|(cos(α), sin(α)), b = (b1, b2) = |b|(cos(β), sin(β)),

where α is the angle of the direction of a and β is the angle of direction of b.Using a basic trigonometric formula from Chapter Pythagoras and Euclid,we see that

a · b = a1b1 + a2b2 = |a||b|(cos(α) cos(β) + sin(α) sin(β))= |a||b| cos(α − β) = |a||b| cos(θ),

where θ = α − β is the angle between a and b. Note that since cos(θ) =cos(−θ), we may compute the angle between a and b as α− β or β − α.

20.17 Orthogonality and Scalar Product

We say that two non-zero vectors a and b in R2 are geometrically orthogonal,

which we write as a ⊥ b, if the angle between the vectors is 90 or 270,

20.18 Projection of a Vector onto a Vector 281

a

b

Fig. 20.14. Orthogonal vectors a and b

see Fig. 20.14. The basis vectors e1 and e2 are examples of geometricallyorthogonal vectors, see Fig. 20.12.

Let a and b be two non-zero vectors making an angle θ. From (20.8), wehave a · b = |a||b| cos(θ) and thus a · b = 0 if and only if cos(θ) = 0, that is,if and only if θ = 90 or θ = 270. We have now proved the following basicresult, which we state as a theorem.

Theorem 20.2 Two non-zero vectors a and b are geometrically orthogonalif and only if a · b = 0.

This result fits our experience in the chapter Pythagoras and Euclid,where we saw that the angle OAB formed by two line segments extend-ing from the origin O out to the points A = (a1, a2) and B = (b1, b2)respectively is a right angle if and only if

a1b1 + a2b2 = 0.

Summing up, we have translated the geometric condition of two vectorsa = (a1, a2) and b = (b1, b2) being geometrically orthogonal to the algebraiccondition a · b = a1b1 + a2b2 = 0.

Below, in a more general context we will turn this around and define twovectors a and b to be orthogonal if a · b = 0, where a · b is the scalar productof a and b. We have just seen that this algebraic definition of orthogo-nality may be viewed as an extension of our intuitive idea of geometricorthogonality in R

2. This follows the basic principle of analytic geometryof expressing geometrical relations in algebraic terms.

20.18 Projection of a Vector onto a Vector

The concept of projection is basic in linear algebra. We will now meet thisconcept for the first time and will use it in many different contexts below.


ab

c

d

Fig. 20.15. Orthogonal decomposition of b

Let a = (a1, a2) and b = (b1, b2) be two non-zero vectors and considerthe following fundamental problem: Find vectors c and d such that c isparallel to a, d is orthogonal to a, and c+d = b, see Fig. 20.15. We refer tob = c + d as an orthogonal decomposition of b. We refer to the vector c asthe projection of b in the direction of a, or the projection of b onto a, andwe use the notation Pa(b) = c. We can then express the decomposition of bas b = Pa(b)+ (b−Pa(b)), with c = Pa(b) and d = b−Pa(b). The followingproperties of the decomposition are immediate:

Pa(b) = λa for some λ ∈ R,

(b − Pa(b)) · a = 0.

Inserting the first equation into the second, we get the equation (b − λa)·a = 0 in λ, which we solve to get

λ =b · aa · a =

b · a|a|2 ,

and conclude that the projection Pa(b) of b onto a is given by

Pa(b) =b · a|a|2 a. (20.9)

We compute the length of Pa(b) as

|Pa(b)| =|a · b||a|2 |a| =

|a| |b| | cos(θ)||a| = |b|| cos(θ)|, (20.10)

where θ is the angle between a and b, and we use (20.8). We note that

|a · b| = |a| |Pb|, (20.11)

which conforms with our experience with the scalar product in Sect. 20.16,see also Fig. 20.15.

20.19 Rotation by 90 283

ab

θ Pa(b)

|Pa(b)| = |b| | cos(θ)| = |b · a|/|a|

Fig. 20.16. The projection Pa(b) of b onto a

We can view the projection Pa(b) of the vector b onto the vector a asa transformation of R

2 into R2: given the vector b ∈ R

2, we define thevector Pa(b) ∈ R

2 by the formula

Pa(b) =b · a|a|2 a. (20.12)

We write for short Pb = Pa(b), suppressing the dependence on a and theparenthesis, and note that the mapping P : R

2 → R2 defined by x → Px

is linear. We have

P (x+ y) = Px+ Py, P (λx) = λPx, (20.13)

for all x and y in R2 and λ ∈ R (where we changed name of the independent

variable from b to x or y), see Fig. 20.17.We note that P (Px) = Px for all x ∈ R

2. This could also be expressedas P 2 = P , which is a characteristic property of a projection. Projectinga second time doesn’t change anything!

We sum up:

Theorem 20.3 The projection x → Px = Pa(x) onto a given nonzerovector a ∈ R

2 is a linear mapping P : R2 → R

2 with the property thatPP = P .

Example 20.6. If a = (1, 3) and b = (5, 2), then Pa(b) = (1,3)·(5,2)1+32 (1, 3) =

(1.1, 3.3).

20.19 Rotation by 90

We saw above that to find the orthogonal decomposition b = c+ d with cparallel to a given vector a, it suffices to find c because d = b− c. Alterna-tively, we could seek to first compute d from the requirement that it should


a

x

λx

Px

P (λx) = λPx

Fig. 20.17. P (λx) = λPx

be orthogonal to a. We are thus led to the problem of finding a directionorthogonal to a given direction, that is the problem of rotating a givenvector by 90, which we now address.

Given a vector a = (a1, a2) in R2, a quick computation shows that the

vector (−a2, a1) has the desired property, because computing its scalarproduct with a = (a1, a2) gives

(−a2, a1) · (a1, a2) = (−a2)a1 + a1a2 = 0,

and thus (−a2, a1) is orthogonal to (a1, a2). Further, it follows directly thatthe vector (−a2, a1) has the same length as a.

Assuming that a = |a|(cos(α), sin(α)) and using the facts that − sin(α) =cos(α+ 90) and cos(α) = sin(α+ 90), we see that the vector (−a2, a1) =|a|(cos(α + 90), sin(α + 90)) is obtained by rotating the vector (a1, a2)counter-clockwise 90, see Fig. 20.18. Similarly, the vector (a2,−a1) =−(−a2, a1) is obtained by clockwise rotation of (a1, a2) by 90.

(a1, a2)

(−a2, a1)

Fig. 20.18. Counter-clockwise rotation of a = (a1, a2) by 90

20.20 Rotation by an Arbitrary Angle θ 285

We may view the counter clockwise rotation of a vector by 90 as a trans-formation of vectors: given a vector a = (a1, a2), we obtain another vectora⊥ = f(a) through the formula

a⊥ = f(a) = (−a2, a1),

where we denoted the image of the vector a by both a⊥ and f(a). Thetransformation a→ a⊥ = f(a) defines a linear function f : R

2 → R2 since

f(a+ b) = (−(a2 + b2), a1 + b1) = (−a2, a1) + (−b2, b1) = f(a) + f(b),f(λa) = (−λa2, λa1) = λ(−a2, a1) = λf(a).

To specify the action of a→ a⊥ = f(a) on an arbitrary vector a, it sufficesto specify the action on the basis vectors e1 and e2:

e⊥1 = f(e1) = (0, 1) = e2, e⊥2 = f(e2) = (−1, 0) = −e1,

since by linearity, we may compute

a⊥ = f(a) = f(a1e1 + a2e2) = a1f(e1) + a2f(e2)= a1(0, 1) + a2(−1, 0) = (−a2, a1).

Example 20.7. Rotating the vector (1, 2) the angle 90 counter-clockwise,we get the vector (−2, 1).

20.20 Rotation by an Arbitrary Angle θ

We now generalize to counter-clockwise rotation by an arbitrary angle θ.Let a = |a|(cos(α), sin(α)) in R

2 be a given vector. We seek a vector Rθ(a)in R

2 of equal length obtained by rotating a the angle θ counter-clockwise.By the definition of the vector Rθ(a) as the vector a = |a|(cos(α), sin(α))rotated by θ, we have

Rθ(a) = |a|(cos(α + θ), sin(α+ θ)).

Using the standard trigonometric formulas from Chapter Pythagoras andEuclid,

cos(α+ θ) = cos(α) cos(θ) − sin(α) sin(θ),sin(α+ θ) = sin(α) cos(θ) + cos(α) sin(θ),

we can write the formula for the rotated vector Rθ(a) as

Rθ(a) = (a1 cos(θ) − a2 sin(θ), a1 sin(θ) + a2 cos(θ)). (20.14)

We may view the counter-clockwise rotation of a vector by the angle θ asa transformation of vectors: given a vector a = (a1, a2), we obtain another


vector Rθ(a) by rotation by θ according to the above formula. Of course,we may view this transformation as a function Rθ : R

2 → R2. It is easy

to verify that this function is linear. To specify the action of Rθ on anarbitrary vector a, it suffices to specify the action on the basis vectors e1and e2,

Rθ(e1) = (cos(θ), sin(θ)), Rθ(e2) = (− sin(θ), cos(θ)).

The formula (20.14) may then be obtained using linearity,

Rθ(a) = Rθ(a1e1 + a2e2) = a1Rθ(e1) + a2Rθ(e2)= a1(cos(θ), sin(θ)) + a2(− sin(θ), cos(θ))= (a1 cos(θ) − a2 sin(θ), a1 sin(θ) + a2 cos(θ)).

Example 20.8. Rotating the vector (1, 2) the angle 30, we obtain thevector (cos(30) − 2 sin(30), sin(30) + 2 cos(30)) = (

√3

2 − 1, 12 +

√3).

20.21 Rotation by θ Again!

We present yet another way to arrive at (20.14) based on the idea that thetransformation Rθ : R

2 → R2 of counter-clockwise rotation by θ is defined

by the following properties,

(i) |Rθ(a)| = |a|, and (ii) Rθ(a) · a = cos(θ)|a|2. (20.15)

Property (i) says that rotation preserves the length and (ii) connects thechange of direction to the scalar product. We now seek to determine Rθ(a)from (i) and (ii). Given a ∈ R

2, we set a⊥ = (−a2, a1) and express Rθ(a) asRθ(a) = αa+βa⊥ with appropriate real numbers α and β. Taking the scalarproduct with a and using a · a⊥ = 0, we find from (ii) that α = cos(θ).Next, (i) states that |a|2 = |Rθ(a)|2 = (α2 + β2)|a|2, and we concludethat β = ± sin(θ) and thus finally β = sin(θ) using the counter-clockwiseorientation. We conclude that

Rθ(a) = cos(θ)a+ sin(θ)a⊥ = (a1 cos(θ) − a2 sin(θ), a1 sin(θ) + a2 cos(θ)),

and we have recovered (20.14).

20.22 Rotating a Coordinate System

Suppose we rotate the standard basis vectors e1 = (1, 0) and e2 = (0, 1)counter-clockwise the angle θ to get the new vectors e1 = cos(θ)e1+sin(θ)e2and e2 = − sin(θ)e1 +cos(θ)e2. We may use e1 and e2 as an alternative co-ordinate system, and we may seek the connection between the coordinates

20.23 Vector Product 287

of a given vector (or point) in the old and new coordinate system. Letting(a1, a2) be the coordinates in the standard basis e1 and e2, and (a1, a2) thecoordinates in the new basis e1 and e2, we have

a1e1 + a2e2 = a1e1 + a2e2

= a1(cos(θ)e1 + sin(θ)e2) + a2(− sin(θ)e1 + cos(θ)e2)= (a1 cos(θ) − a2 sin(θ))e1 + (a1 sin(θ) + a2 cos(θ))e2,

so the uniqueness of coordinates with respect to e1 and e2 implies

a1 = cos(θ)a1 − sin(θ)a2,

a2 = sin(θ)a1 + cos(θ)a2.(20.16)

Since e1 and e2 are obtained by rotating e1 and e2 clockwise by the angle θ,

a1 = cos(θ)a1 + sin(θ)a2,

a2 = − sin(θ)a1 + cos(θ)a2.(20.17)

The connection between the coordinates with respect to the two coordinatesystems is thus given by (20.16) and (20.17).

Example 20.9. Rotating 45 counter-clockwise gives the following relationbetween new and old coordinates

a1 =1√2(a1 + a2), a2 =

1√2(−a1 + a2).

20.23 Vector Product

We now define the vector product a × b of two vectors a = (a1, a2) andb = (b1, b2) in R

2 by the formula

a× b = a1b2 − a2b1. (20.18)

The vector product a× b is also referred to as the cross product because ofthe notation used (don’t mix up with the “×” in the “product set” R

2×R2

which has a different meaning). The vector product or cross product maybe viewed as a function R

2 × R2 → R. This function is bilinear as is easy

to verify, and anti-symmetric, that is

a× b = −b× a, (20.19)

which is a surprising property for a product.Since the vector product is bilinear, we can specify the action of the

vector product on two arbitrary vectors a and b by specifying the actionon the basis vectors,

e1 × e1 = 0, e2 × e2 = 0, e1 × e2 = 1, e2 × e1 = −1. (20.20)


Using these relations,

a× b = (a1e1 + a2e2) × (b1e1 + b2e2) = a1b2e1 × e2 + a2b1e2 × e1

= a1b2 − a2b1e2.

We next show that the properties of bilinearity and anti-symmetry infact determine the vector product in R

2 up to a constant. First note thatanti-symmetry and bilinearity imply

e1 × e1 + e1 × e2 = e1 × (e1 + e2) = −(e1 + e2) × e1

= −e1 × e1 − e2 × e1.

Since e1 × e2 = −e2 × e1, we have e1 × e1 = 0. Similarly, we see thate2× e2 = 0. We conclude that the action of the vector product on the basisvectors is indeed specified according to (20.20) up to a constant.

We next observe that

a× b = (−a2, a1) · (b1, b2) = a1b2 − a2b1,

which gives a connection between the vector product a× b and the scalarproduct a⊥ ·b with the 90 counter-clockwise rotated vector a⊥ = (−a2, a1).We conclude that the vector product a × b of two nonzero vectors a andb is zero if and only if a and b are parallel. We state this basic result asa theorem.

Theorem 20.4 Two nonzero vectors a and b are parallel if and only ifa× b = 0.

We can thus check if two non-zero vectors a and b are parallel by check-ing if a× b = 0. This is another example of translating a geometric condi-tion (two vectors a and b being parallel) into an algebraic condition(a× b = 0).

We now squeeze more information from the relation a×b = a⊥·b assumingthat the angle between a and b is θ and thus the angle between a⊥ and bis θ + 90:

|a× b| =∣∣a⊥ · b

∣∣ =

∣∣a⊥

∣∣ |b|

∣∣∣cos

(θ +

π

2

)∣∣∣

= |a| |b| | sin(θ)|,

where we use |a⊥| = |a| and | cos(θ ± π/2)| = | sin(θ)|. Therefore,


|a× b| = |a||b|| sin(θ)|, (20.21)

where θ = α− β is the angle between a and b, see Fig. 20.19.

a

bθ

(−a2, a1)

Fig. 20.19. Why |a× b| = |a| |b| | sin(θ)|

We can make the formula (20.21) more precise by removing the absolutevalues around a×b and the sine factor if we adopt a suitable sign convention.This leads to the following more developed version of (20.21), which westate as a theorem, see Fig. 20.20.

a

bθ

Fig. 20.20. a× b = |a| |b| sin(θ) is negative here because the angle θ is negative

Theorem 20.5 For two non-zero vectors a and b,

a× b = |a||b| sin(θ), (20.22)

where θ is the angle between a and b counted positive counter-clockwise andnegative clockwise starting from a.


20.24 The Area of a Triangle with a Cornerat the Origin

Consider a triangle OAB with corners at the origin O and the points A =(a1, a2) and B = (b1, b2) formed by the vectors a = (a1, a2) and b = (b1, b2),see Fig. 20.21. We say that the triangle OAB is spanned by the vectors aand b. We are familiar with the formula that states that the area of thistriangle can be computed as the base |a| times the height |b|| sin(θ)| timesthe factor 1

2 , where θ is the angle between a and b, see Fig. 20.21. Recalling(20.21), we conclude

Theorem 20.6

Area(OAB) =12|a| |b| | sin(θ)| =

12|a× b|.

θa

b

A

B

O

|b| sin(θ)

Fig. 20.21. The vectors a and b span a triangle with area 12|a× b|

The area of the triangle OAB can be computed using the vector productin R

2.

20.25 The Area of a General Triangle

Consider a triangle CAB with corners at the points C = (c1, c2), A =(a1, a2) and B = (b1, b2). We consider the problem of computing the areaof the triangle CAB. We solved this problem above in the case C = Owhere O is the origin. We may reduce the present case to that case bychanging coordinate system as follows. Consider a new coordinate systemwith origin at C = (c1, c2) and with a x1-axis parallel to the x1-axis and ax2-axis parallel to the x2-axis, see Fig. 20.22.

20.26 The Area of a Parallelogram Spanned by Two Vectors 291

Ox1

x2

x1

x2

a−c

b−c

C = (c1, c2)

A = (a1, a2)

B = (b1, b2)

θ

Fig. 20.22. Vectors a− c and b− c span triangle with area 12|(a− c) × (b− c)|

Letting (a1, a2) denote the coordinates with respect to the new coordi-nate system, the new are related to the old coordinates by

a1 = a1 − c1, a2 = a2 − c2.

The coordinates of the points A, B and C in the new coordinate systemare thus (a1 − c1, a2− c2) = a− c, (b1− c1, b2− c2) = b− c and (0, 0). Usingthe result from the previous section, we find the area of the triangle CABby the formula

Area(CAB) =12|(a− c) × (b − c)| . (20.23)

Example 20.10. The area of the triangle with coordinates at A = (2, 3),B = (−2, 2) and C = (1, 1), is given by Area(CAB) = 1

2 |(1, 2)× (−3, 1)| =72 .

20.26 The Area of a Parallelogram Spannedby Two Vectors

The area of the parallelogram spanned by a and b, as shown in Fig. 20.23,is equal to |a × b| since the area of the parallelogram is twice the area ofthe triangle spanned by a and b. Denoting the area of the parallelogramspanned by the vectors a and b by V (a, b), we thus have the formula

V (a, b) = |a× b|. (20.24)

This is a fundamental formula which has important generalizations to R3

and Rn.


a

b

θ

|b| sin(θ)

Fig. 20.23. The vectors a and b span a rectangle with area |a×b| = |a| |b| sin(θ)|

20.27 Straight Lines

The points x = (x1, x2) in the plane R2 satisfying a relation of the form

n1x1 + n2x2 = n · x = 0, (20.25)

where n = (n1, n2) ∈ R2 is a given non-zero vector, form a straight line

through the origin that is orthogonal to (n1, n2), see Fig. 20.24. We saythat (n1, n2) is a normal to the line. We can represent the points x ∈ R

2

on the line in the formx = sn⊥, s ∈ R,

where n⊥ = (−n2, n1) is orthogonal to n, see Fig. 20.24. We state thisinsight as a theorem because of its importance.

x

n n

x1 x2

x2x2

n⊥ x = sn⊥

Fig. 20.24. Vectors x = sa with b orthogonal to a given vector n generate a linethrough the origin with normal a

20.27 Straight Lines 293

Theorem 20.7 A line in R2 passing through the origin with normal n ∈

R2, may be expressed as either the points x ∈ R

2 satisfying n · x = 0, orthe set of points of the form x = sn⊥ with n⊥ ∈ R

2 orthogonal to n ands ∈ R.

Similarly, the set of points (x1, x2) in R2 such that

n1x1 + n2x2 = d, (20.26)

where n = (n1, n2) ∈ R2 is a given non-zero vector and d is a given constant,

represents a straight line that does not pass through the origin if d = 0.We see that n is a normal to the line, since if x and x are two points on theline then (x− x) · n = d− d = 0, see Fig. 20.25. We may define the line asthe points x = (x1, x2) in R

2, such that the projection n·x|n|2n of the vector

x = (x1, x2) in the direction of n is equal to d|n|2n. To see this, we use the

definition of the projection and the fact n · x = d.

O x1

x2

n

n⊥

x

x = (x1, x2)

Fig. 20.25. The line through the point x with normal n generated by directionalvector a

The line in Fig. 20.24 can also be represented as the set of points

x = x+ sn⊥ s ∈ R,

where x is any point on the line (thus satisfying n · x = d). This is becauseany point x of the form x = sn⊥ + x evidently satisfies n · x = n · x = d.We sum up in the following theorem.

Theorem 20.8 The set of points x ∈ R2 such that n ·x = d, where n ∈ R

2

is a given non-zero vector and d is given constant, represents a straight linein R

2. The line can also be expressed in the form x = x + sn⊥ for s ∈ R,where x ∈ R

2 is a point on the line.

Example 20.11. The line x1 + 2x2 = 3 can alternatively be expressed asthe set of points x = (1, 1) + s(−2, 1) with s ∈ R.


20.28 Projection of a Point onto a Line

Let n · x = d represent a line in R2 and let b be a point in R

2 that doesnot lie on the line. We consider the problem of finding the point Pb on theline which is closest to b, see Fig. 20.27. This is called the projection of thepoint b onto the line. Equivalently, we seek a point Pb on the line such thatb− Pb is orthogonal to the line, that is we seek a point Pb such that

n · Pb = d (Pb is a point on the line),b− Pb is parallel to the normal n, (b − Pb = λn, for some λ ∈ R).

We conclude that Pb = b − λn and the equation n · Pb = d thus givesn · (b− λn) = d, that is λ = b·n−d

|n|2 and so

Pb = b− b · n− d

|n|2 n. (20.27)

If d = 0, that is the line n · x = d = 0 passes through the origin, then (seeProblem 20.26)

Pb = b− b · n|n|2 n. (20.28)

20.29 When Are Two Lines Parallel?

Leta11x1 + a12x2 = b1,a21x1 + a22x2 = b2,

be two straight lines in R2 with normals (a11, a12) and (a21, a22). How do

we know if the lines are parallel? Of course, the lines are parallel if andonly if their normals are parallel. From above, we know the normals areparallel if and only if

(a11, a12) × (a21, a22) = a11a22 − a12a21 = 0,

and consequently non-parallel (and consequently intersecting at somepoint) if and only if

(a11, a12) × (a21, a22) = a11a22 − a12a21 = 0, (20.29)

Example 20.12. The two lines 2x1 + 3x2 = 1 and 3x1 + 4x2 = 1 arenon-parallel because 2 · 4 − 3 · 3 = 8 − 9 = −1 = 0.

20.30 A System of Two Linear Equations in Two Unknowns 295

20.30 A System of Two Linear Equationsin Two Unknowns

If a11x1 + a12x2 = b1 and a21x1 + a22x2 = b2 are two straight lines in R2

with normals (a11, a12) and (a21, a22), then their intersection is determinedby the system of linear equations

a11x1 + a12x2 = b1,a21x1 + a22x2 = b2,

(20.30)

which says that we seek a point (x1, x2) ∈ R2 that lies on both lines. This

is a system of two linear equations in two unknowns x1 and x2, or a 2× 2-system. The numbers aij , i, j = 1, 2 are the coefficients of the system andthe numbers bi, i = 1, 2, represent the given right hand side.

If the normals (a11, a12) and (a21, a22) are not parallel or by (20.29),a11a22 − a12a21 = 0, then the lines must intersect and thus the system(20.30) should have a unique solution (x1, x2). To determine x1, we multiplythe first equation by a22 to get

a11a22x1 + a12a22x2 = b1a22.

We then multiply the second equation by a12, to get

a21a12x1 + a22a12x2 = b2a12.

Subtracting the two equations the x2-terms cancel and we get the followingequation containing only the unknown x1,

a11a22x1 − a21a12x1 = b1a22 − b2a12.

Solving for x1, we get

x1 = (a22b1 − a12b2)(a11a22 − a12a21)−1.

Similarly to determine x2, we multiply the first equation by a21 and sub-tract the second equation multiplied by a11, which eliminates a1. Alto-gether, we obtain the solution formula

x1 = (a22b1 − a12b2)(a11a22 − a12a21)−1, (20.31a)

x2 = (a11b2 − a21b1)(a11a22 − a12a21)−1. (20.31b)

This formula gives the unique solution of (20.30) under the conditiona11a22 − a12a21 = 0.

We can derive the solution formula (20.31) in a different way, still assum-ing that a11a22 − a12a21 = 0. We define a1 = (a11, a21) and a2 = (a12, a22),noting carefully that here a1 and a2 denote vectors and that a1 × a2 =


a11a22 − a12a21 = 0, and rewrite the two equations of the system (20.30)in vector form as

x1a1 + x2a2 = b. (20.32)

Taking the vector product of this equation with a2 and a1 and using a2 ×a2 = a1 × a1 = 0,

x1a1 × a2 = b× a2, x2a2 × a1 = b× a1.

Since a1 × a2 = 0,

x1 =b× a2

a1 × a2, x2 =

b× a1

a2 × a1= − b× a1

a1 × a2, (20.33)

which agrees with the formula (20.31) derived above.We conclude this section by discussing the case when a1 × a2 = a11a22 −

a12a21 = 0, that is the case when a1 and a2 are parallel or equivalently thetwo lines are parallel. In this case, a2 = λa1 for some λ ∈ R and the system(20.30) has a solution if and only if b2 = λb1, since then the second equationresults from multiplying the first by λ. In this case there are infinitely manysolutions since the two lines coincide. In particular if we choose b1 = b2 = 0,then the solutions consist of all (x1, x2) such that a11x1 +a12x2 = 0, whichdefines a straight line through the origin. On the other hand if b2 = λb1,then the two equations represent two different parallel lines that do notintersect and there is no solution to the system (20.30).

We summarize our experience from this section on systems of 2 linearequations in 2 unknowns as follows:

Theorem 20.9 The system of linear equations x1a1 + x2a2 = b, wherea1, a2 and b are given vectors in R

2, has a unique solution (x1, x2) given by(20.33) if a1 × a2 = 0. In the case a1 × a2 = 0, the system has no solutionor infinitely many solutions, depending on b.

Below we shall generalize this result to systems of n linear equations in nunknowns, which represents one of the most basic results of linear algebra.

Example 20.13. The solution to the system

x1 + 2x2 = 3,4x1 + 5x2 = 6,

is given by

x1 =(3, 6) × (2, 5)(1, 4) × (2, 5)

=3−3

= −1, x2 = − (3, 6) × (1, 4)(1, 4) × (2, 5)

= − 6−3

= 2.

20.31 Linear Independence and Basis 297

20.31 Linear Independence and Basis

We saw above that the system (20.30) can be written in vector form as

x1a1 + x2a2 = b,

where b = (b1, b2), a1 = (a11, a21) and a2 = (a12, a22) are all vectors in R2,

and x1 and x2 real numbers. We say that

x1a1 + x2a2,

is a linear combination of the vectors a1 and a2, or a linear combinationof the set of vectors a1, a2, with the coefficients x1 and x2 being realnumbers. The system of equations (20.30) expresses the right hand sidevector b as a linear combination of the set of vectors a1, a2 with thecoefficients x1 and x2. We refer to x1 and x2 as the coordinates of b withrespect to the set of vectors a1, a2, which we may write as an orderedpair (x1, x2).

The solution formula (20.33) thus states that if a1 × a2 = 0, then anarbitrary vector b in R

2 can be expressed as a linear combination of theset of vectors a1, a2 with the coefficients x1 and x2 being uniquely deter-mined. This means that if a1 × a2 = 0, then the the set of vectors a1, a2may serve as a basis for R

2, in the sense that each vector b in R2 may be

uniquely expressed as a linear combination b = x1a1+x2a2 of the set of vec-tors a1, a2. We say that the ordered pair (x1, x2) are the coordinates of bwith respect to the basis a1, a2. The system of equations b = x1a1 +x2a2

thus give the coupling between the coordinates (b1, b2) of the vector b inthe standard basis, and the coordinates (x1, x2) with respect to the basisa1, a2. In particular, if b = 0 then x1 = 0 and x2 = 0.

Conversely if a1×a2 = 0, that is a1 and a2 are parallel, then any nonzerovector b orthogonal to a1 is also orthogonal to a2 and b cannot be expressedas b = x1a1+x2a2. Thus, if a1×a2 = 0 then a1, a2 cannot serve as a basis.We have now proved the following basic theorem:

Theorem 20.10 A set a1, a2 of two non-zero vectors a1 and a2 mayserve as a basis for R

2 if and only if if a1 × a2 = 0. The coordinates(b1, b2) of a vector b in the standard basis and the coordinates (x1, x2) of bwith respect to a basis a1, a2 are related by the system of linear equationsb = x1a1 + x2a2.

Example 20.14. The two vectors a1 = (1, 2) and a2 = (2, 1) (expressed inthe standard basis) form a basis for R

2 since a1 × a2 = 1 − 4 = −3. Letb = (5, 4) in the standard basis. To express b in the basis a1, a2, we seekreal numbers x1 and x2 such that b = x1a1 + x2a2, and using the solutionformula (20.33) we find that x1 = 1 and x2 = 2. The coordinates of b withrespect to the basis a1, a2 are thus (1, 2), while the coordinates of b withrespect to the standard basis are (5, 4).


We next introduce the concept of linear independence, which will playan important role below. We say that a set a1, a2 of two non-zero vectorsa1 and a2 two non-zero vectors a1 and a2 in R

2 is linearly independent ifthe system of equations

x1a1 + x2a2 = 0

has the unique solution x1 = x2 = 0. We just saw that if a1 × a2 = 0, thena1 and a2 are linearly independent (because b = (0, 0) implies x1 = x2 = 0).Conversely if a1 × a2 = 0, then a1 and a2 are parallel so that a1 = λa2 forsome λ = 0, and then there are many possible choices of x1 and x2, notboth equal to zero, such that x1a1 + x2a2 = 0, for example x1 = −1 andx2 = λ. We have thus proved:

Theorem 20.11 The set a1, a2 of non-zero vectors a1 and a2 is linearlyindependent if and only if a1 × a2 = 0.

x1

x2

c1c2

c = 0.73c1 + 1.7c2

Fig. 20.26. Linear combination c of two linearly independent vectors c1 and c2

20.32 The Connection to Calculus in One Variable

We have discussed Calculus of real-valued functions y = f(x) of one realvariable x ∈ R, and we have used a coordinate system in R

2 to plot graphsof functions y = f(x) with x and y representing the two coordinate axis.Alternatively, we may specify the graph as the set of points (x1, x2) ∈ R

2,consisting of pairs (x1, x2) of real numbers x1 and x2, such that x2 = f(x1)or x2 − f(x1) = 0 with x1 representing x and x2 representing y. We referto the ordered pair (x1, x2) ∈ R

2 as a vector x = (x1, x2) with componentsx1 and x2.

We have also discussed properties of linear functions f(x) = ax + b,where a and b are real constants, the graphs of which are straight lines

20.33 Linear Mappings f : R2 → R 299

x2 = ax1 + b in R2. More generally, a straight line in R

2 is the set of points(x1, x2) ∈ R

2 such that x1a1 + x2a2 = b, where the a1, a2 and b are realconstants, with a1 = 0 and/or a2 = 0. We have noticed that (a1, a2) maybe viewed as a direction in R

2 that is perpendicular or normal to the linea1x1 + a2x2 = b, and that (b/a1, 0) or (0, b/a2) are the points where theline intersects the x1-axis and the x2-axis respectively.

20.33 Linear Mappings f : R2 → R

A function f : R2 → R is linear if for any x = (x1, x2) and y = (y1, y2) in

R2 and any λ in R,

f(x+ y) = f(x) + f(y) and f(λx) = λf(x). (20.34)

Setting c1 = f(e1) ∈ R and c2 = f(e2) ∈ R, where e1 = (1, 0) and e2 =(0, 1) are the standard basis vectors in R

2, we can represent f : R2 → R as

follows:f(x) = x1c1 + x2c2 = c1x1 + c2x2,

where x = (x1, x2) ∈ R2. We also refer to a linear function as a linear

mapping.

Example 20.15. The function f(x1, x2) = x1+3x2 defines a linear mappingf : R

2 → R.

20.34 Linear Mappings f : R2 → R

2

A function f : R2 → R

2 taking values f(x) = (f1(x), f2(x)) ∈ R2 is linear

if the component functions f1 : R2 → R and f2 : R

2 → R are linear. Settinga11 = f1(e1), a12 = f1(e2), a21 = f2(e1), a22 = f2(e2), we can representf : R

2 → R2 as f(x) = (f1(x), f2(x)), where

f1(x) = a11x1 + a12x2, (20.35a)f2(x) = a21x1 + a22x2, (20.35b)

and the aij are real numbers.A linear mapping f : R

2 → R2 maps (parallel) lines onto (parallel) lines

since for x = x+ sb and f linear, we have f(x) = f(x+ sb) = f(x)+ sf(b),see Fig. 20.27.

Example 20.16. The function f(x1, x2) = (x1 + 3x2, 2x1 − x3) definesa linear mapping R

2 → R2.


x1

x2

y1

y2

x = s b

x = x+ s b

y = sf(b)

y = f(x) + sf(b)

Fig. 20.27. A linear mapping f : R2 → R

2 maps (parallel) lines to (parallel)lines, and consequently parallelograms to parallelograms

20.35 Linear Mappings and Linear Systemsof Equations

Let a linear mapping f : R2 → R

2 and a vector b ∈ R2 be given. We

consider the problem of finding x ∈ R2 such that

f(x) = b.

Assuming f(x) is represented by (20.35), we seek x ∈ R2 satisfying the

2 × 2 linear system of equations

a11x1 + a12x2 = b1, (20.36a)a21x1 + a22x2 = b2, (20.36b)

where the coefficients aij and the coordinates bi of the right hand side aregiven.

20.36 A First Encounter with Matrices

We write the left hand side of (20.36) as follows:(a11 a12

a21 a22

)(x1

x2

)

=(a11x1 + a12x2

a21x1 + a22x2

)

. (20.37)

The quadratic array (a11 a12

a21 a22

)

is called a 2 × 2 matrix. We can view this matrix to consist of two rows(a11 a12

)and

(a21 a22,

)

20.36 A First Encounter with Matrices 301

or two columns (a11

a21

)

and(a12

a22

)

.

Each row may be viewed as a 1 × 2 matrix with 1 horizontal array with 2elements and each column may be viewed as a 2× 1 matrix with 1 verticalarray with 2 elements. In particular, the array

(x1

x2

)

may be viewed as a 2×1 matrix. We also refer to a 2×1 matrix as a 2-columnvector, and a 1 × 2 matrix as a 2-row vector. Writing x = (x1, x2) we mayview x as a 1× 2 matrix or 2-row vector. Using matrix notation, it is mostnatural to view x = (x1, x2) as a 2-column vector.

The expression (20.37) defines the product of a 2× 2 matrix and a 2× 1matrix or a 2-column vector. The product can be interpreted as

(a11 a12

a21 a22

)(x1

x2

)

=(c1 · xc2 · x

)

(20.38)

where we interpret r1 = (a11, a12) and r2 = (a21, a22) as the two orderedpairs corresponding to the two rows of the matrix and x is the ordered pair(x1, x2). The matrix-vector product is given by

(a11 a12

a21 a22

)(x1

x2

)

(20.39)

i.e. by taking the scalar product of the ordered pairs r1 and r2 correspondingto the 2-row vectors of the matrix with the order pair corresponding to the

2-column vector x =(x1

x2

)

.

Writing

A =(a11 a12

a21 a22

)

and x =(x1

x2

)

and b =(b1b2

)

, (20.40)

we can phrase the system of equations (20.36) in condensed form as thefollowing matrix equation:

Ax = b, or(a11 a12

a21 a22

)(x1

x2

)

=(b1b2

)

.

We have now got a first glimpse of matrices including the basic operationof multiplication of a 2× 2-matrix with a 2× 1 matrix or 2-column vector.Below we will generalize to a calculus for matrices including addition ofmatrices, multiplication of matrices with a real number, and multiplicationof matrices. We will also discover a form of matrix division referred toas inversion of matrices allowing us to express the solution of the systemAx = b as x = A−1b, under the condition that the columns (or equivalently,the rows) of A are linearly independent.


20.37 First Applications of Matrix Notation

To show the usefulness of the matrix notation just introduced, we rewritesome of the linear systems of equations and transformations which we havemet above.

Rotation by θ

The mapping Rθ : R2 → R

2 corresponding to rotation of a vector by anangle θ is given by (20.14), that is

Rθ(x) = (x1 cos(θ) − x2 sin(θ), x1 sin(θ) + x2 cos(θ)). (20.41)

Using matrix notation, we can write Rθ(x) as follows

Rθ(x) = Ax =(

cos(θ) − sin(θ)sin(θ) cos(θ)

)(x1

x2

)

,

where A thus is the 2 × 2 matrix

A =(


)

. (20.42)

Projection Onto a Vector a

The projection Pa(x) = x·a|a|2 a given by (20.9) of a vector x ∈ R

2 ontoa given vector a ∈ R

2 can be expressed in matrix form as follows:

Pa(x) = Ax =(a11 a12

a21 a22

)(x1

x2

)

,

where A is the 2 × 2 matrix

A =

(a21

|a|2a1a2|a|2

a1a2|a|2

a22

|a|2

)

. (20.43)

Change of Basis

The linear system (20.17) describing a change of basis can be written inmatrix form as

(x1

x2

)

=(

cos(θ) sin(θ)− sin(θ) cos(θ)

)(x1

x2

)

,

or in condensed from as x = Ax, where A is the matrix

A =(

cos(θ) sin(θ)− sin(θ) cos(θ)

)

and x and x are 2-column vectors.

20.38 Addition of Matrices 303

20.38 Addition of Matrices

Let A be a given 2 × 2 matrix with elements aij , i, j = 1, 2, that is

A =(a11 a12

a21 a22

)

.

We write A = (aij). Let B = (bij) be another 2 × 2 matrix. We define thesum C = A + B to be the matrix C = (cij) with elements cij = aij + bijfor i, j = 1, 2. In other words, we add two matrices element by element:

A+B =(a11 a12

a21 a22

)

+(b11 b12b21 b22

)

=(a11 + b11 a12 + b12a21 + b21 a22 + b22

)

= C.

20.39 Multiplication of a Matrix by a Real Number

Given a 2× 2 matrix A with elements aij , i, j = 1, 2, and a real number λ,we define the matrix C = λA as the matrix with elements cij = λaij . Inother words, all elements aij are multiplied by λ:

λA = λ

(a11 a12

a21 a22

)

=(λa11 λa12

λa21 λa22

)

= C.

20.40 Multiplication of Two Matrices

Given two 2× 2 matrices A = (aij) and B = (bij) with elements, we definethe product C = AB as the matrix with elements cij given by

cij =2∑

k=1

aikbkj .

Writing out the sum, we have

AB =(a11 a12

a21 a22

)(b11 b12b21 b22

)

=(a11b11 + a12b21 a11b12 + a12b22a21b11 + a22b21 a21b12 + a22b22

)

= C.

In other words, to get the element cij of the product C = AB, we take thescalar product of row i of A with column j of B.

The matrix product is generally non-commutative so that AB = BAmost of the time.


We say that in the product AB the matrix A multiplies the matrix Bfrom the left and that B multiplies the matrix A from the right. Non-commutativity of matrix multiplication means that multiplication fromright or left may give different results.

Example 20.17. We have(

1 21 1

)(1 31 1

)

=(

3 52 4

)

, while(

1 31 1

)(1 21 1

)

=(

4 52 3

)

.

Example 20.18. We compute BB = B2, where B is the projection matrixgiven by (20.43), that is

B =

(a21

|a|2a1a2|a|2

a1a2|a|2

a22

|a|2

)

=1

|a|2

(a21 a1a2

a1a2 a22

)

.

We have

BB =1

|a|4

(a21 a1a2

a1a2 a22

)(a21 a1a2

a1a2 a22

)

=1

|a|4

(a21(a

21 + a2

2) a1a2(a21 + a2

2)a1a2(a2

1 + a22) a2

2(a21 + a2

2)

)

=1

|a|2

(a21 a1a2

a1a2 a22

)

= B,

and see as expected that BB = B.

Example 20.19. As another application we compute the product of twomatrices corresponding to two rotations with angles α and β:

A =(

cos(α) − sin(α)sin(α) cos(α)

)

and B =(

cos(β) − sin(β)sin(β) cos(β)

)

. (20.44)

We compute

AB =(

cos(α) − sin(α)sin(α) cos(α)

)(cos(β) − sin(β)sin(β) cos(β)

)

(cos(α) cos(β) − sin(α) sin(β) − cos(α) sin(β) − sin(α) cos(β)cos(α) sin(β) + sin(α) cos(β) cos(α) cos(β) − sin(α) sin(β)

)

=(

cos(α+ β) − sin(α+ β)sin(α+ β) cos(α+ β)

)

,

where again we have used the formulas for cos(α + β) and sin(α + β)from Chapter Pythagoras and Euclid. We conclude as expected that twosuccessive rotations of angles α and β corresponds to a rotation of angleα+ β.

20.41 The Transpose of a Matrix 305

20.41 The Transpose of a Matrix

Given a 2 × 2 matrix A with elements aij , we define the transpose of Adenoted by A as the matrix C = A with elements c11 = a11, c12 = a21,c21 = a12, c22 = a22. In other words, the rows of A are the columns of A

and vice versa. For example

if A =(

1 23 4

)

then A =(

1 32 4

)

.

Of course (A) = A. Transposing twice brings back the original matrix.We can directly check the validity of the following rules for computing withthe transpose:

(A+B) = A +B, (λA) = λA,

(AB) = BA.

20.42 The Transpose of a 2-Column Vector

The transpose of a 2-column vector is the row vector with the same ele-ments:

if x =(x1

x2

)

, then x =(x1 x2

).

We may define the product of a 1×2 matrix (2-row vector) x with a 2×1matrix (2-column vector) y in the natural way as follows:

xy =(x1 x2

)(y1y2

)

= x1y1 + x2y2.

In particular, we may write

|x|2 = x · x = xx,

where we interpret x as an ordered pair and as a 2-column vector.

20.43 The Identity Matrix

The 2 × 2 matrix (1 00 1

)

is called the identity matrix and is denoted by I. We have IA = A andAI = A for any 2 × 2 matrix A.


20.44 The Inverse of a Matrix

Let A be a 2 × 2 matrix with elements aij with a11a22 − a12a21 = 0. Wedefine the inverse matrix A−1 by

A−1 =1

a11a22 − a12a21

(a22 −a12

−a21 a11

)

. (20.45)

We check by direct computation that A−1A = I and that AA−1 = I, whichis the property we ask an “inverse” to satisfy. We get the first column ofA−1 by using the solution formula (20.31) with b = (1, 0) and the secondcolumn choosing b = (0, 1).

The solution to the system of equations Ax = b can be written as x =A−1b, which we obtain by multiplying Ax = b from the left by A−1.

We can directly check the validity of the following rules for computingwith the inverse:

(λA)−1 = λA−1

(AB)−1 = B−1A−1.

20.45 Rotation in Matrix Form Again!

We have seen that a rotation of a vector x by an angle θ into a vector ycan be expressed as y = Rθx with Rθ being the rotation matrix:

Rθ =(


)

(20.46)

We have also seen that two successive rotations by angles α and β can bewritten as

y = RβRαx, (20.47)

and we have also shown that RβRα = Rα+β . This states the obvious factthat two successive rotations α and β can be performed as one rotationwith angle α+ β.

We now compute the inverse R−1θ of a rotation Rθ using (20.45),

R−1θ =

1cos(θ)2 + sin(θ)2

(cos(θ) sin(θ)− sin(θ) cos(θ)

)

=(

cos(−θ) − sin(−θ)sin(−θ) cos(−θ)

)

(20.48)

where we use cos(α) = cos(−α), sin(α) = − sin(−α). We see that R−1θ =

R−θ, which is one way of expressing the (obvious) fact that the inverse ofa rotation by θ is a rotation by −θ.

20.46 A Mirror in Matrix Form 307

We observe that R−1θ = R

θ with Rθ the transpose of Rθ, so that in

particularRθR

θ = I. (20.49)

We use this fact to prove that the length of a vector is not changed byrotation. If y = Rθx, then

|y|2 = yT y = (Rθx)(Rθx) = xRθ Rθx = xx = |x|2. (20.50)

More generally, the scalar product is preserved after the rotation. If y =Rθx and y = Rθx, then

y · y = (Rθx)(Rθx) = xRθ Rθx = x · x. (20.51)

The relation (20.49) says that the matrix Rθ is orthogonal. Orthogonalmatrices play an important role, and we will return to this topic below.

20.46 A Mirror in Matrix Form

Consider the linear transformation 2P − I, where Px = a·x|a|2 a is the projec-

tion onto the non-zero vector a ∈ R2, that is onto the line x = sa through

the origin. In matrix form, this can be expressed as

2P − I =2

|a|2

(a21 − 1 a1a2

a2a1 a22 − 1

)

.

After some reflection(!), looking at Fig. 20.28, we understand that the trans-formation I + 2(P − I) = 2P − I maps a point x into its mirror image inthe line through the origin with direction a.

a

x Px−x

Fig. 20.28. The mapping 2P − I maps points to its mirror point relative to thegiven line


To see if 2P − I preserves scalar products, we assume that y = (2P − I)xand y = (2P − I)x and compute:

y · y = ((2P − I)x)(2P − I)x = x(2P − I)(2P − I)x = (20.52)x(4PP − 2PI − 2PI + I)x = x(4P − 4P + I)x = x · x, (20.53)

where we used the fact that P = P and PP = P , and we thus find anaffirmative answer.

20.47 Change of Basis Again!

Let a1, a2 and a1, a2 be two different bases in R2. We then express any

given b ∈ R2 as

b = x1a1 + x2a2 = x1a1 + x2a2, (20.54)

with certain coordinates (x1, x2) with respect to a1, a2 and some othercoordinates (x1, x2) with respect a1, a2.

To connect (x1, x2) to (x1, x2), we express the basis vectors a1, a2 interms of the basis a1, a2:

c11a1 + c21a2 = a1,

c12a1 + c22a2 = a2,

with certain coefficients cij . Inserting this into (20.54), we get

x1(c11a1 + c21a2) + x2(c12a1 + c22a2) = b.

Reordering terms,

(c11x1 + c12x2)a1 + (c21x1 + c22x2)a2 = b.

x1

x2

b b

a1

a2

b2

b2

b1 b1

a1

a2

Fig. 20.29. A vector b may be expressed in terms of the basis a1, a2 or thebasis a1, a2

20.48 Queen Christina 309

We conclude by uniqueness that

x1 = c11x1 + c12x2, (20.55)x2 = c21x1 + c22x2, (20.56)

which gives the connection between the coordinates (x1, x2) and (x1, x2).Using matrix notation, we can write this relation as x = Cx with

C =(c11 c12c21 c22

)

.

20.48 Queen Christina

Queen Christina of Sweden (1626–1689), daughter of Gustaf Vasa Kingof Sweden 1611–1632, crowned to Queen at the age 5, officially coronated1644, abdicated 1652, converted to Catholicism and moved to Rome 1655.

Throughout her life, Christina had a passion for the arts and for learning,and surrounded herself with musicians, writers, artists and also philoso-

Fig. 20.30. Queen Christina to Descartes: “If we conceive the world in that vastextension you give it, it is impossible that man conserve himself therein in thishonorable rank, on the contrary, he shall consider himself along with the entireearth he inhabits as in but a small, tiny and in no proportion to the enormoussize of the rest. He will very likely judge that these stars have inhabitants, or eventhat the earths surrounding them are all filled with creatures more intelligent andbetter than he, certainly, he will lose the opinion that this infinite extent of theworld is made for him or can serve him in any way”


phers, theologians, scientists and mathematicians. Christina had an im-pressive collection of sculpture and paintings, and was highly respected forboth her artistic and literary tastes. She also wrote several books, includ-ing her Letters to Descartes and Maxims. Her home, the Palace Farnese,was the active center of cultural and intellectual life in Rome for severaldecades.

Duc de Guise quoted in Georgina Masson’s Queen Christina biographydescribes Queen Chistina as follows: “She isn’t tall, but has a well-filledfigure and a large behind, beautiful arms, white hands. One shoulder ishigher than another, but she hides this defect so well by her bizarre dress,walk and movements. . . . The shape of her face is fair but framed by themost extraordinary coiffure. It’s a man’s wig, very heavy and piled highin front, hanging thick at the sides, and at the back there is some slightresemblance to a woman’s coiffure. . . . She is always very heavily powderedover a lot of face cream”.

Chapter 20 Problems

20.1. Given the vectors a, b and c in R2 and the scalars λ, µ ∈ R, prove the

following statements

a+ b = b+ a, (a+ b) + c = a+ (b+ c), a+ (−a) = 0

a+ 0 = a, 3a = a+ a+ a, λ(µa) = (λµ)a,

(λ+ µ)a = λa+ µa, λ(a+ b) = λa+ λb, |λa| = |λ||a|.

Try to give both analytical and geometrical proofs.

20.2. Give a formula for the transformation f : R2 → R

2 corresponding toreflection through the direction of a given vector a ∈ R

2. Find the correspondingmatrix.

20.3. Given a = (3, 2) and b = (1, 4), compute (i) |a|, (ii) |b|, (iii) |a + b|, (iv))|a− b|, (v) a/|a|, (vi) b/|b|.

20.4. Show that the norm of a/|a| with a ∈ R2, a = 0, is equal to 1.

20.5. Given a, b ∈ R2 prove the following inequalities a) |a + b| ≤ |a| + |b|, b)

a · b ≤ |a||b|.

20.6. Compute a · b with

(i) a = (1, 2), b = (3, 2) (ii) a = (10, 27), b = (14,−5)

20.7. Given a, b, c ∈ R2, determine which of the following statements make sense:

(i) a · b, (ii) a · (b · c), (iii) (a · b) + |c|, (iv) (a · b) + c, (v) |a · b|.

20.8. What is the angle between a = (1, 1) and b = (3, 7)?


20.9. Given b = (2, 1) construct the set of all vectors a ∈ R2 such that a · b = 2.

Give a geometrical interpretation of this result.

20.10. Find the projection of a onto b onto (1, 2) with (i) a = (1, 2), (ii) a =(−2, 1), (iii) a = (2, 2), (iv) a = (

√2,√

2).

20.11. Decompose the vector b = (3, 5) into one component parallel to a andone component orthogonal to a for all vectors a in the previous exercise.

20.12. Let a, b and c = a− b in R2 be given, and let the angle between a and b

be ϕ. Show that:|c|2 = |a|2 + |b|2 − 2|a||b| cosϕ.

Give an interpretation of the result.

20.13. Prove the law of cosines for a triangle with sidelengths a, b and c:

c2 = a2 + b2 − 2ab cos(θ),

where θ is the angle between the sides a and b.

20.14. Given the 2 by 2 matrix:

A =

(1 23 4

)

compute Ax and ATx for the following choice of x ∈ R2:

(i) xT = (1, 2) (ii) xT = (1, 1)

20.15. Given the 2 × 2-matrices:

A =

(1 23 4

)

, B =

(5 67 8

)

,

compute (i) AB, (ii) BA, (iii) ATB, (iv) ABT , (v) BTAT , (vi) (AB)T , (vii) A−1,(viii) B−1, (ix) (AB)−1, (x) A−1A.

20.16. Show that (AB)T = BTAT and that (Ax)T = xTAT .

20.17. What can be said about A if: a) A = AT b) AB = I?

20.18. Show that the projection:

Pa(b) =b · a|a|2 a

can be written in the form Pb, where P is a 2 × 2 matrix. Show that PP = Pand P = P T .

20.19. Compute the mirror image of a point with respect to a straight line inR

2 which does not pass through the origin. Express the mapping in matrix form.

20.20. Express the linear transformation of rotating a vector a certain givenangle as a matrix vector product.


20.21. Given a, b ∈ R2, show that the “mirror vector” b obtained by reflecting

b in a can be expressed as:b = 2Pb − b

where P is a certain projection Show that the scalar product between two vectorsis invariant under a reflection, that is

c · d = c · d.

20.22. Compute a× b and b× a with (i) a = (1, 2), b = (3, 2), (ii) a = (1, 2), b =(3, 6), (iii) a = (2,−1), b = (2, 4).

20.23. Extend the Matlab functions for vectors in R2 by writing functions for

vector product (x = vecProd(a, b)) and rotation (b = vecRotate(a, angle)) ofvectors.

20.24. Check the answers to the above problems using Matlab.

20.25. Verify that the projection Px = Pa(x) is linear in x. Is it linear also ina? Illustrate, as in Fig. 20.17, that Pa(x+ y) = Pa(x) + Pa(y).

20.26. Prove that the formula (21.29) for the projection of a point onto a linethrough the origin, coincides with the formula (20.9) for the projection of thevector b on the direction of the line.

20.27. Show that if a = λa, where a is a nonzero vector R2 and λ = 0, then

for any b ∈ R2 we have Pa(b) = Pa(b), where Pa(b) is the projection of b onto a.

Conclude that the projection onto a non-zero vector a ∈ R2 only depends on the

direction of a and not the norm of a.

21Analytic Geometry in R

3

We must confess that in all humility that, while number is a productof our mind alone, space has a reality beyond the mind whose ruleswe cannot completely prescribe. (Gauss 1830)

You can’t help respecting anybody who can spell TUESDAY, evenif he doesn’t spell it right. (The House at Pooh Corner, Milne)

21.1 Introduction

We now extend the discussion of analytic geometry to Euclidean three di-mensional space or Euclidean 3d space for short. We imagine this spacearises when we draw a normal through the origin to a Euclidean two di-mensional plane spanned by orthogonal x1 and x2 axes, and call the normalthe x3-axis. We then obtain an orthogonal coordinate system consisting ofthree coordinate x1, x2 and x3 axes that intersect at the origin, with eachaxis being a copy of R, see Fig. 21.1.

In daily life, we may imagine a room where we live as a portion of R3,

with the horizontal floor being a piece of R2 with two coordinates (x1, x2)

and with the vertical direction as the third coordinate x3. On a largerscale, we may imagine our neighborhood in terms of three orthogonal di-rections West-East, South-North, and Down-Up, which may be viewed tobe a portion of R

3, if we neglect the curvature of the Earth.The coordinate system can be oriented two ways, right or left. The coor-

dinate system is said to be right-oriented, which is the standard, if turning


x1x1

x2x2

x3x3

(a1, a2, a3)

(a1, a2, a3)

Fig. 21.1. Coordinate system for R3

a standard screw into the direction of the positive x3-axis will turn thex1-axis the shortest route to the x2-axis, see Fig. 21.2. Alternatively, wecan visualize holding our flattened right hand out with the fingers alignedalong the x1 axis so that when we curl our fingers inward, they move to-wards the positive x2 axis, and then our extended thumb will point alongthe positive x3 axis.

x1

x1

x1

x2x2

x2 x3

x3

x3

Fig. 21.2. Two “right” coordinate systems and one “left”, where the verticalcoordinate of the view point is assumed positive, that is, the horizontal plane isseen from above. What happens if the vertical coordinate of the view point isassumed negative?

Having now chosen a right-oriented orthogonal coordinate system, wecan assign three coordinates (a1, a2, a3) to each point a in space using thesame principle as in the case of the Euclidean plane, see Fig. 21.1. Thisway we can represent Euclidean 3d space as the set of all ordered 3-tuplesa = (a1, a2, a3), where ai ∈ R for i = 1, 2, 3, or as R

3. Of course, wecan choose different coordinate systems with different origin, coordinatedirections and scaling of the coordinate axes. Below, we will come back tothe topic of changing from one coordinate system to another.

21.2 Vector Addition and Multiplication by a Scalar 315

21.2 Vector Addition and Multiplicationby a Scalar

Most of the notions and concepts of analytic geometry of the Euclideanplane represented by R

2 extend naturally to Euclidean 3d space representedby R

3.In particular, we can view an ordered 3-tuple a = (a1, a2, a3) either as

a point in three dimensional space with coordinates a1, a2 and a3 or asa vector/arrow with tail at the origin and head at the point (a1, a2, a3), asillustrated in Fig. 21.1.

We define the sum a+b of two vectors a = (a1, a2, a3) and b = (b1, b2, b3)in R

3 by componentwise addition,

a+ b = (a1 + b1, a2 + b2, a3 + b3),

and multiplication of a vector a = (a1, a2, a3) by a real number λ by

λa = (λa1, λa2, λa3).

The zero vector is the vector 0 = (0, 0, 0). We also write −a = (−1)a anda − b = a + (−1)b. The geometric interpretation of these definitions isanalogous to that in R

2. For example, two non-zero vectors a and b in R3

are parallel if b = λa for some non-zero real number λ. The usual ruleshold, so vector addition is commutative, a + b = b + a, and associative,(a + b) + c = a + (b + c). Further, λ(a + b) = λa + λb and κ(λa) = (κλ)afor vectors a and b and real numbers λ and κ.

The standard basis vectors in R3 are e1 = (1, 0, 0), e2 = (0, 1, 0) and

e3 = (0, 0, 1).

21.3 Scalar Product and Norm

The standard scalar product a · b of two vectors a = (a1, a2, a3) and b =(b1, b2, b3) in R

3 is defined by

a · b =3∑

i=1

aibi = a1b1 + a2b2 + a3b3. (21.1)

The scalar product in R3 has the same properties as its cousin in R

2, so itis bilinear, symmetric, and positive definite.= We say that two vectors aand b are orthogonal if a · b = 0.

The Euclidean length or norm |a| of a vector a = (a1, a2, a3) is definedby

|a| = (a · a) 12 =

(3∑

i=1

a2i

) 12

, (21.2)


which expresses Pythagoras theorem in 3d, and which we may obtain byusing the usual 2d Pythagoras theorem twice. The distance |a− b| betweentwo points a = (a1, a2, a3) and b = (b1, b2, b3) is equal to

|a− b| =

(3∑

i=1

(ai − bi)2)1/2

.

Cauchy’s inequality states that for any vectors a and b in R3,

|a · b| ≤ |a| |b|. (21.3)

We give a proof of Cauchy’s inequality in Chapter Analytic Geometry inR

n below. We note that Cauchy’s inequality in R2 follows directly from the

fact that a · b = |a| |b| cos(θ), where θ is the angle between a and b.

21.4 Projection of a Vector onto a Vector

Let a be a given non-zero vector in R3. We define the projection Pb = Pa(b)

of a vector b in R3 onto the vector a by the formula

Pb = Pa(b) =a · ba · a a =

a · b|a|2 a. (21.4)

This is a direct generalization of the corresponding formula in R2 based

on the principles that Pb is parallel to a and b − Pb is orthogonal to a asillustrated in Fig. 21.3, that is

Pb = λa for some λ ∈ R and (b− Pb) · a = 0.

This gives the formula (21.4) with λ = a·b|a|2 .

The transformation P : R3 → R

3 is linear, that is for any b and c ∈ R3

and λ ∈ R,P (b+ c) = Pb+ Pc, P (λb) = λPb,

and PP = P .

21.5 The Angle Between Two Vectors

We define the angle θ between non-zero vectors a and b in R3 by

cos(θ) =a · b|a| |b| , (21.5)


where we may assume that 0 ≤ θ ≤ 180. By Cauchy’s inequality (21.3),|a · b| ≤ |a| |b|. Thus, there is an angle θ satisfying (21.5) that is uniquelydefined if we require 0 ≤ θ ≤ 180. We may write (21.5) in the form

a · b = |a||b| cos(θ), (21.6)

where θ is the angle between a and b. This evidently extends the corre-sponding result in R

2.We define the angle θ between two vectors a and b via the scalar product

a · b in (21.5), which we may view as an algebraic definition. Of course,we would like to see that this definition coincides with a usual geometricdefinition. If a and b both lie in the x1 − x2-plane, then we know fromthe Chapter Analytic geometry in R

2 that the two definitions coincide. Weshall see below that the scalar product a · b is invariant (does not change)under rotation of the coordinate system, which means that given any twovectors a and b, we can rotate the coordinate system so make a and b liein the x1 − x2-plane. We conclude that the algebraic definition (21.5) ofangle between two vectors and the usual geometric definition coincide. Inparticular, two non-zero vectors are geometrically orthogonal in the sensethat the geometric angle θ between the vectors satisfies cos(θ) = 0 if andonly if a · b = |a||b| cos(θ) = 0.

x1x1

x2x2

x3x3

aa

bb

Pb

Pb θθ

Fig. 21.3. Projection Pb of a vector b onto a vector a

21.6 Vector Product

We now define the vector product a× b of two vectors a = (a1, a2, a3) andb = (b1, b2, b3) in R

3 by the formula

a× b = (a2b3 − a3b2, a3b1 − a1b3, a1b2 − a2b1) . (21.7)


We note that the vector product a× b of two vectors a and b in R3 is itself

a vector in R3. In other words, with f(a, b) = a× b, f : R

3 → R3. We also

refer to the vector product as the cross product, because of the notation.

Example 21.1. If a = (3, 2, 1) and b = (4, 5, 6), then a × b = (12 − 5, 4 −18, 15− 8) = (7,−14, 7).

Note that there is also the trivial, componentwise “vector product” de-fined by (using MATLAB’s notation) a.∗b = (a1b1, a2b2, a3b3). The vectorproduct defined above, however, is something quite different!

The formula for the vector product may seem a bit strange (and com-plicated), and we shall now see how it arises. We start by noting that theexpression a1b2 − a2b1 appearing in (21.7) is the vector product of thevectors (a1, a2) and (b1, b2) in R

2, so there appears to be some pattern atleast.

We may directly check that the vector product a× b is linear in both aand b, that is

a× (b + c) = a× b+ a× c, (a+ b) × c = a× c+ b× c, (21.8a)(λa) × b = λa× b, a× (λb) = λa× b, (21.8b)

where the products × should be computed first unless something else isindicated by parentheses. This follows directly from the fact that the com-ponents of a× b depend linearly on the components of a and b.

Since the vector product a× b is linear in both a and b, we say that a× bis bilinear. We also see that the vector product a × b is anti-symmetric inthe sense that

a× b = − b× a. (21.9)

Thus, the vector product a× b is bilinear and antisymmetric and moreoverit turns out that these two properties determine the vector product up toa constant just as in in R

2.For the vector products of the basis vectors ei, we have (check this!)

ei × ei = 0, i = 1, 2, 3, (21.10a)e1 × e2 = e3, e2 × e3 = e1, e3 × e1 = e2, (21.10b)

e2 × e1 = −e3, e3 × e2 = −e1, e1 × e3 = −e2. (21.10c)

We see that e1×e2 = e3 is orthogonal to both e1 and e2. Similarly, e2×e3 =e1 is orthogonal to both e2 and e3, and e1×e3 = −e2 is orthogonal to bothe1 and e3.

This pattern generalizes. In fact, for any two non-zero vectors a and b,the vector a× b is orthogonal to both a and b since

a ·(a×b) = a1(a2b3−a3b2)+a2(a3b1−a1b3)+a3(a1b2−a2b1) = 0, (21.11)

and similarly b · (a× b) = 0.

21.7 Geometric Interpretation of the Vector Product 319

We may compute the vector product of two arbitrary vectors a = (a1, a2,a3) and b = (b1, b2, b3) by using linearity combined with (21.10) as follows,

a× b = (a1e1 + a2e2 + a3e3) × (b1e1 + b2e2 + b3e3)= a1b2 e1 × e2 + a2b1 e2 × e1

+ a1b3 e1 × e3 + a3b1 e3 × e1

+ a2b3 e2 × e3 + a3b2 e3 × e2

= (a1b2 − a2b1)e3 + (a3b1 − a1b3)e2 + (a2b3 − a3b2)e1= (a2b3 − a3b2, a3b1 − a1b3, a1b2 − a2b1),

which conforms with (21.7).

21.7 Geometric Interpretationof the Vector Product

We shall now make a geometric interpretation of the vector product a× bof two vectors a and b in R

3.We start by assuming that a = (a1, a2, 0) and b = (b1, b2, 0) are two

non-zero vectors in the plane defined by the x1 and x2 axes. The vectora×b = (0, 0, a1b2−a2b1) is clearly orthogonal to both a and b, and recallingthe basic result (20.21) for the vector product in R

2, we have

|a× b| = |a||b|| sin(θ)|, (21.12)

where θ is the angle between a and b.We shall now prove that this result generalizes to arbitrary vectors a =

(a1, a2, a3) and b = (b1, b2, b3) in R3. First, the fact that a× b is orthogonal

to both a and b was proved in the previous section. Secondly, we note thatmultiplying the trigonometric identity sin2(θ) = 1− cos2(θ) by |a|2 |b|2 andusing (21.6), we obtain

|a|2|b|2 sin2(θ) = |a|2|b|2 − (a · b)2. (21.13)

Finally, a direct (but somewhat lengthy) computation shows that

|a× b|2 = |a|2|b|2 − (a · b)2,

which proves (21.12). We summarize in the following theorem.

Theorem 21.1 The vector product a× b of two non-zero vectors a and bin R

3 is orthogonal to both a and b and |a × b| = |a||b|| sin(θ)|, where θ isthe angle between a and b. In particular, a and b are parallel if and only ifa× b = 0.

We can make the theorem more precise by adding the following sign rule:The vector a × b is pointing in the direction of a standard screw turningthe vector a into the vector b the shortest route.


x1

x2

x3

a

b

θ

a×b

|a×b| = A(a, b) = |a| |b| | sin(θ)|

Fig. 21.4. Geometric interpretation of the vector product

21.8 Connection Between Vector Productsin R

2 and R3

We note that if a = (a1, a2, 0) and b = (b1, b2, 0), then

a× b = (0, 0, a1b2 − a2b1). (21.14)

The previous formula a×b = a1b2−a2b1 for a = (a1, a2) and b = (b1, b2) inR

2 may thus be viewed as a short-hand for the formula a× b = (0, 0, a1b2−a2b1) for a = (a1, a2, 0) and b = (b1, b2, 0), with a1b2 − a2b1 being the thirdcoordinate of a × b in R

3. We note the relation of the sign conventionsin R

2 and R3: If a1b2 − a2b1 ≥ 0, then turning a screw into the positive

x3-direction should turn a into b the shortest route. This corresponds toturning a into b counter-clockwise and to the angle θ between a and bsatisfying sin(θ) ≥ 0.

21.9 Volume of a Parallelepiped Spannedby Three Vectors

Consider the parallelepiped spanned by three vectors a, b and c, accordingto Fig. 21.5.

We seek a formula for the volume V (a, b, c) of the parallelepiped. Werecall that the volume V (a, b, c) is equal to the area A(a, b) of the basespanned by the vectors a and b times the height h, which is the length ofthe projection of c onto a vector that is orthogonal to the plane formed bya and b. Since a× b is orthogonal to both a and b, the height h is equal tothe length of the projection of c onto a × b. From (21.12) and (21.4), weknow that

A(a, b) = |a× b|, h =|c · (a× b)||a× b| ,

21.10 The Triple Product a · b× c 321

x1

x2

x3

a

bc

θ

a×b

Fig. 21.5. Parallelepiped spanned by three vectors

and thusV (a, b, c) = |c · (a× b)|. (21.15)

Clearly, we may also compute the volume V (a, b, c) by considering b and cas forming the base, or likewise the vectors a and c forming the base. Thus,

V (a, b, c) = |a · (b× c)| = |b · (a× c)| = |c · (a× b)|. (21.16)

Example 21.2. The volume V (a, b, c) of the parallelepiped spanned bya = (1, 2, 3), b = (3, 2, 1) and c = (1, 3, 2) is equal to a · (b × c) = (1, 2, 3) ·(1,−5, 7) = 12.

21.10 The Triple Product a · b × c

The expression a · (b × c) occurs in the formulas (21.15) and (21.16). Thisis called the triple product of the three vectors a, b and c. We usually writethe triple product without the parenthesis following the convention that thevector product × is performed first. In fact, the alternative interpretation(a · b)× c does not make sense since a · b is a scalar and the vector product× requires vector factors!

The following properties of the triple product can be readily verified bydirect application of the definition of the scalar and vector products,

a · b× c = c · a× b = b · c× a,

a · b× c = −a · c× b = −b · a× c = −c · b× a.

To remember these formulas, we note that if two of the vectors change placethen the sign changes, while if all three vectors are cyclically permuted (forexample the order a, b, c is replaced by c, a, b or b, c, a), then the sign isunchanged.


Using the triple product a · b× c, we can express the geometric quantityof the volume V (a, b, c) of the parallelepiped spanned by a, b and c in theconcise algebraic form,

V (a, b, c) = |a · b× c|. (21.17)

We shall use this formula many times below. Note, we later prove that thevolume of a parallelepiped can be computed as the area of the base timesthe height using Calculus below.

21.11 A Formula for the Volume Spanned byThree Vectors

Let a1 = (a11, a12, a13), a2 = (a21, a22, a23) and a3 = (a31, a32, a33) be threevectors in R

3. Note that here a1 is a vector in R3 with a1 = (a11, a12, a13),

et cetera. We may think of forming the 3 × 3 matrix A = (aij) with therows corresponding to the coordinates of a1, a2 and a3,

A =

a11 a12 a13

a21 a22 a23

a31 a32 a33

We will come back to 3 × 3 matrices below. Here, we just use the matrixto express the coordinates of the vectors a1, a2 and a3 in handy form.

We give an explicit formula for the volume V (a1, a2, a3) spanned by threevectors a1, a2 and a3. By direct computation starting with (21.17),

±V (a1, a2, a3) = a1 · a2 × a3

= a11(a22a33 − a23a32) − a12(a21a33 − a23a31)+ a13(a21a32 − a22a31).

(21.18)

We note that V (a1, a2, a3) is a sum of terms, each term consisting of theproduct of three factors aijaklamn with certain indices ij, kl and mn. If weexamine the indices occurring in each term, we see that the sequence of rowindices ikm (first indices) is always 1, 2, 3, while the sequence of columnindices jln (second indices) corresponds to a permutation of the sequence1, 2, 3, that is the numbers 1, 2 and 3 occur in some order. Thus, allterms have the form

a1j1a2j2a3j3 (21.19)

with j1, j2, j3 being a permutation of 1, 2, 3. The sign of the termschange with the permutation. By inspection we can detect the followingpattern: if the permutation can be brought to the order 1, 2, 3 with aneven number of transpositions, each transposition consisting of interchang-ing two indices, then the sign is +, and with an uneven number of transpo-sitions the sign is −. For example, the permutation of second indices in the

21.12 Lines 323

term a11a23a32 is 1, 3, 2, which is uneven since one transposition brings itback to 1, 2, 3, and thus this term has a negative sign. Another example:the permutation in the term a12a23a31 is 2, 3, 1 is even since it resultsfrom the following two transpositions 2, 1, 3 and 1, 2, 3.

We have now developed a technique for computing volumes that we willgeneralize to R

n below. This will lead to determinants. We will see thatthe formula (21.18) states that the signed volume ±V (a1, a2, a3) is equalto the determinant of the 3 × 3 matrix A = (aij).

21.12 Lines

Let a be a given non-zero vector in R3 and let x be a given point in R

3.The points x in R

3 of the form

x = x+ sa,

where s varies over R, form a line in R3 through the point x with direction

a, see Fig. 21.6. If x = 0, then the line passes through the origin.

x1

x2

x3

a

s a

x

x = x+ s a

Fig. 21.6. Line in R3 of the form x = x+ s a

Example 21.3. The line through (1, 2, 3) in the direction (4, 5, 6) is givenby

x = (1, 2, 3) + s(4, 5, 6) = (1 + 4s, 2 + 5s, 3 + 6s) s ∈ R.

The line through (1, 2, 3) and (3, 1, 2) has the direction (3, 1, 2) − (1, 2, 3)= (2,−1,−1), and is thus given by x = (1, 2, 3) + s(2,−1,−1).Note that by choosing other vectors to represent the direction of the line,it may also be represented, for example, as x = (1, 2, 3) + s(−2, 1, 1) orx = (1, 2, 3) + s(6,−3,−3). Also, the “point of departure” on the line,corresponding to s = 0, can be chosen arbitrarily on the line of course. Forexample, the point (1, 2, 3) could be replaced by (−1, 3, 4) which is anotherpoint on the line.


21.13 Projection of a Point onto a Line

Let x = x + sa be a line in R3 through x with direction a ∈ R

3. We seekthe projection Pb of a given point b ∈ R

3 onto the line, that is we seekPb ∈ R

3 with the property that (i) Pb = x + sa for some s ∈ R, and (ii)(b−Pb) ·a = 0. Note that we here view b to be a point rather than a vector.Inserting (i) into (ii) gives the following equation in s: (b− x− sa) · a = 0,from which we conclude that s = b·a−x·a

|a|2 , and thus

Pb = x+b · a− x · a

|a|2 a . (21.20)

If x = 0, that is the line passes through the origin, then Pb = b·a|a|2a in

conformity with the corresponding formula (20.9) in R2.

21.14 Planes

Let a1 and a2 be two given non-zero non-parallel vectors in R3, that is

a1 × a2 = 0. The points x in R3 that can be expressed as

x = s1a1 + s2a2, (21.21)

where s1 and s2 vary over R, form a plane in R3 through the origin that

is spanned by the two vectors a1 and a2. The points x in the plane areall the linear combinations x = s1a1 + s2a2 of the vectors a1 and a2 withcoefficients s1 and s2 varying over R, see Fig. 21.7. The vector a1 × a2 isorthogonal to both a1 and a2 and therefore to all vectors x in the plane.Thus, the non-zero vector n = a1 × a2 is a normal to the plane. The pointsx in the plane are characterized by the orthogonality relation

n · x = 0. (21.22)

We may thus describe the points x in the plane by the representation(21.21) or the equation (21.22). Note that (21.21) is a vector equationcorresponding to 3 scalar equations, while (21.22) is a scalar equation.Eliminating the parameters s1 and s2 in the system (21.21), we obtain thescalar equation (21.22).

Let x be a given point in R3. The points x in R

3 that can be expressed as

x = x+ s1a1 + s2a2, (21.23)

where s1 and s2 vary over R, form a plane in R3 through the point x that

is parallel to the corresponding plane through the origin considered above,see Fig. 21.8.

If x = x+s1a1 +s2a2 then n ·x = n · x, because n ·ai = 0, i = 1, 2. Thus,we can describe the points x of the form x = x+ s1a1 + s2a2 alternativelyas the vectors x satisfying

n · x = n · x. (21.24)

21.14 Planes 325

x1

x2

x3

a1

a2

n = a1×a2

x = s1a1 + s2a2

Fig. 21.7. Plane through the origin spanned by a1 and a2, and with normaln = a1 × a2

Again, we obtain the scalar equation (21.24) if we eliminate the parameterss1 and s2 in the system (21.23).

We summarize:

Theorem 21.2 A plane in R3 through a point x ∈ R

3 with normal ncan be expressed as the set of x ∈ R

3 of the form x = x + s1a1 + s2a2

with s1 and s2 varying over R, where a1 and a2 are two vectors satisfyingn = a1 × a2 = 0. Alternatively, the plane can be described as the set ofx ∈ R such that n · x = d, where d = n · x.

Example 21.4. Consider the plane x1 + 2x2 + 3x3 = 4, that is the plane(1, 2, 3) · (x1, x2, x3) = 4 with normal n = (1, 2, 3). To write the points xin this plane on the form x = x+ s1a1 + s2a2, we first choose a point x inthe plane, for example, x = (2, 1, 0) noting that n · x = 4. We next choose

x1

x2

x3

a1

a2

n = a1×a2

x = s1a1 + s2a2

x x = x+ s1a1 + s2a2

Fig. 21.8. Plane through x with normal n defined by n · x = d = n · x


two non-parallel vectors a1 and a2 such that n · a1 = 0 and n · a2 = 0,for example a1 = (−2, 1, 0) and a2 = (−3, 0, 1). Alternatively, we chooseone vector a1 satisfying n · a1 = 0 and set a2 = n × a1, which is a vectororthogonal to both n and a1. To find a vector a1 satisfying a1 · n = 0,we may choose an arbitrary non-zero vector m non-parallel to n and seta1 = m× n, for example m = (0, 0, 1) giving a1 = (−2, 1, 0).

Conversely, given the plane x = (2, 1, 0)+ s1(−2, 1, 0)+ s2(−3, 0, 1), that isx = x+s1a1+s2a2 with x = (2, 1, 0), a1 = (−2, 1, 0) and a2 = (−3, 0, 1), weobtain the equation x1 +2x2 +3x3 = 4 simply by computing n = a1×a2 =(1, 2, 3) and n·x = (1, 2, 3)·(2, 1, 0) = 4, from which we obtain the followingequation for the plane: n · x = (1, 2, 3) · (x1, x2, x3) = x1 + 2x2 + 3x3 =n · x = 4.

Example 21.5. Consider the real-valued function z = f(x, y) = ax+ by+ cof two real variables x and y, where a, b and c are real numbers. Settingx1 = x, x2 = y and x3 = z, we can express the graph of z = f(x, y) as theplane ax1 + bx2 − x3 = −c in R

3 with normal (a, b,−1).

21.15 The Intersection of a Line and a Plane

We seek the intersection of a line x = x+sa and a plane n ·x = d that is theset of points x belonging to both the line and the plane, where x, a, n andd are given. Inserting x = x+ sa into n · x = d, we obtain n · (x+ sa) = d,that is n · x+ s n · a = d. This yields s = (d− n · x)/(n · a) if n · a = 0, andwe find a unique point of intersection

x = x+ (d− n · x)/(n · a) a. (21.25)

This formula has no meaning if n · a = 0, that is if the line is parallel tothe plane. In this case, there is no intersection point unless x happens tobe a point in the plane and then the whole line is part of the plane.

Example 21.6. The intersection of the plane x1 +2x2 +x3 = 5 and the linex = (1, 0, 0) + s(1, 1, 1) is found by solving the equation 1 + s+ 2s+ s = 5giving s = 1 and thus the point of intersection is (2, 1, 1).The plane x1 +2x2 + x3 = 5 and the line x = (1, 0, 0) + s(2,−1, 0) has no point of inter-section, because the equation 1 + 2s − 2s = 5 has no solution. If insteadwe consider the plane x1 + 2x2 + x3 = 1, we find that the entire linex = (1, 0, 0) + s(2,−1, 0) lies in the plane, because 1 + 2s− 2s = 1 for allreal s.

21.16 Two Intersecting Planes Determine a Line 327

21.16 Two Intersecting Planes Determine a Line

Let n1 = (n11, n12, n13) and n2 = (n21, n22, n23) be two vectors in R3 and

d1 and d2 two real numbers. The set of points x ∈ R3 that lie in both the

plane n1 · x = d1 and n2 · x = d2 satisfy the system of two equations

n1 · x = d1,

n2 · x = d2.(21.26)

Intuition indicates that generally the points of intersection of the two planesshould form a line in R

3. Can we determine the formula of this line in theform x = x+ sa with suitable vectors a and x in R

3 and s varying over R?Assuming first that d1 = d2 = 0, we seek a formula for the set of x suchthat n1 · x = 0 and n2 · x = 0, that is the set of x that are orthogonal toboth n1 and n2. This leads to a = n1 × n2 and expressing the solution xof the equations n1 · x = 0 and n2 · x = 0 as x = s n1 × n2 with s ∈ R. Ofcourse it is natural to add in the assumption that n1 ×n2 = 0, that is thatthe two normals n1 and n2 are not parallel so that the two planes are notparallel.

Next, suppose that (d1, d2) = (0, 0). We see that if we can find onevector x such that n1 · x = d1 and n2 · x = d2, then we can write thesolution x of (21.26) as

x = x+ s n1 × n2, s ∈ R. (21.27)

We now need to verify that we can indeed find x satisfying n1 · x = d1 andn2 · x = d2. That is, we need to find x ∈ R

3 satisfying the following systemof two equations,

n11x1 + n12x2 + n13x3 = d1,

n21x1 + n22x2 + n23x3 = d2.

Since n1 × n2 = 0, some component of n1 × n2 must be nonzero. If forexample n11n22 − n12n21 = 0, corresponding to the third component ofn1 × n2 being non-zero, then we may choose x3 = 0. Then recalling therole of the condition n11n22 − n12n21 = 0 for a 2 × 2-system, we maysolve uniquely for x1 and x2 in terms of d1 and d2 to get a desired x. Theargument is similar in case the second or first component of n1×n2 happensto be non-zero.

We summarize:

Theorem 21.3 Two non-parallel planes n1 · x = d1 and n2 · x = d2 withnormals n1 and n2 satisfying n1 × n2 = 0, intersect along a straight linewith direction n1 × n2.

Example 21.7. The intersection of the two planes x1 + x2 + x3 = 2 and3x1 + 2x2 − x3 = 1 is given by x = x + sa with a = (1, 1, 1) × (3, 2,−1) =(−3, 4,−1) and x = (0, 1, 1).


21.17 Projection of a Point onto a Plane

Let n · x = d be a plane in R3 with normal n and b a point in R

3. We seekthe projection Pb of b onto the plane n · x = d. It is natural to ask Pb tosatisfy the following two conditions, see Fig. 21.9,

n · Pb = d, that is Pb is a point in the plane,b− Pb is parallel to the normal n, that is b− Pb = λn for some λ ∈ R.

We conclude that Pb = b − λn and the equation n · Pb = d thus givesn · (b− λn) = d. So, λ = b·n−d

|n|2 and thus

Pb = b− b · n− d

|n|2 n. (21.28)

If d = 0 so the plane n · x = d = 0 passes through the origin, then

Pb = b− b · n|n|2 n. (21.29)

If the plane is given in the form x = x+ s1a1 + s2a2 with a1 and a2 twogiven non-parallel vectors in R

3, then we may alternatively compute theprojection Pb of a point b onto the plane by seeking real numbers x1 andx2 so that Pb = x+ x1a1 + x2a2 and (b−Pb) · a1 = (b−Pb) · a2 = 0. Thisgives the system of equations

x1a1 · a1 + x2a2 · a1 = b · a1 − x · a1,

x1a1 · a2 + x2a2 · a2 = b · a2 − x · a2

(21.30)

in the two unknowns x1 and x2. To see that this system has a uniquesolution, we need to verify that a11a22 − a12a21 = 0, where a11 = a1 · a1,a22 = a2 ·a2, a21 = a2 ·a1 and a12 = a1 ·a2. This follows from the fact thata1 and a2 are non-parallel, see Problem 21.24.

Example 21.8. The projection Pb of the point b = (2, 2, 3) onto the planedefined by x1+x2+x3 = 1 is given by Pb = (2, 2, 3)− 7−1

3 (1, 1, 1) = (0, 0, 1).

Example 21.9. The projection Pb of the point b = (2, 2, 3) onto the planex = (1, 0, 0) + s1(1, 1, 1) + s2(1, 2, 3) with normal n = (1, 1, 1) × (1, 2, 3) =(1,−2, 1) is given by Pb = (2, 2, 3) − (2,2,3)·(1,−2,1)

6 (1,−2, 1) = (2, 2, 3) −16 (1,−2, 1) = 1

6 (11, 14, 17).

21.18 Distance from a Point to a Plane

We say that the distance from a point b to a plane n · x = d is equal to|b − Pb|, where Pb is the projection of b onto the plane. According to the

21.19 Rotation Around a Given Vector 329

previous section, we have

|b− Pb| =|b · n− d|

|n| .

Note that this distance is equal to the shortest distance between b and anypoint in the plane, see Fig. 21.9 and Problem 21.22.

x1

x2

x3

n

b

Pb

b−Pb

Fig. 21.9. Projection of a point/vector onto a plane

Example 21.10. The distance from the point (2, 2, 3) to the plane x1 +x2 +x3 = 1 is equal to |(2,2,3)·(1,1,1)−1|√

3= 2

√3.

21.19 Rotation Around a Given Vector

We now consider a more difficult problem. Let a ∈ R3 be a given vector

and θ ∈ R a given angle. We seek the transformation R : R3 → R

3 corre-sponding to rotation of an angle θ around the vector a. Recalling Section20.21, the result Rx = R(x) should satisfy the following properties,

(i) |Rx−Px| = |x−Px|, (ii) (Rx−Px) · (x−Px) = cos(θ)|x−Px|2.

where Px = Pa(x) is the projection of x onto a, see Fig. 21.10. We writeRx−Px as Rx−Px = α(x−Px)+β a× (x−Px) for real numbers α andβ, noting that Rx− Px is orthogonal to a and a× (x− Px) is orthogonalto both a and (x− Px). Taking the scalar product with (x − Px), we use(ii) to get α = cos(θ) and then use (i) to find β = sin(θ)

|a| with a suitableorientation. Thus, we may express Rx in terms of the projection Px as

Rx = Px+ cos(θ)(x − Px) +sin(θ)|a| a× (x− Px). (21.31)


x1

x2

x3

x PxRx

θ

Fig. 21.10. Rotation around a = (0, 0, 1) a given angle θ

21.20 Lines and Planes Through the Origin AreSubspaces

Lines and planes in R3 through the origin are examples of subspaces of R

3.The characteristic feature of a subspace is that the operations of vectoraddition and scalar multiplication does not lead outside the subspace. Forexample if x and y are two vectors in the plane through the origin withnormal n satisfying n·x = 0 and n·y = 0, then n·(x+y) = 0 and n·(λx) = 0for any λ ∈ R, so the vectors x+ y and λx also belong to the plane. On theother hand, if x and y belong to a plane not passing through the origin withnormal n, so that n · x = d and n · y = d with d a nonzero constant, thenn · (x+ y) = 2d = d, and thus x+ y does not lie in the plane. We concludethat lines and planes through the origin are subspaces of R

3, but lines andplanes not passing through the origin are not subspaces. The concept ofsubspace is very basic and we will meet this concept many times below.

We note that the equation n · x = 0 defines a line in R2 and a plane

in R3. The equation n · x = 0 imposes a constraint on x that reduces the

dimension by one, so in R2 we get a line and in R

3 we get a plane.

21.21 Systems of 3 Linear Equationsin 3 Unknowns

Consider now the following system of 3 linear equations in 3 unknowns x1,x2 and x3 (as did Leibniz already 1683):

a11x1 + a12x2 + a13x3 = b1,a21x1 + a22x2 + a23x3 = b2,a31x1 + a32x2 + a33x3 = b3,

(21.32)

21.21 Systems of 3 Linear Equations in 3 Unknowns 331

with coefficients aij and right hand side bi, i, j = 1, 2, 3. We can write thissystem as the following vector equation in R

3:

x1a1 + x2a2 + x3a3 = b, (21.33)

where a1 = (a11, a21, a31), a2 = (a12, a22, a32), a3 = (a13, a23, a33) andb = (b1, b2, b3) are vectors in R

3, representing the given coefficients and theright hand side.

When is the system (21.32) uniquely solvable in x = (x1, x2, x3) fora given right hand side b? We shall see that the condition to guaranteeunique solvability is

V (a1, a2, a3) = |a1 · a2 × a3| = 0, (21.34)

stating that the volume spanned by a1, a2 and a3 is not zero.We now argue that the condition a1 ·a2×a3 = 0 is the right condition to

guarantee the unique solvability of (21.32). We can do this by mimickingwhat we did in the case of a 2×2 system: Taking the scalar product of bothsides of the vector equation x1a1 +x2a2 +x3a3 = b by successively a2 ×a3,a3 × a1, and a2 × a3, we get the following solution formula (recalling thata1 · a2 × a3 = a2 · a3 × a1 = a3 · a1 × a2):

x1 =b · a2 × a3

a1 · a2 × a3,

x2 =b · a3 × a1

a2 · a3 × a1=

a1 · b× a3

a1 · a2 × a3,

x3 =b · a1 × a2

a3 · a1 × a2=

a1 · a2 × b

a1 · a2 × a3,

(21.35)

where we used the facts that ai · aj × ak = 0 if any two of the indicesi, j and k are equal. The solution formula (21.35) shows that the system(21.32) has a unique solution if a1 · a2 × a3 = 0.

Note the pattern of the solution formula (21.35), involving the commondenominator a1 · a2 × a3 and the numerator for xi is obtained by replacingai by b. The solution formula (21.35) is also called Cramer’s rule. We haveproved the following basic result:

Theorem 21.4 If a1 · a2 × a3 = 0, then the system of equations (21.32)or the equivalent vector-equation (21.33) has a unique solution given byCramer’s rule (21.35).

We repeat: V (a1, a2, a3) = a1 · a2 × a3 = 0 means that the three vectorsa1, a2 and a3 span a non-zero volume and thus point in three differentdirections (such that the plane spanned by any two of the vectors does notcontain the third vector). If V (a1, a2, a3) = 0, then we say that the set ofthree vectors a,a2, a3 is linearly independent, or that the three vectorsa1, a2 and a3 are linearly independent.


21.22 Solving a 3 × 3-System by GaussianElimination

We now describe an alternative to Cramer’s rule for computing the solu-tion to the 3 × 3-system of equations (21.32), using the famous method ofGaussian elimination. Assuming a11 = 0, we subtract the first equationmultiplied by a21 from the second equation multiplied by a11, and likewisesubtract the first equation multiplied by by a31 from the third equationmultiplied by a11, to rewrite the system (21.32) in the form

a11x1 + a12x2 + a13x3 = b1,(a22a11 − a21a12)x2 + (a23a11 − a21a13)x3 = a11b2 − a21b1,(a32a11 − a31a12)x2 + (a33a11 − a31a13)x3 = a11b3 − a31b1,

(21.36)

where the unknown x1 has been eliminiated in the second and third equa-tions. This system has the form

a11x1+ a12x2 + a13x3 = b1,

a22x2 + a23x3 = b2,

a32x2 + a33x3 = b3,

(21.37)

with modified coefficients aij and bi. We now proceed in the same wayconsidering the 2 × 2-system in (x2, x3), and bring the system to the finaltriangular form

a11x1+ a12x2 + a13x3 = b1,

a22x2 + a23x3 = b2,

a33x3 = b3,

(21.38)

with modified coefficients in the last equation. We can now solve the thirdequation for x3, then insert the resulting value of x3 into the second equa-tion and solve for x2 and finally insert x3 and x2 into the first equation tosolve for x1.

Example 21.11. We give an example of Gaussian elimination: Consider thesystem

x1 +2x2 + 3x3 = 6,2x1 +3x2 + 4x3 = 9,3x1 +4x2 + 6x3 = 13.

Subtracting the first equation multiplied by 2 from the second and the firstequation multiplied by 3 from the third equation, we get the system

x1 +2x2 + 3x3 = 6,−x2 − 2x3 = −3,−2x2 − 3x3 = −5.

21.23 3 × 3 Matrices: Sum, Product and Transpose 333

Subtracting now the second equation multiplied by 2 from the third equa-tion, we get

x1 + 2x2+ 3x3 = 6,−x2− 2x3 = −3,

x3 = 1,

from which we find x3 = 1 and then from the second equation x2 = 1 andfinally from the first equation x1 = 1.

21.23 3 × 3 Matrices: Sum, Product and Transpose

We can directly generalize the notion of a 2 × 2 matrix as follows: We saythat the quadratic array

a11 a12 a13

a21 a22 a23

a31 a32 a33

is a 3×3 matrix A = (aij) with elements aij , i, j = 1, 2, 3, and with i beingthe row index and j the column index.

Of course, we can also generalize the notion of a 2-row (or 1× 2 matrix)and a 2-column vector (or 2×1 matrix). Each row of A, the first row being(a11 a12 a13), can thus be viewed as a 3-row vector (or 1 × 3 matrix), andeach column of A, the first column being

a11

a21

a31

as a 3-column vector (or 3 × 1 matrix). We can thus view a 3 × 3 matrixto consist of three 3-row vectors or three 3-column vectors.

Let A = (aij) and B = (bij) be two 3 × 3 matrices. We define the sumC = A + B to be the matrix C = (cij) with elements cij = aij + bij fori, j = 1, 2, 3. In other words, we add two matrices element by element.

Given a 3 × 3 matrix A = (aij) and a real number λ, we define thematrix C = λA as the matrix with elements cij = λaij . In other words, allelements aij are multiplied by λ.

Given two 3× 3 matrices A = (aij) and B = (bij), we define the productC = AB as the 3 × 3 matrix with elements cij given by

cij =3∑

k=1

aikbkj i, j = 1, 2, 3. (21.39)

Matrix multiplication is associative so that (AB)C = A(BC) for matricesA, B and C, see Problem 21.10. The matrix product is however not commu-tative in general, that is there are matrices A and B such that AB = BA,see Problem 21.11.


Given a 3× 3 matrix A = (aij), we define the transpose of A denoted byA as the matrix C = A with elements cij = aji, i, j = 1, 2, 3. In otherwords, the rows of A are the columns of A and vice versa. By definition(A) = A. Transposing twice brings back the original matrix.

We can directly check the validity of the following rules for computingwith the transpose:

(A+B) = A +B, (λA) = λA,

(AB) = BA.

Similarly, the transpose of a 3-column vector is the 3-row vector with thesame elements. Vice versa, if we consider the 3 × 1 matrix

x =

x1

x2

x3

to be a 3-column vector, then the transpose x is the corresponding 3-rowvector (x1 x2 x3). We define the product of a 1 × 3 matrix (3-row vector)x with a 3 × 1 matrix (3-column vector) y in the natural way as follows:

xy =(x1 x2 x3

)

y1y2y3

= x1y1 + x2y2 + x3y3 = x · y,

where we noted the connection to the scalar product of 3-vectors. We thusmake the fundamental observation that multiplication of a 1 × 3 matrix(3-row vector) with a 3× 1 matrix (3-column vector) is the same as scalarmultiplication of the corresponding 3-vectors. We can then express theelement cij of the product C = AB according to (21.39) as the scalarproduct of row i of A with column j of B,

cij =(ai1 ai2 ai3

)

b1j

b2j

b3j

=3∑

k=1

aikbkj .

We note that|x|2 = x · x = xx,

where we interpret x both as an ordered triple and as a 3-column vector.The 3 × 3 matrix

1 0 00 1 00 0 1

is called the 3 × 3 identity matrix and is denoted by I. We have IA = Aand AI = A for any 3 × 3 matrix A.

21.24 Ways of Viewing a System of Linear Equations 335

If A = (aij) is a 3×3 matrix and x = (xi) is a 3×1 matrix with elementsxi, then the product Ax is the 3 × 1 matrix with elements

3∑

k=1

aikxk i = 1, 2, 3.

The linear system of equations

a11x1 + a12x2 + a13x3 = b1,a21x1 + a22x2 + a23x3 = b2,a31x1 + a32x2 + a33x3 = b3,

can be written in matrix form as

a11 a12 a13

a21 a22 a23

a31 a32 a33

x1

x2

x3

=

b1b2b3

,

that isAx = b,

with A = (aij) and x = (xi) and b = (bi).

21.24 Ways of Viewing a Systemof Linear Equations

We may view a 3 × 3 matrix A = (aij)

a11 a12 a13

a21 a22 a23

a31 a32 a33

as being formed by three column-vectors a1 = (a11, a21, a31), a2 = (a12, a22,a32), a3 = (a13, a23, a33), or by three row-vectors a1 = (a11, a12, a13), a2 =(a21, a22, a23), a3 = (a31, a32, a33). Accordingly, we may view the system ofequations

a11 a12 a13

a21 a22 a23

a31 a32 a33

x1

x2

x3

=

b1b2b3

,

as a vector equation in the column vectors:

x1a1 + x2a2 + x3a3 = b, (21.40)

or as a system of 3 scalar equations:

a1 · x = b1a2 · x = b2a3 · x = b3,

(21.41)


where the rows ai may be interpreted as normals to planes. We knowfrom the discussion following (21.34) that (21.40) can be uniquely solved if±V (a1, a2, a3) = a1 · a2 × a3 = 0.

We also know from Theorem 21.3 that if a2 × a3 = 0, then the setof x ∈ R

3 satisfying the two last equations of (21.41) forms a line withdirection a2× a3. If a1 is not orthogonal to a2× a3 then we expect this lineto meet the plane given by the first equation of (21.41) at one point. Thus,if a1 · a2× a3 = 0 then (21.41) should be uniquely solvable. This leads to theconjecture that V (a1, a2, a3) = 0 if and only if V (a1, a2, a3) = 0. In fact,direct inspection from the formula (21.18) gives the more precise result,

Theorem 21.5 If a1, a2 and a3 are the vectors formed by the columns ofa 3 × 3 matrix A, and a1, a2 and a3 are the vectors formed by the rows ofA, then V (a1, a2, a3) = V (a1, a2, a3).

21.25 Non-Singular Matrices

Let A be a 3×3 matrix formed by three 3-column vectors a1, a2, and a3. IfV (a1, a2, a3) = 0 then we say that A is non-singular, and if V (a1, a2, a3) = 0then we say that A is singular. From Section 21.21, we know that if A isnon-singular then the matrix equation Ax = b has a unique solution x foreach b ∈ R

3. Further, if A is singular then the three vectors a1, a2 and a3

lie in the same plane and thus we can express one of the vectors as a linearcombination of the other two. This implies that there is a non-zero vectorx = (x1, x2, x3) such that Ax = 0. We sum up:

Theorem 21.6 If A is a non-singular 3 × 3 matrix then the system ofequations Ax = b is uniquely solvable for any b ∈ R

3. If A is singular thenthe system Ax = 0 has a non-zero solution x.

21.26 The Inverse of a Matrix

Let A be a non-singular 3 × 3 matrix. Let ci ∈ R3 be the solution to the

equation Aci = ei for i = 1, 2, 3, where the ei denote the standard basisvectors here interpreted as 3-column vectors. Let C = (cij) be the matrixwith columns consisting of the vectors ci. We then have AC = I, where Iis the 3 × 3 identity matrix, because Aci = ei. We call C the inverse of Aand write C = A−1 and note that A−1 is a 3 × 3 matrix such that

AA−1 = I, (21.42)

that is multiplication of A by A−1 from the right gives the identity. We nowwant to prove that also A−1A = I, that is that we get the identity also by

21.27 Different Bases 337

multiplying A from the left by A−1. To see this, we first note that A−1 mustbe non-singular, since if A−1 was singular then there would exist a non-zerovector x such that A−1x = 0 and multiplying by A from the left would giveAA−1x = 0, contradicting the fact that by (21.42) AA−1x = Ix = x = 0.Multiplying AA−1 = I with A−1 from the left, we get A−1AA−1 = A−1,from which we conclude that A−1A = I by multiplying from the right withthe inverse of A−1, which we know exists since A−1 is non-singular.

We note that (A−1)−1 = A, which is a restatement of A−1A = I, andthat

(AB)−1 = B−1A−1

since B−1A−1AB = B−1B = I. We summarize:

Theorem 21.7 If A is a 3× 3 non-singular matrix, then the inverse 3× 3matrix A−1 exists, and AA−1 = A−1A = I. Further, (AB)−1 = B−1A−1.

21.27 Different Bases

Let a1, a2, a3 be a linearly independent set of three vectors in R3, that is

assume that V (a1, a2, a3) = 0. Theorem 21.4 implies that any given b ∈ R3

can be uniquely expressed as a linear combination of a1, a2, a3,

b = x1a1 + x2a2 + x3a3, (21.43)

or in matrix language

b1b2b3

=

a11 a12 a13

a21 a22 a23

a31 a32 a33

x1

x2

x3

or b = Ax,

where the columns of the matrix A = (aij) are formed by the vectors a1 =(a11, a21, a31), a2 = (a12, a22, a32), a3 = (a13, a23, a33). Since V (a1, a2, a3) =0, the system of equations Ax = b has a unique solution x ∈ R

3 for anygiven b ∈ R

3, and thus any b ∈ R3 can be expressed uniquely as a linear

combination b = x1a1 + x2a2 + x3a3 of the set of vectors a1, a2, a3 withthe coefficients (x1, x2, x3). This means that a1, a2, a3 is a basis for R

3

and we say that (x1, x2, x3) are the coordinates of b with respect to thebasis a1, a2, a3. The connection between the coordinates (b1, b2, b3) of bin the standard basis and the coordinates x of b in the basis a1, a2, a3 isgiven by Ax = b or x = A−1b.

21.28 Linearly Independent Set of Vectors

We say that a set of three vectors a1, a2, a3 in R3 is linearly independent

if V (a1, a2, a3) = 0. We just saw that a linearly independent set a1, a2, a3of three vectors can be used as a basis in R

3.


If the set a1, a2, a3 is linearly independent then the system Ax = 0 inwhich the columns of the 3× 3 matrix are formed by the coefficients of a1,a2 and a3 has no other solution than x = 0.

Conversely, as a test of linear dependence we can use the following crite-rion: if Ax = 0 implies that x = 0, then a1, a2, a3 is linearly independentand thus V (a1, a2, a3) = 0.

We summarize:

Theorem 21.8 A set a1, a2, a3 of 3 vectors in R3 is linearly independent

and can be used as a basis for R3 if ±V (a1, a2, a3) = a1 ·a2× a3 = 0. A set

a1, a2, a3 of 3 vectors in R3 is linearly independent if and only if Ax = 0

implies that x = 0.

21.29 Orthogonal Matrices

A 3 × 3 matrixQ satisfyingQQ = I is called an orthogonal matrix. An or-thogonal matrix is non-singular withQ−1 = Q and thus alsoQQ = I. Anorthogonal matrix is thus characterized by the relationQQ = QQ = I.

Let qi = (q1i, q2i, q3i) for i = 1, 2, 3, be the column vectors of Q, that isthe row vectors of Q. Stating that QQ = I is the same as stating that

qi · qj = 0 for i = j, and |qi| = 1,

that is the columns of an orthogonal matrix Q are pairwise orthogonal andhave length one.

Example 21.12. The matrix

Q =

cos(θ) − sin(θ) 0sin(θ) cos(θ) 0

0 0 1

, (21.44)

is orthogonal and corresponds to rotation of an angle θ around the x3 axis.

21.30 Linear Transformations Versus Matrices

Let A = (aij) be a 3×3 matrix. The mapping x→ Ax, that is the functiony = f(x) = Ax, is a transformation from R

3 to R3. This transformation is

linear since A(x+y) = Ax+Ay and A(λx) = λAx for λ ∈ R. Thus, a 3×3matrix A generates a linear transformation f : R

3 → R3 with f(x) = Ax.

Conversely to each linear transformation f : R3 → R

3, we can associatea matrix A with coefficients given by

aij = fi(ej)

21.31 The Scalar Product Is Invariant Under Orthogonal Transformations 339

where f(x) = (f1(x), f2(x), f3(x)). The linearity of f(x) implies

f(x) =

f1

3∑

j=1

xjej

, f2

3∑

j=1

xjej

, f3

3∑

j=1

xjej

=

3∑

j=1

f1(ej)xj ,3∑

j=1

f2(ej)xj ,3∑

j=1

f3(ej)xj

=

3∑

j=1

a1jxj ,3∑

j=1

a2jxj ,3∑

j=1

a3jxj

= Ax,

which shows that a linear transformation f : R3 → R

3 can be representedas f(x) = Ax with the matrix A = (aij) with coefficients aij = fi(ej).

Example 21.13. The projection Px = x·a|a|2 a onto a non-zero vector a ∈ R

3

takes the matrix form

Px =

a21

|a|2a1a2|a|2

a1a3|a|2

a2a1|a|2

a22

|a|2a2a3|a|2

a3a1|a|2

a3a2|a|2

a23

|a|2

x1

x2

x3

Example 21.14. The projection Px = x − x·n|n|2n onto a plane n · x = 0

through the origin takes the matrix form

Px =

1 − n21

|n|2 −n1n2|n|2 −n1n3

|n|2

−n2n1|n|2 1 − n2

2|n|2 −n2n3

|n|2

−n3n1|n|2 −n3n2

|n|2 1 − n23

|n|2

x1

x2

x3

Example 21.15. The mirror image of a point x with respect to a planethrough the origin given by (2P −I)x, where Px is the projection of x ontothe plane, takes the matrix form

(2P − I)x =

2 a21

|a|2 − 1 2a1a2|a|2 2a1a3

|a|2

2a2a1|a|2 2 a2

2|a|2 − 1 2a2a3

|a|2

2a3a1|a|2 2a3a2

|a|2 2 a23

|a|2 − 1

x1

x2

x3

21.31 The Scalar Product Is InvariantUnder Orthogonal Transformations

Let Q be the matrix q1, q2, q2 formed by taking the columns to be thebasis vectors qj . We assume that Q is orthogonal, which is the same as as-


suming that q1, q2, q2 is an orthogonal basis, that is the qj are pairwise or-thogonal and have length 1. The coordinates x of a vector x in the standardbasis with respect to the basis q1, q2, q2 are given by x = Q−1x = Qx.We shall now prove that if y = Qy, then

x · y = x · y,

which states that the scalar product is invariant under orthogonal coordi-nate changes. We compute

x · y =(Qx

)·(Qy

)= x ·

(Q)Qy = x · y,

where we used that for any 3 × 3 matrix A = (aij) and x, y ∈ R3

(Ax) · y =3∑

i=1

3∑

j=1

aijxj

yi =3∑

j=1

(3∑

i=1

aijyi

)

xj

=(Ay

)· x = x ·

(Ay

),

(21.45)

with A = Q, and the facts that (Q) = Q and QQ = I.We can now complete the argument about the geometric interpretation

of the scalar product from the beginning of this chapter. Given two non-parallel vectors a and b, we may assume by an orthogonal coordinate trans-formation that a and b belong to the x1 − x2-plane and the geometricinterpretation from Chapter Analytic geometry in R

2 carries over.

21.32 Looking Ahead to Functions f : R3 → R

3

We have met linear transformations f : R3 → R

3 of the form f(x) = Ax,where A is a 3 × 3 matrix. Below we shall meet more general (non-linear)transformations f : R

3 → R3 that assign a vector f(x) = (f1(x), f2(x),

f3(x)) ∈ R3 to each x = (x1, x2, x3) ∈ R

3. For example,

f(x) = f (x1, x2, x3) =(x2x3, x

21 + x3, x

43 + 5

)

with f1(x) = x2x3, f2(x) = x21+x3, f3(x) = x4

3+5. We shall see that we maynaturally extend the concepts of Lipschitz continuity and differentiabilityfor functions f : R → R to functions f : R

3 → R3. For example, we say

that f : R3 → R

3 is Lipschitz continuous on R3 if there is a constant Lf

such that|f(x) − f(y)| ≤ Lf |x− y| for all x, y ∈ R

3.


Chapter 21 Problems

21.1. Show that the norm |a| of the vector a = (a1, a2, a3) is equal to the distancefrom the origin 0 = (0, 0, 0) to the point (a1, a2, a3). Hint: apply PythagorasTheorem twice.

21.2. Which of the following coordinate systems are righthanded?

x1

x1

x1

x2

x2

x2 x3x3

x3

21.3. Indicate the direction of a × b and b × a in Fig. 21.1 if b points in thedirection of the x1-axis. Consider also the same question in Fig. 21.2.

21.4. Given a = (1, 2, 3) and b = (1, 3, 1), compute a× b.

21.5. Compute the volume of the parallelepiped spanned by the three vectors(1, 0, 0), (1, 1, 1) and (−1,−1, 1).

21.6. What is the area of the triangle formed by the three points: (1, 1, 0),(2, 3,−1) and (0, 5, 1)?

21.7. Given b = (1, 3, 1) and a = (1, 1, 1), compute a) the angle between a andb, b) the projection of b onto a, c) a unit vector orthogonal to both a and b.

21.8. Consider a plane passing through the origin with normal n = (1, 1, 1) anda vector a = (1, 2, 3). Which point p in the plane has the shortest distance to a?

21.9. Is it true or not that for any 3 × 3 matrices A, B, and C and number λ(a) A+B = B +A, (b) (A+B) +C = A+ (B +C), (c) λ(A+B) = λA+ λB?

21.10. Prove that for 3 × 3 matrices A, B and C: (AB)C = A(BC). Hint:Use that D = (AB)C has the elements dij =

∑3k=1(

∑3l=1 ailblk)ckj , and do the

summation in a different order.

21.11. Give examples of 3×3-matrices A and B such that AB = BA. Is it difficultto find such examples, that is, is it exceptional or “normal” that AB = BA.

21.12. Prove Theorem 21.5.


21.13. Write down the three matrices corresponding to rotations around the x1,x2 and x3 axis.

21.14. Find the matrix corresponding to a rotation by the angle θ around a givenvector b in R

3.

21.15. Give the matrix corresponding to be mirroring a vector through thex1 − x2-plane.

21.16. Consider a linear transformation that maps two points p1 and p2 in R3

into the points p1, p2, respectively. Show that all points lying on a straight linebetween p1, p2 will be transformed onto a straight line between p1 and p2.

21.17. Consider two straight lines in R3 given by: a + λb and c + µd where

a, b, c, d ∈ R3, λ, µ ∈ R. What is the shortest distance between the two lines?

21.18. Compute the intersection of the two lines given by: (1, 1, 0) + λ(1, 2,−3)and (2, 0,−3) + µ(1, 1,−3). Is it a rule or an exception that such an intersectioncan be found?

21.19. Compute the intersection between two planes passing through the originwith normals n1 = (1, 1, 1), n2 = (2, 3, 1). Compute the intersection of these twoplanes and the x1 − x2 plane.

21.20. Prove that (21.42) implies that the inverse of A−1 exists.

21.21. Consider a plane through a point r with normal n. Determine the re-flection in the plane at r of a light ray entering in a direction parallel to a givenvector a.

x1

x2

x3

r

a

n

b

21.34 Gosta Mittag-Leffler 343

21.22. Show that the distance between a point b and its projection onto a planen · x = d is equal to the shortest distance between b and any point in the plane.Give both a geometric proof based on Pythagoras’ theorem, and an analyticalproof. Hint: For x in the plane write |b− x|2 = |b− Pb+ (Pb− x)|2 = (b− Pb+(Pb− x), b− Pb+ (Pb− x)) and expand using that (b− Pb, P b− x) = 0).

21.23. Express (21.31) i matrix form.

21.24. Complete the proof of the claim that (21.30) is uniquely solvable.

21.34 Gosta Mittag-LefflerTSf

The Swedish mentor of Sonya Kovalevskayawas Gosta Mittag-Leffler (1846–1927), famous Swedish mathematician and founder of the prestigous jour-nal Acta Mathematica, see Fig. 21.34. The huge mansion of Mittag-Leffler,beautifully situated in in Djursholm, just outside Stockholm, with an im-pressive library, now houses Institut Mittag-Leffler bringing mathemati-cians from all over the world together for work-shops on different themesof mathematics and its applications. Mittag-Leffler made important con-tributions to the theory of functions of a complex variable, see ChapterAnalytic functions below.

Fig. 21.11. Gosta Mittag-Leffler, Swedish mathematician and founder of ActaMathematica: “The mathematician’s best work is art, a high perfect art, as daringas the most secret dreams of imagination, clear and limpid. Mathematical geniusand artistic genius touch one another”

TSf Please confirm position of Sect. 21.34 only after Problems.


22Complex Numbers

The imaginary number is a fine and wonderful recourse of the divinespirit, almost an amphibian between being and not being. (Leibniz)

The composition of vast books is a laborious and impoverishing ex-travagance. To go on for five hundred pages developing an idea whoseperfect oral exposition is possible in a few minutes! A better courseof procedure is to pretend that these books already exist, and thento offer a resume, a commentary. . . More reasonable, more inept,more indolent, I have preferred to write notes upon imaginary books.(Borges, 1941)

22.1 Introduction

In this chapter, we introduce the set of complex numbers C. A complexnumber, typically denoted by z, is an ordered pair z = (x, y) of real numbersx and y, where x represents the real part of z and y the imaginary part of z.We may thus identify C with R

2 and we often refer to C as the complexplane. We further identify the set of complex numbers with zero imaginarypart with the set of real numbers and write (x, 0) = x, viewing the realline R as the x-axis in the complex plane C. We may thus view C as anextension of R. Similarly, we identify the set of complex numbers withzero real part with the y-axis, which we also refer to as the set of purelyimaginary numbers. The complex number (0, 1) is given a special namei = (0, 1), and we refer to i as the imaginary unit.

346 22. Complex Numbers

i = (0, 1)

1 = (1, 0)

z = (x, y)

x = Re z

y = Im z

Fig. 22.1. The complex plane C = R2

The operation of addition in C coincides with the operation of vectoraddition in R

2. The new aspect of C is the operation of multiplicationof complex numbers, which differs from scalar and vector multiplicationin R

2.The motivation to introduce complex numbers comes from considering

for example the polynomial equation x2 = −1, which has no root if x isrestricted to be a real number. There is no real number x such that x2 = −1since x2 ≥ 0 for x ∈ R. We shall see that if we allow x to be a complexnumber, the equation x2 = −1 becomes solvable and the two roots arex = ±i. More generally, the Fundamental Theorem of Algebra states thatany polynomial equation with real or complex coefficients has a root in theset of complex numbers. In fact, it follows that a polynomial equation ofdegree n has exactly n roots.

Introducing the complex numbers finishes the extension process from nat-ural numbers over integers and rational numbers to real numbers, wherein each case a new class of polynomial equations could be solved. Furtherextensions beyond complex numbers to for example quarternions consistingof quadruples of real numbers were made in the 19th century by Hamilton,but the initial enthusiasm over these constructs faded since no fully con-vincing applications were found. The complex numbers, on the other hand,have turned out to be very useful.

22.2 Addition and Multiplication

We define the sum (a, b) + (c, d) of two complex numbers (a, b) and (c, d),obtained through the operation of addition denoted by +, as follows:

(a, b) + (c, d) = (a+ c, b+ d), (22.1)

22.3 The Triangle Inequality 347

that is we add the real parts and imaginary parts separately. We see thataddition of two complex numbers corresponds to vector addition of the cor-responding ordered pairs or vectors in R

2. Of course, we define subtractionsimilarly: (a, b) − (c, d) = (a− c, b− d).

We define the product (a, b)(c, d) of two complex numbers (a, b) and (c, d),obtained through the operation of multiplication, as follows:

(a, b)(c, d) = (ac− bd, ad+ bc). (22.2)

We can readily check using rules for operating with real numbers that theoperations of addition and multiplication of complex numbers obey thecommutative, associative and distributive rules valid for real numbers.

If z = (x, y) is a complex number, we can write

z = (x, y) = (x, 0) + (0, y) = (x, 0) + (0, 1)(y, 0) = x+ iy, (22.3)

referring to the identification of complex numbers of the form (x, 0) with x,(and similarly (y, 0) with y of course) and the notation i = (0, 1) introducedabove. We refer to x is the real part of z and y as the imaginary part of z,writing x = Re z and y = Im z, that is

z = Re z + i Im z = (Re z, Im z). (22.4)

We note in particular that

i2 = i i = (0, 1)(0, 1) = (−1, 0) = −(1, 0) = −1, (22.5)

and thus z = i solves the equation z2 + 1 = 0. Similarly, (−i)2 = −1, andthus the equation z2 + 1 = 0 has the two roots z = ±i.

The rule (22.2) for multiplication of two complex numbers (a, b) and(c, d), can be retrieved using that i2 = −1 (and taking the distributive lawfor granted):

(a, b)(c, d) = (a+ ib)(c+ id) = ac+ i2bd+ i(ad+ bc) = (ac− bd, ad+ bc).

We define the modulus or absolute value |z| of a complex number z =(x, y) = x+ iy, by

|z| = (x2 + y2)1/2, (22.6)that is, |z| is simply the length or norm of the corresponding vector (x, y) ∈R

2. We note that if z = x+ iy, then in particular

|x| = |Re z| ≤ |z|, |y| = |Im z| ≤ |z|. (22.7)

22.3 The Triangle Inequality

If z1 and z2 are two complex numbers, then

|z1 + z2| ≤ |z1| + |z2|. (22.8)

This is the triangle inequality for complex numbers, which follows directlyfrom the triangle inequality in R

2.


22.4 Open Domains

We extend the notion of an open domain in R2 to C in the natural way.

We say that a domain Ω in C is open if the corresponding domain in R2 is

open, that is for each z0 ∈ Ω there is a positive number r0 such that thecomplex numbers z with |z − z0| < r also belong to Ω. For example, theset Ω = z ∈ C : |z| < 1, is open.

22.5 Polar Representation of Complex Numbers

Using polar coordinates in R2, we can express a complex number as follows

z = (x, y) = r(cos(θ), sin(θ)) = r(cos(θ) + i sin(θ)), (22.9)

where r = |z| is the modulus of z and θ = arg z is the argument of z, andwe also used (22.3). We usually assume that θ ∈ [0, 2π), but by periodicitywe may replace θ by θ + 2πn with n = ±1,±2, . . . ,. Choosing θ ∈ [0, 2π),we obtain the principal argument of z, which we denote by Arg z.

x = Re z

y = Im z

θ

r

x = r cos(θ)

y = r sin(θ)z = (x, y) = r (cos(θ), sin(θ))

Fig. 22.2. Polar representation of a complex number

Example 22.1. The polar representation of the complex number z =(1,

√3) = 1 + i

√3 is z = 2(cos(π

3 ), sin(π3 )), or z = 2(cos(60), sin(60)).

22.6 Geometrical Interpretation of Multiplication

To find the operation on vectors in R2 corresponding to multiplication of

complex numbers, it is convenient to use polar coordinates,

z = (x, y) = r(cos(θ), sin(θ)),

22.7 Complex Conjugation 349

where r = |z| and θ = Arg z. Letting ζ = (ξ, η) = ρ(cos(ϕ), sin(ϕ)) be an-other complex number expressed using polar coordinates, the basic trigono-metric formulas from the Chapter Pythagoras and Euclid imply

zζ = r(cos(θ), sin(θ)) ρ(cos(ϕ), sin(ϕ))= rρ(cos(θ) cos(ϕ) − sin(θ) sin(ϕ), cos(θ) sin(ϕ) + sin(θ) cos(ϕ))= rρ(cos(θ + ϕ), sin(θ + ϕ)).

We conclude that multiplying z = r(cos(θ), sin(θ)) by ζ = ρ(cos(ϕ), sin(ϕ))corresponds to rotating the vector z the angle ϕ = Arg ζ, and changing itsmodulus by the factor ρ = |ζ|. In other words, we have

arg zζ = Arg z + Arg ζ, |zζ| = |z||ζ|. (22.10)

1x = Re z

y = Im z

z

ζ

z ζ

|z ζ| = |z| |ζ|Arg z ζ = Arg z + Arg ζ

Fig. 22.3. Geometrical interpretation of multiplication of a complex numbers

Example 22.2. Multiplication by i corresponds to rotation counter-clock-wise π

2 , or 90.

22.7 Complex Conjugation

If z = x+ iy is a complex number with x and y real, we define the complexconjugate z of z as

z = x− iy.

We see that z is real if and only if z = z and that z is purely imaginary,that is Re z = 0, if and only z = −z.

Identifying C with R2, we see that complex conjugation corresponds

to reflection in the real axis. We also note the following relations, easily


verified,

|z|2 = zz, Re z =12(z + z), Im z =

12i

(z − z). (22.11)

22.8 Division

We extend the operation of division (denoted by /) of real numbers todivision of complex numbers by defining for w, u ∈ C with u = 0,

z = w/u =w

uif and only if uz = w.

To compute w/u for given w, u ∈ C with u = 0, we proceed as follows:

w/u =w

u=wu

uu=wu

|u|2 .

Example 22.3. We have

1 + i

2 + i=

(1 + i)(2 − i)5

=35

+ i15.

Note that we consider complex numbers as scalars although they havea lot in common with vectors in R

2. The main reason for this is that . . . .

22.9 The Fundamental Theorem of Algebra

Consider a polynomial equation p(z) = 0, where p(z) = a0+a1z+. . .+anzn

is a polynomial in z of degree n with complex coefficients a0, . . . , an. TheFundamental Theorem of Algebra states that the equation p(z) has at leastone complex root z1 satisfying p(z1) = 0. By the factorization algorithm,it follows that p(z) can be factored into

p(z) = (z − z1)p1(z),

where p1(z) is a polynomial of degree at most n−1. Indeed, the factorizationalgorithm from the Chapter Combinations of functions (Section 11.4) showsthat

p(z) = (z − z1)p1(z) + c,

where c is a constant. Setting z = z1, it follows that c = 0. Repeating theargument, we find that p(z) can be factored into

p(z) = c(z − z1) . . . (z − zn),

where z1, . . . , zn are the (complex valued in general) roots of p(z) = 0.

22.10 Roots 351

22.10 Roots

Consider the equation in w ∈ C

wn = z,

where n = 1, 2, . . . is a natural number and z ∈ C is given. Using polar co-ordinates with z = |z|(cos(θ), sin(θ)) ∈ C and w = |w|(cos(ϕ), sin(ϕ)) ∈ C,the equation wn = z takes the form

|w|n(cos(nϕ), sin(nϕ)) = |z|(cos(θ), sin(θ))

from which it follows that

|w| = |z| 1n , ϕ =

θ

n+ 2π

k

n,

where k = 0, . . . , n − 1. We conclude that the equation wn = z has ndistinct roots on the circle |w| = |z| 1

n . In particular, the equation w2 = −1has the two roots w = ±i. The n roots of the equation wn = 1 are calledthe n roots of unity.

11

z zw1

w1

w2

w2

w3

x = Re zx = Re z

y = Im zy = Im z

θ/2 θ/2 θ/3

Fig. 22.4. The “square” and “cubic” roots of z

22.11 Solving a Quadratic Equationw2 + 2bw + c = 0

Consider the quadratic equation for w ∈ C,

w2 + 2bw + c = 0,

where b, c ∈ C. Completing the square, we get

(w + b)2 = b2 − c.


If b2 − c ≥ 0 thenw = −b±

√b2 − c,

while if b2 − c < 0 then

w = −b± i√c− b2.

Chapter 22 Problems

22.1. Show that (a) 1i

= −i, (b) i4 = 1.

22.2. Find (a) Re 11+i

, (b) Im 3+4i7−i , (c) Im z

z.

22.3. Let z1 = 4 − 5i and z2 = 2 + 3i. Find in the form z = x+ iy (a) z1z2, (b)z1z2

, (c) z1z1+z2

.

22.4. Show that the set of complex numbers z satisfying an equation of the form|z − z0| = r, where z0 ∈ C is given and r > 0, is a circle in the complex planewith center z0 and radius r.

22.5. Represent in polar form (a) 1 + i, (b) 1+i1−i , (c) 2+3i

5+4i.

22.6. Solve the equations (a) z2 = i, (b) z8 = 1, (c) z2 + z + 1 = −i, (d)z4 − 3(1 + 2i)z2 + 6i = 0.

22.7. Determine the sets in the complex plane represented by (a) | z+iz−i | = 1, (b)

Im z2 = 2, (c) |Arg z| ≤ π4.

22.8. Express z/w in polar coordinates in terms of the polar coordinates of zand w.

22.9. Describe in geometrical terms the mappings f : C → C given by (a)

f(z) = az + b, with a, b ∈ C, (b) f(z) = z2, (c) f(z) = z12 .

23The Derivative

I’ll teach you differences. (Shakespeare: King Lear)

An object with zero velocity will not change position. (Einstein)

. . . and therefore I offer this work as the mathematical principlesof philosophy, for the whole burden in philosophy seems to consistin this: from the phenomena of motions to investigate the forces ofnature, and then from these forces to demonstrate the other phe-nomena. . . (Galileo)

23.1 Rates of Change

Life is change. The newborn changes every day and acquires new skills,the teen-ager develops into an adult in a couple of years, the middle-agedwants to see the family, the house and career expand every year. Only theretired wants to stop the world and play golf for ever, but realizes that thisis impossible and understands that there is an end, after which there is nochange at all any more.

When something changes, we may speak of the total change and we mayspeak of the change per unit or the rate of change. If our salary increases, weexpect an increase in tax and we may speak of the total change in tax (forone year). We may also speak of the change in tax per extra dollar we earn,which is a rate of change of tax commonly referred to as marginal incometax. The marginal income tax usually changes with our total income, sothat we pay a higher marginal tax if we have a higher income. If our total

354 23. The Derivative

income is 10 000 dollars, then we may have to pay 30 cents tax out of anextra dollar we earn, and if our total income is 50 000 dollars, we may haveto pay 50 cents tax out of an extra dollar. The marginal tax, or rate ofchange of tax, in this example is 0.3 if our income is 10 000 dollars and 0.5if our income is 50 000 dollars.

Business people speak of marginal cost of a certain item, which is theincrease in total cost if we buy one more item, that is the cost increase peritem or rate of change of total cost. Normally the marginal cost dependson the the total amount and in fact normally the marginal cost decreaseswith the total amount of items we buy. The marginal cost of producingsome item also varies with the total amount produced. At a certain pro-duction level, the cost of producing one more item may be very small, whileif we have to build a whole new factory to produce that single additionalitem, the marginal cost would be very large. Thus the marginal cost inproduction may vary with the total production.

The concept of a function f : D(f) → R(f) is also intimately connectedto change. For each x ∈ D(f) there is a f(x) ∈ R(f), and usually f(x)changes with x. If f(x) is the same for all x, then the function f(x) isa constant function, which is easy to grasp and does not require muchfurther study. If f(x) does vary with x, then it is natural to seek ways ofdescribing qualitatively and quantitatively how f(x) varies with x. The rateof change enters again if we seek to describe how f(x) changes per unit of x.

The derivative of a function f(x) with respect to x measures the rate ofchange of f(x) as x varies. The derivative of our tax with respect to incomeis the marginal tax. The derivative of the total production cost with respectto total production is the marginal cost.

The basic modeling tool in Calculus is the derivative. Indeed, the startof the modern scientific age coincides with the invention of the concept ofderivative. The derivative is a measure of rate of change.

In this chapter, we introduce the wonderful mathematical concept of thederivative, figure out some of its properties, and start to use derivatives inmathematical modeling.

23.2 Paying Taxes

We return to the above example of describing our income tax as a functionof income. Suppose we let x denote our total income next year and let f(x)be the corresponding total income tax we would have to pay. The functionf(x) describes how our total income tax changes with our income x. Foreach given income x, there is a corresponding income tax f(x) to pay. Weplot a possible function f(x) in the following figure:

23.2 Paying Taxes 355

0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5

x 104

0

2000

4000

6000

8000

10000

12000

14000

16000

18000

income

tax

Fig. 23.1. Income tax f(x) varying with income x

The function f(x) in this example is piecewise linear and Lipschitz con-tinuous. The slope of f(x) is zero up to the total income 5, the slope is 0.2in the interval [5 000, 10 000], 0.3 in [10 000, 20 000], 0.4 in [20 000, 30 000]and 0.5 in [30 000,∞).

For a given x, the slope of the straight line representing f(x) close to x, isthe marginal tax. We denote the slope of f(x) at x bym(x). We see that theslope m(x) varies with x. For example, m(x) = 0.3 for x ∈ (10 000, 20 000).If we add one extra dollar at the income x ∈ (10 000, 20 000), then ourincome tax will increase by 0.3 dollars.

The marginal tax is the same as the slope of the straight line representingthe income tax f(x) as a function of income x. Thus the marginal tax ism(x) at the income x. The marginal income tax is zero up to the totalincome 5 000, the marginal tax is 0.2 in the income bracket [5 000, 10 000],0.3 in the bracket [10 000, 20 000], 0.4 in the bracket [20 000, 30 000] and 0.5for incomes in the bracket [30 000,∞).

We can describe how f(x) varies in each income tax bracket through thefollowing formula

f(x) = 0 for x ∈ [0, 5 000]f(x) = 0.2(x− 5 000) for x ∈ [5 000, 10 000]f(x) = f(10 000) + 0.3(x− 10 000) for x ∈ [10 000, 20 000]f(x) = f(20 000) + 0.4(x− 20 000) for x ∈ [20 000, 30 000]f(x) = f(30 000) + 0.5(x− 30 000) for x ∈ [30 000,∞)

We can condense these formulas into

f(x) = f(x) +m(x)(x− x), (23.1)


where x represents a given income with corresponding tax f(x), and we areinterested in the tax f(x) for an income x in some interval containing x.For example, the formula

f(x) = f(15 000) +m(15 000)(x− 15 000) for x ∈ [10 000, 20 000]

where m(15 000) = 0.3 is the marginal tax, describes how the tax varieswith the income x around the income x = 15 000, see Fig. 23.2.

1 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2

x 104

1000

1500

2000

2500

3000

3500

4000

income

tax

Fig. 23.2. Income tax f(x) for income x in the interval [10 000, 20 000]

The derivative of the function f(x) = f(x) +m(x)(x − x) for x = x, isthe marginal tax m(x). The formula f(x) = f(x) +m(x)(x − x) describeshow f(x) varies if x varies in an interval around x. The formula states thatf(x) is a straight line with slope m(x) close to x.

More generally, if f(x) = mx+ b is a linear function, then we can write

f(x) = f(x) +m (x− x),

since f(x) = b + mx. The coefficient m multiplying the change x − x isequal to the derivative of f(x) at x. In this case, the derivative is constantequal to m for all x. The change in f(x) is proportional to the change in xwith factor of proportionality equal to m:

f(x) − f(x) = m (x− x), (23.2)

that is if x = x, then the slope m is given by

m =f(x) − f(x)

x− x(23.3)

We may view the slope m as the change of f(x) per unit change of x, or asthe rate of change of f(x) with respect to x.

23.3 Hiking 357

23.3 Hiking

We now give the above example a different interpretation. Suppose now thatx represents time in seconds and f(x) is the distance in meters travelledby a hiker along a hiking path measured from the start at time x = 0.According to the above formula, we have f(x) = 0 for x ∈ [0, 5 000], whichmeans that the trip starts with the hiker at rest at x = 0 for 5 000 seconds(maybe to fix some malfunctioning equipment). For x ∈ [5 000, 10 000], wehave f(x) = 0.2(x − 5 000) which means that the hiker advances with 0.2meter per second, that is with the velocity 0.2 meters per second. In the timeinterval [10 000, 20 000], we have f(x) = f(10 000)+ 0.3(x− 10 000), whichmeans that the hiker’s velocity is now 0.3 meters per second, and so on.

We note that the slopem(x) of the straight line f(x) = f(x)+m(x)(x−x)represents the velocity at x. We may thus say that the derivative of thedistance f(x) with respect to time x, which is the slope m(x), is equal tothe velocity. We will meet the interpretation of the derivative as a velocityagain below.

23.4 Definition of the Derivative

We shall now seek to define the derivative of a given function f : R → R ata given point x. We shall then follow the idea that if f(x) is particularlywell approximated by the linear function f(x) + m (x − x) for x close tox, then the derivative of f(x) at x will be equal to m. In other words, thederivative of f(x) at x will be equal to the slope m of the approximatinglinear function f(x) + m (x − x). Of course, a key point is to describehow to interpret that the linear function f(x) + m (x − x) approximatesf(x) “particularly well”. We shall see that the natural requirement is toask that the error is proportional to |x − x|2, that is that the error isquadratic in the difference x − x. Geometrically, this will be the sameas asking the straight line y = f(x) + m (x − x) to be tangent to thegraph of y = f(x) at (x, f(x)). We will see that asking the error to bequadratic in x− x is just about right. In particular, asking the error to beeven smaller, for example proportional to |x− x|3, would be to ask for toomuch.

Before defining the derivative, we back off a little to prepare ourselvesand consider different linear approximations b+m (x− x) of the given func-tion f(x) for x close to x. There are many straight lines that approximatef(x) close to x. We show some bad approximations and a number of goodapproximations in Fig. 23.3. On the left, we show some bad linear “approx-imations” to the function f(x) near x. On the right, we show some betterlinear approximations.

The question is whether one of the many possible approximate lines isa particularly good choice or not.


xx

y = f(x)y = f(x)

Fig. 23.3. Linear approximations of f(x) close to x

We have one piece of information we should use, namely, we know thatthe value of f(x) at x = x is f(x). So first of all, we only consider linesb + m (x − x) that take on the value f(x) for x = x, that is we chooseb = f(x). Such lines are said to interpolate f(x) at x and thus have anequation of the form

y = f(x) +m(x− x). (23.4)

We started this section considering approximations of f(x) of this form.We plot several examples in Fig. 23.4 with different slopes m.

x

f(x)

y = f(x)

Fig. 23.4. Linear approximations to a function that pass through the point(x, f(x)). The region near (x, f(x)) has been blown-up on the right

We now would like to choose the slope m so that f(x) is particularly wellapproximated by the linear function f(x) +m (x− x) for x close to x. Weexpect the slope m to depend on x and thus we will have m = m(x).

Out of the three lines plotted in Fig. 23.4 near (x, f(x)), the line in themiddle seems to be the best by far. This line is tangent to the graph of f(x)at the point x. The slope of the tangent is characterized by the fact thatthe error between f(x) and the approximation f(x) +m(x)(x− x), that is

23.4 Definition of the Derivative 359

the quantity

Ef (x, x) = f(x) −(f(x) +m(x)(x− x)

), (23.5)

is particularly small. Since f(x) +m(x)(x− x) interpolates f(x) at x = x,we have E(x, x) = 0. Rewriting (23.5) as

f(x) = f(x) +m(x)(x − x) + Ef (x, x),

we may view Ef (x, x) as a correction to the linear approximation f(x) +m(x)(x − x) of f(x), see Fig. 23.5. It is natural to say that the correctionEf (x, x) is particularly small if it is much smaller than the term m(x− x),which represents a linear correction of the constant value f(x). Thus, f(x)+m(x)(x− x) is a linear approximation of f(x) close to x with zero error forx = x, and we seek m(x) so that the correction Ef (x, x) is small comparedto m(x− x) for x close to x.

x x

y = f(x)

y = f(x) +m(x)(x− x)Ef (x, x)

Fig. 23.5. Graph y = f(x), tangent y = f(x) +m(x)(x− x) and error Ef (x, x)

The natural requirement is then to ask that Ef (x, x) can be bounded bya term which is quadratic in x− x, that is

|Ef (x, x)| ≤ Kf(x)|x− x|2 for x close to x, (23.6)

where Kf(x) is a constant. The term Kf(x)|x − x|2 is much smaller thanm(x)|x − x|, if x is sufficiently close to x, that is, if the factor |x − x| issmall enough.

We will say, for short, that an error term Ef (x, x) is quadratic in x − xif Ef (x, x) satisfies the estimate (23.6) for some constant Kf (x) for x closeto x. We thus seek to choose the slope m = m(x) so that the error Ef (x, x)is quadratic in x− x. The linear function f(x) +m(x)(x− x) will then betangent to f(x) at x. We expect the slope of the tangent at x to depend onx, which we indicate by denoting the slope by m(x).


Now we are in position to define the derivative of f(x) at x. The functionf(x) is said to be differentiable at x if there are constants m(x) and Kf (x)such that for x close to x,

f(x) = f(x) +m(x)(x− x) + Ef (x, x),

with |Ef (x, x)| ≤ Kf (x)|x − x|2. (23.7)

We then say that the derivative of f(x) at x is equal to m(x), and we denotethe derivative by f ′(x) = m(x). The derivative f ′(x) of f(x) at x is equalto the slope m(x) of the tangent y = f(x) +m(x)(x− x) to f(x) at x. Thedependence of x is kept in f ′(x).

Recapping our discussion, the equation (23.7) defining the derivative off at x can be thought of as defining a linear approximation

f(x) + f ′(x)(x− x) ≈ f(x)

for x close to x with an error Ef (x, x) which is quadratic in x−x. The linearapproximation f(x) + f ′(x)(x− x) of f(x) with quadratic error in x− x, iscalled the linearization of f(x) at x, and the corresponding Ef (x, x) is thelinearization error.

We now compute the derivative of some basic polynomial functions f(x)from the definition of the derivative.

23.5 The Derivative of a Linear Function IsConstant

If f(x) = b+mx is a linear function with b and m real constants, then

f(x) = b+mx = b+mx+m(x− x) = f(x) +m(x− x),

with the corresponding error function Ef (x, x) = 0 for all x. We concludethat if f(x) = b + mx, then f ′(x) = m. Thus the derivative of a linearfunction b +mx is constant equal to the slope m. We note that if m > 0,then f(x) = b +mx is increasing (with increasing x), that is f(x) > f(x)if x > x and f(x) < f(x) if x < x. Conversely, if m < 0, then f(x) isdecreasing (with increasing x), that is f(x) < f(x) if x > x and f(x) > f(x)if x < x. In particular, for b = 0 and m = 1 we have

if f(x) = x, then f ′(x) = 1. (23.8)

23.6 The Derivative of x2 Is 2x

We now compute the derivative of the quadratic function f(x) = x2 ata point x. The strategy is to first “extract” the constant value f(x) from

23.6 The Derivative of x2 Is 2x 361

f(x), and a factor x− x from the reminder term, to obtain f(x) = f(x) +g(x, x)(x − x) for some quantity g(x, x), then to replace g(x, x) by g(x, x)and verify that the resulting error term E = (g(x, x) − g(x, x))(x − x) hasthe desired property |E| ≤ K|x− x|2. In the considered case of f(x) = x2

we have

x2 = x2 + (x2 − x2) = x2 + (x+ x)(x− x) = x2 + 2x(x− x) + (x− x)2,

that is,f(x) = f(x) + 2x(x− x) + Ef (x, x),

where Ef (x, x) = (x− x)2, which shows that f(x) = x2 is differentiable forall x with f ′(x) = 2x, that is, f ′(x) = 2x for x ∈ R. We conclude that, seeFig. 23.6,

if f(x) = x2, then f ′(x) = 2x. (23.9)

An alternative, shorter route to the linearization formula (23.6) in thiscase is

x2 = (x+ (x− x))2 = x2 + 2x(x− x) + (x− x)2,

−2−2 22

−4−4

44

f(x) f ′(x)

Fig. 23.6. f(x) = x2 and f ′(x) = 2x

We see that x2 is decreasing for x < 0 and increasing for x > 0 followingthe sign of the derivative f ′(x) = 2x.

Repeating the above calculation with the particular value x = 1, to getfamiliar with the argument, we get

x2 = 1 + 2(x− 1) + (x− 1)2,

and thus the derivative of f(x) = x2 at x = 1 is f ′(1) = 2. We plot x2 and1 + 2(x− 1) in Fig. 23.7. We compare some values of the given function x2

to the linear approximation 1 + 2(x − 1) along with the error (x − 1)2 inFig. 23.8


y = x2

y = 1 + 2(x− 1)

0

1

1

2

2

3

4

x

Fig. 23.7. The linearization 1 + 2(x− 1) of x2 at x = 1

x f(x) f(1) + f ′(2)(x − 1) Ef (x, 1)

.7 .49 .4 .09

.8 .64 .6 .04

.9 .81 .8 .011.0 1.0 1.0 0.01.1 1.21 1.2 .011.2 1.44 1.4 .041.3 1.69 1.6 .09

Fig. 23.8. Some values of f(x) = x2, f(1) + f ′(1)(x − 1) = 1 + 2(x − 1), andEf (x, 1) = (x− 1)2

23.7 The Derivative of xn Is nxn−1

We now compute the derivative of the monomial f(x) = xn at a point x,where n ≥ 2 is a natural number. By the Binomial Theorem, generalizing(23.6), we have

xn = (x+ x− x)n = xn + nxn−1 (x− x) + Ef (x, x) ,

where all the terms of the error

Ef (x, x) =n(n− 1)

2xn−2 (x− x)2 + · · · + (x− x)n ,

contain at least two factors of (x− x), and thus

|Ef (x, x)| ≤ Kf (x)(x − x)2,

with Kf(x) depending on x, x and n. Clearly, Kf(x) is bounded by someconstant if x and x belong to some bounded interval. We conclude that

23.8 The Derivative of 1x

Is − 1x2 for x = 0 363

f ′(x) = nxn−1 for all x, that is, f ′(x) = nxn−1 for all x. We summarize:

if f(x) = xn, then f ′(x) = nxn−1. (23.10)

For n = 2, we recover the formula f ′(x) = 2x if f(x) = x2.

23.8 The Derivative of 1x Is − 1

x2 for x = 0

We now compute the derivative of the function f(x) = 1x for x = 0. We

have for x close to x = 0,

1x

=1x

+(

1x− 1x

)

=1x

+(

− 1xx

)

(x− x) =1x

+(

− 1x2

)

(x− x) + E

where

E =(

1x2

− 1xx

)

(x− x) =1xx2

(x− x)2 ,

and thus |E| ≤ K|x− x|2 as desired. We conclude that f(x) = 1x is differ-

entiable at x with derivative f ′(x) = − 1x2 for x = 0, that is

if f(x) =1x, then f ′(x) = − 1

x2for x = 0. (23.11)

23.9 The Derivative as a Function

If a function f(x) is differentiable for all points x in an open interval I, thenf(x) is said to be differentiable on I. The derivative f ′(x) in general varieswith x. We may thus view the derivative f ′(x) of a function f(x), whichis differentiable on some interval I, as a function of x for x ∈ I. We maychange the name of the variable x and speak about the derivative f ′(x) asa function of x. We already took this step above. To a function f(x) that isdifferentiable on an interval I, we may thus associate the function f ′(x) forx ∈ I that gives the derivative of f(x). We may thus speak of the derivativef ′(x) of a differentiable function f(x). For example, the derivative of x2 is2x and the derivative of x3 is 3x2.

23.10 Denoting the Derivative of f(x) by Df(x)

We also denote the derivative f ′(x) of f(x) by Df(x), that is

f ′(x) = Df(x).


−4 −2 0 2 4−25

−20

−15

−10

−5

0

5

10

15

20

25

−4 −2 0 2 4−5

−4

−3

−2

−1

0

1

2

3

4

5

x x

1/x

−1/x

2

Fig. 23.9. The function f(x) = 1/x and its derivative f ′(x) = −1/x2 for x > 0

Observe that D(f) denotes the domain of f , while Df(x) denotes thederivative of f(x) at x.

We may write the basic formula (23.10) as

if f(x) = xn, then f ′(x) = Df(x) = nxn−1, (23.12)

orDxn = nxn−1 for n = 1, 2, . . . (23.13)

This is one of the most important results of Calculus. We here assume thatn is a natural number (including the particular case n = 0 if we agree todefine x0 = 1 for all x). Below we will extend this formula to n rational(and finally to n real). We recall that we proved above that for x = 0

if f(x) =1x, then f ′(x) = Df(x) = − 1

x2,

corresponding to setting n = −1 in (23.12).

Example 23.1. Suppose you drive a car along the x-axis and your positionat time t measured from the starting point at t = 0 is s(t) = 3 × (2t− t2)miles, where t is measured in hours and the positive direction for s is tothe right. Your speed is s′(t) = 6−6t = 6(1− t) miles/hour at time t. Sincethe derivative is positive for 0 ≤ t < 1, which means that the tangent linesto s(t) have positive slope for 0 ≤ t < 1, the car moves to the right up tot = 1. At exactly t = 1, you stop the car. If t > 1, then the car moves tothe left again, because the slopes of the tangents are negative.

23.11 Denoting the Derivative of f(x) by dfdx

365

23.11 Denoting the Derivative of f(x) by dfdx

We will also denote the derivative f ′(x) of a differentiable function f(x) by

df

dx= f ′(x) (23.14)

We here usually omit the variable x using the notation dfdx and thus write

dfdx instead of df

dx (x). Of course the notation dfdx is inspired from (23.22)

below, with df corresponding to the f -difference f(xi)− f(x) in f(x), anddx corresponding to the x-difference xi − x in x. One may also denotethe differentiation operator D in Df(x) alternatively by d

dx , and write forexample

d

dx(xn) = nxn−1 (23.15)

We now have three ways of denoting the derivative of a function f(x) withrespect to x, namely f ′(x), Df(x), and df

dx .Note that using the notation f ′(x) andDf(x) for the derivative of a func-

tion f(x), it is understood that the derivative is taken with respect to theindependent variable x occurring in f(x). This convention is made explicitin the notation df

dx . Thus if f = f(y), that is f is a function of the variable y,then Df = df

dy , while if f = f(x) then Df = dfdx .

23.12 The Derivative as a Limit of DifferenceQuotients

We recall that the function f(x) is differentiable at x with derivative f ′(x),if for x in some open interval I containing x,

f(x) = f(x) + f ′(x)(x − x) + Ef (x, x), (23.16)

where|Ef (x, x)| ≤ Kf (x)|x− x|2, (23.17)

and Kf(x) is a constant. Dividing by x − x assuming x = x, we get forx ∈ I,

f(x) − f(x)x− x

= f ′(x) +Rf (x, x), (23.18)

where

Rf (x, x) =Ef (x, x)x− x

, (23.19)

satisfies|Rf (x, x)| ≤ Kf(x)|x− x| for x ∈ I. (23.20)


Let now xi∞i=1 be a sequence with limi→∞ xi = x with xi ∈ I and xi = xfor all i. There are many such sequences. For example, we may choosexi = x+ i−1, or xi = x+ 10−i. From (23.20) it follows that

limi→∞

Rf (xi, x) = 0, (23.21)

and thus by (23.18) we have

f ′(x) = limi→∞

mi(x), (23.22)

where

mi(x) =f(xi) − f(x)

xi − x(23.23)

is the difference quotient based on the two distinct points x and xi. Thedifference quotient mi(x) defined by (23.23) is the slope of the secant lineconnecting the points (x, f(x)) and (xi, f(xi)), see Fig. 23.10, and can beviewed as the average rate of change of f(x) between the points x and xi.

x x

f(x)

f(x)

Fig. 23.10. The secant line joining (x, f(x)) and (xi, f(xi))

The formulaf ′(x) = lim

i→∞

f(xi) − f(x)xi − x

, (23.24)

expresses the derivative f ′(x) as the limit of the average rate of change off(x) over intervals xi − x, the length of which tend to zero as i tends toinfinity. We may thus view f ′(x) as the local rate of change of f(x) at x.If f(x) is tax at income x, then f ′(x) is the marginal tax at x. If f(x) isa distance and x time, then f ′(x) is the instantaneous velocity at time x.

Alternatively, we may view f ′(x) being the slope of the tangent to f(x) atx = x as the limit of the sequence mi(x) of slopes of secants through thepoints (x, f(x)) and (xi, f(xi)), where xi∞i=1 is a sequence with limit x.We illustrate in Fig. 23.11.

23.13 How to Compute a Derivative? 367

x x1x2x3x4

y = f(x)

tangent line

secant lines

Fig. 23.11. A sequence of secant lines approaching the tangent line at x

Example 23.2. Let us now compute the derivative of f(x) = x2 at x byusing (23.22). Let xi = x+1/i. The slope of the secant line through (x, x2)and (xi, f(xi)) = (xi, x

2i ) is

mi(x) =x2

i − x2

xi − x=

(xi − x)(xi + x)xi − x

= (xi + x).

By (23.22), we have


mi(x) = limi→∞

(

2x+1i

)

= 2x,

and we recover the well known formula Dx2 = 2x.

23.13 How to Compute a Derivative?

Suppose f(x) is a given function for which we are not able to analyticallycompute the derivative f ′(x) for a given x. Note that we were able tocarry out the analytical computation above for polynomials, but we gaveno strategy to determine the derivative f ′(x) for more general functionsf(x). The function f(x) may not be given by any formula at all, and couldjust be given as a value f(x) for each x determined in some way.

The same problem arises if we want to determine a physical velocity bydoing some measurement. For example, if the speed meter of our car isout of function, how can we measure the velocity of the car at some giventime x? Of course the natural thing would be to measure the increment ofdistance f(x)− f(x) over some time interval x− x, where f(x) is the totaldistance, and then use the quotient f(x)−f(x)

x−x , the average velocity overthe time interval (x, x), as an approximation of the momentary velocityat time x. But how to choose the length of the time interval x − x? If we


choose x− x way too small, then we will not be able to measure any changein position at all, that is we will have f(x) = f(x), and then conclude zerovelocity, while if we take x − x too large, the computed average velocitymay differ very much from the desired momentary velocity at x.

We now use analysis to find the right increment x − x to use to deter-mine the derivative f ′(x) of a given function f(x) at x, assuming that thefunction values f(x) are given with a certain precision. From the definitionof f ′(x), we have for x close to x, x = x,

f ′(x) =f(x) − f(x)

x− x− Ef (x, x)

x− x,

where ∣∣∣∣Ef (x, x)x− x

∣∣∣∣ ≤ Kf (x)|x− x|.

The difference quotientf(x) − f(x)

x− x,

may thus be used as an approximation of f ′(x) up to a linearization errorof size Kf(x)|x− x|.

Suppose now that we know the quantity f(x) − f(x) up to an errorof size δf . We thus assume that we know x and x exactly, but there isan error of size δf in the quantity f(x) − f(x) resulting from errors in thefunction values f(x) and f(x) from computation or measurement. We knowthat frequently the value f(x) for a given x, is known only approximatelythrough computation.

The error δf in f(x)−f(x) causes an error of size | δfx−x | in the difference

quotient f(x)−f(x)x−x . We thus have a total error in f ′(x) of size

∣∣∣∣δf

x− x

∣∣∣∣ +Kf(x)|x− x|, (23.25)

resulting from the error in f(x)− f(x) and the linearization error. Makingthe two error contributions equal, which should give the right balance, weget the equation ∣

∣∣∣δf

x− x

∣∣∣∣ = Kf |x− x|,

where we write Kf = Kf(x), from which we compute the “optimal incre-ment”

|x− x| =

√δf

Kf. (23.26)

If we take |x− x| smaller, then the error contribution | δfx−x | will dominate

and we take |x − x| bigger, then the linearization error Kf(x)|x − x| willdominate.

23.14 Uniform Differentiability on an Interval 369

Inserting the optimal increment into (23.25), we get a corresponding“best” error estimate

∣∣∣∣f

′ (x) − f(x) − f (x)x− x

∣∣∣∣ ≤ 2

√δf

√Kf . (23.27)

Contemplating the two resulting formulas (23.26) and (23.27) for theoptimal increment and corresponding minimal error in f ′(x), we see thatsome a priori knowledge of δf and Kf is needed here. If we have no idea ofthe size of these quantities, we will not know how to choose the incrementx − x and we will not know anything about the error in the computedderivative. Of course it is in many cases realistic to have an idea of thesize of δf , being an error from computation or measurement, but it maybe less obvious how to get an idea of the size of Kf . We will return to thisquestion below.

We sum up: Computing an approximation of f ′(x) by using the differencequotient f(x)−f(x)

x−x , we should not choose x− x too small if there is an errorin the quantity f(x) − f(x). The formula



,

where xi∞i=1 is a sequence with limit x and xi = x, thus must be usedwith caution. If we examine the cases above where we could compute thederivative analytically, like the case f(x) = x2, we will see that in fact wecould divide through by xi − x in the quotient f(xi)−f(x)

xi−x and avoid thedangerous appearance of xi − x in the denominator. For example, whencomputing Dx2 analytically, we used that

x2i − x2

xi − x=

(xi + x)(xi − x)(xi − x)

= xi + x,

from which we could conclude that Dx2 = 2x.

23.14 Uniform Differentiability on an Interval

We say that the function f(x) is differentiable on the interval I if f(x) isdifferentiable for each x ∈ I, that is for x ∈ I there are constants m(x) andKf(x) such that for x close to x,

f(x) = (f(x) +m(x)(x− x)) + Ef (x, x)

|Ef (x, x)| ≤ Kf (x)|x− x|2.

In many cases we can choose one and the same constantKf (x) = Kf for allx ∈ I. We may express this by saying the f(x) is uniformly differentiable


on I. Allowing also x to vary in I we are led to the following definition,which we will find very useful below: We say that the function f : I → R

is uniformly differentiable on the interval I with derivative f ′(x) at x, ifthere is a constant Kf such that for x, x ∈ I,

f(x) = (f(x) + f ′(x)(x− x)) + Ef (x, x)

|Ef (x, x)| ≤ Kf |x− x|2.

Observe that the important thing is that Kf here does not depend on x,but may of course depend on the function f and the interval I.

23.15 A Bounded Derivative Implies LipschitzContinuity

Suppose that f(x) is uniformly differentiable on the interval I = (a, b) andsuppose there is a constant L such that for x ∈ I,

|f ′(x)| ≤ L. (23.28)

We shall now show that f(x) is Lipschitz continuous on I with Lipschitzconstant L, that is we shall show that

|f(x) − f(y)| ≤ L|x− y| for x, y ∈ I. (23.29)

This result states something completely obvious: if the absolute value ofthe maximal rate of change of a function f(x) is bounded by L, then theabsolute value of the total change |f(x) − f(y)| is bounded by L|x− y|.

If f(x) represents distance, and thus f ′(x) velocity, the statement is thatif the absolute value of the instantaneous velocity is bounded by L thenthe absolute value of the change of distance |f(x)− f(y)| is bounded by Ltimes the total time change |x− y|. Elementary, my dear Watson!

We shall give a short proof of this result below, when we have someadditional machinery available (the Mean Value theorem). We present herea somewhat longer proof.

By assumption we have for x, y ∈ I

f(x) = f(y) + f ′(y)(x − y) + Ef (x, y),

where|Ef (x, y)| ≤ Kf |x− y|2,

with Kf a certain constant. We conclude that for x, y ∈ I

|f(x) − f(y)| ≤ (L+Kf |x− y|)|x− y|,

23.15 A Bounded Derivative Implies Lipschitz Continuity 371

so that for x, y ∈ I,

|f(x) − f(y)| ≤ L|x− y|,

where L = L + K(b − a). This is almost what we want; the difference isthat L is replaced with the somewhat larger Lipschitz constant L.

If we restrict x and y to a subinterval Iδ of I of length δ, we have

|f(x) − f(y)| ≤ (L+Kδ)|x− y|

By making δ small enough, we can get L +Kδ as close to L as we wouldlike. Let now x and y in I be given and let x = x0 < x1 < · · · < xN = y,where xi − xi−1 ≤ δ, see Fig. 23.12.

x y

x0 x1 x2 xi−1 xi xN

δ

Fig. 23.12. Subdivision of interval [x, y] into subintervals of length < δ

We have by the triangle inequality

|f(x) − f(y)| =

∣∣∣∣∣

N∑

i=1

(TSgf(xi) − f(xi−1)

∣∣∣∣∣

≤N∑

i=1

|f(xi) − f(xi−1)| ≤ (L+Kδ)N∑

i=1

|xi − xi−1|

= (L+Kδ)|x− y|.

Since this inequality holds for any δ > 0, we conclude that indeed

|f(x) − f(y)| ≤ L|x− y|, for x, y ∈ I,

which proves the desired result. We summarize in the following theoremwhich we will use extensively below:

Theorem 23.1 Suppose that f(x) is uniformly differentiable on the inter-val I = (a, b) and suppose there is a constant L such that

|f ′(x)| ≤ L, for x ∈ I.

Then f(x) is Lipschitz continuous on I with Lipschitz constant L.

TSg Please check this opening parenthesis.



23.16 A Slightly Different Viewpoint

In many Calculus books the derivative of a function f : R → R at a point xis defined as follows. If the limit

limi→∞


, (23.30)

does exist for any sequence xi with limi→∞ xi = x (assuming xi = x),then we call the (unique) limit the derivative of f(x) at x = x and we denoteit by f ′(x). We proved in (23.22) that if f(x) is differentiable according toour definition with derivative f ′(x), then



,

because we assume that∣∣∣∣f

′(x) − f(xi) − f(x)xi − x

∣∣∣∣ ≤ Kf(x)|xi − x|. (23.31)

This means that our definition of derivative is somewhat more demand-ing than that used in many Calculus books. We assume that the limitingprocess occurs at a linear rate expressed by (23.31), whereas the definition(23.30) just asks the limit to exist with no rate required (which pleasesmany mathematicians because of its maximal generality). In most cases,the two concepts agree, but in some very special cases the derivative wouldexist according to the standard Calculus book definition, but not accordingto the definition we use. We could naturally relax our definition by relaxingthe right hand side bound in (23.31) to Kf (x)|xi − x|θ, with some posi-tive constant θ < 1, but the corresponding definition would still be a littlestronger than just asking the limit to exist. Using a more demanding defi-nition we focus on normality rather than the extreme or degenerate, whichwe believe will help the student to approach the new topic. Once the normalsituation is understood it may be easier to come to grips with extreme cases.

23.17 Swedenborg

A Swedish counterpart of the Universal Genius Leibniz, together with New-ton the Inventor of Calculus, was Emanuel Swedenborg (1688–1772). Swe-denborg introduced Calculus to Sweden with independent contributions.Swedenborg produced 150 works on seventeen sciences, was a musician,mining engineer, member of the Swedish parliament, invented a glider, anundersea boat, an ear trumpet for the deaf, a mathematician who wrote thefirst books in Swedish on algebra and calculus, a physiologist who discov-ered the function of several areas of the brain and ductless glands, creator


Fig. 23.13. Emanuel Swedenborg, Swedish Universal Genius, as a young man:“The Intercourse of Soul and Body is thus not effected by any physica influx orby any action of the Body upon the Mind or Soul; for the lower cannot affectthe higher, and the nature cannot inflow into the spiritual. Yet the Soul canaccomodate itself to the changes of the sensories of the brain and form mentalpercepts and concepts. It can also time the release of the energy there stored andfrom an intelligent conatus direct it into motivated or living actions”

of the (at the time) world’s largest dry-dock, and suggested the nebulatheory of the formation of the planets.

Chapter 23 Problems

23.1. Prove directly from the definition that the derivative of x3 is 3x2, andthat the derivative of x4 is 4x3.

23.2. Prove directly from the definition that the derivative of the function f(x) =√x = x

12 is equal to f ′(x) = 1

2x− 1

2 for x > 0. Hint: use that (√x −

√x)(

√x +√

x) = x− x.

23.3. Compute the derivative of√x numerically for different values of x and

study how the error depends on the increment used, and the precision of thecomputation of

√x.


23.4. Study the symmetric difference quotient approximation

f ′(x) ≈ f(x+ h) − f(x− h)

hh > 0.

What is an optimal choice of the increment h, assuming f(x ± h) is not knownexactly. Hint: You may find it useful to look ahead into the next chapter (Taylor’sformula of order 2).

23.5. Compute the derivative of xn numerically for different values of x and nand study how the error depends on the increment used.

23.6. Can you compute the derivative of sin(x) and cos(x) from the definition?

23.7. Determine the smallest possible Lipschitz constant for the function f(x) =x3 with D(f) = [1, 4].

23.8. (l’Hopitals rule). Let f : R → R and g : R → R be differentiable on anopen interval I containing 0, and suppose f(0) = g(0) = 0. Prove that

limi→∞

f(xi)

g(xi)=f ′(0)g′(0)

if g′(0) = 0, where xi∞i=1 is a sequence with limi→∞ xi = 0 and xi = 0 forall i. This is the famous l’Hopitals rule, presented in l’Hopitals book Analyse deinfiment petit (1713), the first Calculus book! Note that f(0)

g(0)= 0

0is not well

defined. Hint: Write f(xi) = f(0) + f ′(xi)xi +Ef (xi, 0) et cet.

23.9. Determine limi→∞f(xi)g(xi)

, where f(x) =√x − 1 and g(x) = x − 1, and

xi∞i=1 is a sequence with limi→∞ xi = 1 and xi = 1 for all i. Extend to the casef(x) = xr − 1 with r rational.

24Differentiation Rules

Calculemus. (Leibniz)

When I have followed a line of thought to the end, it often seemsso simple that I start to wonder if I have stolen it from someone.(Horace Engdahl)

24.1 Introduction

We now state and prove some rules for computing derivatives of combi-nations of functions in terms of the derivatives of the functions in thecombination. These rules of differentiation form a part of Calculus thatcan be automated in terms of symbolic manipulation software. In contrast,we will see below that integration, the other basic operation of Calculus,is not open to automatic symbolic manipulation to the same extent. Itmakes sense that a popular software for symbolic manipulation in Calculusis called Derive and not Integrate.

The following rules of differentiation are of basic importance and will beused frequently below. They form the very back-bone of symbolic Calculus.Plunging into the proofs we get familiar with different basic aspects of theconcept of derivative, and prepare ourselves to write our own version ofDerive.

376 24. Differentiation Rules

24.2 The Linear Combination Rule

Suppose that f(x) and g(x) are two functions that are differentiable onan open interval I and let x ∈ I. By definition, there are error functionsEf (x, x) and Eg(x, x) satisfying for x close to x,

f(x) = f(x) + f ′(x)(x − x) + Ef (x, x),g(x) = g(x) + g′(x)(x − x) + Eg(x, x),

(24.1)

and|Ef (x, x)| ≤ Kf |x− x|2, |Eg(x, x)| ≤ Kg|x− x|2, (24.2)

where Kf and Kg are constants. Addition gives

f(x) + g(x) = f(x) + g(x) + (f ′(x) + g′(x))(x − x)+Ef (x, x) + Eg(x, x),

which can be written

(f + g)(x) = (f + g)(x) + (f ′(x) + g′(x))(x − x) + Ef+g(x, x) (24.3)

whereEf+g(x, x) = Ef (x, x) + Eg(x, x).

By (24.2), we have

|Ef+g(x, x)| ≤ (Kf +Kg)|x− x|2.

The formula (24.3) shows that (f + g)(x) is differentiable at x and

(f + g)′(x) = f ′(x) + g′(x). (24.4)

Next, multiplying the first line in (24.1) by a constant c, we get

(cf)(x) = (cf)(x) + cf ′(x)(x − x) + cEf (x, x) (24.5)

This proves that if f(x) is differentiable at x, then (cf)(x) is differentiableat x and

(cf)′(x) = cf ′(x). (24.6)

We summarize in

Theorem 24.1 (The Linear Combination rule) If f(x) and g(x) aredifferentiable functions on an open interval I and c is a constant, then(f + g)(x) and (cf)(x) are differentiable on I, and for x ∈ I,

(f + g)′(x) = f ′(x) + g′(x), or D(f + g)(x) = Df(x) +Dg(x), (24.7)

and(cf)′(x) = cf ′(x), or D(cf)(x) = cDf(x). (24.8)

24.3 The Product Rule 377

Example 24.1.

D(2x3 + 4x5 +

7x

)= 6x2 + 20x4 − 7

x2.

Example 24.2. Using the above theorem and the fact that Dxi = ixi−1,we find that the derivative of

f(x) = a0 + a1x+ a2x2 + · · · + anx

n =n∑

i=0

aixi

is

f ′(x) = a1 + 2a2x2 + · · · + nanx

n−1 =n∑

i=1

iaixi−1.

24.3 The Product Rule

Multiplying the left and right-hand sides, respectively, of the two equationsin (24.1), we obtain

(fg)(x) = f(x)g(x) = f(x)g(x)

+ f ′(x)g(x)(x− x) + f(x)g′(x)(x − x) + f ′(x)g′(x)(x− x)2

+ (g(x) + g′(x)(x − x))Ef (x, x) + (f(x)+ f ′(x)(x − x))Eg(x, x) + Ef (x, x)Eg(x, x).

We conclude that

(fg)(x) = (fg)(x) +(f ′(x)g(x) + f(x)g′(x)

)(x − x) + Efg(x, x),

where Efg(x, x) is quadratic in x− x. We have now proved:

Theorem 24.2 (The Product rule) If f(x) and g(x) are differentiableon I, then (fg)(x) is differentiable on I and

(fg)′(x) = f(x)g′(x) + f ′(x)g(x), (24.9)

that is,D(fg)(x) = Df(x)g(x) + f(x)Dg(x), (24.10)

Example 24.3.

D((10 + 3x2 − x6)(x− 7x4)

)

= (6x− 6x5)(x − 7x4) + (10 + 3x2 − x6)(1 − 28x3).


24.4 The Chain Rule

We shall now compute the derivative of the composite function (f g)(x) =f(g(x)) in terms of the derivatives f ′(y) = df

dy and g′(x) = dgdx . Suppose then

that g(x) is uniformly differentiable on an open interval I, and supposefurther that g(x) is Lipschitz continuous on I with Lipschitz constant Lg.Let x ∈ I. Suppose next that f(y) is uniformly differentiable on an openinterval J containing y = g(x). By definition, there are error functionsEf (y, y) and Eg(x, x) satisfying for y close to y and x close to x,

f(y) = f(y) + f ′(y)(y − y) + Ef (y, y),g(x) = g(x) + g′(x)(x− x) + Eg(x, x),

(24.11)

and|Ef (y, y)| ≤ Kf |y − y|2, |Eg(x, x)| ≤ Kg|x− x|2, (24.12)

where Kf and Kg are certain constants, independent of y and x, respec-tively. Further, by assumption

|g(x) − g(x)| ≤ Lg|x− x|. (24.13)

Setting y = g(x) and recalling that y = g(x), we have

f(g(x)) = f(y) = f(y) + f ′(y)(y − y) + Ef (y, y)= f(g(x)) + f ′(g(x))(g(x) − g(x)) + Ef (g(x), g(x)).

Substituting g(x) − g(x) = g′(x)(x − x) + Eg(x, x), we thus have

f(g(x)) = f(g(x)) + f ′(g(x)) g′(x)(x− x)+ f ′(g(x))Eg(x, x) + Ef (g(x), g(x)).

Since (24.12) and (24.13) imply

|Ef (g(x), g(x))| ≤ Kf |g(x) − g(x)|2 ≤ KfL2g|x− x|2,

|f ′(g(x))Eg(x, x)| ≤ |f ′(g(x))|Kg|x− x|2,

we see that

(f g)(x) = (f g)(x) + f ′(g(x))g′(x)(x − x) + Efg(x, x),

where Efg(x, x) is quadratic in x− x. We have now proved:

Theorem 24.3 (The Chain rule) Assume that g(x) is uniformly dif-ferentiable in an open interval I and g(x) is Lipschitz continuous on I.Suppose further that f is uniformly differentiable in an open interval Jcontaining g(x) for x in I. Then the composite function f(g(x)) is differ-entiable on I, and

(f g)′(x) = f ′(g(x))g′(x), for x ∈ I, (24.14)

24.5 The Quotient Rule 379

ordh

dx=df

dy

dy

dx, (24.15)

where h(x) = f(y) and y = g(x), that is h(x) = f(g(x)) = (f g)(x). Analternative formulation si

D(f(g(x)) = Df(g(x))Dg(x), (24.16)

where Df = dfdy .

Example 24.4. Let f(y) = y5 and y = g(x) = 9 − 8x, so that f(g(x)) =(f g)(x) = (9 − 8x)5. We have f ′(y) = 5y4 and g′(x) = −8, and thus

D((9 − 8x)5) = 5y4 g′(x) = 5(9 − 8x)4 (−8) = −40(9 − 8x)4.

Example 24.5.

D(7x3 + 4x+ 6

)18 = 18(7x3 + 4x+ 6

)17D(7x3 + 4x+ 6)

= 18(7x3 + 4x+ 6

)17(21x2 + 4).

Example 24.6. Consider the composite function f(g(x)) with f(y) = 1/y,that is the function h(x) = 1

g(x) , where g(x) is a given function with g(x) =0. Since Df(y) = − 1

y2 we have using the Chain rule

Dh(x) = D1

g(x)=

−1(g(x))2

g′(x) =−g′(x)g(x)2

, (24.17)

as long as g(x) is differentiable and g(x) = 0.

Example 24.7. Using Example 24.6 and the Chain Rule, we get for n ≥ 1

d

dxx−n =

d

dx

(1xn

)

=−1

(xn)2d

dxxn

=−1x2n

× nxn−1 = −nx−n−1.

This extends the formula Dxm = mxm−1 to negative integers m =−1,−2, . . .

24.5 The Quotient Rule

Let f(x) and g(x) be differentiable on I and consider the problem of com-puting the derivative of (f

g )(x) = f(x)g(x) at x. Applying the Product rule to

f(x) 1g(x) = f(x)

g(x) , and using (24.17), we find that(f

g

)′(x) = f ′(x)

1g(x)

+ f(x)−g′(x)g(x)2

=f ′(x)g(x) − f(x)g′(x)

g(x)2,

if g(x) = 0, and we have thus proved:


Theorem 24.4 (The Quotient rule) Assume that f(x) and g(x) aredifferentiable functions on the open interval I. Then for x ∈ I, we have

(f

g

)′(x) =

f ′(x)g(x) − f(x)g′(x)g(x)2

,

provided g(x) = 0.

Example 24.8.

D

(3x+ 4x2 − 1

)

=3 × (x2 − 1) − (3x+ 4) × 2x

(x2 − 1)2.

Example 24.9.

d

dx

(x3 + x

(8 − x)6

)9

= 9(x3 + x

(8 − x)6

)8d

dx

(x3 + x

(8 − x)6

)

= 9(x3 + x

(8 − x)6

)8 (8 − x)6 ddx(x3 + x) − (x3 + x) d

dx(8 − x)6((8 − x)6

)2

= 9(x3 + x

(8 − x)6

)8 (8 − x)6(3x2 + 1) − (x3 + x)6(8 − x)5 ×−1(8 − x)12

.

Example 24.10. The Chain rule can also be used recursively:

d

dx

((((1 − x)2 + 1

)3

+ 2)4

+ 3

)5

= 5

((((1 − x)2 + 1

)3

+ 2)4

+ 3

)4

× 4((

(1 − x)2 + 1)3

+ 2)3

× 3((1 − x)2 + 1

)2

× 2(1 − x) × (−1).

24.6 Derivatives of Derivatives: f (n) = Dnf = dnfdxn

Let f(x) be a function with derivative f ′(x). Since f ′(x) is a function,it may also be differentiable with a derivative which would describe howquickly the rate of change of f is changing at each point x. The derivativeof the derivative f ′(x) of f(x) is called the second derivative of f(x) and isdenoted by

f ′′(x) = D2f(x) =d2f

dx2= (f ′)′(x).

24.7 One-Sided Derivatives 381

Example 24.11. For f(x) = x2, f ′(x) = 2x and f ′′(x) = 2.

Example 24.12. For f(x) = 1/x, f ′(x) = −1/x2 = −x−2 and f ′′(x) =−(−2)x−3 = 2/x3.

We can continue taking the derivative of the second derivative and geta third derivative:

f ′′′(x) = D3f(x) =d3f

dx3= (f ′′)′(x)

as long as the functions are differentiable. We can recursively define thederivative f (n) = Dnf of f order n by

f (n)(x) = Dnf(x) =dnf

dxn= (f (n−1))′(x) = D(Dn−1f)(x),

where f ′(x) = f (1)(x) = Df(x), f ′′(x) = f (2)(x) = D2f(x), and so on.The derivative of distance with respect to time is velocity. The derivative

of velocity with respect to time is called acceleration. Velocity indicates howquickly the position of an object is changing with time and accelerationindicates how quickly the object is speeding up or slowing down (changingvelocity) with respect to time.

Example 24.13. If f(x) = x4, then Df(x) = 4x3, D2f(x) = 12x2,D3f(x) = 24x, D4f(x) = 24 and D5f(x) ≡ 0.

Example 24.14. The n+1’st derivative of a polynomial of degree n is zero.

Example 24.15. If f(x) = 1/x, then

f(x) = x−1, Df(x) = −1 × x−2, D2f(x) = 2 × x−3, D3f(x) = −6 × x−4

...

Dnf(x) = (−1)n × 1 × 2 × 3 × · · · × nx−n−1 = (−1)nn!x−n−1.

24.7 One-Sided Derivatives

We can also define differentiability from the right at a point x of a functionf(x). The definition is the same as that used above with the restrictionthat x ≥ x. More precisely, the function f : J → R, where J = [x, b) andb > x, is said to be differentiable from the right at x if there are constantsm(x) and Kf(x) such that for x ∈ [x, b)

|f(x) − (f(x) +m(x)(x − x))| ≤ Kf (x)|x− x|2. (24.18)


We then say that the right-hand derivative of f(x) at x is equal to m(x),and we denote the right-hand derivative by f ′

+(x) = m(x).We define the left-hand derivative f ′

−(x) = m(x), analogously restrictingx ≤ x. In both cases, we are simply requiring that the linearization estimateholds for x on one side of x.

Example 24.16. The function f(x) = |x| is differentiable for x = 0 withderivative f ′(x) = 1 if x > 0 and f ′(x) = −1 if x < 0. The function f(x) =|x| is differentiable from the right at x = 0 with derivative f ′

+(0) = 1, anddifferentiable form the left at x = 0 with derivative f ′

−(0) = −1.

We say that f : [a, b] → R is differentiable on the closed interval [a, b], iff(x) is differentiable on the open interval (a, b), and is differentiable fromthe right at a, and differentiable from the left at b. The definition extendsin the obvious way to half-open/half-closed intervals (a, b] and [a, b). If fis either differentiable or is differentiable from the right and/or the left atevery point in an interval, then we say that f is piecewise differentiable onthe interval.

Example 24.17. The function |x| is piecewise differentiable on R. Thefunction 1/x is differentiable on (0,∞) but not differentiable on [0,∞).

24.8 Quadratic Approximation: Taylor’s Formulaof Order Two

For a differentiable function f(x), we figured out how to compute a bestlinear approximation for x close to x, namely

f(x) ≈ f(x) + f ′(x)(x − x)

with an error quadratic in x − x. In some situations, we might requiremore accuracy from an approximation than is possible to get using a lin-ear function. The natural generalization is to look for a “best” quadraticapproximation of the form

f(x) = f(x) +m1(x)(x − x) +m2(x)(x− x)2 + Ef (x, x), (24.19)

for x close to x, where m1(x) and m2(x) are constants and now the errorfunction Ef (x, x) is cubic in x− x, that is

|Ef (x, x)| ≤ Kf (x)|x− x|3, (24.20)

with Kf(x) a constant. Of course, for |x− x| small, Kf(x)|x− x|3 is muchsmaller than both m1(x)(x− x) or m2(x)(x− x)2, unless m1(x) and m2(x)happen to be zero, of course.

24.8 Quadratic Approximation 383

Now, if (24.19) holds for x close to x, then m1(x) = f ′(x), since m2(x−x)2 + Ef (x, x) is quadratic in x− x. If (24.19) holds, we thus have

f(x) = f(x) + f ′(x)(x− x) +m2(x)(x − x)2 + Ef (x, x). (24.21)

Let us next try to determine the constant m2(x). To this end we differen-tiate the relation (24.19) with respect to x to get

f ′(x) = f ′(x) + 2m2(x)(x− x) +d

dxEf (x, x). (24.22)

Let us now assume that for x close to x∣∣∣∣d

dxEf (x, x)

∣∣∣∣ ≤Mf (x)|x − x|2, (24.23)

for some constant Mf(x). The principle is that taking the derivative bringsdown the power of |x − x| one step from 3 to 2. We shall meet this phe-nomenon many times below. From (24.23) it would then follow by thedefinition of f ′′(x), that f ′′(x) = (f ′)′(x) = 2m2(x), that is

m2(x) =12f ′′(x).

We would thus arrive at an approximation formula of the form

f(x) = f(x) + f ′(x)(x − x) +12f ′′(x)(x− x)2 + Ef (x, x), (24.24)

for x close to x, where |Ef (x, x)| ≤ Kf (x)|x− x|3 with Kf (x) a constant.

Example 24.18. Consider the function f(x) = 1x for x close to x = 1. We

shall use the fact that if y = −1, then

11 + y

= 1 − y1

1 + y

which is readily verified by multiplying by 1 + y, and thus

11 + y

= 1 − y1

1 + y= 1 − y

(

1 − y1

1 + y

)

= 1 − y + y2 11 + y

= 1 − y + y2 − y3 11 + y

.

Choosing y = x− 1, we get

1x

=1

1 + (x − 1)= 1 − (x− 1) + (x− 1)2 − (x− 1)3

1 + (x− 1), (24.25)


and we see that the quadratic polynomial

1 − (x− 1) + (x− 1)2,

approximates 1x for x close to x = 1 with an error, which is cubic in x− x.

As a consequence of the expansion, we have that f(1) = 1, f ′(1) = −1 andf ′′(1) = 2. We plot the approximation in Fig. 24.1 and list some values ofthe approximation in Fig. 24.2.

0.5 1.0 1.5 2.00

1

2

3

4 y = 1x

x

y = 1 − (x− 1) + (x− 1)2

Fig. 24.1. The quadratic approximation 1− (x− 1) + (x− 1)2 of 1/x near x = 1

x 1/x 1 − (x− 1) + (x− 1)2 Ef (x, 1).7 1.428571 1.39 .038571.8 1.25 1.22 .03.9 1.111111 1.11 .001111.0 1.0 1.0 0.01.1 .909090 .91 .0009091.2 .833333 .84 .006661.3 .769230 .79 .02077

Fig. 24.2. Some values of f(x) = 1/x, the quadratic approximation1 − (x− 1) + (x− 1)2, and the error Ef (x, 1)

Below we will prove under the name of Taylor’s theorem, that if thefunction f(x) is three times differentiable with |f (3)(x)| ≤ 6Kf(x) for xclose to x, where Kf (x) is a constant, then for x close to x,

f(x) = f(x) + f ′(x)(x − x) +12f ′′(x)(x− x)2 + Ef (x, x), (24.26)

where the error function Ef (x, x) is cubic in x− x, more precisely,

|Ef (x, x)| ≤ Kf (x)|x− x|3, for x close to x. (24.27)

24.9 The Derivative of an Inverse Function 385

Further, ddxEf (x, x) is quadratic in x − x. Taylor’s theorem thus gives an

answer to the problem of quadratic approximation formulated in (24.19).

24.9 The Derivative of an Inverse Function

Let f : (a, b) → R be differentiable at x ∈ (a, b), so that for x close to x

f(x) = f(x) + f ′(x)(x − x) + Ef (x, x), (24.28)

where |Ef (x, x)| ≤ Kf (x)(x − x)2 with Kf (x) a constant. Suppose thatf ′(x) = 0 so that f(x) is strictly increasing or decreasing for x close tox, and thus the equation y = f(x) has a unique solution x for y close toy = f(x). This defines x as a function of y, and this function is said to bethe inverse of the function y = f(x) and is denoted by x = f−1(y), seeFig. 24.3.

xx x

y

y

yy = f(x) y = f(x)

y = f(x)

x = f−1(y)

Fig. 24.3. The function y = f(x) and its inverse x = f−1(y)

Can we compute the derivative of the function x = f−1(y) with respectto y close to y = f(x)? Rewriting (24.28), we have

y = y + f ′(x)(f−1(y) − f−1(y)) + Ef (f−1(y), f−1(y)),

that is

f−1(y) = f−1(y) +1

f ′(x)(y − y) − 1

f ′(x)Ef (f−1(y), f−1(y)), (24.29)

Suppose now that f−1 is Lipschitz continuous in an open interval J aroundy, so that

|f−1(y) − f−1(y)| ≤ Lf−1 |y − y| for y ∈ J.

Then for y close to y,

| 1f ′(x)

Ef (f−1(y), f−1(y))| ≤ 1|f ′(x)|Kf (x)(Lf−1)2|y − y|2,


which proves by (24.29) that the derivative Df−1(y) of f−1(y) with respectto y at y is equal to 1

f ′(x) , that is

Df−1(y) =1

f ′(x), (24.30)

where y = f(x). We summarize:

Theorem 24.5 If y = f(x) is differentiable at x with respect to x withf ′(x) = 0, then the inverse function x = f−1(y) is differentiable withrespect to y at y = f(x) with derivative Df−1(y) = 1

f ′(x) .

Example 24.19. The inverse of the function y = f(x) = x2 for x > 0 isthe function x = f−1(y) =

√y defined for y > 0. It follows that D

√y =

1f ′(x) = 1

2x = 12√

y . Changing notation from y to x, we thus have for x > 0,

d

dx

√x = D

√x =

12√x, or Dx

12 =

12x−

12 . (24.31)

24.10 Implicit Differentiation

We give an example of a technique called implicit differentiation to computethe derivative of the function x

pq , where p and q are integers with q = 0,

and x > 0. We know that the function y = xpq is the unique solution of the

equation yq = xp in y for a given x > 0. We can thus view y as a functionof x and write y(x) = x

pq , and we have

(y(x))q = xp for x > 0. (24.32)

Assuming y(x) to be differentiable with respect to x with derivative y′(x),we would get differentiating both sides of (24.32) with respect to x, andusing the Chain Rule on the left hand side:

q(y(x))q−1y′(x) = pxp−1

from which we deduce inserting that y(x) = xpq ,

y′(x) =p

qx−

pq (q−1)xp−1 =

p

qx

pq −1.

We conclude that

Dxr = rxr−1 for r rational, and x > 0, (24.33)

using the computation as an indication that the derivative indeed exists.To connect with the previous section, note that if y = f(x) has an inverse

function x = f−1(y), then differentiating both sides of x = f−1(y) with

24.11 Partial Derivatives 387

respect to x, considering y = y(x) = f(x) as a function of x, we get withD = d

dy

1 = Df−1(y)f ′(x)

which gives the formula (24.30).

24.11 Partial Derivatives

We now have gained some experience of the concept of derivative of a real-valued function f : R → R of one real variable x. Below we shall considerreal-valued functions several real variables, and we are then led to the con-cept of partial derivative. We give here a first glimpse, and consider a real-valued function f : R×R → R of two real variables, that is for each x1 ∈ R

and x2 ∈ R, we are given a real number f(x1, x2). For example,

f(x1, x2) = 15x1 + 3x2, (24.34)

represents the total cost in the Dinner Soup/Ice Cream model, with x1 rep-resenting the amount of meat and x2 the amount of ice-cream. To computethe partial derivative of the function f(x1, x2) = 15x1 + 3x2 with respectto x1, we keep the variable x2 constant and compute the derivative of thefunction f1(x1) = f(x1, x2) as a function of x1, and obtain df1

dx1= 15, and

we write∂f

∂x1= 15

which is the partial derivative of f(x1, x2) with respect to x1. Similarly, tocompute the partial derivative of the function f(x1, x2) = 15x1 + 3x2 withrespect to x2, we keep the variable x1 constant and compute the derivativeof the function f2(x2) = f(x1, x2) as a function of x2, and obtain df2

dx2= 3,

and we write∂f

∂x2= 3.

Obviously, ∂f∂x1

represents the cost of increasing the amount of meat oneunit, and ∂f

∂x2= 3 represents the cost of increasing the amount of ice cream

one unit. The marginal cost of meat is thus ∂f∂x1

= 15 and that of ice cream∂f∂x2

= 3.

Example 24.20. Suppose f : R × R → R is given by f(x1, x2) = x21 + x3

2 +x1x2. We compute

∂f

∂x1(x1, x2) = 2x1 + x2,

∂f

∂x2(x1, x2) = 3x2

2 + x1,

where we follow the principle just explained: to compute ∂f∂x1

, keep x2 con-stant and differentiate with respect to x1, and to compute ∂f

∂x2, keep x1

constant and differentiate with respect to x2.


More generally, we may in a natural way extend the concept of differ-entiability of a real-valued function f(x) of one real variable x to differen-tiability of a real valued function f(x1, x2) of two real variables x1 and x2

as follows: We say that function f(x1, x2) is differentiable at x = (x1, x2)if there are constants m1(x1, x2), m2(x1, x2) and Kf (x1, x2), such that for(x1, x2) close to (x1, x2),

f(x1, x2) = f(x1, x2)+m1(x1, x2)(x1−x1)+m2(x1, x2)(x2−x2)+Ef (x, x),

where|Ef (x, x)| ≤ Kf (x1, x2)((x1 − x1)2 + (x2 − x2)2).

Note that

f(x1, x2) +m1(x1, x2)(x1 − x1) +m2(x1, x2)(x2 − x2)

is a linear approximation to f(x) with quadratic error, the graph of whichrepresents the tangent plane to f(x) at x.

Letting x2 be constant equal to x2, we see that the partial derivativeof f(x1, x2) at (x1, x2) with respect to x1 is equal to m1(x1, x2), and wedenote this derivative by

∂f

∂x1(x1, x2) = m1(x1, x2).

Similarly, we say that the partial derivative of f(x) at x with respect to x2 isequal to m2(x1, x2) and denote this derivative by ∂f

∂x2(x1, x2) = m2(x1, x2).

These ideas extend in a natural way to real-valued functions f(x1, . . . , xd)of d real variables x1, . . . , xd, and we can speak about (and compute) partialderivatives of f(x1, . . . , xd) with respect to x1, . . . , xd following the samebasic idea. To compute the partial derivative ∂f

∂xjwith respect to xj for

some j = 1, . . . , d, we keep all variables but xj constant and compute theusual derivative with respect to xj . We shall return below to the conceptof partial derivative below, and through massive experience learn that itplays a basic role in mathematical modeling.

24.12 A Sum Up So Far

We have proved above that

Dxn =d

dxxn = nxn−1 for n integer and x = 0,

Dxr =d

dxxr = rxr−1 for r rational and x > 0.

We have also proved rules for how to differentiate linear combinations,products, quotients, compositions, and inverses of differentiable functions.This is just about all so far. We lack in particular answers to the followingquestions:


What function u(x) satisfies u′(x) = 1x?

What is the derivative of the function ax, where a > 0 is a constant?

Chapter 24 Problems

24.1. Construct and differentiate functions obtained by combining functions ofthe form xr using linear combinations, products, quotients, compositions, andtaking inverses. For example, functions like

√

x11 +

√x111

x−1.1 + x1.1.

24.2. Compute the partial derivatives of the function f : R ×R → R defined byf(x1, x2) = x2

1 + x42.

24.3. We have defined 2x for x rational. Let us try to compute the derivativeD2x = d

dx2x with respect to x at x = 0. We are then led to study the quotient

qn =2

1n − 1

1n

as n tends to infinity. (a) Do this experimentally using the computer. Note that

21n = 1 + qn

n, and thus we seek qn so that (1 + qn

n)n = 2. Compare with the

experience concerning (1 + 1n)n in Chapter A Very Short Course in Calculus.

24.4. Suppose you know how to compute the derivative of 2x at x = 0. What is

the derivative then at x = 0? Hint: 2x+1n = 2x2

1n .

24.5. Consider the function f : (0, 2) → R defined by f(x) = (1 + x4)−1 for0 < x < 1, f(x) = ax + b for 1 ≤ x < 2, where a, b ∈ R are constants. Forwhat values of a and b is this function (i) Lipschitz continuous on (0, 2), (ii)differentiable on (0, 2)?

24.6. Compute the partial derivatives of the function f : R3 → R given by

f(x1, x2, x3) = 2x21x3 + 5x3

2x43.

25Newton’s Method

Brains first and then Hard Work. (The House at Pooh Corner, Milne)

25.1 Introduction

As a basic application of the derivative, we study Newton’s method forcomputing roots of an equation f(x) = 0. Newton’s method is one of thecorner-stones of constructive mathematics. As a preparation we start outusing the concept of derivative to analyze the convergence of Fixed PointIteration.

25.2 Convergence of Fixed Point Iteration

Let g : I → I be uniformly differentiable on an interval I = (a, b) withderivative g′(x) satisfying |g′(x)| ≤ L for x ∈ I, where we assume thatL < 1. By Theorem 23.1 we know that g(x) is Lipschitz continuous on Iwith Lipschitz constant L, and since L < 1, the function g(x) has a uniquefixed point x ∈ I satisfying x = g(x).

We know that x = limi→∞ xi, where xi∞i=1 is a sequence generatedusing Fixed Point Iteration: xi+1 = g(xi) for i = 1, 2, . . .. To analyze theconvergence of Fixed Point Iteration, we assume that g(x) admits the fol-

392 25. Newton’s Method

lowing quadratic approximation close to x following the pattern of (24.26),

g(x) = g(x) + g′(x)(x− x) +12g′′(x)(x− x)2 + Eg(x, x), (25.1)

where |Eg(x, x)| ≤ Kg(x)|x− x|3. Choosing x = xi, setting ei = xi − x andusing x = g(x), we have for i large enough,

ei+1 = xi+1 − x = g(xi) − g(x) = g′(x)ei +12g′′(x)e2i + Eg(xi, x), (25.2)

where |Eg(xi, x)| ≤ Kg(x)|ei|3. This formula gives an expansion of the errorei+1 at step i+ 1 in terms of the different powers of ei.

If g′(x) = 0, then the linear term g′(x)ei dominates and

|ei+1| ≈ |g′(x)||ei|, (25.3)

which says that the error decreases with (approximately) the factor |g′(x)|at each step, and we then say that the convergence is linear. If g′(x) = 0.1,then we gain one decimal of accuracy in each step of Fixed Point Iteration.

As |g′(x)| decreases, the convergence becomes faster. An extreme casearises when g′(x) = 0. In this case, (25.2) implies

ei+1 =12g′′(x)e2i + Eg(xi, x),

so that neglecting the cubic term Eg(xi, x), we have

|ei+1| ≈12|g′′(x)|e2i . (25.4)

In this case the convergence is said to be quadratic, because the error |ei+1|is, up to the factor |g′′(x)/2|, the square of the error |ei|. If the convergenceis quadratic, then the number of correct decimals roughly doubles in eachstep.

25.3 Newton’s Method

In Chapter Fixed Point Iteration, we saw that the problem of finding a rootof an equation f(x) = 0, where f(x) is a given function, can be reformulatedas a fixed point equation x = g(x), with g(x) = x − αf(x) and α a non-zero constant to choose. In fact, one may choose α(x) to depend in x andreformulate f(x) = 0 as

g(x) = x− α(x)f(x),

if only α(x) = 0, where x is the root being computed. From above, weunderstand that a natural strategy is to choose α so as to make g′(x) as

25.4 Newton’s Method Converges Quadratically 393

small as possible. The ideal would be g′(x) = 0. Differentiating the equationg(x) = x− α(x)f(x) with respect to x, we get

g′(x) = 1 − α′(x)f(x) − α(x)f ′(x).

Assuming that f ′(x) = 0, and using f(x) = 0,

α(x) =1

f ′(x).

Setting α(x) = 1f ′(x) leads to Newton’s method for computing a root of

f(x) = 0: for i = 0, 1, 2, . . .

xi+1 = xi −f(xi)f ′(xi)

, (25.5)

where x0 is a given initial root approximation. Newton’s method corre-sponds to Fixed Point Iteration with

g(x) = x− f(x)f ′(x)

. (25.6)

Using Newton’s method, it is natural to assume that f ′(x) = 0, whichguarantees that f ′(xi) = 0 for i large if f ′(x) is Lipschitz continuous.

Example 25.1. We apply Newton’s method to compute the roots x =2, 1, 0,−0.5,−1.5 of the polynomial equation f(x) = (x − 2)(x − 1)x(x +.5)(x + 1.5) = 0. We have that f ′(x) = 0 for all roots x. We compute 21Newton iterations for f(x) = 0 starting with 400 equally spaced initialvalues in [−3, 3] and indicate the corresponding roots that are found inFig. 25.1. Each of the roots is contained in an interval in which all ini-tial values produce convergence to the root. But outside these intervalsthe behavior of the iteration is unpredictable with near-by initial valuesconverging to different roots.

25.4 Newton’s Method Converges Quadratically

We shall now prove that Newton’s method converges quadratically if theinitial approximation is good enough. We do this by computing the deriva-tive of the corresponding fixed point function defined by (25.6):

g′(x) = 1 − f ′(x)2 − f(x)f ′′(x)f ′(x)2

=f(x)f ′′(x)f ′(x)2

= 0,

where we used that f(x) = 0 and the assumption that f ′(x) = 0. Weconclude that Newton’s method converges quadratically if f ′(x) = 0. This


-3 1 2 3-1.5 -.5 0

Converges to -1.5

Converges to -.5

Converges to 0

Converges to 2

Converges to 1

Fig. 25.1. This plot shows the roots of f(x) = (x− 2)(x− 1)x(x+ .5)(x + 1.5)found by Newton’s method for 5000 equally spaced initial guesses in [−3, 3]. Thehorizontal position of the points shows the location of the initial guess and thevertical position indicates the twenty first Newton iterate

result holds if we start sufficiently close to x, so that in particular f ′(xi) = 0for all i.

A more direct way to see that Newton’s method converges quadratically,goes as follows. Subtract x from each side of (25.5) and use the fact thatf(xi) = −f ′(xi)(x−xi)−Ef(x, xi), obtained from the linearization formulaf(x) = f(xi) + f ′(xi)(x− xi) + Ef (x, xi) because f(x) = 0, to obtain

xi+1 − x = xi −f(xi)f ′(xi)

− x =Ef (x, xi)f ′(xi)

.

We conclude that

|xi+1 − x| = |Ef (x, xi

f ′(xi)| ≤ Kf

|f ′(xi)||xi − x|2,

which gives quadratic convergence if f ′(x) is bounded away from zero forx close to x.

25.5 A Geometric Interpretation of Newton’sMethod

There is an appealing geometric interpretation of Newton’s method. Let xi

be an approximation of a root x of f(x) = 0 satisfying f(x) = 0. Considerthe tangent line to y = f(x) at x = xi,

y = f(xi) + f ′(xi)(x − xi).

Let xi+1 be the x-value where the tangent line crosses the x-axis, seeFig. 25.2, that is let xi+1 satisfy f(xi) + f ′(xi)(xi+1 − xi) = 0, so that

xi+1 = xi −f(xi)f ′(xi)

, (25.7)

25.6 What Is the Error of an Approximate Root? 395

which is Newton’s method. We conclude that the iterate xi+1 in Newton’smethod is the intersection of the tangent line to f(x) at xi with the x-axis.In words: trying to find x, so that f(x) = 0, we replace f(x) by the linearapproximation

f(x) = f(xi) + f ′(xi)(x − xi),

that is by the tangent line at x = xi, and then compute xi+1 as the solu-tion of the equation f(x) = 0. We shall find that this approach to Newton’smethod is easy to generalize to systems of equations corresponding to find-ing roots of f(x) where f : R

n → Rn.

xi x xi+1

y = f(x)

y = f(xi) + f ′(xi)(x− xi)

Fig. 25.2. An illustration of one step of Newton’s method from xi to xi+1

25.6 What Is the Error of an Approximate Root?

Suppose xi is an approximation of a root x of a given equation f(x) = 0.Can we say something about the error xi − x from the knowledge of f(xi)?We will meet this question over and over again and we will refer to f(xi)as the residual of the approximation xi. For the exact root x, the residualis zero since f(x) = 0, and for the approximation xi, the residual f(xi) isnot zero (unless by some miracle xi = x, or xi is some root of f(x) = 0different from x).


Now, there is a very basic connection between the residual f(xi) and theerror xi − x that may be expressed as follows. Using the fact that f(x) = 0and assuming that f(x) is differentiable at x,

f(xi) = f(xi) − f(x) = f ′(x)(xi − x) + Ef (xi, x),

where |Ef (xi, x)| ≤ Kf(x)|xi − x|2. Assuming that f ′(x) = 0, we concludethat

xi − x ≈ f(xi)f ′(x)

, (25.8)

up to the error term (f ′(x))−1Ef (xi, x), which is quadratic in xi − x andthus much smaller than |xi − x| if xi is close to x, see Fig. 25.3.

x

y

x xi

y = f(x)

y = f(xi)

root error

residualf ′(x)(xi − x)

Fig. 25.3. The root error and the residual

The relation (25.8) shows that the root error xi − x is roughly propor-tional to the residual with the proportionality factor (f ′(x))−1, if xi is closeto x and f ′(x) is Lipschitz continuous near x = x. We summarize in thefollowing basic theorem (the full proof of which will be given below usingthe Mean Value theorem).

Theorem 25.1 If f(x) is differentiable in an interval I containing a rootx of f(x) = 0, and |f ′(x)|−1 ≤ M for x ∈ I, then an approximate rootxi ∈ I, satisfies |xi − x| ≤M |f(xi)|.

In particular, if f ′(x) is very small, then the root error may be largealthough the residual is very small. In this case the process of computingthe root x is said to be ill-conditioned.

Example 25.2. We apply Newton’s method to f(x) = (x−1)2−10−15x withroot x≈1.00000003162278. Here f ′(1)=−10−15 and f ′(x)≈0.0000000316,

25.6 What Is the Error of an Approximate Root? 397

0 5 10 15 20 2510−14

10−12

10−10

10−8

10−6

10−4

10−2

100

|xn − x|

|f(xn)|

Fig. 25.4. Plots of the residuals • and errors versus iteration number forNewton’s method applied to f(x) = (x − 1)2 − 10−15x with initial valuex0 = 1

so that f ′(xn) is very small for all xn close to x, and the problem seems tobe very ill-conditioned. We plot the errors and residuals versus iteration inFig. 25.4. We see that the residuals become small quite a bit faster thanthe errors.

Introducing the approximation (25.8) into the definition of Newton’smethod,

xi+1 = xi − f(xi)/f ′(xi),

we get the relation|xi − x| ≈ |xi+1 − xi|. (25.9)

In other words, as an estimate of the error of xi − x, we can compute anextra step of Newton’s method to get xi+1 and then use |xi+1 − xi| as anestimate of |xi − x|. This is an alternative way of estimating the root errorxi − x, where the derivative f ′(x) does not enter explicitly.

i |xi − x| |xi+1 − xi|0 .586 .51 .086 .0832 2.453 × 10−3 2.451× 10−3

3 2.124 × 10−6 2.124× 10−6

4 1.595× 10−12 1.595 × 10−12

5 0 0

Fig. 25.5. The error and error estimate for Newton’s method for f(x) = x2 − 2with x0 = 2


Example 25.3. We apply Newton’s method to f(x) = x2 − 2 and showthe error and error estimate (25.9) in Fig. 25.5. The error estimate doesa pretty good job.

25.7 Stopping Criterion

Suppose we want to compute an approximation of a root x of a givenequation f(x) = 0 with a certain accuracy, or error tolerance TOL > 0. Inother words, suppose we want to guarantee that

|xi − x| ≤ TOL, (25.10)

where xi is a computed approximation of the root x. For example, we maychoose TOL = 10−m corresponding to seeking an approximate root xi withm correct decimals. Can we find some stopping criterion that tells us whento stop an iterative process with an approximation xi satisfying (25.10)?The following criteria based on (25.8) presents itself: stop the iterativeprocess at step i if

|(f ′(xi))−1f(xi)| ≤ TOL. (25.11)

Up to the change of argument from x to xi, this criterion guarantees thedesired error control (25.10).

As an alternative stopping criterion for Newton’s method, we may use(25.9), that is accept the approximation xi with tolerance TOL if

|xi+1 − xi| ≤ TOL. (25.12)

25.8 Globally Convergent Newton Methods

In this chapter, we have proved quadratic convergence of Newton’s methodunder the assumption that we start close enough to the root of interest,that is we have prove local convergence of Newton’s method. To get a suf-ficiently good initial approximation we may use the Bisection algorithm.Thus, by using the Bisection algorithm in an initial search of roots and thenNewton’s method for each individual root, we may obtain a globally conver-gent method combining efficiency (quadratic convergence) with reliability(guaranteed convergence).


Chapter 25 Problems

25.1. (a) Verify theoretically that the fixed point iteration for

g(x) =1

2

(x+

a

x

)

with x =√a converges quadratically. (b) Try to say something about which

initial values guarantee convergence for a = 3 by computing some fixed pointiterations.

25.2. (a) Show analytically that Fixed Point Iteration for

g(x) =x(x2 + 3a)

3x2 + a

is third order convergent for computing x =√a. (b) Compute a few iterations for

a = 2 and x0 = 1. How many digits of accuracy are gained with each iteration?

25.3. (a) Consider Newton’s method applied to a differentiable function f(x)with f(x) = f ′(x) = 0, but f ′′(x) = 0, that is x is a double-root of f(x) = 0. Provethat Newton’s method in this case converges linearly, by proving that g′(x) = 1/2,where g(x) = x−f(x)/f ′(x). (b) What is the rate of convergence of the followingvariant of Newton’s method in the case of a double root: g(x) = x−2f(x)/f ′(x)?Hint: you may find it convenient to use l’Hopital’s rule.

25.4. Use Newton’s method to compute all the roots of f(x) = x5 + 3x4 − 3x3 −5x2 + 5x− 1.

25.5. Use Newton’s method to compute the smallest positive root of f(x) =cos(x) + sin(x)2(50x).

25.6. Use Newton’s method to compute the root x = 0 of the function

f(x) =

√x x ≥ 0

−√−x x < 0

Does the method converge? If so, is it converging at second order? Explain youranswer.

25.7. Apply Newton’s method to f(x) = x3 − x starting with x0 = 1/√

5. Is themethod converging? Explain your answer using a plot of f(x).

25.8. (a) Derive an approximate relation between the residual g(x)−x of a fixedpoint problem for g and the error of the fixed point iterate xn − x. (b) Devisetwo stopping criteria for a fixed point iteration. (c) Revise your fixed point codeto make use of (a) and (b).

25.9. Use Newton’s method to compute the root x = 1 of f(x) = x4 − 3x2 +2x. Is the method converging quadratically? Hint: you can test this by plotting|xn − 1|/|xn−1 − 1| for n = 1, 2, · · · .

25.10. Assume that f(x) has the form f(x) = (x − x)2h(x) where h is a dif-ferentiable function with h(x) = 0. (a) Verify that f ′(x) = 0 but f ′′(x) = 0. (b)Show that Newton’s method applied to f(x) converges to x at a linear rate andcompute the convergence factor.

26Galileo, Newton, Hooke, Malthusand Fourier

In a medium totally devoid of all resistance, all bodies would fallwith the same speed. (Galileo)

Everything that Galileo says about bodies falling in empty spaceis built without foundation; he ought first to have determined thenature of weight. (Descartes)

When we have the decrees of Nature, authority goes for nothing.(Aristotle)

Galileo has been the first to open the door to us to the whole realmof physics. (Hobbes)

Provando e riprovando (Verify one and disprove the other). (Galileo)

Measure what is measurable, and make measurable what is not so.(Galileo)

26.1 Introduction

In this chapter, we describe some basic models of physical phenomena thatinvolve the derivatives of some function(s) and which therefore are calleddifferential equations. The derivative is the fundamental tool for modelingin science and engineering. To formulate and solve differential equationshas been a basic part of science since the days when Newton’s Law ofMotion was formulated. Today, the computer is opening new possibilities ofmodeling through differential equations, which Newton and all his scientificfollowers through the centuries, could never even dream of.

402 26. Galileo, Newton, Hooke, Malthus and Fourier

We thus formulate a couple of very basic differential models in this chap-ter, and we will spend a large part of the rest of this book to solve theseequations or related generalizations using analytical or numerical tech-niques.

26.2 Newton’s Law of Motion

Newton’s Law of motion is one of the cornerstones of the Newtonian physicsthat serves to describe much of the world that we live in. Newton’s lawstates that the derivative with respect to time t of the momentum m(t)v(t)of a body, which is the product of the mass m(t) and the velocity v(t) ofthe body, is equal to the force f(t) acting on the body, that is,

(mv)′(t) = f(t). (26.1)

If m(t) = m is independent of time, then we can write Newton’s Law as

mv′(t) = f(t), (26.2)

or in its most familiar form:

ma(t) = f(t), (26.3)

wherea(t) = v′(t),

is the acceleration.Since the velocity is the derivative with respect to time of the position

s(t), that is, v(t) = s′(t), we can write Newton’s Law (26.2) in the case ofconstant mass, as follows:

ms′′(t) = f(t). (26.4)

If we think of the force f(t) as a given function of time t, then (26.2) and(26.4) represent differential equations for the velocity v(t) or the positions(t). The differential equation thus involves some known function (f(t)),and the unknown is again a function (v(t) or s(t)). The differential equationthus typically involves the derivative of an unknown function, and othergiven functions act as data. Note that typically we seek a function s(t)satisfying the differential equation (26.4) not just at a particular instant oftime t, but for all t in some interval.

26.3 Galileo’s Law of Motion

Galileo (1564–1642), mathematician, astronomer, philosopher, co-founderof the Scientific Revolution, performed his famous experiments dropping

26.3 Galileo’s Law of Motion 403

objects form the Tower of Pisa and counting the time for the objects tohit the ground, or using an inclining plane, see Fig. 26.1, to demonstratehis Laws of Motion. Galileo, condemned 1632 by the Catholic Church forquestioning the idea that the Earth is the center of Universe, tried throughthese experiments to understand the nature of motion, a topic that alreadythe Greeks were obsessed by.

Fig. 26.1. Galileo demonstrating Laws of Motion

Galileo found that that an object close to the surface of the earth, thatis acted upon only by the vertical gravity force, has a constant verticalacceleration independent of the mass or position of the body. This factmay be viewed as a special case of Newton’s Law ma(t) = f(t), where mis the mass of the body and a(t) its vertical acceleration, and the gravityforce f(t) is given by f(t) = mg, where g ≈ 9.81 (meter/sec2), is the famousphysical constant referred to as the acceleration of gravity at the surface ofthe Earth. Both Galileo and Newton would conclude that the accelerationa(t) = g is independent of m and the position of the body as long as thebody is not far from the surface of the Earth.

We now study the motion of one of the objects dropped vertically byGalileo from the Tower. Assuming that the mass of the object is m, andletting s(t) denote the height above ground of the object at time t, withthus the positive direction upwards, see Fig. 26.2. In this coordinate system,a positive velocity v(t) = s′(t) corresponds to the object moving upwardswhile a negative velocity means that the object is moving downwards. New-


s = 0

s(t)

s(0) = s0

the ground

initial height

Fig. 26.2. The coordinate system describing the position of a free falling objectwith the initial height s(0) = s0 at time t = 0

ton’s Law states that

ms′′ = −mg, or mv′(t) = −mg

where a minus sign enters, because the gravity force is direct downwards.We thus have, dividing out the common factor of the mass m in the spiritof Galileo,

s′′(t) = v′(t) = −g. (26.5)

We conclude, using the fact f ′(t) = c for f(t) = ct and c a constant, that

v(t) = −gt+ c, (26.6)

where c is a constant to determine. We check that ddt (−gt + c) = −g for

all t. To pick out the one line that gives the velocity of the falling object ina concrete case, it is sufficient to know the velocity at some specific time.For example, if we suppose that the initial speed v(0) at time t = 0 isknown, v(0) = v0, then we get the solution

v(t) = −gt+ v0, for t > 0. (26.7)

For example, if v0 = 0, then the upward velocity is v(t) = −gt, and thusthe downward velocity gt increases linearly with time. This is what Galileoobserved (to his surprise).

Having now solved for the velocity v(t) according to (26.7), we can nowseek to find the position s(t) by solving the differential equation s′(t) = v(t),that is

s′(t) = −gt+ v0 for t > 0. (26.8)

26.4 Hooke’s Law 405

Recalling that (t2)′ = 2t, and that (cf(t))′ = cf ′(t), it is natural to believethat the solution of (26.8) would be the quadratic function

s(t) = −g2t2 + v0t+ d,

where d is a constant, since this function satisfies (26.8). To determine s(t)we need to determine the constant d. Typically, we can do this if we knows(t) for some specific value of t, for example, if we know the initial positions(0) = s0 at time t = 0. In this case we conclude that d = s0, and thus wehave the solution formulas

s(t) = −g2t2 + v0t+ s0, v(t) = −gt+ v0 for t > 0, (26.9)

giving the position s(t) and velocity v(t) as functions of time t for t > 0,if the initial position s(0) = s0 and velocity v(0) = v0 are given. Wesummarize (the uniqueness of the solution will be settled below).

Theorem 26.1 (Galileo) Let s(t) and v(t) denote the vertical position andvertical velocity of an object subject to free fall at time t ≥ 0 with constantacceleration of gravity g, with the upward direction being positive. Thens(t) = − g

2 t2 + v0t+ s0 and v(t) = −gt+ v0, where s(0) = s0 and v(0) = v0

are given initial position and velocity.

Example 26.1. If the initial height of the object is 15 meter and it isdropped from rest, what is the height at t = .5 sec? We have

s(.5) = −9.82

(.5)2 + 0 × .5 + 15 = 13.775 meter

If initially it is thrown upwards at 2 meter/sec, the height at t = .5 sec is

s(.5) = −9.82

(.5)2 + 2 × .5 + 15 = 14.775 meter

If initially it is thrown downwards at 2 meter/sec, the height at t = .5 secis

s(.5) = −9.82

(.5)2 − 2 × .5 + 15 = 12.775 meter

Example 26.2. An object starting from rest is dropped and hits the groundat t = 5 sec. What was its initial height? We have s(5) = 0 = − 9.8

2 52 +0×5 + s0, and so s0 = 122.5 meter.

26.4 Hooke’s Law

Consider a spring of length L hanging vertically in equilibrium from theceiling. Fix a coordinate system with x-axis directed downwards with the


origin at the ceiling, and let f(x) be the weight per unit length of the springas a function of distance x to the ceiling. Imagine that the spring first isweightless (turn off gravity for a while) with corresponding zero tensionforce. Imagine then that you gradually turn on the gravity force and watchhow the spring stretches under its increasing own weight. Let u(x) be thecorresponding displacement of a point in position x in the un-stretchedinitial position. Can we determine the displacement u(x) of the spring andthe tension force σ(x) in the spring as a function of x at full power of thegravity?

Hooke’s Law for a linear (ideal) spring states that the tension force σ(x)is proportional to the deformation u′(x),

σ(x) = Eu′(x) (26.10)

where E > 0 is a spring constant which we may refer to as the modulus ofelasticity of the spring. Note that u′(x) measures change of displacementu(x) per unit of x, which is deformation.

The weight of the spring from position x to position x + h for h > 0is approximately equal to f(x)h, since f(x) is weight per unit length, andthis weight should be balanced by the difference of tension force:

−σ(x+ h) + σ(x) ≈ f(x)h, that is − σ(x+ h) − σ(x)h

≈ f(x),

which gives the equilibrium equation −σ′(x) = f(x). Altogether, we get thefollowing differential equation, since E is assumed constant independent ofx,

−Eu′′(x) = f(x), for 0 < x < L. (26.11)

Here f(x) is a given function, and we seek the displacement u(x) satisfyingthe differential equation (26.11). To determine u(x) we need to specify that,for example, u(0) = 0, expressing that the spring is attached to the ceiling,and u′(L) = 0, expressing that the tension force is zero at the free end of thespring. If f(x) = 1 corresponding to a homogenous spring, and assumingE = 1 and L = 1, then we get the displacement u(x) = x − x2

2 , and thetension σ(x) = 1 − x.

Hooke (1635–1703) was A Curator at the Royal Society in London. Ev-ery week, except during the Summer vacation, Hooke had to demonstratethree or four experiments proving new laws of nature. Among other things,Hooke discovered the cellular structure of plants, the wave nature of light,Jupiter’s red spots, and (probably before Newton!) the inverse square lawof gravitation.

26.5 Newton’s Law plus Hooke’s Law

Consider a mass on a frictionless table connected to a spring attached toa wall. Hooke’s Law for a spring states that the spring force arising when

26.6 Fourier’s Law for Heat Flow 407

stretching or compressing the spring is proportional to the change of lengthof the spring from its rest state with zero force. We choose a horizontal x-axis in the direction of the spring so that the mass is located at x = 0when the spring is at its rest state, and x > 0 corresponds to stretchingthe spring to the right, see Fig. 26.3. Let now u(t) be the position of themass at time t > 0. Hooke’s Law states that the force f(t) acting on themass is given by

f(t) = −ku(t), (26.12)

where the constant of proportionality k = E/L > 0 is the spring constant,with L the (natural) length of the spring. On the other hand, Newton’sLaw, states that mu′′(t) = f(t), and thus we get the following differentialequation for the mass-spring system:

mu′′(t) = −ku(t), or mu′′(t) + ku(t) = 0, for t > 0. (26.13)

We will see that solving this differential equation specifying the initial po-sition u(0) and the initial velocity u′(0), will lead us into the world of the

trigonometric functions sin(ωx) and cos(ωx) with ω =√

km .

u=0 u > 0

Fig. 26.3. Illustration of the coordinate system used to describe a spring-masssystem. The mass is allowed to slide freely back and forth with no friction

26.6 Fourier’s Law for Heat Flow

Fourier was one of the first mathematicians to study the process of con-duction of heat. Fourier developed a mathematical technique for this pur-pose using Fourier series, where “general functions” are expressed as lin-ear combinations of the trigonometric functions sin(nx) and cos(nx) withn = 1, 2, . . ., see the Chapter Fourier series. Fourier created the simplestmodel of heat conduction stating that the heat flow is proportional to thetemperature difference, or more generally temperature gradient, that is rateof change of temperature.


We now seek a simple model of the following situation: You live yourlife in Northern Scandinavia and need to electrically heat your car enginebefore being able to start the car in the morning. You may ask the question,when you should start the heating and what effect you should then turnon. Short time, high effect or long time small effect, depending on price ofelectric power which may vary over day and night.

A simple model could be as follows: let u(t) be the temperature of thecar engine at time t, let q+(t) be the flow of heat per unit time from theelectrical heater to the engine, and let q−(t) be the heat loss from the engineto the environment (the garage). We then have

λu′(t) = q+(t) − q−(t),

where λ > 0 is a constant which is referred to as the heat capacity. Theheat capacity measures the rise of temperature of unit added heat.

Fourier’s Law states that

q−(t) = k(u(t) − u0),

where u0 is the temperature of the garage. Assuming that u0 = 0 for sim-plicity, we thus get the following differential equation for the temperatureu(t):

λu′(t) + ku(t) = q+(t), for t > 0, u(0) = 0, (26.14)

where we added the initial condition u(0) = 0 stating that the enginehas the temperature of the garage when the heater is turned on at timet = 0. We may now regard q+(t) as a given function, and λ and k as givenfunctions, and seek the temperature u(t) as a function of time.

26.7 Newton and Rocket Propulsion

Let’s try to model the flight of a rocket far out in space away from anygravitational forces. When the rocket’s engine is fired, the exhaust gasesfrom the burnt fuel shoot backwards at high speed and the rocket movesforward so as to preserve the total momentum of the exhaust plus rocket,see Fig. 26.4. Assume the rocket moves to the right along a straight linewhich we identify with an x-axis. We let s(t) denote the position of therocket at time t and v(t) = s′(t) its velocity and we let m(t) denote themass of the rocket at time t. We assume that the exhaust gases are ejectedat a constant velocity u > 0 relative to the rocket in the direction of thenegative x-axis. The total mass me(t) of the exhaust at time t is me(t) =m(0)−m(t), where m(0) is the total mass of the rocket with fuel at initialtime t = 0.

Let ∆me be the exhaust released from the rocket during a time interval(t, t + ∆t). Conservation of total momentum of rocket plus exhaust over

26.7 Newton and Rocket Propulsion 409

t = t0

Fig. 26.4. A model of rocket propulsion

the time interval (t, t+ ∆t), implies that

(m(t) − ∆me)(v(t) + ∆v) + ∆me(v(t) − u) = m(t)v(t),

where ∆v is the increase in speed of the rocket and ∆me(v(t)−u) the mo-mentum of the exhaust released over (t, t+∆t). Recall that the momentumis the product of mass and velocity. This leads, using that ∆me = −∆mwith ∆m the change of rocket mass, to the relation

m(t)∆v = −u∆m.

Dividing by ∆t, we are led to the differential equation

m(t)v′(t) = −um′(t) for t > 0. (26.15)

We shall show below that the solution can be expressed as (anticipatingthe use of the logarithm function log(x)), assuming v(0) = 0,

v(t) = u log(m(0)m(t)

)

, (26.16)

connecting the velocity v(t) to the mass m(t). Typically, we may knowm(t) corresponding to firing the rocket in a specific way, and we may thendetermine the corresponding velocity v(t) from (26.16). For example, wesee that, when half the initial mass has been ejected at speed u relativeto the rocket, then the speed of the rocket is equal to u log(2) ≈ 0.6931 u.Note that this speed is less than u, because the speed of the exhaust ejectedat time t varies from −u for t = 0 to v(t) − u for t > 0.

If the rocket engine would be fired so that the absolute speed of the ex-haust was always −u (that is at an increasing speed relative to the rocket),then conservation of momentum would state that

m(t)v(t) = (m(0) −m(t))u, that is v(t) = u

(m(0)m(t)

− 1)

.

In this case, the speed of the rocket would be equal to u when m(t) =12m(0).


26.8 Malthus and Population Growth

Thomas Malthus (1766–1834), English priest and economist, developeda model for population growth in his treatise An Essay on the Theoryof Population, Fig. 26.5. His study made him worried: the model indicatedthat the population would grow quicker than available resources. As a cureMalthus suggested late marriages. Malthus model was the following: Sup-pose the size of a population at time t is measured by a function u(t), whichis Lipschitz continuous in t. This means that we consider a quite large pop-ulation and normalize so that the total number of individuals P (t) of thepopulation is set equal to u(t)P0, where P0 is the number of individuals attime t = 0, say, so that u(0) = 1. Malthus model for population growth isthen

u′(t) = λu(t), for t > 0, u(0) = 1, (26.17)

where λ is a positive constant. Malthus thus assumes that the rate of growthu′(t) is proportional to the population u(t) at each given instant with a rateof growth factor λ, which may be the difference between the rates of birthand death (considering other factors such as migration negligible).

Fig. 26.5. Thomas Malthus:“I think I may fairly make two postulata. First, Thatfood is necessary to the existence of man. Secondly, That the passion betweenthe sexes is necessary and will remain nearly in its present state”

Letting tn = kn, n = 0, 1, 2, . . . be sequence of discrete time steps withconstant time step k > 0, we can compare the differential equation (26.17),to the following discrete model

Un = Un−1 + kλUn−1, for n = 1, 2, 3 . . . , U0 = 1. (26.18)

Formally, (26.18) arises form u(tn) ≈ u(tn−1) + ku′(tn−1) inserting thatu′(tn−1) = λu(tn−1) and viewing Un as an approximation of u(tn). We are

26.9 Einstein’s Law of Motion 411

familiar with the model (26.18), and we know that the solution is given bythe formula

Un = (1 + kλ)n, for n = 0, 1, 2, . . . (26.19)

We expect thus that

u(kn) ≈ (1 + kλ)n, for n = 0, 1, 2, . . . (26.20)

or

u(t) ≈(

1 +t

nλ

)n

, for n = 0, 1, 2, . . . (26.21)

We shall below prove that the differential equation (26.17) has a uniquesolution u(t), which we will write as

u(t) = exp(λt) = eλt, (26.22)

where exp(λt) = eλt is the famous exponential function. We will prove that

exp(λt) = eλt = limn→∞

(

1 +λt

n

)n

, (26.23)

corresponding to the fact that Un will be an increasingly good approxima-tion of u(t) for a given t = nk, as n tends infinity and thus the time stepk = t

n tends to zero.In particular, we have setting λ = 1, that u(t) = exp(t) is the solution

of the differential equation

u′(t) = u(t), for t > 0, u(0) = 1. (26.24)

We plotted the function exp(t) in Fig. 4.4, from which the worries ofMalthus become understandable. The exponential function grows reallyquickly after some time!

Many physical situations are modeled by (26.17) with a ∈ R, such asearnings from compound interest (a > 0), and radioactive decay (a < 0).

26.9 Einstein’s Law of Motion

Einstein’s version of Newton’s Law (m0v(t))′(t) = f(t), modeling the mo-tion of a particle of mass m0 moving in a straight line with velocity v(t)under the influence of a force f(t), is

d

dt

(m0

√1 − v2(t)/c2

v(t)

)

= f(t), (26.25)


where m0 is the mass of the particle at zero speed and c ≈ 3 × 108 m/s isthe speed of light in a vacuum. Setting

w(t) =v(t

√1 − v2(t)/c2

,

we can write Einstein’s equation

w′(t) =f(t)m0

,

and assuming f(t) = F is constant, and v(0) = 0, we obtain w(t) = Fm0t,

which gives the following equation for v(t)

v(t)√

1 − v2(t)/c2=

F

m0t.

Squaring both sides, we get

v2(t)1 − v2(t)/c2

=F 2

m20

t2.

and solving for v(t), we get

v2(t) = c2F 2t2

m20c

2 + F 2t2. (26.26)

We conclude that v2(t) < c2 for all t: Einstein says that no object can beaccelerated to a velocity bigger or equal to the velocity of light!

To determine the position of the particle, we therefore have to solve thedifferential equation

s′(t) = v(t) =cF t

√m2

0c2 + F 2t2

. (26.27)

We will return to this task below.

26.10 Summary

We have above derived differential equation models of the following basicforms:

u′(x) = f(x) for x > 0, u(0) = u0, (26.28)

u′(x) = u(x) for x > 0, u(0) = u0, (26.29)

u′′(x) + u(x) = 0 for x > 0, u(0) = u0, u′(0) = u1, (26.30)

−u′′(x) = f(x) for 0 < x < 1, u(0) = 0, u′(1) = 0. (26.31)


Here we view the function f(x), and the constants u0 and u1 as given, andwe seek the unknown function u(x) satisfying the differential equation forx > 0.

We shall see in Chapter The Fundamental Theorem of Calculus that thesolution u(x) to (26.28) is given by

u(x) =∫ x

0

f(y) dy + u0.

Integrating twice, we can the write the solution u(x) of (26.31) in termsof the data f(x) as

u(x) =∫ x

0

g(y) dy, u′(y) = g(y) = −∫ y

1

f(z) dz, for 0 < x < 1.

We shall see in Chapter The exponential function exp(x) that the solutionto (26.29) is u(x) = u0 exp(x). We shall see in Chapter Trigonometricfunctions that the solution u(x) to (26.30) is sin(x) if u0 = 0 and u1 = 1,and cos(x) if u0 = 1 and u1 = 0.

We conclude that the basic elementary functions exp(x), sin(x) andcos(x) are solutions to basic differential equations. We shall see that itis fruitful to define these elementary functions as the solutions of the cor-responding differential equations, and that the basic properties of the thefunctions can easily be derived using the differential equations. For ex-ample, the fact that D exp(x) = exp(x) follows from the defining differ-ential equation Du(x) = u(x). Further, the fact that D sin(x) = cos(x)follows by differentiating D2 sin(x) + sin(x) = 0 with respect to x toget D2(D sin(x)) + D sin(x) = 0, which expresses that D sin(x) solvesu′′(x) + u(x) = 0 with initial conditions u0 = 1 and u1 = Du(0) =D2 sin(0) = − sin(0) = 0, that is D sin(x) = cos(x)!

Chapter 26 Problems

26.1. An object is dropped from height s0 = 15 m. After how long time doesit hit the ground? How much does an initial velocity v0 = −2 delay the timeof hitting the ground? Use the (crude) approximation g ≈ 10 to simplify theanalysis. Which initial velocity v0 is required to double the time of the fall?

26.2. Find the displacement of a hanging spring of length L = 1 and E = 1with a non-uniform weight given by f(x) = x.

26.3. (Tricky!) Find the displacement of a hanging spring of length L = 2 whichhave been tied together of two springs of length L = 1 with (a) E = 1 and E = 2,respectively, assuming the weight of the springs to be uniform with f(x) = 1, (b)both with E = 1 but with weights f = 1 and f = 2 per unit length, respectively.


26.4. Derive (26.12) from (26.10). Hint: u as in (26.12) corresponds to thedisplacement in (26.10) at x = L being L+ u, and the displacement being linearin x.

26.5. Develop more complicated population models in the form of systems ofdifferential equations.

26.6. Derive differential equation models for systems of springs coupled in seriesor in parallel.

26.7. Derive a differential equation model for a “bungee jump” of a body hookedup to a fixed support with a rubber band and attracted by gravity.

26.8. You leave a hot cup of coffee on the table in a room with temperature20o Celsius. Put up an equation for the rate of change of the temperature of thecoffee with time.

Show that the median, hce che ech, interecting at royde angles theparilegs of a given obtuse one biscuts both the arcs that are incurveachord behind. Brickbaths. The family umbroglia. A Tullagrovepole to the Heigt of County Fearmanagh has a septain inclininaisonand the graphplot for all the functions in Lower County Monachan,whereat samething is rivisible by nighttim. may be involted into thezerois couplet, palls pell inhis heventh glike noughty times ∞, find,if you are literally cooefficient, how minney combinaisies and permu-tandis can be played on the international surd! pthwndxrclzp!, hidscubid rute being extructed, takin anan illitterettes, ifif at a tom.Answers, (for teasers only). (Finnegans Wake, James Joyce)

References

[1] L. Ahlfors, Complex Analysis, McGraw-Hill Book Company, NewYork, 1979.

[2] K. Atkinson, An Introduction to Numerical Analysis, John Wileyand Sons, New York, 1989.

[3] L. Bers, Calculus, Holt, Rinehart, and Winston, New York, 1976.

[4] M. Braun, Differential Equations and their Applications, Springer-Verlag, New York, 1984.

[5] R. Cooke, The History of Mathematics. A Brief Course, John Wileyand Sons, New York, 1997.

[6] R. Courant and F. John, Introduction to Calculus and Analysis,vol. 1, Springer-Verlag, New York, 1989.

[7] R. Courant and H. Robbins, What is Mathematics?, Oxford Uni-versity Press, New York, 1969.

[8] P. Davis and R. Hersh, The Mathematical Experience, HoughtonMifflin, New York, 1998.

[9] J. Dennis and R. Schnabel, Numerical Methods for UnconstrainedOptimization and Nonlinear Equations, Prentice-Hall, New Jersey,1983.

416 References

[10] K. Eriksson, D. Estep, P. Hansbo, and C. Johnson, Computa-tional Differential Equations, Cambridge University Press, New York,1996.

[11] I. Grattan-Guiness, The Norton History of the Mathematical Sci-ences, W.W. Norton and Company, New York, 1997.

[12] P. Henrici, Discrete Variable Methods in Ordinary Differential Equa-tions, John Wiley and Sons, New York, 1962.

[13] E. Isaacson and H. Keller, Analysis of Numerical Methods, JohnWiley and Sons, New York, 1966.

[14] M. Kline, Mathematical Thought from Ancient to Modern Times,vol. I, II, III, Oxford University Press, New York, 1972.

[15] J. O’Connor and E. Robertson, The MacTutor His-tory of Mathematics Archive, School of Mathematics andStatistics, University of Saint Andrews, Scotland, 2001.http://www-groups.dcs.st-and.ac.uk/∼history/.

[16] W. Rudin, Principles of Mathematical Analysis, McGraw–Hill BookCompany, New York, 1976.

[17] T. Ypma, Historical development of the Newton-Raphson method,SIAM Review, 37 (1995), pp. 531–551.

Index

K, 1032L2 projection, 746Vh, 1036N, 52Q, 81R, 197δz, 1008ε−N definition of a limit, 170λi, 1037ϕi, 1037τK , 1032h refinement, 1033hK , 1032Nh, 1032Sh, 1032Th, 1032

a posteriori error estimate, 668, 765,767, 828, 1059

a priori error estimate, 661, 667, 764,766

Abacus, 4absolutely convergent series, 548acceleration, 402action integral, 707adaptive

algorithm, 768, 1059

error control, 768, 1059adaptive error control, 825advancing front, 1033alternating series, 548Ampere’s law, 962, 1003analytic function, 1102analytic geometry, 265angle, 316arc length, 893Archimedes’ principle, 971, 976arclength, 899area, 443, 916

of triangle, 290automatic step-size control, 826automatized computation, 3automatized production, 3

Babbage, 4Babbage-Scheutz Difference Ma-

chine, 4backward Euler, 578bandwidth, 655barbers paradox, 226basis, 297, 609Bernouilli’s law, 1133bisection algorithm, 188, 215block structure, 1052

418 Index

Bolzano, 216Bolzano’s theorem, 216boundary condition

Dirichlet, 989essential, 763, 1064natural, 763, 1065Neumann, 989Robin, 763, 989

boundary value problem, 755buoyancy force, 977butterfly effect, 843

Cantor, 230capacitor, 725Cauchy sequence, 192, 196Cauchy’s inequality, 598

weighted, 765Cauchy’s representation formula,

1120Cauchy’s theorem, 1119Cauchy-Riemann equations, 1103,

1104Cauchy-Schwarz inequality, 598center of mass, 971centripetal acceleration, 890change of basis, 633change of order of integration, 911change of variable, 937chaos, 843charge, 725, 1003, 1099children, 1033Cholesky’s method, 654circle of curvature, 901compact factorization, 654complex integral, 1115, 1116complex number, 345complex plane, 345computational cost, 219computer representation of num-

bers, 180condition number, 660conformal mapping, 1110conjugate harmonic function, 1108conservation

heat, 988conservation of energy, 699conservation of mass, 993conservative system, 699constitutive equation, 989

constitutive law, 993constructivist, 227continuous Galerkin cG(1), 578continuous Galerkin method cG(1),

826contour plot, 809contraction mapping, 246Contraction mapping theorem, 800coordinate system, 267Coriolis

acceleration, 890force, 887

Coulomb’s law, 1003Cramer’s formula, 624crash model, 718current, 725curvature, 900curve, 784curve integral, 893

dashpot, 710de Moivres formula, 514deca-section algorithm, 193decimal expansions

non-periodic, 76periodic, 76

delta function, 1008derivative, 357

of xn, 362chain rule, 378computation of, 367definition, 360inverse function, 385linear combination rule, 376one-sided, 381quotient rule, 379

Descartes, 101determinant, 617diagonal matrix, 647diagonally dominant, 678diameter of a triangle, 1032dielectric constant, 1003, 1099difference quotient, 366differentiability

uniform, 370differentiable, 369, 788, 1025differentiation under the integral

sign, 806dinner Soup model, 25

Index 419

directional derivative, 797divergence, 879, 880Divergence theorem, 946, 955domain, 104double integral, 905

Einstein, 411Einstein’s law of motion, 411elastic membrane, 994elastic string, 731electric circuit, 725electric conductivity, 1003, 1099electric current, 1003, 1099electric displacement, 1003, 1099electric field, 1003, 1004, 1099electric permittivity, 1003electrical circuit, 725electrostatics, 1004element, 1032

basis function, 1037stiffness matrix, 1054

elementary functions, 517elementary row operations, 648elliptic, 991energy, 1055energy norm, 1056Engdahl, 375ENIAC, 4equidistribution of error, 483, 768,

1063error representation formula, 741,

767, 1040, 1058essential boundary condition, 763,

1064Euclid, 87Euclidean norm, 275Euclidean plane, 99, 267Euler equations, 1002Eulerian description, 997existence of minimum point, 868

Faraday’s law, 1003fill-in, 657five-point formula, 996fixed point, 245floating point arithmetic, 180flow in a corner, 1131force field, 897formalist school, 224

forward Euler, 578Fourier, 407Fourier’s law, 408, 989Fredholm, 1011Fredholm integral equation, 1011front, 1033function, 103

ax, 497polynomial, 119

function y = xr, 241functions

combinations of, 141rational, 143several variables, 163

fundamental solution, 1009Fundamental Theorem of Calculus,

428, 440fundamental theorem of linear alge-

bra, 632

Galileo, 402Gauss, 101Gauss transformation, 649Gauss’ theorem, 943, 946, 953, 955Gauss-Seidel method, 666geometrically orthogonal, 280global basis function, 1037global stiffness matrix, 1054GPS, 11, 94GPS navigator, 269gradient, 791, 880, 883gradient field, 898Gram-Schmidt procedure, 629gravitational field, 1007greatest lower bound, 874Green’s formula, 943, 946, 953, 955Gulf Stream, 891Gustafsson, Lars, 195

hanging chain, 510hat function, 743heat

capacity coefficient, 988conduction, 987conductivity, 989flux, 988source, 988

heat equation, 990Hilbert, 1011

420 Index

Hooke, 405Hooke’s law, 405

identity matrix, 305ill-conditioned matrix, 660implicit differentiation, 386Implicit Function theorem, 804, 811,

813income tax, 354incompressible, 999independent variable, 105induction, 728

mutual, 728inductor, 725infinite decimal expansion, 195, 582initial value problem

general, 571scalar autonomous, 555second order, 577separable scalar, 563

integer, 47computer representation of, 59division with remainder, 57

integraladditivity over subintervals,

450change of variables, 455linearity, 452monotonicity, 453

integral equation, 1011integration by parts, 457, 946interior minimum point, 869intermediate value theorem, 216intuitionist, 227invariance

orthogonal transformation, 340inverse

of matrix, 336Inverse Function theorem, 804inverse matrix, 625inversion, 1112irrotational, 968irrotational flow, 1131isobars, 887isotropic, 883isotropy, 1032iterated integration, 934iterated one-dimensional integra-

tion, 911

iteration matrix, 664iterative method, 657

Jacobi method, 666Jacobian, 788, 1025Jacquard, 4

Kirchhoff’s laws, 727Kronecker, 230

Lagrange, 693Lagrangian description, 998Laplace, 1007Laplacian, 879, 881, 884

polar coordinates, 881, 1028spherical coordinates, 885

Laurent series, 1124LCR-circuit, 725least squares method, 634Leibniz, 104, 428Leibniz’ teen-age dream, 41level curve, 658, 809level surface, 812liars paradox, 226limit, 177

computation of, 177line, 323line integral, 893, 898linear combination, 277, 599linear convergence, 661linear function, 611linear independence, 297, 601, 682linear mapping, 299linear oscillator, 712

damped, 713linear transformation, 338, 612linearization, 791linearly independent, 337Lipschitz continuity, 149, 205

boundedness, 159composition of functions, 161generalization, 243linear combinations, 157linear function, 150monomials, 156product of functions, 160quotient of functions, 160

Lipschitz continuous, 786, 1025

Index 421

Lipschitz continuous functionconverging sequence, 175

load vector, 759, 1048, 1052logarithm, 469logicists, 224logistic equation, 558long division, 77Lorenz, 843Lorenz system, 844lower triangular matrix, 648lumped mass quadrature, 1043

Mobius transformation, 1112magnetic field, 885, 1003, 1099magnetic flux, 1003, 1099magnetic permeability, 1003, 1099magnetostatics, 1006Malthus, 410marginal cost, 354mass conservation, 998mass-spring system, 695mass-spring-dashpot systems, 709matrix, 300, 333, 612

factorization, 649ill-conditioned, 660multiplication, 613, 683

matrix addition, 303matrix multiplication, 303Maxwell, 1003Maxwell’s equations, 1003Mean Value theorem, 793medical tomography, 12mesh, 743

isotropy, 1032mesh function, 483, 743, 1032mesh modification criterion, 768minimization method, 658minimization problem, 658, 1055minimum point, 866minimum value, 866model

crash, 718infection, 568marriage crisis, 722national economy, 569population, 722spread of infection, 722stock market, 722symbiosis, 722

transition to turbulence, 722moment of inertia, 929, 940muddy yard model, 28multi-grid method, 1055multiplication by scalar, 599

N-body system, 705natural boundary condition, 763,

1065natural logarithm, 469natural number, 47Navier-Stokes equations, 1002navigator, 269Newton, 981Newton’s Inverse Square Law, 981Newton’s Law of gravitation, 1009Newton’s Law of motion, 402Newton’s method, 391, 805nightmare, 981nodal basis function, 743non-Euclidean geometry, 101non-periodic decimal expansion, 196norm, 275

energy, 1056norm of a symmetric matrix, 616,

644numerical quadrature, 476

Ohm’s law, 1003optimal mesh, 1060optimization, 865ordered n-tuples, 596, 682ordered pair, 271orthogonal, 315orthogonal complement, 628orthogonal decomposition, 282, 628orthogonal matrix, 338, 630orthogonal projection, 746orthogonalization, 629

parallellines, 294

parallelogram law, 273parametrization, 785parents, 1033partial derivative, 388partial derivatives of second order,

798partial fractions, 523

422 Index

partial pivoting, 653particle paths, 999partition, 743Peano axiom system, 229pendulum

double, 699fixed support, 696moving support, 697

periodic decimal expansion, 196permutation, 617pivoting, 652plane, 324Poincare inequality, 1018point mass, 1008Poisson’s equation, 991, 1046, 1062

minimization problem, 1055variational formulation, 1047

Poisson’s equation on a square, 1049polar coordinates, 919polar representation, 276polynomial, 119

coefficients, 119positive series, 545positive-definite, 760potential, 898potential field, 967potential flow, 999, 1131potential theory, 1130power series, 1123precision, 1043prime number, 58principle of equidistribution, 1060principle of least action, 693projection, 281, 302, 316

onto a subspace, 626point onto a line, 294point onto a plane, 328

Pythagoras, 87Pythagoras’ theorem, 87

QR-decomposition, 631quadrature, 429

adaptive, 482endpoint rule, 480lumped mass, 1043midpoint rule, 480trapezoidal rule, 480

quadrature error, 478quarternions, 346

radius of curvature, 901range, 104rate of change, 353rate of convergence, 661rational number, 71Rayleigh quotient, 1012Reagan, 943real number, 197

absolute value, 200addition, 197Cauchy sequence, 203, 582comparison, 201division, 200multiplication, 200

reference triangle, 1050refinement strategy, 1060residual error, 668, 767, 1058residue calculus, 1126Residue Theorem, 1127resistor, 725Riemann sum, 916, 936rigid transformations, 883Robin boundary conditions, 763rocket propulsion, 408rotation, 285, 879, 881

scalar product, 315, 597search direction, 658separable scalar initial value prob-

lem, 563sequence, 165

limit of, 165series, 544Slide Rule, 4socket wrench, 167solid of revolution, 939sorting, 866space capsule, 977sparse matrix, 657, 760sparsity pattern of a matrix, 1052spectral radius, 664spectral theorem for symmetric ma-

trices, 639spherical coordinates, 885, 937spinning tennis ball, 1132splitting a matrix, 663square domain, 1049squareroot of two, 185

Index 423

stabilityof motion, 701

stability factor, 825stability of floating body, 977standard basis, 600steepest ascent, 795steepest descent, 795, 872steepest descent method, 658step length, 658stiffness matrix, 759, 1048, 1052Stoke’s theorem, 959, 964Stokes, 961stopping criterion, 398, 768straight line, 292streamlines, 999, 1131string theory, 731strong boundary condition, 763subtractive cancellation, 670support, 1038surface, 786surface area, 923surface integral, 923, 928, 1029surface of revolution, 926surveyor, 269Svensson’s formula, 996, 1095symmetric matrix, 760system of linear equations, 295, 330

tangent plane, 791Taylor’s formula, 1118, 1122Taylor’s theorem, 461, 800temperature, 988tent functions, 1037test of linear independence, 622total energy, 1055transpose, 615transpose of a matrix, 305

triangular domain, 1070triangulation, 1032

boundary nodes, 1062internal nodes, 1032, 1062

trigonometric functions, 502triple integral, 933triple product, 321Turing, 222two-body, 700two-point boundary value problem,

755

union jack triangulation, 1070upper triangular matrix, 648

variable, 104variation of constants, 530Vasa, 971vector, 271vector addition, 272vector product, 287, 317vector space R

n, 596Verhulst, 558voltage, 725Volterra-Lotka’s predator-prey

model, 566volume, 935

parallelepiped, 320volume under graph, 912

wave equation, 1012weather prediction, 13weight, 765weighted L2 norm, 765weighted Cauchy’s inequality, 765Winnie-the-Pooh, 165

zero pivoting, 652

Documents

Applied Mathematics: Body & Soul Vol I–III