Numerical Methods Fall 2010

Lecturer: conf. dr. Viorel Bostan
Office: 6-417
Telephone: 50-99-38
E-mail address: viorel [email protected]
Course web page: moodle.fcim.utm.md
Office hours: TBA. I will also be available at other times. Just drop by my office, talk to me after the class or send me an e-mail to make an appointment.

Prerequisites: A basic course on mathematical analysis (single and multivariable calculus), ordinary differential equations and some knowledge of computer programming.

Course outline: This is a fast-paced course. This course gives an in-depth introduction to the basic areas of numerical analysis. The main objective will be to have a clear understanding of the ideas and techniques underlying the numerical methods, results, and algorithms that will be presented, where error analysis plays an important role. You will then be able to use this knowledge to analyze the numerical methods and algorithms that you will encounter, and also to program them effectively on a computer. This knowledge will be useful in your future not only to solve problems with a numerical component, but also to develop numerical algorithms of your own.

Topics to be covered:

1. Computer representation of numbers. Errors: types, sources, propagation.
2. Solution of nonlinear equations. Rootfinding.
3. Interpolation by polynomials and spline functions.
4. Approximation of functions.
5. Numerical integration. Automatic differentiation.
6. Matrix computations and systems of linear equations.
7. Numerical methods for ODEs.

This course plan may be modified during the semester. Such modifications will be announced in advance during class period. The student is responsible for keeping abreast of such changes.

Class procedure: The majority of each class period will be lecture oriented. Some material will be handed out during lectures, some material will be sent by e-mail. I strongly advise you to attend lectures, do your homework, work consistently, and ask questions. Lecture time is at a premium; you cannot be taught everything in class. It is your responsibility to learn the material; the instructor's job is to guide you in your learning.

During the semester, 10 homeworks and 4 programming projects will be assigned. As a general rule, you will find it necessary to spend approximately 2-3 hours of study for each lecture/lab meeting, and additional time will be needed for exam preparation. It is strongly advised that you start working on this course from the very beginning. The importance of doing the assigned homeworks and projects cannot be overemphasized.

Programming projects: The predominant programming languages used in numerical analysis are Fortran and MATLAB. We will focus on MATLAB. Programs in other languages are also sometimes acceptable, but no programming assistance will be given in the use of such languages (i.e. C, C++, Java, Pascal). For students unacquainted with MATLAB, the following e-readings are suggested:

1. Ian Cavers, An Introductory Guide to MATLAB, 2nd Edition, Dept. of Computer Science, University of British Columbia, December 1998,
www.cs.ubc.ca/spider/cavers/MatlabGuide/guide.html

2. Paul Fackler, A MATLAB Primer, North Carolina State University,
www4.ncsu.edu/unity/users/p/pfackler/www/MPRIMER.htm

3. MATLAB Tutorials, Dept. of Mathematics, Southern Illinois University at Carbondale,
www.math.siu.edu/matlab/tutorials.html

4. Christian Roessler, MATLAB Basics, University of Melbourne, June 2004,
www.econphd.net/downloads/matlab.pdf

5. Kermit Sigmon, MATLAB Primer, 3rd edition, Dept. of Mathematics, University of Florida,
www.wiwi.uni-frankfurt.de/professoren/krueger/teaching/ws0506/macromh/matlabprimer.pdf

In your project report you should include:

1. The routines you have developed;
2. The results for your test cases in the form of tables, graphs, etc.;
3. Answers to all questions contained in the assignment;
4. Comments.

You should report your results in a way that is easy to read, communicates the problem and the results effectively, and can be reproduced by someone else who has not seen the problem before, but is technically knowledgeable. You should also give any justification or other reasons to believe the correctness of your results and code. Also, give conclusions on how effective your methods and routines appear to be, and report and comment on any "unusual behavior" of your results. Team working is allowed, but you should specify this in your report, as well as the tasks executed by each member of your team.

Grading policy: The final grade will be based on tests and hw/projects, as follows:

1. There will be one 3-hour written exam given after 8 weeks of classes at a time arranged later (presumably at the end of October). This midterm exam will count 25% of the course grade.

2. The final comprehensive exam will be given during the scheduled examination time at the end of the semester, it will cover all material, and it will count 35% of your final grade.

3. HW and lab projects will count 20% of the grade each. Late homeworks and projects are not allowed!

4. You will need a scientific calculator during exams. Sharing of calculators will not be allowed. Make sure you have one.

The exams will be open notes, i.e. you will be allowed to use your class notes and class slides (no other material will be allowed).

Grading for homeworks and lab projects

The HW will be graded on a scale from 0 to 4, with the possibility of getting an extra bonus point at each homework. Grades will be given according to the following guidelines:

0 – no homework turned in;
1 – poor, lousy job;
2 – incomplete job;
3 – good job;
4 – very good job;
+1 – for optional problems and/or an excellent/outstanding solution to one of the problems.

It is very important that you take the examinations at the scheduled times. Alternate exams will be scheduled only for those who have compelling and convincing enough reasons.

Academic misconduct: Any kind of academic misconduct will not be tolerated. If a situation arises where you and your instructor disagree on some matter and cannot resolve the issue, you should see the Dean. However, any problems concerning the course should first be discussed with your instructor.

Readings:

1. Kendall Atkinson, An Introduction to Numerical Analysis, 2nd edition.

2. Cleve Moler, Numerical Computing with MATLAB, http://www.mathworks.com/moler/

3. Bjoerck A., Dahlquist G., Numerical Mathematics and Scientific Computation.

4. Steven E. Pav, Numerical Methods Course Notes, University of California at San Diego, 2005.

5. Mathews J.H., Fink D.K., Numerical Methods Using MATLAB, 1999.

6. Kincaid D., Cheney W., Numerical Analysis, 1991.

7. Goldberg D., What Every Computer Scientist Should Know About Floating-Point Arithmetic, 1991.

8. Hoffman J.D., Numerical Methods for Engineers and Scientists, 2001.

9. Johnston R.L., Numerical Methods: A Software Approach, 1982.

10. Carothers N.L., A Short Course on Approximation Theory, Course notes, Bowling Green State University.

11. George W. Collins, Fundamental Numerical Methods and Data Analysis.

12. Shampine L.F., Allen R.C., Pruess S., Fundamentals of Numerical Computing, 1997.

Also, you should check the university library for available books.


Useful web-sites with on-line literature:

www.math.gatech.edu/~cain/textbooks/onlinebooks.html

www.econphd.net/notes.htm

Definition of Numerical Analysis by Kendall Atkinson, Prof., University of Iowa

Numerical analysis is the area of mathematics and com-puter science that creates, analyzes, and implements al-gorithms for solving numerically the problems of contin-uous mathematics.

Such problems originate generally from real-world appli-cations of algebra, geometry and calculus, and they in-volve variables which vary continuously; these problemsoccur throughout the natural sciences, social sciences,engineering, medicine, and business.

During the past half-century, the growth in power andavailability of digital computers has led to an increas-ing use of realistic mathematical models in science andengineering, and numerical analysis of increasing sophis-tication has been needed to solve these more detailedmathematical models of the world.

With the growth in importance of using computers to carry out numerical procedures in solving mathematical models of the world, an area known as scientific computing or computational science has taken shape during the 1980s and 1990s. This area looks at the use of numerical analysis from a computer science perspective. It is concerned with using the most powerful tools of numerical analysis, computer graphics, symbolic mathematical computations, and graphical user interfaces to make it easier for a user to set up, solve, and interpret complicated mathematical models of the real world.

Definition of Numerical Analysis by Lloyd N. Trefethen, Prof., Cornell University

Here is the wrong answer: Numerical analysis is the study of rounding errors.

Some other wrong or incomplete answers:

Webster's New Collegiate Dictionary: The study of quantitative approximations to the solutions of mathematical problems including consideration of the errors and bounds to the errors involved.

Chambers 20th Century Dictionary: The study of methods of approximation and their accuracy, etc.

The American Heritage Dictionary: The study of approximate solutions to mathematical problems taking into account the extent of possible errors.

The correct answer is: Numerical analysis is the study of algorithms for the problems of continuous mathematics.


NUMERICAL ANALYSIS: This refers to the analysis of mathematical problems by numerical means, especially mathematical problems arising from models based on calculus.

Effective numerical analysis requires several things:

• An understanding of the computational tool being used, be it a calculator or a computer.

• An understanding of the problem to be solved.

• Construction of an algorithm which will solve the given mathematical problem to a given desired accuracy and within the limits of the resources (time, memory, etc.) that are available.

This is a complex undertaking. Numerous people make this their life's work, usually working on only a limited variety of mathematical problems.

Within this course, we attempt to show the spirit of the subject. Most of our time will be taken up with looking at algorithms for solving basic problems such as rootfinding and numerical integration; but we will also look at the structure of computers and the implications of using them in numerical calculations.

We begin by looking at the relationship of numerical analysis to the larger world of science and engineering.

SCIENCE

Traditionally, engineering and science had a two-sided approach to understanding a subject: the theoretical and the experimental. More recently, a third approach has become equally important: the computational.

Traditionally we would build an understanding by building theoretical mathematical models, and we would solve these for special cases. For example, we would study the flow of an incompressible irrotational fluid past a sphere, obtaining some idea of the nature of fluid flow. But more practical situations could seldom be handled by direct means, because the needed equations were too difficult to solve. Thus we also used the experimental approach to obtain better information about the flow of practical fluids. The theory would suggest ideas to be tried in the laboratory, and the experimental results would often suggest directions for a further development of theory.

[Diagram: the three-sided approach to science: Theoretical Science, Experimental Science, Computational Science.]

With the rapid advance in powerful computers, we now can augment the study of fluid flow by directly solving the theoretical models of fluid flow as applied to more practical situations; this area is often referred to as "computational fluid dynamics". At the heart of computational science is numerical analysis; and to effectively carry out a computational science approach to studying a physical problem, we must understand the numerical analysis being used, especially if improvements are to be made to the computational techniques being used.

MATHEMATICAL MODELS

A mathematical model is a mathematical description of a physical situation. By means of studying the model, we hope to understand more about the physical situation. Such a model might be very simple. For example,

$$A = 4\pi R_e^2, \qquad R_e \doteq 6371 \text{ km}$$

is a formula for the surface area of the earth. How accurate is it? First, it assumes the earth is a sphere, which is only an approximation. At the equator, the radius is approximately 6378 km; and at the poles, the radius is approximately 6357 km. Next, there is experimental error in determining the radius; and in addition, the earth is not perfectly smooth. Therefore, there are limits on the accuracy of this model for the surface area of the earth.

AN INFECTIOUS DISEASE MODEL

For rubella measles, we have the following model for the spread of the infection in a population (subject to certain assumptions):

$$\frac{ds}{dt} = -a\,s\,i, \qquad \frac{di}{dt} = a\,s\,i - b\,i, \qquad \frac{dr}{dt} = b\,i$$

In this, s, i, and r refer, respectively, to the proportions of a total population that are susceptible, infectious, and removed (from the susceptible and infectious pool of people). All variables are functions of time t. The constants can be taken as

$$a = \frac{6.8}{11}, \qquad b = \frac{1}{11}$$

The same model works for some other diseases (e.g. flu), with a suitable change of the constants a and b. Again, this is an approximation of reality (and a useful one).
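As an aside (an added illustration, not part of the original slide), this system is easy to explore numerically in MATLAB. The constants are those given above; the time span and the initial proportions s(0) = 0.99, i(0) = 0.01, r(0) = 0 are assumptions for the example:

    % Minimal sketch: integrate the rubella model with ode45.
    a = 6.8/11;  b = 1/11;
    rhs = @(t, y) [ -a*y(1)*y(2);             % ds/dt = -a s i
                     a*y(1)*y(2) - b*y(2);    % di/dt =  a s i - b i
                     b*y(2) ];                % dr/dt =  b i
    [t, y] = ode45(rhs, [0 100], [0.99; 0.01; 0]);
    plot(t, y); legend('s', 'i', 'r'); xlabel('t');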

But it has its limits. Solving a bad model will not give good results, no matter how accurately it is solved; and the person solving this model and using the results must know enough about the formation of the model to be able to correctly interpret the numerical results.

THE LOGISTIC EQUATION

This is the simplest model for population growth. Let N(t) denote the number of individuals in a population (rabbits, people, bacteria, etc.). Then we model its growth by

$$N'(t) = cN(t), \quad t \geq 0, \qquad N(t_0) = N_0$$

The constant c is the growth constant, and it usually must be determined empirically. Over short periods of time, this is often an accurate model for population growth. For example, it accurately models the growth of the US population over the period 1790 to 1860, with c = 0.2975.
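For reference (an added note, not on the original slide), this model has the exact solution, obtained by separation of variables:

$$N(t) = N_0\,e^{c(t - t_0)}$$

so any numerical solution of this equation can be checked directly against the exponential.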

THE PREDATOR-PREY MODEL

Let F(t) denote the number of foxes at time t; and let R(t) denote the number of rabbits at time t. A simple model for these populations is called the Lotka-Volterra predator-prey model:

$$\frac{dR}{dt} = a\,[1 - b\,F(t)]\,R(t), \qquad \frac{dF}{dt} = c\,[-1 + d\,R(t)]\,F(t)$$

with a, b, c, d positive constants. If one looks carefully at this, then one can see how it is built from the logistic equation. In some cases, this is a very useful model and agrees with physical experiments. Of course, we can substitute other interpretations, replacing foxes and rabbits with other predator and prey. The model will fail, however, when there are other populations that affect the first two populations in a significant way.

NEWTON'S SECOND LAW

Newton's second law states that the force acting on an object is directly proportional to the product of its mass and acceleration,

$$F \propto ma$$

With a suitable choice of physical units, we usually write this in its scalar form as

$$F = ma$$

Newton's law of gravitation for a two-body situation, say the earth and an object moving about the earth, is then

$$m\,\frac{d^2 r(t)}{dt^2} = -\frac{G\,m\,m_e}{|r(t)|^2}\cdot\frac{r(t)}{|r(t)|}$$

with r(t) the vector from the center of the earth to the center of the object moving about the earth. The constant G is the gravitational constant, not dependent on the earth; and m and m_e are the masses, respectively, of the object and the earth.

This is an accurate model for many purposes. But what are some physical situations under which it will fail?

When the object is very close to the surface of the earth and does not move far from one spot, we take |r(t)| to be the radius of the earth. We obtain the new model

$$m\,\frac{d^2 r(t)}{dt^2} = -mg\,k$$

with k the unit vector directly upward from the earth's surface at the location of the object. The gravitational constant

$$g \doteq 9.8 \text{ meters/second}^2$$

Again this is a model; it is not physical reality.

The Patriot Missile Failure

On February 25, 1991, during the Gulf War, an American Patriot Missile battery in Dhahran, Saudi Arabia, failed to intercept an incoming Iraqi Scud missile. The Scud struck an American Army barracks and killed 28 soldiers.

A report of the General Accounting Office, GAO/IMTEC-92-26, entitled Patriot Missile Defense: Software Problem Led to System Failure at Dhahran, Saudi Arabia, reported on the cause of the failure.

It turns out that the cause was an inaccurate calculation of the time since boot due to computer arithmetic errors.

Specifically, the time in tenths of a second as measured by the system's internal clock was multiplied by 1/10 to produce the time in seconds. This calculation was performed using a 24 bit fixed point register. In particular, the value 1/10, which has a non-terminating binary expansion, was chopped at 24 bits after the radix point. The small chopping error, when multiplied by the large number giving the time in tenths of a second, led to a significant error. Indeed, the Patriot battery had been up around 100 hours, and an easy calculation shows that the resulting time error due to the magnified chopping error was about 0.34 seconds.

The number 1/10 equals

$$\frac{1}{10} = \frac{1}{2^4} + \frac{1}{2^5} + \frac{1}{2^8} + \frac{1}{2^9} + \frac{1}{2^{12}} + \frac{1}{2^{13}} + \cdots = (0.0001100110011001100110011001100\ldots)_2$$

Now the 24 bit register in the Patriot stored instead

$$(0.00011001100110011001100)_2$$

introducing an error of

$$(0.0000000000000000000000011001100\ldots)_2$$

which, converted to decimal, is

$$(0.000000095)_{10}$$

Multiplying by the number of tenths of a second in 100 hours gives:

$$0.000000095 \times 100 \times 60 \times 60 \times 10 = 0.34$$

A Scud travels at about 1676 meters per second, and so travels more than half a kilometer in this time. This was far enough that the incoming Scud was outside the "range gate" that the Patriot tracked. Ironically, the fact that the bad time calculation had been improved in some parts of the code, but not all, contributed to the problem, since it meant that the inaccuracies did not cancel.
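A quick way to check this arithmetic (an added illustration, not part of the GAO report) is to chop 1/10 to 24 fractional bits in MATLAB and propagate the error over 100 hours:

    % Minimal sketch: reproduce the Patriot clock error.
    chopped = floor(0.1 * 2^24) / 2^24;   % 1/10 chopped at 24 bits
    err     = 0.1 - chopped;              % chopping error, about 9.5e-8
    tenths  = 100 * 60 * 60 * 10;         % tenths of a second in 100 hours
    drift   = err * tenths                % about 0.34 seconds
    range   = drift * 1676                % meters a Scud covers in that time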

The following paragraph is excerpted from the GAO report.

The range gate's prediction of where the Scud will next appear is a function of the Scud's known velocity and the time of the last radar detection. Velocity is a real number that can be expressed as a whole number and a decimal (e.g., 3750.2563...miles per hour). Time is kept continuously by the system's internal clock in tenths of seconds but is expressed as an integer or whole number (e.g., 32, 33, 34...). The longer the system has been running, the larger the number representing time. To predict where the Scud will next appear, both time and velocity must be expressed as real numbers. Because of the way the Patriot computer performs its calculations and the fact that its registers are only 24 bits long, the conversion of time from an integer to a real number cannot be any more precise than 24 bits. This conversion results in a loss of precision causing a less accurate time calculation. The effect of this inaccuracy on the range gate's calculation is directly proportional to the target's velocity and the length of time the system has been running. Consequently, performing the conversion after the Patriot has been running continuously for extended periods causes the range gate to shift away from the center of the target, making it less likely that the target, in this case a Scud, will be successfully intercepted.

CALCULATION OF FUNCTIONS

Using hand calculations, a hand calculator, or a computer, what are the basic operations of which we are capable? In essence, they are addition, subtraction, multiplication, and division (and even this will usually require a truncation of the quotient at some point). In addition, we can make logical decisions, such as deciding which of the following are true for two real numbers a and b:

$$a > b, \qquad a = b, \qquad a < b$$

Furthermore, we can carry out only a finite number of such operations. If we limit ourselves to just addition, subtraction, and multiplication, then in evaluating functions f(x) we are limited to the evaluation of polynomials:

$$p(x) = a_0 + a_1 x + \cdots + a_n x^n$$

In this, n is the degree (provided $a_n \neq 0$) and $\{a_0, \ldots, a_n\}$ are the coefficients of the polynomial. Later we will discuss the efficient evaluation of polynomials; but for now, we ask how we are to evaluate other functions such as $e^x$, $\cos x$, $\log x$, and others.

TAYLOR POLYNOMIAL APPROXIMATIONS

We begin with an example, that of $f(x) = e^x$ from the text. Consider evaluating it for x near to 0. We look for a polynomial p(x) whose values will be the same as those of $e^x$ to within acceptable accuracy.

Begin with a linear polynomial $p(x) = a_0 + a_1 x$. Then to make its graph look like that of $e^x$, we ask that the graph of y = p(x) be tangent to that of $y = e^x$ at x = 0. Doing so leads to the formula

$$p(x) = 1 + x$$

Continue in this manner, looking next for a quadratic polynomial

$$p(x) = a_0 + a_1 x + a_2 x^2$$

We again make it tangent; and to determine $a_2$, we also ask that p(x) and $e^x$ have the same "curvature" at the origin. Combining these requirements, we have for $f(x) = e^x$ that

$$p(0) = f(0), \quad p'(0) = f'(0), \quad p''(0) = f''(0)$$

This yields the approximation

$$p(x) = 1 + x + \tfrac{1}{2}x^2$$
Page 55: Analiza Numerica [Utm, Bostan v.]
We continue this pattern, looking for a polynomial

$$p(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n$$

We now require that

$$p(0) = f(0), \quad p'(0) = f'(0), \quad \ldots, \quad p^{(n)}(0) = f^{(n)}(0)$$

This leads to the formula

$$p(x) = 1 + x + \tfrac{1}{2}x^2 + \cdots + \frac{1}{n!}x^n$$

What are the problems when evaluating points x that are far from 0?

TAYLOR'S APPROXIMATION FORMULA

Let f(x) be a given function, and assume it has derivatives around some point x = a (with as many derivatives as we find necessary). We seek a polynomial p(x) of degree at most n, for some non-negative integer n, which will approximate f(x) by satisfying the following conditions:

$$p(a) = f(a), \quad p'(a) = f'(a), \quad p''(a) = f''(a), \quad \ldots, \quad p^{(n)}(a) = f^{(n)}(a)$$

The general formula for this polynomial is

$$p_n(x) = f(a) + (x-a)f'(a) + \frac{1}{2!}(x-a)^2 f''(a) + \cdots + \frac{1}{n!}(x-a)^n f^{(n)}(a)$$
Then f(x) ≈ pn(x) for x close to a.

TAYLOR POLYNOMIALS FOR f(x) = log x

In this case, we expand about the point x = 1, making the polynomial tangent to the graph of f(x) = log x at the point x = 1. For a general degree n ≥ 1, this results in the polynomial

$$p_n(x) = (x-1) - \tfrac{1}{2}(x-1)^2 + \tfrac{1}{3}(x-1)^3 - \cdots + (-1)^{n-1}\tfrac{1}{n}(x-1)^n$$

Note the graphs of these polynomials for varying n.

THE TAYLOR POLYNOMIAL ERROR FORMULA

Let f(x) be a given function, and assume it has derivatives around some point x = a (with as many derivatives as we find necessary). For the error in the Taylor polynomial $p_n(x)$, we have the formulas

$$f(x) - p_n(x) = \frac{1}{(n+1)!}(x-a)^{n+1} f^{(n+1)}(c_x) = \frac{1}{n!}\int_a^x (x-t)^n f^{(n+1)}(t)\,dt$$

The point $c_x$ is restricted to the interval bounded by x and a, and otherwise $c_x$ is unknown. We will use the first form of this error formula, although the second is more precise in that you do not need to deal with the unknown point $c_x$.

Consider the special case of n = 0. Then the Taylor polynomial is the constant function:

$$f(x) \approx p_0(x) = f(a)$$

The first form of the error formula becomes

$$f(x) - p_0(x) = f(x) - f(a) = (x-a)\,f'(c_x)$$

with $c_x$ between a and x. You have seen this in your beginning calculus course, and it is called the mean-value theorem. The error formula

$$f(x) - p_n(x) = \frac{1}{(n+1)!}(x-a)^{n+1} f^{(n+1)}(c_x)$$

can be considered a generalization of the mean-value theorem.

EXAMPLE: $f(x) = e^x$

For general n ≥ 0, and expanding $e^x$ about x = 0, we have that the degree n Taylor polynomial approximation is given by

$$p_n(x) = 1 + x + \frac{1}{2!}x^2 + \frac{1}{3!}x^3 + \cdots + \frac{1}{n!}x^n$$

For the derivatives of $f(x) = e^x$, we have

$$f^{(k)}(x) = e^x, \qquad f^{(k)}(0) = 1, \qquad k = 0, 1, 2, \ldots$$

For the error,

$$e^x - p_n(x) = \frac{1}{(n+1)!}x^{n+1}e^{c_x}$$

with $c_x$ located between 0 and x. Note that for $x \approx 0$, we must have $c_x \approx 0$ and

$$e^x - p_n(x) \approx \frac{1}{(n+1)!}x^{n+1}$$

This last term is also the final term in $p_{n+1}(x)$, and thus

$$e^x - p_n(x) \approx p_{n+1}(x) - p_n(x)$$

Consider calculating an approximation to e. Then let x = 1 in the earlier formulas to get

$$p_n(1) = 1 + 1 + \frac{1}{2!} + \frac{1}{3!} + \cdots + \frac{1}{n!}$$

For the error,

$$e - p_n(1) = \frac{1}{(n+1)!}e^{c_x}, \qquad 0 \leq c_x \leq 1$$

To bound the error, we have $e^0 \leq e^{c_x} \leq e^1$, so

$$\frac{1}{(n+1)!} \leq e - p_n(1) \leq \frac{e}{(n+1)!}$$

To have an approximation accurate to within $10^{-5}$, we choose n large enough to have

$$\frac{e}{(n+1)!} \leq 10^{-5}$$

which is true if n ≥ 8. In fact,

$$e - p_8(1) \leq \frac{e}{9!} \doteq 7.5 \times 10^{-6}$$

Then calculate $p_8(1) \doteq 2.71827877$, and $e - p_8(1) \doteq 3.06 \times 10^{-6}$.
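A one-line MATLAB check of this calculation (added here as an illustration):

    % Minimal sketch: p_8(1) = sum of 1/k! for k = 0..8, compared with exp(1).
    p8  = sum(1 ./ factorial(0:8));   % 2.71827877...
    err = exp(1) - p8                 % about 3.06e-6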

FORMULAS OF STANDARD FUNCTIONS

$$\frac{1}{1-x} = 1 + x + x^2 + \cdots + x^n + \frac{x^{n+1}}{1-x}$$

$$\cos x = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \cdots + (-1)^m\frac{x^{2m}}{(2m)!} + (-1)^{m+1}\frac{x^{2m+2}}{(2m+2)!}\cos c_x$$

$$\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots + (-1)^{m-1}\frac{x^{2m-1}}{(2m-1)!} + (-1)^m\frac{x^{2m+1}}{(2m+1)!}\cos c_x$$

with $c_x$ between 0 and x.

OBTAINING TAYLOR FORMULAS

Most Taylor polynomials have been found by means other than using the formula

$$p_n(x) = f(a) + (x-a)f'(a) + \frac{1}{2!}(x-a)^2 f''(a) + \cdots + \frac{1}{n!}(x-a)^n f^{(n)}(a)$$

because of the difficulty of obtaining the derivatives $f^{(k)}(x)$ for larger values of k. Actually, this is now much easier, as we can use Maple or Mathematica. Nonetheless, most formulas have been obtained by manipulating standard formulas; and examples of this are given in the text.
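(As an added aside, MATLAB can also produce such expansions directly if the Symbolic Math Toolbox is available; the name-value syntax below is that of newer releases:)

    % Minimal sketch: Taylor expansions via the Symbolic Math Toolbox.
    syms x
    taylor(exp(x), x, 'Order', 5)                       % 1 + x + x^2/2 + x^3/6 + x^4/24
    taylor(log(x), x, 'ExpansionPoint', 1, 'Order', 4)  % (x-1) - (x-1)^2/2 + (x-1)^3/3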

For example, use

$$e^t = 1 + t + \frac{1}{2!}t^2 + \frac{1}{3!}t^3 + \cdots + \frac{1}{n!}t^n + \frac{1}{(n+1)!}t^{n+1}e^{c_t}$$

in which $c_t$ is between 0 and t. Let $t = -x^2$ to obtain

$$e^{-x^2} = 1 - x^2 + \frac{1}{2!}x^4 - \frac{1}{3!}x^6 + \cdots + \frac{(-1)^n}{n!}x^{2n} + \frac{(-1)^{n+1}}{(n+1)!}x^{2n+2}e^{-\xi_x}$$

Because $c_t$ must be between 0 and $-x^2$, it must be negative. Thus we let $c_t = -\xi_x$ in the error term, with $0 \leq \xi_x \leq x^2$.

EVALUATING A POLYNOMIAL

Consider having a polynomial

$$p(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n$$

which you need to evaluate for many values of x. How do you evaluate it? This may seem a strange question, but the answer is not as obvious as you might think.

The standard way, written in a loose algorithmic format:

    poly = a0
    for j = 1:n
        poly = poly + aj * x^j
    end

To compare the costs of different numerical methods, we do an operations count, and then we compare these for the competing methods. Above, the counts are as follows:

additions: $n$
multiplications: $1 + 2 + 3 + \cdots + n = \dfrac{n(n+1)}{2}$

This assumes each term $a_j x^j$ is computed independently of the remaining terms in the polynomial.

Next, compute the terms $x^j$ recursively:

$$x^j = x \cdot x^{j-1}$$

Then computing $x^2, x^3, \ldots, x^n$ will cost $n-1$ multiplications. Our algorithm becomes

    poly = a0 + a1*x
    power = x
    for j = 2:n
        power = x * power
        poly = poly + aj * power
    end

The total operations cost is

additions: $n$
multiplications: $n + (n-1) = 2n - 1$

When n is even moderately large, this is much less than for the first method of evaluating p(x). For example, with n = 20, the first method has 210 multiplications, whereas the second has 39 multiplications.

We now consider nested multiplication. As examples of particular degrees, write

n = 2:  $p(x) = a_0 + x(a_1 + a_2 x)$
n = 3:  $p(x) = a_0 + x(a_1 + x(a_2 + a_3 x))$
n = 4:  $p(x) = a_0 + x(a_1 + x(a_2 + x(a_3 + a_4 x)))$

These contain, respectively, 2, 3, and 4 multiplications. This is less than the preceding method, which would have needed 3, 5, and 7 multiplications, respectively.

For the general case, write

$$p(x) = a_0 + x(a_1 + x(a_2 + \cdots + x(a_{n-1} + a_n x)\cdots))$$

This requires n multiplications, which is only about half that for the preceding method. For an algorithm, write

    poly = an
    for j = n-1:-1:0
        poly = aj + x * poly
    end

With all three methods, the number of additions is n; but the number of multiplications can be dramatically different for large values of n.
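As a concrete MATLAB version of this last algorithm (an added sketch, saved as horner.m; MATLAB vectors are 1-based, so a(j+1) holds the coefficient $a_j$):

    % Minimal sketch: Horner's rule for p(x) = a0 + a1*x + ... + an*x^n,
    % with coefficient vector a = [a0 a1 ... an].
    function p = horner(a, x)
        n = length(a) - 1;       % polynomial degree
        p = a(n+1);              % start from the leading coefficient
        for j = n-1:-1:0
            p = a(j+1) + x .* p; % one multiply and one add per coefficient
        end
    end

For example, horner([1 1 1/2], 0.1) evaluates $1 + x + \tfrac{1}{2}x^2$ at x = 0.1, giving 1.105.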

NESTED MULTIPLICATION

Imagine we are evaluating the polynomial

$$p(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n$$

at a point x = z. Thus with nested multiplication

$$p(z) = a_0 + z(a_1 + z(a_2 + \cdots + z(a_{n-1} + a_n z)\cdots))$$

We can write this as the following sequence of operations:

$$b_n = a_n, \quad b_{n-1} = a_{n-1} + z b_n, \quad b_{n-2} = a_{n-2} + z b_{n-1}, \quad \ldots, \quad b_0 = a_0 + z b_1$$

The quantities $b_{n-1}, \ldots, b_0$ are simply the quantities in parentheses, starting from the innermost and working outward.

Introduce

$$q(x) = b_1 + b_2 x + b_3 x^2 + \cdots + b_n x^{n-1}$$

Claim:

$$p(x) = b_0 + (x - z)\,q(x) \qquad (*)$$

Proof: Simply expand

$$b_0 + (x - z)\left(b_1 + b_2 x + b_3 x^2 + \cdots + b_n x^{n-1}\right)$$

and use the fact that

$$z b_j = b_{j-1} - a_{j-1}, \qquad j = 1, \ldots, n$$

With this result (*), we have

$$\frac{p(x)}{x - z} = \frac{b_0}{x - z} + q(x)$$

Thus q(x) is the quotient when dividing p(x) by $x - z$, and $b_0$ is the remainder.

If z is a zero of p(x), then $b_0 = 0$; and then

$$p(x) = (x - z)\,q(x)$$

For the remaining roots of p(x), we can concentrate on finding those of q(x). In rootfinding for polynomials, this process of reducing the size of the problem is called deflation.

Another consequence of (*) is the following. Form the derivative of (*) with respect to x, obtaining

$$p'(x) = (x - z)\,q'(x) + q(x), \qquad p'(z) = q(z)$$

Thus to evaluate p(x) and p'(x) simultaneously at x = z, we can use nested multiplication for p(z) and we can use the intermediate steps of this to also evaluate p'(z). This is useful when doing rootfinding problems for polynomials by means of Newton's method.
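A sketch of this double evaluation in MATLAB (an added illustration, saved as horner2.m; the coefficient vector is again assumed to be a = [a0 a1 ... an]):

    % Minimal sketch: evaluate p(z) and p'(z) together by nested multiplication.
    % The b-sequence ends at b0 = p(z); running Horner on the b's gives q(z) = p'(z).
    function [pz, dpz] = horner2(a, z)
        n   = length(a) - 1;
        pz  = a(n+1);    % accumulates b_j, ends at b_0 = p(z)
        dpz = 0;         % accumulates q(z) = p'(z)
        for j = n-1:-1:0
            dpz = pz + z * dpz;     % Horner step on the b-coefficients
            pz  = a(j+1) + z * pz;  % b_j = a_j + z*b_{j+1}
        end
    end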

APPROXIMATING SF(x)

Define

$$SF(x) = \frac{1}{x}\int_0^x \frac{\sin t}{t}\,dt, \qquad x \neq 0$$

We use Taylor polynomials to approximate this function, to obtain a way to compute it with accuracy and simplicity.

[Figure: graph of y = SF(x) for -8 ≤ x ≤ 8; the y-axis ticks shown are 0.5 and 1.0.]

As an example, begin with the degree 3 Taylor approximation to $\sin t$, expanded about t = 0:

$$\sin t = t - \frac{1}{6}t^3 + \frac{1}{120}t^5\cos c_t$$

with $c_t$ between 0 and t. Then

$$\frac{\sin t}{t} = 1 - \frac{1}{6}t^2 + \frac{1}{120}t^4\cos c_t$$

$$\int_0^x \frac{\sin t}{t}\,dt = \int_0^x\left[1 - \frac{1}{6}t^2 + \frac{1}{120}t^4\cos c_t\right]dt = x - \frac{1}{18}x^3 + \frac{1}{120}\int_0^x t^4\cos c_t\,dt$$

$$\frac{1}{x}\int_0^x \frac{\sin t}{t}\,dt = 1 - \frac{1}{18}x^2 + R_2(x), \qquad R_2(x) = \frac{1}{120}\cdot\frac{1}{x}\int_0^x t^4\cos c_t\,dt$$

How large is the error in the approximation

$$SF(x) \approx 1 - \frac{1}{18}x^2$$

on the interval [-1, 1]? Since $|\cos c_t| \leq 1$, we have for x > 0 that

$$0 \leq R_2(x) \leq \frac{1}{120}\cdot\frac{1}{x}\int_0^x t^4\,dt = \frac{1}{600}x^4$$

and the same result can be shown for x < 0. Then for $|x| \leq 1$, we have

$$0 \leq R_2(x) \leq \frac{1}{600}$$

To obtain a more accurate approximation, we can proceed exactly as above, but simply use a higher degree approximation to $\sin t$.
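To see the bound in action (an added sketch; integral is the modern MATLAB quadrature routine, quadgk in older releases):

    % Minimal sketch: check SF(x) ~ 1 - x^2/18 against numerical quadrature.
    SF = @(x) integral(@(t) sin(t)./t, 0, x) / x;   % reference value
    x  = 1;
    err = SF(x) - (1 - x^2/18)   % positive and below 1/600 = 0.00167, as predicted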

BINARY INTEGERS

A binary integer x is a finite sequence of the digits 0 and 1, which we write symbolically as

$$x = (a_m a_{m-1}\cdots a_2 a_1 a_0)_2$$

where I insert the parentheses with subscript $(\;)_2$ in order to make clear that the number is binary. The above has the decimal equivalent

$$x = a_m 2^m + a_{m-1}2^{m-1} + \cdots + a_1 2^1 + a_0$$

For example, the binary integer $x = (110101)_2$ has the decimal value

$$x = 2^5 + 2^4 + 2^2 + 2^0 = 53$$

The binary integer $x = (111\cdots1)_2$ with m ones has the decimal value

$$x = 2^{m-1} + \cdots + 2^1 + 1 = 2^m - 1$$

DECIMAL TO BINARY INTEGER CONVERSION

Given a decimal integer x, we write

$$x = (a_m a_{m-1}\cdots a_2 a_1 a_0)_2 = a_m 2^m + a_{m-1}2^{m-1} + \cdots + a_1 2^1 + a_0$$

Divide x by 2, calling the quotient $x_1$. The remainder is $a_0$, and

$$x_1 = a_m 2^{m-1} + a_{m-1}2^{m-2} + \cdots + a_1 2^0$$

Continue the process. Divide $x_1$ by 2, calling the quotient $x_2$. The remainder is $a_1$, and

$$x_2 = a_m 2^{m-2} + a_{m-1}2^{m-3} + \cdots + a_2 2^0$$

After a finite number of such steps, we will obtain all of the coefficients $a_i$, and the final quotient will be zero.

Try this with a few decimal integers.

EXAMPLE

The following shortened form of the above method is convenient for hand computation. Convert $(11)_{10}$ to binary.

$$\lfloor 11/2 \rfloor = 5 = x_1, \quad a_0 = 1$$
$$\lfloor 5/2 \rfloor = 2 = x_2, \quad a_1 = 1$$
$$\lfloor 2/2 \rfloor = 1 = x_3, \quad a_2 = 0$$
$$\lfloor 1/2 \rfloor = 0 = x_4, \quad a_3 = 1$$

In this, the notation $\lfloor b \rfloor$ denotes the largest integer $\leq b$; so $\lfloor n/2 \rfloor$ is the quotient resulting from dividing n by 2. From the above calculation, $(11)_{10} = (1011)_2$.
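The same repeated-division loop in MATLAB (an added sketch):

    % Minimal sketch: decimal integer -> binary digit vector by repeated division.
    x = 11; bits = [];
    while x > 0
        bits = [rem(x, 2), bits];  % next remainder is the next binary digit
        x = floor(x / 2);          % integer quotient
    end
    bits   % [1 0 1 1], i.e. (1011)_2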

BINARY FRACTIONS

A binary fraction x is a sequence (possibly infinite) of the digits 0 and 1:

$$x = (.a_1 a_2 a_3 \cdots a_m \cdots)_2 = a_1 2^{-1} + a_2 2^{-2} + a_3 2^{-3} + \cdots$$

For example, $x = (.1101)_2$ has the decimal value

$$x = 2^{-1} + 2^{-2} + 2^{-4} = .5 + .25 + .0625 = 0.8125$$

Recall the formula for the geometric series

$$\sum_{i=0}^{n} r^i = \frac{1 - r^{n+1}}{1 - r}, \qquad r \neq 1$$

Letting $n \to \infty$ with $|r| < 1$, we obtain the formula

$$\sum_{i=0}^{\infty} r^i = \frac{1}{1 - r}, \qquad |r| < 1$$

Using this,

$$(.0101010101\cdots)_2 = 2^{-2} + 2^{-4} + 2^{-6} + \cdots = 2^{-2}\left(1 + 2^{-2} + 2^{-4} + \cdots\right)$$

which sums to the fraction 1/3.

Also,

$$(.110011001100\cdots)_2 = 2^{-1} + 2^{-2} + 2^{-5} + 2^{-6} + \cdots$$

and this sums to the decimal fraction $0.8 = \frac{8}{10}$.

DECIMAL TO BINARY FRACTION CONVERSION

In

$$x_1 = (.a_1 a_2 a_3 \cdots a_m \cdots)_2 = a_1 2^{-1} + a_2 2^{-2} + a_3 2^{-3} + \cdots$$

we multiply by 2. The integer part will be $a_1$; and after it is removed we have the binary fraction

$$x_2 = (.a_2 a_3 \cdots a_m \cdots)_2 = a_2 2^{-1} + a_3 2^{-2} + a_4 2^{-3} + \cdots$$

Again multiply by 2, obtaining $a_2$ as the integer part of $2x_2$. After removing $a_2$, let $x_3$ denote the remaining number. Continue this process as far as needed.

For example, with $x = 1/5$, we have

$$x_1 = .2; \quad 2x_1 = .4; \quad x_2 = .4 \text{ and } a_1 = 0$$
$$2x_2 = .8; \quad x_3 = .8 \text{ and } a_2 = 0$$
$$2x_3 = 1.6; \quad x_4 = .6 \text{ and } a_3 = 1$$

Continue this to get the pattern

$$(.2)_{10} = (.00110011001100\cdots)_2$$
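The corresponding doubling loop in MATLAB (an added sketch):

    % Minimal sketch: decimal fraction -> first k binary digits by repeated doubling.
    x = 0.2; k = 12; bits = zeros(1, k);
    for j = 1:k
        x = 2 * x;
        bits(j) = floor(x);   % integer part is the next digit
        x = x - bits(j);      % keep the fractional part
    end
    bits   % 0 0 1 1 0 0 1 1 0 0 1 1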

DECIMAL FLOATING-POINT NUMBERS

Floating point notation is akin to what is called scientific notation. For a nonzero number x, we can write it in the form

$$x = \sigma \cdot \bar{x} \cdot 10^e$$

where $\sigma = \pm 1$, e is an integer, and $1 \leq \bar{x} < 10$. The number $\sigma$ is called the sign, e is the exponent, and $\bar{x}$ is the significand or mantissa.

For example,

$$345.78 = 3.4578 \times 10^2$$

where $\sigma = +1$, $e = 2$, $\bar{x} = 3.4578$.

On a decimal computer or calculator, we store x by instead storing $\sigma$, e and $\bar{x}$. We must restrict the number of digits in $\bar{x}$ and the size of the exponent e. The number of digits in $\bar{x}$ is called the precision.

For example, on an HP-15C calculator, the precision is 10, and the exponent is restricted to $-99 \leq e \leq 99$.

BINARY FLOATING-POINT NUMBERS

We now do something similar with the binary representation of a number x. Write

$$x = \sigma \cdot \bar{x} \cdot 2^e$$

with $1 \leq \bar{x} < (10)_2 = (2)_{10}$ and e an integer.

For example,

$$(0.1)_{10} = (.000110011001100\ldots)_2 = \underbrace{+}_{\sigma=+1}\,\underbrace{(1.10011001100\ldots)_2}_{\bar{x}} \times 2^{\overbrace{-4}^{e}}$$

The number x is stored in the computer by storing the $\sigma$, $\bar{x}$, and e. On all computers, there are restrictions on the number of digits in $\bar{x}$ and the size of e.

FLOATING POINT NUMBERS

When a number x outside a computer or calculator is converted into a machine number, we denote it by fl(x). On an HP calculator,

$$fl(.3333\ldots) = (3.333333333)_{10} \times 10^{-1}$$

The decimal fraction of infinite length will not fit in the registers of the calculator, but the latter 10-digit number will fit. Some calculators actually carry more digits internally than they allow to be displayed. On a binary computer, we use a similar notation.

We will concentrate on a particular form of computer floating point number, that is called the IEEE floating point standard.

Example 1. Consider a binary floating point representation with precision 3, and $e_{min} = -2 \leq e \leq 2 = e_{max}$. All the numbers admitted by this representation are presented in the table:

    x̄ \ e       -2           -1          0          1         2
    (1.00)_2    (0.25)_10    (0.5)_10    (1)_10     (2)_10    (4)_10
    (1.01)_2    (0.3125)_10  (0.625)_10  (1.25)_10  (2.5)_10  (5)_10
    (1.10)_2    (0.375)_10   (0.75)_10   (1.5)_10   (3)_10    (6)_10
    (1.11)_2    (0.4375)_10  (0.875)_10  (1.75)_10  (3.5)_10  (7)_10

[Figure: these values marked on the number line from 0 to 7.]

This representation can be extended to include smaller numbers, called denormalized numbers. These numbers are obtained if $e = e_{min}$ and the first digit of the significand is 0.

Example 2. Previous example plus denormalized numbers:

$$(0.01)_2 \times 2^{-2} = \frac{1}{16} = (0.0625)_{10}$$
$$(0.10)_2 \times 2^{-2} = \frac{2}{16} = (0.125)_{10}$$
$$(0.11)_2 \times 2^{-2} = \frac{3}{16} = (0.1875)_{10}$$

[Figure: the number line from 0 to 7 with the denormalized numbers added near 0.]
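A small MATLAB sketch (added here) that enumerates this toy system, normalized and denormalized numbers together:

    % Minimal sketch: enumerate a toy binary floating point system
    % with precision 3 and exponent range -2..2.
    vals = [];
    for e = -2:2
        for f = 0:3                          % two fraction bits a1 a2
            vals(end+1) = (1 + f/4) * 2^e;   % normalized: (1.a1a2)_2 * 2^e
        end
    end
    denorm = (1:3)/4 * 2^(-2);               % denormalized: (0.a1a2)_2 * 2^emin
    sort([denorm, vals])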

IEEE SINGLE PRECISION STANDARD

In IEEE single precision, 32 bits are used to store numbers. A number is written as

$$x = \sigma \cdot (1.a_1 a_2 \ldots a_{23})_2 \cdot 2^e$$

The significand $\bar{x} = (1.a_1 a_2 \cdots a_{23})_2$ immediately satisfies $1 \leq \bar{x} < 2$.

What are the limits on e? To understand the limits on e and the number of binary digits chosen for $\bar{x}$, we must look roughly at how the number x will be stored in the computer.

Basically, we store $\sigma$ as a single bit, the significand $\bar{x}$ as 24 bits (only 23 need be stored), and the exponent fills out 8 bits, including both negative and positive integers.

Roughly speaking, we have that e must satisfy

$$-(\underbrace{1111111}_{7})_2 \leq e \leq (\underbrace{1111111}_{7})_2$$

or in decimal

$$-127 \leq e \leq 127$$

In actuality, the limits are

$$-126 \leq e \leq 127$$

for reasons related to the storage of 0 and other numbers such as $\pm\infty$. In order to avoid a sign for the exponent, denote $E = e + 127$. Obviously, $1 \leq E \leq 254$, with two additional values: 0 and 255.

The storage layout is

    σ     E            x̄
    b1    b2 ... b9    b10 ... b32

The number x = 0 is stored in the following way: E = 0, $\sigma = 0$, and $b_{10}b_{11}\ldots b_{32} = (00\ldots0)_2$.

The stored exponent field E and the corresponding number x:

    E = (b2...b9)_2         e      x
    (00000000)_2 = 0       -127   ±(0.b10...b32)_2 × 2^(-126)
    (00000001)_2 = 1       -126   ±(1.b10...b32)_2 × 2^(-126)
    (00000010)_2 = 2       -125   ±(1.b10...b32)_2 × 2^(-125)
       ...                  ...    ...
    (01111111)_2 = 127       0    ±(1.b10...b32)_2 × 2^0
    (10000000)_2 = 128       1    ±(1.b10...b32)_2 × 2^1
       ...                  ...    ...
    (11111101)_2 = 253      126   ±(1.b10...b32)_2 × 2^126
    (11111110)_2 = 254      127   ±(1.b10...b32)_2 × 2^127
    (11111111)_2 = 255      128   ±∞ if all b_i = 0, NaN otherwise

IEEE DOUBLE PRECISION STANDARD

$$x = \sigma \cdot (1.a_1 a_2 \ldots a_{52})_2 \cdot 2^e$$

The storage layout is

    σ     E             x̄
    b1    b2 ... b12    b13 ... b64

where E = e + 1023.

    E = (b2...b12)_2           e        x
    (00000000000)_2 = 0       -1023    ±(0.b13...b64)_2 × 2^(-1022)
    (00000000001)_2 = 1       -1022    ±(1.b13...b64)_2 × 2^(-1022)
    (00000000010)_2 = 2       -1021    ±(1.b13...b64)_2 × 2^(-1021)
       ...                     ...      ...
    (01111111111)_2 = 1023      0      ±(1.b13...b64)_2 × 2^0
    (10000000000)_2 = 1024      1      ±(1.b13...b64)_2 × 2^1
       ...                     ...      ...
    (11111111101)_2 = 2045     1022    ±(1.b13...b64)_2 × 2^1022
    (11111111110)_2 = 2046     1023    ±(1.b13...b64)_2 × 2^1023
    (11111111111)_2 = 2047     1024    ±∞ if all b_i = 0, NaN otherwise

What is the connection of the 24 bits in the significand $\bar{x}$ to the number of decimal digits in the storage of a number x in floating point form? One way of answering this is to find the integer M for which

1. $0 < x \leq M$ and x an integer implies $fl(x) = x$; and

2. $fl(M+1) \neq M+1$.

This integer M is at least as big as

$$(\underbrace{11\ldots1}_{24\text{ ones}})_2 = (1.\underbrace{11\ldots1}_{23\text{ ones}})_2 \times 2^{23} = 2^{23} + 2^{22} + \cdots + 2^0 = 2^{24} - 1$$

Also, $2^{24} = (1.00\ldots0)_2 \times 2^{24}$ will be stored exactly. The next integer, $2^{24} + 1$, cannot be stored exactly, since its significand would contain 24 + 1 binary digits:

$$2^{24} + 1 = (1.\underbrace{00\ldots0}_{23\text{ zeros}}1)_2 \times 2^{24}$$

Therefore for single precision $M = 2^{24}$. Any integer less than or equal to M will be stored exactly. So

$$M = 2^{24} = 16777216$$

For the IEEE double precision standard we have

$$M = 2^{53} \approx 9.0 \times 10^{15}$$
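This is easy to see in MATLAB (an added illustration; flintmax requires a newer MATLAB release):

    % Minimal sketch: the largest "safe" integers in single and double precision.
    single(16777216) + single(1) == single(16777216)   % true: 2^24 + 1 rounds back
    2^53 + 1 == 2^53                                   % true in double as well
    flintmax('single')   % 16777216 = 2^24
    flintmax             % 9007199254740992 = 2^53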

THE MACHINE EPSILON

Let y be the smallest number representable in the machine arithmetic that is greater than 1 in the machine. The machine epsilon is $\eta = y - 1$. It is a widely used measure of the accuracy possible in representing numbers in the machine.

The number 1 has the simple floating point representation

$$1 = (1.00\cdots0)_2 \cdot 2^0$$

What is the smallest number that is greater than 1? It is

$$1 + 2^{-23} = (1.0\cdots01)_2 \cdot 2^0 > 1$$

and the machine epsilon in IEEE single precision floating point format is $\eta = 2^{-23} \doteq 1.19 \times 10^{-7}$.
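In MATLAB (an added note), eps returns exactly this quantity:

    % Minimal sketch: machine epsilon in double and single precision.
    eps            % 2^-52, about 2.22e-16 (double precision)
    eps('single')  % 2^-23, about 1.19e-7  (single precision)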

THE UNIT ROUND

Consider the smallest number $\delta > 0$ that is representable in the machine and for which

$$1 + \delta > 1$$

in the arithmetic of the machine.

For any number $0 < \alpha < \delta$, the result of $1 + \alpha$ is exactly 1 in the machine's arithmetic. Thus $\alpha$ 'drops off the end' of the floating point representation in the machine. The size of $\delta$ is another way of describing the accuracy attainable in the floating point representation of the machine. The machine epsilon has been replacing it in recent years.

It is not too difficult to derive $\delta$. The number 1 has the simple floating point representation

$$1 = (1.00\cdots0)_2 \cdot 2^0$$

What is the smallest number which can be added to this without disappearing? Certainly we can write

$$1 + 2^{-23} = (1.0\cdots01)_2 \cdot 2^0 > 1$$

Past this point, we need to know whether we are using chopped arithmetic or rounded arithmetic. We will shortly look at both of these. With chopped arithmetic, $\delta = 2^{-23}$; and with rounded arithmetic, $\delta = 2^{-24}$.

ROUNDING AND CHOPPING

Let us first consider these concepts with decimal arithmetic. We write a computer floating point number z as

$$z = \sigma \cdot \zeta \cdot 10^e \equiv \sigma \cdot (a_1.a_2\cdots a_n)_{10} \cdot 10^e$$

with $a_1 \neq 0$, so that there are n decimal digits in the significand $(a_1.a_2\cdots a_n)_{10}$.

Given a general number

$$x = \sigma \cdot (a_1.a_2\cdots a_n \cdots)_{10} \cdot 10^e, \qquad a_1 \neq 0$$

we must shorten it to fit within the computer. This is done by either chopping or rounding. The floating point chopped version of x is given by

$$fl(x) = \sigma \cdot (a_1.a_2\cdots a_n)_{10} \cdot 10^e$$

where we assume that e fits within the bounds required by the computer or calculator.

For the rounded version, we must decide whether to round up or round down. A simplified formula is

$$fl(x) = \begin{cases} \sigma \cdot (a_1.a_2\cdots a_n)_{10} \cdot 10^e, & a_{n+1} < 5 \\ \sigma \cdot \left[(a_1.a_2\cdots a_n)_{10} + (0.0\cdots1)_{10}\right] \cdot 10^e, & a_{n+1} \geq 5 \end{cases}$$

The term $(0.0\cdots1)_{10}$ denotes $10^{-n+1}$, giving the ordinary sense of rounding with which you are familiar. In the single case

$$(0.0\cdots0\,a_{n+1}a_{n+2}\cdots)_{10} = (0.0\cdots0500\cdots)_{10}$$

a more elaborate procedure is used so as to assure an unbiased rounding.

CHOPPING/ROUNDING IN BINARY

Let

$$x = \sigma \cdot (1.a_2\cdots a_n \cdots)_2 \cdot 2^e$$

with all $a_i$ equal to 0 or 1. Then for a chopped floating point representation, we have

$$fl(x) = \sigma \cdot (1.a_2\cdots a_n)_2 \cdot 2^e$$

For a rounded floating point representation, we have

$$fl(x) = \begin{cases} \sigma \cdot (1.a_2\cdots a_n)_2 \cdot 2^e, & a_{n+1} = 0 \\ \sigma \cdot \left[(1.a_2\cdots a_n)_2 + (0.0\cdots1)_2\right] \cdot 2^e, & a_{n+1} = 1 \end{cases}$$

ERRORS

The error $x - fl(x) = 0$ when x needs no change to be put into the computer or calculator. Of more interest is the case when the error is nonzero. Consider first the case x > 0 (meaning $\sigma = +1$). The case with x < 0 is the same, except for the sign being opposite.

With $x \neq fl(x)$, and using chopping, we have

$$fl(x) < x$$

and the error $x - fl(x)$ is always positive. This later has major consequences in extended numerical computations. With $x \neq fl(x)$ and rounding, the error $x - fl(x)$ is negative for half the values of x, and it is positive for the other half of possible values of x.

We often write the relative error as

$$\frac{x - fl(x)}{x} = -\varepsilon$$

This can be expanded to obtain

$$fl(x) = (1 + \varepsilon)\,x$$

Thus fl(x) can be considered as a perturbed value of x. This is used in many analyses of the effects of chopping and rounding errors in numerical computations.

For bounds on $\varepsilon$, we have

$$-2^{-n} \leq \varepsilon \leq 2^{-n} \quad \text{(rounding)}, \qquad -2^{-n+1} \leq \varepsilon \leq 0 \quad \text{(chopping)}$$

IEEE ARITHMETIC

We are only giving the minimal characteristics of IEEE arithmetic. There are many options available on the types of arithmetic and the chopping/rounding. The default arithmetic uses rounding.

Single precision arithmetic:

$$n = 24, \qquad -126 \leq e \leq 127$$

This results in

$$M = 2^{24} = 16777216, \qquad \eta = 2^{-23} \doteq 1.19 \times 10^{-7}$$

Double precision arithmetic:

$$n = 53, \qquad -1022 \leq e \leq 1023$$

What are M and $\eta$?

There is also an extended representation, having n = 69 digits in its significand.

MATLAB can be used to generate the binary floating point representation of a number.

Execute in MATLAB the command:

    format hex

This will cause all subsequent numerical output to the screen to be given in hexadecimal format (base 16). For example, listing the number 7.125 results in an output of

    401c800000000000

The 16 hexadecimal digits are

    {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f}

To obtain the binary representation, convert each hexadecimal digit to a four digit binary number according to the table below.

    hex  binary      hex  binary
    0    0000        8    1000
    1    0001        9    1001
    2    0010        a    1010
    3    0011        b    1011
    4    0100        c    1100
    5    0101        d    1101
    6    0110        e    1110
    7    0111        f    1111

For the above number, we obtain the binary expansion

    4    0    1    c    8    0    ...  0
    0100 0000 0001 1100 1000 0000 ... 0000

Grouping these bits as sign, exponent, and significand:

$$\underbrace{0}_{\sigma}\ \underbrace{10000000001}_{E}\ \underbrace{1100100000000000\ldots0000}_{1.b_{13}b_{14}\ldots b_{64}\,=\,\bar{x}}$$

which provides us with the IEEE double precision representation of 7.125.
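An alternative way to get the bits directly (an added illustration; typecast and dec2bin are standard MATLAB functions, though exact uint64 support in dec2bin requires a recent release):

    % Minimal sketch: show the 64-bit pattern of 7.125 directly.
    bits = dec2bin(typecast(7.125, 'uint64'), 64)
    % sign = bits(1), exponent = bits(2:12), significand = bits(13:64)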

SOME DEFINITIONS

Let $x_T$ denote the true value of some number, usually unknown in practice; and let $x_A$ denote an approximation of $x_T$.

The error in $x_A$ is

$$error(x_A) = x_T - x_A$$

The relative error in $x_A$ is

$$rel(x_A) = \frac{error(x_A)}{x_T} = \frac{x_T - x_A}{x_T}$$

Example: $x_T = e$, $x_A = \frac{19}{7}$. Then

$$error(x_A) = e - \frac{19}{7} = 0.003996, \qquad rel(x_A) = \frac{0.003996}{e} = 0.00147$$
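Checking in MATLAB (added):

    % Minimal sketch: error and relative error of 19/7 as an approximation to e.
    xT  = exp(1);  xA = 19/7;
    err = xT - xA      % about 0.003996
    rel = err / xT     % about 0.00147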

Relative error is more exact in representing the difference between the true value and the approximated one.

Example: Suppose the distance between two cities is $D_T = 100$ km and let this distance be approximated with $D_A = 99$ km. In this case,

$$Err(D_A) = D_T - D_A = 1 \text{ km}, \qquad Rel(D_A) = \frac{Err(D_A)}{D_T} = 0.01 = 1\%$$

Now, suppose that distance is $d_T = 2$ km and estimate it with $d_A = 1$ km. Then

$$Err(d_A) = d_T - d_A = 1 \text{ km}, \qquad Rel(d_A) = \frac{Err(d_A)}{d_T} = 0.5 = 50\%$$

In both cases the error is the same. But, obviously, $D_A$ is a better approximation of $D_T$ than $d_A$ is of $d_T$.

Numerical Analysis
conf. dr. Bostan Viorel
Fall 2010, Lecture 3

Sources of Error

The sources of error in the computation of the solution of a mathematical model for some physical situation can be roughly characterised as follows:

1. Modelling Error.
Consider the example of a projectile of mass m that is travelling through the earth's atmosphere. A simple and often used description of projectile motion is given by

$$m\,\frac{d^2\vec{r}}{dt^2}(t) = -mg\,\vec{k} - b\,\frac{d\vec{r}}{dt}$$

with $b \geq 0$. In this, $\vec{r}(t)$ is the vector position of the projectile; and the final term in the equation represents the friction force in air. If there is an error in this model of a physical situation, then the numerical solution of this equation is not going to improve the results.

2. Physical / Observational / Measurement Error.
The radius of an electron is given by

$$(2.81777 + \varepsilon) \times 10^{-13} \text{ cm}, \qquad |\varepsilon| \leq 0.00011$$

This error cannot be removed, and it must affect the accuracy of any computation in which it is used. We need to be aware of these effects and to arrange the computation so as to minimize them.

3. Approximation Error.
This is also called "discretization error" and "truncation error"; and it is the main source of error with which we deal in this course. Such errors generally occur when we replace a computationally unsolvable problem with a nearby problem that is more tractable computationally.

For example, the Taylor polynomial approximation

$$e^x \approx 1 + x + \tfrac{1}{2}x^2$$

contains an "approximation error". The numerical integration

$$\int_0^1 f(x)\,dx \approx \frac{1}{N}\sum_{j=1}^{N} f\!\left(\frac{j}{N}\right)$$

contains an approximation error.

4. Finiteness of Algorithm Error.
This is an error due to stopping an algorithm after a finite number of iterations. Even if theoretically an algorithm can run for an indefinite time, after a finite (usually specified) number of iterations the algorithm will be stopped.

Page 121: Analiza Numerica [Utm, Bostan v.]

Sources of Error

5. Blunders. In the pre-computer era, blunders were mostly arithmetic errors. In the early years of the computer era, the typical blunder was a programming bug. Present-day "blunders" are still often programming errors, but now they are much more difficult to find, as they are often embedded in very large codes which may mask their effect.

Some simple rules to decrease the risk of having a bug in the code:

Break programs into small, testable subprograms;

Run test cases for which you know the outcome;

When running the full code, keep a skeptical eye on the output, checking whether it is reasonable or not.


Sources of Error

6. Rounding/Chopping Error. This is the main source of many problems, especially in solving systems of linear equations. We look at the effects of such errors later.


Sources of Error

7. Finiteness of Precision Errors. All numbers stored in computer memory are subject to the finiteness of the space allocated for their storage.


Pendulum Example

Original problem in engineering or in science to be solved:

[Diagram: a pendulum of length l at angle θ, with string tension T and weight mg.]

Model this physical problem mathematically. Newton's second law provides us with

θ'' = -(g/l) sin θ,

or, written as a first-order system,

θ' = ω,
ω' = -(g/l) sin θ.


Pendulum Example

Problem of continuous mathematics:

θ' = ω,
ω' = -(g/l) sin θ.

Errors introduced at this stage: Modeling Errors, Physical Errors.


Pendulum Example

Mathematical algorithm (discretize with step size h):

θ_{n+1} = θ_n + h ω_{n+1},
ω_{n+1} = ω_n - h (g/l) sin(θ_n).

Errors introduced at this stage: Discretization Errors, Finiteness of Algorithm Errors.


Pendulum Example

Computer implementation (in MATLAB):

    for i = 1:Nmax
        Omega = Omega - H*g/L*sin(Theta);
        Theta = Theta + H*Omega;
    end

Errors introduced at this stage: Rounding/Chopping Errors, Bugs in the Code, Finite Precision Errors.


Loss of significance errors

This can be considered a source of error or a consequence of the finiteness of calculator and computer arithmetic.

Example. Define

f(x) = x(√(x+1) - √x)

and consider evaluating it on a 6-digit decimal calculator which uses rounded arithmetic.

x        Computed f(x)   True f(x)   Error
1        0.414221        0.414214     7.0000e-006
10       1.54340         1.54347     -7.0000e-005
100      4.99000         4.98756      0.0024
1000     15.8000         15.8074     -0.0074
10000    50.0000         49.9988      0.0012
100000   100.000         158.113    -58.1130


Loss of significance errors

Example. Define

g(x) = (1 - cos x)/x^2

and consider evaluating it on a 10-digit decimal calculator which uses rounded arithmetic.

x         Computed g(x)   True g(x)      Error
0.1       0.4995834700    0.4995834722   -2.2000e-009
0.01      0.4999960000    0.4999958333    1.6670e-007
0.001     0.5000000000    0.4999999583    4.1700e-008
0.0001    0.5000000000    0.4999999996    4.0000e-010
0.00001   0.0             0.5000000000   -0.5


Loss of significance errors

Consider one case, that of x = 0.001. Then on the calculator:

cos(0.001) = 0.9999994999
1 - cos(0.001) = 5.001 × 10^(-7)
(1 - cos(0.001))/(0.001)^2 = 0.5001000000

The true answer is

g(0.001) = 0.4999999583.

The relative error in our answer is

(0.4999999583 - 0.5001)/0.4999999583 = -0.0001000417/0.4999999583 ≈ -0.0002.

There are only about 3 significant digits in the answer. How can such a straightforward and short calculation lead to such a large error (relative to the accuracy of the calculator)?


Loss of significance errors

When two numbers are nearly equal and we subtract them, we suffer a "loss of significance error" in the calculation. In some cases, these errors can be quite subtle and difficult to detect. And even after they are detected, they may be difficult to fix.

The last example, fortunately, can be fixed in a number of ways. Easiest is to use the trigonometric identity

cos(2θ) = 2cos^2(θ) - 1 = 1 - 2sin^2(θ).

Let x = 2θ. Then

g(x) = (1 - cos x)/x^2 = 2sin^2(x/2)/x^2 = (1/2)(sin(x/2)/(x/2))^2.

This latter formula, with x = 0.001, yields a computed value of 0.4999999584, nearly the true answer. We could also have used a Taylor polynomial for cos(x) around x = 0 to obtain a better approximation to g(x) for small values of x.
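A sketch comparing the two mathematically equivalent forms in MATLAB; single precision is used here (an assumption, instead of the slide's 10-digit calculator) so that the cancellation is visible at a larger x:

    x = single(1e-4);
    g_naive = (1 - cos(x)) / x^2;            % subtracts nearly equal numbers
    g_safe  = 0.5 * (sin(x/2) / (x/2))^2;    % rewritten via the identity
    fprintf('naive: %.7f   rewritten: %.7f\n', g_naive, g_safe);

In single precision cos(1e-4) rounds to exactly 1, so the naive form returns 0 while the rewritten form returns approximately 0.5.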


Another example

Evaluate e^(-5) using a Taylor polynomial approximation:

e^(-5) = 1 + (-5)/1! + (-5)^2/2! + (-5)^3/3! + (-5)^4/4! + (-5)^5/5! + (-5)^6/6! + ...

With n = 25 terms, the remainder satisfies

|(-5)^26/26! · e^c| ≤ 10^(-8)   (for some c between -5 and 0).

Imagine calculating this polynomial using a computer with 4-digit decimal arithmetic and rounding. To make the point about cancellation more strongly, imagine that each of the terms in the above polynomial is calculated exactly and then rounded to the arithmetic of the computer. We add the terms exactly and then round to four digits.


Another example

Degree   Term        Sum        Degree   Term          Sum
0         1.000       1.000     13      -0.1960       -0.04230
1        -5.000      -4.000     14       0.7001e-1     0.02771
2        12.50        8.500     15      -0.2334e-1     0.004370
3       -20.83      -12.33      16       0.7293e-2     0.01166
4        26.04       13.71      17      -0.2145e-2     0.009518
5       -26.04      -12.33      18       0.5958e-3     0.01011
6        21.70        9.370     19      -0.1568e-3     0.009957
7       -15.50       -6.130     20       0.3920e-4     0.009996
8         9.688       3.558     21      -0.9333e-5     0.009987
9        -5.382      -1.824     22       0.2121e-5     0.009989
10        2.691       0.8670    23      -0.4611e-6     0.009989
11       -1.223      -0.3560    24       0.9670e-7     0.009989
12        0.5097      0.1537    25      -0.1921e-7     0.009989

True answer is 0.006738.


Another example

To understand more fully the source of the error, look at the numbers being added and their accuracy. For example,

(-5)^3/3! = -125/6 = -20.83

in the 4-digit decimal calculation, with an error of magnitude 0.00333. Note that this error in an intermediate step is of the same magnitude as the true answer 0.006738 being sought. Other similar errors are present in calculating the other terms, and thus they cause a major error in the final answer being calculated.

General principle

Whenever a sum is being formed in which the final answer is much smaller than some of the terms being combined, a loss of significance error is occurring.
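A MATLAB sketch reproducing the experiment behind the table: each term of the Taylor series for e^(-5) is computed exactly, rounded to 4 significant decimal digits, and the rounded terms are summed. (The built-in round(X, N, 'significant') is assumed available; it exists in MATLAB R2014b and later.)

    s = 0;
    for k = 0:25
        term = round((-5)^k / factorial(k), 4, 'significant');
        s = s + term;
    end
    fprintf('4-digit sum = %.6f   true exp(-5) = %.6f\n', s, exp(-5));

The sum lands near 0.009989, far from the true value 0.006738, even though each individual term is correct to 4 significant digits.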


Noise in function evaluation

Consider plotting the function

f(x) = (x - 1)^3 = x^3 - 3x^2 + 3x - 1 = -1 + x(3 + x(-3 + x)).

[Plot of y = (x - 1)^3 for 0 ≤ x ≤ 2; at this scale the curve looks perfectly smooth.]


[Zoomed plot for x between 0.99998 and 1.000002: the computed values of the expanded form oscillate randomly between about -8 × 10^(-15) and 8 × 10^(-15).]


Whenever a function f(x) is evaluated, arithmetic operations are carried out which involve rounding or chopping errors. This means that what the computer eventually returns as an answer contains noise. This noise is generally "random" and small, but it can affect the accuracy of other calculations which depend on f(x).


Underflow errors

Consider evaluating

f(x) = x^10

for x near 0. When using IEEE single precision arithmetic, the smallest nonzero positive number expressible in normalized floating-point format is

m = 2^(-126) ≈ 1.18 × 10^(-38).

Thus f(x) will be set to zero if

x^10 < m
|x| < m^(1/10)
|x| < 1.61 × 10^(-4)
-0.000161 < x < 0.000161.


Overflow errors

Attempts to use numbers that are too large for the floating-point format lead to overflow errors. These are generally fatal errors on most computers. With the IEEE floating-point format, overflow errors can be carried along as having a value of ±∞ or NaN, depending on the context. Usually an overflow error is an indication of a more significant problem or error in the program, and the user needs to be aware of such errors.

When using IEEE single precision arithmetic, the largest positive number expressible in normalized floating-point format is

M = 2^128 (1 - 2^(-24)) ≈ 3.40 × 10^38.

Thus, f(x) = x^10 will overflow if

x^10 > M
|x| > M^(1/10)
|x| > 7131.6.
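A small MATLAB sketch of both failure modes. One caveat: the thresholds above assume results below the smallest normalized number are flushed to zero; IEEE hardware with gradual underflow first produces subnormal numbers, and the hard zero occurs only below the smallest subnormal (about 1.4 × 10^(-45) in single precision), which is why a smaller x is used here:

    x_small = single(1e-5);  disp(x_small^10)   % 1e-50: underflows to 0
    x_large = single(8000);  disp(x_large^10)   % ~1.07e39 > 3.40e38: Inf
    disp(realmin('single'))                     % 2^(-126) = 1.1755e-38
    disp(realmax('single'))                     % 3.4028e+38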


Numerical Analysis
conf. dr. Viorel Bostan
Fall 2010, Lecture 5


Loss of significance errors

This can be considered a source of error or a consequence of the finiteness of calculator and computer arithmetic.

Example (recalled from the previous lecture). Define

f(x) = x(√(x+1) - √x)

and consider evaluating it on a 6-digit decimal calculator which uses rounded arithmetic.

x        Computed f(x)   True f(x)   Error
1        0.414221        0.414214     7.0000e-006
10       1.54340         1.54347     -7.0000e-005
100      4.99000         4.98756      0.0024
1000     15.8000         15.8074     -0.0074
10000    50.0000         49.9988      0.0012
100000   100.000         158.113    -58.1130


Loss of significance errors

To localise the error, consider the case x = 100. The calculator with 6 decimal digits provides the following values:

√100 = 10,   √101 = 10.0499.

Then

√(x+1) - √x = √101 - √100 = 0.0499000,

while the exact value is 0.0498756. Three significant digits of √(x+1) = √101 have been cancelled against √x = √100.

The loss of precision is due to the form of the function f(x) and the finiteness of the precision of the 6-digit calculator.


Loss of significance errors

In this particular case, we can avoid the loss of precision by rewriting the function as follows:

f(x) = x(√(x+1) - √x) · (√(x+1) + √x)/(√(x+1) + √x) = x/(√(x+1) + √x).

Thus we avoid the subtraction of nearly equal quantities. Doing so gives us

f(100) = 4.98756,

a value with 6 significant digits.
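A sketch contrasting the two forms in MATLAB; single precision is used (an assumption, in place of the 6-digit calculator) so the cancellation is easy to see at x = 100000:

    x = single(100000);
    f_naive = x * (sqrt(x + 1) - sqrt(x));   % subtraction of nearby numbers
    f_safe  = x / (sqrt(x + 1) + sqrt(x));   % rewritten form, no cancellation
    fprintf('naive: %.4f   rewritten: %.4f\n', f_naive, f_safe);

The rewritten form agrees with the true value 158.113... to full single precision, while the naive form loses several digits.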


Propagation of errors: propagation in arithmetic operations

Let ω denote an arithmetic operation such as +, -, ×, or /. Let ω* denote the same arithmetic operation as it is actually carried out in the computer, including rounding or chopping error.

Let xA ≈ xT and yA ≈ yT. We want to obtain xT ω yT, but we actually obtain xA ω* yA. The error in xA ω* yA is given by

(xT ω yT) - (xA ω* yA).


Propagation of errors: propagation in arithmetic operations

The error in xA ω* yA is rewritten as

(xT ω yT) - (xA ω* yA) = (xT ω yT - xA ω yA) + (xA ω yA - xA ω* yA).

The final term is the error introduced by the inexactness of the machine arithmetic. For it, we usually assume

xA ω* yA = fl(xA ω yA).

This means that the quantity xA ω yA is computed exactly and is then rounded or chopped to fit the answer into the floating-point representation of the machine.


Propagation of errors: propagation in arithmetic operations

The formula

xA ω* yA = fl(xA ω yA)

implies

xA ω* yA = (xA ω yA)(1 + ε),

since fl(x) = x(1 + ε), where the limits for ε were given earlier. Then

Rel(xA ω* yA) = [(xA ω yA) - (xA ω* yA)] / (xA ω yA)
             = [(xA ω yA) - (xA ω yA)(1 + ε)] / (xA ω yA)
             = -ε.


Propagation of errors: propagation in arithmetic operations

With rounded binary arithmetic having n digits in the mantissa,

-2^(-n) ≤ ε ≤ 2^(-n).

Coming back to the error formula,

(xT ω yT) - (xA ω* yA) = (xT ω yT - xA ω yA) + (xA ω yA - xA ω* yA),

the second bracket has relative error -ε, as shown above. The remaining term,

xT ω yT - xA ω yA,

is the propagated error. In what follows we examine it for particular cases.


Propagation of errors: propagation in multiplication

Let ω = ×. Write

xT = xA + ξ,   yT = yA + η.

Then for the relative error in xA·yA:

Rel(xA·yA) = (xT·yT - xA·yA)/(xT·yT)
           = (xT·yT - (xT - ξ)(yT - η))/(xT·yT)
           = (xT·yT - xT·yT + xT·η + yT·ξ - ξη)/(xT·yT)
           = (xT·η + yT·ξ - ξη)/(xT·yT)
           = ξ/xT + η/yT - (ξ/xT)(η/yT)
           = Rel(xA) + Rel(yA) - Rel(xA)·Rel(yA).


Propagation of errors: propagation in multiplication

Usually we have

|Rel(xA)| ≪ 1,   |Rel(yA)| ≪ 1;

therefore we can drop the last term Rel(xA)·Rel(yA), since it is much smaller than the other two:

Rel(xA·yA) = Rel(xA) + Rel(yA) - Rel(xA)·Rel(yA) ≈ Rel(xA) + Rel(yA).

Thus small relative errors in the arguments xA and yA lead to a small relative error in the product xA·yA. Also, note that there is some cancellation if these relative errors are of opposite sign.
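A quick numerical sanity check of this rule in MATLAB (the values π and e with 5-digit approximations are illustrative choices, not from the slides):

    xT = pi;     xA = 3.1416;     % approximate values with known errors
    yT = exp(1); yA = 2.7183;
    RelX = (xT - xA)/xT;  RelY = (yT - yA)/yT;
    RelProd = (xT*yT - xA*yA)/(xT*yT);
    fprintf('Rel(x)+Rel(y) = %.3e   Rel(x*y) = %.3e\n', RelX + RelY, RelProd);

The two printed values agree to many digits, since the dropped product term Rel(xA)·Rel(yA) is of order 10^(-11) here.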


Propagation of errors: propagation in division

There is a similar result for division:

Rel(xA/yA) ≈ Rel(xA) - Rel(yA),

provided |Rel(yA)| ≪ 1.


Propagation of errors: propagation in addition and subtraction

For ω equal to + or -, we have

[xT ± yT] - [xA ± yA] = [xT - xA] ± [yT - yA].

Thus the error in a sum is the sum of the errors in the original arguments, and similarly for subtraction. However, there is a more subtle error occurring here.


Propagation of errors: example

Suppose you are solving

x^2 - 26x + 1 = 0.

Using the quadratic formula, we have the true answers

r1,T = 13 + √168,   r2,T = 13 - √168.

From a table of square roots, we take √168 ≈ 12.961. Since this is correctly rounded to 5 digits, we have

|√168 - 12.961| ≤ 0.0005.

Then define

r1,A = 13 + 12.961 = 25.961,   r2,A = 13 - 12.961 = 0.039.


Then for both roots,

|r_T − r_A| ≤ 0.0005

For the relative errors, however,

Rel(r_{1,A}) = (r_{1,T} − r_{1,A}) / r_{1,T},   |Rel(r_{1,A})| ≤ 0.0005 / 25.9605 ≈ 1.93 × 10^{-5}

Rel(r_{2,A}) = (r_{2,T} − r_{2,A}) / r_{2,T},   |Rel(r_{2,A})| ≤ 0.0005 / 0.0385 ≈ 0.0130

Why does r_{2,A} have such poor accuracy in comparison to r_{1,A}?


The answer is the loss-of-significance error involved in the formula used for calculating r_{2,A}. Instead, use the mathematically equivalent formula

r_{2,A} = 1 / (13 + √168) ≈ 1 / 25.961 ≈ 0.038519

This results in a much more accurate answer, at the expense of an additional division.
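The effect is easy to see in MATLAB; here is a minimal sketch that mimics the 5-digit table value of √168 and compares the two formulas (the variable names are ours):

% Compare the subtractive and the rearranged formula for r2.
s_table = 12.961;              % sqrt(168) correctly rounded to 5 digits
r2_true = 13 - sqrt(168);      % accurate reference answer
r2_sub  = 13 - s_table;        % subtraction: leading digits cancel
r2_div  = 1/(13 + s_table);    % equivalent formula: no cancellation
fprintf('subtraction: Rel = %.1e\n', (r2_true - r2_sub)/r2_true);
fprintf('division:    Rel = %.1e\n', (r2_true - r2_div)/r2_true);

The first relative error is of size 10^{-2}, the second of size 10^{-5}, matching the numbers above.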


Propagation of errors: Errors in function evaluation

Suppose we are evaluating a function f(x) in the machine. Then the result is generally not f(x), but rather an approximation of it, which we denote by f̃(x). Now suppose that we have a number x_A ≈ x_T. We want to calculate f(x_T), but instead we evaluate f̃(x_A). What can we say about the error in this latter computed quantity,

f(x_T) − f̃(x_A) ?


We have

f(x_T) − f̃(x_A) = [f(x_T) − f(x_A)] + [f(x_A) − f̃(x_A)]

The quantity f(x_A) − f̃(x_A) is the "noise" in the evaluation of f(x_A) in the computer, and we will return later to some discussion of it. The quantity f(x_T) − f(x_A) is called the propagated error. It is the error that results from using perfect arithmetic in the evaluation of the function. If the function f(x) is differentiable, then we can use the mean-value theorem to write

f(x_T) − f(x_A) = f′(ξ)(x_T − x_A)

for some ξ between x_T and x_A.


Since usually x_T and x_A are close together, we can say ξ is close to either of them, and

f(x_T) − f(x_A) = f′(ξ)(x_T − x_A) ≈ f′(x_T)(x_T − x_A) ≈ f′(x_A)(x_T − x_A)


Propagation of errors: Example

Define f(x) = b^x, where b is a positive real number. Then the last formula yields

b^{x_T} − b^{x_A} ≈ (ln b) b^{x_T} (x_T − x_A)

so that

Rel(b^{x_A}) ≈ (ln b) b^{x_T} (x_T − x_A) / b^{x_T} = (ln b)(x_T − x_A) = (x_T ln b) · (x_T − x_A)/x_T = K · Rel(x_A),   K = x_T ln b

Note that if K = 10^4 and Rel(x_A) = 10^{-7}, then Rel(b^{x_A}) ≈ 10^{-3}. This is a large decrease in accuracy, and it is independent of how we actually calculate b^x. The number K is called a condition number for the computation.
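A small MATLAB check of this bound; the numbers b = 10 and x_T = 100 are our own illustrative choices, giving K ≈ 230:

% Plant a known relative error in the argument and compare the
% resulting relative error in b^x with K*Rel(xA).
b = 10;  xT = 100;  K = xT*log(b);     % condition number, about 230
relx = 1e-12;                          % relative error in the argument
xA = xT*(1 + relx);
relf = abs(b^xT - b^xA)/b^xT;          % relative error in the result
fprintf('K*Rel(xA) = %.2e, actual = %.2e\n', K*relx, relf);

Both printed values come out near 2.3 × 10^{-10}, as the analysis predicts.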


Summation

Let S be a sum with a relatively large number of terms,

S = a_1 + a_2 + … + a_n     (1)

where the a_j, j = 1, …, n, are floating point numbers. The summation process consists of n − 1 consecutive additions,

S = (…((a_1 + a_2) + a_3) + … + a_{n−1}) + a_n

Define

S_2 = fl(a_1 + a_2)
S_3 = fl(S_2 + a_3)
S_4 = fl(S_3 + a_4)
  ⋮
S_n = fl(S_{n−1} + a_n)

Recall the formula fl(x) = x(1 + ε).


S_2 = (a_1 + a_2)(1 + ε_2)
S_3 = (S_2 + a_3)(1 + ε_3)
S_4 = (S_3 + a_4)(1 + ε_4)
  ⋮
S_n = (S_{n−1} + a_n)(1 + ε_n)

Then

S_3 = (S_2 + a_3)(1 + ε_3) = ((a_1 + a_2)(1 + ε_2) + a_3)(1 + ε_3)
    ≈ (a_1 + a_2 + a_3) + a_1(ε_2 + ε_3) + a_2(ε_2 + ε_3) + a_3 ε_3


Similarly,

S_4 ≈ (a_1 + a_2 + a_3 + a_4) + a_1(ε_2 + ε_3 + ε_4) + a_2(ε_2 + ε_3 + ε_4) + a_3(ε_3 + ε_4) + a_4 ε_4

Finally,

S_n ≈ (a_1 + a_2 + … + a_n) + a_1(ε_2 + … + ε_n) + a_2(ε_2 + … + ε_n) + a_3(ε_3 + … + ε_n) + a_4(ε_4 + … + ε_n) + … + a_n ε_n


We are interested in the error S − S_n:

S − S_n ≈ −a_1(ε_2 + … + ε_n) − a_2(ε_2 + … + ε_n) − a_3(ε_3 + … + ε_n) − a_4(ε_4 + … + ε_n) − … − a_n ε_n

From the last relation we can establish a strategy for summation that minimizes the error S − S_n: first rearrange the terms in increasing order of magnitude,

|a_1| ≤ |a_2| ≤ |a_3| ≤ … ≤ |a_n|

In this case the smaller numbers a_1 and a_2 are multiplied by the larger factors ε_2 + … + ε_n, while the largest number a_n is multiplied only by the single factor ε_n.


Summation with chopping

Here the test sum is the harmonic sum S = 1 + 1/2 + … + 1/n (the exact values below are its partial sums), computed in four-digit arithmetic with chopping; SL = terms added from smallest to largest, LS = from largest to smallest.

  n     Exact   SL      Error   LS      Error
  10    2.929   2.928   0.001   2.927   0.002
  25    3.816   3.813   0.003   3.806   0.010
  50    4.499   4.491   0.008   4.479   0.020
  100   5.187   5.170   0.017   5.142   0.045
  200   5.878   5.841   0.037   5.786   0.092
  500   6.793   6.692   0.101   6.569   0.224
  1000  7.486   7.284   0.202   7.069   0.417


Summation with rounding

  n     Exact   SL      Error    LS      Error
  10    2.929   2.929    0       2.929    0
  25    3.816   3.816    0       3.817   −0.001
  50    4.499   4.500   −0.001   4.498    0.001
  100   5.187   5.187    0       5.187    0
  200   5.878   5.878    0       5.876    0.002
  500   6.793   6.794   −0.001   6.783    0.010
  1000  7.486   7.486    0       7.449    0.037
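The ordering effect is easy to reproduce in MATLAB by summing the same series in single precision in both orders (a rough sketch; n and the working precision are our own choices):

% Harmonic sum in single precision: smallest-to-largest (SL)
% versus largest-to-smallest (LS).
n = 1e6;
sl = single(0);  for j = n:-1:1, sl = sl + 1/single(j); end
ls = single(0);  for j = 1:n,    ls = ls + 1/single(j); end
exact = sum(1./(n:-1:1));          % double precision reference
fprintf('SL error %.2e, LS error %.2e\n', ...
        exact - double(sl), exact - double(ls));

The SL error comes out noticeably smaller, just as in the tables above.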


Numerical Analysis
conf. dr. Viorel Bostan
Fall 2010, Lecture 6


Rootfinding

We want to find the numbers x for which

f(x) = 0

with f : [a, b] → R a given real-valued function. Here we denote such roots, or zeroes, by the Greek letter α, so

f(α) = 0

Rootfinding problems occur in many contexts. Sometimes they are a direct formulation of some physical situation, but more often they are an intermediate step in solving a much larger problem.


Bisection method

Most methods for solving f(x) = 0 are iterative: given an initial guess x_0, such a method produces a sequence of successively computed approximations x_1, x_2, x_3, …, x_n, … with x_n → α. We begin with the simplest of such methods, one which most people use at some time.

Suppose we are given a function f(x), and assume we have an interval [a, b] containing the root, on which the function is continuous. We also assume we are given an error tolerance ε > 0, and we want an approximate root α̃ ∈ [a, b] for which

|α − α̃| < ε


The bisection method is based on the following theorem:

Theorem. If f : [a, b] → R is a continuous function on the closed and bounded interval [a, b] and

f(a) · f(b) < 0,

then there exists α ∈ [a, b] such that f(α) = 0.

Therefore, we further assume that the function f(x) changes sign on [a, b].


Bisection Algorithm: Bisect(f, a, b, ε)

Step 1: Define c = (a + b)/2.

Step 2: If b − c ≤ ε, accept c as our root, and stop.

Step 3: If b − c > ε, compare the sign of f(c) with those of f(a) and f(b): if sign(f(b)) · sign(f(c)) ≤ 0, replace a with c; otherwise, replace b with c. Return to Step 1.

Note that we prefer checking the sign using the condition sign(f(b)) · sign(f(c)) ≤ 0 instead of sign(f(b) · f(c)) ≤ 0, since the product f(b) · f(c) itself can underflow or overflow.
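A minimal MATLAB sketch of this algorithm, written directly from the steps above:

function c = bisect(f, a, b, tol)
% Bisection: f continuous on [a,b] with a sign change;
% returns c with |alpha - c| <= tol.
while true
    c = (a + b)/2;                    % Step 1
    if b - c <= tol                   % Step 2: accept c
        return
    end
    if sign(f(b))*sign(f(c)) <= 0     % Step 3: root lies in [c,b]
        a = c;
    else                              %         root lies in [a,c]
        b = c;
    end
end
end

For the example below, bisect(@(x) x^6 - x - 1, 1, 2, 0.001) stops at the value 1.13379 reached in row 10 of the table.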


[Figure: one step of the bisection method. The root α lies between a_1 and b_1; after the step, a_2 = c_1 and b_2 = b_1, with new midpoint c_2.]


Example. Consider the function

f(x) = x^6 − x − 1

We want to find the largest root with an accuracy of ε = 0.001. It can be seen from the graph of the function that this root is located in [1, 2]; also, the function is continuous there. Let a = 1 and b = 2; then f(a) = −1 and f(b) = 61, so the function changes sign, and all the conditions are satisfied.


  n    a_n       b_n       c_n        f(c_n)       b_n − c_n
  1    1.00000   2.00000   1.50000    8.891e+00    5.000e−01
  2    1.00000   1.50000   1.25000    1.565e+00    2.500e−01
  3    1.00000   1.25000   1.12500   −9.771e−02    1.250e−01
  4    1.12500   1.25000   1.18750    6.167e−01    6.250e−02
  5    1.12500   1.18750   1.15625    2.333e−01    3.125e−02
  6    1.12500   1.15625   1.14063    6.158e−02    1.563e−02
  7    1.12500   1.14063   1.13281   −1.958e−02    7.813e−03
  8    1.13281   1.14063   1.13672    2.062e−02    3.906e−03
  9    1.13281   1.13672   1.13477    4.268e−04    1.953e−03
  10   1.13281   1.13477   1.13379   −9.598e−03    9.766e−04


Error analysis for the bisection method

Let a_n, b_n and c_n be the values produced by the bisection method at iteration n. Evidently,

b_{n+1} − a_{n+1} = (1/2)(b_n − a_n)

and therefore

b_n − a_n = (1/2)(b_{n−1} − a_{n−1}) = (1/2^2)(b_{n−2} − a_{n−2}) = … = (1/2^{n−1})(b − a)

Since either α ∈ [a_n, c_n] or α ∈ [c_n, b_n], we have

|α − c_n| ≤ c_n − a_n = b_n − c_n = (1/2)(b_n − a_n) = (1/2^n)(b − a)


|α − c_n| ≤ (1/2^n)(b − a)

This relation provides us with a stopping criterion for the bisection method. Moreover, it follows that c_n → α as n → ∞.

Suppose we want to estimate the number of iterations necessary to find the root with an error tolerance ε. From |α − c_n| ≤ ε it suffices that

(1/2^n)(b − a) ≤ ε,   i.e.   n ≥ ln((b − a)/ε) / ln 2

For the previous example we get

n ≥ ln(1/0.001) / ln 2 ≈ 9.97
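In MATLAB this bound is a one-liner (values from the example above):

% Bisection steps guaranteed to reach tolerance 0.001 on [1,2].
n = ceil(log((2 - 1)/0.001)/log(2))   % prints n = 10

so ten iterations suffice, which is exactly where the table above stopped.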


Advantages and disadvantages of the bisection method

Advantages:

1. It always converges.
2. You have a guaranteed error bound, and it decreases with each successive iteration.
3. You have a guaranteed rate of convergence: the error bound decreases by 1/2 with each iteration.

Disadvantages:

1. It is relatively slow when compared with the other rootfinding methods we will study, especially when the function f(x) has several continuous derivatives about the root α.
2. The algorithm has no check of whether ε is too small for the computer arithmetic being used.

We also assume the function f(x) is continuous on the given interval [a, b], but there is no way for the computer to confirm this.


Rootfinding

We want to find the root α of a given function f(x); that is, the point x at which the graph of y = f(x) intersects the x-axis. One of the principles of numerical analysis is the following.

Numerical Analysis Principle: If you cannot solve the given problem, then solve a "nearby problem".

How do we obtain a nearby problem for f(x) = 0? Begin by asking what types of problems we can solve easily. At the top of the list should be finding where a straight line intersects the x-axis. Thus we seek to replace f(x) = 0 by the problem of solving p(x) = 0 for some linear polynomial p(x) that approximates f(x) in the vicinity of the root α.


[Figure: the tangent line to y = f(x) at (x_0, f(x_0)) crosses the x-axis at x_1, which lies closer to the root α than x_0.]


Newton's method

Let x_0 be an initial guess, sufficiently close to the root α. Consider the tangent line to the graph of f(x) at (x_0, f(x_0)). The tangent intersects the x-axis at a point x_1 closer to α, and has the equation

p_1(x) = f(x_0) + f′(x_0)(x − x_0)

Since p_1(x_1) = 0, we get

f(x_0) + f′(x_0)(x_1 − x_0) = 0,   i.e.   x_1 = x_0 − f(x_0)/f′(x_0)

Similarly, we get x_2:

x_2 = x_1 − f(x_1)/f′(x_1)


Repeat this process to obtain the sequence x_1, x_2, x_3, … that, hopefully, will converge to α. The general scheme of Newton's method: starting with an initial guess x_0, compute iteratively

x_{n+1} = x_n − f(x_n)/f′(x_n),   n = 0, 1, 2, …


Example. Apply Newton's method to

f(x) = x^6 − x − 1,   f′(x) = 6x^5 − 1

to get

x_{n+1} = x_n − (x_n^6 − x_n − 1)/(6x_n^5 − 1),   n ≥ 0

Use the initial guess x_0 = 1.5.


  n   x_n          f(x_n)      x_n − x_{n−1}   α − x_{n−1}
  0   1.50000000   8.89e+00
  1   1.30049088   2.54e+00    −2.00e−01       −3.65e−01
  2   1.18148042   5.38e−01    −1.19e−01       −1.66e−01
  3   1.13945559   4.92e−02    −4.20e−02       −4.68e−02
  4   1.13477763   5.50e−04    −4.68e−03       −4.73e−03
  5   1.13472415   7.11e−08    −5.35e−05       −5.35e−05
  6   1.13472414   1.55e−15    −6.91e−09       −6.91e−09

The true solution is α = 1.134724138.
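A minimal MATLAB transcription of this run:

% Newton's method for f(x) = x^6 - x - 1, starting from x0 = 1.5.
f  = @(x) x^6 - x - 1;
df = @(x) 6*x^5 - 1;
x = 1.5;
for n = 1:6
    x = x - f(x)/df(x);                % Newton step
    fprintf('%d  %.8f  %.2e\n', n, x, f(x));
end

The printed iterates reproduce the x_n column of the table above.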


Newton's method. Division example

Here we consider a division algorithm (based on Newton's method) implemented in some computers in the past. Say we are interested in computing a/b = a · (1/b), where 1/b is computed using Newton's method. Take

f(x) ≡ b − 1/x = 0,

with b positive. The root of this equation is α = 1/b. Since

f′(x) = 1/x^2,

Newton's method for this problem becomes

x_{n+1} = x_n − (b − 1/x_n) / (1/x_n^2)

Simplifying,

x_{n+1} = x_n(2 − b x_n),   n ≥ 0
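A short MATLAB sketch of this division-free reciprocal iteration; b = 7 and the starting guess are our own choices, with 0 < x_0 < 2/b as required below:

% Approximate 1/7 using x_{n+1} = x_n*(2 - b*x_n): no division needed.
b = 7;  x = 0.1;                       % initial guess in (0, 2/b)
for n = 1:5
    x = x*(2 - b*x);                   % one Newton step
    fprintf('%d  %.16f\n', n, x);
end

The iterates converge to 1/7 = 0.142857…, roughly doubling the number of correct digits at each step, in line with the error analysis that follows.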


The initial guess x_0 must be close enough to the true solution, and of course x_0 > 0. Consider the error:

α − x_{n+1} = 1/b − x_{n+1} = (1 − b x_{n+1})/b = (1 − b x_n(2 − b x_n))/b = (1 − b x_n)^2 / b

On the other hand,

Rel(x_{n+1}) = (α − x_{n+1})/α = 1 − b x_{n+1}


It can be shown (try it!) that

Rel(x_{n+1}) = (Rel(x_n))^2

In order to guarantee convergence x_n → α, we need

|Rel(x_0)| < 1,   or equivalently   0 < x_0 < 2/b

For example, suppose that |Rel(x_0)| = 0.1. Then

Rel(x_1) = 10^{-2},  Rel(x_2) = 10^{-4},  Rel(x_3) = 10^{-8},  Rel(x_4) = 10^{-16}

75 / 94


[Figure: Newton's method for y = b − 1/x. The tangent at (x_0, f(x_0)) crosses the x-axis at x_1; the root is 1/b, and convergence requires 0 < x_0 < 2/b.]

Error analysis for Newton's method

Let f ∈ C^2[a, b] and α ∈ [a, b], with f'(α) ≠ 0. Consider the Taylor formula for f(x) about x_n:

f(x) = f(x_n) + (x − x_n) f'(x_n) + ((x − x_n)^2 / 2) f''(ξ_n),

where ξ_n is between x and x_n. Take x = α to get

f(α) = f(x_n) + (α − x_n) f'(x_n) + ((α − x_n)^2 / 2) f''(ξ_n),

with ξ_n between α and x_n. Since f(α) = 0, dividing through by f'(x_n) gives

0 = f(x_n)/f'(x_n) + (α − x_n) + (α − x_n)^2 · f''(ξ_n)/(2 f'(x_n))

α − x_{n+1} = (α − x_n)^2 · [−f''(ξ_n)/(2 f'(x_n))]


For the previous example f(x) = x^6 − x − 1 we have f''(x) = 30 x^4, so

−f''(ξ_n)/(2 f'(x_n)) ≈ −f''(α)/(2 f'(α)) = −30 α^4 / (2 (6 α^5 − 1)) ≈ −2.42

Therefore

α − x_{n+1} ≈ −2.42 (α − x_n)^2

For example, if n = 3 we get α − x_3 ≈ −4.73e−03 and

α − x_4 ≈ −2.42 (α − x_3)^2 ≈ −5.42e−05,

in accordance with the value presented in the table: α − x_4 ≈ −5.35e−05.
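For reference, a minimal MATLAB sketch of Newton's method for this example (the starting guess x_0 = 1.5 is an assumption, consistent with the errors quoted above):

    % Newton's method for f(x) = x^6 - x - 1
    f  = @(x) x.^6 - x - 1;
    df = @(x) 6*x.^5 - 1;
    x = 1.5;                   % assumed initial guess
    for n = 1:6
        x = x - f(x)/df(x);    % Newton step
    end
    disp(x)                    % approx. alpha = 1.134724138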


If the iterate x_n is close to α, we have

−f''(ξ_n)/(2 f'(x_n)) ≈ −f''(α)/(2 f'(α)) ≡ M

α − x_{n+1} ≈ M (α − x_n)^2

M (α − x_{n+1}) ≈ (M (α − x_n))^2

Inductively,

M (α − x_n) ≈ (M (α − x_0))^{2^n},   n ≥ 0

In other words, in order to guarantee the convergence of Newton's method we should have

|M (α − x_0)| < 1,   i.e.   |α − x_0| < 1/|M| = |2 f'(α)/f''(α)|


Page 386: Analiza Numerica [Utm, Bostan v.]

For xn close to α, and therefore cn also close to α,

we have

α− xn+1 ≈ −f 00(α)2f 0(α)

(α− xn)2

Thus Newton’s method is quadratically convergent,

provided f 0(α) 6= 0 and f(x) is twice differentiable inthe vicinity of the root α.

We can also use this to explore the ‘interval of con-

vergence’ of Newton’s method. Write the above as

α− xn+1 ≈M (α− xn)2 , M = − f 00(α)

2f 0(α)Multiply both sides by M to get

M (α− xn+1) ≈ [M (α− xn)]2

Page 387: Analiza Numerica [Utm, Bostan v.]

M (α− xn+1) ≈ [M (α− xn)]2

Then we want these quantities to decrease; and this

suggests choosing x0 so that

|M (α− x0)| < 1

|α− x0| <1

|M | =¯̄̄̄¯2f 0(α)f 00(α)

¯̄̄̄¯

If |M | is very large, then we may need to have a verygood initial guess in order to have the iterates xn

converge to α.

ADVANTAGES & DISADVANTAGES

Advantages:

1. It is rapidly convergent in most cases.

2. It is simple in its formulation, and therefore relatively easy to apply and program.

3. It is intuitive in its construction, which makes it easier to understand when it is likely to behave well and when it may behave poorly.

Disadvantages:

1. It may not converge.

2. It is likely to have difficulty if f'(α) = 0. This condition means the x-axis is tangent to the graph of y = f(x) at x = α.

3. It needs to know both f(x) and f'(x). Contrast this with the bisection method, which requires only f(x).

THE SECANT METHOD

Newton's method was based on using the line tangent to the curve of y = f(x), with the point of tangency (x_0, f(x_0)). When x_0 ≈ α, the graph of the tangent line is approximately the same as the graph of y = f(x) around x = α. We then used the root of the tangent line to approximate α.

Consider instead using an approximating line based on interpolation. We assume we have two estimates of the root α, say x_0 and x_1. Then we produce a linear function

q(x) = a_0 + a_1 x

with

q(x_0) = f(x_0),   q(x_1) = f(x_1)     (*)

This line is sometimes called a secant line. Its equation is given by

q(x) = [(x_1 − x) f(x_0) + (x − x_0) f(x_1)] / (x_1 − x_0)

[Figure: two configurations of the secant method for y = f(x). The secant line through (x_0, f(x_0)) and (x_1, f(x_1)) crosses the x-axis at x_2, near the root α.]

q(x) = [(x_1 − x) f(x_0) + (x − x_0) f(x_1)] / (x_1 − x_0)

This is linear in x; and by direct evaluation, it satisfies the interpolation conditions (*). We now solve the equation q(x) = 0, denoting the root by x_2. This yields

x_2 = x_1 − f(x_1) · (x_1 − x_0) / (f(x_1) − f(x_0))

We can now repeat the process. Use x_1 and x_2 to produce another secant line, and then use its root to approximate α. This yields the general iteration formula

x_{n+1} = x_n − f(x_n) · (x_n − x_{n−1}) / (f(x_n) − f(x_{n−1})),   n = 1, 2, 3, ...

This is called the secant method for solving f(x) = 0.

Example. We solve the equation

f(x) ≡ x^6 − x − 1 = 0,

which was used previously as an example for both the bisection and Newton methods. The quantity x_n − x_{n−1} is used as an estimate of α − x_{n−1}. The iterate x_8 equals α rounded to nine significant digits. As with Newton's method for this equation, the initial iterates do not converge rapidly. But as the iterates become closer to α, the speed of convergence increases.

n   x_n          f(x_n)     x_n − x_{n−1}   α − x_{n−1}
0   2.0           61.0
1   1.0          −1.0       −1.0
2   1.01612903   −9.15E−1    1.61E−2         1.35E−1
3   1.19057777    6.57E−1    1.74E−1         1.19E−1
4   1.11765583   −1.68E−1   −7.29E−2        −5.59E−2
5   1.13253155   −2.24E−2    1.49E−2         1.71E−2
6   1.13481681    9.54E−4    2.29E−3         2.19E−3
7   1.13472365   −5.07E−6   −9.32E−5        −9.27E−5
8   1.13472414   −1.13E−9    4.92E−7         4.92E−7
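A compact MATLAB sketch of the secant iteration for this example (initial estimates taken from the table; the stopping tolerance is our own choice):

    % Secant method for f(x) = x^6 - x - 1, starting from x0 = 2, x1 = 1
    f = @(x) x.^6 - x - 1;
    x0 = 2;  x1 = 1;
    for n = 1:12
        x2 = x1 - f(x1)*(x1 - x0)/(f(x1) - f(x0));   % secant step
        x0 = x1;  x1 = x2;
        if abs(x1 - x0) < 1e-10, break, end          % |x_{n+1} - x_n| as error estimate
    end
    disp(x1)                                         % approx. 1.13472414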

It is clear from the numerical results that the secant method requires more iterates than the Newton method. But note that the secant method does not require knowledge of f'(x), whereas Newton's method requires both f(x) and f'(x).

Note also that the secant method can be considered an approximation of the Newton method

x_{n+1} = x_n − f(x_n)/f'(x_n)

by using the approximation

f'(x_n) ≈ (f(x_n) − f(x_{n−1})) / (x_n − x_{n−1})

CONVERGENCE ANALYSIS

With a combination of algebraic manipulation and the mean-value theorem from calculus, we can show

α − x_{n+1} = (α − x_n)(α − x_{n−1}) · [−f''(ξ_n)/(2 f'(ζ_n))],     (**)

with ξ_n and ζ_n unknown points. The point ξ_n is located between the minimum and maximum of x_{n−1}, x_n, and α; and ζ_n is located between the minimum and maximum of x_{n−1} and x_n. Recall that the Newton iterates satisfied

α − x_{n+1} = (α − x_n)^2 · [−f''(ξ_n)/(2 f'(x_n))],

which closely resembles (**) above.

Using (**), it can be shown that x_n converges to α, and moreover,

lim_{n→∞} |α − x_{n+1}| / |α − x_n|^r = |f''(α)/(2 f'(α))|^{r−1} ≡ c,

where r = (1 + √5)/2 ≈ 1.62. This assumes that x_0 and x_1 are chosen sufficiently close to α; how close will vary with the function f. In addition, the above result assumes f(x) has two continuous derivatives for all x in some interval about α.

The above says that when we are close to α,

|α − x_{n+1}| ≈ c |α − x_n|^r

This looks very much like the Newton result

α − x_{n+1} ≈ M (α − x_n)^2,   M = −f''(α)/(2 f'(α)),

and c = |M|^{r−1}. Both the secant and Newton methods converge at a faster than linear rate, and they are called superlinear methods.

The secant method converges more slowly than Newton's method, but it is still quite rapid. It is rapid enough that we can prove

lim_{n→∞} |x_{n+1} − x_n| / |α − x_n| = 1,

and therefore

|α − x_n| ≈ |x_{n+1} − x_n|

is a good error estimator.

A note of warning: do not combine the secant formula and write it in the form

x_{n+1} = (f(x_n) x_{n−1} − f(x_{n−1}) x_n) / (f(x_n) − f(x_{n−1}))

This has enormous loss-of-significance errors as compared with the earlier formulation.

COSTS OF SECANT & NEWTON METHODS

The Newton method

x_{n+1} = x_n − f(x_n)/f'(x_n),   n = 0, 1, 2, ...

requires two function evaluations per iteration, that of f(x_n) and f'(x_n). The secant method

x_{n+1} = x_n − f(x_n) · (x_n − x_{n−1}) / (f(x_n) − f(x_{n−1})),   n = 1, 2, 3, ...

requires one function evaluation per iteration, following the initial step.

For this reason, the secant method is often faster in time, even though more iterates are needed with it than with Newton's method to attain a similar accuracy.

ADVANTAGES & DISADVANTAGES

Advantages of the secant method:

1. It converges at a faster than linear rate, so that it is more rapidly convergent than the bisection method.

2. It does not require use of the derivative of the function, something that is not available in a number of applications.

3. It requires only one function evaluation per iteration, as compared with Newton's method which requires two.

Disadvantages of the secant method:

1. It may not converge.

2. There is no guaranteed error bound for the computed iterates.

3. It is likely to have difficulty if f'(α) = 0. This means the x-axis is tangent to the graph of y = f(x) at x = α.

4. Newton's method generalizes more easily to new methods for solving simultaneous systems of nonlinear equations.

BRENT'S METHOD

Richard Brent devised a method combining the advantages of the bisection method and the secant method.

1. It is guaranteed to converge.

2. It has an error bound which will converge to zero in practice.

3. For most problems f(x) = 0, with f(x) differentiable about the root α, the method behaves like the secant method.

4. In the worst case, it is not too much worse in its convergence than the bisection method.

In MATLAB, it is implemented as fzero; and it is present in most Fortran numerical analysis libraries.
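For instance, the root of the earlier example can be computed with fzero by supplying a bracketing interval:

    % fzero combines bisection, secant steps and inverse quadratic interpolation
    alpha = fzero(@(x) x.^6 - x - 1, [1, 2])    % approx. 1.13472414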

FIXED POINT ITERATION

We begin with a computational example. Consider solving the two equations

E1: x = 1 + 0.5 sin x
E2: x = 3 + 2 sin x

Graphs of these two equations are shown in the accompanying figures, with the solutions being

E1: α = 1.49870113351785
E2: α = 3.09438341304928

We are going to use a numerical scheme called 'fixed point iteration'. It amounts to making an initial guess x_0 and substituting this into the right side of the equation. The resulting value is denoted by x_1; then the process is repeated, this time substituting x_1 into the right side. This is repeated until convergence occurs or until the iteration is terminated.

In the above cases, we show the results of the first 10 iterations in the accompanying table. Clearly convergence is occurring with E1, but not with E2. Why?

[Figure: graphs of y = x against y = 1 + 0.5 sin x and against y = 3 + 2 sin x; in each case the intersection is the fixed point α.]

E1: x = 1 + 0.5 sin x        E2: x = 3 + 2 sin x

n    E1: x_n               E2: x_n
0    0.00000000000000      3.00000000000000
1    1.00000000000000      3.28224001611973
2    1.42073549240395      2.71963177181556
3    1.49438099256432      3.81910025488514
4    1.49854088439917      1.74629389651652
5    1.49869535552190      4.96927957214762
6    1.49870092540704      1.06563065299216
7    1.49870112602244      4.75018861639465
8    1.49870113324789      1.00142864236516
9    1.49870113350813      4.68448404916097
10   1.49870113351750      1.00077863465869
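The E1 column can be reproduced with a few lines of MATLAB:

    % Fixed point iteration for E1: x = 1 + 0.5*sin(x), starting at x0 = 0
    x = 0;
    for n = 1:10
        x = 1 + 0.5*sin(x);
    end
    disp(x)        % approx. 1.49870113, in agreement with the E1 column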

The above iterations can be written symbolically as

E1: x_{n+1} = 1 + 0.5 sin x_n
E2: x_{n+1} = 3 + 2 sin x_n

for n = 0, 1, 2, ... Why does one of these iterations converge, but not the other? The graphs show similar behaviour, so why the difference? Consider one more example.

Suppose we are solving the equation

x^2 − 5 = 0

with exact root α = √5 ≈ 2.2361, using iterates of the form

x_{n+1} = g(x_n).

Consider four different iterations:

I1: x_{n+1} = 5 + x_n − x_n^2
I2: x_{n+1} = 5/x_n
I3: x_{n+1} = 1 + x_n − x_n^2/5
I4: x_{n+1} = (x_n + 5/x_n)/2

All of them, in case they are convergent, will converge to α = √5 (just take the limit as n → ∞ of each relation).

n   I1: x_n        I2: x_n   I3: x_n   I4: x_n
0    1.0e+00       1.0       1.0       1.0
1    5.0000e+00    5.0       1.8000    3.0000
2   −1.5000e+01    1.0       2.1520    2.3333
3   −2.3500e+02    5.0       2.2258    2.2381
4   −5.5455e+04    1.0       2.2350    2.2361
5   −3.0753e+09    5.0       2.2360    2.2361
6   −9.4575e+18    1.0       2.2361    2.2361
7   −8.9445e+37    5.0       2.2361    2.2361
8   −8.0004e+75    1.0       2.2361    2.2361
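As a sketch, iteration I4 (which is in fact Newton's method applied to x^2 − 5 = 0) can be run as:

    % Fixed point iteration I4: x_{n+1} = (x_n + 5/x_n)/2, starting at x0 = 1
    g = @(x) (x + 5./x)/2;
    x = 1;
    for n = 1:8
        x = g(x);
    end
    disp(x)        % approx. 2.2361 = sqrt(5), as in the I4 column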

As another example, note that the Newton method

x_{n+1} = x_n − f(x_n)/f'(x_n)

is also a fixed point iteration, for the equation

x = x − f(x)/f'(x)

In general, we are interested in solving equations

x = g(x)

by means of fixed point iteration:

x_{n+1} = g(x_n),   n = 0, 1, 2, ...

It is called 'fixed point iteration' because the root α is a fixed point of the function g(x), meaning that α is a number for which

g(α) = α

EXISTENCE THEOREM

We begin by asking whether the equation

x = g(x)

has a solution. For this to occur, the graphs of y = x and y = g(x) must intersect, as seen in the earlier figures. There are several lemmas and theorems that give conditions under which we are guaranteed there is a fixed point α.

Lemma 1. Let g(x) be a continuous function on the interval [a, b], and suppose it satisfies the property

a ≤ x ≤ b  ⇒  a ≤ g(x) ≤ b     (#)

Then the equation x = g(x) has at least one solution α in the interval [a, b].

The proof of this is fairly intuitive. Look at the function f(x) = x − g(x) on a ≤ x ≤ b. Evaluating at the endpoints, f(a) ≤ 0 and f(b) ≥ 0. The function f(x) is continuous on [a, b], and therefore it has a zero in the interval.

Theorem. Assume g(x) and g'(x) exist and are continuous on the interval [a, b]; and further, assume

a ≤ x ≤ b  ⇒  a ≤ g(x) ≤ b

λ ≡ max_{a≤x≤b} |g'(x)| < 1

Then:

S1. The equation x = g(x) has a unique solution α in [a, b].

S2. For any initial guess x_0 in [a, b], the iteration

x_{n+1} = g(x_n),   n = 0, 1, 2, ...

will converge to α.

S3.

|α − x_n| ≤ (λ^n / (1 − λ)) |x_1 − x_0|,   n ≥ 0

S4.

lim_{n→∞} (α − x_{n+1})/(α − x_n) = g'(α)

Thus for x_n close to α,

α − x_{n+1} ≈ g'(α)(α − x_n)

The proof is given in the text, and I go over only a portion of it here. For S2, note that from (#), if x_0 is in [a, b], then

x_1 = g(x_0)

is also in [a, b]. Repeat the argument to show that

x_2 = g(x_1)

belongs to [a, b]. This can be continued by induction to show that every x_n belongs to [a, b].

We need the following general result. For any two points w and z in [a, b],

g(w) − g(z) = g'(c)(w − z)

for some unknown point c between w and z. Therefore,

|g(w) − g(z)| ≤ λ |w − z|

for any a ≤ w, z ≤ b.

For S3, subtract x_{n+1} = g(x_n) from α = g(α) to get

α − x_{n+1} = g(α) − g(x_n) = g'(c_n)(α − x_n)     ($)

|α − x_{n+1}| ≤ λ |α − x_n|     (*)

with c_n between α and x_n. From (*), we have that the error is guaranteed to decrease by a factor of λ with each iteration. This leads to

|α − x_n| ≤ λ^n |α − x_0|,   n ≥ 0

With some extra manipulation, we can obtain the error bound in S3.

For S4, use ($) to write

(α − x_{n+1})/(α − x_n) = g'(c_n)

Since x_n → α and c_n is between α and x_n, we have g'(c_n) → g'(α).

The statement

α − x_{n+1} ≈ g'(α)(α − x_n)

tells us that when near to the root α, the errors will decrease by a constant factor of g'(α). If this factor is negative, then the errors will oscillate between positive and negative, and the iterates will be approaching from both sides. When g'(α) is positive, the iterates will approach α from only one side.

The statements

α − x_{n+1} = g'(c_n)(α − x_n)
α − x_{n+1} ≈ g'(α)(α − x_n)

also tell us a bit more of what happens when

|g'(α)| > 1

Then the errors will increase rather than decrease in size, and the iterates move away from the root.

Look at the earlier examples:

E1: x = 1 + 0.5 sin x
E2: x = 3 + 2 sin x

In the first case E1,

g(x) = 1 + 0.5 sin x,   g'(x) = 0.5 cos x,   |g'(α)| ≤ 1/2

Therefore the fixed point iteration

x_{n+1} = 1 + 0.5 sin x_n

will converge for E1.

For the second case E2,

g(x) = 3 + 2 sin x,   g'(x) = 2 cos x
g'(α) = 2 cos(3.09438341304928) ≈ −1.998

Therefore the fixed point iteration

x_{n+1} = 3 + 2 sin x_n

will diverge for E2.

Consider the example x^2 − 5 = 0.

(I1) g(x) = 5 + x − x^2, g'(x) = 1 − 2x, g'(α) = 1 − 2√5 < −1. Thus x_n = g(x_{n−1}) does not converge to √5.

(I2) g(x) = 5/x, g'(x) = −5/x^2, g'(α) = −1. Therefore x_n = g(x_{n−1}) can be either convergent or divergent; the numerical results show divergence (the iterates simply oscillate between 1 and 5).

(I3) g(x) = 1 + x − x^2/5, g'(x) = 1 − 2x/5, g'(α) = 1 − 2√5/5 ≈ 0.106. Thus x_n = g(x_{n−1}) converges to √5. Moreover, we have

|α − x_{n+1}| ≈ 0.106 |α − x_n|

if x_n is sufficiently close to α. The errors are decreasing with a linear rate of 0.106.

(I4) g(x) = (x + 5/x)/2, g'(x) = (1 − 5/x^2)/2, g'(α) = 0. The sequence x_n = g(x_{n−1}) will converge to √5, with an order of convergence bigger than 1.

Sometimes it is difficult to express the equation f(x) = 0 in the form x = g(x) such that the resulting iterates will converge. Such a process is presented in the following examples.

Example 1. Let x^4 − x − 1 = 0, rewritten as

x = (1 + x)^{1/4},

which provides us with the iteration

x_0 = 1,   x_{n+1} = (1 + x_n)^{1/4},   n ≥ 0

This sequence will converge to α ≈ 1.2207.

Example 2. Let x^3 + x − 1 = 0, rewritten as

x = 1/(1 + x^2)

and its fixed point iteration

x_0 = 1,   x_{n+1} = 1/(1 + x_n^2),   n ≥ 0,

which will converge to α ≈ 0.6823. The iterations are represented graphically in the following figure.

[Figure: cobweb diagram of x_{n+1} = 1/(1 + x_n^2) converging to α = 0.6823, followed by the four generic cases of x_{n+1} = g(x_n) against y = x: 0 < g'(α) < 1 (monotone convergence), −1 < g'(α) < 0 (oscillating convergence), g'(α) > 1 (monotone divergence), and g'(α) < −1 (oscillating divergence).]

Besides convergence, we would like to know how fast the sequence x_n = g(x_{n−1}) converges to the solution; in other words, how fast the error α − x_n is decreasing. We say that the sequence {x_n} converges to α with order of convergence p ≥ 1 if

|α − x_{n+1}| ≤ c |α − x_n|^p,   n ≥ 0,

where c ≥ 0 is a constant. The cases p = 1, p = 2, and p = 3 are called linear, quadratic, and cubic convergence. In the case of linear convergence, the constant c is called the rate of linear convergence, and we additionally require c < 1; otherwise the sequence of errors α − x_n can fail to converge to zero. Also, for linear convergence we can use the relation

|α − x_n| ≤ c^n |α − x_0|,   n ≥ 0.

Thus the bisection method is linearly convergent with rate 1/2, Newton's method is quadratically convergent, and the secant method has order of convergence p = (1 + √5)/2.

If |g'(α)| < 1, from the last theorem we have that the iterates x_n are at least linearly convergent. If, in addition, g'(α) ≠ 0, then we have exactly linear convergence with rate g'(α). In practice, the last theorem is rarely used, since it is quite difficult to find an interval [a, b] such that g([a, b]) ⊆ [a, b]. To simplify the usage of the theorem we consider the following corollary.

Corollary. Assume x = g(x) has a solution α, and further assume that both g(x) and g'(x) are continuous for all x in some interval about α. In addition, assume

|g'(α)| < 1     (**)

Then for any sufficiently small number ε > 0, the interval [a, b] = [α − ε, α + ε] will satisfy the hypotheses of the preceding theorem.

This means that if (**) is true, and if we choose x_0 sufficiently close to α, then the fixed point iteration x_{n+1} = g(x_n) will converge and the earlier results S1–S4 will all hold. The corollary does not tell us how close we need to be to α in order to have convergence.

NEWTON'S METHOD

Newton's method

x_{n+1} = x_n − f(x_n)/f'(x_n)

is a fixed point iteration with

g(x) = x − f(x)/f'(x)

Check its convergence by checking the condition (**):

g'(x) = 1 − f'(x)/f'(x) + f(x) f''(x)/[f'(x)]^2 = f(x) f''(x)/[f'(x)]^2

g'(α) = 0

Therefore Newton's method will converge if x_0 is chosen sufficiently close to α.

HIGHER ORDER METHODS

What happens when g'(α) = 0? We use Taylor's theorem to answer this question.

Begin by writing

g(x) = g(α) + g'(α)(x − α) + (1/2) g''(c)(x − α)^2

with c between x and α. Substitute x = x_n, and recall that g(x_n) = x_{n+1} and g(α) = α. Also assume g'(α) = 0. Then

x_{n+1} = α + (1/2) g''(c_n)(x_n − α)^2

α − x_{n+1} = −(1/2) g''(c_n)(x_n − α)^2

with c_n between α and x_n. Thus if g'(α) = 0, the fixed point iteration is quadratically convergent or better. In fact, if g''(α) ≠ 0, then the iteration is exactly quadratically convergent.

ANOTHER RAPID ITERATION

Newton's method is rapid, but requires use of the derivative f'(x). Can we get by without it? The answer is yes! Consider the method

D_n = [f(x_n + f(x_n)) − f(x_n)] / f(x_n)

x_{n+1} = x_n − f(x_n)/D_n

This is an approximation to Newton's method, with f'(x_n) ≈ D_n. To analyze its convergence, regard it as a fixed point iteration with

D(x) = [f(x + f(x)) − f(x)] / f(x)

g(x) = x − f(x)/D(x)

Then we can, with some difficulty, show g'(α) = 0 and g''(α) ≠ 0. This proves the new iteration is quadratically convergent.
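A minimal MATLAB sketch of this derivative-free iteration (the test equation x^2 − 5 = 0, the starting guess, and the tolerance are our own choices, for illustration):

    % Derivative-free Newton-like iteration: f'(x_n) is replaced by D_n
    f = @(x) x.^2 - 5;
    x = 2.5;                                  % assumed starting guess
    for n = 1:20
        D = (f(x + f(x)) - f(x)) / f(x);      % difference approximation to f'(x)
        x = x - f(x)/D;
        if abs(f(x)) < 1e-12, break, end
    end
    disp(x)                                   % approx. sqrt(5) = 2.23607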

FIXED POINT ITERATION: ERROR

Recall the result

lim_{n→∞} (α − x_n)/(α − x_{n−1}) = g'(α)

for the iteration

x_n = g(x_{n−1}),   n = 1, 2, ...

Thus

α − x_n ≈ λ (α − x_{n−1})     (***)

with λ = g'(α) and |λ| < 1.

If we were to know λ, then we could solve (***) for α:

α ≈ (x_n − λ x_{n−1}) / (1 − λ)

Usually, we write this as a modification of the currently computed iterate x_n:

α ≈ (x_n − λ x_{n−1}) / (1 − λ)
  = (x_n − λ x_n)/(1 − λ) + (λ x_n − λ x_{n−1})/(1 − λ)
  = x_n + (λ/(1 − λ)) [x_n − x_{n−1}]

The formula

x_n + (λ/(1 − λ)) [x_n − x_{n−1}]

is said to be an extrapolation of the numbers x_{n−1} and x_n. But what is λ?

From

lim_{n→∞} (α − x_n)/(α − x_{n−1}) = g'(α)

we have

λ ≈ (α − x_n)/(α − x_{n−1})

Unfortunately this also involves the unknown root α which we seek, and we must find some other way of estimating λ.

To calculate λ, consider the ratio

λ_n = (x_n − x_{n−1}) / (x_{n−1} − x_{n−2})

To see that this is approximately λ as x_n approaches α, write

(x_n − x_{n−1}) / (x_{n−1} − x_{n−2}) = (g(x_{n−1}) − g(x_{n−2})) / (x_{n−1} − x_{n−2}) = g'(c_n)

with c_n between x_{n−1} and x_{n−2}. As the iterates approach α, the number c_n must also approach α. Thus λ_n approaches λ as x_n → α.

We combine these results to obtain the estimate

x̂_n = x_n + (λ_n/(1 − λ_n)) [x_n − x_{n−1}],   λ_n = (x_n − x_{n−1})/(x_{n−1} − x_{n−2})

We call x̂_n the Aitken extrapolate of {x_{n−2}, x_{n−1}, x_n}; and α ≈ x̂_n.

We can also rewrite this as

α − x_n ≈ x̂_n − x_n = (λ_n/(1 − λ_n)) [x_n − x_{n−1}]

This is called Aitken's error estimation formula.

The accuracy of these procedures is tied directly to the accuracy of the formulas

α − x_n ≈ λ (α − x_{n−1}),   α − x_{n−1} ≈ λ (α − x_{n−2})

If these are accurate, then so are the above extrapolation and error estimation formulas.

EXAMPLE

Consider the iteration

x_{n+1} = 6.28 + sin(x_n),   n = 0, 1, 2, ...

for solving

x = 6.28 + sin x

Iterates are shown on the accompanying sheet, including calculations of λ_n and the error estimate

α − x_n ≈ x̂_n − x_n = (λ_n/(1 − λ_n)) [x_n − x_{n−1}]     (Estimate)

The latter is called "Estimate" in the table. In this instance,

g'(α) ≈ 0.9644,

and therefore the convergence is very slow. This is apparent in the table.

AITKEN'S ALGORITHM

Step 1: Select x_0.

Step 2: Calculate

x_1 = g(x_0),   x_2 = g(x_1)

Step 3: Calculate

x_3 = x_2 + (λ_2/(1 − λ_2)) [x_2 − x_1],   λ_2 = (x_2 − x_1)/(x_1 − x_0)

Step 4: Calculate

x_4 = g(x_3),   x_5 = g(x_4)

and calculate x_6 as the extrapolate of {x_3, x_4, x_5}. Continue this procedure, ad infinitum.

Of course in practice we will have some kind of error test to stop this procedure when we believe we have sufficient accuracy.
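A short MATLAB sketch of this algorithm for the iteration x_{n+1} = 6.28 + sin(x_n) used in the examples (the starting guess is assumed):

    % Aitken's algorithm for x = g(x) with g(x) = 6.28 + sin(x)
    g = @(x) 6.28 + sin(x);
    x = 6;                                    % assumed starting guess x0
    for k = 1:3                               % each pass produces x3, x6, ...
        x1 = g(x);  x2 = g(x1);               % two fixed point steps
        lam = (x2 - x1)/(x1 - x);             % estimate of g'(alpha)
        x = x2 + lam/(1 - lam)*(x2 - x1);     % Aitken extrapolate of {x, x1, x2}
    end
    disp(x)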

EXAMPLE

Consider again the iteration

x_{n+1} = 6.28 + sin(x_n),   n = 0, 1, 2, ...

for solving x = 6.28 + sin x. Now we use the Aitken method, and the results are shown in the accompanying table. With this we have

α − x_3 = 7.98 × 10^{−4},   α − x_6 = 2.27 × 10^{−6}

In comparison, the original iteration had

α − x_6 = 1.23 × 10^{−2}

GENERAL COMMENTS

Aitken extrapolation can greatly accelerate the convergence of a linearly convergent iteration

x_{n+1} = g(x_n)

This shows the power of understanding the behaviour of the error in a numerical process. From that understanding, we can often improve the accuracy, through extrapolation or some other procedure.

This is a justification for using mathematical analysis to understand numerical methods. We will see this repeated at later points in the course, and it holds with many different types of problems and numerical methods for their solution.

MULTIPLE ROOTS

We study two classes of functions for which there is additional difficulty in calculating their roots. The first of these are functions in which the desired root has a multiplicity greater than 1. What does this mean?

Let α be a root of the function f(x), and imagine writing it in the factored form

f(x) = (x − α)^m h(x)

with some integer m ≥ 1 and some continuous function h(x) for which h(α) ≠ 0. Then we say that α is a root of f(x) of multiplicity m. For example, the function

f(x) = e^{x^2} − 1

has x = 0 as a root of multiplicity m = 2. In particular, define

h(x) = (e^{x^2} − 1)/x^2

for x ≠ 0.

Using Taylor polynomial approximations, we can show for x ≠ 0 that

h(x) ≈ 1 + x^2/2 + x^4/6

lim_{x→0} h(x) = 1

This leads us to extend the definition of h(x) to

h(x) = (e^{x^2} − 1)/x^2,   x ≠ 0;   h(0) = 1

Thus

f(x) = x^2 h(x)

as asserted, and x = 0 is a root of f(x) of multiplicity m = 2.

Roots for which m = 1 are called simple roots, and the methods studied to this point were intended for such roots. We now consider the case m > 1.

If the function f(x) is m-times differentiable around α, then we can differentiate

f(x) = (x − α)^m h(x)

m times to obtain an equivalent formulation of what it means for the root to have multiplicity m.

As an example, consider the case

f(x) = (x − α)^3 h(x)

Then

f'(x) = 3(x − α)^2 h(x) + (x − α)^3 h'(x) ≡ (x − α)^2 h_2(x),
h_2(x) = 3h(x) + (x − α) h'(x),   h_2(α) = 3h(α) ≠ 0

This shows α is a root of f'(x) of multiplicity 2. Differentiating a second time, we can show

f''(x) = (x − α) h_3(x)

for a suitably defined h_3(x) with h_3(α) ≠ 0, and α is a simple root of f''(x).

Differentiating a third time, we have

f'''(α) = h_3(α) ≠ 0

We can use this as part of a proof of the following: α is a root of f(x) of multiplicity m = 3 if and only if

f(α) = f'(α) = f''(α) = 0,   f'''(α) ≠ 0

In general, α is a root of f(x) of multiplicity m if and only if

f(α) = · · · = f^{(m−1)}(α) = 0,   f^{(m)}(α) ≠ 0

DIFFICULTIES OF MULTIPLE ROOTS

There are two main difficulties with the numerical calculation of multiple roots (by which we mean m > 1 in the definition).

1. Methods such as Newton's method and the secant method converge more slowly than for the case of a simple root.

2. There is a large interval of uncertainty in the precise location of a multiple root on a computer or calculator.

The second of these is the more difficult to deal with, but we begin with the first, for the case of Newton's method.

Recall that we can regard Newton's method as a fixed point method:

x_{n+1} = g(x_n),   g(x) = x − f(x)/f'(x)

Then we substitute

f(x) = (x − α)^m h(x)

to obtain

g(x) = x − (x − α)^m h(x) / [m (x − α)^{m−1} h(x) + (x − α)^m h'(x)]
     = x − (x − α) h(x) / [m h(x) + (x − α) h'(x)]

Then we can use this to show

g'(α) = 1 − 1/m = (m − 1)/m

For m > 1, this is nonzero, and therefore Newton's method is only linearly convergent:

α − x_{n+1} ≈ λ (α − x_n),   λ = (m − 1)/m

Similar results hold for the secant method.

There are ways of improving the speed of convergence of Newton's method, creating a modified method that is again quadratically convergent. In particular, consider the fixed point iteration formula

x_{n+1} = g(x_n),   g(x) = x − m f(x)/f'(x),

in which we assume the multiplicity m of the root α being sought to be known. Then, modifying the above argument on the convergence of Newton's method, we obtain

g'(α) = 1 − m · (1/m) = 0

and the iteration method will be quadratically convergent.

But this is not the fundamental problem posed by multiple roots.
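A sketch of the modified iteration, applied to the example considered below, f(x) = (x − 1.1)^3 (x − 2.1) with m = 3 (the starting guess is our own choice):

    % Modified Newton for a root of multiplicity m = 3 at alpha = 1.1
    f  = @(x) (x - 1.1).^3 .* (x - 2.1);
    df = @(x) 3*(x - 1.1).^2 .* (x - 2.1) + (x - 1.1).^3;
    m = 3;  x = 0.8;
    for n = 1:6
        if f(x) == 0, break, end      % guard: we may land on the root exactly
        x = x - m*f(x)/df(x);         % multiplicity-corrected Newton step
    end
    disp(x)                           % converges quadratically to 1.1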

NOISE IN FUNCTION EVALUATION

Recall the discussion of noise in evaluating a function f(x), and in our case consider the evaluation for values of x near to α. In the accompanying figures, the noise as measured by vertical distance is the same in both graphs.

[Figure: the same band of vertical noise around y = f(x), shown near a simple root and near a double root.]

Noise was discussed earlier; as an example we used the function

f(x) = x^3 − 3x^2 + 3x − 1 ≡ (x − 1)^3

Because of the noise in evaluating f(x), it appears from the graph that f(x) has many zeros around x = 1, whereas the exact function outside of the computer has only the root α = 1, of multiplicity 3. Any rootfinding method to find a multiple root α that uses evaluation of f(x) is doomed to having a large interval of uncertainty as to the location of the root. If high accuracy is desired, then the only satisfactory solution is to reformulate the problem as a new problem F(x) = 0 in which α is a simple root of F. Then use a standard rootfinding method to calculate α. It is important that the evaluation of F(x) not involve f(x) directly, as that is the source of the noise and the uncertainty.

EXAMPLE

Consider finding the roots of

f(x) = (x − 1.1)^3 (x − 2.1) = 2.7951 − 8.954x + 10.56x^2 − 5.4x^3 + x^4

This has a root of multiplicity 3 at α = 1.1. Newton's method produces the iterates:

n   x_n        f(x_n)    α − x_n    Rate
0   0.800000   0.03510   0.300000
1   0.892857   0.01073   0.207143   0.690
2   0.958176   0.00325   0.141824   0.685
3   1.00344    0.00099   0.09656    0.681
4   1.03486    0.00029   0.06514    0.675
5   1.05581    0.00009   0.04419    0.678
6   1.07028    0.00003   0.02972    0.673
7   1.08092    0.0       0.01908    0.642

From an examination of the rate of linear convergence of Newton's method applied to this function, one can guess with high probability that the multiplicity is m = 3. Then form exactly the second derivative

f''(x) = 21.12 − 32.4x + 12x^2

Applying Newton's method to f''(x) = 0 with a guess of x_0 = 1 will lead to rapid convergence to α = 1.1.

In general, if we know the root α has multiplicity m > 1, then we can replace the problem by that of solving

f^{(m−1)}(x) = 0,

since α is a simple root of this equation.

STABILITY

Generally we expect the world to be stable. By this, we mean that if we make a small change in something, then we expect this to lead to other correspondingly small changes. In fact, if we think about this carefully, we know this need not be true. We now illustrate this for the case of rootfinding.

Consider the polynomial

f(x) = x^7 − 28x^6 + 322x^5 − 1960x^4 + 6769x^3 − 13132x^2 + 13068x − 5040

This has the exact roots {1, 2, 3, 4, 5, 6, 7}. Now consider the perturbed polynomial

F(x) = x^7 − 28.002x^6 + 322x^5 − 1960x^4 + 6769x^3 − 13132x^2 + 13068x − 5040

This is a relatively small change in one coefficient, of relative error

−.002/(−28) = 7.14 × 10^{−5}

What are the roots of F(x)?

Root of f(x)   Root of F(x)               Error
1              1.0000028                  −2.8E−6
2              1.9989382                   1.1E−3
3              3.0331253                  −0.033
4              3.8195692                   0.180
5              5.4586758 + .54012578i     −.46 − .54i
6              5.4586758 − .54012578i     −.46 + .54i
7              7.2330128                  −0.233
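These values can be reproduced in MATLAB with roots, which accepts the coefficient vector of a polynomial:

    % Compare the roots of f and of the perturbed polynomial F
    c  = [1 -28 322 -1960 6769 -13132 13068 -5040];  % coefficients of f
    cp = c;  cp(2) = -28.002;                        % perturb the x^6 coefficient
    [roots(c), roots(cp)]                            % two columns of 7 roots each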

Why have some of the roots departed so radically from the original values? This phenomenon goes under a variety of names. We sometimes say this is an example of an unstable or ill-conditioned rootfinding problem. These words are often used in a casual manner, but they also have a very precise meaning in many areas of numerical analysis (and more generally, in all of mathematics).

A PERTURBATION ANALYSIS

We want to study what happens to the root of a function f(x) when it is perturbed by a small amount. For some function g(x) and for all small ε, define a perturbed function

F_ε(x) = f(x) + ε g(x)

The polynomial example would fit this if we use

g(x) = x^6,   ε = −.002

Let α_0 be a simple root of f(x). It can be shown (using the implicit function theorem from calculus) that if f(x) and g(x) are differentiable for x ≈ α_0, and if f'(α_0) ≠ 0, then F_ε(x) has a unique simple root α(ε) near to α_0 = α(0) for all small values of ε. Moreover, α(ε) will be a differentiable function of ε. We use this to estimate α(ε).

The linear Taylor polynomial approximation of α(ε) is given by

α(ε) ≈ α(0) + ε α'(0)

We need to find a formula for α'(0). Recall that

F_ε(α(ε)) = 0

for all small values of ε. Differentiate this as a function of ε, using the chain rule. Then we obtain

f'(α(ε)) α'(ε) + g(α(ε)) + ε g'(α(ε)) α'(ε) = 0

for all small ε. Substitute ε = 0, recall α(0) = α_0, and solve for α'(0) to obtain

f'(α_0) α'(0) + g(α_0) = 0
α'(0) = −g(α_0)/f'(α_0)

This then leads to

α(ε) ≈ α(0) + ε α'(0) = α_0 − ε g(α_0)/f'(α_0)     (*)

Example. In our earlier polynomial example, consider the simple root α_0 = 3. Then g(α_0) = 3^6 = 729 and f'(3) = 48, so

α(ε) ≈ 3 − ε · 729/48 ≈ 3 − 15.2 ε

With ε = −.002, we obtain

α(−.002) ≈ 3 − 15.2 × (−.002) ≈ 3.0304

This is close to the actual root 3.0331253 of F(x).

However, the approximation (*) is not good at estimating the change in the roots 5 and 6. By observation, the perturbation in those roots is a complex number, whereas the formula (*) predicts only a real perturbation. The value of ε is too large for (*) to be accurate for the roots 5 and 6.

DISCUSSION

Looking again at the formula

α(ε) ≈ α_0 − ε g(α_0)/f'(α_0),

we have that the size of

ε g(α_0)/f'(α_0)

is an indication of the stability of the solution α_0. If this quantity is large, then potentially we will have difficulty. Of course, not all functions g(x) are equally likely, and we need to look only at functions g(x) that will possibly occur in practice.

One quantity of interest is the size of f'(α_0). If it is very small relative to ε g(α_0), then we are likely to have difficulty in finding α_0 accurately.

INTERPOLATION

Interpolation is a process of finding a formula (often a polynomial) whose graph will pass through a given set of points (x, y).

As an example, consider defining

x_0 = 0,   x_1 = π/4,   x_2 = π/2

and

y_i = cos x_i,   i = 0, 1, 2

This gives us the three points

(0, 1),   (π/4, 1/√2),   (π/2, 0)

Now find a quadratic polynomial

p(x) = a_0 + a_1 x + a_2 x^2

for which

p(x_i) = y_i,   i = 0, 1, 2

The graph of this polynomial is shown in the accompanying figure. We later give an explicit formula.

[Figure: quadratic interpolation of cos(x), showing y = cos(x) and y = p_2(x) on [0, π/2] with nodes at 0, π/4, π/2.]

PURPOSES OF INTERPOLATION

1. Replace a set of data points {(x_i, y_i)} with a function given analytically.

2. Approximate functions with simpler ones, usually polynomials or 'piecewise polynomials'.

Purpose #1 has several aspects.

• The data may be from a known class of functions. Interpolation is then used to find the member of this class of functions that agrees with the given data. For example, data may be generated from functions of the form

p(x) = a_0 + a_1 e^x + a_2 e^{2x} + · · · + a_n e^{nx}

Then we need to find the coefficients {a_j} based on the given data values.

• We may want to take function values f(x) given in a table for selected values of x, often equally spaced, and extend the function to values of x not in the table. For example, given numbers from a table of logarithms, estimate the logarithm of a number x not in the table.

• Given a set of data points {(x_i, y_i)}, find a curve passing through these points that is "pleasing to the eye". In fact, this is what is done continually with computer graphics. How do we connect a set of points to make a smooth curve? Connecting them with straight line segments will often give a curve with many corners, whereas what was intended was a smooth curve.

Purpose #2 for interpolation is to approximate functions f(x) by simpler functions p(x), perhaps to make it easier to integrate or differentiate f(x). That will be the primary reason for studying interpolation in this course.

As an example of why this is important, consider the problem of evaluating

I = ∫_0^1 dx/(1 + x^{10})

This is very difficult to do analytically. But we will look at producing polynomial interpolants of the integrand; and polynomials are easily integrated exactly.

We begin by using polynomials as our means of doing interpolation. Later in the chapter, we consider more complex 'piecewise polynomial' functions, often called 'spline functions'.

LINEAR INTERPOLATION

The simplest form of interpolation is probably the straight line, connecting two points by a straight line.

Let two data points (x_0, y_0) and (x_1, y_1) be given. There is a unique straight line passing through these points. We can write the formula for a straight line as

P_1(x) = a_0 + a_1 x

In fact, there are other more convenient ways to write it, and we give several of them below:

P_1(x) = ((x − x_1)/(x_0 − x_1)) y_0 + ((x − x_0)/(x_1 − x_0)) y_1
       = [(x_1 − x) y_0 + (x − x_0) y_1] / (x_1 − x_0)
       = y_0 + ((x − x_0)/(x_1 − x_0)) [y_1 − y_0]
       = y_0 + ((y_1 − y_0)/(x_1 − x_0)) (x − x_0)

Check each of these by evaluating them at x = x_0 and x_1 to see if the respective values are y_0 and y_1.

Example. Following is a table of values of f(x) = tan x for a few values of x:

x       1        1.1      1.2      1.3
tan x   1.5574   1.9648   2.5722   3.6021

Use linear interpolation to estimate tan(1.15). We use

x_0 = 1.1,   x_1 = 1.2

with the corresponding values for y_0 and y_1. Then

tan x ≈ y_0 + ((x − x_0)/(x_1 − x_0)) [y_1 − y_0]

tan(1.15) ≈ 1.9648 + ((1.15 − 1.1)/(1.2 − 1.1)) [2.5722 − 1.9648] = 2.2685

The true value is tan 1.15 = 2.2345. We will want to examine formulas for the error in interpolation, to know when we have sufficient accuracy in our interpolant.
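The same estimate can be obtained with MATLAB's interp1, whose default method is piecewise linear interpolation:

    % Linear interpolation of the tan table at x = 1.15
    x = [1 1.1 1.2 1.3];
    y = [1.5574 1.9648 2.5722 3.6021];
    interp1(x, y, 1.15)      % returns 2.2685; the true value is tan(1.15) = 2.2345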

[Figure: y = tan(x) on [1, 1.3], and the linear interpolant p_1(x) through the nodes 1.1 and 1.2.]

QUADRATIC INTERPOLATION

We want to find a polynomial

P_2(x) = a_0 + a_1 x + a_2 x^2

which satisfies

P_2(x_i) = y_i,   i = 0, 1, 2

for given data points (x_0, y_0), (x_1, y_1), (x_2, y_2). One formula for such a polynomial is

P_2(x) = y_0 L_0(x) + y_1 L_1(x) + y_2 L_2(x)     (∗∗)

with

L_0(x) = (x − x_1)(x − x_2) / [(x_0 − x_1)(x_0 − x_2)]
L_1(x) = (x − x_0)(x − x_2) / [(x_1 − x_0)(x_1 − x_2)]
L_2(x) = (x − x_0)(x − x_1) / [(x_2 − x_0)(x_2 − x_1)]

The formula (∗∗) is called Lagrange's form of the interpolation polynomial.

LAGRANGE BASIS FUNCTIONS

The functions

L_0(x) = (x − x_1)(x − x_2) / [(x_0 − x_1)(x_0 − x_2)]
L_1(x) = (x − x_0)(x − x_2) / [(x_1 − x_0)(x_1 − x_2)]
L_2(x) = (x − x_0)(x − x_1) / [(x_2 − x_0)(x_2 − x_1)]

are called 'Lagrange basis functions' for quadratic interpolation. They have the properties

L_i(x_j) = 1 if i = j,  and 0 if i ≠ j,

for i, j = 0, 1, 2. Also, they all have degree 2. Their graphs are on an accompanying page.

As a consequence of each L_i(x) being of degree 2, we have that the interpolant

P_2(x) = y_0 L_0(x) + y_1 L_1(x) + y_2 L_2(x)

must have degree ≤ 2.

UNIQUENESS

Can there be another polynomial, call it Q(x), for which

deg(Q) ≤ 2,   Q(x_i) = y_i,   i = 0, 1, 2?

That is, is the Lagrange formula P_2(x) unique?

Introduce

R(x) = P_2(x) − Q(x)

From the properties of P_2 and Q, we have deg(R) ≤ 2. Moreover,

R(x_i) = P_2(x_i) − Q(x_i) = y_i − y_i = 0

for all three node points x_0, x_1, and x_2. How many polynomials R(x) are there of degree at most 2 having three distinct zeros? The answer is that only the zero polynomial satisfies these properties, and therefore

R(x) = 0 for all x,   i.e.   Q(x) = P_2(x) for all x

SPECIAL CASES

Consider the data points

(x_0, 1), (x_1, 1), (x_2, 1)

What is the polynomial P_2(x) in this case?

Answer: the polynomial interpolant is

P_2(x) ≡ 1,

meaning that P_2(x) is the constant function. Why? First, the constant function satisfies the property of being of degree ≤ 2. Next, it clearly interpolates the given data. Therefore, by the uniqueness of quadratic interpolation, P_2(x) must be the constant function 1.

Consider now the data points

(x_0, m x_0), (x_1, m x_1), (x_2, m x_2)

for some constant m. What is P_2(x) in this case? By an argument similar to that above,

P_2(x) = m x for all x

Thus the degree of P_2(x) can be less than 2.

HIGHER DEGREE INTERPOLATION

We consider now the case of interpolation by polynomials of a general degree n. We want to find a polynomial P_n(x) for which

deg(P_n) ≤ n,   P_n(x_i) = y_i,   i = 0, 1, ..., n     (∗∗)

with given data points

(x_0, y_0), (x_1, y_1), ..., (x_n, y_n)

The solution is given by Lagrange's formula

P_n(x) = y_0 L_0(x) + y_1 L_1(x) + · · · + y_n L_n(x)

The Lagrange basis functions are given by

L_k(x) = [(x − x_0) ··· (x − x_{k−1})(x − x_{k+1}) ··· (x − x_n)] / [(x_k − x_0) ··· (x_k − x_{k−1})(x_k − x_{k+1}) ··· (x_k − x_n)]

for k = 0, 1, 2, ..., n. The quadratic case was covered earlier.

In a manner analogous to the quadratic case, we can show that the above P_n(x) is the only solution to the problem (∗∗).

In the formula

L_k(x) = [(x − x_0) ··· (x − x_{k−1})(x − x_{k+1}) ··· (x − x_n)] / [(x_k − x_0) ··· (x_k − x_{k−1})(x_k − x_{k+1}) ··· (x_k − x_n)]

we can see that each such function is a polynomial of degree n. In addition,

L_k(x_i) = 1 if k = i,  and 0 if k ≠ i

Using these properties, it follows that the formula

P_n(x) = y_0 L_0(x) + y_1 L_1(x) + · · · + y_n L_n(x)

satisfies the interpolation problem of finding a solution to

deg(P_n) ≤ n,   P_n(x_i) = y_i,   i = 0, 1, ..., n

EXAMPLE

Recall the table

x       1        1.1      1.2      1.3
tan x   1.5574   1.9648   2.5722   3.6021

We now interpolate this table with the nodes

x_0 = 1,   x_1 = 1.1,   x_2 = 1.2,   x_3 = 1.3

Without giving the details of the evaluation process, we have the following results for interpolation with degrees n = 1, 2, 3:

n            1        2        3
P_n(1.15)    2.2685   2.2435   2.2296
Error       −.0340   −.0090    .0049

The result improves with increasing degree n, but not at a very rapid rate. In fact, the error becomes worse when n is increased further. Later we will see that interpolation of a much higher degree, say n ≥ 10, is often poorly behaved when the node points {x_i} are evenly spaced.
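One way to reproduce the n = 3 entry in MATLAB: polyfit through n + 1 points with degree n gives the interpolating polynomial.

    % Degree-3 polynomial interpolation of the tan table, evaluated at 1.15
    x = [1 1.1 1.2 1.3];
    y = [1.5574 1.9648 2.5722 3.6021];
    p = polyfit(x, y, 3);    % 4 points, degree 3: interpolates the data exactly
    polyval(p, 1.15)         % approx. 2.2296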

A FIRST ORDER DIVIDED DIFFERENCE

For a given function f(x) and two distinct points x_0 and x_1, define

f[x_0, x_1] = (f(x_1) − f(x_0)) / (x_1 − x_0)

This is called a first order divided difference of f(x). By the mean-value theorem,

f(x_1) − f(x_0) = f'(c)(x_1 − x_0)

for some c between x_0 and x_1. Thus

f[x_0, x_1] = f'(c),

and the divided difference is very much like the derivative, especially if x_0 and x_1 are quite close together. In fact,

f'((x_1 + x_0)/2) ≈ f[x_0, x_1]

is quite an accurate approximation of the derivative.

SECOND ORDER DIVIDED DIFFERENCES

Given three distinct points x_0, x_1, and x_2, define

f[x_0, x_1, x_2] = (f[x_1, x_2] − f[x_0, x_1]) / (x_2 − x_0)

This is called the second order divided difference of f(x). By a fairly complicated argument, we can show

f[x_0, x_1, x_2] = (1/2) f''(c)

for some c intermediate to x_0, x_1, and x_2. In fact, as we will investigate,

f''(x_1) ≈ 2 f[x_0, x_1, x_2]

in the case the nodes are evenly spaced, x_1 − x_0 = x_2 − x_1.

EXAMPLE

Consider the table

x       1        1.1      1.2      1.3      1.4
cos x   .54030   .45360   .36236   .26750   .16997

Let x_0 = 1, x_1 = 1.1, and x_2 = 1.2. Then

f[x_0, x_1] = (.45360 − .54030)/(1.1 − 1) = −.86700
f[x_1, x_2] = (.36236 − .45360)/(1.2 − 1.1) = −.91240

f[x_0, x_1, x_2] = (f[x_1, x_2] − f[x_0, x_1]) / (x_2 − x_0)
                 = (−.91240 − (−.86700))/(1.2 − 1.0) = −.22700

For comparison,

f'((x_1 + x_0)/2) = −sin(1.05) = −.86742

(1/2) f''(x_1) = −(1/2) cos(1.1) = −.22680

GENERAL DIVIDED DIFFERENCES

Given n + 1 distinct points x_0, ..., x_n, with n ≥ 2, define

f[x_0, ..., x_n] = (f[x_1, ..., x_n] − f[x_0, ..., x_{n−1}]) / (x_n − x_0)

This is a recursive definition of the nth-order divided difference of f(x), using divided differences of order n − 1. Its relation to the derivative is as follows:

f[x_0, ..., x_n] = (1/n!) f^{(n)}(c)

for some c intermediate to the points {x_0, ..., x_n}. Let I denote the interval

I = [min{x_0, ..., x_n}, max{x_0, ..., x_n}]

Then c ∈ I, and the above result is based on the assumption that f(x) is n-times continuously differentiable on the interval I.

EXAMPLE

The following table gives divided differences for the data in

x       1        1.1      1.2      1.3      1.4
cos x   .54030   .45360   .36236   .26750   .16997

For the column headings, we use

D^k f(x_i) = f[x_i, ..., x_{i+k}]

i   x_i   f(x_i)   Df(x_i)   D^2 f(x_i)   D^3 f(x_i)   D^4 f(x_i)
0   1.0   .54030   −.8670    −.2270        .1533        .0125
1   1.1   .45360   −.9124    −.1810        .1583
2   1.2   .36236   −.9486    −.1335
3   1.3   .26750   −.9753
4   1.4   .16997

These were computed using the recursive definition

f[x_0, ..., x_n] = (f[x_1, ..., x_n] − f[x_0, ..., x_{n−1}]) / (x_n − x_0)
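A sketch of how such a table can be built column by column in MATLAB (the array name D is ours; row 1 ends up holding f[x_0], f[x_0, x_1], ...):

    % Divided difference table for f(x) = cos(x) at the five nodes
    x = [1.0 1.1 1.2 1.3 1.4];
    n = numel(x);
    D = zeros(n);  D(:,1) = cos(x)';      % column 1: the values f(x_i)
    for k = 2:n                           % column k: differences of order k-1
        for i = 1:n-k+1
            D(i,k) = (D(i+1,k-1) - D(i,k-1)) / (x(i+k-1) - x(i));
        end
    end
    D(1,:)    % .54030  -.8670  -.2270  .1533  .0125, as in the table above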

ORDER OF THE NODES

Looking at f[x_0, x_1], we have

f[x_0, x_1] = (f(x_1) − f(x_0))/(x_1 − x_0) = (f(x_0) − f(x_1))/(x_0 − x_1) = f[x_1, x_0]

The order of x_0 and x_1 does not matter. Looking at

f[x_0, x_1, x_2] = (f[x_1, x_2] − f[x_0, x_1])/(x_2 − x_0),

we can expand it to get

f[x_0, x_1, x_2] = f(x_0)/[(x_0 − x_1)(x_0 − x_2)] + f(x_1)/[(x_1 − x_0)(x_1 − x_2)] + f(x_2)/[(x_2 − x_0)(x_2 − x_1)]

With this formula, we can show that the order of the arguments x_0, x_1, x_2 does not matter in the final value of f[x_0, x_1, x_2] we obtain. Mathematically,

f[x_0, x_1, x_2] = f[x_{i_0}, x_{i_1}, x_{i_2}]

for any permutation (i_0, i_1, i_2) of (0, 1, 2).

We can show in general that the value of f[x_0, ..., x_n] is independent of the order of the arguments {x_0, ..., x_n}, even though the intermediate steps in its calculation using

f[x_0, ..., x_n] = (f[x_1, ..., x_n] − f[x_0, ..., x_{n−1}]) / (x_n − x_0)

are order dependent.

We can show

f[x_0, ..., x_n] = f[x_{i_0}, ..., x_{i_n}]

for any permutation (i_0, i_1, ..., i_n) of (0, 1, ..., n).

COINCIDENT NODES

What happens when some of the nodes {x_0, ..., x_n} are not distinct? Begin by investigating what happens when they all come together as a single point x_0.

For first order divided differences, we have

lim_{x_1→x_0} f[x_0, x_1] = lim_{x_1→x_0} (f(x_1) − f(x_0))/(x_1 − x_0) = f'(x_0)

We extend the definition of f[x_0, x_1] to coincident nodes using

f[x_0, x_0] = f'(x_0)

For second order divided differences, recall

f[x_0, x_1, x_2] = (1/2) f''(c)

with c intermediate to x_0, x_1, and x_2. Then as x_1 → x_0 and x_2 → x_0, we must also have c → x_0. Therefore,

lim_{x_1,x_2→x_0} f[x_0, x_1, x_2] = (1/2) f''(x_0)

We therefore define

f[x_0, x_0, x_0] = (1/2) f''(x_0)

For the case of general f[x_0, ..., x_n], recall that

f[x_0, ..., x_n] = (1/n!) f^{(n)}(c)

for some c intermediate to {x_0, ..., x_n}. Then

lim_{{x_1,...,x_n}→x_0} f[x_0, ..., x_n] = (1/n!) f^{(n)}(x_0),

and we define

f[x_0, ..., x_0]  (n + 1 arguments)  = (1/n!) f^{(n)}(x_0)

What do we do when only some of the nodes are coincident? This too can be dealt with, although we do so here only by example:

f[x_0, x_1, x_1] = (f[x_1, x_1] − f[x_0, x_1]) / (x_1 − x_0) = (f'(x_1) − f[x_0, x_1]) / (x_1 − x_0)

The recursion formula can be used in general in this way to allow all possible combinations of possibly coincident nodes.

LAGRANGE'S FORMULA FOR THE INTERPOLATION POLYNOMIAL

Recall the general interpolation problem: find a polynomial P_n(x) for which

deg(P_n) ≤ n,   P_n(x_i) = y_i,   i = 0, 1, ..., n

with given data points

(x_0, y_0), (x_1, y_1), ..., (x_n, y_n)

and with {x_0, ..., x_n} distinct points. The solution to this problem is given by Lagrange's formula

P_n(x) = y_0 L_0(x) + y_1 L_1(x) + · · · + y_n L_n(x)

with {L_0(x), ..., L_n(x)} the Lagrange basis polynomials. Each L_j is of degree n and satisfies

L_j(x_i) = 1 if j = i,  and 0 if j ≠ i,

for i = 0, 1, ..., n.

THE NEWTON DIVIDED DIFFERENCE FORM OF THE INTERPOLATION POLYNOMIAL

Let the data values for the problem

deg(P_n) ≤ n,   P_n(x_i) = y_i,   i = 0, 1, ..., n

be generated from a function f(x):

y_i = f(x_i),   i = 0, 1, ..., n

Using the divided differences

f[x_0, x_1], f[x_0, x_1, x_2], ..., f[x_0, ..., x_n]

we can write the interpolation polynomials

P_1(x), P_2(x), ..., P_n(x)

in a way that is simple to compute:

P_1(x) = f(x_0) + f[x_0, x_1](x − x_0)
P_2(x) = f(x_0) + f[x_0, x_1](x − x_0) + f[x_0, x_1, x_2](x − x_0)(x − x_1)
       = P_1(x) + f[x_0, x_1, x_2](x − x_0)(x − x_1)

For the case of the general problem, we have

P_n(x) = f(x_0) + f[x_0, x_1](x − x_0)
       + f[x_0, x_1, x_2](x − x_0)(x − x_1)
       + f[x_0, x_1, x_2, x_3](x − x_0)(x − x_1)(x − x_2)
       + · · ·
       + f[x_0, ..., x_n](x − x_0) · · · (x − x_{n−1})

From this we have the recursion relation

P_n(x) = P_{n−1}(x) + f[x_0, ..., x_n](x − x_0) · · · (x − x_{n−1}),

in which P_{n−1}(x) interpolates f(x) at the points in {x_0, ..., x_{n−1}}.

Example. Recall the table

i   x_i   f(x_i)   Df(x_i)   D^2 f(x_i)   D^3 f(x_i)   D^4 f(x_i)
0   1.0   .54030   −.8670    −.2270        .1533        .0125
1   1.1   .45360   −.9124    −.1810        .1583
2   1.2   .36236   −.9486    −.1335
3   1.3   .26750   −.9753
4   1.4   .16997

with D^k f(x_i) = f[x_i, ..., x_{i+k}], k = 1, 2, 3, 4. Then

P_1(x) = .5403 − .8670(x − 1)
P_2(x) = P_1(x) − .2270(x − 1)(x − 1.1)
P_3(x) = P_2(x) + .1533(x − 1)(x − 1.1)(x − 1.2)
P_4(x) = P_3(x) + .0125(x − 1)(x − 1.1)(x − 1.2)(x − 1.3)

Using this table and these formulas, we have the following table of interpolants for the value x = 1.05. The true value is cos(1.05) = .49757105.

n            1         2         3          4
P_n(1.05)    .49695    .49752    .49758     .49757
Error        6.20E−4   5.00E−5   −1.00E−5   0.0

EVALUATION OF THE DIVIDED DIFFERENCE INTERPOLATION POLYNOMIAL

Let

d_1 = f[x_0, x_1]
d_2 = f[x_0, x_1, x_2]
...
d_n = f[x_0, ..., x_n]

Then the formula

P_n(x) = f(x_0) + f[x_0, x_1](x − x_0) + f[x_0, x_1, x_2](x − x_0)(x − x_1) + · · · + f[x_0, ..., x_n](x − x_0) · · · (x − x_{n−1})

can be written as

P_n(x) = f(x_0) + (x − x_0)(d_1 + (x − x_1)(d_2 + · · · + (x − x_{n−2})(d_{n−1} + (x − x_{n−1}) d_n) · · · ))

Thus we have a nested polynomial evaluation, and this is quite efficient in computational cost.
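Continuing the sketch above (x, D and n as computed in the divided difference example), the nested form is evaluated at a point t with a single loop:

    % Nested evaluation of the Newton form of P_4 at t = 1.05
    t = 1.05;
    p = D(1,n);                       % innermost coefficient d_n
    for k = n-1:-1:1
        p = D(1,k) + (t - x(k))*p;    % work outward through the brackets
    end
    disp(p)                           % approx. cos(1.05) = .49757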

ERROR IN LINEAR INTERPOLATION

Let P_1(x) denote the linear polynomial interpolating f(x) at x_0 and x_1, with f(x) a given function (e.g. f(x) = cos x). What is the error f(x) − P_1(x)?

Let f(x) be twice continuously differentiable on an interval [a, b] which contains the points {x_0, x_1}. Then for a ≤ x ≤ b,

f(x) − P_1(x) = ((x − x_0)(x − x_1)/2) f''(c_x)

for some c_x between the minimum and maximum of x_0, x_1, and x.

If x_1 and x are 'close to x_0', then

f(x) − P_1(x) ≈ ((x − x_0)(x − x_1)/2) f''(x_0)

Thus the error acts like a quadratic polynomial, with zeros at x_0 and x_1.

Page 479: Analiza Numerica [Utm, Bostan v.]

EXAMPLE

Let f(x) = log10 x; and in line with typical tables of

log10 x, we take 1 ≤ x, x0, x1 ≤ 10. For definiteness, let x0 < x1 with h = x1 − x0. Then

f''(x) = −(log10 e)/x²

log10 x − P1(x) = [(x − x0)(x − x1)/2]·[−(log10 e)/cx²]
                = (x − x0)(x1 − x)·[(log10 e)/(2cx²)]

We usually are interpolating with x0 ≤ x ≤ x1; and in that case, we have

(x − x0)(x1 − x) ≥ 0,  x0 ≤ cx ≤ x1

Page 480: Analiza Numerica [Utm, Bostan v.]

(x− x0) (x1 − x) ≥ 0, x0 ≤ cx ≤ x1

and therefore

(x − x0)(x1 − x)·[(log10 e)/(2x1²)] ≤ log10 x − P1(x) ≤ (x − x0)(x1 − x)·[(log10 e)/(2x0²)]

For h = x1 − x0 small, we have for x0 ≤ x ≤ x1

log10 x − P1(x) ≈ (x − x0)(x1 − x)·[(log10 e)/(2x0²)]

Typical high school algebra textbooks contain tables

of log10 x with a spacing of h = .01. What is the

error in this case? To look at this, we use

0 ≤ log10 x − P1(x) ≤ (x − x0)(x1 − x)·[(log10 e)/(2x0²)]

Page 481: Analiza Numerica [Utm, Bostan v.]

By simple geometry or calculus,

max over x0 ≤ x ≤ x1 of (x − x0)(x1 − x) = h²/4

Therefore,

0 ≤ log10 x − P1(x) ≤ (h²/4)·[(log10 e)/(2x0²)] ≐ .0543 h²/x0²

If we want a uniform bound for all points 1 ≤ x0 ≤ 10, we have

0 ≤ log10 x − P1(x) ≤ h²(log10 e)/8 ≐ .0543 h²

For h = .01, as is typical of the high school textbook tables of log10 x,

0 ≤ log10 x − P1(x) ≤ 5.43 × 10⁻⁶
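A quick numerical check of this bound (my own illustration): interpolate log10 linearly on one table subinterval near x0 = 1, where the error is largest, and measure the worst error.

h  = 0.01;  x0 = 1;  x1 = x0 + h;
x  = linspace(x0, x1, 1001);
P1 = log10(x0) + (log10(x1) - log10(x0)) * (x - x0) / h;   % linear interpolant
max(log10(x) - P1)        % about 5.4e-6, matching the bound above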

Page 482: Analiza Numerica [Utm, Bostan v.]

If you look at most tables, a typical entry is given to

only four decimal places to the right of the decimal

point, e.g.

log10 5.41 ≐ .7332

Therefore the entries are in error by as much as .00005.

Comparing this with the interpolation error, we see the

latter is less important than the rounding errors in the

table entries.

From the bound

0 ≤ log10 x − P1(x) ≤ h²(log10 e)/(8x0²) ≐ .0543 h²/x0²

we see the error decreases as x0 increases, and it is

about 100 times smaller for points near 10 than for

points near 1.

Page 483: Analiza Numerica [Utm, Bostan v.]

AN ERROR FORMULA:

THE GENERAL CASE

Recall the general interpolation problem: find a polynomial Pn(x) for which

deg(Pn) ≤ n,  Pn(xi) = f(xi),  i = 0, 1, ..., n

with distinct node points {x0, ..., xn} and a given function f(x). Let [a, b] be a given interval on which

f(x) is (n+ 1)-times continuously differentiable; and

assume the points x0, ..., xn, and x are contained in

[a, b]. Then

f(x) − Pn(x) = [(x − x0)(x − x1) ··· (x − xn)/(n + 1)!] f^(n+1)(cx)

with cx some point between the minimum and maxi-

mum of the points in {x, x0, ..., xn}.

Page 484: Analiza Numerica [Utm, Bostan v.]

f(x) − Pn(x) = [(x − x0)(x − x1) ··· (x − xn)/(n + 1)!] f^(n+1)(cx)

As shorthand, introduce

Ψn(x) = (x − x0)(x − x1) ··· (x − xn)

a polynomial of degree n + 1 with roots {x0, ..., xn}. Then

f(x) − Pn(x) = [Ψn(x)/(n + 1)!] f^(n+1)(cx)

Page 485: Analiza Numerica [Utm, Bostan v.]

THE QUADRATIC CASE

For n = 2, we have

f(x) − P2(x) = [(x − x0)(x − x1)(x − x2)/3!] f'''(cx)   (*)

with cx some point between the minimum and maxi-

mum of the points in {x, x0, x1, x2}.

To illustrate the use of this formula, consider the case

of evenly spaced nodes:

x1 = x0 + h, x2 = x1 + h

Further suppose we have x0 ≤ x ≤ x2, as we would

usually have when interpolating in a table of given

function values (e.g. log10 x). The quantity

Ψ2(x) = (x− x0) (x− x1) (x− x2)

can be evaluated directly for a particular x.

Page 486: Analiza Numerica [Utm, Bostan v.]

Graph of Ψ2(x) = (x + h)x(x − h), using (x0, x1, x2) = (−h, 0, h):

[Figure: Ψ2(x), a cubic with zeros at −h, 0, h]

Page 487: Analiza Numerica [Utm, Bostan v.]

In the formula (*), however, we do not know cx, and therefore we replace |f'''(cx)| with a maximum of |f'''(x)| as x varies over x0 ≤ x ≤ x2. This yields

|f(x) − P2(x)| ≤ [|Ψ2(x)|/3!] · max over x0 ≤ x ≤ x2 of |f'''(x)|   (**)

If we want a uniform bound for x0 ≤ x ≤ x2, we must compute

max over x0 ≤ x ≤ x2 of |Ψ2(x)| = max over x0 ≤ x ≤ x2 of |(x − x0)(x − x1)(x − x2)|

Using calculus,

max over x0 ≤ x ≤ x2 of |Ψ2(x)| = 2h³/(3√3),  attained at x = x1 ± h/√3

Combined with (**), this yields

|f(x) − P2(x)| ≤ [h³/(9√3)] · max over x0 ≤ x ≤ x2 of |f'''(x)|

for x0 ≤ x ≤ x2.

Page 488: Analiza Numerica [Utm, Bostan v.]

For f(x) = log10 x, with 1 ≤ x0 ≤ x ≤ x2 ≤ 10, this leads to

|log10 x − P2(x)| ≤ [h³/(9√3)] · max over x0 ≤ x ≤ x2 of [2(log10 e)/x³] = .05572 h³/x0³

For the case of h = .01, we have

|log10 x − P2(x)| ≤ 5.57 × 10⁻⁸/x0³ ≤ 5.57 × 10⁻⁸

Page 489: Analiza Numerica [Utm, Bostan v.]

Question: How much larger could we make h so that

quadratic interpolation would have an error compa-

rable to that of linear interpolation of log10 x with

h = .01? The error bound for the linear interpolation

was 5.43 × 10⁻⁶, and therefore we want the same to be true of quadratic interpolation. Using a simpler

bound, we want to find h so that

|log10 x − P2(x)| ≤ .05572 h³ ≤ 5 × 10⁻⁶

This is true if h = .04477. Therefore a spacing of

h = .04 would be sufficient. A table with this spac-

ing and quadratic interpolation would have an error

comparable to a table with h = .01 and linear inter-

polation.

Page 490: Analiza Numerica [Utm, Bostan v.]

For the case of general n,

f(x) − Pn(x) = [(x − x0) ··· (x − xn)/(n + 1)!] f^(n+1)(cx) = [Ψn(x)/(n + 1)!] f^(n+1)(cx)

Ψn(x) = (x − x0)(x − x1) ··· (x − xn)

with cx some point between the minimum and maximum of the points in {x, x0, ..., xn}. When bounding the error we replace f^(n+1)(cx) with its maximum over the interval containing {x, x0, ..., xn}, as we have illustrated earlier in the linear and quadratic cases.

Consider now the function

Ψn(x)/(n + 1)!

over the interval determined by the minimum and maximum of the points in {x, x0, ..., xn}. For evenly spaced node points on [0, 1], with x0 = 0 and xn = 1, we give graphs for n = 2, 3, 4, 5 and for n = 6, 7, 8, 9 on accompanying pages.

Page 491: Analiza Numerica [Utm, Bostan v.]

DISCUSSION OF ERROR

Consider the error

f(x) − Pn(x) = [(x − x0) ··· (x − xn)/(n + 1)!] f^(n+1)(cx) = [Ψn(x)/(n + 1)!] f^(n+1)(cx)

Ψn(x) = (x − x0)(x − x1) ··· (x − xn)

as n increases and as x varies. As noted previously, we cannot do much with f^(n+1)(cx) except to replace it with a maximum value of |f^(n+1)(x)| over a suitable interval. Thus we concentrate on understanding the size of

Ψn(x)/(n + 1)!

Page 492: Analiza Numerica [Utm, Bostan v.]

ERROR FOR EVENLY SPACED NODES

We consider first the case in which the node points

are evenly spaced, as this seems the ‘natural’ way to

define the points at which interpolation is carried out.

Moreover, using evenly spaced nodes is the case to

consider for table interpolation. What can we learn

from the given graphs?

The interpolation nodes are determined by using

h = 1/n,  x0 = 0, x1 = h, x2 = 2h, ..., xn = nh = 1

For this case,

Ψn(x) = x(x − h)(x − 2h) ··· (x − 1)

Our graphs are the cases of n = 2, ..., 9.

Page 493: Analiza Numerica [Utm, Bostan v.]

[Figure: graphs of Ψn(x) on [0, 1] for n = 2, 3, 4, 5]

Page 494: Analiza Numerica [Utm, Bostan v.]

[Figure: graphs of Ψn(x) on [0, 1] for n = 6, 7, 8, 9]

Page 495: Analiza Numerica [Utm, Bostan v.]

Graph of Ψ6(x) = (x − x0)(x − x1) ··· (x − x6) with evenly spaced nodes:

[Figure: Ψ6(x) passing through its zeros x0, x1, ..., x6]

Page 496: Analiza Numerica [Utm, Bostan v.]

Using the following table

n   Mn        n    Mn
1   1.25E−1   6    4.76E−7
2   2.41E−2   7    2.20E−8
3   2.06E−3   8    9.11E−10
4   1.48E−4   9    3.39E−11
5   9.01E−6   10   1.15E−12

we can observe that the maximum

Mn ≡ max over x0 ≤ x ≤ xn of |Ψn(x)|/(n + 1)!

becomes smaller with increasing n.

Page 497: Analiza Numerica [Utm, Bostan v.]

From the graphs, there is enormous variation in the

size of Ψn(x) as x varies over [0, 1]; and thus there

is also enormous variation in the error as x so varies.

For example, in the n = 9 case,

max over x0 ≤ x ≤ x1 of |Ψn(x)|/(n + 1)! = 3.39 × 10⁻¹¹
max over x4 ≤ x ≤ x5 of |Ψn(x)|/(n + 1)! = 6.89 × 10⁻¹³

and the ratio of these two errors is approximately 49.

Thus the interpolation error is likely to be around 49

times larger when x0 ≤ x ≤ x1 as compared to the

case when x4 ≤ x ≤ x5. When doing table inter-

polation, the point x at which you are interpolating

should be centrally located with respect to the inter-

polation nodes {x0, ..., xn} being used to define the interpolation, if possible.

Page 498: Analiza Numerica [Utm, Bostan v.]

AN APPROXIMATION PROBLEM

Consider now the problem of using an interpolation

polynomial to approximate a given function f(x) on

a given interval [a, b]. In particular, take interpolation

nodes

a ≤ x0 < x1 < · · · < xn−1 < xn ≤ b

and produce the interpolation polynomial Pn(x) that

interpolates f(x) at the given node points. We would

like to have

max over a ≤ x ≤ b of |f(x) − Pn(x)| → 0  as n → ∞

Does it happen?

Recall the error bound

max over a ≤ x ≤ b of |f(x) − Pn(x)| ≤ max over a ≤ x ≤ b of [|Ψn(x)|/(n + 1)!] · max over a ≤ x ≤ b of |f^(n+1)(x)|

We begin with an example using evenly spaced node points.

Page 499: Analiza Numerica [Utm, Bostan v.]

RUNGE'S EXAMPLE

Use evenly spaced node points:

h = (b − a)/n,  xi = a + ih  for i = 0, ..., n

For some functions, such as f(x) = e^x, the maximum error goes to zero quite rapidly. But the size of the derivative term f^(n+1)(x) in

max over a ≤ x ≤ b of |f(x) − Pn(x)| ≤ [1/(n + 1)!] · max over a ≤ x ≤ b of |Ψn(x)| · max over a ≤ x ≤ b of |f^(n+1)(x)|

can badly hurt or destroy the convergence in other cases. In particular, we show the graph of

f(x) = 1/(1 + x²)

and Pn(x) on [−5, 5] for the case n = 10. It can be proven that for this function, the maximum error on [−5, 5] does not converge to zero as n increases. Thus the use of evenly spaced nodes is not necessarily a good approach to approximating a function f(x) by interpolation.

Page 500: Analiza Numerica [Utm, Bostan v.]

Runge’s example with n = 10:

[Figure: y = 1/(1 + x²) and the interpolating polynomial y = P10(x) on [−5, 5]]
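The figure is easy to reproduce (my own sketch; polyfit and polyval are used only for convenience in constructing and evaluating the interpolant):

n  = 10;
xi = linspace(-5, 5, n+1);                 % evenly spaced nodes
c  = polyfit(xi, 1 ./ (1 + xi.^2), n);     % degree-10 interpolating polynomial
x  = linspace(-5, 5, 1001);
plot(x, 1 ./ (1 + x.^2), x, polyval(c, x), xi, 1 ./ (1 + xi.^2), 'o')

The oscillations near the endpoints x = ±5 grow worse as n increases.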

Page 501: Analiza Numerica [Utm, Bostan v.]

OTHER CHOICES OF NODES

Recall the general error bound

max over a ≤ x ≤ b of |f(x) − Pn(x)| ≤ max over a ≤ x ≤ b of [|Ψn(x)|/(n + 1)!] · max over a ≤ x ≤ b of |f^(n+1)(x)|

There is nothing we really can do with the derivative term for f; but we can examine the way of defining the nodes {x0, ..., xn} within the interval [a, b]. We ask how these nodes can be chosen so that the maximum of |Ψn(x)| over [a, b] is made as small as possible.

Page 502: Analiza Numerica [Utm, Bostan v.]

This problem has quite an elegant solution, and it will be considered in the next lecture. The node points {x0, ..., xn} turn out to be the zeros of a particular polynomial Tn+1(x) of degree n + 1, called a Chebyshev polynomial. These zeros are known explicitly, and with them

max over a ≤ x ≤ b of |Ψn(x)| = [(b − a)/2]^(n+1) · 2^(−n)

This turns out to be smaller than for the evenly spaced case; and although this polynomial interpolation does not work for all functions f(x), it works for all differentiable functions and more.

Page 503: Analiza Numerica [Utm, Bostan v.]

ANOTHER ERROR FORMULA

Recall the error formula

f(x) − Pn(x) = [Ψn(x)/(n + 1)!] f^(n+1)(c)

Ψn(x) = (x − x0)(x − x1) ··· (x − xn)

with c between the minimum and maximum of {x0, ..., xn, x}. A second formula is given by

f(x)− Pn(x) = Ψn(x) f [x0, ..., xn, x]

Showing this is a simple, but somewhat subtle, argument.

Let Pn+1(x) denote the polynomial of degree ≤ n+1

which interpolates f(x) at the points {x0, ..., xn, xn+1}. Then

Pn+1(x) = Pn(x)

+f [x0, ..., xn, xn+1] (x− x0) · · · (x− xn)

Page 504: Analiza Numerica [Utm, Bostan v.]

Substituting x = xn+1, and using the fact that Pn+1(x)

interpolates f(x) at xn+1, we have

f(xn+1) = Pn(xn+1) + f[x0, ..., xn, xn+1](xn+1 − x0) ··· (xn+1 − xn)

In this formula, the number xn+1 is completely ar-

bitrary, other than being distinct from the points in

{x0, ..., xn}. To emphasize this fact, replace xn+1 by x throughout the formula, obtaining

f(x) = Pn(x) + f[x0, ..., xn, x](x − x0) ··· (x − xn)
     = Pn(x) + Ψn(x) f[x0, ..., xn, x]

provided x ≠ x0, ..., xn.

Page 505: Analiza Numerica [Utm, Bostan v.]

The formula

f(x) = Pn(x) + f[x0, ..., xn, x](x − x0) ··· (x − xn)
     = Pn(x) + Ψn(x) f[x0, ..., xn, x]

was shown above for x distinct from the node points. Provided f(x) is differentiable, the formula is also true when x is a node point.

This shows

f(x)− Pn(x) = Ψn(x) f [x0, ..., xn, x]

Compare the two error formulas

f(x)− Pn(x) = Ψn(x) f [x0, ..., xn, x]

f(x) − Pn(x) = [Ψn(x)/(n + 1)!] f^(n+1)(c)

Page 506: Analiza Numerica [Utm, Bostan v.]

Then

Ψn(x) f[x0, ..., xn, x] = [Ψn(x)/(n + 1)!] f^(n+1)(c)

f[x0, ..., xn, x] = f^(n+1)(c)/(n + 1)!

for some c between the smallest and largest of the

numbers in {x0, ..., xn, x}.

To make this somewhat symmetric in its arguments,

let m = n+ 1, x = xn+1. Then

f[x0, ..., xm−1, xm] = f^(m)(c)/m!

with c an unknown number between the smallest and

largest of the numbers in {x0, ..., xm}. This was given in an earlier lecture where divided differences were introduced.

Page 507: Analiza Numerica [Utm, Bostan v.]

PIECEWISE POLYNOMIAL INTERPOLATION

Recall the examples of higher degree polynomial interpolation of the function f(x) = 1/(1 + x²) on [−5, 5]. The interpolants Pn(x) oscillated a great deal, whereas the function f(x) was nonoscillatory.

To obtain interpolants that are better behaved, we

look at other forms of interpolating functions.

Consider the data

x   0     1     2     2.5   3     3.5    4
y   2.5   0.5   0.5   1.5   1.5   1.125  0

What are methods of interpolating this data, other than using a degree 6 polynomial? Shown in the text

are the graphs of the degree 6 polynomial interpolant,

along with those of piecewise linear and a piecewise

quadratic interpolating functions.

Since we only have the data to consider, we would gen-

erally want to use an interpolant that had somewhat

the shape of that of the piecewise linear interpolant.

Page 508: Analiza Numerica [Utm, Bostan v.]

[Figure: the data points]

[Figure: piecewise linear interpolation]

Page 509: Analiza Numerica [Utm, Bostan v.]

[Figure: polynomial interpolation of degree 6]

[Figure: piecewise quadratic interpolation]

Page 510: Analiza Numerica [Utm, Bostan v.]

PIECEWISE POLYNOMIAL FUNCTIONS

Consider being given a set of data points (x1, y1), ...,

(xn, yn), with

x1 < x2 < · · · < xn

Then the simplest way to connect the points (xj, yj)

is by straight line segments. This is called a piecewise linear interpolant of the data {(xj, yj)}. This graph

has “corners”, and often we expect the interpolant to

have a smooth graph.

To obtain a somewhat smoother graph, consider using

piecewise quadratic interpolation. Begin by construct-

ing the quadratic polynomial that interpolates

{(x1, y1), (x2, y2), (x3, y3)}

Then construct the quadratic polynomial that inter-

polates

{(x3, y3), (x4, y4), (x5, y5)}

Page 511: Analiza Numerica [Utm, Bostan v.]

Continue this process of constructing quadratic inter-

polants on the subintervals

[x1, x3], [x3, x5], [x5, x7], ...

If the number of subintervals is even (and therefore

n is odd), then this process comes out fine, with the

last interval being [xn−2, xn]. This was illustrated

on the graph for the preceding data. If, however, n is

even, then the approximation on the last interval must

be handled by some modification of this procedure.

Suggest such a modification!

With piecewise quadratic interpolants, however, there

are “corners” on the graph of the interpolating func-

tion. With our preceding example, they are at x3 and

x5. How do we avoid this?

Piecewise polynomial interpolants are used in many

applications. We will consider them later, to obtain

numerical integration formulas.

Page 512: Analiza Numerica [Utm, Bostan v.]

SMOOTH NON-OSCILLATORY

INTERPOLATION

Let data points (x1, y1), ..., (xn, yn) be given, and let

x1 < x2 < · · · < xn

Consider finding functions s(x) for which the follow-

ing properties hold:

(1) s(xi) = yi, i = 1, ..., n

(2) s(x), s'(x), s''(x) are continuous on [x1, xn].

Then among such functions s(x) satisfying these properties, find the one which minimizes the integral

∫ from x1 to xn of |s''(x)|² dx

The idea of minimizing the integral is to obtain an in-

terpolating function for which the first derivative does

not change rapidly. It turns out there is a unique so-

lution to this problem, and it is called a natural cubic

spline function.

Page 513: Analiza Numerica [Utm, Bostan v.]

SPLINE FUNCTIONS

Let a set of node points {xi} be given, satisfying

a ≤ x1 < x2 < ··· < xn ≤ b

for some numbers a and b. Often we use [a, b] =

[x1, xn]. A cubic spline function s(x) on [a, b] with

“breakpoints” or “knots” {xi} has the following prop-erties:

1. On each of the intervals

[a, x1], [x1, x2], ..., [xn−1, xn], [xn, b]

s(x) is a polynomial of degree ≤ 3.

2. s(x), s'(x), s''(x) are continuous on [a, b].

In the case that we have given data points (x1, y1),...,

(xn, yn), we say s(x) is a cubic interpolating spline

function for this data if

3. s(xi) = yi, i = 1, ..., n.

Page 514: Analiza Numerica [Utm, Bostan v.]

EXAMPLE

Define

(x − α)_+^3 = (x − α)³ for x ≥ α,  0 for x ≤ α

This is a cubic spline function on (−∞, ∞) with the single breakpoint x1 = α.

Combinations of these form more complicated cubic spline functions. For example,

s(x) = 3(x − 1)_+^3 − 2(x − 3)_+^3

is a cubic spline function on (−∞, ∞) with the breakpoints x1 = 1, x2 = 3.

Define

s(x) = p3(x) + Σ from j = 1 to n of aj (x − xj)_+^3

with p3(x) some cubic polynomial. Then s(x) is a cubic spline function on (−∞, ∞) with breakpoints {x1, ..., xn}.
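Truncated powers are easy to evaluate numerically; a one-line MATLAB helper (my own illustration):

tp3 = @(x, a) max(x - a, 0).^3;           % (x - a)_+^3
s   = @(x) 3*tp3(x, 1) - 2*tp3(x, 3);     % the example spline above
fplot(s, [0, 4])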

Page 515: Analiza Numerica [Utm, Bostan v.]

Return to the earlier problem of choosing an interpolating function s(x) to minimize the integral

∫ from x1 to xn of |s''(x)|² dx

There is a unique solution to this problem. The solution s(x) is a cubic interpolating spline function, and moreover, it satisfies

s''(x1) = s''(xn) = 0

Spline functions satisfying these boundary conditions

are called “natural” cubic spline functions, and the so-

lution to our minimization problem is a “natural cubic

interpolatory spline function”. We will show a method

to construct this function from the interpolation data.

Motivation for these boundary conditions can be given

by looking at the physics of bending thin beams of

flexible materials to pass thru the given data. To the

left of x1 and to the right of xn, the beam is straight

and therefore the second derivatives are zero at the

transition points x1 and xn.

Page 516: Analiza Numerica [Utm, Bostan v.]

CONSTRUCTION OF THE

INTERPOLATING SPLINE FUNCTION

To make the presentation more specific, suppose we

have data

(x1, y1) , (x2, y2) , (x3, y3) , (x4, y4)

with x1 < x2 < x3 < x4. Then on each of the

intervals

[x1, x2] , [x2, x3] , [x3, x4]

s(x) is a cubic polynomial. Taking the first interval,

s(x) is a cubic polynomial and s''(x) is a linear polynomial. Let

Mi = s''(xi),  i = 1, 2, 3, 4

Then on [x1, x2],

s''(x) = [(x2 − x)M1 + (x − x1)M2] / (x2 − x1),  x1 ≤ x ≤ x2

Page 517: Analiza Numerica [Utm, Bostan v.]

We can find s(x) by integrating twice:

s(x) = [(x2 − x)³M1 + (x − x1)³M2] / [6(x2 − x1)] + c1 x + c2

We determine the constants of integration by using

s(x1) = y1, s(x2) = y2 (*)

Then

s(x) = [(x2 − x)³M1 + (x − x1)³M2] / [6(x2 − x1)]
     + [(x2 − x)y1 + (x − x1)y2] / (x2 − x1)
     − [(x2 − x1)/6]·[(x2 − x)M1 + (x − x1)M2]

for x1 ≤ x ≤ x2.

Check that this formula satisfies the given interpola-

tion condition (*)!

Page 518: Analiza Numerica [Utm, Bostan v.]

We can repeat this on the intervals [x2, x3] and [x3, x4],

obtaining similar formulas.

For x2 ≤ x ≤ x3,

s(x) = [(x3 − x)³M2 + (x − x2)³M3] / [6(x3 − x2)]
     + [(x3 − x)y2 + (x − x2)y3] / (x3 − x2)
     − [(x3 − x2)/6]·[(x3 − x)M2 + (x − x2)M3]

For x3 ≤ x ≤ x4,

s(x) = [(x4 − x)³M3 + (x − x3)³M4] / [6(x4 − x3)]
     + [(x4 − x)y3 + (x − x3)y4] / (x4 − x3)
     − [(x4 − x3)/6]·[(x4 − x)M3 + (x − x3)M4]

Page 519: Analiza Numerica [Utm, Bostan v.]

We still do not know the values of the second derivatives {M1, M2, M3, M4}. The above formulas guarantee that s(x) and s''(x) are continuous for x1 ≤ x ≤ x4. For example, the formula on [x1, x2] yields

s(x2) = y2,  s''(x2) = M2

The formula on [x2, x3] also yields

s(x2) = y2,  s''(x2) = M2

All that is lacking is to make s'(x) continuous at x2 and x3. Thus we require

s'(x2 + 0) = s'(x2 − 0)
s'(x3 + 0) = s'(x3 − 0)   (**)

This means

lim as x decreases to x2 of s'(x) = lim as x increases to x2 of s'(x)

and similarly for x3.

Page 520: Analiza Numerica [Utm, Bostan v.]

To simplify the presentation somewhat, I assume in

the following that our node points are evenly spaced:

x2 = x1 + h, x3 = x1 + 2h, x4 = x1 + 3h

Then our earlier formulas simplify to

s(x) = [(x2 − x)³M1 + (x − x1)³M2] / (6h)
     + [(x2 − x)y1 + (x − x1)y2] / h
     − (h/6)·[(x2 − x)M1 + (x − x1)M2]

for x1 ≤ x ≤ x2, with similar formulas on [x2, x3] and

[x3, x4].

Page 521: Analiza Numerica [Utm, Bostan v.]

Without going through all of the algebra, the conditions (**) lead to the following pair of equations:

(h/6)M1 + (2h/3)M2 + (h/6)M3 = (y3 − y2)/h − (y2 − y1)/h
(h/6)M2 + (2h/3)M3 + (h/6)M4 = (y4 − y3)/h − (y3 − y2)/h

This gives us two equations in four unknowns. The earlier boundary conditions on s''(x) give us immediately

M1 = M4 = 0

Then we can solve the linear system for M2 and M3.

Page 522: Analiza Numerica [Utm, Bostan v.]

EXAMPLE

Consider the interpolation data points

x   1   2    3    4
y   1   1/2  1/3  1/4

In this case, h = 1, and the linear system becomes

(2/3)M2 + (1/6)M3 = y3 − 2y2 + y1 = 1/3
(1/6)M2 + (2/3)M3 = y4 − 2y3 + y2 = 1/12

This has the solution

M2 = 1/2,  M3 = 0

This leads to the spline function formula on each

subinterval.
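A quick MATLAB check of this computation (my own illustration):

y = [1 1/2 1/3 1/4];
A = [2/3 1/6; 1/6 2/3];
r = [y(3) - 2*y(2) + y(1); y(4) - 2*y(3) + y(2)];   % right-hand sides
M = A \ r                                            % returns M2 = 0.5, M3 = 0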

Page 523: Analiza Numerica [Utm, Bostan v.]

On [1, 2],

s(x) = [(2 − x)³·0 + (x − 1)³·(1/2)]/6
     + [(2 − x)·1 + (x − 1)·(1/2)]/1
     − (1/6)·[(2 − x)·0 + (x − 1)·(1/2)]
     = (1/12)(x − 1)³ − (7/12)(x − 1) + 1

Similarly, for 2 ≤ x ≤ 3,

s(x) = −(1/12)(x − 2)³ + (1/4)(x − 2)² − (1/3)(x − 2) + 1/2

and for 3 ≤ x ≤ 4,

s(x) = −(1/12)(x − 4) + 1/4

Page 524: Analiza Numerica [Utm, Bostan v.]

x   1   2    3    4
y   1   1/2  1/3  1/4

[Figure: graph of y = 1/x and the interpolating natural cubic spline y = s(x) on [1, 4]]

Page 525: Analiza Numerica [Utm, Bostan v.]

x   0     1     2     2.5   3     3.5    4
y   2.5   0.5   0.5   1.5   1.5   1.125  0

[Figure: interpolating natural cubic spline function for this data]

Page 526: Analiza Numerica [Utm, Bostan v.]

ALTERNATIVE BOUNDARY CONDITIONS

Return to the equations

(h/6)M1 + (2h/3)M2 + (h/6)M3 = (y3 − y2)/h − (y2 − y1)/h
(h/6)M2 + (2h/3)M3 + (h/6)M4 = (y4 − y3)/h − (y3 − y2)/h

Sometimes other boundary conditions are imposed on s(x) to help in determining the values of M1 and M4. For example, the data in our numerical example were generated from the function f(x) = 1/x. With it, f''(x) = 2/x³, and thus we could use

M1 = 2,  M4 = 1/32

With this we are led to a new formula for s(x), one that approximates f(x) = 1/x more closely.

Page 527: Analiza Numerica [Utm, Bostan v.]

THE CLAMPED SPLINE

In this case, we augment the interpolation conditions

s(xi) = yi, i = 1, 2, 3, 4

with the boundary conditions

s'(x1) = y'1,  s'(x4) = y'4   (#)

The conditions (#) lead to another pair of equations, augmenting the earlier ones. Combined, these equations are

(h/3)M1 + (h/6)M2 = (y2 − y1)/h − y'1
(h/6)M1 + (2h/3)M2 + (h/6)M3 = (y3 − y2)/h − (y2 − y1)/h
(h/6)M2 + (2h/3)M3 + (h/6)M4 = (y4 − y3)/h − (y3 − y2)/h
(h/6)M3 + (h/3)M4 = y'4 − (y4 − y3)/h

Page 528: Analiza Numerica [Utm, Bostan v.]

For our numerical example, it is natural to obtain these derivative values from f'(x) = −1/x²:

y'1 = −1,  y'4 = −1/16

When combined with the earlier equations, we have the system

(1/3)M1 + (1/6)M2 = 1/2
(1/6)M1 + (2/3)M2 + (1/6)M3 = 1/3
(1/6)M2 + (2/3)M3 + (1/6)M4 = 1/12
(1/6)M3 + (1/3)M4 = 1/48

This has the solution

[M1, M2, M3, M4] = [173/120, 7/60, 11/120, 1/60]
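This 4 × 4 system is easy to verify in MATLAB (my own check):

A = [1/3 1/6 0   0
     1/6 2/3 1/6 0
     0   1/6 2/3 1/6
     0   0   1/6 1/3];
r = [1/2; 1/3; 1/12; 1/48];
M = A \ r            % the decimal form of [173/120; 7/60; 11/120; 1/60]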

Page 529: Analiza Numerica [Utm, Bostan v.]

We can now write the functions s(x) for each of the

subintervals [x1, x2], [x2, x3], and [x3, x4]. Recall for

x1 ≤ x ≤ x2,

s(x) = [(x2 − x)³M1 + (x − x1)³M2] / (6h)
     + [(x2 − x)y1 + (x − x1)y2] / h
     − (h/6)·[(x2 − x)M1 + (x − x1)M2]

We can substitute in from the data

x   1   2    3    4
y   1   1/2  1/3  1/4

and the solutions {Mi}. Doing so, consider the error f(x) − s(x). As an example,

f(x) = 1/x,  f(3/2) = 2/3,  s(3/2) = .65260

This is quite a decent approximation.

Page 530: Analiza Numerica [Utm, Bostan v.]

THE GENERAL PROBLEM

Consider the spline interpolation problem with n nodes

(x1, y1) , (x2, y2) , ..., (xn, yn)

and assume the node points {xi} are evenly spaced,

xj = x1 + (j − 1)h,  j = 1, ..., n

We have that the interpolating spline s(x) on xj ≤ x ≤ xj+1 is given by

s(x) = [(xj+1 − x)³Mj + (x − xj)³Mj+1] / (6h)
     + [(xj+1 − x)yj + (x − xj)yj+1] / h
     − (h/6)·[(xj+1 − x)Mj + (x − xj)Mj+1]

for j = 1, ..., n − 1.

Page 531: Analiza Numerica [Utm, Bostan v.]

To enforce continuity of s'(x) at the interior node points x2, ..., xn−1, the second derivatives {Mj} must satisfy the linear equations

(h/6)Mj−1 + (2h/3)Mj + (h/6)Mj+1 = (yj−1 − 2yj + yj+1)/h

for j = 2, ..., n − 1. Writing them out,

(h/6)M1 + (2h/3)M2 + (h/6)M3 = (y1 − 2y2 + y3)/h
(h/6)M2 + (2h/3)M3 + (h/6)M4 = (y2 − 2y3 + y4)/h
   ...
(h/6)Mn−2 + (2h/3)Mn−1 + (h/6)Mn = (yn−2 − 2yn−1 + yn)/h

This is a system of n − 2 equations in the n unknowns {M1, ..., Mn}. Two more conditions must be imposed on s(x) in order to have the number of equations equal the number of unknowns, namely n. With the added boundary conditions, this form of linear system can be solved very efficiently.
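For the natural boundary conditions M1 = Mn = 0, the resulting tridiagonal system can be set up and solved in a few lines of MATLAB (my own sketch under the slide's assumptions: equally spaced nodes with spacing h and a row vector y of data values):

n = length(y);
A = (2*h/3)*eye(n-2) + (h/6)*(diag(ones(n-3,1), 1) + diag(ones(n-3,1), -1));
r = (y(1:n-2) - 2*y(2:n-1) + y(3:n))' / h;   % right-hand sides, j = 2,...,n-1
M = [0; A \ r; 0];                            % natural conditions M1 = Mn = 0

For large n one would store A in sparse form (e.g. with spdiags) rather than as a full matrix.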

Page 532: Analiza Numerica [Utm, Bostan v.]

BOUNDARY CONDITIONS

“Natural” boundary conditions

s''(x1) = s''(xn) = 0

Spline functions satisfying these conditions are called “natural cubic splines”. They arise out of the minimization problem stated earlier. But generally they are not considered as good as some other cubic interpolating splines.

“Clamped” boundary conditions. We add the conditions

s'(x1) = y'1,  s'(xn) = y'n

with y'1, y'n given slopes for the endpoints of s(x) on [x1, xn]. This has many quite good properties when compared with the natural cubic interpolating spline; but it does require knowing the derivatives at the endpoints.

“Not a knot” boundary conditions. This is more complicated to explain, but it is the version of cubic spline interpolation that is implemented in MATLAB.

Page 533: Analiza Numerica [Utm, Bostan v.]

THE “NOT A KNOT” CONDITIONS

As before, let the interpolation nodes be

(x1, y1) , (x2, y2) , ..., (xn, yn)

We separate these points into two categories. For

constructing the interpolating cubic spline function,

we use the points

(x1, y1), (x3, y3), ..., (xn−2, yn−2), (xn, yn)

thus deleting two of the points. We now have n − 2 points, and the interpolating spline s(x) can be determined on the intervals

[x1, x3], [x3, x4], ..., [xn−3, xn−2], [xn−2, xn]

This leads to n − 4 equations in the n − 2 unknowns M1, M3, ..., Mn−2, Mn. The two additional boundary conditions are

s(x2) = y2,  s(xn−1) = yn−1

These translate into two additional equations, and we obtain a system of n − 2 linear simultaneous equations in the n − 2 unknowns M1, M3, ..., Mn−2, Mn.

Page 534: Analiza Numerica [Utm, Bostan v.]

x   0     1     2     2.5   3     3.5    4
y   2.5   0.5   0.5   1.5   1.5   1.125  0

[Figure: interpolating cubic spline function with “not-a-knot” boundary conditions]

Page 535: Analiza Numerica [Utm, Bostan v.]

MATLAB SPLINE FUNCTION LIBRARY

Given data points

(x1, y1) , (x2, y2) , ..., (xn, yn)

type arrays containing the x and y coordinates:

x = [x1 x2 ... xn]
y = [y1 y2 ... yn]
plot(x, y, 'o')

The last statement will draw a plot of the data points,

marking them with the letter ‘oh’. To find the inter-

polating cubic spline function and evaluate it at the

points of another array xx, say

h = (xn − x1) / (10 ∗ n) ; xx = x1 : h : xn;

use

yy = spline(x, y, xx)
plot(x, y, 'o', xx, yy)

The last statement will plot the data points, as be-

fore, and it will plot the interpolating spline s(x) as a

continuous curve.
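For instance, with the seven data points used earlier, a complete runnable version of this fragment is:

x  = [0 1 2 2.5 3 3.5 4];
y  = [2.5 0.5 0.5 1.5 1.5 1.125 0];
xx = linspace(x(1), x(end), 200);
yy = spline(x, y, xx);          % "not a knot" cubic spline interpolant
plot(x, y, 'o', xx, yy)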

Page 536: Analiza Numerica [Utm, Bostan v.]

ERROR IN CUBIC SPLINE INTERPOLATION

Let an interval [a, b] be given, and then define

h = (b − a)/(n − 1),  xj = a + (j − 1)h,  j = 1, ..., n

Suppose we want to approximate a given function f(x) on the interval [a, b] using cubic spline interpolation. Define

yj = f(xj),  j = 1, ..., n

Let sn(x) denote the cubic spline interpolating this

data and satisfying the “not a knot” boundary con-

ditions. Then it can be shown that for a suitable

constant c,

En ≡ max over a ≤ x ≤ b of |f(x) − sn(x)| ≤ c h⁴

The corresponding bound for natural cubic spline in-

terpolation contains only a term of h2 rather than h4;

it does not converge to zero as rapidly.

Page 537: Analiza Numerica [Utm, Bostan v.]

EXAMPLE

Take f(x) = arctanx on [0, 5]. The following ta-

ble gives values of the maximum error En for various

values of n. The values of h are being successively

halved.

n    En        Ratio
7    7.09E−3
13   3.24E−4   21.9
25   3.06E−5   10.6
49   1.48E−6   20.7
97   9.04E−8   16.4

Here Ratio is the ratio of successive errors; for O(h⁴) convergence it should approach 16 as h is halved.
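The table can be reproduced with MATLAB's spline function (my own sketch):

f = @atan;  a = 0;  b = 5;
for n = [7 13 25 49 97]
    x  = linspace(a, b, n);
    xx = linspace(a, b, 5001);
    En = max(abs(f(xx) - spline(x, f(x), xx)));
    fprintf('n = %2d   En = %.2e\n', n, En)
end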

Page 538: Analiza Numerica [Utm, Bostan v.]

BEST APPROXIMATION

Given a function f(x) that is continuous on a given interval [a, b], consider approximating it by some polynomial p(x). To measure the error in p(x) as an approximation, introduce

E(p) = max over a ≤ x ≤ b of |f(x) − p(x)|

This is called the maximum error or uniform error of approximation of f(x) by p(x) on [a, b].

With an eye towards efficiency, we want to find the ‘best’ possible approximation of a given degree n. With this in mind, introduce the following:

ρn(f) = min over deg(p) ≤ n of E(p) = min over deg(p) ≤ n of [ max over a ≤ x ≤ b of |f(x) − p(x)| ]

The number ρn(f) will be the smallest possible uniform error, or minimax error, when approximating f(x) by polynomials of degree at most n. If there is a polynomial giving this smallest error, we denote it by mn(x); thus E(mn) = ρn(f).

Page 539: Analiza Numerica [Utm, Bostan v.]

Example. Let f(x) = e^x on [−1, 1]. In the following table, we give the values of E(tn), tn(x) the Taylor polynomial of degree n for e^x about x = 0, and E(mn).

          Maximum Error in:
n    tn(x)     mn(x)
1    7.18E−1   2.79E−1
2    2.18E−1   4.50E−2
3    5.16E−2   5.53E−3
4    9.95E−3   5.47E−4
5    1.62E−3   4.52E−5
6    2.26E−4   3.21E−6
7    2.79E−5   2.00E−7
8    3.06E−6   1.11E−8
9    3.01E−7   5.52E−10

Page 540: Analiza Numerica [Utm, Bostan v.]

Consider graphically how we can improve on the Tay-

lor polynomial

t1(x) = 1 + x

as a uniform approximation to e^x on the interval [−1, 1].

The linear minimax approximation is

m1(x) = 1.2643 + 1.1752x

[Figure: linear Taylor and minimax approximations to e^x on [−1, 1], showing y = e^x, y = t1(x), and y = m1(x)]

Page 541: Analiza Numerica [Utm, Bostan v.]

[Figure: error in cubic Taylor approximation to e^x on [−1, 1]; the error reaches about 0.0516 at x = 1]

[Figure: error in cubic minimax approximation to e^x on [−1, 1]; the error oscillates between −0.00553 and 0.00553]

Page 542: Analiza Numerica [Utm, Bostan v.]

Accuracy of the minimax approximation.

ρn(f) ≤ [((b − a)/2)^(n+1) / ((n + 1)! 2^n)] · max over a ≤ x ≤ b of |f^(n+1)(x)|

This error bound does not always become smaller with increasing n, but it will give a fairly accurate bound for many common functions f(x).

Example. Let f(x) = e^x for −1 ≤ x ≤ 1. Then

ρn(e^x) ≤ e / ((n + 1)! 2^n)   (*)

n    Bound (*)   ρn(f)
1    6.80E−1     2.79E−1
2    1.13E−1     4.50E−2
3    1.42E−2     5.53E−3
4    1.42E−3     5.47E−4
5    1.18E−4     4.52E−5
6    8.43E−6     3.21E−6
7    5.27E−7     2.00E−7

Page 543: Analiza Numerica [Utm, Bostan v.]

CHEBYSHEV POLYNOMIALS

Chebyshev polynomials are used in many parts of numerical analysis, and more generally, in applications of mathematics. For an integer n ≥ 0, define the function

Tn(x) = cos(n cos⁻¹ x),  −1 ≤ x ≤ 1   (1)

This may not appear to be a polynomial, but we will show it is a polynomial of degree n. To simplify the manipulation of (1), we introduce

θ = cos⁻¹(x)  or  x = cos(θ),  0 ≤ θ ≤ π   (2)

Then

Tn(x) = cos(nθ)   (3)

Example. For n = 0,

T0(x) = cos(0·θ) = 1

For n = 1,

T1(x) = cos(θ) = x

For n = 2,

T2(x) = cos(2θ) = 2cos²(θ) − 1 = 2x² − 1

Page 544: Analiza Numerica [Utm, Bostan v.]

[Figure: T0(x), T1(x), T2(x) on [−1, 1]]

[Figure: T3(x), T4(x) on [−1, 1]]

Page 545: Analiza Numerica [Utm, Bostan v.]

The triple recursion relation. Recall the trigonometric addition formulas,

cos(α ± β) = cos(α)cos(β) ∓ sin(α)sin(β)

Let n ≥ 1, and apply these identities to get

Tn+1(x) = cos[(n + 1)θ] = cos(nθ + θ) = cos(nθ)cos(θ) − sin(nθ)sin(θ)
Tn−1(x) = cos[(n − 1)θ] = cos(nθ − θ) = cos(nθ)cos(θ) + sin(nθ)sin(θ)

Add these two equations, and then use (1) and (3) to obtain

Tn+1(x) + Tn−1(x) = 2cos(nθ)cos(θ) = 2xTn(x)

Tn+1(x) = 2xTn(x) − Tn−1(x),  n ≥ 1   (4)

This is called the triple recursion relation for the Cheby-

shev polynomials. It is often used in evaluating them,

rather than using the explicit formula (1).
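A short MATLAB sketch of such an evaluation (my own illustration; the name cheb_eval is not standard):

function T = cheb_eval(n, x)
% Evaluate T_n at the points x using the triple recursion (4).
Tprev = ones(size(x));                      % T_0(x) = 1
if n == 0, T = Tprev; return, end
T = x;                                      % T_1(x) = x
for k = 1:n-1
    [T, Tprev] = deal(2*x.*T - Tprev, T);   % T_{k+1} = 2x*T_k - T_{k-1}
end
end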

Page 546: Analiza Numerica [Utm, Bostan v.]

Example. Recall

T0(x) = 1,  T1(x) = x
Tn+1(x) = 2xTn(x) − Tn−1(x),  n ≥ 1

Let n = 2. Then

T3(x) = 2xT2(x) − T1(x) = 2x(2x² − 1) − x = 4x³ − 3x

Let n = 3. Then

T4(x) = 2xT3(x) − T2(x) = 2x(4x³ − 3x) − (2x² − 1) = 8x⁴ − 8x² + 1

Page 547: Analiza Numerica [Utm, Bostan v.]

The minimum size property. Note that

|Tn(x)| ≤ 1,  −1 ≤ x ≤ 1   (5)

for all n ≥ 0. Also, note that

Tn(x) = 2^(n−1) x^n + lower degree terms,  n ≥ 1   (6)

This can be proven using the triple recursion relation and mathematical induction.

Introduce a modified version of Tn(x),

T̃n(x) = (1/2^(n−1)) Tn(x) = x^n + lower degree terms   (7)

From (5) and (6),

|T̃n(x)| ≤ 1/2^(n−1),  −1 ≤ x ≤ 1,  n ≥ 1   (8)

Example.

T̃4(x) = (1/8)(8x⁴ − 8x² + 1) = x⁴ − x² + 1/8

Page 548: Analiza Numerica [Utm, Bostan v.]

A polynomial whose highest degree term has a coefficient of 1 is called a monic polynomial. Formula (8) says the monic polynomial T̃n(x) has size 1/2^(n−1) on −1 ≤ x ≤ 1, and this becomes smaller as the degree n increases. In comparison,

max over −1 ≤ x ≤ 1 of |x^n| = 1

Thus x^n is a monic polynomial whose size does not change with increasing n.

Theorem. Let n ≥ 1 be an integer, and consider all possible monic polynomials of degree n. Then the degree n monic polynomial with the smallest maximum on [−1, 1] is the modified Chebyshev polynomial T̃n(x), and its maximum value on [−1, 1] is 1/2^(n−1).

This result is used in devising applications of Cheby-

shev polynomials. We apply it to obtain an improved

interpolation scheme.

Page 549: Analiza Numerica [Utm, Bostan v.]

A NEAR-MINIMAX APPROXIMATION METHOD

Let f(x) be continuous on [a, b] = [−1, 1]. Consider approximating f by an interpolatory polynomial of degree at most n = 3. Let x0, x1, x2, x3 be interpolation node points in [−1, 1]; let c3(x) be of degree ≤ 3 and interpolate f(x) at {x0, x1, x2, x3}. The interpolation error is

f(x) − c3(x) = [ω(x)/4!] f^(4)(ξx),  −1 ≤ x ≤ 1   (1)

ω(x) = (x − x0)(x − x1)(x − x2)(x − x3)   (2)

with ξx in [−1, 1]. We want to choose the nodes {x0, x1, x2, x3} so as to minimize the maximum value of |f(x) − c3(x)| on [−1, 1].

From (1), the only general quantity, independent of f, is ω(x). Thus we choose {x0, x1, x2, x3} to minimize

max over −1 ≤ x ≤ 1 of |ω(x)|   (3)

Page 550: Analiza Numerica [Utm, Bostan v.]

Expand to get

ω(x) = x⁴ + lower degree terms

This is a monic polynomial of degree 4. From the theorem in the preceding section, the smallest possible value for (3) is obtained with

ω(x) = T̃4(x) = T4(x)/2³ = (1/8)(8x⁴ − 8x² + 1)   (4)

and the smallest value of (3) is 1/2³ in this case. The equation (4) defines implicitly the nodes {x0, x1, x2, x3}: they are the roots of T4(x).

In our case this means solving

T4(x) = cos(4θ) = 0,  x = cos(θ)

4θ = ±π/2, ±3π/2, ±5π/2, ±7π/2, ...
θ = ±π/8, ±3π/8, ±5π/8, ±7π/8, ...
x = cos(π/8), cos(3π/8), cos(5π/8), ...   (5)

using cos(−θ) = cos(θ).

Page 551: Analiza Numerica [Utm, Bostan v.]

x = cos(π/8), cos(3π/8), cos(5π/8), cos(7π/8), ...

The first four values are distinct; the following ones are repetitive. For example,

cos(9π/8) = cos(7π/8)

The first four values are

{x0, x1, x2, x3} = {±0.382683, ±0.923880}   (6)

Example. Let f(x) = e^x on [−1, 1]. Use these nodes to produce the interpolating polynomial c3(x) of degree 3. From the interpolation error formula and the bound of 1/2³ for |ω(x)| on [−1, 1], we have

max over −1 ≤ x ≤ 1 of |f(x) − c3(x)| ≤ [(1/2³)/4!] · max over −1 ≤ x ≤ 1 of e^(ξx) ≤ e/192 ≐ 0.014158

By direct calculation,

max over −1 ≤ x ≤ 1 of |e^x − c3(x)| ≐ 0.00666

Page 552: Analiza Numerica [Utm, Bostan v.]

Interpolation Data: f(x) = e^x

i    xi          f(xi)       f[x0, ..., xi]
0    0.923880    2.5190442   2.5190442
1    0.382683    1.4662138   1.9453769
2    −0.382683   0.6820288   0.7047420
3    −0.923880   0.3969760   0.1751757

[Figure: the error e^x − c3(x) on [−1, 1], oscillating between about −0.00624 and 0.00666]

For comparison, E(t3) ≐ 0.0142 and ρ3(e^x) ≐ 0.00553.
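The calculation can be repeated in MATLAB (my own sketch):

f  = @exp;  n = 3;
j  = 0:n;
xi = cos((2*j + 1)*pi/(2*n + 2));      % the four zeros of T_4
c  = polyfit(xi, f(xi), n);            % interpolating polynomial c_3(x)
x  = linspace(-1, 1, 2001);
max(abs(f(x) - polyval(c, x)))         % about 0.00666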

Page 553: Analiza Numerica [Utm, Bostan v.]

THE GENERAL CASE

Consider interpolating f(x) on [−1, 1] by a polynomial of degree ≤ n, with the interpolation nodes {x0, ..., xn} in [−1, 1]. Denote the interpolation polynomial by cn(x). The interpolation error on [−1, 1] is given by

f(x) − cn(x) = [ω(x)/(n + 1)!] f^(n+1)(ξx)   (7)

ω(x) = (x − x0) ··· (x − xn)

with ξx an unknown point in [−1, 1]. In order to minimize the interpolation error, we seek to minimize

max over −1 ≤ x ≤ 1 of |ω(x)|   (8)

Page 554: Analiza Numerica [Utm, Bostan v.]

The polynomial being minimized is monic of degree

n+ 1,

ω(x) = xn+1 + lower degree terms

From the theorem of the preceding section, this min-

imum is attained by the monic polynomial

T̃n+1(x) = (1/2^n) Tn+1(x)

Thus the interpolation nodes are the zeros of Tn+1(x); and by the procedure that led to (5), they are given by

xj = cos[(2j + 1)π/(2n + 2)],  j = 0, 1, ..., n   (9)

The near-minimax approximation cn(x) of degree n is obtained by interpolating to f(x) at these n + 1 nodes on [−1, 1].

The polynomial cn(x) is sometimes called a Cheby-

shev approximation.

Page 555: Analiza Numerica [Utm, Bostan v.]

Example. Let f(x) = e^x. The following table contains the maximum errors in cn(x) on [−1, 1] for varying n. For comparison, we also include the corresponding minimax errors. These figures illustrate that for practical purposes, cn(x) is a satisfactory replacement for the minimax approximation mn(x).

n    max |e^x − cn(x)|   ρn(e^x)
1    3.72E−1             2.79E−1
2    5.65E−2             4.50E−2
3    6.66E−3             5.53E−3
4    6.40E−4             5.47E−4
5    5.18E−5             4.52E−5
6    3.80E−6             3.21E−6

Page 556: Analiza Numerica [Utm, Bostan v.]

THEORETICAL INTERPOLATION ERROR

For the error

f(x) − cn(x) = [ω(x)/(n + 1)!] f^(n+1)(ξx)

we have

max over −1 ≤ x ≤ 1 of |f(x) − cn(x)| ≤ [max over −1 ≤ x ≤ 1 of |ω(x)| / (n + 1)!] · max over −1 ≤ ξ ≤ 1 of |f^(n+1)(ξ)|

From the theorem of the preceding section,

max over −1 ≤ x ≤ 1 of |T̃n+1(x)| = max over −1 ≤ x ≤ 1 of |ω(x)| = 1/2^n

in this case. Thus

max over −1 ≤ x ≤ 1 of |f(x) − cn(x)| ≤ [1/((n + 1)! 2^n)] · max over −1 ≤ ξ ≤ 1 of |f^(n+1)(ξ)|

Page 557: Analiza Numerica [Utm, Bostan v.]

OTHER INTERVALS

Consider approximating f(x) on the finite interval

[a, b]. Introduce the linear change of variables

x =1

2[(1− t) a+ (1 + t) b] (10)

t =2

b− a

·x− b+ a

2

¸(11)

Introduce

F (t) = fµ1

2[(1− t) a+ (1 + t) b]

¶, −1 ≤ t ≤ 1

The function F (t) on [−1, 1] is equivalent to f(x) on[a, b], and we can move between them via (10)-(11).

We can now proceed to approximate f(x) on [a, b] by

instead approximating F (t) on [−1, 1].

Example. Approximating f(x) = cosx on [0, π/2] is

equivalent to approximating

F (t) = cosµ1 + t

4π¶, −1 ≤ t ≤ 1
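Combining the node formula (9) with the change of variables (10) gives Chebyshev interpolation nodes on a general [a, b]; a small MATLAB sketch (my own illustration):

a = 0;  b = pi/2;  n = 5;
t = cos((2*(0:n) + 1)*pi/(2*n + 2));   % Chebyshev nodes in [-1, 1], from (9)
x = ((1 - t)*a + (1 + t)*b)/2;         % mapped to [a, b] by (10)
c = polyfit(x, cos(x), n);             % near-minimax interpolant of cos x on [a, b]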

Page 558: Analiza Numerica [Utm, Bostan v.]

NUMERICAL DIFFERENTIATION

There are two major reasons for considering numerical approximations of the differentiation process.

1. Approximation of derivatives in ordinary differen-

tial equations and partial differential equations.

This is done in order to reduce the differential

equation to a form that can be solved more easily

than the original differential equation.

2. Forming the derivative of a function f(x) which is

known only as empirical data {(xi, yi) | i = 1, ..., m}. The data generally is known only approximately,

so that yi ≈ f(xi), i = 1, . . . ,m.

Page 559: Analiza Numerica [Utm, Bostan v.]

Recall the definition

f'(x) = lim as h → 0 of [f(x + h) − f(x)]/h

This justifies using

f'(x) ≈ [f(x + h) − f(x)]/h ≡ Dh f(x)   (1)

for small values of h. The approximation Dhf(x) is

called a numerical derivative of f(x) with stepsize h.

Example. Use Dhf(x) to approximate the derivative

of f(x) = cos(x) at x = π/6. In the table, the error

is almost halved when h is halved.

h          Dh f       Error     Ratio
0.1        −0.54243   0.04243
0.05       −0.52144   0.02144   1.98
0.025      −0.51077   0.01077   1.99
0.0125     −0.50540   0.00540   1.99
0.00625    −0.50270   0.00270   2.00
0.003125   −0.50135   0.00135   2.00
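The table is generated by a few lines of MATLAB (my own sketch):

f = @cos;  x = pi/6;
h = 0.1 ./ 2.^(0:5);
Dh  = (f(x + h) - f(x)) ./ h;
err = -sin(x) - Dh              % f'(x) - Dh f(x); roughly halves when h is halved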

Page 560: Analiza Numerica [Utm, Bostan v.]

Error behaviour. Using Taylor's theorem,

f(x + h) = f(x) + h f'(x) + (1/2)h² f''(c)

with c between x and x + h. Evaluating (1),

Dh f(x) = (1/h){[f(x) + h f'(x) + (1/2)h² f''(c)] − f(x)} = f'(x) + (1/2)h f''(c)

f'(x) − Dh f(x) = −(1/2)h f''(c)   (2)

Using a higher order Taylor expansion,

f'(x) − Dh f(x) = −(1/2)h f''(x) − (1/6)h² f'''(c)

f'(x) − Dh f(x) ≈ −(1/2)h f''(x)   (3)

for small values of h.

For f(x) = cos x,

f'(x) − Dh f(x) = (1/2)h cos(c),  c ∈ [π/6, π/6 + h]

In the preceding table, check the accuracy of the approximation (3) with x = π/6.

Page 561: Analiza Numerica [Utm, Bostan v.]

The formula (1),

f'(x) ≈ [f(x + h) − f(x)]/h ≡ Dh f(x)

is called a forward difference formula for approximating f'(x). In contrast, the approximation

f'(x) ≈ [f(x) − f(x − h)]/h,  h > 0   (4)

is called a backward difference formula for approximating f'(x). A similar derivation leads to

f'(x) − [f(x) − f(x − h)]/h = (h/2) f''(c)   (5)

for some c between x and x − h. The accuracy of

the backward difference formula (4) is essentially the

same as that of the forward difference formula (1).

The motivation for this formula is in applications to

solving differential equations.

Page 562: Analiza Numerica [Utm, Bostan v.]

DIFFERENTIATION USING INTERPOLATION

Let Pn(x) be the degree n polynomial that interpolates f(x) at n + 1 node points x0, x1, ..., xn. To calculate f'(x) at some point x = t, use

f'(t) ≈ P'n(t)   (6)

Many different formulas can be obtained by varying n and by varying the placement of the nodes x0, ..., xn relative to the point t of interest.

Example. Take n = 2, and use evenly spaced nodes x0, x1 = x0 + h, x2 = x1 + h. Then

P2(x) = f(x0)L0(x) + f(x1)L1(x) + f(x2)L2(x)
P'2(x) = f(x0)L'0(x) + f(x1)L'1(x) + f(x2)L'2(x)

with

L0(x) = (x − x1)(x − x2) / [(x0 − x1)(x0 − x2)]
L1(x) = (x − x0)(x − x2) / [(x1 − x0)(x1 − x2)]
L2(x) = (x − x0)(x − x1) / [(x2 − x0)(x2 − x1)]

Page 563: Analiza Numerica [Utm, Bostan v.]

Forming the derivatives of these Lagrange basis functions and evaluating them at x = x1 gives

f'(x1) ≈ P'2(x1) = [f(x1 + h) − f(x1 − h)]/(2h) ≡ Dh f(x1)   (7)

For the error,

f'(x1) − [f(x1 + h) − f(x1 − h)]/(2h) = −(h²/6) f'''(c2)   (8)

with x1 − h ≤ c2 ≤ x1 + h.

A proof of this begins with the interpolation error formula

f(x) − P2(x) = Ψ2(x) f[x0, x1, x2, x]
Ψ2(x) = (x − x0)(x − x1)(x − x2)

Differentiate to get

f'(x) − P'2(x) = Ψ2(x)·(d/dx) f[x0, x1, x2, x] + Ψ'2(x)·f[x0, x1, x2, x]

Page 564: Analiza Numerica [Utm, Bostan v.]

f'(x) − P'2(x) = Ψ2(x)·(d/dx) f[x0, x1, x2, x] + Ψ'2(x)·f[x0, x1, x2, x]

With properties of the divided difference, we can show

f'(x) − P'2(x) = (1/24)Ψ2(x) f^(4)(c1,x) + (1/6)Ψ'2(x) f^(3)(c2,x)

with c1,x and c2,x between the smallest and largest of the values {x0, x1, x2, x}. Letting x = x1 and noting that Ψ2(x1) = 0, we obtain (8).

Example. Take f(x) = cos(x) and x1 = π/6. Then (7) is illustrated as follows.

h         Dh f           Error          Ratio
0.1       −0.49916708    −0.0008329
0.05      −0.49979169    −0.0002083     4.00
0.025     −0.49994792    −0.00005208    4.00
0.0125    −0.49998698    −0.00001302    4.00
0.00625   −0.49999674    −0.000003255   4.00

Note the smaller errors and faster convergence as com-

pared to the forward difference formula (1).

Page 565: Analiza Numerica [Utm, Bostan v.]

UNDETERMINED COEFFICIENTS

Derive an approximation for f''(x) at x = t. Write

f''(t) ≈ D_h^(2) f(t) ≡ A f(t + h) + B f(t) + C f(t − h)   (9)

with A, B, and C unspecified constants. Use Taylor polynomial approximations

f(t − h) ≈ f(t) − h f'(t) + (h²/2) f''(t) − (h³/6) f'''(t) + (h⁴/24) f^(4)(t)

f(t + h) ≈ f(t) + h f'(t) + (h²/2) f''(t) + (h³/6) f'''(t) + (h⁴/24) f^(4)(t)   (10)

Page 566: Analiza Numerica [Utm, Bostan v.]

Substitute into (9) and rearrange:

D_h^(2) f(t) ≈ (A + B + C) f(t) + h(A − C) f'(t) + (h²/2)(A + C) f''(t)
             + (h³/6)(A − C) f'''(t) + (h⁴/24)(A + C) f^(4)(t)   (11)

To have

D_h^(2) f(t) ≈ f''(t)   (12)

for arbitrary functions f(x), require

A + B + C = 0: coefficient of f(t)
h(A − C) = 0: coefficient of f'(t)
(h²/2)(A + C) = 1: coefficient of f''(t)

Solution:

A = C = 1/h²,  B = −2/h²   (13)

Page 567: Analiza Numerica [Utm, Bostan v.]

This determines

D_h^(2) f(t) = [f(t + h) − 2f(t) + f(t − h)] / h²   (14)

For the error, substitute (13) into (11):

D_h^(2) f(t) ≈ f''(t) + (h²/12) f^(4)(t)

Thus

f''(t) − [f(t + h) − 2f(t) + f(t − h)]/h² ≈ −(h²/12) f^(4)(t)   (15)

Example. Let f(x) = cos(x), t = π/6; use (14) to calculate f''(t) = −cos(π/6).

h         D_h^(2) f      Error       Ratio
0.5       −0.84813289    −1.789E−2
0.25      −0.86152424    −4.501E−3   3.97
0.125     −0.86489835    −1.127E−3   3.99
0.0625    −0.86574353    −2.819E−4   4.00
0.03125   −0.86595493    −7.048E−5   4.00
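A MATLAB check of formula (14) (my own sketch):

f = @cos;  t = pi/6;
h = 0.5 ./ 2.^(0:4);
D2  = (f(t + h) - 2*f(t) + f(t - h)) ./ h.^2;
err = -cos(t) - D2              % errors shrink by about 4 each time h is halved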

Page 568: Analiza Numerica [Utm, Bostan v.]

EFFECTS OF ERROR IN FUNCTION VALUES

Recall

D_h^(2) f(x1) = [f(x2) − 2f(x1) + f(x0)] / h² ≈ f''(x1)

with x2 = x1 + h, x0 = x1 − h. Assume the actual function values used in the computation contain data error, and denote these values by f̂0, f̂1, and f̂2. Introduce the data errors:

εi = f(xi) − f̂i,  i = 0, 1, 2   (16)

The actual quantity calculated is

D̂_h^(2) f(x1) = [f̂2 − 2f̂1 + f̂0] / h²   (17)

For the error in this quantity, replace f̂j by f(xj) − εj, j = 0, 1, 2, to obtain the following:

Page 569: Analiza Numerica [Utm, Bostan v.]

f''(x1) − D̂_h^(2) f(x1)
  = f''(x1) − {[f(x2) − ε2] − 2[f(x1) − ε1] + [f(x0) − ε0]} / h²
  = {f''(x1) − [f(x2) − 2f(x1) + f(x0)]/h²} + (ε2 − 2ε1 + ε0)/h²
  ≈ −(1/12)h² f^(4)(x1) + (ε2 − 2ε1 + ε0)/h²   (18)

The last line uses (15).

The errors {ε0, ε1, ε2} are generally random in some interval [−δ, δ]. If {f̂0, f̂1, f̂2} are experimental data, then δ is a bound on the experimental error. If {f̂j} are obtained from computing f(x) in a computer, then the errors εj are the combination of rounding or chopping errors and δ is a bound on these errors.

Page 570: Analiza Numerica [Utm, Bostan v.]

In either case, (18) yields the approximate inequality

|f''(x1) − D̂_h^(2) f(x1)| ≤ (h²/12)|f^(4)(x1)| + 4δ/h²   (19)

This suggests that as h → 0, the error will eventually increase, because of the final term 4δ/h².

Example. Calculate D̂_h^(2) f(x1) for f(x) = cos(x) at x1 = π/6. To show the effect of rounding errors, the values f̂i are obtained by rounding f(xi) to six significant digits; and the errors satisfy

|εi| ≤ 5.0 × 10⁻⁷ = δ,  i = 0, 1, 2

Other than these rounding errors, the formula D̂_h^(2) f(x1) is calculated exactly. In this example, the bound (19) becomes

|f''(x1) − D̂_h^(2) f(x1)| ≤ (h²/12)cos(π/6) + (4/h²)(5 × 10⁻⁷) ≐ 0.0722h² + 2 × 10⁻⁶/h² ≡ E(h)

Page 571: Analiza Numerica [Utm, Bostan v.]

For h = 0.125, the bound E(h) ≐ 0.00126, which is not too far off from the actual error given in the table.

h            D̂_h^(2) f(x1)   Error
0.5          −0.848128       −0.017897
0.25         −0.861504       −0.004521
0.125        −0.864832       −0.001193
0.0625       −0.865536       −0.000489
0.03125      −0.865280       −0.000745
0.015625     −0.860160       −0.005865
0.0078125    −0.851968       −0.014057
0.00390625   −0.786432       −0.079593

The bound E(h) indicates that there is a smallest value of h, call it h∗, below which the error bound will begin to increase. To find it, solve E'(h) = 0 for its root h∗. This leads to h∗ ≐ 0.0726, which is consistent with the behavior of the errors in the table.
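The trade-off is easy to observe numerically (my own sketch; round(...,'significant') requires a reasonably recent MATLAB release):

r6 = @(v) round(v, 6, 'significant');   % simulate six-digit rounding of f
t  = pi/6;
h  = 0.5 ./ 2.^(0:8);
D2 = (r6(cos(t + h)) - 2*r6(cos(t)) + r6(cos(t - h))) ./ h.^2;
err = -cos(t) - D2      % decreases, then grows again once h drops below about 0.07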

Page 572: Analiza Numerica [Utm, Bostan v.]

LINEAR SYSTEMS

Consider the following example of a linear system:

x1 + 2x2 + 3x3 = −5
−x1 + x3 = −3
3x1 + x2 + 3x3 = −3

Its unique solution is

x1 = 1,  x2 = 0,  x3 = −2

In general we want to solve n equations in n un-

knowns. For this, we need some simplifying nota-

tion. In particular we introduce arrays. We can think

of these as means for storing information about the

linear system in a computer. In the above case, we

introduce

A =
  1  2  3
 −1  0  1
  3  1  3

b =
 −5
 −3
 −3

x =
  1
  0
 −2

Page 573: Analiza Numerica [Utm, Bostan v.]

These arrays completely specify the linear system and

its solution. We also know that we can give mean-

ing to multiplication and addition of these quantities,

calling them matrices and vectors. The linear system

is then written as

Ax = b

with Ax denoting a matrix-vector multiplication.

The general system is written as

a1,1x1 + ··· + a1,nxn = b1
   ...
an,1x1 + ··· + an,nxn = bn

This is a system of n linear equations in the n un-

knowns x1, ..., xn. This can be written in matrix-

vector notation as

Ax = b

A =
 a1,1  ···  a1,n
  ...  ...   ...
 an,1  ···  an,n

b =
 b1
 ...
 bn

x =
 x1
 ...
 xn

Page 574: Analiza Numerica [Utm, Bostan v.]

A TRIDIAGONAL SYSTEM

Consider the tridiagonal linear system

3x1 − x2 = 2
−x1 + 3x2 − x3 = 1
   ...
−xn−2 + 3xn−1 − xn = 1
−xn−1 + 3xn = 2

The solution is

x1 = ··· = xn = 1

This has the associated arrays

A =
  3 −1  0  ···  0
 −1  3 −1       ...
  ...  ...  ...
  ...  −1  3 −1
  0  ···  −1  3

b =
 2
 1
 ...
 1
 2

x =
 1
 1
 ...
 1
 1

Page 575: Analiza Numerica [Utm, Bostan v.]

SOLVING LINEAR SYSTEMS

Linear systems Ax = b occur widely in applied mathematics. They occur as direct formulations of “real world” problems; but more often, they occur as a part of the numerical analysis of some other problem. As examples of the latter, we have the construction of spline functions, the numerical solution of systems of nonlinear equations, ordinary and partial differential equations, integral equations, and the solution of optimization problems.

There are many ways of classifying linear systems.

Size: Small, moderate, and large. This of course varies with the machine you are using.

Page 576: Analiza Numerica [Utm, Bostan v.]

For a matrix A of order n × n, it will take 8n² bytes

to store it in double precision. Thus a matrix of order

8000 will need around 512 MB of storage. The latter

would be too large for most present day PCs, if the

matrix was to be stored in the computer’s memory,

although one can easily expand a PC to contain much

more memory than this.

Sparse vs. Dense. Many linear systems have a matrix A in which almost all the elements are zero. These

matrices are said to be sparse. For example, it is quite

common to work with tridiagonal matrices

A =
 a1  c1  0   ···  0
 b2  a2  c2       ...
 0   b3  a3  c3
 ...     ...  ...
 0   ···     bn  an

in which the order is 104 or much more. For such

matrices, it does not make sense to store the zero ele-

ments; and the sparsity should be taken into account

when solving the linear system Ax = b. Also, the

sparsity need not be as regular as in this example.

Page 577: Analiza Numerica [Utm, Bostan v.]

BASIC DEFINITIONS AND THEORY

A homogeneous linear system Ax = b is one for which the right hand constants are all zero. Using vector notation, we say b is the zero vector for a homogeneous system. Otherwise the linear system is called non-homogeneous.

Theorem. The following are equivalent statements.

(1) For each b, there is exactly one solution x.

(2) For each b, there is a solution x.

(3) The homogeneous system Ax = 0 has only the solution x = 0.

(4) det(A) ≠ 0.

(5) The inverse matrix A⁻¹ exists.

Page 578: Analiza Numerica [Utm, Bostan v.]

EXAMPLE. Consider again the tridiagonal system

3x1 − x2 = 2
−x1 + 3x2 − x3 = 1
   ...
−xn−2 + 3xn−1 − xn = 1
−xn−1 + 3xn = 2

The homogeneous version is simply

3x1 − x2 = 0
−x1 + 3x2 − x3 = 0
   ...
−xn−2 + 3xn−1 − xn = 0
−xn−1 + 3xn = 0

Assume x ≠ 0, and therefore that x has nonzero components. Let xk denote a component of maximum size:

|xk| = max over 1 ≤ j ≤ n of |xj|

Page 579: Analiza Numerica [Utm, Bostan v.]

Consider now equation k, and assume 1 < k < n.

Then

−xk−1 + 3xk − xk+1 = 0

xk = (1/3)(xk−1 + xk+1)

|xk| ≤ (1/3)(|xk−1| + |xk+1|) ≤ (1/3)(|xk| + |xk|) = (2/3)|xk|

This implies xk = 0, and therefore x = 0. A similar proof is valid if k = 1 or k = n, using the first or the last equation, respectively.

Thus the original tridiagonal linear system Ax = b has a unique solution x for each right side b.

Page 580: Analiza Numerica [Utm, Bostan v.]

METHODS OF SOLUTION

There are two general categories of numerical methods

for solving Ax = b.

Direct Methods: These are methods with a finite

number of steps; and they end with the exact solution

x, provided that all arithmetic operations are exact.

The most used of these methods is Gaussian elimi-

nation, which we begin with. There are other direct

methods, but we do not study them here.

Iteration Methods: These are used in solving all types

of linear systems, but they are most commonly used

with large sparse systems, especially those produced

by discretizing partial differential equations. This is

an extremely active area of research.

Page 581: Analiza Numerica [Utm, Bostan v.]

MATRICES in MATLAB

Consider the matrices

A =
 1 2 3
 2 2 3
 3 3 3

b =
 1
 1
 1

In MATLAB, A can be created as follows.

A = [1 2 3; 2 2 3; 3 3 3];
A = [1, 2, 3; 2, 2, 3; 3, 3, 3];
A = [1 2 3
     2 2 3
     3 3 3];

Commas can be used to replace the spaces. The vec-

tor b can be created by

b = ones(3, 1);

Page 582: Analiza Numerica [Utm, Bostan v.]

Consider setting up the matrices for the system

Ax = b with

Ai,j = max {i, j} , bi = 1, 1 ≤ i, j ≤ n

One way to set up the matrix A is as follows:

A = zeros(n, n);
for i = 1 : n
    A(i, 1 : i) = i;
    A(i, i + 1 : n) = i + 1 : n;
end

and set up the vector b by

b = ones(n, 1);
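The same matrix can also be built without a loop (my own note), since max can be applied to index grids:

[I, J] = ndgrid(1:n, 1:n);
A = max(I, J);                 % A(i,j) = max(i,j)
b = ones(n, 1);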

Page 583: Analiza Numerica [Utm, Bostan v.]

MATRIX ADDITION

Let A = [ai,j] and B = [bi,j] be matrices of order m × n. Then

C = A + B

is another matrix of order m × n, with

ci,j = ai,j + bi,j

EXAMPLE.

[1 2; 3 4; 5 6] + [1 −1; −1 1; 1 −1] = [2 1; 2 5; 6 5]

Page 584: Analiza Numerica [Utm, Bostan v.]

MULTIPLICATION BY A CONSTANT

c [a1,1 ··· a1,n; ... ; am,1 ··· am,n] = [c·a1,1 ··· c·a1,n; ... ; c·am,1 ··· c·am,n]

EXAMPLE.

5 [1 2; 3 4; 5 6] = [5 10; 15 20; 25 30]

(−1) [a b; c d] = [−a −b; −c −d]

Page 585: Analiza Numerica [Utm, Bostan v.]

THE ZERO MATRIX 0

Define the zero matrix of order m× n as the matrix

of that order having all zero entries. It is sometimes

written as 0m×n, but more commonly as simply 0. Then for any matrix A of order m × n,

A + 0 = 0 + A = A

The zero matrix 0m×n acts in the same role as does the number zero when doing arithmetic with real and complex numbers.

EXAMPLE.

[1 2; 3 4] + [0 0; 0 0] = [1 2; 3 4]

Page 586: Analiza Numerica [Utm, Bostan v.]

We denote by −A the solution of the equation

A+B = 0

It is the matrix obtained by taking the negative of all

of the entries in A. For example,

[a b; c d] + [−a −b; −c −d] = [0 0; 0 0]

⇒ −[a b; c d] = [−a −b; −c −d] = (−1)[a b; c d]

−[a1,1 a1,2; a2,1 a2,2] = [−a1,1 −a1,2; −a2,1 −a2,2]

Page 587: Analiza Numerica [Utm, Bostan v.]

MATRIX MULTIPLICATION

Let A = [ai,j] have order m × n and B = [bi,j] have order n × p. Then

C = AB

is a matrix of order m × p and

ci,j = Ai,∗ B∗,j = ai,1 b1,j + ai,2 b2,j + ··· + ai,n bn,j

or equivalently

ci,j = [ai,1 ai,2 ··· ai,n] [b1,j; b2,j; ... ; bn,j] = ai,1 b1,j + ai,2 b2,j + ··· + ai,n bn,j

Page 588: Analiza Numerica [Utm, Bostan v.]

EXAMPLES

"1 2 34 5 6

# 1 23 45 6

= "22 2849 64

#

1 23 45 6

" 1 2 34 5 6

#=

9 12 1519 26 3329 40 51

a1,1 · · · a1,n

... . . . ...an,1 · · · an,n

x1...xn

= a1,1x1 + · · ·+ a1,nxn

...an,1x1 + · · ·+ an,nxn

Thus we write the linear system

a1,1x1 + · · ·+ a1,nxn = b1...

an,1x1 + · · ·+ an,nxn = bn

as

Ax = b

Page 589: Analiza Numerica [Utm, Bostan v.]

THE IDENTITY MATRIX I

For a given integer n ≥ 1, define In to be the matrix of order n × n with 1's in all diagonal positions and zeros elsewhere:

In =
 1  0  ···  0
 0  1       0
 ...  ...  ...
 0  ···     1

More commonly it is denoted by simply I.

Let A be a matrix of order m × n. Then

A In = A,  Im A = A

The identity matrix I acts in the same role as does

the number 1 when doing arithmetic with real and

complex numbers.

Page 590: Analiza Numerica [Utm, Bostan v.]

THE MATRIX INVERSE

Let A be a matrix of order n × n for some n ≥ 1. We say a matrix B is an inverse for A if

AB = BA = I

It can be shown that if an inverse exists for A, then it is unique.

EXAMPLES. If ad − bc ≠ 0, then

[a b; c d]⁻¹ = (1/(ad − bc)) [d −b; −c a]

[1 2; 2 2]⁻¹ = [−1 1; 1 −1/2]

[1 1/2 1/3; 1/2 1/3 1/4; 1/3 1/4 1/5]⁻¹ = [9 −36 30; −36 192 −180; 30 −180 180]

Page 591: Analiza Numerica [Utm, Bostan v.]

Recall the earlier theorem on the solution of linear

systems Ax = b with A a square matrix.

Theorem. The following are equivalent statements.

1. For each b, there is exactly one solution x.

2. For each b, there is a solution x.

3. The homogeneous system Ax = 0 has only the

solution x = 0.

4. det(A) ≠ 0.

5. A−1 exists.

Page 592: Analiza Numerica [Utm, Bostan v.]

EXAMPLE

det [1 2 3; 4 5 6; 7 8 9] = 0

Therefore, the linear system

[1 2 3; 4 5 6; 7 8 9] [x1; x2; x3] = [b1; b2; b3]

is not always solvable, the coefficient matrix does not have an inverse, and the homogeneous system Ax = 0 has a solution other than the zero vector, namely

[1 2 3; 4 5 6; 7 8 9] [1; −2; 1] = [0; 0; 0]

Page 593: Analiza Numerica [Utm, Bostan v.]

PARTITIONED MATRICES

Matrices can be built up from smaller matrices; or conversely, we can decompose a large matrix into a matrix of smaller matrices. For example, consider

A = [1 2 0; 2 1 1; 0 −1 5] = [B c; d e]

B = [1 2; 2 1],  c = [0; 1],  d = [0 −1],  e = 5

Matlab allows you to build up larger matrices out of smaller matrices in exactly this manner; and smaller matrices can be defined as portions of larger matrices.

We will often write an n × n square matrix in terms of its columns:

A = [A∗,1, ..., A∗,n]

For the n × n identity matrix I, we write

I = [e1, ..., en]

with ej denoting a column vector with a 1 in position j and zeros elsewhere.

Page 594: Analiza Numerica [Utm, Bostan v.]

ARITHMETIC OF PARTITIONED MATRICES

As with matrices, we can do addition and multiplication with partitioned matrices, provided the individual constituent parts have the proper orders.

For example, let A, B, C, D be n × n matrices. Then

  [ I  A ] [ I  C ]   [ I + AD   C + A  ]
  [ B  I ] [ D  I ] = [ B + D    I + BC ]

Let A be n × n and x be a column vector of length n. Then

                                 [ x_1 ]
  Ax = [ A_{*,1}, ..., A_{*,n} ] [ ... ] = x_1 A_{*,1} + ... + x_n A_{*,n}
                                 [ x_n ]

Compare this to

  [ a_{1,1} ... a_{1,n} ] [ x_1 ]   [ a_{1,1} x_1 + ... + a_{1,n} x_n ]
  [   ...   ...   ...   ] [ ... ] = [               ...               ]
  [ a_{n,1} ... a_{n,n} ] [ x_n ]   [ a_{n,1} x_1 + ... + a_{n,n} x_n ]

Page 595: Analiza Numerica [Utm, Bostan v.]

PARTITIONED MATRICES IN MATLAB

In MATLAB, matrices can be constructed using smaller matrices. For example, let

  A = [1, 2; 3, 4]; x = [5, 6]; y = [7, 8]';

Then

  B = [A, y; x, 9];

forms the matrix

      [ 1  2  7 ]
  B = [ 3  4  8 ]
      [ 5  6  9 ]
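Conversely, portions of B can be pulled out as smaller matrices. A minimal sketch (the variable names here are ours, for illustration only):

  B = [1 2 7; 3 4 8; 5 6 9];
  A2 = B(1:2, 1:2);    % upper-left 2 x 2 block: [1 2; 3 4]
  y2 = B(1:2, 3);      % column 3 of rows 1-2: [7; 8]
  x2 = B(3, 1:2);      % row 3, columns 1-2: [5 6]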

Page 596: Analiza Numerica [Utm, Bostan v.]

SOLVING LINEAR SYSTEMS

We want to solve the linear system

  a_{1,1} x_1 + ... + a_{1,n} x_n = b_1
  ...
  a_{n,1} x_1 + ... + a_{n,n} x_n = b_n

This will be done by the method used in beginning

algebra, by successively eliminating unknowns from

equations, until eventually we have only one equation

in one unknown. This process is known as Gaussian

elimination. To put it onto a computer, however, we

must be more precise than is generally the case in high

school algebra.

We begin with the linear system

   3x_1 - 2x_2 -  x_3 =  0   (E1)
   6x_1 - 2x_2 + 2x_3 =  6   (E2)
  -9x_1 + 7x_2 +  x_3 = -1   (E3)

Page 597: Analiza Numerica [Utm, Bostan v.]

   3x_1 - 2x_2 -  x_3 =  0   (E1)
   6x_1 - 2x_2 + 2x_3 =  6   (E2)
  -9x_1 + 7x_2 +  x_3 = -1   (E3)

[1] Eliminate x_1 from equations (E2) and (E3). Subtract 2 times (E1) from (E2); and subtract -3 times (E1) from (E3). This yields

   3x_1 - 2x_2 -  x_3 =  0   (E1)
          2x_2 + 4x_3 =  6   (E2)
           x_2 - 2x_3 = -1   (E3)

[2] Eliminate x_2 from equation (E3). Subtract 1/2 times (E2) from (E3). This yields

   3x_1 - 2x_2 -  x_3 =  0   (E1)
          2x_2 + 4x_3 =  6   (E2)
               - 4x_3 = -4   (E3)

Using back substitution, solve for x_3, x_2, and x_1, obtaining

  x_3 = x_2 = x_1 = 1

Page 598: Analiza Numerica [Utm, Bostan v.]

In the computer, we work on the arrays rather than

on the equations. To illustrate this, we repeat the

preceding example using array notation.

The original system is Ax = b, with

      [  3  -2  -1 ]       [  0 ]
  A = [  6  -2   2 ],  b = [  6 ]
      [ -9   7   1 ]       [ -1 ]

We often write these in combined form as an augmented matrix:

            [  3  -2  -1 |  0 ]
  [A | b] = [  6  -2   2 |  6 ]
            [ -9   7   1 | -1 ]

In step 1, we eliminate x_1 from equations 2 and 3. We multiply row 1 by 2 and subtract it from row 2; and we multiply row 1 by -3 and subtract it from row 3. This yields

  [ 3  -2  -1 |  0 ]
  [ 0   2   4 |  6 ]
  [ 0   1  -2 | -1 ]

Page 599: Analiza Numerica [Utm, Bostan v.]

  [ 3  -2  -1 |  0 ]
  [ 0   2   4 |  6 ]
  [ 0   1  -2 | -1 ]

In step 2, we eliminate x_2 from equation 3. We multiply row 2 by 1/2 and subtract from row 3. This yields

  [ 3  -2  -1 |  0 ]
  [ 0   2   4 |  6 ]
  [ 0   0  -4 | -4 ]

Then we proceed with back substitution as previously.

Page 600: Analiza Numerica [Utm, Bostan v.]

For the general case, we reduce

            [ a^{(1)}_{1,1} ... a^{(1)}_{1,n} | b^{(1)}_1 ]
  [A | b] = [      ...      ...      ...      |    ...    ]
            [ a^{(1)}_{n,1} ... a^{(1)}_{n,n} | b^{(1)}_n ]

in n - 1 steps to the form

  [ a^{(1)}_{1,1}  ...     a^{(1)}_{1,n} | b^{(1)}_1 ]
  [       0        ...          ...      |    ...    ]
  [       0   ...  0     a^{(n)}_{n,n}   | b^{(n)}_n ]

More simply, and introducing new notation, this is equivalent to the matrix-vector equation Ux = g:

  [ u_{1,1}  ...    u_{1,n} ] [ x_1 ]   [ g_1 ]
  [    0     ...      ...   ] [ ... ] = [ ... ]
  [    0   ...  0  u_{n,n}  ] [ x_n ]   [ g_n ]

Page 601: Analiza Numerica [Utm, Bostan v.]

This is the linear system

  u_{1,1} x_1 + u_{1,2} x_2 + ... + u_{1,n-1} x_{n-1} + u_{1,n} x_n = g_1
  ...
  u_{n-1,n-1} x_{n-1} + u_{n-1,n} x_n = g_{n-1}
  u_{n,n} x_n = g_n

We solve for x_n, then x_{n-1}, and backwards to x_1. This process is called back substitution:

  x_n = g_n / u_{n,n}

  x_k = ( g_k - [ u_{k,k+1} x_{k+1} + ... + u_{k,n} x_n ] ) / u_{k,k}

for k = n-1, ..., 1. What we have done here is simply a more carefully defined and methodical version of what you have done in high school algebra.
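As a concrete illustration, here is a minimal MATLAB sketch of back substitution for Ux = g (the function name backsub is ours; it assumes every diagonal entry u_{k,k} is nonzero):

  function x = backsub(U, g)
  % Solve U*x = g for upper triangular U by back substitution.
  n = length(g);
  x = zeros(n, 1);
  x(n) = g(n) / U(n, n);
  for k = n-1:-1:1
      x(k) = (g(k) - U(k, k+1:n) * x(k+1:n)) / U(k, k);
  end
  end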

Page 602: Analiza Numerica [Utm, Bostan v.]

How do we carry out the conversion of

  [ a^{(1)}_{1,1} ... a^{(1)}_{1,n} | b^{(1)}_1 ]
  [      ...      ...      ...      |    ...    ]
  [ a^{(1)}_{n,1} ... a^{(1)}_{n,n} | b^{(1)}_n ]

to

  [ a^{(1)}_{1,1}  ...     a^{(1)}_{1,n} | b^{(1)}_1 ]
  [       0        ...          ...      |    ...    ]
  [       0   ...  0     a^{(n)}_{n,n}   | b^{(n)}_n ]

To help us keep track of the steps of this process, we will denote the initial system by

  [A^{(1)} | b^{(1)}] = [ a^{(1)}_{1,1} ... a^{(1)}_{1,n} | b^{(1)}_1 ]
                        [      ...      ...      ...      |    ...    ]
                        [ a^{(1)}_{n,1} ... a^{(1)}_{n,n} | b^{(1)}_n ]

Initially we will make the assumption that every pivot

element will be nonzero; and later we remove this

assumption.

Page 603: Analiza Numerica [Utm, Bostan v.]

Step 1. We will eliminate x_1 from equations 2 thru n. Begin by defining the multipliers

  m_{i,1} = a^{(1)}_{i,1} / a^{(1)}_{1,1},   i = 2, ..., n

Here we are assuming the pivot element a^{(1)}_{1,1} ≠ 0. Then in succession, multiply m_{i,1} times row 1 (called the pivot row) and subtract the result from row i. This yields new matrix elements

  a^{(2)}_{i,j} = a^{(1)}_{i,j} - m_{i,1} a^{(1)}_{1,j},   j = 2, ..., n

  b^{(2)}_i = b^{(1)}_i - m_{i,1} b^{(1)}_1

for i = 2, ..., n.

Note that the index j does not include j = 1. The reason is that with the definition of the multiplier m_{i,1}, it is automatic that

  a^{(2)}_{i,1} = a^{(1)}_{i,1} - m_{i,1} a^{(1)}_{1,1} = 0,   i = 2, ..., n

Page 604: Analiza Numerica [Utm, Bostan v.]

The augmented matrix now is

  [A^{(2)} | b^{(2)}] = [ a^{(1)}_{1,1}  a^{(1)}_{1,2} ... a^{(1)}_{1,n} | b^{(1)}_1 ]
                        [       0        a^{(2)}_{2,2} ... a^{(2)}_{2,n} | b^{(2)}_2 ]
                        [      ...            ...      ...      ...      |    ...    ]
                        [       0        a^{(2)}_{n,2} ... a^{(2)}_{n,n} | b^{(2)}_n ]

Step k: Assume that for i = 1, ..., k-1 the unknown x_i has been eliminated from equations i+1 thru n. We have the augmented matrix

  [A^{(k)} | b^{(k)}] = [ a^{(1)}_{1,1}  a^{(1)}_{1,2}  ...           a^{(1)}_{1,n} | b^{(1)}_1 ]
                        [       0        a^{(2)}_{2,2}  ...           a^{(2)}_{2,n} | b^{(2)}_2 ]
                        [                     ...       ...                ...      |    ...    ]
                        [      ...       0  a^{(k)}_{k,k}  ...        a^{(k)}_{k,n} | b^{(k)}_k ]
                        [      ...               ...                       ...      |    ...    ]
                        [       0   ...  0  a^{(k)}_{n,k}  ...        a^{(k)}_{n,n} | b^{(k)}_n ]

Page 605: Analiza Numerica [Utm, Bostan v.]

We want to eliminate unknown x_k from equations k+1 thru n. Begin by defining the multipliers

  m_{i,k} = a^{(k)}_{i,k} / a^{(k)}_{k,k},   i = k+1, ..., n

The pivot element is a^{(k)}_{k,k}, and we assume it is nonzero. Using these multipliers, we eliminate x_k from equations k+1 thru n. Multiply m_{i,k} times row k (the pivot row) and subtract from row i, for i = k+1 thru n:

  a^{(k+1)}_{i,j} = a^{(k)}_{i,j} - m_{i,k} a^{(k)}_{k,j},   j = k+1, ..., n

  b^{(k+1)}_i = b^{(k)}_i - m_{i,k} b^{(k)}_k

for i = k+1, ..., n. This yields the augmented matrix

Page 606: Analiza Numerica [Utm, Bostan v.]

[A^{(k+1)} | b^{(k+1)}]:

  [ a^{(1)}_{1,1}   ...                    ...              a^{(1)}_{1,n}   | b^{(1)}_1       ]
  [       0         ...                                          ...        |    ...          ]
  [       ...   a^{(k)}_{k,k}  a^{(k)}_{k,k+1}  ...          a^{(k)}_{k,n}  | b^{(k)}_k       ]
  [       ...       0      a^{(k+1)}_{k+1,k+1}  ...      a^{(k+1)}_{k+1,n}  | b^{(k+1)}_{k+1} ]
  [       ...                    ...                             ...        |    ...          ]
  [       0    ...  0       a^{(k+1)}_{n,k+1}   ...        a^{(k+1)}_{n,n}  | b^{(k+1)}_n     ]

Doing this for k = 1, 2, ..., n-1 leads to the upper triangular system with the augmented matrix

  [ a^{(1)}_{1,1}  ...     a^{(1)}_{1,n} | b^{(1)}_1 ]
  [       0        ...          ...      |    ...    ]
  [       0   ...  0     a^{(n)}_{n,n}   | b^{(n)}_n ]

We later remove the assumption

  a^{(k)}_{k,k} ≠ 0,   k = 1, 2, ..., n
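The elimination loop translates directly into MATLAB. A minimal sketch without pivoting (an illustrative function of our own, assuming every pivot a^{(k)}_{k,k} is nonzero):

  function [U, g] = gauss_elim(A, b)
  % Reduce A*x = b to an upper triangular system U*x = g (no pivoting).
  n = length(b);
  for k = 1:n-1
      for i = k+1:n
          m = A(i, k) / A(k, k);                   % multiplier m_{i,k}
          A(i, k:n) = A(i, k:n) - m * A(k, k:n);   % row i minus m times pivot row
          b(i) = b(i) - m * b(k);
      end
  end
  U = A; g = b;
  end

Together with the earlier backsub sketch, [U, g] = gauss_elim(A, b); x = backsub(U, g); solves the system.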

Page 607: Analiza Numerica [Utm, Bostan v.]

QUESTIONS

• How do we remove the assumption on the pivot elements?

• How many operations are involved in this procedure?

• How much error is there in the computed solution due to rounding errors in the calculations?

• How does the machine architecture affect the implementation of this algorithm?

Page 608: Analiza Numerica [Utm, Bostan v.]

PARTIAL PIVOTING

Recall the reduction of

  [A^{(1)} | b^{(1)}] = [ a^{(1)}_{1,1} ... a^{(1)}_{1,n} | b^{(1)}_1 ]
                        [      ...      ...      ...      |    ...    ]
                        [ a^{(1)}_{n,1} ... a^{(1)}_{n,n} | b^{(1)}_n ]

to

  [A^{(2)} | b^{(2)}] = [ a^{(1)}_{1,1}  a^{(1)}_{1,2} ... a^{(1)}_{1,n} | b^{(1)}_1 ]
                        [       0        a^{(2)}_{2,2} ... a^{(2)}_{2,n} | b^{(2)}_2 ]
                        [      ...            ...      ...      ...      |    ...    ]
                        [       0        a^{(2)}_{n,2} ... a^{(2)}_{n,n} | b^{(2)}_n ]

What if a^{(1)}_{1,1} = 0? In that case we look for an equation in which x_1 is present. To do this in such a way as to avoid zero pivots to the maximum extent possible, we do the following.

Page 609: Analiza Numerica [Utm, Bostan v.]

Look at all the elements in the first column,

  a^{(1)}_{1,1}, a^{(1)}_{2,1}, ..., a^{(1)}_{n,1}

and pick the largest in size. Say it is

  | a^{(1)}_{k,1} | = max_{j=1,...,n} | a^{(1)}_{j,1} |

Then interchange equations 1 and k, which means interchanging rows 1 and k in the augmented matrix [A^{(1)} | b^{(1)}]. Then proceed with the elimination of x_1 from equations 2 thru n as before.

Having obtained

  [A^{(2)} | b^{(2)}] = [ a^{(1)}_{1,1}  a^{(1)}_{1,2} ... a^{(1)}_{1,n} | b^{(1)}_1 ]
                        [       0        a^{(2)}_{2,2} ... a^{(2)}_{2,n} | b^{(2)}_2 ]
                        [      ...            ...      ...      ...      |    ...    ]
                        [       0        a^{(2)}_{n,2} ... a^{(2)}_{n,n} | b^{(2)}_n ]

what if a^{(2)}_{2,2} = 0? Then we proceed as before.

Page 610: Analiza Numerica [Utm, Bostan v.]

Among the elements

  a^{(2)}_{2,2}, a^{(2)}_{3,2}, ..., a^{(2)}_{n,2}

pick the one of largest size:

  | a^{(2)}_{k,2} | = max_{j=2,...,n} | a^{(2)}_{j,2} |

Interchange rows 2 and k. Then proceed as before to eliminate x_2 from equations 3 thru n, thus obtaining

  [A^{(3)} | b^{(3)}] = [ a^{(1)}_{1,1}  a^{(1)}_{1,2}  a^{(1)}_{1,3} ... a^{(1)}_{1,n} | b^{(1)}_1 ]
                        [       0        a^{(2)}_{2,2}  a^{(2)}_{2,3} ... a^{(2)}_{2,n} | b^{(2)}_2 ]
                        [       0             0         a^{(3)}_{3,3} ... a^{(3)}_{3,n} | b^{(3)}_3 ]
                        [      ...           ...             ...      ...      ...      |    ...    ]
                        [       0             0         a^{(3)}_{n,3} ... a^{(3)}_{n,n} | b^{(3)}_n ]

This is done at every stage of the elimination process. This technique is called partial pivoting, and it is a part of most Gaussian elimination programs (including the one in the text).
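A minimal modification of the gauss_elim sketch above adds the row interchange (again our own illustrative code, not the program in the text):

  function [U, g] = gauss_elim_pp(A, b)
  % Gaussian elimination with partial pivoting.
  n = length(b);
  for k = 1:n-1
      [~, p] = max(abs(A(k:n, k)));      % index of largest pivot candidate
      p = p + k - 1;
      A([k p], :) = A([p k], :);         % interchange rows k and p
      b([k p]) = b([p k]);
      for i = k+1:n
          m = A(i, k) / A(k, k);
          A(i, k:n) = A(i, k:n) - m * A(k, k:n);
          b(i) = b(i) - m * b(k);
      end
  end
  U = A; g = b;
  end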

Page 611: Analiza Numerica [Utm, Bostan v.]

Consequences of partial pivoting. Recall the definition of the elements obtained in the process of eliminating x_1 from equations 2 thru n:

  m_{i,1} = a^{(1)}_{i,1} / a^{(1)}_{1,1},   i = 2, ..., n

  a^{(2)}_{i,j} = a^{(1)}_{i,j} - m_{i,1} a^{(1)}_{1,j},   j = 2, ..., n

  b^{(2)}_i = b^{(1)}_i - m_{i,1} b^{(1)}_1

for i = 2, ..., n. By our definition of the pivot element a^{(1)}_{1,1}, we have

  | m_{i,1} | ≤ 1,   i = 2, ..., n

Thus in the calculation of a^{(2)}_{i,j} and b^{(2)}_i, the elements do not grow rapidly in size. This is in comparison to what might happen otherwise, in which the multipliers m_{i,1} might have been very large. This property is true of the multipliers at every step of the elimination process:

  | m_{i,k} | ≤ 1,   i = k+1, ..., n,   k = 1, ..., n-1

Page 612: Analiza Numerica [Utm, Bostan v.]

The property

  | m_{i,k} | ≤ 1,   i = k+1, ..., n

leads to good error propagation properties in Gaussian elimination with partial pivoting. The only error in Gaussian elimination is that derived from the rounding errors in the arithmetic operations. For example, at the first elimination step (eliminating x_1 from equations 2 thru n),

  a^{(2)}_{i,j} = a^{(1)}_{i,j} - m_{i,1} a^{(1)}_{1,j},   j = 2, ..., n

  b^{(2)}_i = b^{(1)}_i - m_{i,1} b^{(1)}_1

The above property on the size of the multipliers prevents these numbers and the errors in their calculation from growing as rapidly as they might if no partial pivoting was used.

As an example of the improvement in accuracy obtained with partial pivoting, see the example on pages 262-263.

Page 613: Analiza Numerica [Utm, Bostan v.]

OPERATION COUNTS

One of the major ways in which we compare the efficiency of different numerical methods is to count the number of needed arithmetic operations. For solving the linear system

  a_{1,1} x_1 + ... + a_{1,n} x_n = b_1
  ...
  a_{n,1} x_1 + ... + a_{n,n} x_n = b_n

using Gaussian elimination, we have the following operation counts.

1. A → U, where we are converting Ax = b to Ux = g:

  Divisions:        n(n-1)/2
  Additions:        n(n-1)(2n-1)/6
  Multiplications:  n(n-1)(2n-1)/6

Page 614: Analiza Numerica [Utm, Bostan v.]

2. b → g:

  Additions:        n(n-1)/2
  Multiplications:  n(n-1)/2

3. Solving Ux = g:

  Divisions:        n
  Additions:        n(n-1)/2
  Multiplications:  n(n-1)/2

On some machines, the cost of a division is much more than that of a multiplication; whereas on others there is not any important difference. We assume the latter; and then the operation costs are as follows.

  MD(A → U)  = n(n^2 - 1)/3
  MD(b → g)  = n(n-1)/2
  MD(Find x) = n(n+1)/2

Page 615: Analiza Numerica [Utm, Bostan v.]

  AS(A → U)  = n(n-1)(2n-1)/6
  AS(b → g)  = n(n-1)/2
  AS(Find x) = n(n-1)/2

Thus the total number of operations is

  Additions:                     (2n^3 + 3n^2 - 5n)/6
  Multiplications and Divisions: (n^3 + 3n^2 - n)/3

Both are around n^3/3, and thus the total operation count is approximately

  (2/3) n^3

What happens to the cost when n is doubled?

Page 616: Analiza Numerica [Utm, Bostan v.]

Solving Ax = b and Ax = c. What is the cost? Only the modification of the right side is different in these two cases. Thus the additional cost is

  MD(b → g) + MD(Find x) = n^2
  AS(b → g) + AS(Find x) = n(n-1)

The total is around 2n^2 operations, which is quite a bit smaller than (2/3)n^3 when n is even moderately large, say n = 100.

Thus one can solve the linear system Ax = c at little additional cost to that for solving Ax = b. This has important consequences when it comes to estimation of the error in computed solutions.

Page 617: Analiza Numerica [Utm, Bostan v.]

CALCULATING THE MATRIX INVERSE

Consider finding the inverse of a 3 × 3 matrix

      [ a_{1,1}  a_{1,2}  a_{1,3} ]
  A = [ a_{2,1}  a_{2,2}  a_{2,3} ] = [ A_{*,1}, A_{*,2}, A_{*,3} ]
      [ a_{3,1}  a_{3,2}  a_{3,3} ]

We want to find a matrix

  X = [ X_{*,1}, X_{*,2}, X_{*,3} ]

for which

  AX = I
  A [ X_{*,1}, X_{*,2}, X_{*,3} ] = [ e_1, e_2, e_3 ]
  [ AX_{*,1}, AX_{*,2}, AX_{*,3} ] = [ e_1, e_2, e_3 ]

This means we want to solve

  AX_{*,1} = e_1,   AX_{*,2} = e_2,   AX_{*,3} = e_3

We want to solve three linear systems, all with the same matrix of coefficients A.

Page 618: Analiza Numerica [Utm, Bostan v.]

MATRIX INVERSE EXAMPLE

      [ 1   1  -2 ]
  A = [ 1   1   1 ]
      [ 1  -1   0 ]

  [ 1   1  -2 | 1  0  0 ]
  [ 1   1   1 | 0  1  0 ]
  [ 1  -1   0 | 0  0  1 ]

With m_{2,1} = 1 and m_{3,1} = 1, this becomes

  [ 1   1  -2 |  1  0  0 ]
  [ 0   0   3 | -1  1  0 ]
  [ 0  -2   2 | -1  0  1 ]

Interchanging rows 2 and 3 (the pivot position holds a zero) gives

  [ 1   1  -2 |  1  0  0 ]
  [ 0  -2   2 | -1  0  1 ]
  [ 0   0   3 | -1  1  0 ]

Page 619: Analiza Numerica [Utm, Bostan v.]

  [ 1   1  -2 |  1  0  0 ]
  [ 0  -2   2 | -1  0  1 ]
  [ 0   0   3 | -1  1  0 ]

Then by using back substitution to solve for each column of the inverse, we obtain

           [  1/6   1/3   1/2 ]
  A^{-1} = [  1/6   1/3  -1/2 ]
           [ -1/3   1/3    0  ]

Page 620: Analiza Numerica [Utm, Bostan v.]

COST OF MATRIX INVERSION

In calculating A^{-1}, we are solving for the matrix X = [ X_{*,1}, X_{*,2}, ..., X_{*,n} ] where

  A [ X_{*,1}, X_{*,2}, ..., X_{*,n} ] = [ e_1, e_2, ..., e_n ]

and e_j is column j of the identity matrix. Thus we are solving n linear systems

  AX_{*,1} = e_1,  AX_{*,2} = e_2,  ...,  AX_{*,n} = e_n     (1)

all with the same coefficient matrix. Returning to the earlier operation counts for solving a single linear system, we have the following.

  Cost of triangulating A:  approx. (2/3)n^3 operations
  Cost of solving Ax = b:   2n^2 operations

Thus solving the n linear systems in (1) costs approximately

  (2/3)n^3 + n (2n^2) = (8/3)n^3 operations

It costs approximately four times as many operations to invert A as to solve a single system. With attention to the form of the right-hand sides in (1) this can be reduced to 2n^3 operations.

Page 621: Analiza Numerica [Utm, Bostan v.]

MATLAB MATRIX OPERATIONS

To solve the linear system Ax = b in Matlab, use

  x = A \ b

In Matlab, the command

  inv(A)

will calculate the inverse of A.

There are many matrix operations built into Matlab, both for general matrices and for special classes of matrices. We do not discuss those here, but recommend the student to investigate these thru the Matlab help options.
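For instance, the inverse computed by hand in the earlier example can be checked with these commands (a sketch; the comments show the expected results):

  A = [1 1 -2; 1 1 1; 1 -1 0];
  X = inv(A);        % reproduces [1/6 1/3 1/2; 1/6 1/3 -1/2; -1/3 1/3 0]
  b = [1; 2; 3];
  x = A \ b;         % preferred over inv(A)*b: one solve, no full inverse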

Page 622: Analiza Numerica [Utm, Bostan v.]

GAUSSIAN ELIMINATION - REVISITED

Consider solving the linear system

   2x_1 +   x_2 -  x_3 + 2x_4 = 5
   4x_1 +  5x_2 - 3x_3 + 6x_4 = 9
  -2x_1 +  5x_2 - 2x_3 + 6x_4 = 4
   4x_1 + 11x_2 - 4x_3 + 8x_4 = 2

by Gaussian elimination without pivoting. We denote this linear system by Ax = b. The augmented matrix for this system is

            [  2   1  -1  2 | 5 ]
  [A | b] = [  4   5  -3  6 | 9 ]
            [ -2   5  -2  6 | 4 ]
            [  4  11  -4  8 | 2 ]

To eliminate x_1 from equations 2, 3, and 4, use multipliers

  m_{2,1} = 2,  m_{3,1} = -1,  m_{4,1} = 2

Page 623: Analiza Numerica [Utm, Bostan v.]

To eliminate x_1 from equations 2, 3, and 4, use multipliers

  m_{2,1} = 2,  m_{3,1} = -1,  m_{4,1} = 2

This will introduce zeros into the positions below the diagonal in column 1, yielding

  [ 2  1  -1  2 |  5 ]
  [ 0  3  -1  2 | -1 ]
  [ 0  6  -3  8 |  9 ]
  [ 0  9  -2  4 | -8 ]

To eliminate x_2 from equations 3 and 4, use multipliers

  m_{3,2} = 2,  m_{4,2} = 3

This reduces the augmented matrix to

  [ 2  1  -1   2 |  5 ]
  [ 0  3  -1   2 | -1 ]
  [ 0  0  -1   4 | 11 ]
  [ 0  0   1  -2 | -5 ]

Page 624: Analiza Numerica [Utm, Bostan v.]

To eliminate x_3 from equation 4, use the multiplier

  m_{4,3} = -1

This reduces the augmented matrix to

  [ 2  1  -1  2 |  5 ]
  [ 0  3  -1  2 | -1 ]
  [ 0  0  -1  4 | 11 ]
  [ 0  0   0  2 |  6 ]

Return this to the familiar linear system

  2x_1 + x_2 -  x_3 + 2x_4 =  5
        3x_2 -  x_3 + 2x_4 = -1
             -  x_3 + 4x_4 = 11
                      2x_4 =  6

Solving by back substitution, we obtain

  x_4 = 3,  x_3 = 1,  x_2 = -2,  x_1 = 1

Page 625: Analiza Numerica [Utm, Bostan v.]

There is a surprising result involving matrices associated with this elimination process. Introduce the upper triangular matrix

      [ 2  1  -1  2 ]
  U = [ 0  3  -1  2 ]
      [ 0  0  -1  4 ]
      [ 0  0   0  2 ]

which resulted from the elimination process. Then introduce the lower triangular matrix

      [    1       0       0      0 ]   [  1  0   0  0 ]
  L = [ m_{2,1}    1       0      0 ] = [  2  1   0  0 ]
      [ m_{3,1}  m_{3,2}   1      0 ]   [ -1  2   1  0 ]
      [ m_{4,1}  m_{4,2}  m_{4,3} 1 ]   [  2  3  -1  1 ]

This uses the multipliers introduced in the elimination process. Then A = LU:

  [  2   1  -1  2 ]   [  1  0   0  0 ] [ 2  1  -1  2 ]
  [  4   5  -3  6 ] = [  2  1   0  0 ] [ 0  3  -1  2 ]
  [ -2   5  -2  6 ]   [ -1  2   1  0 ] [ 0  0  -1  4 ]
  [  4  11  -4  8 ]   [  2  3  -1  1 ] [ 0  0   0  2 ]

Page 626: Analiza Numerica [Utm, Bostan v.]

In general, when the process of Gaussian elimination

without pivoting is applied to solving a linear system

Ax = b, we obtain A = LU with L and U constructed

as above.

For the case in which partial pivoting is used, we ob-

tain the slightly modified result

LU = PA

where L and U are constructed as before and P is a

permutation matrix. For example, consider

      [ 0  0  1  0 ]
  P = [ 1  0  0  0 ]
      [ 0  0  0  1 ]
      [ 0  1  0  0 ]

Then

       [ 0  0  1  0 ] [ a_{1,1}  a_{1,2}  a_{1,3}  a_{1,4} ]   [ A_{3,*} ]
  PA = [ 1  0  0  0 ] [ a_{2,1}  a_{2,2}  a_{2,3}  a_{2,4} ] = [ A_{1,*} ]
       [ 0  0  0  1 ] [ a_{3,1}  a_{3,2}  a_{3,3}  a_{3,4} ]   [ A_{4,*} ]
       [ 0  1  0  0 ] [ a_{4,1}  a_{4,2}  a_{4,3}  a_{4,4} ]   [ A_{2,*} ]

Page 627: Analiza Numerica [Utm, Bostan v.]

       [ 0  0  1  0 ] [ a_{1,1}  a_{1,2}  a_{1,3}  a_{1,4} ]   [ A_{3,*} ]
  PA = [ 1  0  0  0 ] [ a_{2,1}  a_{2,2}  a_{2,3}  a_{2,4} ] = [ A_{1,*} ]
       [ 0  0  0  1 ] [ a_{3,1}  a_{3,2}  a_{3,3}  a_{3,4} ]   [ A_{4,*} ]
       [ 0  1  0  0 ] [ a_{4,1}  a_{4,2}  a_{4,3}  a_{4,4} ]   [ A_{2,*} ]

The matrix PA is obtained from A by switching around rows of A. The result LU = PA means that the LU factorization is valid for the matrix A with its rows suitably permuted.

Page 628: Analiza Numerica [Utm, Bostan v.]

Consequences: If we have a factorization

  A = LU

with L lower triangular and U upper triangular, then we can solve the linear system Ax = b in a relatively straightforward way.

The linear system can be written as

  LUx = b

Write this as a two stage process:

  Lg = b,   Ux = g

The system Lg = b is a lower triangular system:

  g_1 = b_1
  l_{2,1} g_1 + g_2 = b_2
  l_{3,1} g_1 + l_{3,2} g_2 + g_3 = b_3
  ...
  l_{n,1} g_1 + ... + l_{n,n-1} g_{n-1} + g_n = b_n

We solve it by "forward substitution". Then we solve the upper triangular system Ux = g by back substitution.
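Forward substitution mirrors the earlier back substitution sketch (again an illustrative function of our own; L is assumed to have a unit diagonal, as produced by Gaussian elimination):

  function g = forwardsub(L, b)
  % Solve L*g = b for unit lower triangular L by forward substitution.
  n = length(b);
  g = zeros(n, 1);
  g(1) = b(1);
  for i = 2:n
      g(i) = b(i) - L(i, 1:i-1) * g(1:i-1);
  end
  end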

Page 629: Analiza Numerica [Utm, Bostan v.]

VARIANTS OF GAUSSIAN ELIMINATION

If no partial pivoting is needed, then we can look for a factorization

  A = LU

without going thru the Gaussian elimination process. For example, suppose A is 4 × 4. We write

  [ a_{1,1}  a_{1,2}  a_{1,3}  a_{1,4} ]   [    1       0       0      0 ] [ u_{1,1}  u_{1,2}  u_{1,3}  u_{1,4} ]
  [ a_{2,1}  a_{2,2}  a_{2,3}  a_{2,4} ] = [ l_{2,1}    1       0      0 ] [    0     u_{2,2}  u_{2,3}  u_{2,4} ]
  [ a_{3,1}  a_{3,2}  a_{3,3}  a_{3,4} ]   [ l_{3,1}  l_{3,2}   1      0 ] [    0        0     u_{3,3}  u_{3,4} ]
  [ a_{4,1}  a_{4,2}  a_{4,3}  a_{4,4} ]   [ l_{4,1}  l_{4,2}  l_{4,3} 1 ] [    0        0        0     u_{4,4} ]

To find the elements {l_{i,j}} and {u_{i,j}}, we multiply the right side matrices L and U and match the results with the corresponding elements in A.

Page 630: Analiza Numerica [Utm, Bostan v.]

Multiplying the first row of L times all of the columns of U leads to

  u_{1,j} = a_{1,j},   j = 1, 2, 3, 4

Then multiplying rows 2, 3, 4 times the first column of U yields

  l_{i,1} u_{1,1} = a_{i,1},   i = 2, 3, 4

and we can solve for { l_{2,1}, l_{3,1}, l_{4,1} }. We can continue this process, finding the second row of U and then the second column of L, and so on. For example, to solve for l_{4,3}, we need to solve for it in

  l_{4,1} u_{1,3} + l_{4,2} u_{2,3} + l_{4,3} u_{3,3} = a_{4,3}

Why do this? A hint of an answer is given by this last equation. If we had an n × n matrix A, then we would find l_{n,n-1} by solving for it in the equation

  l_{n,1} u_{1,n-1} + l_{n,2} u_{2,n-1} + ... + l_{n,n-1} u_{n-1,n-1} = a_{n,n-1}

  l_{n,n-1} = ( a_{n,n-1} - [ l_{n,1} u_{1,n-1} + ... + l_{n,n-2} u_{n-2,n-1} ] ) / u_{n-1,n-1}

Page 631: Analiza Numerica [Utm, Bostan v.]

Embedded in this formula we have a dot product. This is in fact typical of this process, with the length of the inner products varying from one position to another.

Recalling the discussion of dot products, we can evaluate this last formula by using a higher precision arithmetic and thus avoid many rounding errors.

This leads to a variant of Gaussian elimination in which there are far fewer rounding errors.

With ordinary Gaussian elimination, the number of rounding errors is proportional to n^3. This variant reduces the number of rounding errors, with the number now being proportional to only n^2. This can lead to major increases in accuracy, especially for matrices which are very sensitive to small changes.

Page 632: Analiza Numerica [Utm, Bostan v.]

TRIDIAGONAL MATRICES

      [ b_1  c_1   0    0   ...     0     ]
      [ a_2  b_2  c_2   0                 ]
  A = [  0   a_3  b_3  c_3                ]
      [          ...   ...   ...          ]
      [       a_{n-1}  b_{n-1}  c_{n-1}   ]
      [  0   ...        a_n      b_n      ]

These occur very commonly in the numerical solution of partial differential equations, as well as in other applications (e.g. computing interpolating cubic spline functions).

We factor A = LU, as before. But now L and U take very simple forms. Before proceeding, we note with an example that the same may not be true of the matrix inverse.

Page 633: Analiza Numerica [Utm, Bostan v.]

EXAMPLE

Define an n × n tridiagonal matrix

      [ -1   1    0    0   ...      0      ]
      [  1  -2    1    0                   ]
  A = [  0   1   -2    1                   ]
      [          ...  ...   ...            ]
      [               1   -2       1       ]
      [  0   ...          1   -(n-1)/n     ]

Then A^{-1} is given by

  ( A^{-1} )_{i,j} = max {i, j}

Thus the sparse matrix A can (and usually does) have a dense inverse.

Page 634: Analiza Numerica [Utm, Bostan v.]

We factor A = LU, with

      [  1    0    0   ...   0 ]
      [ α_2   1    0          ]
  L = [  0   α_3   1          ]
      [       ...  ...        ]
      [  0   ...  α_n     1   ]

      [ β_1  c_1   0   ...   0 ]
      [  0   β_2  c_2         ]
  U = [  0    0   β_3  c_3    ]
      [          ...   ...    ]
      [  0   ...   0    β_n   ]

Multiply these and match coefficients with A to find {α_i, β_i}.

Page 635: Analiza Numerica [Utm, Bostan v.]

To solve the linear system

  Ax = f

or

  LUx = f

instead solve the two triangular systems

  Lg = f,   Ux = g

Solving Lg = f:

  g_1 = f_1
  g_j = f_j - α_j g_{j-1},   j = 2, ..., n

Solving Ux = g:

  x_n = g_n / β_n
  x_j = ( g_j - c_j x_{j+1} ) / β_j,   j = n-1, ..., 1

Page 636: Analiza Numerica [Utm, Bostan v.]

By doing a few multiplications of rows of L times columns of U, we obtain the general pattern as follows:

  β_1 = b_1                                    : row 1 of LU
  α_2 β_1 = a_2,  α_2 c_1 + β_2 = b_2          : row 2 of LU
  ...
  α_n β_{n-1} = a_n,  α_n c_{n-1} + β_n = b_n  : row n of LU

These are straightforward to solve:

  β_1 = b_1
  α_j = a_j / β_{j-1},   β_j = b_j - α_j c_{j-1},   j = 2, ..., n
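The whole factor-and-solve procedure fits in a few lines of MATLAB. A sketch under the stated assumption that no pivoting is needed (the vectors a, b, c hold the sub-, main, and super-diagonals; the function name is ours):

  function x = trisolve(a, b, c, f)
  % Solve a tridiagonal system A*x = f, with subdiagonal a(2:n),
  % diagonal b(1:n), superdiagonal c(1:n-1). No pivoting.
  n = length(b);
  beta = zeros(n,1); g = zeros(n,1); x = zeros(n,1);
  beta(1) = b(1); g(1) = f(1);
  for j = 2:n                      % factor and forward substitute together
      alpha = a(j) / beta(j-1);
      beta(j) = b(j) - alpha * c(j-1);
      g(j) = f(j) - alpha * g(j-1);
  end
  x(n) = g(n) / beta(n);
  for j = n-1:-1:1                 % back substitution
      x(j) = (g(j) - c(j) * x(j+1)) / beta(j);
  end
  end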

Page 637: Analiza Numerica [Utm, Bostan v.]

OPERATIONS COUNT

Factoring A = LU:

  Additions:        n - 1
  Multiplications:  n - 1
  Divisions:        n - 1

Solving Lg = f and Ux = g:

  Additions:        2n - 2
  Multiplications:  2n - 2
  Divisions:        n

Thus the total number of arithmetic operations is approximately 3n to factor A; and it takes about 5n to solve the linear system using the factorization of A.

If we had A^{-1} at no cost, what would it cost to compute x = A^{-1} f?

  x_i = Σ_{j=1}^{n} ( A^{-1} )_{i,j} f_j,   i = 1, ..., n

Page 638: Analiza Numerica [Utm, Bostan v.]

MATLAB MATRIX OPERATIONS

To obtain the LU factorization of a matrix, including the use of partial pivoting, use the Matlab command lu. In particular,

  [L, U, P] = lu(X)

returns the lower triangular matrix L, upper triangular matrix U, and permutation matrix P so that

  PX = LU
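Combined with the two-stage solve described earlier, this gives (a sketch, using the 4 × 4 example from the preceding slides):

  A = [2 1 -1 2; 4 5 -3 6; -2 5 -2 6; 4 11 -4 8];
  b = [5; 9; 4; 2];
  [L, U, P] = lu(A);
  g = L \ (P*b);     % forward substitution on the permuted right side
  x = U \ g;         % back substitution; x should be [1; -2; 1; 3]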

Page 639: Analiza Numerica [Utm, Bostan v.]

NUMERICAL INTEGRATION

How do you evaluate

  I = ∫_a^b f(x) dx

From calculus, if F(x) is an antiderivative of f(x), then

  I = ∫_a^b f(x) dx = F(b) - F(a)

However, in practice most integrals cannot be evaluated by this means. And even when this can work, an approximate numerical method may be much simpler and easier to use. For example, the integrand in

  ∫_0^1 dx / (1 + x^5)

has an extremely complicated antiderivative; and it is easier to evaluate the integral by approximate means. Try evaluating this integral with Maple or Mathematica.

Page 640: Analiza Numerica [Utm, Bostan v.]

NUMERICAL INTEGRATION: A GENERAL FRAMEWORK

Returning to a lesson used earlier with rootfinding: if you cannot solve a problem, then replace it with a "near-by" problem that you can solve. In our case, we want to evaluate

  I = ∫_a^b f(x) dx

To do so, many of the numerical schemes are based on choosing approximates of f(x). Calling one such approximation f̃(x), use

  I ≈ ∫_a^b f̃(x) dx ≡ Ĩ

What is the error?

  E = I - Ĩ = ∫_a^b [ f(x) - f̃(x) ] dx

  |E| ≤ ∫_a^b | f(x) - f̃(x) | dx ≤ (b - a) ||f - f̃||_∞

where

  ||f - f̃||_∞ ≡ max_{a≤x≤b} | f(x) - f̃(x) |

Page 641: Analiza Numerica [Utm, Bostan v.]

We also want to choose the approximates f̃(x) of a form we can integrate directly and easily. Examples are polynomials, trig functions, piecewise polynomials, and others.

If we use polynomial approximations, then how do we choose them? At this point, we have two choices:

1. Taylor polynomials approximating f(x)

2. Interpolatory polynomials approximating f(x)

Page 642: Analiza Numerica [Utm, Bostan v.]

EXAMPLE

Consider evaluating

  I = ∫_0^1 e^{x^2} dx

Use

  e^t = 1 + t + t^2/2! + ... + t^n/n! + t^{n+1} e^{c_t} / (n+1)!

  e^{x^2} = 1 + x^2 + x^4/2! + ... + x^{2n}/n! + x^{2n+2} e^{d_x} / (n+1)!

with 0 ≤ d_x ≤ x^2. Then

  I = ∫_0^1 [ 1 + x^2 + x^4/2! + ... + x^{2n}/n! ] dx
      + 1/(n+1)! ∫_0^1 x^{2n+2} e^{d_x} dx

Taking n = 3, we have

  I = 1 + 1/3 + 1/10 + 1/42 + E = 1.4571 + E

  0 < E ≤ (e/24) ∫_0^1 x^8 dx = e/216 = .0126

Page 643: Analiza Numerica [Utm, Bostan v.]

USING INTERPOLATORY POLYNOMIALS

In spite of the simplicity of the above example, it is generally more difficult to do numerical integration by constructing Taylor polynomial approximations than by constructing polynomial interpolates. We therefore construct the function f̃ in

  ∫_a^b f(x) dx ≈ ∫_a^b f̃(x) dx

by means of interpolation.

Initially, we consider only the case in which the interpolation is based on interpolation at evenly spaced node points.

Page 644: Analiza Numerica [Utm, Bostan v.]

LINEAR INTERPOLATION

The linear interpolant to f(x), interpolating at a and b, is given by

  P_1(x) = [ (b - x) f(a) + (x - a) f(b) ] / (b - a)

Using this linear interpolant, we obtain the approximation

  ∫_a^b f(x) dx ≈ ∫_a^b P_1(x) dx = (1/2)(b - a) [ f(a) + f(b) ] ≡ T_1(f)

The rule

  ∫_a^b f(x) dx ≈ T_1(f)

is called the trapezoidal rule.

Page 645: Analiza Numerica [Utm, Bostan v.]

[Figure: illustrating I ≈ T_1(f); the area under y = f(x) on [a, b] is approximated by the area under the chord y = P_1(x).]

Example.

  ∫_0^{π/2} sin x dx ≈ (π/4) [ sin 0 + sin(π/2) ] = π/4 ≐ .785398

  Error = .215

Page 646: Analiza Numerica [Utm, Bostan v.]

HOW TO OBTAIN GREATER ACCURACY?

How do we improve our estimate of the integral

  I = ∫_a^b f(x) dx

One direction is to increase the degree of the approximation, moving next to a quadratic interpolating polynomial for f(x). We first look at an alternative.

Instead of using the trapezoidal rule on the original interval [a, b], apply it to integrals of f(x) over smaller subintervals. For example:

  I = ∫_a^c f(x) dx + ∫_c^b f(x) dx,   c = (a + b)/2

    ≈ [(c - a)/2] [ f(a) + f(c) ] + [(b - c)/2] [ f(c) + f(b) ]

    = (h/2) [ f(a) + 2f(c) + f(b) ] ≡ T_2(f),   h = (b - a)/2

Example.

  ∫_0^{π/2} sin x dx ≈ (π/8) [ sin 0 + 2 sin(π/4) + sin(π/2) ] ≐ .948059

  Error = .0519

Page 647: Analiza Numerica [Utm, Bostan v.]

[Figure: illustrating I ≈ T_3(f); the area under y = f(x) is approximated by three trapezoids over the nodes a = x_0, x_1, x_2, b = x_3.]

Page 648: Analiza Numerica [Utm, Bostan v.]

THE TRAPEZOIDAL RULE

We can continue as above by dividing [a, b] into even smaller subintervals and applying

  ∫_α^β f(x) dx ≈ [(β - α)/2] [ f(α) + f(β) ]     (*)

on each of the smaller subintervals. Begin by introducing a positive integer n ≥ 1,

  h = (b - a)/n,   x_j = a + j h,   j = 0, 1, ..., n

Then

  I = ∫_{x_0}^{x_n} f(x) dx
    = ∫_{x_0}^{x_1} f(x) dx + ∫_{x_1}^{x_2} f(x) dx + ... + ∫_{x_{n-1}}^{x_n} f(x) dx

Use [α, β] = [x_0, x_1], [x_1, x_2], ..., [x_{n-1}, x_n], for each of which the subinterval has length h.

Page 649: Analiza Numerica [Utm, Bostan v.]

Then applying

  ∫_α^β f(x) dx ≈ [(β - α)/2] [ f(α) + f(β) ]

we have

  I ≈ (h/2) [ f(x_0) + f(x_1) ] + (h/2) [ f(x_1) + f(x_2) ]
      + ... + (h/2) [ f(x_{n-2}) + f(x_{n-1}) ] + (h/2) [ f(x_{n-1}) + f(x_n) ]

Simplifying,

  I ≈ h [ (1/2) f(a) + f(x_1) + ... + f(x_{n-1}) + (1/2) f(b) ] ≡ T_n(f)

This is called the "composite trapezoidal rule", or more simply, the trapezoidal rule.
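A minimal MATLAB sketch of T_n(f) (an illustrative function of our own; f is a function handle that accepts vectors):

  function T = trap(f, a, b, n)
  % Composite trapezoidal rule with n subintervals.
  h = (b - a) / n;
  x = a + h * (0:n);                         % nodes x_0, ..., x_n
  y = f(x);
  T = h * (sum(y) - 0.5*y(1) - 0.5*y(end));  % half weights at the endpoints
  end

For example, trap(@sin, 0, pi/2, 16) should reproduce the n = 16 entry in the table that follows.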

Page 650: Analiza Numerica [Utm, Bostan v.]

Example. Again integrate sin x over [0, π/2]. Then we have

  n     T_n(f)        Error     Ratio
  1     .785398163    2.15E-1
  2     .948059449    5.19E-2   4.13
  4     .987115801    1.29E-2   4.03
  8     .996785172    3.21E-3   4.01
  16    .999196680    8.03E-4   4.00
  32    .999799194    2.01E-4   4.00
  64    .999949800    5.02E-5   4.00
  128   .999987450    1.26E-5   4.00
  256   .999996863    3.14E-6   4.00

Note that the errors are decreasing by a constant factor of 4. Why do we always double n?

Page 651: Analiza Numerica [Utm, Bostan v.]

USING QUADRATIC INTERPOLATION

We want to approximate I = ∫_a^b f(x) dx using quadratic interpolation of f(x). Interpolate f(x) at the points {a, c, b}, with c = (a + b)/2. Also let h = (b - a)/2. The quadratic interpolating polynomial is given by

  P_2(x) = [(x - c)(x - b)/(2h^2)] f(a) + [(x - a)(x - b)/(-h^2)] f(c)
           + [(x - a)(x - c)/(2h^2)] f(b)

Replacing f(x) by P_2(x), we obtain the approximation

  ∫_a^b f(x) dx ≈ ∫_a^b P_2(x) dx = (h/3) [ f(a) + 4f(c) + f(b) ] ≡ S_2(f)

This is called Simpson's rule.

Page 652: Analiza Numerica [Utm, Bostan v.]

[Figure: illustration of I ≈ S_2(f); y = f(x) on [a, b] is approximated by the parabola through the points at a, (a+b)/2, and b.]

Example. With h = (b - a)/2 = π/4,

  ∫_0^{π/2} sin x dx ≈ [(π/4)/3] [ sin 0 + 4 sin(π/4) + sin(π/2) ] ≐ 1.00227987749221

  Error = -0.00228

Page 653: Analiza Numerica [Utm, Bostan v.]

SIMPSON'S RULE

As with the trapezoidal rule, we can apply Simpson's rule on smaller subdivisions in order to obtain better accuracy in approximating

  I = ∫_a^b f(x) dx

Again, Simpson's rule is given by

  ∫_α^β f(x) dx ≈ (δ/3) [ f(α) + 4f(γ) + f(β) ],   γ = (α + β)/2

and δ = (β - α)/2.

Let n be a positive even integer, and

  h = (b - a)/n,   x_j = a + j h,   j = 0, 1, ..., n

Then write

  I = ∫_{x_0}^{x_n} f(x) dx
    = ∫_{x_0}^{x_2} f(x) dx + ∫_{x_2}^{x_4} f(x) dx + ... + ∫_{x_{n-2}}^{x_n} f(x) dx

Page 654: Analiza Numerica [Utm, Bostan v.]

Apply

  ∫_α^β f(x) dx ≈ (δ/3) [ f(α) + 4f(γ) + f(β) ],   γ = (α + β)/2

to each of these subintegrals, with

  [α, β] = [x_0, x_2], [x_2, x_4], ..., [x_{n-2}, x_n]

In all cases, (β - α)/2 = h. Then

  I ≈ (h/3) [ f(x_0) + 4f(x_1) + f(x_2) ]
      + (h/3) [ f(x_2) + 4f(x_3) + f(x_4) ]
      + ... + (h/3) [ f(x_{n-4}) + 4f(x_{n-3}) + f(x_{n-2}) ]
      + (h/3) [ f(x_{n-2}) + 4f(x_{n-1}) + f(x_n) ]

This can be simplified to

  ∫_a^b f(x) dx ≈ S_n(f) ≡ (h/3) [ f(x_0) + 4f(x_1) + 2f(x_2) + 4f(x_3) + 2f(x_4)
      + ... + 2f(x_{n-2}) + 4f(x_{n-1}) + f(x_n) ]

This is called the "composite Simpson's rule" or, more simply, Simpson's rule.
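A matching MATLAB sketch of S_n(f) (again our own illustrative function; n must be even):

  function S = simpson(f, a, b, n)
  % Composite Simpson's rule with n (even) subintervals.
  h = (b - a) / n;
  x = a + h * (0:n);
  y = f(x);
  w = 2 * ones(1, n+1);          % interior even-index nodes keep weight 2
  w(2:2:n) = 4;                  % odd-index nodes x_1, x_3, ... get weight 4
  w([1 n+1]) = 1;                % endpoints get weight 1
  S = (h/3) * (w * y.');
  end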

Page 655: Analiza Numerica [Utm, Bostan v.]

EXAMPLE

Approximate ∫_0^{π/2} sin x dx. The Simpson rule results are as follows.

  n     S_n(f)              Error       Ratio
  2     1.00227987749221    -2.28E-3
  4     1.00013458497419    -1.35E-4    16.94
  8     1.00000829552397    -8.30E-6    16.22
  16    1.00000051668471    -5.17E-7    16.06
  32    1.00000003226500    -3.23E-8    16.01
  64    1.00000000201613    -2.02E-9    16.00
  128   1.00000000012600    -1.26E-10   16.00
  256   1.00000000000788    -7.88E-12   16.00
  512   1.00000000000049    -4.92E-13   15.99

Note that the ratios of successive errors have converged to 16. Why? Also compare this table with that for the trapezoidal rule. For example,

  I - T_4 =  1.29E-2
  I - S_4 = -1.35E-4

Page 656: Analiza Numerica [Utm, Bostan v.]

Example 1

  I^{(1)} = ∫_0^1 e^{-x^2} dx ≐ 0.746824132812427

  I^{(2)} = ∫_0^4 dx/(1 + x^2) = arctan(4) ≐ 1.32581766366803

  I^{(3)} = ∫_0^{2π} dx/(2 + cos x) = 2π/√3 ≐ 3.62759872846844

Table 1. Trapezoidal rule applied to Example 1.

        I^{(1)}           I^{(2)}           I^{(3)}
  n     Error     R       Error     R       Error     R
  2     1.6E-2            -1.3E-1           -5.6E-1
  4     3.8E-3    4.02    -3.6E-3   37.0    -3.8E-2   14.9
  8     9.6E-4    4.01     5.6E-4   -6.4    -1.9E-4   195.0
  16    2.4E-4    4.00     1.4E-4    3.9    -5.2E-9   37600
  32    6.0E-5    4.00     3.6E-5    4.00
  64    1.5E-5    4.00     9.0E-6    4.00
  128   3.7E-6    4.00     2.3E-6    4.00

Page 657: Analiza Numerica [Utm, Bostan v.]

Table 2. Simpson rule applied to Example 1.

        I^{(1)}            I^{(2)}           I^{(3)}
  n     Error      R       Error     R       Error     R
  2     -3.6E-4            8.7E-2            -1.26
  4     -3.1E-5    11.4    3.9E-2    2.2      1.4E-1   -9.2
  8     -2.0E-6    15.7    2.0E-3    20       1.2E-2   11.2
  16    -1.3E-7    15.9    4.0E-6    485      6.4E-5   191
  32    -7.8E-9    16.0    2.3E-8    172      1.7E-9   37600
  64    -4.9E-10   16.0    1.5E-9    16
  128   -3.0E-11   16.0    9.2E-11   16

Page 658: Analiza Numerica [Utm, Bostan v.]

TRAPEZOIDAL METHOD ERROR FORMULA

Theorem. Let f(x) have two continuous derivatives on the interval a ≤ x ≤ b. Then

  E_n^T(f) ≡ ∫_a^b f(x) dx - T_n(f) = - [ h^2 (b - a) / 12 ] f''(c_n)

for some c_n in the interval [a, b].

Later I will say something about the proof of this result, as it leads to some other useful formulas for the error.

The above formula says that the error decreases in a manner that is roughly proportional to h^2. Thus doubling n (and halving h) should cause the error to decrease by a factor of approximately 4. This is what we observed with a past example from the preceding section.

Page 659: Analiza Numerica [Utm, Bostan v.]

Example. Consider evaluating

  I = ∫_0^2 dx/(1 + x^2)

using the trapezoidal method T_n(f). How large should n be chosen in order to ensure that

  | E_n^T(f) | ≤ 5 × 10^{-6}

We begin by calculating the derivatives:

  f'(x) = -2x / (1 + x^2)^2,   f''(x) = (-2 + 6x^2) / (1 + x^2)^3

From a graph of f''(x),

  max_{0≤x≤2} | f''(x) | = 2

Recall that b - a = 2. Therefore,

  E_n^T(f) = - [ h^2 (b - a) / 12 ] f''(c_n)

  | E_n^T(f) | ≤ [ h^2 (2) / 12 ] · 2 = h^2/3

Page 660: Analiza Numerica [Utm, Bostan v.]

  E_n^T(f) = - [ h^2 (b - a) / 12 ] f''(c_n)

  | E_n^T(f) | ≤ [ 2h^2 / 12 ] · 2 = h^2/3

We bound | f''(c_n) | since we do not know c_n, and therefore we must assume the worst possible case, that which makes the error formula largest. That is what has been done above.

When do we have

  | E_n^T(f) | ≤ 5 × 10^{-6}     (1)

To ensure this, we choose h so small that

  h^2/3 ≤ 5 × 10^{-6}

This is equivalent to choosing h and n to satisfy

  h ≤ .003873,   n = 2/h ≥ 516.4

Thus n ≥ 517 will imply (1).

Page 661: Analiza Numerica [Utm, Bostan v.]

DERIVING THE ERROR FORMULA

There are two stages in deriving the error:

(1) Obtain the error formula for the case of a single subinterval (n = 1);

(2) Use this to obtain the general error formula given earlier.

For the trapezoidal method with only a single subinterval, we have

  ∫_α^{α+h} f(x) dx - (h/2) [ f(α) + f(α + h) ] = - (h^3/12) f''(c)

for some c in the interval [α, α + h].

A sketch of the derivation of this error formula is given in the problems.

Page 662: Analiza Numerica [Utm, Bostan v.]

Recall that the general trapezoidal rule T_n(f) was obtained by applying the simple trapezoidal rule to a subdivision of the original interval of integration. Recall defining and writing

  h = (b - a)/n,   x_j = a + j h,   j = 0, 1, ..., n

  I = ∫_{x_0}^{x_n} f(x) dx
    = ∫_{x_0}^{x_1} f(x) dx + ∫_{x_1}^{x_2} f(x) dx + ... + ∫_{x_{n-1}}^{x_n} f(x) dx

  I ≈ (h/2) [ f(x_0) + f(x_1) ] + (h/2) [ f(x_1) + f(x_2) ]
      + ... + (h/2) [ f(x_{n-2}) + f(x_{n-1}) ] + (h/2) [ f(x_{n-1}) + f(x_n) ]

Page 663: Analiza Numerica [Utm, Bostan v.]

Then the error

  E_n^T(f) ≡ ∫_a^b f(x) dx - T_n(f)

can be analyzed by adding together the errors over the subintervals [x_0, x_1], [x_1, x_2], ..., [x_{n-1}, x_n]. Recall

  ∫_α^{α+h} f(x) dx - (h/2) [ f(α) + f(α + h) ] = - (h^3/12) f''(c)

Then on [x_{j-1}, x_j],

  ∫_{x_{j-1}}^{x_j} f(x) dx - (h/2) [ f(x_{j-1}) + f(x_j) ] = - (h^3/12) f''(γ_j)

with x_{j-1} ≤ γ_j ≤ x_j, but otherwise γ_j unknown. Then combining these errors, we obtain

  E_n^T(f) = - (h^3/12) f''(γ_1) - ... - (h^3/12) f''(γ_n)

This formula can be further simplified, and we will do so in two ways.

Page 664: Analiza Numerica [Utm, Bostan v.]

Rewrite this error as

  E_n^T(f) = - (h^3 n / 12) [ ( f''(γ_1) + ... + f''(γ_n) ) / n ]

Denote the quantity inside the brackets by ζ_n. This number satisfies

  min_{a≤x≤b} f''(x) ≤ ζ_n ≤ max_{a≤x≤b} f''(x)

Since f''(x) is a continuous function (by original assumption), there must be some number c_n in [a, b] for which

  f''(c_n) = ζ_n

Recall also that hn = b - a. Then

  E_n^T(f) = - (h^3 n / 12) [ ( f''(γ_1) + ... + f''(γ_n) ) / n ]
           = - [ h^2 (b - a) / 12 ] f''(c_n)

This is the error formula given on the first slide.

Page 665: Analiza Numerica [Utm, Bostan v.]

AN ERROR ESTIMATE

We now obtain a way to estimate the error E_n^T(f). Return to the formula

  E_n^T(f) = - (h^3/12) f''(γ_1) - ... - (h^3/12) f''(γ_n)

and rewrite it as

  E_n^T(f) = - (h^2/12) [ f''(γ_1) h + ... + f''(γ_n) h ]

The quantity

  f''(γ_1) h + ... + f''(γ_n) h

is a Riemann sum for the integral

  ∫_a^b f''(x) dx = f'(b) - f'(a)

By this we mean

  lim_{n→∞} [ f''(γ_1) h + ... + f''(γ_n) h ] = ∫_a^b f''(x) dx

Page 666: Analiza Numerica [Utm, Bostan v.]

Thus

  f''(γ_1) h + ... + f''(γ_n) h ≈ f'(b) - f'(a)

for larger values of n. Combining this with the earlier error formula

  E_n^T(f) = - (h^2/12) [ f''(γ_1) h + ... + f''(γ_n) h ]

we have

  E_n^T(f) ≈ - (h^2/12) [ f'(b) - f'(a) ] ≡ Ẽ_n^T(f)

This is a computable estimate of the error in the numerical integration. It is called an asymptotic error estimate.

Page 667: Analiza Numerica [Utm, Bostan v.]

Example. Consider evaluating

  I(f) = ∫_0^π e^x cos x dx = -(e^π + 1)/2 ≐ -12.070346

In this case,

  f'(x) = e^x [ cos x - sin x ],   f''(x) = -2 e^x sin x

  max_{0≤x≤π} | f''(x) | = | f''(.75π) | = 14.921

Then

  E_n^T(f) = - [ h^2 (b - a) / 12 ] f''(c_n)

  | E_n^T(f) | ≤ (h^2 π / 12) · 14.921 = 3.906 h^2

Also

  Ẽ_n^T(f) = - (h^2/12) [ f'(π) - f'(0) ] = (h^2/12) [ e^π + 1 ] ≐ 2.012 h^2

Page 668: Analiza Numerica [Utm, Bostan v.]

  I(f) - T_n(f) ≈ - (h^2/12) [ f'(b) - f'(a) ]

  I(f) ≈ T_n(f) - (h^2/12) [ f'(b) - f'(a) ]

  CT_n(f) ≡ T_n(f) - (h^2/12) [ f'(b) - f'(a) ]

This is the corrected trapezoidal rule. It is easy to obtain from the trapezoidal rule, and in most cases, it converges more rapidly than the trapezoidal rule.

Table 3. Asymptotic error estimate and corrected trapezoidal rule applied to integral I^{(1)} from Example 1.

  n    I - T_n(f)   R   Ẽ_n(f)    I - CT_n(f)   R
  2    1.6E-2           1.5E-2    1.3E-4
  4    3.8E-3       4   3.8E-3    7.9E-6        15.8
  8    9.6E-4       4   9.6E-4    4.9E-7        16
  16   2.4E-4       4   2.4E-4    3.1E-8        16
  32   5.9E-5       4   5.9E-5    2.0E-9        16
  64   1.5E-5       4   1.5E-5    2.2E-10       16

Page 669: Analiza Numerica [Utm, Bostan v.]

SIMPSON'S RULE ERROR FORMULA

Recall the general Simpson's rule

  ∫_a^b f(x) dx ≈ S_n(f) ≡ (h/3) [ f(x_0) + 4f(x_1) + 2f(x_2)
      + 4f(x_3) + 2f(x_4) + ... + 2f(x_{n-2}) + 4f(x_{n-1}) + f(x_n) ]

For its error, we have

  E_n^S(f) ≡ ∫_a^b f(x) dx - S_n(f) = - [ h^4 (b - a) / 180 ] f^{(4)}(c_n)

for some a ≤ c_n ≤ b, with c_n otherwise unknown. For an asymptotic error estimate,

  ∫_a^b f(x) dx - S_n(f) ≈ Ẽ_n^S(f) ≡ - (h^4/180) [ f'''(b) - f'''(a) ]

Page 670: Analiza Numerica [Utm, Bostan v.]

DISCUSSION

For Simpson's error formula, both formulas assume that the integrand f(x) has four continuous derivatives on the interval [a, b]. What happens when this is not valid? We return later to this question.

Both formulas also say the error should decrease by a factor of around 16 when n is doubled.

Compare these results with those for the trapezoidal rule error formulas:

  E_n^T(f) ≡ ∫_a^b f(x) dx - T_n(f) = - [ h^2 (b - a) / 12 ] f''(c_n)

  E_n^T(f) ≈ - (h^2/12) [ f'(b) - f'(a) ] ≡ Ẽ_n^T(f)

Page 671: Analiza Numerica [Utm, Bostan v.]

EXAMPLE

Consider evaluating

  I = ∫_0^2 dx/(1 + x^2)

using Simpson's rule S_n(f). How large should n be chosen in order to ensure that

  | E_n^S(f) | ≤ 5 × 10^{-6}

Begin by noting that

  f^{(4)}(x) = 24 (5x^4 - 10x^2 + 1) / (1 + x^2)^5

  max_{0≤x≤2} | f^{(4)}(x) | = f^{(4)}(0) = 24

Then

  E_n^S(f) = - [ h^4 (b - a) / 180 ] f^{(4)}(c_n)

  | E_n^S(f) | ≤ (h^4 · 2 / 180) · 24 = 4h^4/15

Page 672: Analiza Numerica [Utm, Bostan v.]

Then | E_n^S(f) | ≤ 5 × 10^{-6} is true if

  4h^4/15 ≤ 5 × 10^{-6}

  h ≤ .0658,   n ≥ 30.39

Therefore, choosing n ≥ 32 will give the desired error bound. Compare this with the earlier trapezoidal example in which n ≥ 517 was needed.

For the asymptotic error estimate, we have

  f'''(x) = -24x (x^2 - 1) / (1 + x^2)^4

  Ẽ_n^S(f) ≡ - (h^4/180) [ f'''(2) - f'''(0) ] = (h^4/180) · (144/625) = (4/3125) h^4

Page 673: Analiza Numerica [Utm, Bostan v.]

INTEGRATING √x

Consider the numerical approximation of

  ∫_0^1 √x dx = 2/3

In the following table, we give the errors when using both the trapezoidal and Simpson rules.

  n     E_n^T       Ratio   E_n^S       Ratio
  2     6.311E-2            2.860E-2
  4     2.338E-2    2.70    1.012E-2    2.82
  8     8.536E-3    2.74    3.587E-3    2.83
  16    3.085E-3    2.77    1.268E-3    2.83
  32    1.108E-3    2.78    4.485E-4    2.83
  64    3.959E-4    2.80    1.586E-4    2.83
  128   1.410E-4    2.81    5.606E-5    2.83

The rate of convergence is slower because the function f(x) = √x is not sufficiently differentiable on [0, 1]. Both methods converge with a rate proportional to h^{1.5}.

Page 674: Analiza Numerica [Utm, Bostan v.]

ASYMPTOTIC ERROR FORMULAS

If we have a numerical integration formula,

  ∫_a^b f(x) dx ≈ Σ_{j=0}^{n} w_j f(x_j)

let E_n(f) denote its error,

  E_n(f) = ∫_a^b f(x) dx - Σ_{j=0}^{n} w_j f(x_j)

We say another formula Ẽ_n(f) is an asymptotic error formula for this numerical integration if it satisfies

  lim_{n→∞} Ẽ_n(f) / E_n(f) = 1

Equivalently,

  lim_{n→∞} [ E_n(f) - Ẽ_n(f) ] / E_n(f) = 0

These conditions say that Ẽ_n(f) looks increasingly like E_n(f) as n increases, and thus

  E_n(f) ≈ Ẽ_n(f)

Page 675: Analiza Numerica [Utm, Bostan v.]

Example. For the trapezoidal rule,

  E_n^T(f) ≈ Ẽ_n^T(f) ≡ - (h^2/12) [ f'(b) - f'(a) ]

This assumes f(x) has two continuous derivatives on the interval [a, b].

Example. For Simpson's rule,

  E_n^S(f) ≈ Ẽ_n^S(f) ≡ - (h^4/180) [ f'''(b) - f'''(a) ]

This assumes f(x) has four continuous derivatives on the interval [a, b].

Note that both of these formulas can be written in an equivalent form as

  Ẽ_n(f) = c / n^p

for appropriate constant c and exponent p. With the trapezoidal rule, p = 2 and

  c = - [ (b - a)^2 / 12 ] [ f'(b) - f'(a) ]

and for Simpson's rule, p = 4 with a suitable c.

Page 676: Analiza Numerica [Utm, Bostan v.]

The formula

  Ẽ_n(f) = c / n^p     (2)

occurs for many other numerical integration formulas that we have not yet defined or studied. In addition, if we use the trapezoidal or Simpson rules with an integrand f(x) which is not sufficiently differentiable, then (2) may hold with an exponent p that is less than the ideal.

Example. Consider

  I = ∫_0^1 x^β dx

in which -1 < β < 1, β ≠ 0. Then the convergence of the trapezoidal rule can be shown to have an asymptotic error formula

  E_n ≈ Ẽ_n = c / n^{β+1}     (3)

for some constant c dependent on β. A similar result holds for Simpson's rule, with -1 < β < 3, β not an integer. We can actually specify a formula for c; but the formula is often less important than knowing that (2) is valid for some c.

Page 677: Analiza Numerica [Utm, Bostan v.]

APPLICATION OF ASYMPTOTIC ERROR FORMULAS

Assume we know that an asymptotic error formula

  I - I_n ≈ c / n^p

is valid for some numerical integration rule denoted by I_n. Initially, assume we know the exponent p. Then imagine calculating both I_n and I_{2n}. With I_{2n}, we have

  I - I_{2n} ≈ c / (2^p n^p)

This leads to

  I - I_n ≈ 2^p [ I - I_{2n} ]

  I ≈ ( 2^p I_{2n} - I_n ) / ( 2^p - 1 ) = I_{2n} + ( I_{2n} - I_n ) / ( 2^p - 1 )

The formula

  I ≈ I_{2n} + ( I_{2n} - I_n ) / ( 2^p - 1 )     (4)

is called Richardson's extrapolation formula.

Page 678: Analiza Numerica [Utm, Bostan v.]

Example. With the trapezoidal rule and with the integrand f(x) having two continuous derivatives,

  I ≈ T_{2n} + (1/3) [ T_{2n} - T_n ]

Example. With Simpson's rule and with the integrand f(x) having four continuous derivatives,

  I ≈ S_{2n} + (1/15) [ S_{2n} - S_n ]

We can also use the formula (2) to obtain error estimation formulas:

  I - I_{2n} ≈ ( I_{2n} - I_n ) / ( 2^p - 1 )     (5)

This is called Richardson's error estimate. For example, with the trapezoidal rule,

  I - T_{2n} ≈ (1/3) [ T_{2n} - T_n ]

These formulas are illustrated for the trapezoidal rule in an accompanying table, for

  ∫_0^π e^x cos x dx = -(e^π + 1)/2 ≐ -12.07034632
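As an illustration, a short MATLAB sketch combining the trap function from the earlier sketch with Richardson extrapolation (p = 2 for the trapezoidal rule; the variable names are ours):

  f = @(x) exp(x) .* cos(x);
  Tn  = trap(f, 0, pi, 64);
  T2n = trap(f, 0, pi, 128);
  I_extrap = T2n + (T2n - Tn) / 3;   % Richardson extrapolation, p = 2
  err_est  = (T2n - Tn) / 3;         % Richardson's error estimate for T2n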

Page 679: Analiza Numerica [Utm, Bostan v.]

AITKEN EXTRAPOLATION

In this case, we again assume

  I - I_n ≈ c / n^p

But in contrast to previously, we do not know either c or p. Imagine computing I_n, I_{2n}, and I_{4n}. Then

  I - I_n ≈ c / n^p
  I - I_{2n} ≈ c / (2^p n^p)
  I - I_{4n} ≈ c / (4^p n^p)

We can directly try to estimate I. Dividing,

  ( I - I_n ) / ( I - I_{2n} ) ≈ 2^p ≈ ( I - I_{2n} ) / ( I - I_{4n} )

Solving for I, we obtain

  ( I - I_{2n} )^2 ≈ ( I - I_n )( I - I_{4n} )

  I ( I_n + I_{4n} - 2 I_{2n} ) ≈ I_n I_{4n} - I_{2n}^2

  I ≈ ( I_n I_{4n} - I_{2n}^2 ) / ( I_n + I_{4n} - 2 I_{2n} )

Page 680: Analiza Numerica [Utm, Bostan v.]

This can be improved computationally, to avoid loss of significance errors:

  I ≈ I_{4n} + [ ( I_n I_{4n} - I_{2n}^2 ) / ( I_n + I_{4n} - 2 I_{2n} ) - I_{4n} ]

    = I_{4n} - ( I_{4n} - I_{2n} )^2 / [ ( I_{4n} - I_{2n} ) - ( I_{2n} - I_n ) ]

This is called Aitken's extrapolation formula.

To estimate p, we use

  ( I_{2n} - I_n ) / ( I_{4n} - I_{2n} ) ≈ 2^p

To see this, write

  ( I_{2n} - I_n ) / ( I_{4n} - I_{2n} ) = [ ( I - I_n ) - ( I - I_{2n} ) ] / [ ( I - I_{2n} ) - ( I - I_{4n} ) ]

Then substitute from the following and simplify:

  I - I_n ≈ c / n^p
  I - I_{2n} ≈ c / (2^p n^p)
  I - I_{4n} ≈ c / (4^p n^p)

Page 681: Analiza Numerica [Utm, Bostan v.]

Example. Consider the following table of numerical integrals. What is its order of convergence?

  n    I_n            I_n - I_{n/2}   Ratio
  2    .28451779686
  4    .28559254576   1.075E-3
  8    .28570248748   1.099E-4        9.78
  16   .28571317731   1.069E-5        10.28
  32   .28571418363   1.006E-6        10.62
  64   .28571427643   9.280E-8        10.84

It appears

  2^p ≐ 10.84,   p ≐ log_2 10.84 = 3.44

We could now combine this with Richardson's error formula to estimate the error:

  I - I_n ≈ [ 1 / (2^p - 1) ] [ I_n - I_{n/2} ]

For example,

  I - I_64 ≈ (1/9.84) [ 9.280E-8 ] = 9.43E-9

Page 682: Analiza Numerica [Utm, Bostan v.]

PERIODIC FUNCTIONS

A function f(x) is periodic if the following condition

is satisfied. There is a smallest real number τ > 0 for

which

  f(x + τ) = f(x),   -∞ < x < ∞     (6)

The number τ is called the period of the function

f(x). The constant function f(x) ≡ 1 is also consid-ered periodic, but it satisfies this condition with any

τ > 0. Basically, a periodic function is one which

repeats itself over intervals of length τ .

The condition (6) implies

  f^{(m)}(x + τ) = f^{(m)}(x),   -∞ < x < ∞     (7)

for the m-th derivative of f(x), provided there is such a derivative. Thus the derivatives are also periodic.

Periodic functions occur very frequently in applica-

tions of mathematics, reflecting the periodicity of many

phenomena in the physical world.

Page 683: Analiza Numerica [Utm, Bostan v.]

PERIODIC INTEGRANDS

Consider the special class of integrals

  I(f) = ∫_a^b f(x) dx

in which f(x) is periodic, with b - a an integer multiple of the period τ for f(x). In this case, the performance of the trapezoidal rule and other numerical integration rules is much better than that predicted by earlier error formulas.

To hint at this improved performance, recall

  ∫_a^b f(x) dx - T_n(f) ≈ Ẽ_n(f) ≡ - (h^2/12) [ f'(b) - f'(a) ]

With our assumption on the periodicity of f(x), we have

  f(a) = f(b),   f'(a) = f'(b)

Therefore, Ẽ_n(f) = 0

Page 684: Analiza Numerica [Utm, Bostan v.]

and we should expect improved performance in the convergence behaviour of the trapezoidal sums T_n(f).

If in addition to being periodic on [a, b], the integrand f(x) also has m continuous derivatives, then it can be shown that

  I(f) - T_n(f) = c / n^m + smaller terms

By "smaller terms", we mean terms which decrease to zero more rapidly than n^{-m}.

Thus if f(x) is periodic with b - a an integer multiple of the period τ for f(x), and if f(x) is infinitely differentiable, then the error I - T_n decreases to zero more rapidly than n^{-m} for any m > 0. For periodic integrands, the trapezoidal rule is an optimal numerical integration method.

Page 685: Analiza Numerica [Utm, Bostan v.]

Example. Consider evaluating

  I = ∫_0^{2π} sin x dx / (1 + e^{sin x})

Using the trapezoidal rule, we have the results in the following table. In this case, the formulas based on Richardson extrapolation are no longer valid.

  n    T_n                  T_n - T_{n/2}
  2     0.0
  4    -0.72589193317292    -7.259E-1
  8    -0.74006131211583    -1.417E-2
  16   -0.74006942337672    -8.111E-6
  32   -0.74006942337946    -2.746E-12
  64   -0.74006942337946     0.0

Page 686: Analiza Numerica [Utm, Bostan v.]

NUMERICAL INTEGRATION: ANOTHER APPROACH

We look for numerical integration formulas

  ∫_{-1}^{1} f(x) dx ≈ Σ_{j=1}^{n} w_j f(x_j)

which are to be exact for polynomials of as large a degree as possible. There are no restrictions placed on the nodes {x_j} or the weights {w_j} in working towards that goal. The motivation is that if it is exact for high degree polynomials, then perhaps it will be very accurate when integrating functions that are well approximated by polynomials.

There is no guarantee that such an approach will work. In fact, it turns out to be a bad idea when the node points {x_j} are required to be evenly spaced over the interval of integration. But without this restriction on {x_j} we are able to develop a very accurate set of quadrature formulas.

Page 687: Analiza Numerica [Utm, Bostan v.]

The case n = 1. We want a formula

  w_1 f(x_1) ≈ ∫_{-1}^{1} f(x) dx

The weight w_1 and the node x_1 are to be so chosen that the formula is exact for polynomials of as large a degree as possible. To do this we substitute f(x) = 1 and f(x) = x. The first choice leads to

  w_1 · 1 = ∫_{-1}^{1} 1 dx,   so   w_1 = 2

The choice f(x) = x leads to

  w_1 x_1 = ∫_{-1}^{1} x dx = 0,   so   x_1 = 0

The desired formula is

  ∫_{-1}^{1} f(x) dx ≈ 2 f(0)

It is called the midpoint rule.

Page 688: Analiza Numerica [Utm, Bostan v.]

The case n = 2. We want a formula

  w_1 f(x_1) + w_2 f(x_2) ≈ ∫_{-1}^{1} f(x) dx

The weights w_1, w_2 and the nodes x_1, x_2 are to be so chosen that the formula is exact for polynomials of as large a degree as possible. We substitute and force equality for

  f(x) = 1, x, x^2, x^3

This leads to the system

  w_1 + w_2 = ∫_{-1}^{1} 1 dx = 2

  w_1 x_1 + w_2 x_2 = ∫_{-1}^{1} x dx = 0

  w_1 x_1^2 + w_2 x_2^2 = ∫_{-1}^{1} x^2 dx = 2/3

  w_1 x_1^3 + w_2 x_2^3 = ∫_{-1}^{1} x^3 dx = 0

The solution is given by

  w_1 = w_2 = 1,   x_1 = -1/√3,   x_2 = 1/√3

Page 689: Analiza Numerica [Utm, Bostan v.]

This yields the formula

  ∫_{-1}^{1} f(x) dx ≈ f(-1/√3) + f(1/√3)     (1)

We say it has degree of precision equal to 3 since it integrates exactly all polynomials of degree ≤ 3. We can verify directly that it does not integrate exactly f(x) = x^4:

  ∫_{-1}^{1} x^4 dx = 2/5,   f(-1/√3) + f(1/√3) = 2/9

Thus (1) has degree of precision exactly 3.

EXAMPLE. Integrate

  ∫_{-1}^{1} dx/(3 + x) = log 2 ≐ 0.69314718

The formula (1) yields

  1/(3 + x_1) + 1/(3 + x_2) = 0.69230769

  Error = .000839
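A sketch of this two-point Gauss rule in MATLAB, applied to the example integral:

  f = @(x) 1 ./ (3 + x);
  x = [-1, 1] / sqrt(3);     % Gauss-Legendre nodes for n = 2
  w = [1, 1];                % corresponding weights
  I2 = w * f(x).';           % 0.69230769...
  err = log(2) - I2;         % about 8.4E-4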

Page 690: Analiza Numerica [Utm, Bostan v.]

THE GENERAL CASE

We want to find the weights {w_i} and nodes {x_i} so as to have

  ∫_{-1}^{1} f(x) dx ≈ Σ_{j=1}^{n} w_j f(x_j)

be exact for polynomials f(x) of as large a degree as possible. As unknowns, there are n weights w_i and n nodes x_i. Thus it makes sense to initially impose 2n conditions so as to obtain 2n equations for the 2n unknowns. We require the quadrature formula to be exact for the cases

  f(x) = x^i,   i = 0, 1, 2, ..., 2n-1

Then we obtain the system of equations

  w_1 x_1^i + w_2 x_2^i + ... + w_n x_n^i = ∫_{-1}^{1} x^i dx

for i = 0, 1, 2, ..., 2n-1. For the right sides,

  ∫_{-1}^{1} x^i dx = 2/(i + 1)  for i = 0, 2, ..., 2n-2;   = 0  for i = 1, 3, ..., 2n-1

Page 691: Analiza Numerica [Utm, Bostan v.]

The system of equations

  w_1 x_1^i + ... + w_n x_n^i = ∫_{-1}^{1} x^i dx,   i = 0, ..., 2n-1

has a solution, and the solution is unique except for re-ordering the unknowns. The resulting numerical integration rule is called Gaussian quadrature.

In fact, the nodes and weights are not found by solving this system. Rather, the nodes and weights have other properties which enable them to be found more easily by other methods. There are programs to produce them; and most subroutine libraries have either a program to produce them or tables of them for commonly used cases.

Page 692: Analiza Numerica [Utm, Bostan v.]

CHANGE OF INTERVAL OF INTEGRATION

Integrals on other finite intervals [a, b] can be converted to integrals over [-1, 1], as follows:

  ∫_a^b F(x) dx = [(b - a)/2] ∫_{-1}^{1} F( (b + a + t(b - a))/2 ) dt

based on the change of integration variables

  x = (b + a + t(b - a))/2,   -1 ≤ t ≤ 1

EXAMPLE. Over the interval [0, π], use

  x = (1 + t) π/2

Then

  ∫_0^π F(x) dx = (π/2) ∫_{-1}^{1} F( (1 + t) π/2 ) dt

Page 693: Analiza Numerica [Utm, Bostan v.]

AN ERROR FORMULA

The usual error formula for the Gaussian quadrature formula,

  E_n(f) = ∫_{-1}^{1} f(x) dx - Σ_{j=1}^{n} w_j f(x_j)

is not particularly intuitive. It is given by

  E_n(f) = e_n f^{(2n)}(c_n) / (2n)!

  e_n = 2^{2n+1} (n!)^4 / [ (2n + 1) ((2n)!)^2 ] ≈ π / 4^n

for some a ≤ c_n ≤ b.

To help in understanding the implications of this error formula, introduce

  M_k = max_{-1≤x≤1} | f^{(k)}(x) | / k!

Page 694: Analiza Numerica [Utm, Bostan v.]

With many integrands f(x), this sequence {M_k} is bounded or even decreases to zero. For example,

  f(x) = cos x  ⇒  M_k ≤ 1/k!;      f(x) = 1/(2 + x)  ⇒  M_k ≤ 1

Then for our error formula,

  E_n(f) = e_n f^{(2n)}(c_n) / (2n)!

  | E_n(f) | ≤ e_n M_{2n}     (2)

By other methods, we can show

  e_n ≈ π / 4^n

When combined with (2) and an assumption of uniform boundedness for {M_k}, we have that the error decreases by a factor of at least 4 with each increase of n to n + 1. Compare this to the convergence of the trapezoidal and Simpson rules for such functions, to help explain the very rapid convergence of Gaussian quadrature.

Page 695: Analiza Numerica [Utm, Bostan v.]

A SECOND ERROR FORMULA

Let f(x) be continuous for a ≤ x ≤ b; let n ≥ 1. Then, for the Gaussian numerical integration formula

  I ≡ ∫_a^b f(x) dx ≈ Σ_{j=1}^{n} w_j f(x_j) ≡ I_n

on [a, b], the error in I_n satisfies

  | I(f) - I_n(f) | ≤ 2 (b - a) ρ_{2n-1}(f)     (3)

Here ρ_{2n-1}(f) is the minimax error of degree 2n-1 for f(x) on [a, b]:

  ρ_m(f) = min_{deg(p)≤m} [ max_{a≤x≤b} | f(x) - p(x) | ],   m ≥ 0

Page 696: Analiza Numerica [Utm, Bostan v.]

EXAMPLE. Let f(x) = e^{-x^2}. Then the minimax errors ρ_m(f) are given in the following table.

  m    ρ_m(f)     m     ρ_m(f)
  1    5.30E-2    6     7.82E-6
  2    1.79E-2    7     4.62E-7
  3    6.63E-4    8     9.64E-8
  4    4.63E-4    9     8.05E-9
  5    1.62E-5    10    9.16E-10

Using this table, apply (3) to

  I = ∫_0^1 e^{-x^2} dx

For n = 3, (3) implies

  | I - I_3 | ≤ 2 ρ_5( e^{-x^2} ) ≐ 3.24 × 10^{-5}

The actual error is 9.55E-6.

Page 697: Analiza Numerica [Utm, Bostan v.]

INTEGRATING A NON-SMOOTH INTEGRAND

Consider using Gaussian quadrature to evaluate

  I = ∫_0^1 √x dx = 2/3

  n     I - I_n     Ratio
  2     -7.22E-3
  4     -1.16E-3    6.2
  8     -1.69E-4    6.9
  16    -2.30E-5    7.4
  32    -3.00E-6    7.6
  64    -3.84E-7    7.8

The column labeled Ratio is defined by

  ( I - I_{n/2} ) / ( I - I_n )

It is consistent with I - I_n ≈ c/n^3, which can be proven theoretically. In comparison, for the trapezoidal and Simpson rules, I - I_n ≈ c/n^{1.5}.

Page 698: Analiza Numerica [Utm, Bostan v.]

WEIGHTED GAUSSIAN QUADRATURE

Consider needing to evaluate integrals such as

  ∫_0^1 f(x) log x dx,   ∫_0^1 x^{1/3} f(x) dx

How do we proceed? Consider numerical integration formulas

  ∫_a^b w(x) f(x) dx ≈ Σ_{j=1}^{n} w_j f(x_j)

in which f(x) is considered a "nice" function (one with several continuous derivatives). The function w(x) is allowed to be singular, but must be integrable. We assume here that [a, b] is a finite interval. The function w(x) is called a "weight function", and it is implicitly absorbed into the definition of the quadrature weights {w_j}. We again determine the nodes {x_j} and weights {w_j} so as to make the integration formula exact for f(x) a polynomial of as large a degree as possible.

Page 699: Analiza Numerica [Utm, Bostan v.]

The resulting numerical integration formula

  ∫_a^b w(x) f(x) dx ≈ Σ_{j=1}^{n} w_j f(x_j)

is called a Gaussian quadrature formula with weight function w(x). We determine the nodes {x_j} and weights {w_j} by requiring exactness in the above formula for

  f(x) = x^i,   i = 0, 1, 2, ..., 2n-1

To make the derivation more understandable, we consider the particular case

  ∫_0^1 x^{1/3} f(x) dx ≈ Σ_{j=1}^{n} w_j f(x_j)

We follow the same pattern as used earlier.

Page 700: Analiza Numerica [Utm, Bostan v.]

The case n = 1. We want a formula

  w_1 f(x_1) ≈ ∫_0^1 x^{1/3} f(x) dx

The weight w_1 and the node x_1 are to be so chosen that the formula is exact for polynomials of as large a degree as possible. Choosing f(x) = 1, we have

  w_1 = ∫_0^1 x^{1/3} dx = 3/4

Choosing f(x) = x, we have

  w_1 x_1 = ∫_0^1 x^{1/3} · x dx = 3/7,   so   x_1 = 4/7

Thus

  ∫_0^1 x^{1/3} f(x) dx ≈ (3/4) f(4/7)

has degree of precision 1.

Page 701: Analiza Numerica [Utm, Bostan v.]

The case n = 2. We want a formula

  w_1 f(x_1) + w_2 f(x_2) ≈ ∫_0^1 x^{1/3} f(x) dx

The weights w_1, w_2 and the nodes x_1, x_2 are to be so chosen that the formula is exact for polynomials of as large a degree as possible. We determine them by requiring equality for

  f(x) = 1, x, x^2, x^3

This leads to the system

  w_1 + w_2 = ∫_0^1 x^{1/3} dx = 3/4

  w_1 x_1 + w_2 x_2 = ∫_0^1 x · x^{1/3} dx = 3/7

  w_1 x_1^2 + w_2 x_2^2 = ∫_0^1 x^2 x^{1/3} dx = 3/10

  w_1 x_1^3 + w_2 x_2^3 = ∫_0^1 x^3 x^{1/3} dx = 3/13

Page 702: Analiza Numerica [Utm, Bostan v.]

The solution is

  x_1 = 7/13 - (3/65)√35,   x_2 = 7/13 + (3/65)√35

  w_1 = 3/8 - (3/392)√35,   w_2 = 3/8 + (3/392)√35

Numerically,

  x_1 = .2654117024,   x_2 = .8115113746
  w_1 = .3297238792,   w_2 = .4202761208

The formula

  ∫_0^1 x^{1/3} f(x) dx ≈ w_1 f(x_1) + w_2 f(x_2)     (4)

has degree of precision 3.

Page 703: Analiza Numerica [Utm, Bostan v.]

EXAMPLE. Consider evaluating the integral

  ∫_0^1 x^{1/3} cos x dx     (5)

In applying (4), we take f(x) = cos x. Then

  w_1 f(x_1) + w_2 f(x_2) = 0.6074977951

The true answer is

  ∫_0^1 x^{1/3} cos x dx ≐ 0.6076257393

and our numerical answer is in error by E_2 ≐ .000128. This is quite a good answer involving very little computational effort (once the formula has been determined). In contrast, the trapezoidal and Simpson rules applied to (5) would converge very slowly because the first derivative of the integrand is singular at the origin.

Page 704: Analiza Numerica [Utm, Bostan v.]

CHANGE OF VARIABLES

As a side note to the preceding example, we observe that the change of variables x = t^3 transforms the integral (5) to

  3 ∫_0^1 t^3 cos(t^3) dt

and both the trapezoidal and Simpson rules will perform better with this formula, although still not as good as our weighted Gaussian quadrature.

A change of the integration variable can often improve the performance of a standard method, usually by increasing the differentiability of the integrand.

EXAMPLE. Using x = t^r for some r > 1, we have

  ∫_0^1 g(x) log x dx = r^2 ∫_0^1 t^{r-1} g(t^r) log t dt

The new integrand is generally smoother than the original one.


INTERPOLATION

Interpolation is a process of finding a formula (often a polynomial) whose graph will pass through a given set of points (x, y).

As an example, consider defining

x0 = 0,   x1 = π/4,   x2 = π/2

and

yi = cos xi,   i = 0, 1, 2

This gives us the three points

(0, 1),   (π/4, 1/√2),   (π/2, 0)

Now find a quadratic polynomial

p(x) = a0 + a1 x + a2 x^2

for which

p(xi) = yi,   i = 0, 1, 2

The graph of this polynomial is shown in the accompanying graph. We later give an explicit formula.
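A short MATLAB sketch of this example, using the built-in polyfit to produce the interpolating quadratic (a degree-2 fit through exactly three points is the interpolant):

xi = [0  pi/4  pi/2];
yi = cos(xi);
a  = polyfit(xi, yi, 2)     % coefficients of p(x), highest power first
polyval(a, pi/4)            % reproduces cos(pi/4) = 1/sqrt(2)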

[Figure: quadratic interpolation of cos(x) on [0, π/2], showing y = cos(x) and y = p2(x).]

PURPOSES OF INTERPOLATION

1. Replace a set of data points {(xi, yi)} with a function given analytically.

2. Approximate functions with simpler ones, usually polynomials or "piecewise polynomials".

Purpose #1 has several aspects.

• The data may be from a known class of functions. Interpolation is then used to find the member of this class of functions that agrees with the given data. For example, data may be generated from functions of the form

p(x) = a0 + a1 e^x + a2 e^(2x) + · · · + an e^(nx)

Then we need to find the coefficients {aj} based on the given data values.

• We may want to take function values f(x) given in a table for selected values of x, often equally spaced, and extend the function to values of x not in the table. For example, given numbers from a table of logarithms, estimate the logarithm of a number x not in the table.

• Given a set of data points {(xi, yi)}, find a curve passing through these points that is "pleasing to the eye". In fact, this is what is done continually with computer graphics. How do we connect a set of points to make a smooth curve? Connecting them with straight line segments will often give a curve with many corners, whereas what was intended was a smooth curve.

Purpose #2 for interpolation is to approximate functions f(x) by simpler functions p(x), perhaps to make it easier to integrate or differentiate f(x). That will be the primary reason for studying interpolation in this course.

As an example of why this is important, consider the problem of evaluating

I = ∫_0^1 dx / (1 + x^10)

This is very difficult to do analytically. But we will look at producing polynomial interpolants of the integrand; and polynomials are easily integrated exactly.

We begin by using polynomials as our means of doing interpolation. Later in the chapter, we consider more complex "piecewise polynomial" functions, often called "spline functions".

LINEAR INTERPOLATION

The simplest form of interpolation is probably the straight line, connecting two points by a straight line.

Let two data points (x0, y0) and (x1, y1) be given. There is a unique straight line passing through these points. We can write the formula for a straight line as

P1(x) = a0 + a1 x

In fact, there are other more convenient ways to write it, and we give several of them below.

P1(x) = [(x − x1)/(x0 − x1)] y0 + [(x − x0)/(x1 − x0)] y1
      = [(x1 − x) y0 + (x − x0) y1] / (x1 − x0)
      = y0 + [(x − x0)/(x1 − x0)] [y1 − y0]
      = y0 + [(y1 − y0)/(x1 − x0)] (x − x0)

Check each of these by evaluating them at x = x0 and x1 to see if the respective values are y0 and y1.

Example. Following is a table of values of f(x) = tan x for a few values of x.

x      1        1.1      1.2      1.3
tan x  1.5574   1.9648   2.5722   3.6021

Use linear interpolation to estimate tan(1.15). We use

x0 = 1.1,   x1 = 1.2

with the corresponding values for y0 and y1. Then

tan x ≈ y0 + [(x − x0)/(x1 − x0)] [y1 − y0]

tan(1.15) ≈ 1.9648 + [(1.15 − 1.1)/(1.2 − 1.1)] [2.5722 − 1.9648] = 2.2685

The true value is tan 1.15 = 2.2345. We will want to examine formulas for the error in interpolation, to know when we have sufficient accuracy in our interpolant.
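The same estimate can be produced with MATLAB's interp1, whose default method is exactly this piecewise linear interpolation:

x = [1 1.1 1.2 1.3];
y = [1.5574 1.9648 2.5722 3.6021];
interp1(x, y, 1.15)    % 2.2685
tan(1.15)              % true value, 2.2345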

[Figure: y = tan(x) on [1, 1.3], and the linear interpolant y = p1(x) on [1.1, 1.2].]

QUADRATIC INTERPOLATION

We want to find a polynomial

P2(x) = a0 + a1 x + a2 x^2

which satisfies

P2(xi) = yi,   i = 0, 1, 2

for given data points (x0, y0), (x1, y1), (x2, y2). One formula for such a polynomial follows:

P2(x) = y0 L0(x) + y1 L1(x) + y2 L2(x)   (∗∗)

with

L0(x) = (x − x1)(x − x2) / [(x0 − x1)(x0 − x2)]
L1(x) = (x − x0)(x − x2) / [(x1 − x0)(x1 − x2)]
L2(x) = (x − x0)(x − x1) / [(x2 − x0)(x2 − x1)]

The formula (∗∗) is called Lagrange's form of the interpolation polynomial.

LAGRANGE BASIS FUNCTIONS

The functions

L0(x) = (x − x1)(x − x2) / [(x0 − x1)(x0 − x2)]
L1(x) = (x − x0)(x − x2) / [(x1 − x0)(x1 − x2)]
L2(x) = (x − x0)(x − x1) / [(x2 − x0)(x2 − x1)]

are called "Lagrange basis functions" for quadratic interpolation. They have the properties

Li(xj) = 1 if i = j,   0 if i ≠ j

for i, j = 0, 1, 2. Also, they all have degree 2. Their graphs are on an accompanying page.

As a consequence of each Li(x) being of degree 2, we have that the interpolant

P2(x) = y0 L0(x) + y1 L1(x) + y2 L2(x)

must have degree ≤ 2.

UNIQUENESS

Can there be another polynomial, call it Q(x), for which

deg(Q) ≤ 2,   Q(xi) = yi,   i = 0, 1, 2

That is, is the Lagrange formula P2(x) unique?

Introduce

R(x) = P2(x) − Q(x)

From the properties of P2 and Q, we have deg(R) ≤ 2. Moreover,

R(xi) = P2(xi) − Q(xi) = yi − yi = 0

for all three node points x0, x1, and x2. How many polynomials R(x) are there of degree at most 2 and having three distinct zeros? The answer is that only the zero polynomial satisfies these properties, and therefore

R(x) = 0 for all x
Q(x) = P2(x) for all x

SPECIAL CASES

Consider the data points

(x0, 1), (x1, 1), (x2, 1)

What is the polynomial P2(x) in this case?

Answer: We must have

P2(x) ≡ 1

meaning that P2(x) is the constant function. Why? First, the constant function satisfies the property of being of degree ≤ 2. Next, it clearly interpolates the given data. Therefore, by the uniqueness of quadratic interpolation, P2(x) must be the constant function 1.

Consider now the data points

(x0, m x0), (x1, m x1), (x2, m x2)

for some constant m. What is P2(x) in this case? By an argument similar to that above,

P2(x) = m x for all x

Thus the degree of P2(x) can be less than 2.

HIGHER DEGREE INTERPOLATION

We consider now the case of interpolation by polynomials of a general degree n. We want to find a polynomial Pn(x) for which

deg(Pn) ≤ n,   Pn(xi) = yi,   i = 0, 1, ..., n   (∗∗)

with given data points

(x0, y0), (x1, y1), ..., (xn, yn)

The solution is given by Lagrange's formula

Pn(x) = y0 L0(x) + y1 L1(x) + · · · + yn Ln(x)

The Lagrange basis functions are given by

Lk(x) = [(x − x0) · · · (x − xk−1)(x − xk+1) · · · (x − xn)] / [(xk − x0) · · · (xk − xk−1)(xk − xk+1) · · · (xk − xn)]

for k = 0, 1, 2, ..., n. The quadratic case was covered earlier.

In a manner analogous to the quadratic case, we can show that the above Pn(x) is the only solution to the problem (∗∗).

In the formula

Lk(x) = [(x − x0) · · · (x − xk−1)(x − xk+1) · · · (x − xn)] / [(xk − x0) · · · (xk − xk−1)(xk − xk+1) · · · (xk − xn)]

we can see that each such function is a polynomial of degree n. In addition,

Lk(xi) = 1 if k = i,   0 if k ≠ i

Using these properties, it follows that the formula

Pn(x) = y0 L0(x) + y1 L1(x) + · · · + yn Ln(x)

satisfies the interpolation problem of finding a solution to

deg(Pn) ≤ n,   Pn(xi) = yi,   i = 0, 1, ..., n
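A minimal MATLAB sketch of this formula, evaluating Pn at the points t directly from the Lagrange form (the nodes x are assumed distinct; the function name is ours, saved as lagrange_eval.m):

function p = lagrange_eval(x, y, t)
% Evaluate the Lagrange interpolation polynomial with nodes x
% and data values y at the points t.
n = length(x);
p = zeros(size(t));
for k = 1:n
    Lk = ones(size(t));                  % build L_k(t)
    for i = [1:k-1, k+1:n]
        Lk = Lk .* (t - x(i)) / (x(k) - x(i));
    end
    p = p + y(k) * Lk;
end
end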

EXAMPLE

Recall the table

x      1        1.1      1.2      1.3
tan x  1.5574   1.9648   2.5722   3.6021

We now interpolate this table with the nodes

x0 = 1,   x1 = 1.1,   x2 = 1.2,   x3 = 1.3

Without giving the details of the evaluation process, we have the following results for interpolation with degrees n = 1, 2, 3.

n          1        2        3
Pn(1.15)   2.2685   2.2435   2.2296
Error      −.0340   −.0090   .0049

It improves with increasing degree n, but not at a very rapid rate. In fact, the error becomes worse when n is increased further. Later we will see that interpolation of a much higher degree, say n ≥ 10, is often poorly behaved when the node points {xi} are evenly spaced.

A FIRST ORDER DIVIDED DIFFERENCE

For a given function f(x) and two distinct points x0 and x1, define

f[x0, x1] = (f(x1) − f(x0)) / (x1 − x0)

This is called a first order divided difference of f(x). By the Mean Value Theorem,

f(x1) − f(x0) = f′(c) (x1 − x0)

for some c between x0 and x1. Thus

f[x0, x1] = f′(c)

and the divided difference is very much like the derivative, especially if x0 and x1 are quite close together. In fact,

f′((x1 + x0)/2) ≈ f[x0, x1]

is quite an accurate approximation of the derivative (see §5.4).

SECOND ORDER DIVIDED DIFFERENCES

Given three distinct points x0, x1, and x2, define

f[x0, x1, x2] = (f[x1, x2] − f[x0, x1]) / (x2 − x0)

This is called the second order divided difference of f(x).

By a fairly complicated argument, we can show

f[x0, x1, x2] = (1/2) f″(c)

for some c intermediate to x0, x1, and x2. In fact, as we investigate in §5.4,

f″(x1) ≈ 2 f[x0, x1, x2]

in the case the nodes are evenly spaced,

x1 − x0 = x2 − x1

EXAMPLE

Consider the table

x      1        1.1      1.2      1.3      1.4
cos x  .54030   .45360   .36236   .26750   .16997

Let x0 = 1, x1 = 1.1, and x2 = 1.2. Then

f[x0, x1] = (.45360 − .54030) / (1.1 − 1) = −.86700
f[x1, x2] = (.36236 − .45360) / (1.2 − 1.1) = −.91240

f[x0, x1, x2] = (f[x1, x2] − f[x0, x1]) / (x2 − x0)
             = (−.91240 − (−.86700)) / (1.2 − 1.0) = −.22700

For comparison,

f′((x1 + x0)/2) = −sin(1.05) = −.86742
(1/2) f″(x1) = −(1/2) cos(1.1) = −.22680

GENERAL DIVIDED DIFFERENCES

Given n + 1 distinct points x0, ..., xn, with n ≥ 2, define

f[x0, ..., xn] = (f[x1, ..., xn] − f[x0, ..., xn−1]) / (xn − x0)

This is a recursive definition of the nth-order divided difference of f(x), using divided differences of order n − 1. Its relation to the derivative is as follows:

f[x0, ..., xn] = (1/n!) f^(n)(c)

for some c intermediate to the points {x0, ..., xn}. Let I denote the interval

I = [min{x0, ..., xn}, max{x0, ..., xn}]

Then c ∈ I, and the above result is based on the assumption that f(x) is n-times continuously differentiable on the interval I.

EXAMPLE

The following table gives divided differences for the data in

x      1        1.1      1.2      1.3      1.4
cos x  .54030   .45360   .36236   .26750   .16997

For the column headings, we use

Dkf(xi) = f[xi, ..., xi+k]

i   xi    f(xi)    Df(xi)   D2f(xi)   D3f(xi)   D4f(xi)
0   1.0   .54030   −.8670   −.2270    .1533     .0125
1   1.1   .45360   −.9124   −.1810    .1583
2   1.2   .36236   −.9486   −.1335
3   1.3   .26750   −.9753
4   1.4   .16997

These were computed using the recursive definition

f[x0, ..., xn] = (f[x1, ..., xn] − f[x0, ..., xn−1]) / (xn − x0)
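A minimal MATLAB sketch that builds this table column by column from the recursive definition (with MATLAB's 1-based indexing, D(i, k+1) holds f[x(i), ..., x(i+k)]):

x = [1 1.1 1.2 1.3 1.4];
f = [.54030 .45360 .36236 .26750 .16997];
n = length(x);
D = zeros(n, n);
D(:, 1) = f(:);                  % zeroth-order column: f(xi)
for k = 2:n
    for i = 1:n-k+1
        D(i, k) = (D(i+1, k-1) - D(i, k-1)) / (x(i+k-1) - x(i));
    end
end
D(1, :)    % f(x0), f[x0,x1], f[x0,x1,x2], ... : the top row of the table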

ORDER OF THE NODES

Looking at f[x0, x1], we have

f[x0, x1] = (f(x1) − f(x0)) / (x1 − x0) = (f(x0) − f(x1)) / (x0 − x1) = f[x1, x0]

The order of x0 and x1 does not matter. Looking at

f[x0, x1, x2] = (f[x1, x2] − f[x0, x1]) / (x2 − x0)

we can expand it to get

f[x0, x1, x2] = f(x0) / [(x0 − x1)(x0 − x2)] + f(x1) / [(x1 − x0)(x1 − x2)] + f(x2) / [(x2 − x0)(x2 − x1)]

With this formula, we can show that the order of the arguments x0, x1, x2 does not matter in the final value of f[x0, x1, x2] we obtain. Mathematically,

f[x0, x1, x2] = f[xi0, xi1, xi2]

for any permutation (i0, i1, i2) of (0, 1, 2).

We can show in general that the value of f[x0, ..., xn] is independent of the order of the arguments {x0, ..., xn}, even though the intermediate steps in its calculation using

f[x0, ..., xn] = (f[x1, ..., xn] − f[x0, ..., xn−1]) / (xn − x0)

are order dependent. We can show

f[x0, ..., xn] = f[xi0, ..., xin]

for any permutation (i0, i1, ..., in) of (0, 1, ..., n).

COINCIDENT NODES

What happens when some of the nodes {x0, ..., xn} are not distinct? Begin by investigating what happens when they all come together as a single point x0.

For first order divided differences, we have

lim_{x1→x0} f[x0, x1] = lim_{x1→x0} (f(x1) − f(x0)) / (x1 − x0) = f′(x0)

We extend the definition of f[x0, x1] to coincident nodes using

f[x0, x0] = f′(x0)

For second order divided differences, recall

f[x0, x1, x2] = (1/2) f″(c)

with c intermediate to x0, x1, and x2. Then as x1 → x0 and x2 → x0, we must also have that c → x0. Therefore,

lim_{x1→x0, x2→x0} f[x0, x1, x2] = (1/2) f″(x0)

We therefore define

f[x0, x0, x0] = (1/2) f″(x0)

For the case of general f[x0, ..., xn], recall that

f[x0, ..., xn] = (1/n!) f^(n)(c)

for some c intermediate to {x0, ..., xn}. Then

lim_{x1,...,xn→x0} f[x0, ..., xn] = (1/n!) f^(n)(x0)

and we define

f[x0, ..., x0]   (n + 1 times)   = (1/n!) f^(n)(x0)

What do we do when only some of the nodes are coincident? This too can be dealt with, although we do so here only by example:

f[x0, x1, x1] = (f[x1, x1] − f[x0, x1]) / (x1 − x0) = (f′(x1) − f[x0, x1]) / (x1 − x0)

The recursion formula can be used in general in this way to allow all possible combinations of possibly coincident nodes.

LAGRANGE'S FORMULA FOR THE INTERPOLATION POLYNOMIAL

Recall the general interpolation problem: find a polynomial Pn(x) for which

deg(Pn) ≤ n,   Pn(xi) = yi,   i = 0, 1, ..., n

with given data points

(x0, y0), (x1, y1), ..., (xn, yn)

and with {x0, ..., xn} distinct points.

In §5.1, we gave the solution as Lagrange's formula

Pn(x) = y0 L0(x) + y1 L1(x) + · · · + yn Ln(x)

with {L0(x), ..., Ln(x)} the Lagrange basis polynomials. Each Lj is of degree n and it satisfies

Lj(xi) = 1 if j = i,   0 if j ≠ i

for i = 0, 1, ..., n.

THE NEWTON DIVIDED DIFFERENCE FORM OF THE INTERPOLATION POLYNOMIAL

Let the data values for the problem

deg(Pn) ≤ n,   Pn(xi) = yi,   i = 0, 1, ..., n

be generated from a function f(x):

yi = f(xi),   i = 0, 1, ..., n

Using the divided differences

f[x0, x1], f[x0, x1, x2], ..., f[x0, ..., xn]

we can write the interpolation polynomials

P1(x), P2(x), ..., Pn(x)

in a way that is simple to compute:

P1(x) = f(x0) + f[x0, x1] (x − x0)
P2(x) = f(x0) + f[x0, x1] (x − x0) + f[x0, x1, x2] (x − x0)(x − x1)
      = P1(x) + f[x0, x1, x2] (x − x0)(x − x1)

For the case of the general problem

deg(Pn) ≤ n,   Pn(xi) = yi,   i = 0, 1, ..., n

we have

Pn(x) = f(x0) + f[x0, x1] (x − x0)
        + f[x0, x1, x2] (x − x0)(x − x1)
        + f[x0, x1, x2, x3] (x − x0)(x − x1)(x − x2)
        + · · ·
        + f[x0, ..., xn] (x − x0) · · · (x − xn−1)

From this we have the recursion relation

Pn(x) = Pn−1(x) + f[x0, ..., xn] (x − x0) · · · (x − xn−1)

in which Pn−1(x) interpolates f(x) at the points in {x0, ..., xn−1}.

Example: Recall the table

i   xi    f(xi)    Df(xi)   D2f(xi)   D3f(xi)   D4f(xi)
0   1.0   .54030   −.8670   −.2270    .1533     .0125
1   1.1   .45360   −.9124   −.1810    .1583
2   1.2   .36236   −.9486   −.1335
3   1.3   .26750   −.9753
4   1.4   .16997

with Dkf(xi) = f[xi, ..., xi+k], k = 1, 2, 3, 4. Then

P1(x) = .5403 − .8670 (x − 1)
P2(x) = P1(x) − .2270 (x − 1)(x − 1.1)
P3(x) = P2(x) + .1533 (x − 1)(x − 1.1)(x − 1.2)
P4(x) = P3(x) + .0125 (x − 1)(x − 1.1)(x − 1.2)(x − 1.3)

Using this table and these formulas, we have the following table of interpolants for the value x = 1.05. The true value is cos(1.05) = .49757105.

n          1         2         3          4
Pn(1.05)   .49695    .49752    .49758     .49757
Error      6.20E−4   5.00E−5   −1.00E−5   0.0

EVALUATION OF THE DIVIDED DIFFERENCE INTERPOLATION POLYNOMIAL

Let

d1 = f[x0, x1]
d2 = f[x0, x1, x2]
...
dn = f[x0, ..., xn]

Then the formula

Pn(x) = f(x0) + f[x0, x1] (x − x0) + f[x0, x1, x2] (x − x0)(x − x1) + · · · + f[x0, ..., xn] (x − x0) · · · (x − xn−1)

can be written as

Pn(x) = f(x0) + (x − x0) (d1 + (x − x1) (d2 + · · · + (x − xn−2) (dn−1 + (x − xn−1) dn) · · · ))

Thus we have a nested polynomial evaluation, and this is quite efficient in computational cost.
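A minimal MATLAB sketch of this nested evaluation, given the nodes and the top row of the divided-difference table, with d(k) = f[x(1), ..., x(k)] (the function name is ours, saved as newton_eval.m):

function p = newton_eval(x, d, t)
% Nested (Horner-like) evaluation of the Newton divided
% difference form at the points t.
m = length(d);
p = d(m) * ones(size(t));
for k = m-1:-1:1
    p = d(k) + (t - x(k)) .* p;    % peel off one nesting level
end
end

With the table above, newton_eval([1 1.1 1.2 1.3 1.4], [.54030 -.8670 -.2270 .1533 .0125], 1.05) should return approximately .49757.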

ERROR IN LINEAR INTERPOLATION

Let P1(x) denote the linear polynomial interpolating f(x) at x0 and x1, with f(x) a given function (e.g. f(x) = cos x). What is the error f(x) − P1(x)?

Let f(x) be twice continuously differentiable on an interval [a, b] which contains the points {x0, x1}. Then for a ≤ x ≤ b,

f(x) − P1(x) = [(x − x0)(x − x1) / 2] f″(cx)

for some cx between the minimum and maximum of x0, x1, and x.

If x1 and x are 'close to x0', then

f(x) − P1(x) ≈ [(x − x0)(x − x1) / 2] f″(x0)

Thus the error acts like a quadratic polynomial, with zeros at x0 and x1.

EXAMPLE

Let f(x) = log10 x; and in line with typical tables of log10 x, we take 1 ≤ x, x0, x1 ≤ 10. For definiteness, let x0 < x1 with h = x1 − x0. Then

f″(x) = −(log10 e) / x^2

log10 x − P1(x) = [(x − x0)(x − x1) / 2] [−(log10 e) / cx^2]
                = (x − x0)(x1 − x) [(log10 e) / (2 cx^2)]

We usually are interpolating with x0 ≤ x ≤ x1; and in that case, we have

(x − x0)(x1 − x) ≥ 0,   x0 ≤ cx ≤ x1

Since (x − x0)(x1 − x) ≥ 0 and x0 ≤ cx ≤ x1, we therefore have

(x − x0)(x1 − x) [(log10 e) / (2 x1^2)] ≤ log10 x − P1(x) ≤ (x − x0)(x1 − x) [(log10 e) / (2 x0^2)]

For h = x1 − x0 small, we have for x0 ≤ x ≤ x1

log10 x − P1(x) ≈ (x − x0)(x1 − x) [(log10 e) / (2 x0^2)]

Typical high school algebra textbooks contain tables of log10 x with a spacing of h = .01. What is the error in this case? To look at this, we use

0 ≤ log10 x − P1(x) ≤ (x − x0)(x1 − x) [(log10 e) / (2 x0^2)]

By simple geometry or calculus,

max_{x0 ≤ x ≤ x1} (x − x0)(x1 − x) ≤ h^2 / 4

Therefore,

0 ≤ log10 x − P1(x) ≤ (h^2/4) [(log10 e) / (2 x0^2)] ≐ .0543 h^2 / x0^2

If we want a uniform bound for all points 1 ≤ x0 ≤ 10, we have

0 ≤ log10 x − P1(x) ≤ h^2 (log10 e) / 8 ≐ .0543 h^2

For h = .01, as is typical of the high school textbook tables of log10 x,

0 ≤ log10 x − P1(x) ≤ 5.43 × 10^−6

If you look at most tables, a typical entry is given to only four decimal places to the right of the decimal point, e.g.

log10 5.41 ≐ .7332

Therefore the entries are in error by as much as .00005. Comparing this with the interpolation error, we see the latter is less important than the rounding errors in the table entries.

From the bound

0 ≤ log10 x − P1(x) ≤ h^2 (log10 e) / (8 x0^2) ≐ .0543 h^2 / x0^2

we see the error decreases as x0 increases, and it is about 100 times smaller for points near 10 than for points near 1.

AN ERROR FORMULA: THE GENERAL CASE

Recall the general interpolation problem: find a polynomial Pn(x) for which

deg(Pn) ≤ n,   Pn(xi) = f(xi),   i = 0, 1, ..., n

with distinct node points {x0, ..., xn} and a given function f(x). Let [a, b] be a given interval on which f(x) is (n + 1)-times continuously differentiable; and assume the points x0, ..., xn, and x are contained in [a, b]. Then

f(x) − Pn(x) = [(x − x0)(x − x1) · · · (x − xn) / (n + 1)!] f^(n+1)(cx)

with cx some point between the minimum and maximum of the points in {x, x0, ..., xn}.

As shorthand, introduce

Ψn(x) = (x − x0)(x − x1) · · · (x − xn)

a polynomial of degree n + 1 with roots {x0, ..., xn}. Then

f(x) − Pn(x) = [Ψn(x) / (n + 1)!] f^(n+1)(cx)

THE QUADRATIC CASE

For n = 2, we have

f(x) − P2(x) = [(x − x0)(x − x1)(x − x2) / 3!] f^(3)(cx)   (∗)

with cx some point between the minimum and maximum of the points in {x, x0, x1, x2}.

To illustrate the use of this formula, consider the case of evenly spaced nodes:

x1 = x0 + h,   x2 = x1 + h

Further suppose we have x0 ≤ x ≤ x2, as we would usually have when interpolating in a table of given function values (e.g. log10 x). The quantity

Ψ2(x) = (x − x0)(x − x1)(x − x2)

can be evaluated directly for a particular x.

[Figure: graph of Ψ2(x) = (x + h) x (x − h), using (x0, x1, x2) = (−h, 0, h).]

In the formula (∗), however, we do not know cx, and therefore we replace |f^(3)(cx)| with the maximum of |f^(3)(x)| as x varies over x0 ≤ x ≤ x2. This yields

|f(x) − P2(x)| ≤ [|Ψ2(x)| / 3!] max_{x0 ≤ x ≤ x2} |f^(3)(x)|   (∗∗)

If we want a uniform bound for x0 ≤ x ≤ x2, we must compute

max_{x0 ≤ x ≤ x2} |Ψ2(x)| = max_{x0 ≤ x ≤ x2} |(x − x0)(x − x1)(x − x2)|

Using calculus,

max_{x0 ≤ x ≤ x2} |Ψ2(x)| = 2h^3 / (3√3),   attained at x = x1 ± h/√3

Combined with (∗∗), this yields

|f(x) − P2(x)| ≤ [h^3 / (9√3)] max_{x0 ≤ x ≤ x2} |f^(3)(x)|

for x0 ≤ x ≤ x2.

For f(x) = log10 x, with 1 ≤ x0 ≤ x ≤ x2 ≤ 10, this leads to

|log10 x − P2(x)| ≤ [h^3 / (9√3)] max_{x0 ≤ x ≤ x2} [2 (log10 e) / x^3] = .05572 h^3 / x0^3

For the case of h = .01, we have

|log10 x − P2(x)| ≤ 5.57 × 10^−8 / x0^3 ≤ 5.57 × 10^−8

Question: How much larger could we make h so that quadratic interpolation would have an error comparable to that of linear interpolation of log10 x with h = .01? The error bound for the linear interpolation was 5.43 × 10^−6, and therefore we want the same to be true of quadratic interpolation. Using a simpler bound, we want to find h so that

|log10 x − P2(x)| ≤ .05572 h^3 ≤ 5 × 10^−6

This is true if h = .04477. Therefore a spacing of h = .04 would be sufficient. A table with this spacing and quadratic interpolation would have an error comparable to a table with h = .01 and linear interpolation.

For the case of general n,

f(x) − Pn(x) = [(x − x0) · · · (x − xn) / (n + 1)!] f^(n+1)(cx) = [Ψn(x) / (n + 1)!] f^(n+1)(cx)

Ψn(x) = (x − x0)(x − x1) · · · (x − xn)

with cx some point between the minimum and maximum of the points in {x, x0, ..., xn}. When bounding the error we replace f^(n+1)(cx) with its maximum over the interval containing {x, x0, ..., xn}, as we have illustrated earlier in the linear and quadratic cases.

Consider now the function

Ψn(x) / (n + 1)!

over the interval determined by the minimum and maximum of the points in {x, x0, ..., xn}. For evenly spaced node points on [0, 1], with x0 = 0 and xn = 1, we give graphs for n = 2, 3, 4, 5 and for n = 6, 7, 8, 9 on accompanying pages.

DISCUSSION OF ERROR

Consider the error

f(x) − Pn(x) = [(x − x0) · · · (x − xn) / (n + 1)!] f^(n+1)(cx) = [Ψn(x) / (n + 1)!] f^(n+1)(cx)

Ψn(x) = (x − x0)(x − x1) · · · (x − xn)

as n increases and as x varies. As noted previously, we cannot do much with f^(n+1)(cx) except to replace it with a maximum value of |f^(n+1)(x)| over a suitable interval. Thus we concentrate on understanding the size of

Ψn(x) / (n + 1)!

ERROR FOR EVENLY SPACED NODES

We consider first the case in which the node points are evenly spaced, as this seems the 'natural' way to define the points at which interpolation is carried out. Moreover, using evenly spaced nodes is the case to consider for table interpolation. What can we learn from the given graphs?

The interpolation nodes are determined by using

h = 1/n,   x0 = 0, x1 = h, x2 = 2h, ..., xn = nh = 1

For this case,

Ψn(x) = x (x − h)(x − 2h) · · · (x − 1)

Our graphs are the cases of n = 2, ..., 9.

[Figure: graphs of Ψn(x) on [0, 1] for n = 2, 3, 4, 5.]

[Figure: graphs of Ψn(x) on [0, 1] for n = 6, 7, 8, 9.]

[Figure: graph of Ψ6(x) = (x − x0)(x − x1) · · · (x − x6) with evenly spaced nodes.]

Using the following table,

n   Mn         n    Mn
1   1.25E−1    6    4.76E−7
2   2.41E−2    7    2.20E−8
3   2.06E−3    8    9.11E−10
4   1.48E−4    9    3.39E−11
5   9.01E−6    10   1.15E−12

we can observe that the maximum

Mn ≡ max_{x0 ≤ x ≤ xn} |Ψn(x)| / (n + 1)!

becomes smaller with increasing n.

From the graphs, there is enormous variation in the size of Ψn(x) as x varies over [0, 1]; and thus there is also enormous variation in the error as x so varies. For example, in the n = 9 case,

max_{x0 ≤ x ≤ x1} |Ψn(x)| / (n + 1)! = 3.39 × 10^−11
max_{x4 ≤ x ≤ x5} |Ψn(x)| / (n + 1)! = 6.89 × 10^−13

and the ratio of these two errors is approximately 49. Thus the interpolation error is likely to be around 49 times larger when x0 ≤ x ≤ x1 as compared to the case when x4 ≤ x ≤ x5. When doing table interpolation, the point x at which you are interpolating should be centrally located with respect to the interpolation nodes {x0, ..., xn} being used to define the interpolation, if possible.

AN APPROXIMATION PROBLEM

Consider now the problem of using an interpolation polynomial to approximate a given function f(x) on a given interval [a, b]. In particular, take interpolation nodes

a ≤ x0 < x1 < · · · < xn−1 < xn ≤ b

and produce the interpolation polynomial Pn(x) that interpolates f(x) at the given node points. We would like to have

max_{a ≤ x ≤ b} |f(x) − Pn(x)| → 0 as n → ∞

Does it happen?

Recall the error bound

max_{a ≤ x ≤ b} |f(x) − Pn(x)| ≤ max_{a ≤ x ≤ b} [|Ψn(x)| / (n + 1)!] · max_{a ≤ x ≤ b} |f^(n+1)(x)|

We begin with an example using evenly spaced node points.

RUNGE'S EXAMPLE

Use evenly spaced node points:

h = (b − a)/n,   xi = a + ih for i = 0, ..., n

For some functions, such as f(x) = e^x, the maximum error goes to zero quite rapidly. But the size of the derivative term f^(n+1)(x) in

max_{a ≤ x ≤ b} |f(x) − Pn(x)| ≤ max_{a ≤ x ≤ b} [|Ψn(x)| / (n + 1)!] · max_{a ≤ x ≤ b} |f^(n+1)(x)|

can badly hurt or destroy the convergence in other cases.

In particular, we show the graph of f(x) = 1/(1 + x^2) and Pn(x) on [−5, 5] for the cases n = 8 and n = 12. The case n = 10 is in the text on page 127. It can be proven that for this function, the maximum error on [−5, 5] does not converge to zero. Thus the use of evenly spaced nodes is not necessarily a good approach to approximating a function f(x) by interpolation.

[Figure: Runge's example with n = 10, showing y = 1/(1 + x^2) and y = P10(x) on [−5, 5].]
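A short MATLAB sketch reproducing this behavior (polyfit through n + 1 evenly spaced nodes with degree n gives the interpolant; MATLAB may warn about conditioning, which is itself part of the story):

f  = @(x) 1./(1 + x.^2);
n  = 10;
xi = linspace(-5, 5, n+1);        % evenly spaced nodes
a  = polyfit(xi, f(xi), n);       % interpolating polynomial P10
xx = linspace(-5, 5, 501);
plot(xx, f(xx), xx, polyval(a, xx), xi, f(xi), 'o')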

OTHER CHOICES OF NODES

Recall the general error bound

max_{a ≤ x ≤ b} |f(x) − Pn(x)| ≤ max_{a ≤ x ≤ b} [|Ψn(x)| / (n + 1)!] · max_{a ≤ x ≤ b} |f^(n+1)(x)|

There is nothing we can really do with the derivative term for f; but we can examine the way of defining the nodes {x0, ..., xn} within the interval [a, b]. We ask how these nodes can be chosen so that the maximum of |Ψn(x)| over [a, b] is made as small as possible.

This problem has quite an elegant solution, and it is taken up in §4.6. The node points {x0, ..., xn} turn out to be the zeros of a particular polynomial Tn+1(x) of degree n + 1, called a Chebyshev polynomial. These zeros are known explicitly, and with them

max_{a ≤ x ≤ b} |Ψn(x)| = ((b − a)/2)^(n+1) · 2^−n

This turns out to be smaller than for the evenly spaced case; and although this polynomial interpolation does not work for all functions f(x), it works for all differentiable functions and more.
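For reference, the zeros of Tn+1 on [−1, 1] are cos((2k + 1)π / (2(n + 1))), k = 0, ..., n; a small MATLAB sketch mapping them to a general interval [a, b]:

a = -5;  b = 5;  n = 10;
k  = 0:n;
t  = cos((2*k + 1)*pi ./ (2*(n + 1)));   % zeros of T_{n+1} on [-1, 1]
xi = (a + b)/2 + (b - a)/2 * t;          % Chebyshev nodes on [a, b]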

ANOTHER ERROR FORMULA

Recall the error formula

f(x) − Pn(x) = [Ψn(x) / (n + 1)!] f^(n+1)(c),   Ψn(x) = (x − x0)(x − x1) · · · (x − xn)

with c between the minimum and maximum of {x0, ..., xn, x}. A second formula is given by

f(x) − Pn(x) = Ψn(x) f[x0, ..., xn, x]

To show this is a simple, but somewhat subtle argument.

Let Pn+1(x) denote the polynomial of degree ≤ n + 1 which interpolates f(x) at the points {x0, ..., xn, xn+1}. Then

Pn+1(x) = Pn(x) + f[x0, ..., xn, xn+1] (x − x0) · · · (x − xn)

Substituting x = xn+1, and using the fact that Pn+1(x) interpolates f(x) at xn+1, we have

f(xn+1) = Pn(xn+1) + f[x0, ..., xn, xn+1] (xn+1 − x0) · · · (xn+1 − xn)

In this formula, the number xn+1 is completely arbitrary, other than being distinct from the points in {x0, ..., xn}. To emphasize this fact, replace xn+1 by x throughout the formula, obtaining

f(x) = Pn(x) + f[x0, ..., xn, x] (x − x0) · · · (x − xn) = Pn(x) + Ψn(x) f[x0, ..., xn, x]

provided x ≠ x0, ..., xn.

The formula

f(x) = Pn(x) + f[x0, ..., xn, x] (x − x0) · · · (x − xn) = Pn(x) + Ψn(x) f[x0, ..., xn, x]

is easily seen to hold for x a node point as well, since both sides then equal f(x). (Provided f(x) is differentiable, the divided difference f[x0, ..., xn, x] remains defined when x coincides with a node.)

This shows

f(x) − Pn(x) = Ψn(x) f[x0, ..., xn, x]

Compare the two error formulas:

f(x) − Pn(x) = Ψn(x) f[x0, ..., xn, x]
f(x) − Pn(x) = [Ψn(x) / (n + 1)!] f^(n+1)(c)

Then

Ψn(x) f[x0, ..., xn, x] = [Ψn(x) / (n + 1)!] f^(n+1)(c)

f[x0, ..., xn, x] = f^(n+1)(c) / (n + 1)!

for some c between the smallest and largest of the numbers in {x0, ..., xn, x}.

To make this somewhat symmetric in its arguments, let m = n + 1, x = xm. Then

f[x0, ..., xm−1, xm] = f^(m)(c) / m!

with c an unknown number between the smallest and largest of the numbers in {x0, ..., xm}. This was given in an earlier lecture where divided differences were introduced.

PIECEWISE POLYNOMIAL INTERPOLATION

Recall the examples of higher degree polynomial interpolation of the function f(x) = (1 + x^2)^−1 on [−5, 5]. The interpolants Pn(x) oscillated a great deal, whereas the function f(x) was nonoscillatory. To obtain interpolants that are better behaved, we look at other forms of interpolating functions.

Consider the data

x   0     1     2     2.5   3     3.5    4
y   2.5   0.5   0.5   1.5   1.5   1.125  0

What are methods of interpolating this data, other than using a degree 6 polynomial? Shown in the text are the graphs of the degree 6 polynomial interpolant, along with those of piecewise linear and piecewise quadratic interpolating functions.

Since we only have the data to consider, we would generally want to use an interpolant that had somewhat the shape of that of the piecewise linear interpolant.

[Figure: the data points, and their piecewise linear interpolation on [0, 4].]

[Figure: the degree 6 polynomial interpolation, and the piecewise quadratic interpolation of the same data.]

PIECEWISE POLYNOMIAL FUNCTIONS

Consider being given a set of data points (x1, y1), ..., (xn, yn), with

x1 < x2 < · · · < xn

Then the simplest way to connect the points (xj, yj) is by straight line segments. This is called a piecewise linear interpolant of the data {(xj, yj)}. This graph has "corners", and often we expect the interpolant to have a smooth graph.

To obtain a somewhat smoother graph, consider using piecewise quadratic interpolation. Begin by constructing the quadratic polynomial that interpolates

{(x1, y1), (x2, y2), (x3, y3)}

Then construct the quadratic polynomial that interpolates

{(x3, y3), (x4, y4), (x5, y5)}

Continue this process of constructing quadratic interpolants on the subintervals

[x1, x3], [x3, x5], [x5, x7], ...

If the number of subintervals is even (and therefore n is odd), then this process comes out fine, with the last interval being [xn−2, xn]. This was illustrated on the graph for the preceding data. If, however, n is even, then the approximation on the last interval must be handled by some modification of this procedure. Suggest such!

With piecewise quadratic interpolants, however, there are "corners" on the graph of the interpolating function. With our preceding example, they are at x3 and x5. How do we avoid this?

Piecewise polynomial interpolants are used in many applications. We will consider them later, to obtain numerical integration formulas.

SMOOTH NON-OSCILLATORY INTERPOLATION

Let data points (x1, y1), ..., (xn, yn) be given, and let

x1 < x2 < · · · < xn

Consider finding functions s(x) for which the following properties hold:

(1) s(xi) = yi,   i = 1, ..., n
(2) s(x), s′(x), s″(x) are continuous on [x1, xn].

Then among such functions s(x) satisfying these properties, find the one which minimizes the integral

∫_{x1}^{xn} |s″(x)|^2 dx

The idea of minimizing the integral is to obtain an interpolating function for which the first derivative does not change rapidly. It turns out there is a unique solution to this problem, and it is called a natural cubic spline function.

SPLINE FUNCTIONS

Let a set of node points {xi} be given, satisfying

a ≤ x1 < x2 < · · · < xn ≤ b

for some numbers a and b. Often we use [a, b] = [x1, xn]. A cubic spline function s(x) on [a, b] with "breakpoints" or "knots" {xi} has the following properties:

1. On each of the intervals

[a, x1], [x1, x2], ..., [xn−1, xn], [xn, b]

s(x) is a polynomial of degree ≤ 3.

2. s(x), s′(x), s″(x) are continuous on [a, b].

In the case that we have given data points (x1, y1), ..., (xn, yn), we say s(x) is a cubic interpolating spline function for this data if

3. s(xi) = yi,   i = 1, ..., n.

EXAMPLE

Define

(x − α)_+^3 = (x − α)^3 for x ≥ α,   and 0 for x ≤ α

This is a cubic spline function on (−∞, ∞) with the single breakpoint x1 = α.

Combinations of these form more complicated cubic spline functions. For example,

s(x) = 3 (x − 1)_+^3 − 2 (x − 3)_+^3

is a cubic spline function on (−∞, ∞) with the breakpoints x1 = 1, x2 = 3.

Define

s(x) = p3(x) + Σ_{j=1}^n aj (x − xj)_+^3

with p3(x) some cubic polynomial. Then s(x) is a cubic spline function on (−∞, ∞) with breakpoints {x1, ..., xn}.

Return to the earlier problem of choosing an interpolating function s(x) to minimize the integral

∫_{x1}^{xn} |s″(x)|^2 dx

There is a unique solution to this problem. The solution s(x) is a cubic interpolating spline function, and moreover, it satisfies

s″(x1) = s″(xn) = 0

Spline functions satisfying these boundary conditions are called "natural" cubic spline functions, and the solution to our minimization problem is a "natural cubic interpolatory spline function". We will show a method to construct this function from the interpolation data.

Motivation for these boundary conditions can be given by looking at the physics of bending thin beams of flexible materials to pass through the given data. To the left of x1 and to the right of xn, the beam is straight and therefore the second derivatives are zero at the transition points x1 and xn.

CONSTRUCTION OF THE INTERPOLATING SPLINE FUNCTION

To make the presentation more specific, suppose we have data

(x1, y1), (x2, y2), (x3, y3), (x4, y4)

with x1 < x2 < x3 < x4. Then on each of the intervals

[x1, x2], [x2, x3], [x3, x4]

s(x) is a cubic polynomial. Taking the first interval, s(x) is a cubic polynomial and s″(x) is a linear polynomial. Let

Mi = s″(xi),   i = 1, 2, 3, 4

Then on [x1, x2],

s″(x) = [(x2 − x) M1 + (x − x1) M2] / (x2 − x1),   x1 ≤ x ≤ x2

We can find s(x) by integrating twice:

s(x) = [(x2 − x)^3 M1 + (x − x1)^3 M2] / [6 (x2 − x1)] + c1 x + c2

We determine the constants of integration by using

s(x1) = y1,   s(x2) = y2   (∗)

Then

s(x) = [(x2 − x)^3 M1 + (x − x1)^3 M2] / [6 (x2 − x1)]
       + [(x2 − x) y1 + (x − x1) y2] / (x2 − x1)
       − [(x2 − x1)/6] [(x2 − x) M1 + (x − x1) M2]

for x1 ≤ x ≤ x2. Check that this formula satisfies the given interpolation condition (∗)!

We can repeat this on the intervals [x2, x3] and [x3, x4], obtaining similar formulas.

For x2 ≤ x ≤ x3,

s(x) = [(x3 − x)^3 M2 + (x − x2)^3 M3] / [6 (x3 − x2)]
       + [(x3 − x) y2 + (x − x2) y3] / (x3 − x2)
       − [(x3 − x2)/6] [(x3 − x) M2 + (x − x2) M3]

For x3 ≤ x ≤ x4,

s(x) = [(x4 − x)^3 M3 + (x − x3)^3 M4] / [6 (x4 − x3)]
       + [(x4 − x) y3 + (x − x3) y4] / (x4 − x3)
       − [(x4 − x3)/6] [(x4 − x) M3 + (x − x3) M4]

We still do not know the values of the second derivatives {M1, M2, M3, M4}. The above formulas guarantee that s(x) and s″(x) are continuous for x1 ≤ x ≤ x4. For example, the formula on [x1, x2] yields

s(x2) = y2,   s″(x2) = M2

The formula on [x2, x3] also yields

s(x2) = y2,   s″(x2) = M2

All that is lacking is to make s′(x) continuous at x2 and x3. Thus we require

s′(x2 + 0) = s′(x2 − 0),   s′(x3 + 0) = s′(x3 − 0)   (∗∗)

This means

lim_{x↘x2} s′(x) = lim_{x↗x2} s′(x)

and similarly for x3.

To simplify the presentation somewhat, I assume in the following that our node points are evenly spaced:

x2 = x1 + h,   x3 = x1 + 2h,   x4 = x1 + 3h

Then our earlier formulas simplify to

s(x) = [(x2 − x)^3 M1 + (x − x1)^3 M2] / (6h)
       + [(x2 − x) y1 + (x − x1) y2] / h
       − (h/6) [(x2 − x) M1 + (x − x1) M2]

for x1 ≤ x ≤ x2, with similar formulas on [x2, x3] and [x3, x4].

Without going through all of the algebra, the conditions (∗∗) lead to the following pair of equations:

(h/6) M1 + (2h/3) M2 + (h/6) M3 = (y3 − y2)/h − (y2 − y1)/h
(h/6) M2 + (2h/3) M3 + (h/6) M4 = (y4 − y3)/h − (y3 − y2)/h

This gives us two equations in four unknowns. The earlier boundary conditions on s″(x) give us immediately

M1 = M4 = 0

Then we can solve the linear system for M2 and M3.

EXAMPLE

Consider the interpolation data points

x   1   2     3     4
y   1   1/2   1/3   1/4

In this case, h = 1, and the linear system becomes

(2/3) M2 + (1/6) M3 = y3 − 2y2 + y1 = 1/3
(1/6) M2 + (2/3) M3 = y4 − 2y3 + y2 = 1/12

This has the solution

M2 = 1/2,   M3 = 0

This leads to the spline function formula on each subinterval.

On [1, 2],

s(x) = [(2 − x)^3 · 0 + (x − 1)^3 (1/2)] / 6
       + [(2 − x) · 1 + (x − 1)(1/2)] / 1
       − (1/6) [(2 − x) · 0 + (x − 1)(1/2)]
     = (1/12)(x − 1)^3 − (7/12)(x − 1) + 1

Similarly, for 2 ≤ x ≤ 3,

s(x) = −(1/12)(x − 2)^3 + (1/4)(x − 2)^2 − (1/3)(x − 2) + 1/2

and for 3 ≤ x ≤ 4,

s(x) = −(1/12)(x − 4) + 1/4

x   1   2     3     4
y   1   1/2   1/3   1/4

[Figure: graph of this example of natural cubic spline interpolation, showing y = 1/x and y = s(x) on [1, 4].]

x   0     1     2     2.5   3     3.5    4
y   2.5   0.5   0.5   1.5   1.5   1.125  0

[Figure: interpolating natural cubic spline function for this data.]

ALTERNATIVE BOUNDARY CONDITIONS

Return to the equations

(h/6) M1 + (2h/3) M2 + (h/6) M3 = (y3 − y2)/h − (y2 − y1)/h
(h/6) M2 + (2h/3) M3 + (h/6) M4 = (y4 − y3)/h − (y3 − y2)/h

Sometimes other boundary conditions are imposed on s(x) to help in determining the values of M1 and M4. For example, the data in our numerical example were generated from the function f(x) = 1/x. With it, f″(x) = 2/x^3, and thus we could use

M1 = 2,   M4 = 1/32

With this we are led to a new formula for s(x), one that approximates f(x) = 1/x more closely.

THE CLAMPED SPLINE

In this case, we augment the interpolation conditions

s(xi) = yi,   i = 1, 2, 3, 4

with the boundary conditions

s′(x1) = y′1,   s′(x4) = y′4   (#)

The conditions (#) lead to another pair of equations, augmenting the earlier ones. Combined, these equations are

(h/3) M1 + (h/6) M2 = (y2 − y1)/h − y′1
(h/6) M1 + (2h/3) M2 + (h/6) M3 = (y3 − y2)/h − (y2 − y1)/h
(h/6) M2 + (2h/3) M3 + (h/6) M4 = (y4 − y3)/h − (y3 − y2)/h
(h/6) M3 + (h/3) M4 = y′4 − (y4 − y3)/h

For our numerical example, it is natural to obtain these derivative values from f′(x) = −1/x^2:

y′1 = −1,   y′4 = −1/16

When combined with the earlier equations, we have the system

(1/3) M1 + (1/6) M2 = 1/2
(1/6) M1 + (2/3) M2 + (1/6) M3 = 1/3
(1/6) M2 + (2/3) M3 + (1/6) M4 = 1/12
(1/6) M3 + (1/3) M4 = 1/48

This has the solution

[M1, M2, M3, M4] = [173/120, 7/60, 11/120, 1/60]
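A quick MATLAB check of this 4 × 4 system (the backslash operator solves the linear system):

A = [1/3 1/6 0 0; 1/6 2/3 1/6 0; 0 1/6 2/3 1/6; 0 0 1/6 1/3];
b = [1/2; 1/3; 1/12; 1/48];
M = A \ b      % returns [173/120; 7/60; 11/120; 1/60]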

We can now write the function s(x) on each of the subintervals [x1, x2], [x2, x3], and [x3, x4]. Recall for x1 ≤ x ≤ x2,

s(x) = [(x2 − x)^3 M1 + (x − x1)^3 M2] / (6h)
       + [(x2 − x) y1 + (x − x1) y2] / h
       − (h/6) [(x2 − x) M1 + (x − x1) M2]

We can substitute in from the data

x   1   2     3     4
y   1   1/2   1/3   1/4

and the solutions {Mi}. Doing so, consider the error f(x) − s(x). As an example,

f(x) = 1/x,   f(3/2) = 2/3,   s(3/2) = .65260

This is quite a decent approximation.

THE GENERAL PROBLEM

Consider the spline interpolation problem with n nodes

(x1, y1), (x2, y2), ..., (xn, yn)

and assume the node points {xi} are evenly spaced,

xj = x1 + (j − 1) h,   j = 1, ..., n

We have that the interpolating spline s(x) on xj ≤ x ≤ xj+1 is given by

s(x) = [(xj+1 − x)^3 Mj + (x − xj)^3 Mj+1] / (6h)
       + [(xj+1 − x) yj + (x − xj) yj+1] / h
       − (h/6) [(xj+1 − x) Mj + (x − xj) Mj+1]

for j = 1, ..., n − 1.

To enforce continuity of s′(x) at the interior node points x2, ..., xn−1, the second derivatives {Mj} must satisfy the linear equations

(h/6) Mj−1 + (2h/3) Mj + (h/6) Mj+1 = (yj−1 − 2yj + yj+1)/h

for j = 2, ..., n − 1. Writing them out,

(h/6) M1 + (2h/3) M2 + (h/6) M3 = (y1 − 2y2 + y3)/h
(h/6) M2 + (2h/3) M3 + (h/6) M4 = (y2 − 2y3 + y4)/h
...
(h/6) Mn−2 + (2h/3) Mn−1 + (h/6) Mn = (yn−2 − 2yn−1 + yn)/h

This is a system of n − 2 equations in the n unknowns {M1, ..., Mn}. Two more conditions must be imposed on s(x) in order to have the number of equations equal the number of unknowns, namely n. With the added boundary conditions, this form of linear system can be solved very efficiently.
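A minimal MATLAB sketch assembling and solving this tridiagonal system with the natural boundary conditions M1 = Mn = 0 (shown for the data of the earlier example, where it reproduces M2 = 1/2, M3 = 0):

x = 1:4;  y = 1 ./ x;
n = length(x);  h = x(2) - x(1);
e = ones(n-2, 1);
A = (h/6) * (diag(e(1:end-1), -1) + 4*diag(e) + diag(e(1:end-1), 1));
r = (y(1:n-2) - 2*y(2:n-1) + y(3:n)).' / h;   % right-hand sides
M = [0; A \ r; 0]                             % M1, ..., Mn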

BOUNDARY CONDITIONS

"Natural" boundary conditions:

s″(x1) = s″(xn) = 0

Spline functions satisfying these conditions are called "natural cubic splines". They arise out of the minimization problem stated earlier. But generally they are not considered as good as some other cubic interpolating splines.

"Clamped" boundary conditions: We add the conditions

s′(x1) = y′1,   s′(xn) = y′n

with y′1, y′n given slopes for the endpoints of s(x) on [x1, xn]. This has many quite good properties when compared with the natural cubic interpolating spline; but it does require knowing the derivatives at the endpoints.

"Not a knot" boundary conditions: This is more complicated to explain, but it is the version of cubic spline interpolation that is implemented in MATLAB.

THE "NOT A KNOT" CONDITIONS

As before, let the interpolation nodes be

(x1, y1), (x2, y2), ..., (xn, yn)

We separate these points into two categories. For constructing the interpolating cubic spline function, we use the points

(x1, y1), (x3, y3), ..., (xn−2, yn−2), (xn, yn)

thus deleting two of the points. We now have n − 2 points, and the interpolating spline s(x) can be determined on the intervals

[x1, x3], [x3, x4], ..., [xn−3, xn−2], [xn−2, xn]

This leads to n − 4 equations in the n − 2 unknowns M1, M3, ..., Mn−2, Mn. The two additional boundary conditions are

s(x2) = y2,   s(xn−1) = yn−1

These translate into two additional equations, and we obtain a system of n − 2 linear simultaneous equations in the n − 2 unknowns M1, M3, ..., Mn−2, Mn.

x   0     1     2     2.5   3     3.5    4
y   2.5   0.5   0.5   1.5   1.5   1.125  0

[Figure: interpolating cubic spline function with "not a knot" boundary conditions.]

MATLAB SPLINE FUNCTION LIBRARY

Given data points

(x1, y1), (x2, y2), ..., (xn, yn)

type arrays containing the x and y coordinates:

x = [x1 x2 ... xn]
y = [y1 y2 ... yn]
plot(x, y, 'o')

The last statement will draw a plot of the data points, marking them with the letter 'oh'. To find the interpolating cubic spline function and evaluate it at the points of another array xx, say

h = (xn - x1)/(10*n);  xx = x1 : h : xn;

use

yy = spline(x, y, xx)
plot(x, y, 'o', xx, yy)

The last statement will plot the data points, as before, and it will plot the interpolating spline s(x) as a continuous curve.
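For example, with the seven data points used earlier in this chapter (recall that spline uses the "not a knot" boundary conditions):

x  = [0 1 2 2.5 3 3.5 4];
y  = [2.5 0.5 0.5 1.5 1.5 1.125 0];
xx = 0 : 4/70 : 4;              % h = (4 - 0)/(10*7)
yy = spline(x, y, xx);
plot(x, y, 'o', xx, yy)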

ERROR IN CUBIC SPLINE INTERPOLATION

Let an interval [a, b] be given, and then define

h = (b − a)/(n − 1),   xj = a + (j − 1) h,   j = 1, ..., n

Suppose we want to approximate a given function f(x) on the interval [a, b] using cubic spline interpolation. Define

yj = f(xj),   j = 1, ..., n

Let sn(x) denote the cubic spline interpolating this data and satisfying the "not a knot" boundary conditions. Then it can be shown that for a suitable constant c,

En ≡ max_{a ≤ x ≤ b} |f(x) − sn(x)| ≤ c h^4

The corresponding bound for natural cubic spline interpolation contains only a term of h^2 rather than h^4; it does not converge to zero as rapidly.

EXAMPLE

Take f(x) = arctan x on [0, 5]. The following table gives values of the maximum error En for various values of n. The values of h are being successively halved; for an O(h^4) method, the ratios of successive errors should approach 2^4 = 16.

n    En        E_{n/2}/En
7    7.09E−3
13   3.24E−4   21.9
25   3.06E−5   10.6
49   1.48E−6   20.7
97   9.04E−8   16.4
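This experiment is easy to reproduce with a MATLAB sketch (spline uses the "not a knot" conditions; the maximum is estimated on a fine grid):

f  = @(x) atan(x);
xx = linspace(0, 5, 5001);       % fine grid for estimating the max
for n = [7 13 25 49 97]
    xi = linspace(0, 5, n);
    En = max(abs(f(xx) - spline(xi, f(xi), xx)));
    fprintf('n = %2d   E_n = %.2e\n', n, En)
end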

BEST APPROXIMATION

Given a function f(x) that is continuous on a given interval [a, b], consider approximating it by some polynomial p(x). To measure the error in p(x) as an approximation, introduce

E(p) = max_{a ≤ x ≤ b} |f(x) − p(x)|

This is called the maximum error or uniform error of approximation of f(x) by p(x) on [a, b].

With an eye towards efficiency, we want to find the 'best' possible approximation of a given degree n. With this in mind, introduce the following:

ρn(f) = min_{deg(p) ≤ n} E(p) = min_{deg(p) ≤ n} [max_{a ≤ x ≤ b} |f(x) − p(x)|]

The number ρn(f) will be the smallest possible uniform error, or minimax error, when approximating f(x) by polynomials of degree at most n. If there is a polynomial giving this smallest error, we denote it by mn(x); thus E(mn) = ρn(f).

Example. Let f(x) = e^x on [−1, 1]. In the following table, we give the values of E(tn), with tn(x) the Taylor polynomial of degree n for e^x about x = 0, and E(mn).

     Maximum error in:
n    tn(x)     mn(x)
1    7.18E−1   2.79E−1
2    2.18E−1   4.50E−2
3    5.16E−2   5.53E−3
4    9.95E−3   5.47E−4
5    1.62E−3   4.52E−5
6    2.26E−4   3.21E−6
7    2.79E−5   2.00E−7
8    3.06E−6   1.11E−8
9    3.01E−7   5.52E−10

Consider graphically how we can improve on the Taylor polynomial

t1(x) = 1 + x

as a uniform approximation to e^x on the interval [−1, 1]. The linear minimax approximation is

m1(x) = 1.2643 + 1.1752 x

[Figure: linear Taylor and minimax approximations to e^x on [−1, 1], showing y = e^x, y = t1(x), and y = m1(x).]

[Figure: error in the cubic Taylor approximation to e^x on [−1, 1]; the maximum error is 0.0516.]

[Figure: error in the cubic minimax approximation to e^x on [−1, 1]; the error oscillates between −0.00553 and 0.00553.]

Accuracy of the minimax approximation.

ρn(f) ≤ [((b − a)/2)^(n+1) / ((n + 1)! 2^n)] max_{a ≤ x ≤ b} |f^(n+1)(x)|

This error bound does not always become smaller with increasing n, but it will give a fairly accurate bound for many common functions f(x).

Example. Let f(x) = e^x for −1 ≤ x ≤ 1. Then

ρn(e^x) ≤ e / ((n + 1)! 2^n)   (∗)

n    Bound (∗)   ρn(f)
1    6.80E−1     2.79E−1
2    1.13E−1     4.50E−2
3    1.42E−2     5.53E−3
4    1.42E−3     5.47E−4
5    1.18E−4     4.52E−5
6    8.43E−6     3.21E−6
7    5.27E−7     2.00E−7
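The bound (∗) is a one-liner in MATLAB:

n = (1:7).';
bound = exp(1) ./ (factorial(n+1) .* 2.^n)   % 6.80e-1, 1.13e-1, ...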

CHEBYSHEV POLYNOMIALS

Chebyshev polynomials are used in many parts of numerical analysis, and more generally, in applications of mathematics. For an integer n ≥ 0, define the function

Tn(x) = cos(n cos^−1 x),   −1 ≤ x ≤ 1   (1)

This may not appear to be a polynomial, but we will show it is a polynomial of degree n. To simplify the manipulation of (1), we introduce

θ = cos^−1(x)   or   x = cos(θ),   0 ≤ θ ≤ π   (2)

Then

Tn(x) = cos(nθ)   (3)

Example. For n = 0,

T0(x) = cos(0 · θ) = 1

For n = 1,

T1(x) = cos(θ) = x

For n = 2,

T2(x) = cos(2θ) = 2 cos^2(θ) − 1 = 2x^2 − 1

[Figure: graphs of T0(x), T1(x), T2(x), and of T3(x), T4(x), on [−1, 1].]

The triple recursion relation. Recall the trigonometric addition formulas,

cos(α ± β) = cos(α) cos(β) ∓ sin(α) sin(β)

Let n ≥ 1, and apply these identities to get

Tn+1(x) = cos[(n + 1)θ] = cos(nθ + θ) = cos(nθ) cos(θ) − sin(nθ) sin(θ)
Tn−1(x) = cos[(n − 1)θ] = cos(nθ − θ) = cos(nθ) cos(θ) + sin(nθ) sin(θ)

Add these two equations, and then use (1) and (3) to obtain

Tn+1(x) + Tn−1(x) = 2 cos(nθ) cos(θ) = 2x Tn(x)
Tn+1(x) = 2x Tn(x) − Tn−1(x),   n ≥ 1   (4)

This is called the triple recursion relation for the Chebyshev polynomials. It is often used in evaluating them, rather than using the explicit formula (1).
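A minimal MATLAB sketch of such an evaluation by the triple recursion (4) (the function name is ours, saved as cheb_eval.m):

function T = cheb_eval(n, x)
% Evaluate T_n at the points x using T_{k+1} = 2x T_k - T_{k-1}.
Tprev = ones(size(x));            % T_0
if n == 0, T = Tprev; return, end
T = x;                            % T_1
for k = 1:n-1
    Tnext = 2*x.*T - Tprev;
    Tprev = T;
    T = Tnext;
end
end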

Example. Recall

T0(x) = 1,   T1(x) = x
Tn+1(x) = 2x Tn(x) − Tn−1(x),   n ≥ 1

Let n = 2. Then

T3(x) = 2x T2(x) − T1(x) = 2x(2x^2 − 1) − x = 4x^3 − 3x

Let n = 3. Then

T4(x) = 2x T3(x) − T2(x) = 2x(4x^3 − 3x) − (2x^2 − 1) = 8x^4 − 8x^2 + 1

The minimum size property. Note that

|Tn(x)| ≤ 1,   −1 ≤ x ≤ 1   (5)

for all n ≥ 0. Also, note that

Tn(x) = 2^(n−1) x^n + lower degree terms,   n ≥ 1   (6)

This can be proven using the triple recursion relation and mathematical induction.

Introduce a modified version of Tn(x),

T̃n(x) = (1/2^(n−1)) Tn(x) = x^n + lower degree terms   (7)

From (5) and (6),

|T̃n(x)| ≤ 1/2^(n−1),   −1 ≤ x ≤ 1,   n ≥ 1   (8)

Example.

T̃4(x) = (1/8)(8x^4 − 8x^2 + 1) = x^4 − x^2 + 1/8

A polynomial whose highest degree term has a coefficient of 1 is called a monic polynomial. Formula (8) says the monic polynomial T̃n(x) has size 1/2^(n−1) on −1 ≤ x ≤ 1, and this becomes smaller as the degree n increases. In comparison,

max_{−1 ≤ x ≤ 1} |x^n| = 1

Thus x^n is a monic polynomial whose size does not change with increasing n.

Theorem. Let n ≥ 1 be an integer, and consider all possible monic polynomials of degree n. Then the degree n monic polynomial with the smallest maximum on [−1, 1] is the modified Chebyshev polynomial T̃n(x), and its maximum value on [−1, 1] is 1/2^(n−1).

This result is used in devising applications of Chebyshev polynomials. We apply it to obtain an improved interpolation scheme.