
Lecture Notes Math 307


Chapter I

Linear Equations


I.1 Solving Linear Equations

Prerequisites and Learning Goals

From your work in previous courses, you should be able to

• Write a system of linear equations using matrix notation.

• Use Gaussian elimination to bring a system of linear equations into upper triangular form and reduced row echelon form (rref).

• Determine whether a system of equations has a unique solution, infinitely many solutions or no solutions, and compute all solutions if they exist; express the set of all solutions in parametric form.

• Compute the inverse of a matrix when it exists, use the inverse to solve a system of equations, describe for what systems of equations this is possible.

• Find the transpose of a matrix.

• Interpret a matrix as a linear transformation acting on vectors.

After completing this section, you should be able to

• Calculate the standard Euclidean norm, the 1-norm and the infinity norm of a vector.

• Calculate the Hilbert-Schmidt norm of a matrix.

• Define the matrix norm of a matrix; describe the connection between the matrix norm and how a matrix stretches the length of vectors; compute the matrix norm of a diagonal matrix.

• Define the condition number of a matrix and its relation to the matrix norm; use the condition number to estimate relative errors in the solution to a system of linear equations.

• Explain why a small condition number is desirable in practical computations.

• Use MATLAB/Octave to enter matrices and vectors, make larger matrices from smaller blocks, multiply matrices, compute the inverse and transpose, extract elements, rows, columns and submatrices, use rref() to find the reduced row echelon form of a matrix, solve linear equations using A\b, use rand() to generate random matrices, use tic() and toc() to time operations, compute norms and condition numbers.

• Use MATLAB/Octave to test conjectures about norms, condition numbers, etc.


I.1.1 Review: Systems of linear equations

The first part of the course is about systems of linear equations. You will have studied such systems in a previous course, and should remember how to find solutions (when they exist) using Gaussian elimination.

Many practical problems can be solved by turning them into a system of linear equations. In this chapter we will study a few examples: the problem of finding a function that interpolates a collection of given points, and the approximate solutions of differential equations. In practical problems, the question of existence of solutions, although important, is not the end of the story. It turns out that some systems of equations, even though they may have a unique solution, are very sensitive to changes in the coefficients. This makes them very difficult to solve reliably. We will see some examples of such ill-conditioned systems, and learn how to recognize them using the condition number of a matrix.

Recall that a system of linear equations, like this system of 2 equations in 3 unknowns

\[
\begin{aligned}
x_1 + 2x_2 + x_3 &= 0 \\
x_1 - 5x_2 + x_3 &= 1
\end{aligned}
\]

can be written as a matrix equation

\[
\begin{bmatrix} 1 & 2 & 1 \\ 1 & -5 & 1 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
=
\begin{bmatrix} 0 \\ 1 \end{bmatrix}.
\]

A general system of m linear equations in n unknowns can be written as

Ax = b

where A is a given m × n (m rows, n columns) matrix, b is a given m-component vector, and x is the n-component vector of unknowns.

A system of linear equations may have no solutions, a unique solution, or infinitely many solutions. This is easy to see when there is only a single variable x, so that the equation has the form

ax = b

where a and b are given numbers. The solution is easy to find if a ≠ 0: x = b/a. If a = 0 then the equation reads 0x = b. In this case, the equation either has no solutions (when b ≠ 0) or infinitely many (when b = 0), since in this case every x is a solution.

To solve a general system Ax = b, form the augmented matrix [A|b] and use Gaussian elimination to reduce the matrix to reduced row echelon form. This reduced matrix (which represents a system of linear equations that has exactly the same solutions as the original system) can be used to decide whether solutions exist, and to find them. If you don't remember this procedure, you should review it.


In the example above, the augmented matrix is

\[
\left[\begin{array}{ccc|c} 1 & 2 & 1 & 0 \\ 1 & -5 & 1 & 1 \end{array}\right].
\]

The reduced row echelon form is
\[
\left[\begin{array}{ccc|c} 1 & 0 & 1 & 2/7 \\ 0 & 1 & 0 & -1/7 \end{array}\right],
\]

which leads to a family of solutions (one for each value of the parameter s)

\[
x = \begin{bmatrix} 2/7 \\ -1/7 \\ 0 \end{bmatrix} + s \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}.
\]

I.1.2 Solving a non-singular system of n equations in n unknowns

Let's start with a system of equations where the number of equations is the same as the number of unknowns. Such a system can be written as a matrix equation

Ax = b,

where A is a square matrix, b is a given vector, and x is the vector of unknowns we are trying to find. When A is non-singular (invertible) there is a unique solution. It is given by x = A⁻¹b, where A⁻¹ is the inverse matrix of A. Of course, computing A⁻¹ is not the most efficient way to solve a system of equations.

For our first introduction to MATLAB/Octave, let’s consider an example:

\[
A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & -1 \\ 1 & -1 & 1 \end{bmatrix}, \qquad
b = \begin{bmatrix} 3 \\ 1 \\ 1 \end{bmatrix}.
\]

First, we define the matrix A and the vector b in MATLAB/Octave. Here is the input (after the prompt symbol >) and the output (without a prompt symbol).

>A=[1 1 1;1 1 -1;1 -1 1]

A =

1 1 1

1 1 -1

1 -1 1

>b=[3;1;1]


b =

3

1

1

Notice that the entries on the same row are separated by spaces (or commas) while rows are separated by semicolons. In MATLAB/Octave, column vectors are n by 1 matrices and row vectors are 1 by n matrices. The semicolons in the definition of b make it a column vector. In MATLAB/Octave, X' denotes the transpose of X. Thus we get the same result if we define b as

>b=[3 1 1]’

b =

3

1

1

The solution can be found by computing the inverse of A and multiplying

>x = A^(-1)*b

x =

1

1

1

However if A is a large matrix we don't want to actually calculate the inverse. The syntax for solving a system of equations efficiently is

>x = A\b

x =

1

1

1


If you try this with a singular matrix A, MATLAB/Octave will complain and print a warning message. If you see the warning, the answer is not reliable! You can always check to see that x really is a solution by computing Ax.

>A*x

ans =

3

1

1

As expected, the result is b.

By the way, you can check to see how much faster A\b is than A^(-1)*b by using the functions tic() and toc(). The function tic() starts the clock, and toc() stops the clock and prints the elapsed time. To try this out, let's make A and b really big with random entries.

A=rand(1000,1000);

b=rand(1000,1);

Here we are using the MATLAB/Octave command rand(m,n) that generates an m × n matrix with random entries chosen between 0 and 1. Each time rand is used it generates new numbers.

Notice the semicolon ; at the end of the inputs. This suppresses the output. Without the semicolon, MATLAB/Octave would start writing the 1,000,000 random entries of A to our screen! Now we are ready to time our calculations.

tic();A^(-1)*b;toc();

Elapsed time is 44 seconds.

tic();A\b;toc();

Elapsed time is 13.55 seconds.

So we see that A\b is quite a bit faster.


I.1.3 Reduced row echelon form

How can we solve Ax = b when A is singular, or not a square matrix (that is, the number of equations is different from the number of unknowns)? In your previous linear algebra course you learned how to use elementary row operations to transform the original system of equations to an upper triangular system. The upper triangular system obtained this way has exactly the same solutions as the original system. However, it is much easier to solve. In practice, the row operations are performed on the augmented matrix [A|b].

If efficiency is not an issue, then additional row operations can be used to bring the system into reduced row echelon form. In this form, the pivot columns have a 1 in the pivot position and zeros elsewhere. For example, if A is a square non-singular matrix then the reduced row echelon form of [A|b] is [I|x], where I is the identity matrix and x is the solution.

In MATLAB/Octave you can compute the reduced row echelon form in one step using the function rref(). For the system we considered above we do this as follows. First define A and b as before. This time I'll suppress the output.

>A=[1 1 1;1 1 -1;1 -1 1];

>b=[3 1 1]’;

In MATLAB/Octave, the square brackets [ ... ] can be used to construct larger matrices from smaller building blocks, provided the sizes match correctly. So we can define the augmented matrix C as

>C=[A b]

C =

1 1 1 3

1 1 -1 1

1 -1 1 1

Now we compute the reduced row echelon form.

>rref(C)

ans =

1 0 0 1

0 1 -0 1

0 0 1 1


The solution appears on the right.

Now let’s try to solve Ax = b with

\[
A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}, \qquad
b = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}.
\]

This time the matrix A is singular and doesn't have an inverse. Recall that the determinant of a singular matrix is zero, so we can check by computing it.

>A=[1 2 3; 4 5 6; 7 8 9];

>det(A)

ans = 0

However we can still try to solve the equation Ax = b using Gaussian elimination.

>b=[1 1 1]’;

>rref([A b])

ans =

1.00000 0.00000 -1.00000 -1.00000

0.00000 1.00000 2.00000 1.00000

0.00000 0.00000 0.00000 0.00000

Letting x3 = s be a parameter, and proceeding as you learned in previous courses, we arrive at the general solution

\[
x = \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix} + s \begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix}.
\]

On the other hand, if

\[
A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}, \qquad
b = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix},
\]

then

>rref([1 2 3 1;4 5 6 1;7 8 9 0])

ans =

1.00000 0.00000 -1.00000 0.00000

0.00000 1.00000 2.00000 0.00000

0.00000 0.00000 0.00000 1.00000

tells us that there is no solution.


I.1.4 Gaussian elimination steps using MATLAB/Octave

If C is a matrix in MATLAB/Octave, then C(1,2) is the entry in the 1st row and 2nd column. The whole first row can be extracted using C(1,:) while C(:,2) yields the second column. Finally, we can pick out the submatrix of C consisting of rows 1-2 and columns 2-4 with the notation C(1:2,2:4).

Let's illustrate this by performing a few steps of Gaussian elimination on the augmented matrix from our first example. Start with

C=[1 1 1 3; 1 1 -1 1; 1 -1 1 1];

The first step in Gaussian elimination is to subtract the first row from the second.

>C(2,:)=C(2,:)-C(1,:)

C =

1 1 1 3

0 0 -2 -2

1 -1 1 1

Next, we subtract the first row from the third.

>C(3,:)=C(3,:)-C(1,:)

C =

1 1 1 3

0 0 -2 -2

0 -2 0 -2

To bring the system into upper triangular form, we need to swap the second and third rows. Here is the MATLAB/Octave code.

>temp=C(3,:);C(3,:)=C(2,:);C(2,:)=temp

C =

1 1 1 3

0 -2 0 -2

0 0 -2 -2
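As an aside, MATLAB/Octave also lets you swap two rows in one line using index vectors; this is an equivalent idiom, not the method used above:

>C([2 3],:) = C([3 2],:)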


I.1.5 Norms for a vector

Norms are a way of measuring the size of a vector. They are important when we study how vectors change, or want to know how close one vector is to another. A vector may have many components and it might happen that some are big and some are small. A norm is a way of capturing information about the size of a vector in a single number. There is more than one way to define a norm.

In your previous linear algebra course, you probably have encountered the most common norm, called the Euclidean norm (or the 2-norm). The word norm without qualification usually refers to this norm. What is the Euclidean norm of the vector

\[
a = \begin{bmatrix} -4 \\ 3 \end{bmatrix}?
\]

When you draw the vector as an arrow on the plane, this norm is the Euclidean distance between the tip and the tail. This leads to the formula

\[
\|a\| = \sqrt{(-4)^2 + 3^2} = 5.
\]

This is the answer that MATLAB/Octave gives too:

> a=[-4 3]

a =

-4 3

> norm(a)

ans = 5

The formula is easily generalized to n dimensions. If x = [x1, x2, . . . , xn]^T then
\[
\|x\| = \sqrt{|x_1|^2 + |x_2|^2 + \cdots + |x_n|^2}.
\]

The absolute value signs in this formula, which might seem superfluous, are put in to make the formula correct when the components are complex numbers. So, for example

\[
\left\| \begin{bmatrix} i \\ 1 \end{bmatrix} \right\| = \sqrt{|i|^2 + |1|^2} = \sqrt{1 + 1} = \sqrt{2}.
\]

Does MATLAB/Octave give this answer too?
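We can try it (a quick check; in MATLAB/Octave the imaginary unit is written 1i):

>norm([1i 1])

ans = 1.4142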

There are situations where other ways of measuring the norm of a vector are more natural. Suppose that the tip and tail of the vector a = [−4, 3]^T are locations in a city where you can only walk along the streets and avenues.


[Figure: the vector a = [−4, 3]^T drawn on a street grid; walking from tail to tip means 4 blocks in one direction and 3 in the other.]

If you defined the norm to be the shortest distance that you can walk to get from the tail to the tip, the answer would be

\[
\|a\|_1 = |-4| + |3| = 7.
\]

This norm is called the 1-norm and can be calculated in MATLAB/Octave by adding 1 as an extra argument in the norm function.

> norm(a,1)

ans = 7

The 1-norm is also easily generalized to n dimensions. If x = [x1, x2, . . . , xn]^T then
\[
\|x\|_1 = |x_1| + |x_2| + \cdots + |x_n|.
\]

Another norm that is often used measures the largest component in absolute value. This norm is called the infinity norm. For a = [−4, 3]^T we have
\[
\|a\|_\infty = \max\{|-4|, |3|\} = 4.
\]

To compute this norm in MATLAB/Octave we use inf as the second argument in the norm function.

> norm(a,inf)

ans = 4

Here are three properties that the norms we have defined all have in common:

1. For every vector x and every number s, ‖sx‖ = |s|‖x‖.

2. The only vector with norm zero is the zero vector, that is, ‖x‖ = 0 if and only if x = 0.


3. For all vectors x and y, ‖x + y‖ ≤ ‖x‖ + ‖y‖. This inequality is called the triangle inequality. It says that the length of the longest side of a triangle is smaller than the sum of the lengths of the two shorter sides.
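In the spirit of the learning goals, here is a small experiment (a sketch, not from the notes) that tests the triangle inequality on many random vectors; the printed quantity should never be negative:

>worst=inf;
>for k=1:1000, x=randn(5,1); y=randn(5,1); worst=min(worst,norm(x)+norm(y)-norm(x+y)); end
>worst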

What is the point of introducing many ways of measuring the length of a vector? Sometimes one of the non-standard norms has a natural meaning in the context of a given problem. For example, when we study stochastic matrices, we will see that multiplication of a vector by a stochastic matrix preserves the 1-norm of the vector. So in this situation it is natural to use 1-norms. However, in this course we will almost always use the standard Euclidean norm. If v is a vector then ‖v‖ (without any subscripts) will always denote the standard Euclidean norm.

I.1.6 Matrix norms

Just as for vectors, there are many ways to measure the size of a matrix A.

For a start we could think of a matrix as a vector whose entries just happen to be written in a box, like

\[
A = \begin{bmatrix} 1 & 2 \\ 0 & 2 \end{bmatrix},
\]

rather than in a row, like

\[
a = \begin{bmatrix} 1 \\ 2 \\ 0 \\ 2 \end{bmatrix}.
\]

Taking this point of view, we would define the norm of A to be √(1² + 2² + 0² + 2²) = 3. In fact, the norm computed in this way is sometimes used for matrices. It is called the Hilbert-Schmidt norm. For a general matrix A = [a_{i,j}], the formula for the Hilbert-Schmidt norm is
\[
\|A\|_{HS} = \sqrt{\sum_i \sum_j |a_{i,j}|^2}.
\]

The Hilbert-Schmidt norm does measure the size of a matrix in some sense. It has the advantage of being easy to compute from the entries a_{i,j}. But it is not closely tied to the action of A as a linear transformation.
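For the matrix A above, the Hilbert-Schmidt norm is indeed easy to compute directly from the entries; note that MATLAB/Octave's built-in norm(A,'fro') (the Frobenius norm) is the same quantity:

>A=[1 2; 0 2];
>sqrt(sum(sum(abs(A).^2)))

ans = 3

>norm(A,'fro')

ans = 3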

When A is considered as a linear transformation or operator, acting on vectors, there is another norm that is more natural to use.

Starting with a vector x, the matrix A transforms it to the vector Ax. We want to say that a matrix is big if it increases the size of vectors, in other words, if ‖Ax‖ is big compared to ‖x‖. So it is natural to consider the stretching ratio ‖Ax‖/‖x‖. Of course, this ratio depends on x, since some vectors get stretched more than others by A. Also, the ratio is not defined if x = 0. But in this case Ax = 0 too, so there is no stretching.


We now define the matrix norm of A to be the largest of these ratios,

\[
\|A\| = \max_{x \ne 0} \frac{\|Ax\|}{\|x\|}.
\]

This norm measures the maximum factor by which A can stretch the length of a vector. It is sometimes called the operator norm.

Since ‖A‖ is defined to be the maximum of a collection of stretching ratios, it must be bigger than or equal to any particular stretching ratio. In other words, for any non-zero vector x we know ‖A‖ ≥ ‖Ax‖/‖x‖, or

‖Ax‖ ≤ ‖A‖‖x‖.

This is how the matrix norm is often used in practice. If we know ‖x‖ and the matrix norm ‖A‖, then we have an upper bound on the norm of Ax.

In fact, since the maximum of a collection of numbers is the smallest number that is larger than or equal to every number in the collection (draw a picture on the number line to see this), the matrix norm ‖A‖ is the smallest number that is bigger than ‖Ax‖/‖x‖ for every choice of non-zero x. Thus ‖A‖ is the smallest number C for which

‖Ax‖ ≤ C‖x‖

for every x.

An equivalent definition for ‖A‖ is

\[
\|A\| = \max_{\|x\| = 1} \|Ax\|.
\]

Why do these definitions give the same answer? The reason is that the quantity ‖Ax‖/‖x‖ does not change if we multiply x by a non-zero scalar (convince yourself!). So, when calculating the maximum over all non-zero vectors in the first expression for ‖A‖, all the vectors pointing in the same direction will give the same value for ‖Ax‖/‖x‖. This means that we need only pick one vector in any given direction, and might as well choose the unit vector. For this vector, the denominator is equal to one, so we can ignore it.

Here is another way of saying this. Consider the image of the unit sphere under A. This is the set of vectors {Ax : ‖x‖ = 1}. The length of the longest vector in this set is ‖A‖.

The picture below is a sketch of the unit sphere (circle) in two dimensions, and its image under
\[
A = \begin{bmatrix} 1 & 2 \\ 0 & 2 \end{bmatrix}.
\]
This image is an ellipse.

[Figure: the unit circle and its image ellipse under A; the longest vector in the ellipse has length ‖A‖.]


The norm of the matrix is the distance from the origin to the point on the ellipse farthest from the origin. In this case this turns out to be
\[
\|A\| = \sqrt{9/2 + (1/2)\sqrt{65}}.
\]

It's hard to see how this expression can be obtained from the entries of the matrix. There is no easy formula. However, if A is a diagonal matrix the norm is easy to compute.

To see this, let’s consider a diagonal matrix

\[
A = \begin{bmatrix} 3 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix}.
\]

If

\[
x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
\]

then

\[
Ax = \begin{bmatrix} 3x_1 \\ 2x_2 \\ x_3 \end{bmatrix}
\]

so that

\[
\begin{aligned}
\|Ax\|^2 &= |3x_1|^2 + |2x_2|^2 + |x_3|^2 \\
&= 3^2|x_1|^2 + 2^2|x_2|^2 + |x_3|^2 \\
&\le 3^2|x_1|^2 + 3^2|x_2|^2 + 3^2|x_3|^2 \\
&= 3^2\|x\|^2.
\end{aligned}
\]

This implies that for any unit vector x

‖Ax‖ ≤ 3

and taking the maximum over all unit vectors x yields ‖A‖ ≤ 3. On the other hand, the maximum of ‖Ax‖ over all unit vectors x is larger than the value of ‖Ax‖ for any particular unit vector. In particular, if

\[
e_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}
\]

then
\[
\|A\| \ge \|Ae_1\| = 3.
\]

Thus we see that ‖A‖ = 3.

In general, the matrix norm of a diagonal matrix with diagonal entries λ1, λ2, . . . , λn is the largest value of |λk|.


The MATLAB/Octave code for a diagonal matrix with diagonal entries 3, 2 and 1 is diag([3 2 1]) and the expression for the norm of A is norm(A). So for example

>norm(diag([3 2 1]))

ans = 3

I.1.7 Condition number

Let's return to the situation where A is a square matrix and we are trying to solve Ax = b. If A is a matrix arising from a real world application (for example if A contains values measured in an experiment) then it will almost never happen that A is singular. After all, a tiny change in any of the entries of A can change a singular matrix to a non-singular one. What is much more likely to happen is that A is close to being singular. In this case A⁻¹ will still exist, but will have some enormous entries. This means that the solution x = A⁻¹b will be very sensitive to the tiniest changes in b, so that it might happen that round-off error in the computer completely destroys the accuracy of the answer.

To check whether a system of linear equations is well-conditioned, we might therefore think of using ‖A⁻¹‖ as a measure. But this isn't quite right, since we actually don't care if ‖A⁻¹‖ is large, provided it stretches each vector about the same amount. For example, if we simply multiply each entry of A by 10⁻⁶ the size of A⁻¹ will go way up, by a factor of 10⁶, but our ability to solve the system accurately is unchanged. The new solution is simply 10⁶ times the old solution, that is, we have simply shifted the position of the decimal point.

It turns out that for a square matrix A, the ratio of the largest stretching factor to the smallest stretching factor of A is a good measure of how well conditioned the system of equations Ax = b is. This ratio is called the condition number and is denoted cond(A).

Let's first compute an expression for cond(A) in terms of matrix norms. Then we will explain why it measures the conditioning of a system of equations.

We already know that the largest stretching factor for a matrix A is the matrix norm ‖A‖. So let's look at the smallest stretching factor. We might as well assume that A is invertible. Otherwise, there is a non-zero vector that A sends to zero, so that the smallest stretching factor is 0 and the condition number is infinite.

\[
\min_{x \ne 0} \frac{\|Ax\|}{\|x\|}
= \min_{x \ne 0} \frac{\|Ax\|}{\|A^{-1}Ax\|}
= \min_{y \ne 0} \frac{\|y\|}{\|A^{-1}y\|}
= \frac{1}{\displaystyle\max_{y \ne 0} \frac{\|A^{-1}y\|}{\|y\|}}
= \frac{1}{\|A^{-1}\|}.
\]


Here we used the fact that if x ranges over all non-zero vectors, so does y = Ax, and that the minimum of a collection of positive numbers is one divided by the maximum of their reciprocals. Thus the smallest stretching factor for A is 1/‖A⁻¹‖. This leads to the following formula for the condition number of an invertible matrix:

\[
\mathrm{cond}(A) = \|A\| \, \|A^{-1}\|.
\]
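We can confirm this formula numerically on a small example (a quick sketch; any invertible matrix will do):

>A=[1 2; 3 4];
>cond(A)

ans = 14.933

>norm(A)*norm(inv(A))

ans = 14.933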

In our applications we will use the condition number as a measure of how accurately we can solve the systems of equations that come up.

Now, let us try to see why the condition number of A is a good measure of how accurately we can solve the equations Ax = b.

Starting with Ax = b we change the right side to b′ = b + ∆b. The new solution is

\[
x' = A^{-1}(b + \Delta b) = x + \Delta x
\]

where x = A⁻¹b is the original solution and the change in the solutions is ∆x = A⁻¹∆b. Now, the absolute errors ‖∆b‖ and ‖∆x‖ are not very meaningful, since an absolute error ‖∆b‖ = 100 is not very large if ‖b‖ = 1,000,000, but is large if ‖b‖ = 1. What we really care about are the relative errors ‖∆b‖/‖b‖ and ‖∆x‖/‖x‖. Can we bound the relative error in the solution in terms of the relative error in the equation? The answer is yes. Beginning with

\[
\|\Delta x\| \, \|b\| = \|A^{-1}\Delta b\| \, \|Ax\| \le \|A^{-1}\| \, \|\Delta b\| \, \|A\| \, \|x\|,
\]

we can divide by ‖b‖‖x‖ to obtain

\[
\frac{\|\Delta x\|}{\|x\|} \le \|A^{-1}\| \, \|A\| \, \frac{\|\Delta b\|}{\|b\|} = \mathrm{cond}(A) \, \frac{\|\Delta b\|}{\|b\|}.
\]

This inequality gives the real meaning of the condition number. If the condition number is near 1 then the relative error of the solution is about the same as the relative error in the equation. However, a large condition number means that a small relative error in the equation can lead to a large relative error in the solution.

In MATLAB/Octave the condition number is computed using cond(A).

> A=[2 0; 0 0.5];

> cond(A)

ans = 4
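To see the condition number at work, here is a small experiment (a sketch, not from the notes) using the built-in function hilb(n), which returns the notoriously ill-conditioned n × n Hilbert matrix. A tiny relative change in b can produce a relative change in x up to cond(A) times larger:

>A=hilb(10); b=ones(10,1); x=A\b;
>db=1e-10*randn(10,1); dx=A\(b+db)-x;
>cond(A)
>[norm(dx)/norm(x), norm(db)/norm(b)]   % first ratio can exceed second by up to cond(A)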


I.1.8 Summary of MATLAB/Octave commands used in this section

How to create a row vector

[ ] square brackets are used to construct matrices and vectors. Create a row in the matrix by entering elements within brackets. Separate each element with a comma or space. For example, to create a row vector a with three columns (i.e. a 1-by-3 matrix), type

a=[1 1 1] or equivalently a=[1,1,1]

How to create a column vector or a matrix with more than one row

; when the semicolon is used inside square brackets, it terminates rows. For example,

a=[1;1;1] creates a column vector with three rows

B=[1 2 3; 4 5 6] creates a 2-by-3 matrix

' when a matrix (or a vector) is followed by a single quote ' (or apostrophe) MATLAB flips rows with columns, that is, it generates the transpose. When the original matrix is a simple row vector, the apostrophe operator turns the vector into a column vector. For example,

a=[1 1 1]' creates a column vector with three rows

B=[1 2 3; 4 5 6]' creates a 3-by-2 matrix where the first row is 1 4

How to use specialized matrix functions

rand(n,m) returns an n-by-m matrix with random numbers between 0 and 1.

How to extract elements or submatrices from a matrix

A(i,j) returns the entry of the matrix A in the i-th row and the j-th column

A(i,:) returns a row vector containing the i-th row of A

A(:,j) returns a column vector containing the j-th column of A

A(i:j,k:m) returns a matrix containing a specific submatrix of the matrix A. Specifically, it returns all rows between the i-th and the j-th rows of A, and all columns between the k-th and the m-th columns of A.


How to perform specific operations on a matrix

det(A) returns the determinant of the (square) matrix A

rref(A) returns the reduced row echelon form of the matrix A

norm(V) returns the 2-norm (Euclidean norm) of the vector V

norm(V,1) returns the 1-norm of the vector V

norm(V,inf) returns the infinity norm of the vector V


I.2 Interpolation

Prerequisites and Learning Goals

From your work in previous courses, you should be able to

• compute the determinant of a square matrix; apply the basic linearity properties of the determinant, and explain what its value means about existence and uniqueness of solutions.

After completing this section, you should be able to

• give a definition of interpolation function and explain the idea of getting a unique interpolation function by restricting the class of functions under consideration.

• Define the problem of Lagrange interpolation and express it in terms of a system of equations where the unknowns are the coefficients of a polynomial of given degree; set up the system in matrix form using the Vandermonde matrix, derive the formula for the determinant of the Vandermonde matrix; explain why a solution to the Lagrange interpolation problem always exists.

• Explain why Lagrange interpolation is not a practical method for large numbers of points.

• Define the mathematical problem of interpolation using splines, compare and contrast it with Lagrange interpolation.

• Explain how minimizing the bending energy leads to a description of the shape of the spline as a piecewise polynomial function.

• Express the interpolation problem of cubic splines in terms of a system of equations where the unknowns are related to the coefficients of the cubic polynomials.

• Given a set of points, use MATLAB/Octave to calculate and plot the interpolating polynomial in Lagrange interpolation and the piecewise function for cubic splines.

• Use the MATLAB/Octave functions linspace, vander, polyval, zeros and ones.

• Use m files in MATLAB/Octave.


I.2.1 Introduction

Suppose we are given some points (x1, y1), . . . , (xn, yn) in the plane, where the points xi are all distinct.

Our task is to find a function f(x) that passes through all these points. In other words, we require that f(xi) = yi for i = 1, . . . , n. Such a function is called an interpolating function. Problems like this arise in practical applications in situations where a function is sampled at a finite number of points. For example, the function could be the shape of the model we have made for a car. We take a bunch of measurements (x1, y1), . . . , (xn, yn) and send them to the factory. What's the best way to reproduce the original shape?

Of course, it is impossible to reproduce the original shape with certainty. There are infinitely many functions going through the sampled points.

To make our problem of finding the interpolating function f(x) have a unique solution, we must require something more of f(x): either that f(x) lies in some restricted class of functions, or that f(x) is the function that minimizes some measure of "badness". We will look at both approaches.

I.2.2 Lagrange interpolation

For Lagrange interpolation, we try to find a polynomial p(x) of lowest possible degree that passes through our points. Since we have n points, and therefore n equations p(xi) = yi to solve, it makes sense that p(x) should be a polynomial of degree n − 1,

\[
p(x) = a_1 x^{n-1} + a_2 x^{n-2} + \cdots + a_{n-1} x + a_n
\]

with n unknown coefficients a1, a2, . . . , an. (Don't blame me for the screwy way of numbering the coefficients. This is the MATLAB/Octave convention.)


The n equations p(xi) = yi are n linear equations for these unknown coefficients, which we may write as

\[
\begin{bmatrix}
x_1^{n-1} & x_1^{n-2} & \cdots & x_1^2 & x_1 & 1 \\
x_2^{n-1} & x_2^{n-2} & \cdots & x_2^2 & x_2 & 1 \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
x_n^{n-1} & x_n^{n-2} & \cdots & x_n^2 & x_n & 1
\end{bmatrix}
\begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_{n-1} \\ a_n \end{bmatrix}
=
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}.
\]

Thus we see that the problem of Lagrange interpolation reduces to solving a system of linear equations. If this system has a unique solution, then there is exactly one polynomial p(x) of degree n − 1 running through our points. The matrix for this system of equations has a special form and is called a Vandermonde matrix.

To decide whether the system of equations has a unique solution we need to determine whether the Vandermonde matrix is invertible or not. One way to do this is to compute the determinant. It turns out that the determinant of a Vandermonde matrix has a particularly simple form, but it's a little tricky to see this. The 2 × 2 case is simple enough:

\[
\det\left(\begin{bmatrix} x_1 & 1 \\ x_2 & 1 \end{bmatrix}\right) = x_1 - x_2.
\]

To go on to the 3 × 3 case we won't simply expand the determinant, but recall that the determinant is unchanged under row (and column) operations of the type "add a multiple of one row (column) to another." Thus if we start with a 3 × 3 Vandermonde determinant, add −x1 times the second column to the first, and then add −x1 times the third column to the second, the determinant doesn't change and we find that

\[
\det\begin{bmatrix} x_1^2 & x_1 & 1 \\ x_2^2 & x_2 & 1 \\ x_3^2 & x_3 & 1 \end{bmatrix}
= \det\begin{bmatrix} 0 & x_1 & 1 \\ x_2^2 - x_1 x_2 & x_2 & 1 \\ x_3^2 - x_1 x_3 & x_3 & 1 \end{bmatrix}
= \det\begin{bmatrix} 0 & 0 & 1 \\ x_2^2 - x_1 x_2 & x_2 - x_1 & 1 \\ x_3^2 - x_1 x_3 & x_3 - x_1 & 1 \end{bmatrix}.
\]

Now we can take advantage of the zeros in the first row, and calculate the determinant by expanding along the top row. This gives

\[
\det\begin{bmatrix} x_1^2 & x_1 & 1 \\ x_2^2 & x_2 & 1 \\ x_3^2 & x_3 & 1 \end{bmatrix}
= \det\left(\begin{bmatrix} x_2^2 - x_1 x_2 & x_2 - x_1 \\ x_3^2 - x_1 x_3 & x_3 - x_1 \end{bmatrix}\right)
= \det\left(\begin{bmatrix} x_2(x_2 - x_1) & x_2 - x_1 \\ x_3(x_3 - x_1) & x_3 - x_1 \end{bmatrix}\right).
\]

Now, we recall that the determinant is linear in each row separately. This implies that

\[
\det\left(\begin{bmatrix} x_2(x_2 - x_1) & x_2 - x_1 \\ x_3(x_3 - x_1) & x_3 - x_1 \end{bmatrix}\right)
= (x_2 - x_1)\det\left(\begin{bmatrix} x_2 & 1 \\ x_3(x_3 - x_1) & x_3 - x_1 \end{bmatrix}\right)
= (x_2 - x_1)(x_3 - x_1)\det\left(\begin{bmatrix} x_2 & 1 \\ x_3 & 1 \end{bmatrix}\right).
\]

But the determinant on the right is a 2 × 2 Vandermonde determinant that we have already computed. Thus we end up with the formula

\[
\det\begin{bmatrix} x_1^2 & x_1 & 1 \\ x_2^2 & x_2 & 1 \\ x_3^2 & x_3 & 1 \end{bmatrix}
= -(x_2 - x_1)(x_3 - x_1)(x_3 - x_2).
\]

The general formula is

\[
\det\begin{bmatrix}
x_1^{n-1} & x_1^{n-2} & \cdots & x_1^2 & x_1 & 1 \\
x_2^{n-1} & x_2^{n-2} & \cdots & x_2^2 & x_2 & 1 \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
x_n^{n-1} & x_n^{n-2} & \cdots & x_n^2 & x_n & 1
\end{bmatrix}
= \pm \prod_{i > j} (x_i - x_j),
\]

where ± = (−1)^{n(n−1)/2}. It can be proved by induction using the same strategy as we used for the 3 × 3 case. The product on the right is the product of all differences xi − xj. This product is non-zero, since we are assuming that all the points xi are distinct. Thus the Vandermonde matrix is invertible, and a solution to the Lagrange interpolation problem always exists.
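As a numerical sanity check (a sketch, not part of the original notes), we can compare det(vander(X)) with the product formula for a few random points; the two printed values should agree up to round-off:

>n=4; X=rand(1,n);
>p=1; for i=1:n, for j=1:i-1, p=p*(X(i)-X(j)); end, end
>[det(vander(X)), (-1)^(n*(n-1)/2)*p]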

Now let’s use MATLAB/Octave to see how this interpolation works in practice.

We begin by putting some points xi into a vector X and the corresponding points yi into a vector Y.

>X=[0 0.2 0.4 0.6 0.8 1.0]

>Y=[1 1.1 1.3 0.8 0.4 1.0]

We can use the plot command in MATLAB/Octave to view these points. The command plot(X,Y) will pop open a window and plot the points (xi, yi) joined by straight lines. In this case we are not interested in joining the points (at least not with straight lines) so we add a third argument: 'o' plots the points as little circles. (For more information you can type help plot on the MATLAB/Octave command line.) Thus we type

>plot(X,Y,’o’)

>axis([-0.1, 1.1, 0, 1.5])

>hold on

The axis command adjusts the axis. Normally when you issue a new plot command, the existing plot is erased. The hold on prevents this, so that subsequent plots are all drawn on the same graph. The original behaviour is restored with hold off.

When you do this you should see a graph appear that looks something like this.


[Figure: the six data points plotted as circles, with axes running from −0.1 to 1.1 horizontally and 0 to 1.5 vertically.]

Now let's compute the interpolation polynomial. Luckily there are built-in functions in MATLAB/Octave that make this very easy. To start with, the function vander(X) returns the Vandermonde matrix corresponding to the points in X. So we define

>V=vander(X)

V =

0.00000 0.00000 0.00000 0.00000 0.00000 1.00000

0.00032 0.00160 0.00800 0.04000 0.20000 1.00000

0.01024 0.02560 0.06400 0.16000 0.40000 1.00000

0.07776 0.12960 0.21600 0.36000 0.60000 1.00000

0.32768 0.40960 0.51200 0.64000 0.80000 1.00000

1.00000 1.00000 1.00000 1.00000 1.00000 1.00000

We saw above that the coefficients of the interpolation polynomial are given by the solution a to the equation V a = y. We find those coefficients using

>a=V\Y’

Let's have a look at the interpolating polynomial. The MATLAB/Octave function polyval(a,X) takes a vector X of x values, say x1, x2, . . . , xk, and returns a vector containing the values p(x1), p(x2), . . . , p(xk), where p is the polynomial whose coefficients are in the vector a, that is,
\[
p(x) = a_1 x^{n-1} + a_2 x^{n-2} + \cdots + a_{n-1} x + a_n.
\]
So plot(X,polyval(a,X)) would be the command we want, except that with the present definition of X this would only plot the polynomial at the interpolation points. What we want is to plot the polynomial for all points, or at least for a large number. The command linspace(0,1,100) produces a vector of 100 linearly spaced points between 0 and 1, so the following commands do the job.

>XL=linspace(0,1,100);

>YL=polyval(a,XL);

>plot(XL,YL);

>hold off


The result looks pretty good:

[Figure: the six data points and the interpolating polynomial passing smoothly through them.]

The MATLAB/Octave commands for this example are in lagrange.m.

Unfortunately, things get worse when we increase the number of interpolation points. One clue that there might be trouble ahead is that even for only six points the condition number of V is quite high (try it!). Let's see what happens with 18 points. We will take the x values to be equally spaced between 0 and 1. For the y values we will start off by taking yi = sin(2πxi). We repeat the steps above.

>X=linspace(0,1,18);

>Y=sin(2*pi*X);

>plot(X,Y,’o’)

>axis([-0.1 1.1 -1.5 1.5])

>hold on

>V=vander(X);

>a=V\Y’;

>XL=linspace(0,1,500);

>YL=polyval(a,XL);

>plot(XL,YL);

The resulting picture looks okay.

[Figure: 18 equally spaced samples of sin(2πx) and the interpolating polynomial, which follows the sine curve closely.]

But look what happens if we change one of the y values just a little. We add 0.02 to the fifth y value, redo the Lagrange interpolation and plot the new values in red.


>Y(5) = Y(5)+0.02;

>plot(X(5),Y(5),’or’)

>a=V\Y’;

>YL=polyval(a,XL);

>plot(XL,YL,’r’);

>hold off

The resulting graph makes a wild excursion and even though it goes through the given points, it would not be a satisfactory interpolating function in a practical situation.

[Figure: after the small change, the new interpolating polynomial (red) oscillates wildly while still passing through the points.]

A calculation reveals that the condition number is

>cond(V)

ans = 1.8822e+14

If we try to go to 20 points equally spaced between 0 and 1, the Vandermonde matrix is so ill-conditioned that MATLAB/Octave considers it to be singular.


I.2.3 Cubic splines

In the last section we saw that Lagrange interpolation becomes impossible to use in practice if the number of points becomes large. Of course, the constraint we imposed, namely that the interpolating function be a polynomial of low degree, does not have any practical basis. It is simply mathematically convenient. Let's start again and consider how ship and airplane designers actually drew complicated curves before the days of computers. Here is a picture of a draughtsman's spline (taken from http://pages.cs.wisc.edu/~deboor/draftspline.html where you can also find a nice photo of such a spline in use).

It consists of a bendable but stiff strip held in position by a series of weights called ducks. We will try to make a mathematical model of such a device.

We begin again with points (x1, y1), (x2, y2), . . . , (xn, yn) in the plane. Again we are looking for a function f(x) that goes through all these points. This time, we want to find the function that has the same shape as a real draughtsman's spline. We will imagine that the given points are the locations of the ducks.

Our first task is to identify a large class of functions that represent possible shapes for the spline. We will write down three conditions for a function f(x) to be acceptable. Since the spline has no breaks in it, the function f(x) should be continuous. Moreover f(x) should pass through the given points.

Condition 1: f(x) is continuous and f(xi) = yi for i = 1, . . . , n.

The next condition reflects the assumption that the strip is stiff but bendable. If the strip were not stiff, say it were actually a rubber band that is just stretched between the ducks, then our resulting function would be a straight line between each duck location (xi, yi). At each duck location there would be a sharp bend in the function. In other words, even though the function itself would be continuous, the first derivative would be discontinuous at the duck locations. We will interpret the words "bendable but stiff" to mean that the first derivative of f(x) exists.


Condition 2: The first derivative f′(x) exists and is continuous everywhere, including each interior duck location xi.

In between the duck locations we will assume that f(x) is perfectly smooth and that higher derivatives behave nicely when we approach the duck locations from the right or the left. This leads to

Condition 3: For x in between the duck points xi the higher order derivatives f′′(x), f′′′(x), . . . all exist and have left and right limits as x approaches each xi.

In this condition we are allowing for the possibility that f′′(x) and higher order derivatives have a jump at the duck locations. This happens if the left and right limits are different.

The set of functions satisfying conditions 1, 2 and 3 are all the possible shapes of the spline. How do we decide which one of these shapes is the actual shape of the spline? To do this we need to invoke a bit of the physics of bendable strips. The bending energy E[f] of a strip whose shape is described by the function f is given by the integral

\[
E[f] = \int_{x_1}^{x_n} \left( f''(x) \right)^2 \, dx.
\]

The actual spline will relax into the shape that makes E[f] as small as possible. Thus, among all the functions satisfying conditions 1, 2 and 3, we want to choose the one that minimizes E[f].

This minimization problem is similar to ones considered in calculus courses, except that instead of real numbers, the variables in this problem are functions f satisfying conditions 1, 2 and 3. In calculus, the minimum is calculated by "setting the derivative to zero." A similar procedure is described in the next section. Here is the result of that calculation: Let F(x) be the function describing the shape that makes E[f] as small as possible. In other words,

• F(x) satisfies conditions 1, 2 and 3.

• If f(x) also satisfies conditions 1, 2 and 3, then E[F ] ≤ E[f ].

Then, in addition to conditions 1, 2 and 3, F (x) satisfies

Condition a: In each interval (xi, xi+1), the function F(x) is a cubic polynomial. In other words, for each interval there are coefficients Ai, Bi, Ci and Di such that F(x) = Ai x³ + Bi x² + Ci x + Di for all x between xi and xi+1. The coefficients can be different for different intervals.

Condition b: The second derivative F′′(x) is continuous.

Condition c: When x is an endpoint (either x1 or xn) then F′′(x) = 0.

As we will see, there is exactly one function satisfying conditions 1, 2, 3, a, b and c.


I.2.4 The minimization procedure

In this section we explain the minimization procedure leading to a mathematical description of the shape of a spline. In other words, we show that if among all functions f(x) satisfying conditions 1, 2 and 3, the function F(x) is the one with E[f] the smallest, then F(x) also satisfies conditions a, b and c.

The idea is to assume that we have found F(x) and then try to deduce what properties it must satisfy. There is actually a hidden assumption here: we are assuming that the minimizer F(x) exists. This is not true for every minimization problem (think of minimizing the function (x² + 1)⁻¹ for −∞ < x < ∞). However the spline problem does have a minimizer, and we will leave out the step of proving it exists.

Given the minimizer F(x) we want to wiggle it a little and consider functions of the form F(x) + εh(x), where h(x) is another function and ε is a number. We want to do this in such a way that for every ε, the function F(x) + εh(x) still satisfies conditions 1, 2 and 3. Then we will be able to compare E[F] with E[F + εh]. A little thought shows that functions of the form F(x) + εh(x) will satisfy conditions 1, 2 and 3 for every value of ε if h satisfies

Condition 1’: h(xi) = 0 for i = 1, . . . , n.

together with conditions 2 and 3 above.

Now, the minimization property of F says that for each fixed function h satisfying 1', 2 and 3, the function of ε given by E[F + εh] has a local minimum at ε = 0. From calculus we know that this implies that

\[
\left. \frac{dE[F + \epsilon h]}{d\epsilon} \right|_{\epsilon = 0} = 0. \tag{I.1}
\]

Now we will actually compute this derivative with respect to ε and see what information we can get from the fact that it is zero for every choice of h(x) satisfying conditions 1', 2 and 3. To simplify the presentation we will assume that there are only three points (x1, y1), (x2, y2) and (x3, y3). The goal of this computation is to establish that equation (I.1) can be rewritten as (I.2).

To begin, we compute

\[
\begin{aligned}
0 = \left. \frac{dE[F + \epsilon h]}{d\epsilon} \right|_{\epsilon = 0}
&= \int_{x_1}^{x_3} \left. \frac{d\bigl(F''(x) + \epsilon h''(x)\bigr)^2}{d\epsilon} \right|_{\epsilon = 0} dx \\
&= \int_{x_1}^{x_3} 2\bigl(F''(x) + \epsilon h''(x)\bigr)\, h''(x) \Bigr|_{\epsilon = 0} \, dx \\
&= 2\int_{x_1}^{x_3} F''(x)\, h''(x)\, dx \\
&= 2\int_{x_1}^{x_2} F''(x)\, h''(x)\, dx + 2\int_{x_2}^{x_3} F''(x)\, h''(x)\, dx.
\end{aligned}
\]

28

Page 29: Lecture Notes Math 307

I.2 Interpolation

We divide by 2 and integrate by parts in each integral. This gives

\[
0 = F''(x)\, h'(x) \Big|_{x = x_1}^{x = x_2} - \int_{x_1}^{x_2} F'''(x)\, h'(x)\, dx
+ F''(x)\, h'(x) \Big|_{x = x_2}^{x = x_3} - \int_{x_2}^{x_3} F'''(x)\, h'(x)\, dx.
\]

In each boundary term we have to take into account the possibility that F′′(x) is not continuous across the points xi. Thus we have to use the appropriate limit from the left or the right. So, for the first boundary term

\[
F''(x)\, h'(x) \Big|_{x = x_1}^{x = x_2} = F''(x_2-)\, h'(x_2) - F''(x_1+)\, h'(x_1).
\]

Notice that since h′(x) is continuous across each xi we need not distinguish the limits from the left and the right. Expanding and combining the boundary terms we get

\[
\begin{aligned}
0 = {}& -F''(x_1+)\, h'(x_1) + \bigl( F''(x_2-) - F''(x_2+) \bigr) h'(x_2) + F''(x_3-)\, h'(x_3) \\
& - \int_{x_1}^{x_2} F'''(x)\, h'(x)\, dx - \int_{x_2}^{x_3} F'''(x)\, h'(x)\, dx.
\end{aligned}
\]

Now we integrate by parts again. This time the boundary terms all vanish because h(xi) = 0 for every i. Thus we end up with the equation

\[
\begin{aligned}
0 = {}& -F''(x_1+)\, h'(x_1) + \bigl( F''(x_2-) - F''(x_2+) \bigr) h'(x_2) + F''(x_3-)\, h'(x_3) \\
& + \int_{x_1}^{x_2} F''''(x)\, h(x)\, dx + \int_{x_2}^{x_3} F''''(x)\, h(x)\, dx
\end{aligned} \tag{I.2}
\]

as desired.

Recall that this equation has to be true for every choice of h satisfying conditions 1', 2 and 3. For different choices of h(x) we can extract different pieces of information about the minimizer F(x).

To start, we can choose h that is zero everywhere except in the open interval (x1, x2). For all such h we then obtain 0 = ∫_{x1}^{x2} F′′′′(x)h(x) dx. This can only happen if
\[
F''''(x) = 0 \quad \text{for } x_1 < x < x_2.
\]

Thus we conclude that the fourth derivative F ′′′′(x) is zero in the interval (x1, x2).

Once we know that F′′′′(x) = 0 in the interval (x1, x2), then by integrating both sides we can conclude that F′′′(x) is constant. Integrating again, we find F′′(x) is a linear polynomial. By integrating four times, we see that F(x) is a cubic polynomial in that interval. When doing the integrals, we must not extend the domain of integration over the boundary point x2 since F′′′′(x) may not exist (let alone be zero) there.

Similarly F′′′′(x) must also vanish in the interval (x2, x3), so F(x) is a (possibly different) cubic polynomial in the interval (x2, x3).


(An aside: to understand better why the polynomials might be different in the intervals (x1, x2) and (x2, x3), consider the function g(x) (unrelated to the spline problem) given by

\[
g(x) = \begin{cases} 0 & \text{for } x_1 < x < x_2 \\ 1 & \text{for } x_2 < x < x_3. \end{cases}
\]

Then g′(x) = 0 in each interval, and an integration tells us that g is constant in each interval. However, g′(x2) does not exist, and the constants are different.)

We have established that F (x) satisfies condition a.

Now that we know that F′′′′(x) vanishes in each interval, we can return to (I.2) and write it as

\[
0 = -F''(x_1+)\, h'(x_1) + \bigl( F''(x_2-) - F''(x_2+) \bigr) h'(x_2) + F''(x_3-)\, h'(x_3).
\]

Now choose h(x) with h′(x1) = 1 and h′(x2) = h′(x3) = 0. Then the equation reads

F ′′(x1+) = 0

Similarly, choosing h(x) with h′(x3) = 1 and h′(x1) = h′(x2) = 0 we obtain

F ′′(x3−) = 0

This establishes condition c.

Finally choosing h(x) with h′(x2) = 1 and h′(x1) = h′(x3) = 0 we obtain

F ′′(x2−) − F ′′(x2+) = 0

In other words, F′′ must be continuous across the interior duck position. This shows that condition b holds, and the derivation is complete.

This calculation is easily generalized to the case where there are n duck positions x1, . . . , xn.

A reference for this material is Essentials of numerical analysis, with pocket calculator demonstrations, by Henrici.

I.2.5 The linear equations for cubic splines

Let us now turn this description into a system of linear equations. In each interval (xi, xi+1), for i = 1, . . . , n − 1, f(x) is given by a cubic polynomial pi(x) which we can write in the form

\[
p_i(x) = a_i (x - x_i)^3 + b_i (x - x_i)^2 + c_i (x - x_i) + d_i
\]

for coefficients ai, bi, ci and di to be determined. For each i = 1, . . . , n − 1 we require that pi(xi) = yi and pi(xi+1) = yi+1. Since pi(xi) = di, the first of these equations is satisfied if di = yi. So let's simply make that substitution. This leaves the n − 1 equations

\[
p_i(x_{i+1}) = a_i (x_{i+1} - x_i)^3 + b_i (x_{i+1} - x_i)^2 + c_i (x_{i+1} - x_i) + y_i = y_{i+1}.
\]


Secondly, we require continuity of the first derivative across interior xi's. This translates to p′i(xi+1) = p′i+1(xi+1) or
\[
3a_i (x_{i+1} - x_i)^2 + 2b_i (x_{i+1} - x_i) + c_i = c_{i+1}
\]

for i = 1, . . . , n − 2, giving an additional n − 2 equations. Next, we require continuity of the second derivative across interior xi's. This translates to p′′i(xi+1) = p′′i+1(xi+1) or
\[
6a_i (x_{i+1} - x_i) + 2b_i = 2b_{i+1}
\]

for i = 1, . . . , n − 2, once more giving an additional n − 2 equations. Finally, we require that p′′1(x1) = p′′n−1(xn) = 0. This yields two more equations
\[
\begin{aligned}
2b_1 &= 0 \\
6a_{n-1} (x_n - x_{n-1}) + 2b_{n-1} &= 0
\end{aligned}
\]

for a total of 3(n − 1) equations for the same number of variables.

We now specialize to the case where the distances between the points xi are equal. Let L = xi+1 − xi be the common distance. Then the equations read

\[
\begin{aligned}
a_i L^3 + b_i L^2 + c_i L &= y_{i+1} - y_i \\
3a_i L^2 + 2b_i L + c_i - c_{i+1} &= 0 \\
6a_i L + 2b_i - 2b_{i+1} &= 0
\end{aligned}
\]

for i = 1 . . . n− 2 together with

\[
\begin{aligned}
a_{n-1} L^3 + b_{n-1} L^2 + c_{n-1} L &= y_n - y_{n-1} \\
2b_1 &= 0 \\
6a_{n-1} L + 2b_{n-1} &= 0
\end{aligned}
\]

We make one more simplification. After multiplying some of the equations with suitable powers of L we can write these as equations for αi = ai L³, βi = bi L² and γi = ci L. They have a very simple block structure. For example, when n = 4 the matrix form of the equations is

\[
\begin{bmatrix}
1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
3 & 2 & 1 & 0 & 0 & -1 & 0 & 0 & 0 \\
6 & 2 & 0 & 0 & -2 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 3 & 2 & 1 & 0 & 0 & -1 \\
0 & 0 & 0 & 6 & 2 & 0 & 0 & -2 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 \\
0 & 2 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 6 & 2 & 0
\end{bmatrix}
\begin{bmatrix} \alpha_1 \\ \beta_1 \\ \gamma_1 \\ \alpha_2 \\ \beta_2 \\ \gamma_2 \\ \alpha_3 \\ \beta_3 \\ \gamma_3 \end{bmatrix}
=
\begin{bmatrix} y_2 - y_1 \\ 0 \\ 0 \\ y_3 - y_2 \\ 0 \\ 0 \\ y_4 - y_3 \\ 0 \\ 0 \end{bmatrix}
\]

Notice that the matrix in this equation does not depend on the points (xi, yi). It has a 3 × 3 block structure. If we define the 3 × 3 blocks

\[
N = \begin{bmatrix} 1 & 1 & 1 \\ 3 & 2 & 1 \\ 6 & 2 & 0 \end{bmatrix}, \quad
M = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & -2 & 0 \end{bmatrix}, \quad
0 = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \quad
T = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \quad
V = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 0 & 0 \\ 6 & 2 & 0 \end{bmatrix}
\]

then the matrix in our equation has the form

\[
S = \begin{bmatrix} N & M & 0 \\ 0 & N & M \\ T & 0 & V \end{bmatrix}.
\]

Once we have solved the equation for the coefficients αi, βi and γi, the function F(x) in the interval (xi, xi+1) is given by

\[
F(x) = p_i(x) = \alpha_i \left( \frac{x - x_i}{L} \right)^3 + \beta_i \left( \frac{x - x_i}{L} \right)^2 + \gamma_i \left( \frac{x - x_i}{L} \right) + y_i.
\]

Now let us use MATLAB/Octave to plot a cubic spline. To start, we will do an example with four interpolation points. The matrix S in the equation is defined by

>N=[1 1 1;3 2 1;6 2 0];

>M=[0 0 0;0 0 -1; 0 -2 0];

>Z=zeros(3,3);

>T=[0 0 0;0 2 0; 0 0 0];

>V=[1 1 1;0 0 0;6 2 0];

>S=[N M Z; Z N M; T Z V]

S =

1 1 1 0 0 0 0 0 0

3 2 1 0 0 -1 0 0 0


6 2 0 0 -2 0 0 0 0

0 0 0 1 1 1 0 0 0

0 0 0 3 2 1 0 0 -1

0 0 0 6 2 0 0 -2 0

0 0 0 0 0 0 1 1 1

0 2 0 0 0 0 0 0 0

0 0 0 0 0 0 6 2 0

Here we used the function zeros(n,m) which defines an n × m matrix filled with zeros.

To proceed we have to know what points we are trying to interpolate. We pick four (x, y) values and put them in vectors. Remember that we are assuming that the x values are equally spaced.

>X=[1, 1.5, 2, 2.5];

>Y=[0.5, 0.8, 0.2, 0.4];

We plot these points on a graph.

>plot(X,Y,’o’)

>hold on

Now let’s define the right side of the equation

>b=[Y(2)-Y(1),0,0,Y(3)-Y(2),0,0,Y(4)-Y(3),0,0];

and solve the equation for the coefficients.

>a=S\b’;

Now let's plot the interpolating function in the first interval. We will use 50 closely spaced points to get a smooth looking curve.

>XL = linspace(X(1),X(2),50);

Put the first set of coefficients (α1, β1, γ1, y1) into a vector

>p = [a(1) a(2) a(3) Y(1)];


Now we put the values p1(x) into the vector YL. First we define the values (x − x1)/L and put them in the vector XLL. To get the values x − x1 we want to subtract the vector with X(1) in every position from XL. The vector with X(1) in every position can be obtained by taking a vector with 1 in every position (in MATLAB/Octave this is obtained using the function ones(n,m)) and multiplying by the number X(1). Then we divide by the (constant) spacing between the xi values.

>L = X(2)-X(1);

>XLL = (XL - X(1)*ones(1,50))/L;

Now we evaluate the polynomial p1(x) and plot the resulting points.

>YL = polyval(p,XLL);

>plot(XL,YL);

To complete the plot, we repeat these steps for the intervals (x2, x3) and (x3, x4).

>XL = linspace(X(2),X(3),50);

>p = [a(4) a(5) a(6) Y(2)];

>XLL = (XL - X(2)*ones(1,50))/L;

>YL = polyval(p,XLL);

>plot(XL,YL);

>XL = linspace(X(3),X(4),50);

>p = [a(7) a(8) a(9) Y(3)];

>XLL = (XL - X(3)*ones(1,50))/L;

>YL = polyval(p,XLL);

>plot(XL,YL);

The result looks like this:


[Figure: the cubic spline through the four points, plotted over the interval from 1 to 2.5.]

I have automated the procedure above and put the result in two files splinemat.m and plotspline.m. splinemat(n) returns the 3(n−1) × 3(n−1) matrix used to compute a spline through n points, while plotspline(X,Y) plots the cubic spline going through the points in X and Y. If you put these files in your MATLAB/Octave directory you can use them like this:

>splinemat(3)

ans =

1 1 1 0 0 0

3 2 1 0 0 -1

6 2 0 0 -2 0

0 0 0 1 1 1

0 2 0 0 0 0

0 0 0 6 2 0

and

>X=[1, 1.5, 2, 2.5];

>Y=[0.5, 0.8, 0.2, 0.4];

>plotspline(X,Y)


to produce the plot above.

Let's use these functions to compare the cubic spline interpolation with the Lagrange interpolation by using the same points as we did before. Remember that we started with the points

>X=linspace(0,1,18);

>Y=sin(2*pi*X);

Let’s plot the spline interpolation of these points

>plotspline(X,Y);

Here is the result with the Lagrange interpolation added (in red). The red (Lagrange) curve covers the blue one and it's impossible to tell the curves apart.

[Figure: the spline (blue) and Lagrange (red) interpolations of sin(2πx); the two curves coincide.]

Now we move one of the points slightly, as before.

>Y(5) = Y(5)+0.02;

Again, plotting the spline in blue and the Lagrange interpolation in red, here are the results.


[Figure: after moving one point slightly, the spline (blue) stays close to the data while the Lagrange curve (red) oscillates wildly.]

This time the spline does a much better job! Let's check the condition number of the matrix for the splines. Recall that there are 18 points.

>cond(splinemat(18))

ans = 32.707

Recall the Vandermonde matrix had a condition number of 1.8822e+14. This shows that the system of equations for the splines is very much better conditioned, by 13 orders of magnitude!!

Code for splinemat.m and plotspline.m

function S=splinemat(n)
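% Assembles the 3(n-1) x 3(n-1) spline matrix S from the 3 x 3 blocks
% defined in the text (note: the block called N in the text is named L here).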

L=[1 1 1;3 2 1;6 2 0];

M=[0 0 0;0 0 -1; 0 -2 0];

Z=zeros(3,3);

T=[0 0 0;0 2 0; 0 0 0];

V=[1 1 1;0 0 0;6 2 0];

S=zeros(3*(n-1),3*(n-1));

for k=[1:n-2]


for l=[1:k-1]

S(3*k-2:3*k,3*l-2:3*l) = Z;

end

S(3*k-2:3*k,3*k-2:3*k) = L;

S(3*k-2:3*k,3*k+1:3*k+3) = M;

for l=[k+2:n-1]

S(3*k-2:3*k,3*l-2:3*l) = Z;

end

end

S(3*(n-1)-2:3*(n-1),1:3)=T;

for l=[2:n-2]

S(3*(n-1)-2:3*(n-1),3*l-2:3*l) = Z;

end

S(3*(n-1)-2:3*(n-1),3*(n-1)-2:3*(n-1))=V;

end

function plotspline(X,Y)
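% Solves for the spline coefficients for the points in X and Y (with the
% X values assumed equally spaced) and plots the resulting cubic spline.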

n=length(X);

L=X(2)-X(1);

S=splinemat(n);

b=zeros(1,3*(n-1));

for k=[1:n-1]

b(3*k-2)=Y(k+1)-Y(k);

b(3*k-1)=0;

b(3*k)=0;

end

a=S\b’;

npoints=50;

XL=[];

YL=[];

for k=[1:n-1]

XL = [XL linspace(X(k),X(k+1),npoints)];

p = [a(3*k-2),a(3*k-1),a(3*k),Y(k)];

XLL = (linspace(X(k),X(k+1),npoints) - X(k)*ones(1,npoints))/L;

YL = [YL polyval(p,XLL)];

end

plot(X,Y,’o’)


hold on

plot(XL,YL)

hold off

I.2.6 Summary of MATLAB/Octave commands used in this section

How to access elements of a vector

a(i) returns the i-th element of the vector a

How to create a vector with linearly spaced elements

linspace(x1,x2,n) generates n points between the values x1 and x2.

How to create a matrix by concatenating other matrices

C=[A B] takes two matrices A and B and creates a new matrix C by concatenating A and B horizontally

Other specialized matrix functions

zeros(n,m) creates an n-by-m matrix filled with zeros

ones(n,m) creates an n-by-m matrix filled with ones

vander(X) creates the Vandermonde matrix corresponding to the points in the vector X. Note that the columns of the Vandermonde matrix are powers of the vector X.

Other useful functions and commands

polyval(a,X) takes a vector X of x values and returns a vector containing the values of a polynomial p evaluated at the x values. The coefficients of the polynomial p (in descending powers) are the values in the vector a.

sin(X) takes a vector X of values x and returns a vector containing the values of the function sin x

plot(X,Y) plots vector Y versus vector X. Points are joined by a solid line. To change line types (solid, dashed, dotted, etc.) or plot symbols (point, circle, star, etc.), include an additional argument. For example, plot(X,Y,'o') plots the points as little circles.


I.3 Finite difference approximations

Prerequisites and Learning Goals

From your work in previous courses, you should be able to

• explain what is meant by a boundary value problem.

After completing this section, you should be able to

• Take a second order linear boundary value problem and write down the corresponding finite difference equation.

• Use the finite difference equation and MATLAB/Octave to compute an approximate solution.

• Use the MATLAB/Octave command diag.

• Describe the action of . (period) before a MATLAB/Octave operator.

I.3.1 Introduction and example

One of the most important applications of linear algebra is the approximate solution of differential equations. In a differential equation we are trying to solve for an unknown function. The basic idea is to turn a differential equation into a system of N × N linear equations. As N becomes large, the vector solving the system of linear equations becomes a better and better approximation to the function solving the differential equation.

In this section we will learn how to use linear algebra to find approximate solutions to a boundary value problem of the form
\[
f''(x) + q(x) f(x) = r(x) \quad \text{for } 0 \le x \le 1
\]

subject to boundary conditions

f(0) = A, f(1) = B.

This is a differential equation where the unknown quantity to be found is a function f(x). The functions q(x) and r(x) are given (known) functions.

As differential equations go, this is a very simple one. For one thing, it is an ordinary differential equation (ODE), because it only involves one independent variable x. But the finite difference methods we will introduce can also be applied to partial differential equations (PDE).

It can be useful to have a picture in your head when thinking about an equation. Here isa situation where an equation like the one we are studying arises. Suppose we want to findthe shape of a stretched hanging cable. The cable is suspended above the points x = 0 andx = 1 at heights of A and B respectively and hangs above the interval 0 ≤ x ≤ 1. Our goalis to find the height f(x) of the cable above the ground at every point x between 0 and 1.


[Figure: the hanging cable above the interval 0 ≤ x ≤ 1, suspended at heights A and B at the endpoints.]

The loading of the cable is described by a function r(x) that takes into account both the weight of the cable and any additional load. Assume that this is a known function. The height function f(x) is the function that minimizes the sum of the stretching energy and the gravitational potential energy given by

E[f] = ∫_0^1 [ (f′(x))² + 2r(x)f(x) ] dx

subject to the condition that f(0) = A and f(1) = B. An argument similar (but easier) to the one we did for splines shows that the minimizer satisfies the differential equation

f ′′(x) = r(x).

So we end up with the special case of our original equation where q(x) = 0. Actually, this special case can be solved by simply integrating twice and adjusting the constants of integration to ensure f(0) = A and f(1) = B. For example, when r(x) = r is constant and A = B = 1, the solution is f(x) = 1 − rx/2 + rx²/2. We can use this exact solution to compare against the approximate solution that we will compute.

I.3.2 Discretization

In the finite difference approach to solving differential equations approximately, we want to approximate a function by a vector containing a finite number of sample values. Pick equally spaced points x_k = k/N, k = 0, . . . , N between 0 and 1. We will represent a function f(x) by its values f_k = f(x_k) at these points. Let

F = [f_0, f_1, . . . , f_N]^T.


[Figure: a function f(x) sampled at the points x_0, . . . , x_8, with values f_0, . . . , f_8.]

At this point we throw away all the other information about the function, keeping only the values at the sampled points.

[Figure: only the sampled values f_0, . . . , f_8 are kept.]

If this is all we have to work with, what should we use as an approximation to f′(x)? It seems reasonable to use the slopes of the line segments joining our sampled points.

[Figure: the line segments joining successive sampled points; their slopes approximate f′(x).]

Notice, though, that there is one slope for every interval (x_i, x_{i+1}), so the vector containing the slopes has one fewer entry than the vector F. The formula for the slope in the interval


(x_i, x_{i+1}) is (f_{i+1} − f_i)/∆x, where the distance ∆x = x_{i+1} − x_i (in this case ∆x = 1/N). Thus the vector containing the slopes is

F′ = (∆x)⁻¹ [f_1 − f_0, f_2 − f_1, f_3 − f_2, . . . , f_N − f_{N−1}]^T = (∆x)⁻¹ D_N F,

where

D_N = [−1   1   0   0  · · ·   0   0]
      [ 0  −1   1   0  · · ·   0   0]
      [ 0   0  −1   1  · · ·   0   0]
      [ ⋮                  ⋱        ]
      [ 0   0   0   0  · · ·  −1   1]

is the N × (N + 1) finite difference matrix in the formula above. The vector F′ is our approximation to the first derivative function f′(x).
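As a quick illustration, here is a minimal sketch of how one might build D_N in MATLAB/Octave and check that (∆x)⁻¹ D_N F approximates f′ for a known function (the variable names are just for illustration):

% Build the N x (N+1) difference matrix D and test it on f(x) = sin(x).
N = 50; dx = 1/N;
D = [zeros(N,1) eye(N)] - [eye(N) zeros(N,1)];  % -1 on the diagonal, +1 to its right
X = linspace(0,1,N+1)';
F = sin(X);                      % sampled values of f
Fp = D*F/dx;                     % one approximate slope per interval
max(abs(Fp - cos(X(1:N)+dx/2)))  % compare with f' at interval midpoints: tiny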

To approximate the second derivative f′′(x), we repeat this process to define the vector F′′. There will be one entry in this vector for each adjacent pair of slopes, that is, each adjacent pair of entries of F′. These are naturally labelled by the interior points x_1, x_2, . . . , x_{N−1}. Thus we obtain

F′′ = (∆x)⁻² D_{N−1} D_N F = (∆x)⁻² [1  −2   1   0  · · ·  0   0   0] [f_0]
                                    [0   1  −2   1  · · ·  0   0   0] [f_1]
                                    [0   0   1  −2  · · ·  0   0   0] [f_2]
                                    [⋮                 ⋱            ] [ ⋮ ]
                                    [0   0   0   0  · · ·  1  −2   1] [f_N]

Let r_k = r(x_k) be the sampled values of the load function r(x) and define the vector approximation for r at the interior points

r = [r_1, . . . , r_{N−1}]^T.

The reason we only define this vector at interior points is that that is where F′′ is defined. Now we can write down the finite difference approximation to f′′(x) = r(x) as

(∆x)⁻² D_{N−1} D_N F = r, or equivalently D_{N−1} D_N F = (∆x)² r

This is a system of N − 1 equations in N + 1 unknowns. To get a unique solution, we need two more equations. That is where the boundary conditions come in! We have two boundary conditions, which in this case can simply be written as f_0 = A and f_N = B. Combining these


with the N − 1 equations for the interior points, we may rewrite the system of equations as

[1   0   0   0  · · ·  0   0   0]       [      A        ]
[1  −2   1   0  · · ·  0   0   0]       [  (∆x)² r_1    ]
[0   1  −2   1  · · ·  0   0   0]       [  (∆x)² r_2    ]
[⋮                 ⋱            ] F  =  [      ⋮        ]
[0   0   0   0  · · ·  1  −2   1]       [ (∆x)² r_{N−1} ]
[0   0   0   0  · · ·  0   0   1]       [      B        ]

Note that it is possible to incorporate other types of boundary conditions by simply changing the first and last equations.

Let's define L to be the (N + 1) × (N + 1) matrix of coefficients for this equation, so that the equation has the form

LF = b.

The first thing to do is to verify that L is invertible, so that we know that there is a unique solution to the equation. It is not too difficult to compute the determinant if you recall that the elementary row operations that add a multiple of one row to another do not change the value of the determinant. Using only this type of elementary row operation, we can reduce L to an upper triangular matrix whose diagonal entries are 1, −2, −3/2, −4/3, −5/4, . . . , −N/(N − 1), 1. The determinant is the product of these entries, and this equals ±N. Since this value is not zero, the matrix L is invertible.

It is worthwhile pointing out that a change in boundary conditions (for example, prescribing the values of the derivative f′(0) and f′(1) rather than f(0) and f(1)) results in a different matrix L that may fail to be invertible.

We should also ask about the condition number of L to determine how large the relative error of the solution can be. We will compute this using MATLAB/Octave below.

Now let's use MATLAB/Octave to solve this equation. We will start with the test case where r(x) = 1 and A = B = 1. In this case we know that the exact solution is f(x) = 1 − x/2 + x²/2.

We will work with N = 50. Notice that, except for the first and last rows, L has a constant value of −2 on the diagonal, and a constant value of 1 on the off-diagonals immediately above and below.

Before proceeding, we introduce the MATLAB/Octave command diag. For any vector D, diag(D) is a diagonal matrix with the entries of D on the diagonal. So for example

>D=[1 2 3 4 5];

>diag(D)


ans =

1 0 0 0 0

0 2 0 0 0

0 0 3 0 0

0 0 0 4 0

0 0 0 0 5

An optional second argument offsets the diagonal. So, for example

>D=[1 2 3 4];

>diag(D,1)

ans =

0 1 0 0 0

0 0 2 0 0

0 0 0 3 0

0 0 0 0 4

0 0 0 0 0

>diag(D,-1)

ans =

0 0 0 0 0

1 0 0 0 0

0 2 0 0 0

0 0 3 0 0

0 0 0 4 0

Now returning to our matrix L we can define it as

>N=50;

>L=diag(-2*ones(1,N+1)) + diag(ones(1,N),1) + diag(ones(1,N),-1);

>L(1,1) = 1;

>L(1,2) = 0;

>L(N+1,N+1) = 1;

>L(N+1,N) = 0;

The condition number of L for N = 50 is


>cond(L)

ans = 1012.7

We will denote the right side of the equation by b. To start, we will define b to have entries (∆x)² r(x_i), and then adjust the first and last entries to account for the boundary values. Recall that r(x) is the constant function 1, so its sampled values are all 1 too.

>dx = 1/N;

>b=ones(N+1,1)*dx^2;

>A=1; B=1;

>b(1) = A;

>b(N+1) = B;

Now we solve the equation for F.

>F=L\b;

The x values are N + 1 equally spaced points between 0 and 1,

>X=linspace(0,1,N+1);

Now we plot the result.

>plot(X,F)

[Plot: the computed solution, equal to 1 at the endpoints and dipping to about 0.88 near x = 0.5.]


Let’s superimpose the exact solution in red.

>hold on

>plot(X,ones(1,N+1)-X/2+X.^2/2,'r')

(The . before an operator tells MATLAB/Octave to apply that operator element by element, so X.^2 returns an array with each element the corresponding element of X squared.)

[Plot: the approximate and exact solutions superimposed; the two curves coincide.]

The two curves are indistinguishable.

What happens if we increase the load at a single point? Recall that we have set the loading function r(x) to be 1 everywhere. Let's increase it at just one point. Adding, say, 5 to one of the values of r is the same as adding 5(∆x)² to the right side b. So the following commands do the job. We are changing b_11, which corresponds to changing r(x) at x = 0.2.

>b(11) = b(11) + 5*dx^2;

>F=L\b;

>hold on

>plot(X,F);

Before looking at the plot, let's do this one more time, this time making the cable really heavy at the same point.


>b(11) = b(11) + 50*dx^2;

>F=L\b;

>hold on

>plot(X,F);

Here is the resulting plot.

[Plot: the three solutions; the added load at x = 0.2 pulls the cable down near that point, more so for the heavier load.]

So far we have only considered the case of our equation f′′(x) + q(x)f(x) = r(x) where q(x) = 0. What happens when we add the term containing q? We must sample the function q(x) at the interior points and add the corresponding vector. Since we multiplied the equations for the interior points by (∆x)², we must do the same to these terms. Thus we must add the term

(∆x)² [       0       ]         [0   0    0    0  · · ·   0        0]
      [   q_1 f_1     ]         [0  q_1   0    0  · · ·   0        0]
      [   q_2 f_2     ] = (∆x)² [0   0   q_2   0  · · ·   0        0] F.
      [       ⋮       ]         [⋮                 ⋱                ]
      [q_{N−1} f_{N−1}]         [0   0    0    0  · · ·  q_{N−1}   0]
      [       0       ]         [0   0    0    0  · · ·   0        0]

In other words, we replace the matrix L in our equation with L + (∆x)²Q, where Q is the (N + 1) × (N + 1) diagonal matrix with the interior sampled points of q(x) on the diagonal (and zeros in the first and last diagonal entries).


I'll leave it to a homework problem to incorporate this change in a MATLAB/Octave calculation. One word of caution: the matrix L by itself is always invertible (with reasonable condition number). However L + (∆x)²Q may fail to be invertible. This reflects the fact that the original differential equation may fail to have a solution for some choices of q(x) and r(x).

I.3.3 Another example: the heat equation

In the previous example involving the loaded cable there was only one independent variable, x, and as a result we ended up with an ordinary differential equation which determined the shape. In this example we will have two independent variables: time t and one spatial dimension x. The quantities of interest can now vary in both space and time. Thus we will end up with a partial differential equation which will describe how the physical system behaves.

Imagine a long thin rod (a one-dimensional rod) where the only important spatial direction is the x direction. Given some initial temperature profile along the rod and boundary conditions at the ends of the rod, we would like to determine how the temperature, T = T(x, t), along the rod varies over time.

Consider a small section of the rod between x and x + ∆x. The rate of change of internal energy, Q(x, t), in this section is proportional to the heat flux, q(x, t), into and out of the section. That is

∂Q/∂t (x, t) = −q(x + ∆x, t) + q(x, t).

Now the internal energy is related to the temperature by Q(x, t) = ρCp ∆x T(x, t), where ρ and Cp are the density and specific heat of the rod (assumed here to be constant). Also, from Fourier's law, the heat flux through a point in the rod is proportional to the (negative) temperature gradient at the point, i.e., q(x, t) = −K0 ∂T(x, t)/∂x, where K0 is a constant (the thermal conductivity); this basically says that heat "flows" from hotter to colder regions. Substituting these two relations into the above energy equation we get

ρCp ∆x ∂T/∂t (x, t) = K0 ( ∂T/∂x (x + ∆x, t) − ∂T/∂x (x, t) )

⇒  ∂T/∂t (x, t) = (K0/(ρCp)) · [ ∂T/∂x (x + ∆x, t) − ∂T/∂x (x, t) ] / ∆x.

Taking the limit as ∆x goes to zero we obtain

∂T/∂t (x, t) = k ∂²T/∂x² (x, t),

where k = K0/(ρCp) is a constant. This partial differential equation is known as the heat equation and describes how the temperature along a one-dimensional rod evolves.


We can also include other effects. If there is a temperature source or sink, S(x, t), then this will contribute to the local change in temperature:

∂T/∂t (x, t) = k ∂²T/∂x² (x, t) + S(x, t).

And if we also allow the rod to cool down along its length (because, say, the surrounding air is a different temperature than the rod), then the differential equation becomes

∂T/∂t (x, t) = k ∂²T/∂x² (x, t) − H T(x, t) + S(x, t),

where H is a constant (here we have assumed that the surrounding air temperature is zero).

In certain cases we can think about what the steady state of the rod will be. After a sufficiently long time (so that the heat has had plenty of time to "move around" and things have heated up or cooled down), the temperature will cease to change in time. Once this steady state is reached, things become independent of time, and the differential equation becomes

0 = k ∂²T/∂x² (x) − H T(x) + S(x),

which is of the same form as the ordinary differential equation that we considered at the start of this section.
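As a minimal sketch (with made-up values k = 1, H = 1, S(x) = 1, and boundary temperatures equal to 0), this steady state equation can be solved by the same finite difference approach we used for the cable:

N = 50; dx = 1/N;
k = 1; H = 1;                     % assumed material constants
M = k*(diag(-2*ones(1,N+1)) + diag(ones(1,N),1) + diag(ones(1,N),-1));
M = M - dx^2*H*eye(N+1);          % the -H T term, scaled by dx^2
M(1,:) = 0;   M(1,1) = 1;         % boundary condition T(0) = 0
M(N+1,:) = 0; M(N+1,N+1) = 1;     % boundary condition T(1) = 0
b = -dx^2*ones(N+1,1);            % source S(x) = 1 on the interior rows
b(1) = 0; b(N+1) = 0;             % boundary values
T = M\b;
plot(linspace(0,1,N+1), T)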


Chapter II

Subspaces, Bases and Dimension


II.1 Subspaces, basis and dimension

Prerequisites and Learning Goals

From your work in previous courses, you should be able to

• Write down a vector as a linear combination of a set of vectors.

• Define linear independence for a collection of vectors.

• Define a basis for a vector subspace.

After completing this section, you should be able to

• Know the definitions of vector addition and scalar multiplication for vector spaces of functions

• Decide whether a given collection of vectors forms a subspace.

• Recast the dependence or independence of a collection of vectors in R^n or C^n as a statement about the existence of solutions to a system of linear equations.

• Decide if a collection of vectors is dependent or independent.

• Define the span of a collection of vectors; show that given a set of vectors v1, . . . , vk, the span span(v1, . . . , vk) is a subspace.

• Describe the significance of the two parts (independence and span) of the definition of a basis.

• Check if a collection of vectors is a basis.

• Show that any basis for a subspace has the same number of elements.

• Show that any set of k linearly independent vectors v1, . . . , vk in a k dimensional subspace S is a basis of S.

• Define the dimension of a subspace.


II.1.1 Vector spaces and subspaces

In your previous linear algebra course, and for most of this course, vectors are n-tuples [x1, . . . , xn]^T of numbers, either real or complex. The sets of all n-tuples, denoted R^n or C^n, are examples of vector spaces.

In more advanced applications vector spaces of functions often occur. For example, an electrical signal can be thought of as a real valued function x(t) of time t. If two signals x(t) and y(t) are superimposed, the resulting signal is the sum that has the value x(t) + y(t) at time t. This motivates the definition of vector addition for functions: the vector sum of the functions x and y is the new function x + y defined by (x + y)(t) = x(t) + y(t). Similarly, if s is a scalar, the scalar multiple sx is defined by (sx)(t) = sx(t). If you think of t as being a continuous index, these definitions mirror the componentwise definitions of vector addition and scalar multiplication for vectors in R^n or C^n.

It is possible to give an abstract definition of a vector space as any collection of objects (the vectors) that can be added and multiplied by scalars, provided the addition and scalar multiplication operations obey a set of rules. We won't follow this abstract approach in this course.

A collection of vectors V contained in a given vector space is called a subspace if vector addition and scalar multiplication of vectors in V stay in V. In other words, for any vectors v1, v2 ∈ V and any scalars c1 and c2, the vector c1v1 + c2v2 lies in V too.

In three dimensional space R^3, examples of subspaces are lines and planes through the origin. If we add or scalar multiply two vectors lying on the same line (or plane), the resulting vector remains on the same line (or plane). Additional examples of subspaces are the trivial subspace, containing the single vector 0, as well as the whole space itself.

Here is another example of a subspace. The set of n × n matrices can be thought of as an n² dimensional vector space. Within this vector space, the set of symmetric matrices (satisfying A^T = A) is a subspace. To see this, suppose A1 and A2 are symmetric. Then, using the linearity property of the transpose, we see that

(c1A1 + c2A2)^T = c1A1^T + c2A2^T = c1A1 + c2A2

which shows that c1A1 + c2A2 is symmetric too.

We have encountered subspaces of functions in the section on interpolation. In Lagrange interpolation we considered the set of all polynomials of degree at most m. This is a subspace of the space of functions, since adding two polynomials of degree at most m results in another polynomial of degree at most m, and scalar multiplication of a polynomial of degree at most m yields another polynomial of degree at most m.

Another example of a subspace of functions is the set of all functions y(t) that satisfy the differential equation y′′(t) + y(t) = 0. To check that this is a subspace, we must verify that if y1(t) and y2(t) both solve the differential equation, then so does c1y1(t) + c2y2(t) for any choice of scalars c1 and c2. This follows from the linearity of differentiation: (c1y1 + c2y2)′′ + (c1y1 + c2y2) = c1(y1′′ + y1) + c2(y2′′ + y2) = 0.


II.1.2 Linear dependence and independence

To begin we review the definition of linear dependence and independence. A linear combination of vectors v1, . . . , vk is a vector of the form

∑_{i=1}^{k} ci vi = c1v1 + c2v2 + · · · + ckvk

for some choice of numbers c1, c2, . . . , ck.

The vectors v1, . . . , vk are called linearly dependent if there exist numbers c1, c2, . . . , ck that are not all zero, such that the linear combination ∑_{i=1}^{k} ci vi = 0.

On the other hand, the vectors are called linearly independent if the only linear combination of the vectors equaling zero has every ci = 0. In other words

∑_{i=1}^{k} ci vi = 0 implies c1 = c2 = · · · = ck = 0

For example, the vectors

[1]   [1]   [7]
[1] , [0] , [1]
[1]   [1]   [7]

are linearly dependent because

  [1]     [1]     [7]   [0]
1 [1] + 6 [0] − 1 [1] = [0]
  [1]     [1]     [7]   [0]

If v1, . . . , vk are linearly dependent, then at least one of the vi's can be written as a linear combination of the others. To see this suppose that

c1v1 + c2v2 + · · · + ckvk = 0

with not all of the ci's zero. Then we can solve for any of the vi's whose coefficient ci is not zero. For instance, if c1 is not zero we can write

v1 = −(c2/c1)v2 − (c3/c1)v3 − · · · − (ck/c1)vk

This means any linear combination we can make with the vectors v1, . . . , vk can be achieved without using v1, since we can simply replace each occurrence of v1 with the expression on the right.

Sometimes it helps to have a geometrical picture. In three dimensional space R^3, three vectors are linearly dependent if they lie in the same plane.

The columns of a matrix in echelon form are linearly independent if and only if every column is a pivot column. We illustrate this with two examples.


The matrix

[1  ∗  ∗]
[0  2  ∗]
[0  0  3]

is an example of a matrix in echelon form where each column is a pivot column. Here ∗ denotes an arbitrary entry.

To see that the columns are linearly independent suppose that

   [1]      [∗]      [∗]   [0]
c1 [0] + c2 [2] + c3 [∗] = [0]
   [0]      [0]      [3]   [0]

Then, equating the bottom entries we find 3c3 = 0, so c3 = 0. But once we know c3 = 0 the equation reads

   [1]      [∗]   [0]
c1 [0] + c2 [2] = [0]
   [0]      [0]   [0]

which implies that c2 = 0 too, and similarly c1 = 0.

Similarly, for a matrix in echelon form (even if, as in the example below, it is not completely reduced), the pivot columns are linearly independent. For example the first, second and fifth columns in the matrix

[1  1  1  1  0]
[0  1  2  5  5]
[0  0  0  0  1]

are independent. However, the non-pivot columns can be written as linear combinations of the pivot columns. For example

[1]     [1]     [1]
[2] = − [0] + 2 [1]
[0]     [0]     [0]

so if there are non-pivot columns, then the set of all columns is linearly dependent. This is particularly easy to see for a matrix in reduced row echelon form, like

[1  0  1  1  0]
[0  1  2  5  0]
[0  0  0  0  1]

In this case the pivot columns are standard basis vectors (see below), which are obviously independent. It is easy to express the other columns as linear combinations of these.

Recall that for a matrix U in echelon form, the presence or absence of non-pivot columns determines whether the homogeneous equation Ux = 0 has any non-zero solutions. By the discussion above, we can say that the columns of a matrix U in echelon form are linearly dependent exactly when the homogeneous equation Ux = 0 has a non-zero solution.

In fact, this is true for any matrix. Suppose that the vectors v1, . . . , vk are the columns of a matrix A, so that

A = [v1 | v2 | · · · | vk].


If we put the coefficients c1, c2, . . . , ck into a vector

c = [c1, c2, . . . , ck]^T

then

Ac = c1v1 + c2v2 + · · · + ckvk

is the linear combination of the columns v1, . . . , vk with coefficients ci.

Now it follows directly from the definition of linear dependence that the columns of A are linearly dependent if there is a non-zero solution c to the homogeneous equation

Ac = 0

On the other hand, if the only solution to the homogeneous equation is c = 0, then the columns v1, . . . , vk are linearly independent.

To compute whether a given collection of vectors is dependent or independent, we can place them in the columns of a matrix A and reduce to echelon form. If the echelon form has only pivot columns, then the vectors are independent. On the other hand, if the echelon form has some non-pivot columns, then the equation Ac = 0 has some non-zero solutions and so the vectors are dependent.

Let’s try this with the vectors in the example above in MATLAB/Octave.

>V1=[1 1 1]';

>V2=[1 0 1]';

>V3=[7 1 7]';

>A=[V1 V2 V3]

A =

1 1 7

1 0 1

1 1 7

>rref(A)

ans =

1 0 1

0 1 6

0 0 0

Since the third column is a non-pivot column, the vectors are linearly dependent.
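As a quick check, the rref above says the third column equals the first column plus six times the second, so the coefficients 1, 6, −1 from the earlier example should give zero:

>c=[1; 6; -1];
>A*c
ans =

   0
   0
   0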


II.1.3 Span

Given a collection of vectors v1, . . . , vk we may form a subspace of all possible linear combinations. This subspace is called span(v1, . . . , vk), or the space spanned by the vi's. It is a subspace because if we start with any two elements of span(v1, . . . , vk), say c1v1 + c2v2 + · · · + ckvk and d1v1 + d2v2 + · · · + dkvk, then a linear combination of these linear combinations is again a linear combination, since

s1(c1v1 + c2v2 + · · · + ckvk) + s2(d1v1 + d2v2 + · · · + dkvk) = (s1c1 + s2d1)v1 + (s1c2 + s2d2)v2 + · · · + (s1ck + s2dk)vk

For example the span of the three vectors

[1]   [0]   [0]
[0] , [1] , [0]
[0]   [0]   [1]

is the whole three dimensional space, because every vector is a linear combination of these. The span of the four vectors

[1]   [0]   [0]   [1]
[0] , [1] , [0] , [1]
[0]   [0]   [1]   [1]

is the same.

II.1.4 Basis

A collection of vectors v1, . . . , vk contained in a subspace V is called a basis for that subspace if

1. span(v1, . . . , vk) = V, and

2. v1, . . . , vk are linearly independent.

Condition (1) says that any vector in V can be written as a linear combination of v1, . . . , vk. Condition (2) says that there is exactly one way of doing this. Here is the argument. Suppose there are two ways of writing the same vector v ∈ V as a linear combination:

v = c1v1 + c2v2 + · · · + ckvk

v = d1v1 + d2v2 + · · · + dkvk

Then by subtracting these equations, we obtain

0 = (c1 − d1)v1 + (c2 − d2)v2 + · · · + (ck − dk)vk

Linear independence now says that every coefficient in this sum must be zero. This implies c1 = d1, c2 = d2, . . . , ck = dk.


Example: R^n has the standard basis e1, e2, . . . , en, where

e1 = [1, 0, 0, . . . , 0]^T,   e2 = [0, 1, 0, . . . , 0]^T,   · · ·

Another basis for R^2 is

[1]   [ 1]
[1] , [−1]

To see this, notice that saying that any vector x can be written in a unique way as

   [1]      [ 1]
c1 [1] + c2 [−1]

is the same as saying that the equation

[1   1] [c1]
[1  −1] [c2] = x

always has a unique solution. This is true, because the matrix is invertible.

A basis for the vector space P2 of polynomials of degree at most two is given by {1, x, x²}. These polynomials clearly span P2, since every polynomial p ∈ P2 can be written as a linear combination p(x) = c0·1 + c1x + c2x². To show independence, suppose that c0·1 + c1x + c2x² is the zero polynomial. This means that c0·1 + c1x + c2x² = 0 for every value of x. Taking the first and second derivatives of this equation yields that c1 + 2c2x = 0 and 2c2 = 0 for every value of x. Substituting x = 0 into each of these equations, we find c0 = c1 = c2 = 0.

Notice that if we represent the polynomial p(x) = c0·1 + c1x + c2x² ∈ P2 by the vector of coefficients [c0, c1, c2]^T ∈ R^3, then the vector space operations in P2 are mirrored perfectly in R^3. In other words, adding or scalar multiplying polynomials in P2 is the same as adding or scalar multiplying the corresponding vectors of coefficients in R^3.

This sort of correspondence can be set up whenever we have a basis v1, v2, . . . , vk for a vector space V. In this case every vector v has a unique representation c1v1 + c2v2 + · · · + ckvk, and we can represent the vector v ∈ V by the vector [c1, c2, . . . , ck]^T ∈ R^k (or C^k). In some sense this says that we can always think of finite dimensional vector spaces as being copies of R^n or C^n. The only catch is that the correspondence that gets set up between vectors in V and vectors in R^n or C^n depends on the choice of basis.

It is intuitively clear that, say, a plane in three dimensions will always have a basis of two vectors. Here is an argument that shows that any two bases for a subspace S of R^k or C^k will always have the same number of elements. Let v1, . . . , vn and w1, . . . , wm be two bases


for a subspace S. Let's try to show that n must be the same as m. Since the vi's span S we can write each wj as a linear combination of the vi's. We write

wj = ∑_{i=1}^{n} ai,j vi

for each j = 1, . . . , m. Let's put all the coefficients into an n × m matrix A = [ai,j]. If we form the k × m matrix W = [w1|w2| · · · |wm] and the k × n matrix V = [v1|v2| · · · |vn], then the equation above can be rewritten

W = V A

To understand this construction consider the two bases

[ 1]   [ 1]          [ 4]   [ 1]
[ 0] , [−1]   and    [ 2] , [−2]
[−1]   [ 0]          [−6]   [ 1]

for a subspace in R^3 (in fact this subspace is the plane through the origin with normal vector [1, 1, 1]^T). Then we may write

[ 4]     [ 1]     [ 1]
[ 2] = 6 [ 0] − 2 [−1]
[−6]     [−1]     [ 0]

[ 1]     [ 1]     [ 1]
[−2] = − [ 0] + 2 [−1]
[ 1]     [−1]     [ 0]

and the equation W = V A for this example reads

[ 4   1]   [ 1   1]
[ 2  −2] = [ 0  −1] [ 6  −1]
[−6   1]   [−1   0] [−2   2]

Returning now to the general case, suppose that m > n. Then A has more columns than rows, so its echelon form must have some non-pivot columns, which implies that there must be some non-zero solution to Ac = 0. Let c ≠ 0 be such a solution. Then

Wc = V Ac = 0

But this is impossible, because the columns of W, being a basis, are linearly independent. So it can't be true that m > n. Reversing the roles of V and W we find that n > m is impossible too. So it must be that m = n.

We have shown that any basis for a subspace S has the same number of elements. Thus it makes sense to define the dimension of a subspace S to be the number of elements in any


basis for S.

Here is one last fact about bases: any set of k linearly independent vectors {v1, . . . , vk} in a k dimensional subspace S automatically spans S and is therefore a basis. To see this (in the case that S is a subspace of R^n or C^n) we let {w1, . . . , wk} be a basis for S, which also will have k elements. Form V = [v1| · · · |vk] and W = [w1| · · · |wk]. Then the construction above gives V = WA for a k × k matrix A. The matrix A must be invertible. Otherwise there would be a non-zero solution c to Ac = 0. This would imply Vc = WAc = 0, contradicting the independence of the columns of V. Thus we can write W = V A⁻¹, which shows that every wj is a linear combination of the vi's.

This shows that the vi's must span S, because every vector in S is a linear combination of the basis vectors wj, which in turn are linear combinations of the vi's.

As an example of this, consider again the space P2 of polynomials of degree at most 2. We claim that the polynomials {1, (x − a), (x − a)²} (for any constant a) form a basis. We already know that the dimension of this space is 3, so we only need to show that these three polynomials are independent. The argument for that is almost the same as before.

II.2 The four fundamental subspaces for a matrix

Prerequisites and Learning Goals

From your work in previous courses, you should be able to

• Recognize and use the property of transposes for which (AB)^T = B^T A^T for any matrices A and B.

• Define the inner (dot) product of two vectors, and its properties (symmetry, linearity), and explain its geometrical meaning.

• Use the inner product to decide if two vectors are orthogonal, and to compute the angle between two vectors.

• State the Cauchy-Schwarz inequality and know for which vectors the inequality is an equality.

After completing this section, you should be able to

• Define the four fundamental subspaces N(A), R(A), N(A^T), and R(A^T), associated to a matrix A and its transpose A^T.

• Express the Gaussian elimination process performed to reduce a matrix A to its reduced row echelon form matrix U as a matrix factorization, A = EU, using elementary matrices, and be able to perform the steps using MATLAB/Octave.

• Compute bases for each of the four fundamental subspaces N(A), R(A), N(A^T) and R(A^T) of a matrix A.


• Be able to compute the rank of a matrix.

• State the formulas for the dimension of each of the four subspaces and be able to explain why they are true.

• Explain what it means for two subspaces to be orthogonal (V ⊥ W) and for one subspace to be the orthogonal complement of another (V = W⊥).

• State which of the fundamental subspaces are orthogonal to each other and explain why, verify the orthogonality relations in examples, and use the orthogonality relation for R(A) to test whether the equation Ax = b has a solution.

• Use MATLAB/Octave to compute the inner product of two vectors and the angle between them.

• Be familiar with the MATLAB/Octave function eye().

II.2.1 Nullspace N(A) and Range R(A)

There are two important subspaces associated to any matrix. Let A be an n × m matrix. If x is m dimensional, then Ax makes sense and is a vector in n dimensional space.

The first subspace associated to A is the nullspace (or kernel) of A, denoted N(A) (or Ker(A)). It is defined as all vectors x solving the homogeneous equation for A, that is

N(A) = {x : Ax = 0}

This is a subspace because if Ax1 = 0 and Ax2 = 0 then

A(c1x1 + c2x2) = c1Ax1 + c2Ax2 = 0 + 0 = 0.

The nullspace is a subspace of m dimensional space R^m.

The second subspace is the range (or column space) of A, denoted R(A) (or C(A)). It is defined as all vectors of the form Ax for some x. From our discussion above, we see that R(A) is the span (or set of all possible linear combinations) of its columns. This explains the name "column space". The range is a subspace of n dimensional space R^n.

The four fundamental subspaces for a matrix are the nullspace N(A) and range R(A) for A, together with the nullspace N(A^T) and range R(A^T) for the transpose A^T.


II.2.2 Finding basis and dimension of N(A)

Example: Let

A = [1  3   3  10]
    [2  6  −1  −1]
    [1  3   1   4]

To calculate a basis for the nullspace N(A) and determine its dimension, we need to find the solutions to Ax = 0. To do this we first reduce A to reduced row echelon form U and solve Ux = 0 instead, since this has the same solutions as the original equation.

>A=[1 3 3 10;2 6 -1 -1;1 3 1 4];

>rref(A)

ans =

1 3 0 1

0 0 1 3

0 0 0 0

This means that x = [x1, x2, x3, x4]^T is in N(A) if

[1  3  0  1] [x1]   [0]
[0  0  1  3] [x2] = [0]
[0  0  0  0] [x3]   [0]
             [x4]

We now divide the variables into basic variables, corresponding to pivot columns, and free variables, corresponding to non-pivot columns. In this example the basic variables are x1 and x3, while the free variables are x2 and x4. The free variables are the parameters in the solution. We can solve for the basic variables in terms of the free ones, giving x3 = −3x4 and x1 = −3x2 − x4. This leads to

[x1]   [−3x2 − x4]      [−3]      [−1]
[x2]   [    x2   ]      [ 1]      [ 0]
[x3] = [  −3x4   ] = x2 [ 0] + x4 [−3]
[x4]   [    x4   ]      [ 0]      [ 1]

The vectors

[−3]       [−1]
[ 1]  and  [ 0]
[ 0]       [−3]
[ 0]       [ 1]

span the nullspace, since every element of N(A) is a linear combination of them. They are also linearly independent, because if the linear combination


on the right of the equation above is zero, then by looking at the second entry of the vector (corresponding to the first free variable) we find x2 = 0, and looking at the last entry (corresponding to the second free variable) we find x4 = 0. So both coefficients must be zero.

To find a basis for N(A) in general, we first compute U = rref(A) and determine which variables are basic and which are free. For each free variable we form a vector as follows. First put a 1 in the position corresponding to that free variable and a zero in every other free variable position. Then fill in the rest of the vector in such a way that Ux = 0. (This is easy to do!) The set of all such vectors, one for each free variable, is a basis for N(A).
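Here is a minimal sketch of this recipe in MATLAB/Octave, applied to the example matrix above (the second output of rref lists the pivot columns):

A = [1 3 3 10; 2 6 -1 -1; 1 3 1 4];
[U, pivots] = rref(A);                 % pivots = [1 3] for this example
free = setdiff(1:size(A,2), pivots);   % free variables: columns 2 and 4
B = [];
for f = free
  x = zeros(size(A,2),1);
  x(f) = 1;                            % 1 in this free variable, 0 in the others
  x(pivots) = -U(1:length(pivots),f);  % fill in basic variables so that Ux = 0
  B = [B x];
end
B       % columns are the basis vectors found above
A*B     % all zeros, confirming the columns of B lie in N(A)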

II.2.3 The matrix version of Gaussian elimination

How are a matrix A and its reduced row echelon form U = rref(A) related? If A and U are n × m matrices, then there exists an invertible n × n matrix E such that

A = EU, or equivalently E⁻¹A = U

This immediately explains why N(A) = N(U): if Ax = 0 then Ux = E⁻¹Ax = 0, and conversely if Ux = 0 then Ax = EUx = 0.

What is this matrix E? It can be thought of as a matrix record of the Gaussian elimination steps taken to reduce A to U. It turns out that performing an elementary row operation is the same as multiplying on the left by an invertible square matrix. This invertible square matrix, called an elementary matrix, is obtained by doing the row operation in question to the identity matrix.

Suppose we start with the matrix

>A=[1 3 3 10;2 6 -1 -1;1 3 1 4]

A =

1 3 3 10

2 6 -1 -1

1 3 1 4

The first elementary row operation that we want to do is to subtract twice the first row from the second row. Let's do this to the 3 × 3 identity matrix I (obtained with eye(3) in MATLAB/Octave) and call the result E1

>E1 = eye(3)

E1 =

1 0 0

0 1 0

0 0 1


>E1(2,:) = E1(2,:)-2*E1(1,:)

E1 =

1 0 0

-2 1 0

0 0 1

Now if we multiply E1 and A we obtain

>E1*A

ans =

1 3 3 10

0 0 -7 -21

1 3 1 4

which is the result of doing that elementary row operation to A. Let's do one more step. The second row operation we want to do is to subtract the first row from the third. Thus we define

>E2 = eye(3)

E2 =

1 0 0

0 1 0

0 0 1

>E2(3,:) = E2(3,:)-E2(1,:)

E2 =

1 0 0

0 1 0

-1 0 1

and we find


>E2*E1*A

ans =

1 3 3 10

0 0 -7 -21

0 0 -2 -6

which is one step further along in the Gaussian elimination process. Continuing in this way, we eventually arrive at U, so that

Ek Ek−1 · · · E2 E1 A = U

Thus A = EU with E = E1⁻¹ E2⁻¹ · · · Ek−1⁻¹ Ek⁻¹. For the example above it turns out that

E = [1   3   −6]
    [2  −1  −18]
    [1   1   −9]

which we can check:

>A=[1 3 3 10;2 6 -1 -1;1 3 1 4]

A =

1 3 3 10

2 6 -1 -1

1 3 1 4

>U=rref(A)

U =

1 3 0 1

0 0 1 3

0 0 0 0

>E=[1 3 -6; 2 -1 -18; 1 1 -9];

>E*U

ans =

1 3 3 10

2 6 -1 -1

1 3 1 4


If we do a partial elimination, then at each step we can write A = E′U′, where U′ is the resulting matrix at the point we stopped, and E′ is obtained from the Gaussian elimination steps up to that point. A common place to stop is when U′ is in echelon form, but the entries above the pivots have not yet been set to zero. If we can achieve this without doing any row swaps along the way, then E′ turns out to be a lower triangular matrix. Since U′ is upper triangular, this is called the LU decomposition of A.
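MATLAB/Octave computes such a factorization with the built-in function lu. A minimal sketch (the matrix P records any row swaps performed along the way):

>A=[1 3 3 10;2 6 -1 -1;1 3 1 4];
>[L,U,P]=lu(A);   % P*A = L*U with L lower triangular and U upper triangular
>P*A - L*U        % all zeros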

II.2.4 A basis for R(A)

The ranges or column spaces R(A) and R(U) are not the same in general, but they are related. In fact, the vectors in R(A) are exactly all the vectors in R(U) multiplied by E, where E is the invertible matrix in the equation A = EU. We can write this relationship as

R(A) = E R(U)

To see this, notice that if x ∈ R(U), that is, x = Uy for some y, then Ex = EUy = Ay is in R(A). Conversely, if x ∈ R(A), that is, x = Ay for some y, then x = E E⁻¹ A y = E U y, so x is E times a vector in R(U).

Now if we can find a basis u1, u2, . . . , uk for R(U), the vectors Eu1, Eu2, . . . , Euk form a basis for R(A). (Homework exercise)

But a basis for the column space R(U) is easy to find: it consists of the pivot columns of U. If we multiply these by E we get a basis for R(A). But if

A = [a1 | a2 | · · · | am],   U = [u1 | u2 | · · · | um],

then the equation A = EU can be written

[a1 | a2 | · · · | am] = [Eu1 | Eu2 | · · · | Eum]

From this we see that the columns of A that correspond to pivot columns of U form a basis for R(A). This implies that the dimension of R(A) is the number of pivot columns in U.
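In MATLAB/Octave the pivot columns can be read off directly; a minimal sketch using the example matrix from above:

>A=[1 3 3 10;2 6 -1 -1;1 3 1 4];
>[U,pivots]=rref(A);   % here pivots = [1 3]
>A(:,pivots)           % columns 1 and 3 of A: a basis for R(A)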

II.2.5 The rank of a matrix

We define the rank of the matrix A, denoted r(A), to be the number of pivot columns of U. Then we have shown that for an n × m matrix A

dim(R(A)) = r(A)

dim(N(A)) = m − r(A)


II.2.6 Bases for R(A^T) and N(A^T)

Of course we could find R(A^T) and N(A^T) by computing the reduced row echelon form of A^T and following the steps above. But then we would miss an important relation between the dimensions of these spaces.

Let's start with the column space R(A^T). The columns of A^T are the rows of A (written as column vectors instead of row vectors). So R(A^T) is the row space of A.

It turns out that R(A^T) and R(U^T) are the same. This follows from A = EU. To see this, take the transpose of this equation. Then A^T = U^T E^T. Now suppose that x ∈ R(A^T). This means that x = A^T y for some y. But then x = U^T E^T y = U^T y′, where y′ = E^T y, so x ∈ R(U^T). Similarly, if x = U^T y for some y, then x = U^T E^T (E^T)⁻¹ y = A^T (E^T)⁻¹ y = A^T y′ for y′ = (E^T)⁻¹ y. So every vector in R(U^T) is also in R(A^T). Here we used that E, and hence E^T, is invertible.

Now we know that R(A^T) = R(U^T) is spanned by the columns of U^T. But since U is in reduced row echelon form, the non-zero columns of U^T are independent. Therefore, the non-zero columns of U^T form a basis for R(A^T). There is one of these for every pivot. This leads to

dim(R(A^T)) = r(A) = dim(R(A))

The final subspace to consider is N(A^T). From our work above we know that

dim(N(A^T)) = n − dim(R(A^T)) = n − r(A).

Finding a basis is trickier. It might be easiest to find the reduced row echelon form of A^T. But if we insist on using A = EU, or A^T = U^T E^T, we could proceed by multiplying on the right by the inverse of E^T. This gives

A^T (E^T)⁻¹ = U^T

Now notice that the last n − r(A) columns of U^T are zero, since U is in reduced row echelon form. So the last n − r(A) columns of (E^T)⁻¹ are in the nullspace of A^T. They also have to be independent, since (E^T)⁻¹ is invertible.

Thus the last n − r(A) columns of (E^T)⁻¹ form a basis for N(A^T).

From a practical point of view, this is not so useful, since we have to compute the inverse of a matrix. It might be just as easy to reduce A^T. (Actually, things are slightly better if we use the LU decomposition. The same argument shows that the last n − r(A) columns of (L^T)⁻¹ also form a basis for N(A^T). But L^T is an upper triangular matrix, so its inverse is faster to compute.)


II.2.7 Orthogonal vectors and subspaces

In preparation for our discussion of the orthogonality relations for the fundamental subspaces of a matrix, we review some facts about orthogonal vectors and subspaces.

Recall that the dot product, or inner product, of two vectors

x = [x1, x2, . . . , xn]^T    y = [y1, y2, . . . , yn]^T

is denoted by x · y or ⟨x, y⟩ and defined by

x^T y = x1y1 + x2y2 + · · · + xnyn = ∑_{i=1}^{n} xi yi

Some important properties of the inner product are symmetry

x · y = y · x

and linearity

(c1x1 + c2x2) · y = c1 x1 · y + c2 x2 · y.

The (Euclidean) norm, or length, of a vector is given by

‖x‖ = √(x · x) = ( ∑_{i=1}^{n} xi² )^{1/2}

An important property of the norm is that ‖x‖ = 0 implies that x = 0.

The geometrical meaning of the inner product is given by

x · y = ‖x‖‖y‖ cos(θ)

where θ is the angle between the vectors. The angle θ can take values from 0 to π.

The Cauchy–Schwarz inequality states

|x · y| ≤ ‖x‖‖y‖.

It follows from the previous formula because |cos(θ)| ≤ 1. The only time that equality occurs in the Cauchy–Schwarz inequality, that is |x · y| = ‖x‖‖y‖, is when cos(θ) = ±1 and θ is either 0 or π. This means that the vectors point in the same or in opposite directions.


The vectors x and y are orthogonal if x · y = 0. Geometrically this means either that one of the vectors is zero or that they are at right angles. This follows from the formula above, since cos(θ) = 0 implies θ = π/2.

Another way to see that x · y = 0 means that the vectors are orthogonal is from Pythagoras' formula. If x and y are at right angles, then ‖x‖² + ‖y‖² = ‖x + y‖².

[Figure: the right triangle with sides x and y and hypotenuse x + y.]

But ‖x + y‖² = (x + y) · (x + y) = ‖x‖² + ‖y‖² + 2 x · y, so Pythagoras' formula holds exactly when x · y = 0.

To compute the inner product of (column) vectors X and Y in MATLAB/Octave, we use the formula x · y = x^T y. Thus the inner product can be computed using X'*Y. (If X and Y are row vectors, the formula is X*Y'.)

The norm of a vector X is computed by norm(X). In MATLAB/Octave inverse trig functions are computed with asin(), acos(), etc. So the angle between column vectors X and Y could be computed as

> acos(X’*Y/(norm(X)*norm(Y)))

Two subspaces V and W are said to be orthogonal if every vector in V is orthogonal to every vector in W. In this case we write V ⊥ W.

[Figure: subspaces V and W with V ⊥ W, and subspaces S and T with S ⊥ T.]


In this figure V ⊥W and also S ⊥ T .

A related concept is the orthogonal complement. The orthogonal complement of V, denoted V⊥, is the subspace containing all vectors orthogonal to V. In the figure W = V⊥ but T ≠ S⊥, since T contains only some of the vectors orthogonal to S.

If we take the orthogonal complement of V⊥ we get back the original space V. This is certainly plausible from the pictures. It is also obvious that V ⊆ (V⊥)⊥, since any vector in V is perpendicular to vectors in V⊥. If there were a vector in (V⊥)⊥ not contained in V, we could subtract its projection onto V (defined in the next chapter) and end up with a non-zero vector in (V⊥)⊥ that is also in V⊥. Such a vector would be orthogonal to itself, which is impossible. This shows that

(V⊥)⊥ = V.

One consequence of this formula is that V = W⊥ implies V⊥ = W. Just take the orthogonal complement of both sides and use (W⊥)⊥ = W.

II.2.8 Orthogonality relations for the fundamental subspaces of a matrix

Let A be an n × m matrix. Then N(A) and R(A^T) are subspaces of R^m, while N(A^T) and R(A) are subspaces of R^n.

These two pairs of subspaces are orthogonal:

N(A) = R(A^T)⊥

N(A^T) = R(A)⊥

We will show that the first equality holds for any A. The second equality then follows by applying the first one to A^T.

These relations are based on the formula

(A^T x) · y = x · (Ay)

This formula follows from the product formula (AB)^T = B^T A^T for transposes, since

(A^T x) · y = (A^T x)^T y = x^T (A^T)^T y = x^T A y = x · (Ay)

First, we show that N(A) ⊆ R(A^T)⊥. To do this, start with any vector x ∈ N(A). This means that Ax = 0. If we compute the inner product of x with any vector in R(A^T), that is, any vector of the form A^T y, we get (A^T y) · x = y · (Ax) = y · 0 = 0. Thus x ∈ R(A^T)⊥. This shows N(A) ⊆ R(A^T)⊥.

Now we show the opposite inclusion, R(A^T)⊥ ⊆ N(A). This time we start with x ∈ R(A^T)⊥. This means that x is orthogonal to every vector in R(A^T), that is, to every


vector of the form A^T y. So (A^T y) · x = y · (Ax) = 0 for every y. Pick y = Ax. Then (Ax) · (Ax) = ‖Ax‖² = 0. This implies Ax = 0, so x ∈ N(A). We can conclude that R(A^T)⊥ ⊆ N(A).

These two inclusions establish that N(A) = R(A^T)⊥.

Let's verify these orthogonality relations in an example. Let

A = [1  2  1  1]
    [1  3  0  1]
    [2  5  1  2]

Then

rref(A) = [1  0   3  1]     rref(A^T) = [1  0  1]
          [0  1  −1  0]                 [0  1  1]
          [0  0   0  0]                 [0  0  0]
                                        [0  0  0]

Thus we get

N(A) = span{ [−3, 1, 1, 0]^T, [−1, 0, 0, 1]^T }

R(A) = span{ [1, 1, 2]^T, [2, 3, 5]^T }

N(A^T) = span{ [−1, −1, 1]^T }

R(A^T) = span{ [1, 0, 3, 1]^T, [0, 1, −1, 0]^T }

We can now verify directly that every vector in the basis for N(A) is orthogonal to every vector in the basis for R(A^T), and similarly for N(A^T) and R(A).

Does the equation

Ax = [2, 1, 3]^T

have a solution? We can use the ideas above to answer this question easily. We are really asking whether [2, 1, 3]^T is contained in R(A). But, according to the orthogonality relations, this


is the same as asking whether [2, 1, 3]^T is contained in N(A^T)⊥. This is easy to check. Simply compute the dot product

[2, 1, 3]^T · [−1, −1, 1]^T = −2 − 1 + 3 = 0.

Since the result is zero, we conclude that a solution exists.
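We can confirm this in MATLAB/Octave; a quick sketch (the rank test on the last line is an equivalent way to check that b lies in R(A)):

>A=[1 2 1 1;1 3 0 1;2 5 1 2];
>b=[2;1;3];
>dot(b,[-1;-1;1])      % b is orthogonal to the basis vector of N(A^T)
ans = 0
>rank(A)==rank([A b])  % so b lies in R(A)
ans = 1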


II.3 Graphs and Networks

Prerequisites and Learning Goals

From your work in previous courses you should be able to

• State Ohm’s law for a resistor.

• State Kirchhoff’s laws for a resistor network.

After completing this section, you should be able to

• Be able to write down the incidence matrix of a directed graph, and to draw the graph given the incidence matrix.

• Define the Laplace operator or Laplacian for a graph and be able to write it down.

• When the edges of a graph represent resistors or batteries in a circuit, you should be able to

– interpret each of the four subspaces associated with the incidence matrix and their dimensions in terms of voltage and current vectors, and verify their orthogonality relations.

– write down Ohm's law for all the edges of the graph in matrix form using the Laplacian.

– express the connection between Kirchhoff's law and the nullspace of the Laplacian.

– use the voltage-to-current map to calculate the voltages and currents in the network when a battery is attached.

– use the voltage-to-current map to calculate the effective resistance between two nodes in the network.

• Re-order rows and columns of a matrix and extract submatrices in MATLAB/Octave.

II.3.1 Directed graphs and their incidence matrix

A directed graph is a collection of vertices (or nodes) connected by edges with arrows. Here is a graph with 4 vertices and 5 edges.


[Figure: a directed graph with vertices 1, 2, 3, 4 and five directed edges: edge 1 from vertex 1 to vertex 2, edge 2 from vertex 2 to vertex 3, edge 3 from vertex 3 to vertex 4, edge 4 from vertex 2 to vertex 4, and edge 5 from vertex 4 to vertex 1.]

Graphs come up in many applications. For example, the nodes could represent computers and the arrows internet connections. Or the nodes could be factories and the arrows represent movement of goods. We will mostly focus on a single interpretation, where the edges represent resistors or batteries hooked up in a circuit.

In this interpretation we will be assigning a number to each edge to indicate the amount of current flowing through that edge. This number can be positive or negative. The arrows indicate the direction associated to a positive current.

The incidence matrix of a graph is an n × m matrix, where n is the number of edges and m is the number of vertices. We label the rows by the edges in the graph and the columns by the vertices. Each row of the matrix corresponds to an edge in the graph. It has a −1 in the place corresponding to the vertex where the arrow starts and a 1 in the place corresponding to the vertex where the arrow ends.

Here is the incidence matrix for the illustrated graph, with rows labelled by the edges 1 to 5 and columns by the vertices 1 to 4:

[−1   1   0   0]
[ 0  −1   1   0]
[ 0   0  −1   1]
[ 0  −1   0   1]
[ 1   0   0  −1]

The columns of the matrix have the following interpretation. The column representing a given vertex has a +1 for each arrow coming in to that vertex and a −1 for each arrow leaving the vertex.

Given an incidence matrix, the corresponding graph can easily be drawn. What is the graph for

[−1   1   0]
[ 0  −1   1]
[ 1   0  −1]

?

(Answer: a triangular loop.)


II.3.2 Nullspace and range of the incidence matrix and its transpose

We now wish to give an interpretation of the fundamental subspaces associated with the incidence matrix of a graph. Let's call the matrix D. In our example D acts on vectors v ∈ R^4 and produces a vector Dv in R^5. We can think of the vector v = [v1, v2, v3, v4]^T as an assignment of a voltage to each of the nodes in the graph. Then the vector

Dv = [v2 − v1]
     [v3 − v2]
     [v4 − v3]
     [v4 − v2]
     [v1 − v4]

assigns to each edge the voltage difference across that edge. The matrix D is similar to the derivative matrix that appeared when we studied finite difference approximations. It can be thought of as the derivative matrix for a graph.
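For example, entering D for our graph and applying it to a (made-up) voltage vector:

>D=[-1 1 0 0; 0 -1 1 0; 0 0 -1 1; 0 -1 0 1; 1 0 0 -1];
>v=[1; 2; 3; 4];
>D*v
ans =

   1
   1
   1
   2
  -3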

II.3.3 The null space N(D)

This is the set of voltages v for which the voltage differences in Dv are all zero. This means that any two nodes connected by an edge will have the same voltage. In our example, this implies all the voltages are the same, so every vector in N(D) is of the form v = s[1, 1, 1, 1]^T for some s. In other words, the null space is one dimensional with basis [1, 1, 1, 1]^T.

For a graph that has several disconnected pieces, Dv = 0 will force v to be constant on each connected component of the graph. Each connected component will contribute one basis vector to N(D): the vector that is equal to 1 on that component and zero everywhere else. Thus dim(N(D)) will be equal to the number of disconnected pieces in the graph.

II.3.4 The range R(D)

The range of D consists of all vectors b in R^5 that are voltage differences, i.e., b = Dv for some v. We know that the dimension of R(D) is 4 − dim(N(D)) = 4 − 1 = 3. So the set of voltage difference vectors must be restricted in some way. In fact a voltage difference vector will have the property that the sum of the differences around a closed loop is zero. In the


example the edges 1, 4, 5 form a loop, so if b = [b1, b2, b3, b4, b5]^T is a voltage difference vector, then b1 + b4 + b5 = 0. We can check this directly in the example: since b = Dv = [v2 − v1, v3 − v2, v4 − v3, v4 − v2, v1 − v4]^T, we check that (v2 − v1) + (v4 − v2) + (v1 − v4) = 0. In the example graph there are three loops, namely edges 1, 4, 5, edges 2, 3, 4, and edges 1, 2, 3, 5. The corresponding equations that the components of a vector b must satisfy to be in the range of D are

b1 + b4 + b5 = 0

b2 + b3 − b4 = 0

b1 + b2 + b3 + b5 = 0

Notice the minus sign in the second equation, corresponding to a backwards arrow. However, these equations are not all independent, since the third is obtained by adding the first two. There are two independent equations that the components of b must satisfy. Since R(D) is 3 dimensional, there can be no additional constraints.

Now we wish to find interpretations for the null space and the range of D^T. Let y = [y1, y2, y3, y4, y5]^T be a vector in R^5, which we interpret as an assignment of a current to each edge in the graph. Then

D^T y = [y5 − y1     ]
        [y1 − y2 − y4]
        [y2 − y3     ]
        [y3 + y4 − y5]

This vector assigns to each node the amount of current collecting at that node.

II.3.5 The null space N(D^T)

This is the set of current vectors y ∈ R^5 which do not result in any current building up (or draining away) at any of the nodes. We know that the dimension of this space must be 5 − dim(R(D^T)) = 5 − dim(R(D)) = 5 − 3 = 2. We can guess at a basis for this space by noting that current running around a loop will not build up at any of the nodes. The loop vector [1, 0, 0, 1, 1]^T represents a current running around the loop formed by edges 1, 4 and 5. We can verify that this


vector lies in the null space of D^T:

[−1   0   0   0   1] [1]   [0]
[ 1  −1   0  −1   0] [0] = [0]
[ 0   1  −1   0   0] [0]   [0]
[ 0   0   1   1  −1] [1]   [0]
                     [1]

The current vectors corresponding to the other two loops are [0, 1, 1, −1, 0]^T and [1, 1, 1, 0, 1]^T. However, these three vectors are not linearly independent. Any choice of two of these vectors is independent and forms a basis.

II.3.6 The range R(D^T)

This is the set of vectors in R^4 of the form

[x1]
[x2] = D^T y.
[x3]
[x4]

With our interpretation these are vectors which measure how the currents in y are building up at or draining away from each node. Since the current that is building up at one node must have come from some other nodes, it must be that

x1 + x2 + x3 + x4 = 0

In our example, this can be checked directly. This one condition in R^4 results in a three dimensional subspace.

II.3.7 Summary and Orthogonality relations

The two subspaces R(D) and N(D^T) are subspaces of R^5. The subspace N(D^T) contains all linear combinations of loop vectors, while R(D) contains all vectors whose dot product with loop vectors is zero. This verifies the orthogonality relation R(D) = N(D^T)⊥.

The two subspaces N(D) and R(D^T) are subspaces of R^4. The subspace N(D) contains constant vectors, while R(D^T) contains all vectors orthogonal to constant vectors. This verifies the other orthogonality relation N(D) = R(D^T)⊥.


II.3.8 Resistors and the Laplacian

Now we suppose that each edge of our graph represents a resistor. This means that we associate with the ith edge a resistance Ri. Sometimes it is convenient to use conductances γi, which are defined to be the reciprocals of the resistances, that is, γi = 1/Ri.

[Figure: the example graph with a resistor on each edge, labelled R1, R2, R3, R4, R5.]

We begin with an assignment of a voltage to every node, putting these numbers in a vector v ∈ R^4. Then Dv ∈ R^5 represents the vector of voltage differences for each of the edges.

Given the resistance Ri for each edge, we can now invoke Ohm's law to compute the current flowing through each edge. For each edge, Ohm's law states that

(∆V)i = ji Ri,

where (∆V)i is the voltage drop across the edge, ji is the current flowing through that edge, and Ri is the resistance. Solving for the current we obtain

ji = Ri⁻¹ (∆V)i.

Notice that the voltage drop (∆V)i in this formula is exactly the ith component of the vector Dv. So if we collect all the currents flowing along each edge in a vector j indexed by the edges, then Ohm's law for all the edges can be written as

j = R⁻¹ D v

where

R = [R1   0   0   0   0]
    [ 0  R2   0   0   0]
    [ 0   0  R3   0   0]
    [ 0   0   0  R4   0]
    [ 0   0   0   0  R5]

is the diagonal matrix with the resistances on the diagonal.


Finally, if we multiply j by the matrix D^T, the resulting vector

J = D^T j = D^T R⁻¹ D v

has one entry for each node, representing the total current flowing in or out of that node along the edges that connect to it.

The matrix

L = D^T R⁻¹ D

appearing in this formula is called the Laplacian. It is similar to the second derivative matrix that appeared when we studied finite difference approximations.

One important property of the Laplacian is symmetry, that is, the fact that L^T = L. To see this, recall that the transpose of a product of matrices is the product of the transposes in reverse order ((ABC)^T = C^T B^T A^T). This implies that

L^T = (D^T R⁻¹ D)^T = D^T (R⁻¹)^T D = L

Here we used that (D^T)^T = D and that R⁻¹, being a diagonal matrix, satisfies (R⁻¹)^T = R⁻¹.

Let's determine the entries of L. To start we consider the case where all the resistances have the same value 1, so that R = R⁻¹ = I. In this case L = D^T D. Let's start with the example graph above. Then

L = [−1   0   0   0   1] [−1   1   0   0]   [ 2  −1   0  −1]
    [ 1  −1   0  −1   0] [ 0  −1   1   0]   [−1   3  −1  −1]
    [ 0   1  −1   0   0] [ 0   0  −1   1] = [ 0  −1   2  −1]
    [ 0   0   1   1  −1] [ 0  −1   0   1]   [−1  −1  −1   3]
                         [ 1   0   0  −1]

Notice that the ith diagonal entry is the total number of edges connected to the ith node. The i, j entry is −1 if the ith node is connected to the jth node, and 0 otherwise.
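This is easy to confirm in MATLAB/Octave:

>D=[-1 1 0 0; 0 -1 1 0; 0 0 -1 1; 0 -1 0 1; 1 0 0 -1];
>L=D'*D
L =

   2  -1   0  -1
  -1   3  -1  -1
   0  -1   2  -1
  -1  -1  -1   3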

This pattern describes the Laplacian L for any graph. To see this, write

D = [d1|d2|d3| · · · |dm]

Then the i, j entry of D^T D is di^T dj. Recall that di has an entry of −1 for every edge leaving the ith node, and a 1 for every edge coming in. So di^T di, the ith diagonal entry of D^T D, is a sum of terms (±1)², one for each edge connected to the ith node. This sum gives the total number of edges connected to the ith node. To see this in the example graph, let's consider the first node. This node has two edges connected to it and

d1 = [−1, 0, 0, 0, 1]^T


Thus the 1, 1 entry of the Laplacian is

d_1^T d_1 = (-1)^2 + 1^2 = 2

On the other hand, if i ≠ j then the vectors d_i and d_j have a non-zero entry in the same position only if one of the edges leaving the ith node is coming in to the jth node or vice versa. For a graph with at most one edge connecting any two nodes (we usually assume this), this means that d_i^T d_j will equal -1 if the ith and jth nodes are connected by an edge, and zero otherwise. For example, in the graph above the first edge leaves the first node, so that d_1 has a -1 in the first position. This first edge comes in to the second node, so d_2 has a +1 in the first position. Otherwise, there is no overlap in these vectors, since no other edges touch both these nodes. Thus

d_1^T d_2 = [ -1  0  0  0  1 ] [  1
                                -1
                                 0
                                -1
                                 0 ]  = -1

What happens if the resistances are not all equal to one? In this case we must replace D with R^{-1}D in the calculation above. This multiplies the kth row of D with γ_k = 1/R_k. Making this change in the calculations above leads to the following prescription for calculating the entries of L. The diagonal entries are given by

L_{i,i} = Σ_k γ_k

where the sum goes over all edges touching the ith node. When i ≠ j then

L_{i,j} = { -γ_k   if nodes i and j are connected with edge k
          {  0     if nodes i and j are not connected
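This prescription is easy to check numerically. Here is a minimal sketch in MATLAB/Octave, using the incidence matrix D of the example graph; the conductance values in g are made up just for illustration.

>D = [-1 1 0 0; 0 -1 1 0; 0 0 -1 1; 0 -1 0 1; 1 0 0 -1];
>g = [1 2 3 4 5];            % conductances gamma_k = 1/R_k (arbitrary values)
>L = D'*diag(g)*D;           % the Laplacian D^T R^{-1} D

The diagonal entry L(1,1) comes out as g(1)+g(5) = 6, the sum of the conductances of the two edges touching node 1, and L(1,2) = -g(1) = -1, exactly as the prescription above predicts.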

II.3.9 Kirchhoff’s law and the null space of L

Kirchhoff's law states that currents cannot build up at any node. If v is the voltage vector for a circuit, then we saw that Lv is the vector whose ith entry is the total current building up at the ith node. Thus, for an isolated circuit that is not hooked up to any batteries, Kirchhoff's law can be written as

Lv = 0

By definition, the solutions are exactly the vectors in the nullspace N(L) of L. It turns out that N(L) is the same as N(D), which contains all constant voltage vectors. This is what we should expect. If there are no batteries connected to the circuit, the voltage will be the same everywhere and no current will flow.


To see that N(L) = N(D), we start with a vector v ∈ N(D). Then Dv = 0 implies Lv = D^T R^{-1} D v = D^T R^{-1} 0 = 0. This shows that v ∈ N(L) too, that is, N(D) ⊆ N(L).

To show the opposite inclusion we first note that the matrix R^{-1} can be factored into a product of invertible matrices R^{-1} = R^{-1/2} R^{-1/2}, where R^{-1/2} is the diagonal matrix with diagonal entries 1/√(R_i). This is possible because each R_i is a positive number. Also, since R^{-1/2} is a diagonal matrix it is equal to its transpose, that is, R^{-1/2} = (R^{-1/2})^T.

Now suppose that Lv = 0. This can be written D^T (R^{-1/2})^T R^{-1/2} D v = 0. Now we multiply on the left with v^T. This gives

v^T D^T (R^{-1/2})^T R^{-1/2} D v = (R^{-1/2} D v)^T R^{-1/2} D v = 0

But for any vector w, the number w^T w is the dot product of w with itself, which is equal to the length of w squared. Thus the equation above can be written

‖R^{-1/2} D v‖^2 = 0

This implies that R^{-1/2} D v = 0. Finally, since R^{-1/2} is invertible, this yields Dv = 0. We have shown that any vector in N(L) is also contained in N(D). Thus N(L) ⊆ N(D), and together with the previous inclusion this yields N(L) = N(D).

II.3.10 Connecting a battery

To see more interesting behaviour in a circuit, we pick two nodes and connect them to a battery. For example, let's take our example circuit above and connect nodes 1 and 2 to a battery.

[Figure: the example graph with a battery connected across nodes 1 and 2.]

The terminals of a battery are kept at a fixed voltage. Thus the voltages v_1 and v_2 are now known, say

v_1 = b_1
v_2 = b_2


Of course, it is only voltage differences that have physical meaning, so we could set b_1 = 0. Then b_2 would be the voltage of the battery.

At the first and second nodes there now will be current flowing in and out from the battery. Let's call these currents J_1 and J_2. At all the other nodes the total current flowing in and out is still zero, as before.

How are the equations for the circuit modified? For simplicity let's set all the resistances R_i = 1. The new equations are

[  2 -1  0 -1 ] [ b_1 ]   [ J_1 ]
[ -1  3 -1 -1 ] [ b_2 ] = [ J_2 ]
[  0 -1  2 -1 ] [ v_3 ]   [  0  ]
[ -1 -1 -1  3 ] [ v_4 ]   [  0  ]

Two of the voltages, v_1 and v_2, have changed their role in these equations from being unknowns to being knowns. On the other hand, the first two currents, which were originally known quantities (namely zero), are now unknowns.

Since the current flowing into the network should equal the current flowing out, we expect that J_1 = -J_2. This follows from the orthogonality relations for L. The vector [J_1; J_2; 0; 0] is contained in R(L). But R(L) = N(L^T)⊥ = N(L)⊥ (since L = L^T), and we know that N(L) consists of all constant vectors. Hence

[ J_1 ]   [ 1 ]
[ J_2 ] · [ 1 ]  = J_1 + J_2 = 0
[  0  ]   [ 1 ]
[  0  ]   [ 1 ]

To solve this system of equations we write it in block matrix form

[ A   B^T ] [ b ]   [ J ]
[ B   C   ] [ v ] = [ 0 ]

where

A = [  2 -1 ]    B = [  0 -1 ]    C = [  2 -1 ]
    [ -1  3 ]        [ -1 -1 ]        [ -1  3 ]

and

b = [ b_1 ]    v = [ v_3 ]    J = [ J_1 ]    0 = [ 0 ]
    [ b_2 ]        [ v_4 ]        [ J_2 ]        [ 0 ]

Our system of equations can then be written as two 2 × 2 systems:

A b + B^T v = J
B b + C v = 0


We can solve the second equation for v. Since C is invertible,

v = -C^{-1} B b

Using this value of v in the first equation yields

J = (A - B^T C^{-1} B) b

The matrix A - B^T C^{-1} B is the voltage-to-current map. In our example

A - B^T C^{-1} B = (8/5) [  1 -1 ]
                         [ -1  1 ]
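This is easy to verify in MATLAB/Octave from the blocks above (a quick check, not part of the derivation):

>A = [2 -1; -1 3]; B = [0 -1; -1 -1]; C = [2 -1; -1 3];
>A - B'*(C\B)

ans =

   1.6000  -1.6000
  -1.6000   1.6000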

In fact, for any circuit the voltage-to-current map is given by

A - B^T C^{-1} B = γ [  1 -1 ]
                     [ -1  1 ]

for some number γ. This can be deduced from two facts: (i) A - B^T C^{-1} B is symmetric and (ii) R(A - B^T C^{-1} B) = span([1; -1]). You are asked to carry this out in a homework problem.

Notice that this form of the matrix implies that if b_1 = b_2 then the currents are zero. Another way of seeing this is to notice that if b_1 = b_2 then [b_1; b_2] is orthogonal to the range of A - B^T C^{-1} B by (ii), and hence lies in the nullspace N(A - B^T C^{-1} B).

The number

R = 1/γ,

the ratio of the applied voltage to the resulting current, is the effective resistance of the network between the two nodes.

So in our example circuit, the effective resistance between nodes 1 and 2 is 5/8.

If the battery voltages are b_1 = 0 and b_2 = b, then the voltages at the remaining nodes are

[ v_3 ]  =  -C^{-1} B [ 0 ]  =  [ 4/5 ] b
[ v_4 ]               [ b ]     [ 3/5 ]

II.3.11 Two resistors in series

Let's do a trivial example where we know the answer. If we connect two resistors in series, the resistances add, and the effective resistance is R_1 + R_2. The graph for this example looks like


[Figure: three nodes 1, 2, 3 in a line, with resistor R_1 on the edge from node 1 to node 2 and R_2 on the edge from node 2 to node 3.]

The Laplacian for this circuit is

L = [  γ_1     -γ_1       0
      -γ_1    γ_1+γ_2   -γ_2
        0      -γ_2      γ_2 ]

with γ_i = 1/R_i, as always. We want the effective resistance between nodes 1 and 3. Although it is not strictly necessary, it is easier to see what the submatrices A, B and C are if we reorder the vertices so that the ones we are connecting, namely 1 and 3, come first. This reshuffles the rows and columns of L, yielding

       1      3       2
 1 [  γ_1     0     -γ_1    ]
 3 [   0     γ_2    -γ_2    ]
 2 [ -γ_1   -γ_2   γ_1+γ_2  ]

Here we have labelled the re-ordered rows and columns with the nodes they represent. Now the desired submatrices are

A = [ γ_1   0  ]      B = [ -γ_1  -γ_2 ]      C = [ γ_1 + γ_2 ]
    [  0   γ_2 ]

and

A - B^T C^{-1} B = [ γ_1   0  ]  -  (1/(γ_1 + γ_2)) [ γ_1^2    γ_1 γ_2 ]
                   [  0   γ_2 ]                     [ γ_1 γ_2  γ_2^2   ]

                 = (γ_1 γ_2/(γ_1 + γ_2)) [  1 -1 ]
                                         [ -1  1 ]

This gives an effective resistance of

R = (γ_1 + γ_2)/(γ_1 γ_2) = 1/γ_1 + 1/γ_2 = R_1 + R_2

as expected.
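Here is a quick numerical sanity check of this formula in MATLAB/Octave, with made-up values R_1 = 2 and R_2 = 3 (so the effective resistance should be 5):

>g1 = 1/2; g2 = 1/3;
>A = [g1 0; 0 g2]; B = [-g1 -g2]; C = g1 + g2;
>M = A - B'*(C\B);        % the voltage-to-current map
>Reff = 1/M(1,1)

Reff = 5.0000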


II.3.12 Example: a resistor cube

Hook up resistors along the edges of a cube. If each resistor has resistance R_i = 1, what is the effective resistance between opposite corners of the cube?

[Figure: a cube whose vertices are numbered 1 to 8, with a resistor on each of the twelve edges; nodes 1 and 7 are opposite corners.]

We will use MATLAB/Octave to solve this problem. To begin we define the Laplace matrix L. Since each node has three edges connecting to it, and all the resistances are 1, the diagonal entries are all 3. The off-diagonal entries are -1 or 0, depending on whether the corresponding nodes are connected or not.

>L=[3 -1 0 -1 -1 0 0 0;-1 3 -1 0 0 -1 0 0;

0 -1 3 -1 0 0 -1 0;-1 0 -1 3 0 0 0 -1;

-1 0 0 0 3 -1 0 -1;0 -1 0 0 -1 3 -1 0;

0 0 -1 0 0 -1 3 -1;0 0 0 -1 -1 0 -1 3];

We want to find the effective resistance between 1 and 7. To compute the submatrices A, B and C it is convenient to re-order the nodes so that 1 and 7 come first. In MATLAB/Octave, this can be achieved with the following statement.

>L=L([1,7,2:6,8],[1,7,2:6,8]);

In this statement the entries in the first bracket [1,7,2:6,8] indicate the new ordering of the rows. Here 2:6 stands for 2,3,4,5,6. The second bracket indicates the re-ordering of the columns, which is the same as for the rows in our case.

Now it is easy to extract the submatrices A, B and C and compute the voltage-to-current map DN.

>N = length(L);

>A = L(1:2,1:2);

>B = L(3:N,1:2);

>C = L(3:N,3:N);

>DN = A - B’*C^(-1)*B;


The effective resistance is the reciprocal of the first entry in DN. The command format rat gives the answer in rational form. (Note: this is just a rational approximation to the floating point answer, not exact rational arithmetic as in Maple or Mathematica.)

>format rat

>R = 1/DN(1,1)

R = 5/6


Chapter III

Orthogonality


III.1 Projections

Prerequisites and Learning Goals

After completing this section, you should be able to

• write down the definition of an orthogonal projection matrix

• use properties of a projection matrix to deduce facts like the orthogonality of the null space and range.

• compute the orthogonal projection matrix whose range is the span of a given collection of vectors.

• use orthogonal projection matrices to decompose a vector into components parallel to and perpendicular to a given subspace.

• use least squares to compute approximate solutions to systems of equations with no solutions.

• perform least squares calculations in applications where overdetermined systems arise.

III.1.1 Projections onto lines and planes in R3

Recall that the projection of a vector x onto the line containing the non-zero vector a is given by p = Px, where P is the projection matrix

P = (1/‖a‖^2) a a^T.

Let's review why this formula is true, using properties of the dot product. Here is a diagram of the situation.

[Figure: the vector x, its projection p onto the line through a, and the angle θ between x and a.]


The length of the projected vector p is

‖p‖ = ‖x‖ cos(θ) = ‖a‖‖x‖ cos(θ)/‖a‖ = a·x/‖a‖ = a^T x/‖a‖

To get the vector p, start with the unit vector a/‖a‖ and stretch it by an amount ‖p‖. This gives

p = ‖p‖ a/‖a‖ = (1/‖a‖^2) a a^T x

This can be written p = Px where P is the projection matrix above.

Notice that the matrix P satisfies P^2 = P since

P^2 = (1/‖a‖^4) a a^T a a^T = (1/‖a‖^4) a ‖a‖^2 a^T = (1/‖a‖^2) a a^T = P

In addition, P^T = P since

P^T = (1/‖a‖^2)(a a^T)^T = (1/‖a‖^2)(a^T)^T a^T = (1/‖a‖^2) a a^T = P

We will discuss the significance of the two properties P^2 = P and P^T = P below.

Example: What is the projection of x = [1; 1; 1] in the direction of a = [1; 2; -1]? Let's calculate the projection matrix P, compute Px, and verify that P^2 = P and P^T = P.

>x = [1 1 1]';

>a = [1 2 -1]';

>P = (a'*a)^(-1)*a*a'

P =

0.16667 0.33333 -0.16667

0.33333 0.66667 -0.33333

-0.16667 -0.33333 0.16667

>P*x

ans =

0.33333

0.66667

-0.33333

>P*P

89

Page 90: Lecture Notes Math 307

III Orthogonality

ans =

0.16667 0.33333 -0.16667

0.33333 0.66667 -0.33333

-0.16667 -0.33333 0.16667

The projection of x onto the plane orthogonal to a is given by q = x - p.

[Figure: the vector x decomposed into its projection p along a and the component q = x - p in the plane orthogonal to a.]

Thus we can write

q = x - Px = (I - P)x = Qx

where

Q = I - P

Notice that, like the matrix P, the matrix Q also satisfies Q^2 = Q and Q^T = Q, since

Q^2 = (I - P)(I - P) = I - 2P + P^2 = I - P = Q

and

Q^T = I^T - P^T = I - P = Q.

Continuing with the example above, if we want to compute the projection matrix onto the plane perpendicular to a we compute Q = I - P. Then Qx is the projection of x onto the plane. We can also check that Q^2 = Q.

> Q = eye(3) - P

Q =

0.83333 -0.33333 0.16667

-0.33333 0.33333 0.33333

0.16667 0.33333 0.83333

90

Page 91: Lecture Notes Math 307

III.1 Projections

>Q*x

ans =

0.66667

0.33333

1.33333

>Q^2

ans =

0.83333 -0.33333 0.16667

-0.33333 0.33333 0.33333

0.16667 0.33333 0.83333

III.1.2 Orthogonal Projections

Any matrix P satisfying P^2 = P is called a projection matrix. If, in addition, P^T = P, then P is called an orthogonal projection. (Warning: this is a different concept than that of an orthogonal matrix, which we will see later.)

The property P^2 = P says that any vector in the range of P is not changed by P, since P(Px) = P^2 x = Px.

The property P^T = P implies that N(P) = R(P)⊥. This follows from the orthogonality relation N(P) = R(P^T)⊥.

If P is an orthogonal projection, so is Q = I - P, as you can check. Clearly P + Q = I. Also

PQ = P(I - P) = P - P^2 = P - P = 0

and similarly QP = 0. These formulas show that R(P) = N(Q). To see this, first notice that if x ∈ R(P), so that x = Px, then Qx = QPx = 0, which means x ∈ N(Q). Conversely, if x ∈ N(Q) then x = (P + Q)x = Px ∈ R(P).

In the example of the last section, R(P) = N(Q) is the line through a, while N(P) = R(Q) is the plane orthogonal to a.

The orthogonality of Pa and Qb implies that

‖Pa + Qb‖^2 = (Pa + Qb)·(Pa + Qb) = ‖Pa‖^2 + ‖Qb‖^2


since the cross terms vanish.

Let P be an orthogonal projection. Let's show that given any vector x, the vector Px is the vector in R(P) that is closest to x. First we compute the square of the distance from Px to x. This is given by

‖Px - x‖^2 = ‖Qx‖^2

Now let Py be any other vector in the range R(P). Then the square of the distance from Py to x is

‖Py - x‖^2 = ‖Py - (P + Q)x‖^2 = ‖P(y - x) - Qx‖^2 = ‖P(y - x)‖^2 + ‖Qx‖^2

This implies that ‖Py - x‖^2 ≥ ‖Qx‖^2 = ‖Px - x‖^2, or equivalently ‖Py - x‖ ≥ ‖Px - x‖, with equality exactly when Px = Py.

III.1.3 Least squares solutions

We now consider linear equations

Ax = b

that do not have a solution. This is the same as saying that b ∉ R(A). What vector x is closest to being a solution?

[Figure: the subspace R(A) of possible values of Ax, the vector b off the subspace, and the error vector Ax - b.]

We want to determine x so that Ax is as close as possible to b. In other words, we want to minimize ‖Ax - b‖. This will happen when Ax is the projection of b onto R(A), that is, Ax = Pb, where P is the projection matrix. In this case Qb = (I - P)b is orthogonal to R(A). But (I - P)b = b - Ax. Therefore (and this is also clear from the picture), we see that Ax - b is orthogonal to R(A). But the vectors orthogonal to R(A) are exactly the vectors in N(A^T). Thus the vector we are looking for will satisfy A^T(Ax - b) = 0, or the equation

A^T A x = A^T b

This is the least squares equation, and a solution to this equation is called a least squares solution.

(Aside: We can also use calculus to derive the least squares equation. We want to minimize ‖Ax - b‖^2. Computing the gradient and setting it to zero results in the same equations.)


It turns out that the least squares equation always has a solution. Another way of saying this is R(A^T) = R(A^T A). Instead of checking this, we can verify that the orthogonal complements N(A) and N(A^T A) are the same. But this is something we showed before, when we considered the incidence matrix D for a graph.

If x solves the least squares equation, the vector Ax is the projection of b onto the range R(A), since Ax is the closest vector to b in the range of A. In the case where A^T A is invertible (this happens when N(A) = N(A^T A) = {0}), we can obtain a formula for the projection. Starting with the least squares equation, we multiply by (A^T A)^{-1} to obtain

x = (A^T A)^{-1} A^T b

so that

Ax = A (A^T A)^{-1} A^T b.

Thus the projection matrix is given by

P = A (A^T A)^{-1} A^T

Notice that the formula for the projection onto a line through a is a special case of this, since then A^T A = ‖a‖^2.

It is worthwhile pointing out that if we say that the solution of the least squares equation gives the “best” approximation to a solution, what we really mean is that it minimizes the distance, or equivalently, its square

‖Ax - b‖^2 = Σ_i ((Ax)_i - b_i)^2.

There are other ways of measuring how far Ax is from b, for example the so-called L^1 norm

‖Ax - b‖_1 = Σ_i |(Ax)_i - b_i|

Minimizing the L^1 norm will result in a different “best” solution, that may be preferable under some circumstances. However, it is much more difficult to find!

III.1.4 Polynomial fit

Suppose we have some data points (x_1, y_1), (x_2, y_2), ..., (x_n, y_n) and we want to fit a polynomial p(x) = a_1 x^{m-1} + a_2 x^{m-2} + ··· + a_{m-1} x + a_m through them. This is like the Lagrange interpolation problem we considered before, except that now we assume that n > m. This means that in general there will be no such polynomial. However, we can look for the least squares solution.

To begin, let's write down the equations that express the desired equalities p(x_i) = y_i for i = 1, ..., n. These can be written in matrix form


[ x_1^{m-1}  x_1^{m-2}  ···  x_1  1 ] [ a_1 ]   [ y_1 ]
[ x_2^{m-1}  x_2^{m-2}  ···  x_2  1 ] [ a_2 ]   [ y_2 ]
[    ⋮          ⋮       ···   ⋮   ⋮ ] [  ⋮  ] = [  ⋮  ]
[ x_n^{m-1}  x_n^{m-2}  ···  x_n  1 ] [ a_m ]   [ y_n ]

or Aa = y, where A is a submatrix of the Vandermonde matrix. To find the least squares approximation we solve A^T A a = A^T y. In a homework problem, you are asked to do this using MATLAB/Octave.

In the case where the polynomial has degree one, this is a straight line fit, and the equations we want to solve are

[ x_1  1 ]           [ y_1 ]
[ x_2  1 ] [ a_1 ]   [ y_2 ]
[  ⋮   ⋮ ] [ a_2 ] = [  ⋮  ]
[ x_n  1 ]           [ y_n ]

These equations will not have a solution (unless the points really do happen to lie on the same line). To find the least squares solution, we compute

[ x_1  x_2  ···  x_n ] [ x_1  1 ]   [ Σx_i^2   Σx_i ]
[  1    1   ···   1  ] [ x_2  1 ] = [  Σx_i     n   ]
                       [  ⋮   ⋮ ]
                       [ x_n  1 ]

and

[ x_1  x_2  ···  x_n ] [ y_1 ]   [ Σx_iy_i ]
[  1    1   ···   1  ] [ y_2 ] = [  Σy_i   ]
                       [  ⋮  ]
                       [ y_n ]

This results in the familiar equations

[ Σx_i^2   Σx_i ] [ a_1 ]   [ Σx_iy_i ]
[  Σx_i     n   ] [ a_2 ] = [  Σy_i   ]

which are easily solved.
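These normal equations are what MATLAB/Octave's built-in polyfit solves for degree 1, so we can check them against it on some made-up points:

>x = [0 1 2 3]'; y = [0.1 1.9 4.2 5.8]';
>a = [sum(x.^2) sum(x); sum(x) length(x)] \ [sum(x.*y); sum(y)];
>[a'; polyfit(x,y,1)]    % the two rows agree: [a_1 a_2]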

III.1.5 Football rankings

We can try to use least squares to rank football teams. To start with, suppose we have three teams. We pretend each team has a value v_1, v_2 and v_3 such that when two teams play, the


difference in scores is the difference in values. So, if the season's games had the following results

1 vs. 2    30  40
1 vs. 2    20  40
2 vs. 3    10   0
3 vs. 1     0   5
3 vs. 2     5   5

then the v_i's would satisfy the equations

v_2 - v_1 = 10
v_2 - v_1 = 20
v_2 - v_3 = 10
v_1 - v_3 = 5
v_2 - v_3 = 0

Of course, there is no solution to these equations. Nevertheless we can find the least squares solution. The matrix form of the equations is

Dv = b

with

D = [ -1  1  0        b = [ 10
      -1  1  0              20
       0  1 -1              10
      -1  0  1               5
       0  1 -1 ]             0 ]

The least squares equation is

D^T D v = D^T b

or

[  3 -2 -1 ]       [ -35 ]
[ -2  4 -2 ] v  =  [  40 ]
[ -1 -2  3 ]       [  -5 ]

Before going on, notice that D is an incidence matrix. What is the graph? (Answer: the nodes are the teams, and they are joined by an edge with the arrow pointing from the losing team to the winning team. This graph may have more than one edge joining two nodes, if two teams play more than once. This is sometimes called a multi-graph.) We saw that in this situation N(D) is not empty, but contains vectors whose entries are all the same. The situation is the same as for resistances: it is only differences in the v_i's that have a meaning.

We can solve this equation in MATLAB/Octave. The straightforward way is to compute

>L = [3 -2 -1;-2 4 -2;-1 -2 3];


>b = [-35; 40; -5];

>rref([L b])

ans =

1.00000 0.00000 -1.00000 -7.50000

0.00000 1.00000 -1.00000 6.25000

0.00000 0.00000 0.00000 0.00000

As expected, the solution is not unique. The general solution, depending on the parameter s, is

v = s [ 1 ]   [ -7.5  ]
      [ 1 ] + [  6.25 ]
      [ 1 ]   [  0    ]

We can choose s so that the v_i for one of the teams is zero. This is like grounding a node in a circuit. So, by choosing s = 7.5, s = -6.25 and s = 0 we obtain the solutions

[ 0     ]   [ -13.75 ]      [ -7.5  ]
[ 13.75 ] , [  0     ]  or  [  6.25 ]
[ 7.5   ]   [ -6.25  ]      [  0    ]

Actually, it is easier to compute a solution with one of the v_i's equal to zero directly. If

v = [ 0; v_2; v_3 ]

then the vector [v_2; v_3] satisfies the equation L_2 [v_2; v_3] = b_2, where the matrix L_2 is the bottom right 2 × 2 block of L and b_2 consists of the last two entries of b.

>L2 = L(2:3,2:3);

>b2 = b(2:3);

>L2\b2

ans =

13.7500

7.5000

We can try this on real data. The football scores for the 2007 CFL season can be found at http://www.cfl.ca/index.php?module=sked&func=view&year=2007. The differences in scores for the first 20 games are in cfl.m. The order of the teams is BC, Calgary, Edmonton, Hamilton, Montreal, Saskatchewan, Toronto, Winnipeg. Repeating the computation above for this data (running the file cfl.m) we find the ranking to be


v =

0.00000

-12.85980

-17.71983

-22.01884

-11.37097

-1.21812

0.87588

-20.36966

Not very impressive, if you consider that the second-lowest ranked team (Winnipeg) ended up in the Grey Cup game!


III.2 Orthonormal bases and Orthogonal Matrices

Prerequisites and Learning Goals

After completing this section, you should be able to

• write down the definition of an orthonormal basis.

• compute the coefficients in the expansion of a vector in an orthonormal basis

• compute the norm of a vector from its coefficients in an orthonormal basis

• write down the definition of an orthogonal matrix

• recognize an orthogonal matrix by its rows or columns

• know how to characterize an orthogonal matrix by its action on vectors

III.2.1 Orthonormal bases

A basis q_1, q_2, ... is called orthonormal if

1. ‖q_i‖ = 1 for every i (normal)

2. q_i · q_j = 0 for i ≠ j (ortho).

The standard basis for R^n given by

e_1 = [1; 0; 0; ...],  e_2 = [0; 1; 0; ...],  e_3 = [0; 0; 1; ...],  ···

is an orthonormal basis for R^n. Another orthonormal basis for R^2 is

q_1 = (1/√2)[1; 1],   q_2 = (1/√2)[-1; 1]

If you expand a vector in an orthonormal basis, it's very easy to find the coefficients in the expansion. Suppose

v = c_1 q_1 + c_2 q_2 + ··· + c_n q_n

for some orthonormal basis q_1, q_2, .... Then, if we take the dot product of both sides with q_k, we get

q_k · v = c_1 q_k·q_1 + c_2 q_k·q_2 + ··· + c_k q_k·q_k + ··· + c_n q_k·q_n
        = 0 + 0 + ··· + c_k + ··· + 0
        = c_k

This gives a convenient formula for each c_k. For example, in the expansion

[1; 2] = c_1 (1/√2)[1; 1] + c_2 (1/√2)[-1; 1]

we have

c_1 = (1/√2)[1; 1]·[1; 2] = 3/√2

c_2 = (1/√2)[-1; 1]·[1; 2] = 1/√2

Notice also that the norm of v is easily expressed in terms of the coefficients c_i. We have

‖v‖^2 = v · v = (c_1 q_1 + ··· + c_n q_n) · (c_1 q_1 + ··· + c_n q_n) = c_1^2 + c_2^2 + ··· + c_n^2

Another way of saying this is that the vector c = [c_1, c_2, ..., c_n] of coefficients has the same norm as v.

III.2.2 Orthogonal matrices

An n × n matrix Q is called orthogonal if Q^T Q = I (equivalently, if Q^T = Q^{-1}). If the columns of Q are q_1, q_2, ..., q_n, then Q is orthogonal if

Q^T Q = [ q_1^T ]
        [ q_2^T ] [ q_1 | q_2 | ··· | q_n ]
        [   ⋮   ]
        [ q_n^T ]

      = [ q_1·q_1  q_1·q_2  ···  q_1·q_n ]     [ 1  0  ···  0 ]
        [ q_2·q_1  q_2·q_2  ···  q_2·q_n ]  =  [ 0  1  ···  0 ]
        [    ⋮        ⋮             ⋮    ]     [ ⋮  ⋮       ⋮ ]
        [ q_n·q_1  q_n·q_2  ···  q_n·q_n ]     [ 0  0  ···  1 ]

This is the same as saying that the columns of Q form an orthonormal basis.

Another way of recognizing orthogonal matrices is by their action on vectors. Suppose Q is orthogonal. Then

‖Qv‖^2 = (Qv)·(Qv) = v·(Q^T Q v) = v·v = ‖v‖^2


This implies that ‖Qv‖ = ‖v‖. In other words, orthogonal matrices don't change the lengths of vectors.

The converse is also true. If a matrix Q doesn't change the lengths of vectors, then it must be orthogonal. To see this, suppose that ‖Qv‖ = ‖v‖ for every v. Then the calculation above shows that v·(Q^T Q v) = v·v for every v. Applying this to v + w we find

(v + w)·(Q^T Q (v + w)) = (v + w)·(v + w)

Expanding, this gives

v·(Q^T Q v) + w·(Q^T Q w) + v·(Q^T Q w) + w·(Q^T Q v) = v·v + w·w + v·w + w·v

Since v·(Q^T Q v) = v·v and w·(Q^T Q w) = w·w, we can cancel these terms. Also w·(Q^T Q v) = ((Q^T Q)^T w)·v = (Q^T Q w)·v = v·(Q^T Q w). So on each side of the equation, the two remaining terms are the same. Thus

v·(Q^T Q w) = v·w

This equation holds for every choice of vectors v and w. If v = e_i and w = e_j, then the left side is the i, j matrix element of Q^T Q, while the right side is e_i·e_j, which is the i, j matrix element of the identity matrix. Thus Q^T Q = I and Q is orthogonal.

We can recast the problem of finding the coefficients c_1, c_2, ..., c_n in the expansion v = c_1 q_1 + c_2 q_2 + ··· + c_n q_n in an orthonormal basis as the solution of the matrix equation Qc = v, where Q is the orthogonal matrix whose columns contain the orthonormal basis vectors. The solution is obtained by multiplying by Q^T: since Q^T Q = I, multiplying both sides by Q^T gives c = Q^T v. The fact that ‖c‖ = ‖v‖ follows from the length preserving property of orthogonal matrices.

Recall that for square matrices a left inverse is automatically also a right inverse. So if Q^T Q = I then Q Q^T = I too. This means that Q^T is an orthogonal matrix whenever Q is. This proves the (non-obvious) fact that if the columns of a square matrix form an orthonormal basis, then so do the rows!

A set G of invertible matrices is called a (matrix) group if

1. I ∈ G (G contains the identity matrix)

2. If A,B ∈ G then AB ∈ G (G is closed under matrix multiplication)

3. If A ∈ G then A is invertible and A−1 ∈ G (G is closed under taking the inverse)

In a homework problem, you are asked to verify that the set of n × n orthogonal matrices is a group.


III.3 Complex vector spaces and inner product

Prerequisites and Learning Goals

From your work in previous courses, you should be able to

• perform arithmetic with complex numbers.

• write down the definition of complex conjugate, modulus and argument of a complex number

• write down the definition of the complex exponential, the addition formula, and differentiation and integration of complex exponentials.

After completing this section, you should be able to

• write down the definition of the complex inner product and the norm of a complex vector

• write down the definition of the matrix adjoint and its relation to the complex inner product.

• write down the definition of a unitary matrix and its properties.

• use complex numbers in MATLAB/Octave computations, specifically real(z), imag(z), conj(z), abs(z), exp(z) and A' for complex matrices.

III.3.1 Review of complex numbers

Complex numbers can be thought of as points on the (x, y) plane. The point [x; y], thought of as a complex number, is written x + iy (or x + jy if you are an electrical engineer).

If z = x + iy then x is called the real part of z and y is called the imaginary part of z.

Complex numbers are added just as if they were vectors in two dimensions. If z = x + iy and w = s + it, then

z + w = (x + iy) + (s + it) = (x + s) + i(y + t)

To multiply two complex numbers, just remember that i^2 = -1. So if z = x + iy and w = s + it, then

zw = (x + iy)(s + it) = xs + i^2 yt + iys + ixt = (xs - yt) + i(xt + ys)


The modulus of a complex number, denoted |z|, is simply the length of the corresponding vector in two dimensions. If z = x + iy,

|z| = |x + iy| = √(x^2 + y^2)

An important property is

|zw| = |z||w|

The complex conjugate of a complex number z, denoted z̄, is the reflection of z across the x axis. Thus the conjugate of x + iy is x - iy: the complex conjugate is obtained by changing all the i's to -i's. We have

(zw)‾ = z̄ w̄

and

z z̄ = |z|^2

This last equality is useful for simplifying fractions of complex numbers by turning the denominator into a real number, since

z/w = z w̄ / |w|^2

For example, to simplify (1 + i)/(1 - i) we can write

(1 + i)/(1 - i) = (1 + i)^2/((1 - i)(1 + i)) = (1 - 1 + 2i)/2 = i

A complex number z is real (i.e. the y part in x + iy is zero) whenever z̄ = z. We also have the following formulas for the real and imaginary part: if z = x + iy, then x = (z + z̄)/2 and y = (z - z̄)/(2i).

We define the exponential, e^{it}, of a purely imaginary number it to be the number

e^{it} = cos(t) + i sin(t)

lying on the unit circle in the complex plane.

The complex exponential satisfies the familiar rule e^{i(s+t)} = e^{is} e^{it}, since by the addition formulas for sine and cosine

e^{i(s+t)} = cos(s + t) + i sin(s + t)
           = cos(s)cos(t) - sin(s)sin(t) + i(sin(s)cos(t) + cos(s)sin(t))
           = (cos(s) + i sin(s))(cos(t) + i sin(t))
           = e^{is} e^{it}

The exponential of a number that has both a real and imaginary part is defined in the natural way:

e^{a+ib} = e^a e^{ib} = e^a (cos(b) + i sin(b))


The derivative of a complex exponential is given by the formula

(d/dt) e^{(a+ib)t} = (a + ib) e^{(a+ib)t}

while the anti-derivative, for (a + ib) ≠ 0, is

∫ e^{(a+ib)t} dt = (1/(a + ib)) e^{(a+ib)t} + C

If (a + ib) = 0 then e^{(a+ib)t} = e^0 = 1, so in this case

∫ e^{(a+ib)t} dt = ∫ dt = t + C

III.3.2 Complex vector spaces and inner product

So far in this course, our scalars have been real numbers. We now want to allow complex numbers. The basic example of a complex vector space is the space C^n of n-tuples of complex numbers. Vector addition and scalar multiplication are defined as before:

[ z_1 ]   [ w_1 ]   [ z_1 + w_1 ]         [ z_1 ]   [ s z_1 ]
[ z_2 ] + [ w_2 ] = [ z_2 + w_2 ]  ,    s [ z_2 ] = [ s z_2 ]  ,
[  ⋮  ]   [  ⋮  ]   [     ⋮     ]         [  ⋮  ]   [   ⋮   ]
[ z_n ]   [ w_n ]   [ z_n + w_n ]         [ z_n ]   [ s z_n ]

where now z_i, w_i and s are complex numbers.

For complex matrices (or vectors) we define the complex conjugate matrix (or vector) by conjugating each entry. Thus, if A = [a_{i,j}], then

Ā = [ā_{i,j}].

The product rule for complex conjugation extends to matrices, and we have

(AB)‾ = Ā B̄

The inner product of two complex vectors w = [w_1; w_2; ...; w_n] and z = [z_1; z_2; ...; z_n] is defined by

〈w, z〉 = w̄^T z = Σ_{i=1}^n w̄_i z_i


With this definition the norm of z is always positive, since

〈z, z〉 = ‖z‖^2 = Σ_{i=1}^n |z_i|^2

For complex matrices and vectors we have to modify the rule for bringing a matrix to the other side of an inner product:

〈w, Az〉 = w̄^T A z = (Ā^T w)‾^T z = 〈Ā^T w, z〉

This leads to the definition of the adjoint of a matrix,

A* = Ā^T.

(In physics you will also see the notation A†.) With this notation 〈w, Az〉 = 〈A∗w, z〉.

The complex analogue of an orthogonal matrix is called a unitary matrix. A unitary matrix U is a square matrix satisfying

U*U = UU* = I.

Notice that a unitary matrix with real entries is an orthogonal matrix, since in that case U* = U^T. The columns of a unitary matrix form an orthonormal basis (with respect to the complex inner product).

MATLAB/Octave deals seamlessly with complex matrices and vectors. Complex numbers can be entered like this

>z= 1 + 2i

z = 1 + 2i

There is a slight danger here in that if i has been defined to be something else (e.g. i = 16), then z=i would set z to be 16. You could use z=1i to get the desired result, or use the alternative syntax

>z= complex(0,1)

z = 0 + 1i


The functions real(z), imag(z), conj(z), abs(z) compute the real part, imaginary part, conjugate and modulus of z.

The function exp(z) computes the complex exponential if z is complex.

If a matrix A has complex entries then A' is not the transpose, but the adjoint (conjugate transpose).

>z = [1; 1i]

z =

1 + 0i

0 + 1i

>z'

ans =

1 - 0i 0 - 1i

Thus the square of the norm of a complex vector is given by

>z’*z

ans = 2

This gives the same answer as

>norm(z)^2

ans = 2.0000

(Warning: the function dot in Octave does not compute the correct inner product for complex vectors (it doesn't take the complex conjugate). This is probably a bug. In MATLAB dot works correctly for complex vectors.)


III.3.3 Vector spaces of complex-valued functions

Let [a, b] be an interval on the real line. Recall that we introduced the vector space of real valued functions defined for x ∈ [a, b]. The vector sum f + g of two functions f and g was defined to be the function you get by adding the values, that is, (f + g)(x) = f(x) + g(x), and the scalar multiple sf was defined similarly by (sf)(x) = sf(x).

In exactly the same way, we can introduce a vector space of complex valued functions. The independent variable x is still real, taking values in [a, b]. But now the values f(x) of the functions may be complex. Examples of complex valued functions are f(x) = x + ix^2 or f(x) = e^{ix} = cos(x) + i sin(x).

Now we introduce the inner product of two complex valued functions on [a, b]. In analogy with the inner product for complex vectors we define

〈f, g〉 = ∫_a^b f̄(x) g(x) dx

and the associated norm defined by

‖f‖^2 = 〈f, f〉 = ∫_a^b |f(x)|^2 dx

For real valued functions we can ignore the complex conjugate.

Example: the inner product of f(x) = 1 + ix and g(x) = x^2 over the interval [0, 1] is

〈1 + ix, x^2〉 = ∫_0^1 (1 + ix)‾ x^2 dx = ∫_0^1 (1 - ix) x^2 dx = ∫_0^1 (x^2 - ix^3) dx = 1/3 - i/4

It will often happen that a function, like f(x) = x, is defined for all real values of x. In this case we can consider inner products and norms for any interval [a, b], including semi-infinite and infinite intervals, where a may be -∞ or b may be +∞. Of course the values of the inner product and norm depend on the choice of interval.

There are technical complications when dealing with spaces of functions. In this course we will deal with aspects of the subject where these complications don't play an important role. However, it is good to be aware that they exist, so we will mention a few.

One complication is that the integral defining the inner product may not exist. For example, for the interval (-∞, ∞) = R the norm of f(x) = x is infinite, since

∫_{-∞}^∞ |x|^2 dx = ∞

Even if the interval is finite, like [0, 1], the function might have a spike. For example, if f(x) = 1/x then

∫_0^1 1/|x|^2 dx = ∞


too. To overcome this complication we agree to restrict our attention to square integrable functions. For any interval [a, b], these are the functions f(x) for which |f(x)|^2 is integrable. They form a vector space that is usually denoted L^2([a, b]). It is an example of a Hilbert space and is important in Quantum Mechanics. The L in this notation indicates that the integrals should be defined as Lebesgue integrals, rather than as the Riemann integrals usually taught in elementary calculus courses. This plays a role when discussing convergence theorems. But for any functions that come up in this course, the Lebesgue integral and the Riemann integral will be the same.

The question of convergence is another complication that arises in infinite dimensional vector spaces of functions. When discussing infinite orthonormal bases, infinite linear combinations of vectors (functions) will appear. There are several possible meanings for an equation like

Σ_{i=0}^∞ c_i φ_i(x) = φ(x),

since we are talking about convergence of an infinite series of functions. The most obvious interpretation is that for every fixed value of x the infinite sum of numbers on the left hand side equals the number on the right.

Here is another interpretation: the difference of φ and the partial sums Σ_{i=0}^N c_i φ_i tends to zero when measured in the L^2 norm, that is,

lim_{N→∞} ‖Σ_{i=0}^N c_i φ_i - φ‖ = 0

With this definition, it might happen that there are individual values of x where the first equation doesn't hold. This is the meaning that we will give to the equation.


III.4 Fourier series

Prerequisites and Learning Goals

After completing this section, you should be able to

• compute the Fourier series (in real and complex form) of a function defined on the interval [0, L]

• interpret each of these series as the expansion of a function (vector) in an infinite orthogonal basis.

• use Parseval's formula to sum certain infinite series

• use MATLAB/Octave to plot the partial sums of Fourier series.

• explain what an amplitude-frequency plot is and be able to compute it in examples.

III.4.1 An infinite orthonormal basis for L2([a, b])

Let [a, b] be an interval of length L = b - a. For every integer n, define the function

e_n(x) = e^{2πinx/L}.

Then the infinite collection of functions

{..., e_{-2}, e_{-1}, e_0, e_1, e_2, ...}

forms an orthonormal basis for the space L^2([a, b]), except that each function e_n has norm √L instead of 1. (Since this is the usual normalization, we will stick with it. To get a true orthonormal basis, we must divide each function by √L.)

Let's verify that these functions form an orthonormal set (scaled by √L). To compute the norm we calculate

‖e_n‖^2 = 〈e_n, e_n〉 = ∫_a^b (e^{2πinx/L})‾ e^{2πinx/L} dx
        = ∫_a^b e^{-2πinx/L} e^{2πinx/L} dx
        = ∫_a^b 1 dx
        = L


This shows that ‖e_n‖ = √L for every n. Next we check that if n ≠ m then e_n and e_m are orthogonal:

〈e_n, e_m〉 = ∫_a^b e^{-2πinx/L} e^{2πimx/L} dx
           = ∫_a^b e^{2πi(m-n)x/L} dx
           = L/(2πi(m-n)) · e^{2πi(m-n)x/L} |_{x=a}^{b}
           = L/(2πi(m-n)) · (e^{2πi(m-n)b/L} - e^{2πi(m-n)a/L})
           = 0

Here we used that e^{2πi(m-n)b/L} = e^{2πi(m-n)(b-a+a)/L} = e^{2πi(m-n)} e^{2πi(m-n)a/L} = e^{2πi(m-n)a/L}. This shows that the functions {..., e_{-2}, e_{-1}, e_0, e_1, e_2, ...} form an orthonormal set (scaled by √L).

To show these functions form a basis, we have to verify that they span the space L^2([a, b]). In other words, we must show that any function f ∈ L^2([a, b]) can be written as an infinite linear combination

f(x) = Σ_{n=-∞}^∞ c_n e_n(x) = Σ_{n=-∞}^∞ c_n e^{2πinx/L}.

This is a bit tricky, since it involves infinite series of functions. For a finite dimensional space, to show that an orthogonal set forms a basis, it suffices to count that there are the same number of elements in the orthogonal set as there are dimensions in the space. For an infinite dimensional space this is no longer true. For example, the set of e_n's with n even is also an infinite orthonormal set, but it doesn't span all of L^2([a, b]).

In this course, we will simply accept that it is true that {..., e_{-2}, e_{-1}, e_0, e_1, e_2, ...} span L^2([a, b]). Once we accept this fact, it is very easy to compute the coefficients in a Fourier expansion. The procedure is the same as in finite dimensions. Starting with

f(x) = Σ_{n=-∞}^∞ c_n e_n(x)

we simply take the inner product of both sides with e_m. The only term in the infinite sum that survives is the one with n = m. Thus

〈e_m, f〉 = Σ_{n=-∞}^∞ c_n 〈e_m, e_n〉 = c_m L

and we obtain the formula

c_m = (1/L) ∫_a^b e^{-2πimx/L} f(x) dx


III.4.2 Real form of the Fourier series

Fourier series are often written in terms of sines and cosines as

f(x) = a_0/2 + Σ_{n=1}^∞ (a_n cos(2πnx/L) + b_n sin(2πnx/L))

To obtain this form, recall that

e^{±2πinx/L} = cos(2πnx/L) ± i sin(2πnx/L)

Using this formula we find

Σ_{n=-∞}^∞ c_n e^{2πinx/L}
  = c_0 + Σ_{n=1}^∞ c_n e^{2πinx/L} + Σ_{n=1}^∞ c_{-n} e^{-2πinx/L}
  = c_0 + Σ_{n=1}^∞ c_n (cos(2πnx/L) + i sin(2πnx/L)) + Σ_{n=1}^∞ c_{-n} (cos(2πnx/L) - i sin(2πnx/L))
  = c_0 + Σ_{n=1}^∞ ((c_n + c_{-n}) cos(2πnx/L) + i(c_n - c_{-n}) sin(2πnx/L))

Thus the real form of the Fourier series holds with

a_0 = 2c_0
a_n = c_n + c_{-n}       for n > 0
b_n = i c_n - i c_{-n}   for n > 0.

Equivalently,

c_0 = a_0/2
c_n = a_n/2 + b_n/(2i)         for n > 0
c_n = a_{-n}/2 - b_{-n}/(2i)   for n < 0.

The coefficients a_n and b_n in the real form of the Fourier series can also be obtained directly. The set of functions

{1/2, cos(2πx/L), cos(4πx/L), cos(6πx/L), ..., sin(2πx/L), sin(4πx/L), sin(6πx/L), ...}

also forms an orthogonal basis where each vector has norm √(L/2). This leads to the formulas

a_n = (2/L) ∫_a^b cos(2πnx/L) f(x) dx


for n = 0, 1, 2, ... and

b_n = (2/L) ∫_a^b sin(2πnx/L) f(x) dx

for n = 1, 2, .... The desire to have the formula for a_n work out for n = 0 is the reason for dividing by 2 in the constant term a_0/2 in the real form of the Fourier series.

One advantage of the real form of the Fourier series is that if f(x) is a real valued function, then the coefficients a_n and b_n are real too, and the Fourier series doesn't involve any complex numbers. However, it is often easier to calculate the coefficients c_n, because exponentials are easier to integrate than sines and cosines.

III.4.3 An example

Let's compute the Fourier coefficients for the square wave function. In this example L = 1.

f(x) = {  1   if 0 ≤ x ≤ 1/2
       { -1   if 1/2 < x ≤ 1

If n = 0 then e^{-i2πnx} = e^0 = 1, so c_0 is simply the integral of f:

c_0 = ∫_0^1 f(x) dx = ∫_0^{1/2} 1 dx - ∫_{1/2}^1 1 dx = 0

Otherwise, we have

c_n = ∫_0^1 e^{-i2πnx} f(x) dx
    = ∫_0^{1/2} e^{-i2πnx} dx - ∫_{1/2}^1 e^{-i2πnx} dx
    = e^{-i2πnx}/(-i2πn) |_{x=0}^{x=1/2} - e^{-i2πnx}/(-i2πn) |_{x=1/2}^{x=1}
    = (2 - 2e^{iπn})/(2πin)
    = { 0        if n is even
      { 2/(iπn)  if n is odd

Thus we conclude that

f(x) = Σ_{n odd} (2/(iπn)) e^{i2πnx}

where the sum runs over all odd integers n, positive and negative. To see how well this series approximates f(x), we go back to the real form of the series. Using a_n = c_n + c_{-n} and b_n = i c_n - i c_{-n} we find that a_n = 0 for all n, b_n = 0 for n even and


b_n = 4/(πn) for n odd. Thus

f(x) = Σ_{n≥1, n odd} (4/(πn)) sin(2πnx) = Σ_{n=0}^∞ (4/(π(2n+1))) sin(2π(2n+1)x)

We can use MATLAB/Octave to see how well this series is converging. The file ftdemo1.m contains a function that takes an integer N as an argument and plots the sum of the first 2N + 1 terms in the Fourier series above. Here is a listing:

function ftdemo1(N)

X=linspace(0,1,1000);

F=zeros(1,1000);

for n=[0:N]

F = F + 4*sin(2*pi*(2*n+1)*X)/(pi*(2*n+1));

end

plot(X,F)

end

Here are the outputs for N = 0, 1, 2, 10, 50:

[Plots of the partial sums for N = 0, 1, 2 and 10 on the interval [0, 1].]


[Plot of the partial sum for N = 50.]

III.4.4 Parseval’s formula

If v_1, v_2, ..., v_n is an orthonormal basis in a finite dimensional vector space and the vector v has the expansion

v = c_1 v_1 + ··· + c_n v_n = Σ_{i=1}^n c_i v_i

then, taking the inner product of v with itself, and using the fact that the basis is orthonormal, we obtain

〈v, v〉 = Σ_{i=1}^n Σ_{j=1}^n c̄_i c_j 〈v_i, v_j〉 = Σ_{i=1}^n |c_i|^2

The same formula is true in Hilbert space. If

f(x) = Σ_{n=-∞}^∞ c_n e_n(x)

with L = 1 (as in the example above, so that the basis functions have norm 1), then

∫_0^1 |f(x)|^2 dx = 〈f, f〉 = Σ_{n=-∞}^∞ |c_n|^2

In the example above, we have 〈f, f〉 = ∫_0^1 1 dx = 1, so we obtain

1 = Σ_{n odd} 4/(π^2 n^2) = 2 Σ_{n>0, n odd} 4/(π^2 n^2) = (8/π^2) Σ_{n=0}^∞ 1/(2n+1)^2

or

Σ_{n=0}^∞ 1/(2n+1)^2 = π^2/8

III.4.5 Interpretation of Fourier series

What is the meaning of a Fourier series in a practical example? Consider the sound made by a musical instrument in a time interval [0, T]. This sound can be represented by a function


y(t) for t ∈ [0, T], where y(t) is the air pressure at a point in space, for example, at your eardrum.

A complex exponential e^{2πiωt} = cos(2πωt) + i sin(2πωt) can be thought of as a pure oscillation with frequency ω. It is a periodic function whose values are repeated when t increases by ω^{-1}. If t has units of time (seconds) then ω has units of Hertz (cycles per second). In other words, in one second the function e^{2πiωt} cycles through its values ω times.

The Fourier basis functions can be written as e^{2πiω_n t} with ω_n = n/T. Thus Fourier's theorem states that for t ∈ [0, T]

y(t) = Σ_{n=-∞}^∞ c_n e^{2πiω_n t}.

In other words, the audio signal y(t) can be synthesized as a superposition of pure oscillations with frequencies ω_n = n/T. The coefficients c_n describe how much of the frequency ω_n is present in the signal. More precisely, writing the complex number c_n as c_n = |c_n| e^{2πiτ_n}, we have c_n e^{2πiω_n t} = |c_n| e^{2πi(ω_n t + τ_n)}. Thus |c_n| represents the amplitude of the oscillation with frequency ω_n while τ_n represents a phase shift.

A frequency-amplitude plot for y(t) is a plot of the points (ω_n, |c_n|). It should be thought of as a graph of the amplitude as a function of frequency, and gives a visual representation of how much of each frequency is present in the signal.

If y(t) is defined for all values of t, we can use any interval that we want and expand the restriction of y(t) to this interval. Notice that the frequencies ω_n = n/T in the expansion will be different for different values of T.

Example: Let's illustrate this with the function y(t) = e^{2πit} and intervals [0, T]. This function is itself a pure oscillation with frequency ω = 1. So at first glance one would expect that there will be only one term in the Fourier expansion. This will turn out to be correct if the number 1 is one of the available frequencies, that is, if there is some value of n for which ω_n = n/T = 1. (This happens if T is an integer.) Otherwise, it is still possible to reconstruct y(t), but more frequencies will be required. In this case we would expect that |c_n| should be large for ω_n close to 1. Let's do the calculation. Fix T. Let's first consider the case when T is an integer. Then

c_n = (1/T) ∫_0^T e^{-2πint/T} e^{2πit} dt
    = (1/T) ∫_0^T e^{2πi(1 - n/T)t} dt
    = { 1                                                n = T
      { (1/(2Tπi(1 - n/T))) (e^{2πi(T-n)} - e^0) = 0     n ≠ T,


as expected. Now let's look at what happens when T is not an integer. Then

c_n = (1/T) ∫_0^T e^{-2πint/T} e^{2πit} dt = (1/(2πi(T - n))) (e^{2πi(T-n)} - 1)

A calculation (that we leave as an exercise) results in

|c_n| = √(2 - 2cos(2πT(1 - ω_n))) / (2πT |1 - ω_n|)

We can use MATLAB/Octave to do an amplitude-frequency plot. Here are the commands for T = 10.5 and T = 100.5.

N=[-200:200];

T=10.5;

omega=N/T;

absc=sqrt(2-2*cos(2*pi*T*(1-omega)))./(2*pi*T*abs(1-omega));

plot(omega,absc)

T=100.5;

omega=N/T;

absc=sqrt(2-2*cos(2*pi*T*(1-omega)))./(2*pi*T*abs(1-omega));

hold on;

plot(omega,absc, 'r')

Here is the result

[Amplitude-frequency plot for T = 10.5 and (in red) T = 100.5, for ω between -20 and 20.]

As expected, the values of |c_n| are largest when ω_n is close to 1.


III.5 The Discrete Fourier Transform

Prerequisites and Learning Goals

After completing this section, you should be able to

• write down the definition of the discrete Fourier transform, and compute the matrix that implements it.

• explain why the Fast Fourier transform algorithm is a faster method.

• compute the discrete Fourier transform of a vector using the fft algorithm.

• compute the fft and its inverse using MATLAB/Octave.

• compute a frequency-amplitude plot for a sampled signal using MATLAB/Octave, and interpret the result.

III.5.1 Definition

In the previous section we saw that the functions e_k(x) = e^{2πikx} for k ∈ Z form an infinite orthonormal basis for the Hilbert space of functions L^2([0, 1]). Now we will introduce a discrete, finite dimensional version of this basis.

To motivate the definition of this basis, imagine taking a function defined on the interval [0, 1] and sampling it at the N points 0, 1/N, 2/N, ..., j/N, ..., (N-1)/N. If we do this to the basis functions e_k(x) we end up with vectors e_k given by

e_k = [ e^{2πi·0·k/N}    ]   [ 1             ]
      [ e^{2πik/N}       ]   [ ω_N^k         ]
      [ e^{2πi2k/N}      ] = [ ω_N^{2k}      ]
      [       ⋮          ]   [    ⋮          ]
      [ e^{2πi(N-1)k/N}  ]   [ ω_N^{(N-1)k}  ]

where

ω_N = e^{2πi/N}

The complex number ω_N lies on the unit circle, that is, |ω_N| = 1. Moreover, ω_N is a primitive Nth root of unity. This means that ω_N^N = 1 and ω_N^j ≠ 1 unless j is a multiple of N.

Because ω_N^{k+N} = ω_N^k ω_N^N = ω_N^k, we see that e_{k+N} = e_k. Thus, although the vectors e_k are defined for every integer k, they start repeating themselves after N steps. Thus there are only N distinct vectors, e_0, e_1, ..., e_{N-1}.


These vectors, e_k for k = 0, ..., N-1, form an orthogonal basis for C^N. To see this we use the formula for the sum of a geometric series:

Σ_{j=0}^{N-1} r^j = { N                   r = 1
                    { (1 - r^N)/(1 - r)   r ≠ 1

Using this formula, we compute

〈e_k, e_l〉 = Σ_{j=0}^{N-1} (ω_N^{kj})‾ ω_N^{lj} = Σ_{j=0}^{N-1} ω_N^{(l-k)j}

which equals N when l = k, and equals (1 - ω_N^{(l-k)N})/(1 - ω_N^{l-k}) = 0 when l ≠ k.

Now we can expand any vector f ∈ C^N in this basis. Actually, to make our discrete Fourier transform agree with MATLAB/Octave, we divide each basis vector by N. Then we obtain

f = (1/N) Σ_{j=0}^{N-1} c_j e_j

where

c_k = 〈e_k, f〉 = Σ_{j=0}^{N-1} e^{-2πikj/N} f_j

The map that sends the vector f to the vector of coefficients c = [c_0, ..., c_{N-1}]^T is the discrete Fourier transform. We can write this in matrix form as

c = F f,    f = F^{-1} c

where the matrix F^{-1} has the vectors e_k (each divided by N) as its columns. Since these vectors form an orthogonal basis, the inverse is the conjugate transpose, up to a factor of N. Explicitly,

F = [ 1   1            1              ···   1
      1   ω̄_N          ω̄_N^2          ···   ω̄_N^{N-1}
      1   ω̄_N^2        ω̄_N^4          ···   ω̄_N^{2(N-1)}
      ⋮   ⋮            ⋮                    ⋮
      1   ω̄_N^{N-1}    ω̄_N^{2(N-1)}   ···   ω̄_N^{(N-1)(N-1)} ]

and

F^{-1} = (1/N) [ 1   1           1             ···   1
                 1   ω_N         ω_N^2         ···   ω_N^{N-1}
                 1   ω_N^2       ω_N^4         ···   ω_N^{2(N-1)}
                 ⋮   ⋮           ⋮                   ⋮
                 1   ω_N^{N-1}   ω_N^{2(N-1)}  ···   ω_N^{(N-1)(N-1)} ]


The matrix F̃ = N^{-1/2} F is a unitary matrix (F̃^{-1} = F̃*). Recall that unitary matrices preserve the length of complex vectors. This implies that the lengths of the vectors f = [f_0, f_1, ..., f_{N-1}] and c = [c_0, c_1, ..., c_{N-1}] are related by

‖c‖^2 = N ‖f‖^2

or

Σ_{k=0}^{N-1} |c_k|^2 = N Σ_{k=0}^{N-1} |f_k|^2

This is the discrete version of Parseval’s formula.
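We can check the discrete Parseval formula on a random vector:

>f = rand(8,1);
>c = fft(f);
>[norm(c)^2, 8*norm(f)^2]    % the two entries agree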

III.5.2 The Fast Fourier transform

Multiplying an N × N matrix with a vector of length N normally requires N^2 multiplications, since each entry of the product requires N, and there are N entries. It turns out that the discrete Fourier transform, that is, multiplication by the matrix F, can be carried out using only N log_2(N) multiplications (at least if N is a power of 2). The algorithm that achieves this is called the Fast Fourier Transform, or FFT. This represents a tremendous saving in time: calculations that would require weeks of computer time can be carried out in seconds.

The basic idea of the FFT is to break the sum defining the Fourier coefficients c_k into a sum of the even terms and a sum of the odd terms. Each of these turns out to be (up to a factor we can compute) a discrete Fourier transform of half the length. This idea is then applied recursively. Starting with N = 2^n and halving the size of the Fourier transform at each step, it takes n = log_2(N) steps to arrive at Fourier transforms of length 1. This is where the log_2(N) comes in.

To simplify the notation, we will ignore the factor of 1/N in the definition of the discrete Fourier transform (so one should divide by N at the end of the calculation). We now also assume that

N = 2^n

so that we can divide N by 2 repeatedly. The basic formula, splitting the sum for c_k into a


sum over even and odd j's, is

c_k = Σ_{j=0}^{N-1} e^{-i2πkj/N} f_j
    = Σ_{j even} e^{-i2πkj/N} f_j + Σ_{j odd} e^{-i2πkj/N} f_j
    = Σ_{j=0}^{N/2-1} e^{-i2πk(2j)/N} f_{2j} + Σ_{j=0}^{N/2-1} e^{-i2πk(2j+1)/N} f_{2j+1}
    = Σ_{j=0}^{N/2-1} e^{-i2πkj/(N/2)} f_{2j} + e^{-i2πk/N} Σ_{j=0}^{N/2-1} e^{-i2πkj/(N/2)} f_{2j+1}

Notice that the two sums on the right are discrete Fourier transforms of length N/2.

To continue, it is useful to write the integers j in base 2. Let's assume that N = 2^3 = 8. Once you understand this case, the general case N = 2^n will be easy. Recall that

0 = 000 (base 2)
1 = 001 (base 2)
2 = 010 (base 2)
3 = 011 (base 2)
4 = 100 (base 2)
5 = 101 (base 2)
6 = 110 (base 2)
7 = 111 (base 2)

The even j's are the ones whose binary expansions have the form ∗∗0, while the odd j's have binary expansions of the form ∗∗1.

For any pattern of bits like ∗∗0, I will use the notation F^{<pattern>} to denote the discrete Fourier transform where the input data is given by all the f_j's whose j's have binary expansions fitting the pattern. Here are some examples. To start, F^{∗∗∗}_k = c_k is the original discrete Fourier transform, since every j fits the pattern ∗∗∗. In this example k ranges over 0, ..., 7; after that the values start repeating.

Only even j's fit the pattern ∗∗0, so F^{∗∗0} is the discrete Fourier transform of the even j's, given by

F^{∗∗0}_k = Σ_{j=0}^{N/2-1} e^{-i2πkj/(N/2)} f_{2j}.


Here k runs from 0 to 3 before the values start repeating. Similarly, F^{∗00} is a transform of length N/4 = 2 given by

F^{∗00}_k = Σ_{j=0}^{N/4-1} e^{-i2πkj/(N/4)} f_{4j}.

In this case k = 0, 1 and then the values repeat. Finally, the only j matching the pattern 010 is j = 2, so F^{010}_k is a transform of length one given by

F^{010}_k = Σ_{j=0}^{N/8-1} e^{-i2πkj/(N/8)} f_2 = Σ_{j=0}^{0} e^0 f_2 = f_2

With this notation, the basic even-odd formula can be written

F^{∗∗∗}_k = F^{∗∗0}_k + ω̄_N^k F^{∗∗1}_k.

Recall that ω_N = e^{i2π/N}, so ω̄_N = e^{-i2π/N}.

Let's look at this equation when k = 0. We will represent the formula by the following diagram.

[Diagram: F^{∗∗0}_0 and F^{∗∗1}_0 feed into F^{∗∗∗}_0, with the second input multiplied by ω̄_N^0.]


This diagram means that F^{∗∗∗}_0 is obtained by adding F^{∗∗0}_0 to ω̄_N^0 F^{∗∗1}_0. (Of course ω̄_N^0 = 1, so we could omit it.) Now let's add the diagrams for k = 1, 2, 3.

[Diagram: the analogous butterflies for k = 0, 1, 2, 3, computing F^{∗∗∗}_k = F^{∗∗0}_k + ω̄_N^k F^{∗∗1}_k.]


Now when we get to k = 4, we recall that F^{∗∗0} and F^{∗∗1} are discrete transforms of length N/2 = 4. Therefore, by periodicity, F^{∗∗0}_4 = F^{∗∗0}_0, F^{∗∗0}_5 = F^{∗∗0}_1, and so on. So in the formula F^{∗∗∗}_4 = F^{∗∗0}_4 + ω̄_N^4 F^{∗∗1}_4 we may replace F^{∗∗0}_4 and F^{∗∗1}_4 with F^{∗∗0}_0 and F^{∗∗1}_0 respectively.

Making such replacements, we complete the first part of the diagram as follows.

[Diagram: the first stage of the FFT for N = 8. The length-4 transforms F^{∗∗0}_k and F^{∗∗1}_k, k = 0, ..., 3, combine to give F^{∗∗∗}_k for k = 0, ..., 7, using the multipliers ω̄_N^0, ..., ω̄_N^7.]


To move to the next level we analyze the discrete Fourier transforms on the left of this diagram in the same way. This time we use the basic formula for the transform of length N/2, namely

F^{∗∗0}_k = F^{∗00}_k + ω̄_{N/2}^k F^{∗10}_k

and

F^{∗∗1}_k = F^{∗01}_k + ω̄_{N/2}^k F^{∗11}_k.

The resulting diagram shows how to go from the length-two transforms to the final transform on the right.

[Diagram: the first two stages of the FFT for N = 8. The length-2 transforms F^{∗00}, F^{∗10}, F^{∗01}, F^{∗11} combine (with multipliers ω̄_{N/2}^k) into the length-4 transforms F^{∗∗0} and F^{∗∗1}, which in turn combine (with multipliers ω̄_N^k) into F^{∗∗∗}.]

123

Page 124: Lecture Notes Math 307

III Orthogonality

Now we go down one more level. Each transform of length two can be constructed from transforms of length one, i.e., from the original data in some order. We complete the diagram as follows. Here we have inserted the value N = 8.

[Diagram: the complete FFT for N = 8. The inputs f_0, f_4, f_2, f_6, f_1, f_5, f_3, f_7 appear on the left in bit-reversed order; three stages of butterflies (with multipliers ω̄_2^k, ω̄_4^k and ω̄_8^k) produce the outputs c_0, ..., c_7 on the right.]

Notice that the f_j's on the left of the diagram are in bit-reversed order. In other words, if we reverse the order of the bits in the binary expansions of the j's, the resulting numbers are ordered from 0 (000) to 7 (111).

Now we can describe the algorithm for the fast Fourier transform. Starting with the original data [f_0, ..., f_7], we arrange the values in bit-reversed order. Then we combine them pairwise, as indicated by the left side of the diagram, to form the transforms of length 2. To do this we need to compute ω̄_2 = e^{-iπ} = -1. Next we combine the transforms of length 2 according to the middle part of the diagram to form the transforms of length 4. Here we use that ω̄_4 = e^{-iπ/2} = -i. Finally we combine the transforms of length 4 to obtain the transform of length 8. Here we need ω̄_8 = e^{-iπ/4} = 2^{-1/2} - i 2^{-1/2}.

The algorithm for values of N other than 8 is entirely analogous. For N = 2 or 4 we stop at the first or second stage. For larger values of N = 2^n we simply add more stages. How many multiplications do we need to do? Well, there are N = 2^n multiplications per stage of the algorithm (one for each circle on the diagram), and there are n = log_2(N) stages. So the number of multiplications is 2^n n = N log_2(N).

As an example let us compute the discrete Fourier transform with N = 4 of the data [f_0, f_1, f_2, f_3] = [1, 2, 3, 4]. First we compute the bit reversed order of 0 = (00), 1 = (01), 2 = (10), 3 = (11) to be (00) = 0, (10) = 2, (01) = 1, (11) = 3. We then do the rest of the computation right on the diagram as follows.


[Diagram: the N = 4 computation. The bit reversed inputs f_0 = 1, f_2 = 3, f_1 = 2, f_3 = 4 combine in the first stage (with ω_2 = −1) to give 1+3 = 4, 1−3 = −2, 2+4 = 6 and 2−4 = −2; the second stage (with ω_4 = −i) gives c_0 = 4+6 = 10, c_1 = −2+2i, c_2 = 4−6 = −2 and c_3 = −2−2i.]

The MATLAB/Octave command for computing the fast Fourier transform is fft. Let’s verify the computation above.

> fft([1 2 3 4])

ans =

10 + 0i -2 + 2i -2 + 0i -2 - 2i

The inverse fft is computed using ifft.
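For instance, applying ifft to the transform just computed recovers the original data (up to round-off):

> ifft([10 -2+2i -2 -2-2i])

ans =

   1   2   3   4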

III.5.3 A frequency-amplitude plot for a sampled audio signal

Recall that a frequency-amplitude plot for the function y(t) defined on the interval [0, T] is a plot of the points (ω_n, |c_n|), where ω_n = n/T and c_n are the numbers appearing in the Fourier series

y(t) = Σ_{n=−∞}^{∞} c_n e^{2πiω_n t} = Σ_{n=−∞}^{∞} c_n e^{2πint/T}

If y(t) represents the sound of a musical instrument, then the frequency-amplitude plot gives a visual representation of the strengths of the various frequencies present in the sound.

Of course, for an actual instrument there is no formula for y(t) and the best we can do is to measure this function at a discrete set of points. To do this we pick a sampling frequency F_s samples/second. Then we measure the function y(t) at times t = t_j = j/F_s, j = 1, . . . , N, where N = F_s T (so that t_N = T) and put the results y_j = y(t_j) in a vector y = [y_1, y_2, . . . , y_N]^T. How can we make an approximate frequency-amplitude plot with this information?

The key is to realize that the coefficients in the discrete Fourier transform of y can be used to approximate the Fourier series coefficients c_n. To see this, do a Riemann sum approximation


of the integral in the formula for c_n. Using the equally spaced points t_j with ∆t_j = 1/F_s, and recalling that N = TF_s, we obtain

c_n = (1/T) ∫_0^T e^{−2πint/T} y(t) dt
    ≈ (1/T) Σ_{j=0}^{N−1} e^{−2πint_j/T} y(t_j) ∆t_j
    = (1/(TF_s)) Σ_{j=0}^{N−1} e^{−2πinj/(TF_s)} y_j
    = (1/N) c̃_n

where c̃_n is the nth coefficient in the discrete Fourier transform of y.

The frequency corresponding to c̃_n is ω_n = n/T = nF_s/N. So, for an approximate frequency-amplitude plot, we can plot the points (nF_s/N, |c̃_n|/N).

However, it is important to realize that the approximation c_n ≈ c̃_n/N is only good for small n. The reason is that the Riemann sum will do a worse job in approximating the integral when the integrand is oscillating rapidly, that is, when n is large. So we should only plot a restricted range of n. In fact, it never makes sense to plot more than N/2 points. The reason for this is that c̃_{n+N} = c̃_n and, for y real valued, c̃_{−n} is the complex conjugate of c̃_n. These facts imply that |c̃_n| = |c̃_{N−n}|, so that the values of |c̃_n| in the range [0, N/2 − 1] are the same as the values in [N/2, N − 1], with the order reversed.

To compare the meanings of the coefficients c̃_n and c_n it is instructive to consider the formulas (both exact) for the discrete Fourier transform and the Fourier series for y_j = y(t_j):

y_j = (1/N) Σ_{n=0}^{N−1} c̃_n e^{2πinj/N}

y(t_j) = Σ_{n=−∞}^{∞} c_n e^{2πint_j/T} = Σ_{n=−∞}^{∞} c_n e^{2πinj/N}

The coefficients c_n and c̃_n/N are close for n close to 0, but then their values diverge so that the infinite sum and the finite sum above both give the same answer.

Now let’s try to make a frequency-amplitude plot using MATLAB/Octave for a sampled flute contained in the audio file F6.baroque.au available at

http://www.phys.unsw.edu.au/music/flute/baroque/sounds/F6.baroque.au.

This file contains a sampled baroque flute playing the note F6, which has a frequency of 1396.91 Hz. The sampling rate is F_s = 22050 samples/second.

Audio processing is one area where MATLAB and Octave are different. The Octave code to load the file F6.baroque.au is


y=loadaudio('F6.baroque','au',8);

while the MATLAB code is

y=auread('F6.baroque.au');

After this step the sampled values are loaded in the vector y. Now we compute the FFT of y and store the resulting values c̃_n in a vector tildec. Then we compute a vector omega containing the frequencies and make a plot of these frequencies against |c̃_n|/N. We plot the first Nmax=N/4 values.

tildec = fft(y);

N=length(y);

Fs=22050;

omega=[0:N-1]*Fs/N;

Nmax=floor(N/4);

plot(omega(1:Nmax), abs(tildec(1:Nmax)/N));

Here is the result.

[Plot: |c̃_n|/N against frequency (Hz), for frequencies from 0 to about 5000 Hz, showing a single large spike near 1400 Hz.]

Notice the large spike at ω ≈ 1396 corresponding to the note F6. Smaller spikes appear at the overtone series, but evidently these are quite small for a flute.


Chapter IV

Eigenvalues and Eigenvectors


IV.1 Eigenvalues and Eigenvectors

Prerequisites and Learning Goals

After completing this section, you should be able to

• write down the definition of eigenvalues and eigenvectors and be able to compute them using the standard procedure.

• use the MATLAB/Octave commands poly and roots to compute the characteristic polynomial and its roots, and eig to compute the eigenvalues and eigenvectors.

• write down the definitions of the algebraic and geometric multiplicities of eigenvalues when there are repeated eigenvalues.

• use eigenvalues and eigenvectors to perform matrix diagonalization.

• recognize the form of the Jordan Canonical Form for non-diagonalizable matrices.

• explain the relationship between eigenvalues and the determinant and trace of a matrix.

• use eigenvalues to compute powers of a diagonalizable matrix.

IV.1.1 Definition

Let A be an n × n matrix. A number λ and non-zero vector v are an eigenvalue–eigenvector pair for A if

Av = λv

Although v is required to be nonzero, λ = 0 is possible. If v is an eigenvector, so is sv for any number s ≠ 0.

Rewrite the eigenvalue equation as

(λI −A)v = 0

Then we see that v is a non-zero vector in the nullspace N(λI − A). Such a vector only exists if λI − A is a singular matrix, or equivalently if

det(λI −A) = 0

130

Page 131: Lecture Notes Math 307

IV.1 Eigenvalues and Eigenvectors

IV.1.2 Standard procedure

This leads to the standard textbook method of finding eigenvalues. The function of λ defined by p(λ) = det(λI − A) is a polynomial of degree n, called the characteristic polynomial, whose zeros are the eigenvalues. So the standard procedure is:

• Compute the characteristic polynomial p(λ)

• Find all the zeros (roots) of p(λ). This is equivalent to completely factoring p(λ) as

p(λ) = (λ− λ1)(λ− λ2) · · · (λ− λn)

Such a factorization always exists if we allow the possibility that the zeros λ1, λ2, . . . are complex numbers. But it may be hard to find. In this factorization there may be repetitions in the λi’s. The number of times a λi is repeated is called its algebraic multiplicity.

• For each distinct λi find N(λiI − A), that is, all the solutions to

(λiI −A)v = 0

The non-zero solutions are the eigenvectors for λi.

IV.1.3 Example 1

This is the typical case where all the eigenvalues are distinct. Let

A = [  3  −6  −7
       1   8   5
      −1  −2   1 ]

Then, expanding the determinant, we find

det(λI − A) = λ^3 − 12λ^2 + 44λ − 48

This can be factored as

λ^3 − 12λ^2 + 44λ − 48 = (λ − 2)(λ − 4)(λ − 6)

So the eigenvalues are 2, 4 and 6.

These steps can be done with MATLAB/Octave using poly and roots. If A is a square matrix, the command poly(A) computes the characteristic polynomial, or rather, its coefficients.

131

Page 132: Lecture Notes Math 307

IV Eigenvalues and Eigenvectors

> A=[3 -6 -7; 1 8 5; -1 -2 1];

> p=poly(A)

p =

1.0000 -12.0000 44.0000 -48.0000

Recall that the coefficient of the highest power comes first. The function roots takes as input a vector representing the coefficients of a polynomial and returns the roots.

>roots(p)

ans =

6.0000

4.0000

2.0000

To find the eigenvector(s) for λ1 = 2 we must solve the homogeneous equation (2I − A)v = 0. Recall that eye(n) is the n × n identity matrix I.

>rref(2*eye(3) - A)

ans =

1 0 -1

0 1 1

0 0 0

From this we can read off the solution

v1 = [1, −1, 1]^T

Similarly we find for λ2 = 4 and λ3 = 6 that the corresponding eigenvectors are

v2 = [−1, −1, 1]^T    v3 = [−2, 1, 0]^T

The three eigenvectors v1, v2 and v3 are linearly independent and form a basis for R^3.

The MATLAB/Octave command for finding eigenvalues and eigenvectors is eig. The command eig(A) lists the eigenvalues

132

Page 133: Lecture Notes Math 307

IV.1 Eigenvalues and Eigenvectors

>eig(A)

ans =

4.0000

2.0000

6.0000

while the variant [V,D] = eig(A) returns a matrix V whose columns are eigenvectors and a diagonal matrix D whose diagonal entries are the eigenvalues.

>[V,D] = eig(A)

V =

5.7735e-01 5.7735e-01 -8.9443e-01

5.7735e-01 -5.7735e-01 4.4721e-01

-5.7735e-01 5.7735e-01 2.2043e-16

D =

4.00000 0.00000 0.00000

0.00000 2.00000 0.00000

0.00000 0.00000 6.00000

Notice that the eigenvectors have been normalized to have length one. Also, since they have been computed numerically, they are not exactly correct. The entry 2.2043e-16 (i.e., 2.2043 × 10^{−16}) should actually be zero.

IV.1.4 Example 2

This example has a repeated eigenvalue.

A = [ 1  1  0
      0  2  0
      0 −1  1 ]

The characteristic polynomial is

det(λI − A) = λ^3 − 4λ^2 + 5λ − 2 = (λ − 1)^2(λ − 2)

In this example the eigenvalues are 1 and 2, but the eigenvalue 1 has algebraic multiplicity 2.

133

Page 134: Lecture Notes Math 307

IV Eigenvalues and Eigenvectors

To find the eigenvector(s) for λ1 = 1 we compute

I − A = [ 0 −1  0
          0 −1  0
          0  1  0 ]

From this it is easy to see that there are two linearly independent eigenvectors for this eigenvalue:

v1 = [1, 0, 0]^T   and   w1 = [0, 0, 1]^T

In this case we say that the geometric multiplicity is 2. In general, the geometric multiplicity is the number of independent eigenvectors, or equivalently the dimension of N(λI − A).

The eigenvalue λ2 = 2 has eigenvector

v2 = [−1, −1, 1]^T

So, although this example has repeated eigenvalues, there still is a basis of eigenvectors.

IV.1.5 Example 3

Here is an example where the geometric multiplicity is less than the algebraic multiplicity. If

A = [ 2  1  0
      0  2  1
      0  0  2 ]

then the characteristic polynomial is

det(λI − A) = (λ − 2)^3

so there is one eigenvalue λ1 = 2 with algebraic multiplicity 3.

To find the eigenvectors we compute

2I − A = [ 0 −1  0
           0  0 −1
           0  0  0 ]

From this we see that there is only one independent solution

v1 = [1, 0, 0]^T

134

Page 135: Lecture Notes Math 307

IV.1 Eigenvalues and Eigenvectors

Thus the geometric multiplicity dim(N(2I − A)) is 1. What does MATLAB/Octave do in this situation?

>A=[2 1 0; 0 2 1; 0 0 2];

>[V D] = eig(A)

V =

1.00000 -1.00000 1.00000

0.00000 0.00000 -0.00000

0.00000 0.00000 0.00000

D =

2 0 0

0 2 0

0 0 2

It simply returned the same eigenvector three times.

In this example, there does not exist a basis of eigenvectors.

IV.1.6 Example 4

Finally, here is an example where the eigenvalues are complex, even though the matrix has real entries. Let

A = [ 0 −1
      1  0 ]

Then

det(λI − A) = λ^2 + 1

which has no real roots. However

λ^2 + 1 = (λ + i)(λ − i)

so the eigenvalues are λ1 = i and λ2 = −i. The eigenvectors are found with the same procedure as before, except that now we must use complex arithmetic. So for λ1 = i we compute

iI − A = [  i  1
           −1  i ]

There is a trick for computing the null space of a singular 2 × 2 matrix. Since the two rows must be multiples of each other (in this case the second row is i times the first row) we simply

135

Page 136: Lecture Notes Math 307

IV Eigenvalues and Eigenvectors

need to find a vector [a, b]^T with ia + b = 0. This is easily achieved by flipping the entries in the first row and changing the sign of one of them. Thus

v1 = [1, −i]^T

If a matrix has real entries, then the eigenvalues and eigenvectors occur in conjugate pairs. This can be seen directly from the eigenvalue equation Av = λv. Taking complex conjugates (and using that the conjugate of a product is the product of the conjugates) we obtain A̅v̅ = λ̅v̅. But if A is real then A̅ = A, so Av̅ = λ̅v̅, which shows that λ̅ and v̅ are also an eigenvalue–eigenvector pair.

From this discussion it follows that v2 is the complex conjugate of v1:

v2 = [1, i]^T

IV.1.7 A basis of eigenvectors

In three of the four examples above the matrix A had a basis of eigenvectors. If all the eigenvalues are distinct, as in the first example, then the corresponding eigenvectors are always independent and therefore form a basis.

To see why this is true, suppose A has eigenvalues λ1, . . . , λn that are all distinct, that is, λi ≠ λj for i ≠ j. Let v1, . . . , vn be the corresponding eigenvectors.

Now, starting with the first two eigenvectors, suppose a linear combination of them equals zero:

c1v1 + c2v2 = 0

Multiplying by A and using the fact that these are eigenvectors, we obtain

c1Av1 + c2Av2 = c1λ1v1 + c2λ2v2 = 0

On the other hand, multiplying the original equation by λ2 we obtain

c1λ2v1 + c2λ2v2 = 0.

Subtracting the equations gives

c1(λ2 − λ1)v1 = 0

Since (λ2 − λ1) ≠ 0 and, being an eigenvector, v1 ≠ 0, it must be that c1 = 0. Now returning to the original equation we find c2v2 = 0, which implies that c2 = 0 too. Thus v1 and v2 are linearly independent.

Now let’s consider three eigenvectors v1, v2 and v3. Suppose

c1v1 + c2v2 + c3v3 = 0

136

Page 137: Lecture Notes Math 307

IV.1 Eigenvalues and Eigenvectors

As before, we multiply by A to get one equation, then multiply by λ3 to get another equation. Subtracting the resulting equations gives

c1(λ1 − λ3)v1 + c2(λ2 − λ3)v2 = 0

But we already know that v1 and v2 are independent. Therefore c1(λ1 − λ3) = c2(λ2 − λ3) = 0. Since λ1 − λ3 ≠ 0 and λ2 − λ3 ≠ 0, this implies c1 = c2 = 0. Returning to the original equation then gives c3v3 = 0, so c3 = 0 too. Therefore v1, v2 and v3 are independent.

Repeating this argument, we eventually find that all the eigenvectors v1, . . . , vn are independent.

In example 2 above, we saw that it might be possible to have a basis of eigenvectors even when there are repeated eigenvalues. For some classes of matrices (for example symmetric matrices (A^T = A) or orthogonal matrices) a basis of eigenvectors always exists, whether or not there are repeated eigenvalues. We will consider this in more detail later in the course.

IV.1.8 When there are not enough eigenvectors

Let’s try to understand a little better the exceptional situation where there are not enough eigenvectors to form a basis. Consider

A_ε = [ 1    1
        0  1+ε ]

When ε = 0 this matrix has a single eigenvalue λ = 1 and only one eigenvector v1 = [1, 0]^T.

What happens when we change ε slightly? Then the eigenvalues change to 1 and 1 + ε, and being distinct, they must have independent eigenvectors. A short calculation reveals that they are

v1 = [1, 0]^T    v2 = [1, ε]^T

These two eigenvectors are almost, but not quite, dependent. When ε becomes zero they collapse and point in the same direction.

In general, if you start with a matrix with repeated eigenvalues and too few eigenvectors, and change the entries of the matrix a little, some of the eigenvectors (the ones corresponding to the eigenvalues whose algebraic multiplicity is higher than the geometric multiplicity) will split into several eigenvectors that are almost parallel.

IV.1.9 Diagonalization

Suppose A is an n × n matrix with eigenvalues λ1, · · · , λn and a basis of eigenvectors v1, . . . , vn. Form the matrix with eigenvectors as columns:

S = [ v1 v2 · · · vn ]

137

Page 138: Lecture Notes Math 307

IV Eigenvalues and Eigenvectors

Then

AS = [ Av1 Av2 · · · Avn ]
   = [ λ1v1 λ2v2 · · · λnvn ]
   = [ v1 v2 · · · vn ] [ λ1  0   0  · · ·  0
                          0   λ2  0  · · ·  0
                          0   0   λ3 · · ·  0
                          ...
                          0   0   0  · · ·  λn ]
   = SD

where D is the diagonal matrix with the eigenvalues on the diagonal. Since the columns of S are independent, the inverse exists and we can write

A = SDS^{−1}    or equivalently    S^{−1}AS = D

This is called diagonalization.

Notice that the matrix S is exactly the one returned by the MATLAB/Octave call [S D] = eig(A).

>A=[1 2 3; 4 5 6; 7 8 9];

>[S D] = eig(A);

>S*D*S^(-1)

ans =

1.0000 2.0000 3.0000

4.0000 5.0000 6.0000

7.0000 8.0000 9.0000

IV.1.10 Jordan canonical form

If A is a matrix that cannot be diagonalized, there still exists a similar factorization called the Jordan Canonical Form. It turns out that any matrix A can be written as

A = SBS^{−1}

where B is a block diagonal matrix. The matrix B has the form

B = [ B1  0   0  · · ·  0
      0   B2  0  · · ·  0
      0   0   B3 · · ·  0
      ...
      0   0   0  · · ·  Bk ]

138

Page 139: Lecture Notes Math 307

IV.1 Eigenvalues and Eigenvectors

where each submatrix Bi (called a Jordan block) has a single eigenvalue on the diagonal and 1’s on the superdiagonal:

Bi = [ λi  1   0  · · ·  0
       0   λi  1  · · ·  0
       0   0   λi · · ·  0
       ...
       0   0   0  · · ·  λi ]

If all the blocks are of size 1 × 1 then there are no 1’s and the matrix is diagonalizable.

IV.1.11 Eigenvalues, determinant and trace

Recall that the determinant satisfies det(AB) = det(A) det(B) and det(S^{−1}) = 1/det(S). Also, the determinant of a diagonal matrix (or more generally of an upper triangular matrix) is the product of the diagonal entries. Thus if A is diagonalizable then

det(A) = det(SDS^{−1}) = det(S) det(D) det(S^{−1}) = det(S) det(D)/det(S) = det(D) = λ1λ2 · · · λn

Thus the determinant of a matrix is the product of the eigenvalues. This is true for non-diagonalizable matrices as well, as can be seen from the Jordan Canonical Form. Notice that the number of times a particular λi appears in this product is the algebraic multiplicity of that eigenvalue.

The trace of a matrix is the sum of the diagonal entries. If A = [a_{i,j}] then tr(A) = Σ_i a_{i,i}. Even though it is not true that AB = BA in general, the trace is not sensitive to the change in order:

tr(AB) = Σ_{i,j} a_{i,j} b_{j,i} = Σ_{i,j} b_{j,i} a_{i,j} = tr(BA)

Thus (taking A = SD and B = S^{−1})

tr(A) = tr(SDS^{−1}) = tr(S^{−1}SD) = tr(D) = λ1 + λ2 + · · · + λn

Thus the trace of a diagonalizable matrix is the sum of the eigenvalues. Again, this is true for non-diagonalizable matrices as well, and can be seen from the Jordan Canonical Form.

IV.1.12 Powers of a diagonalizable matrix

If A is diagonalizable then its powers A^k are easy to compute:

A^k = SDS^{−1}SDS^{−1}SDS^{−1} · · · SDS^{−1} = SD^kS^{−1}

139

Page 140: Lecture Notes Math 307

IV Eigenvalues and Eigenvectors

because all of the factors S^{−1}S cancel. Since powers of the diagonal matrix D are given by

D^k = [ λ1^k  0     0    · · ·  0
        0     λ2^k  0    · · ·  0
        0     0     λ3^k · · ·  0
        ...
        0     0     0    · · ·  λn^k ]

this formula provides an effective way to understand and compute A^k for large k.
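For example, we can check this numerically for the matrix from Example 1 (a quick sketch; the value printed by norm should be of the order of round-off error):

>A=[3 -6 -7; 1 8 5; -1 -2 1];
>[S D] = eig(A);
>norm(S*D^5*S^(-1) - A^5)   % compare S D^5 S^(-1) with A^5; essentially zero

Computing D^5 only requires raising the diagonal entries to the fifth power, which is far cheaper than repeated matrix multiplication when k is large.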


IV.2 Power Method for Computing Eigenvalues

Prerequisites and Learning Goals

After completing this section, you should be able to

• write down the properties of the eigenvalues and eigenvectors of real symmetric matrices.

• write down the definition and properties of Hermitian matrices.

• use the power method to compute the eigenvalue/eigenvector pair of a Hermitian matrix whose eigenvalue is closest to a given number.

IV.2.1 Eigenvalues of real symmetric matrices

If A is real (that is, the entries are all real numbers) and symmetric (that is, A^T = A) then the eigenvalues of A are all real, and the eigenvectors can be chosen to form an orthonormal basis.

To see that the eigenvalues must be real, let’s start with an eigenvalue–eigenvector pair λ, v for A. For the moment, we allow the possibility that λ and v are complex.

Since A is real and symmetric we have

〈v, Av〉 = 〈Av,v〉

and since Av = λv this implies

〈v, λv〉 = 〈λv, v〉

Here we are using the inner product for complex vectors given by 〈v, w〉 = v̅^T w. This means that the λ on the right side is conjugated, that is,

λ〈v, v〉 = λ̅〈v, v〉.

Since v is an eigenvector, it cannot be zero. So 〈v, v〉 = ‖v‖^2 ≠ 0. Therefore we may divide by 〈v, v〉 to conclude

λ̅ = λ.

This shows that λ is real.

Now let’s show that eigenvectors corresponding to two distinct eigenvalues must be orthogonal. If Av1 = λ1v1 and Av2 = λ2v2 with λ1 ≠ λ2, then starting with the equation that follows from the symmetry of A

〈Av1,v2〉 = 〈v1, Av2〉


we find

λ1〈v1, v2〉 = λ2〈v1, v2〉

Here λ1 should appear as λ̅1, but we already know that eigenvalues are real, so λ̅1 = λ1. This can be written

(λ1 − λ2)〈v1,v2〉 = 0

and since λ1 − λ2 ≠ 0 this implies

〈v1, v2〉 = 0

This calculation shows that if A has distinct eigenvalues then the eigenvectors are all orthogonal, and by rescaling them, we can obtain an orthonormal basis of eigenvectors. In fact, even if A has repeated eigenvalues, it is still true that an orthonormal basis of eigenvectors exists.

If A is real and symmetric, then the eigenvectors can be chosen to be real. One way to see this is to notice that once we know that λ is real, all the calculations involved in computing the nullspace of λI − A only involve real numbers. This implies that the matrix that diagonalizes A can be chosen to be an orthogonal matrix.

If A has complex entries, but satisfies A* = A, it is called Hermitian. (Recall that A* = A̅^T.)

The argument above is still valid for Hermitian matrices and shows that all the eigenvalues are real. There also exists an orthonormal basis of eigenvectors. However, in contrast to the case where A is real and symmetric, the eigenvectors may have complex entries. Thus a Hermitian matrix may be diagonalized by a unitary matrix.

If A is any matrix with real entries, then A + A^T is real symmetric. (The matrix A^T A is also real symmetric, and has the additional property that all the eigenvalues are non-negative.) We can use this to produce random symmetric matrices in MATLAB/Octave like this:

>A = rand(4,4);

>A = A+A’

A =

0.043236 1.240654 0.658890 0.437168

1.240654 1.060615 0.608234 0.911889

0.658890 0.608234 1.081767 0.706712

0.437168 0.911889 0.706712 1.045293

Let’s check the eigenvalues and vectors of A:

>[V, D] = eig(A)

V =


-0.81345 0.33753 0.23973 0.40854

0.54456 0.19491 0.55585 0.59707

0.15526 0.35913 -0.78824 0.47497

-0.13285 -0.84800 -0.11064 0.50100

D =

-0.84168 0.00000 0.00000 0.00000

0.00000 0.36240 0.00000 0.00000

0.00000 0.00000 0.55166 0.00000

0.00000 0.00000 0.00000 3.15854

The eigenvalues are real, as expected. Also, the eigenvectors contained in the columns of the matrix V have been normalized. Thus V is orthogonal:

>V’*V

ans =

1.0000e+00 6.5066e-17 -1.4700e-16 1.4366e-16

9.1791e-17 1.0000e+00 -1.0432e-16 2.2036e-17

-1.4700e-16 -1.0012e-16 1.0000e+00 -1.2617e-16

1.3204e-16 2.2036e-17 -1.0330e-16 1.0000e+00

(at least, up to numerical error.)

IV.2.2 The power method

The power method is a very simple method for finding a single eigenvalue–eigenvector pair.

Suppose A is an n × n matrix. We assume that A is real symmetric, so that all the eigenvalues are real. Now let x0 be any vector of length n. Perform the following steps:

• Multiply by A

• Normalize to unit length.

repeatedly. This generates a series of vectors x0, x1, x2, . . .. It turns out that these vectors converge to the eigenvector corresponding to the eigenvalue whose absolute value is the largest.


To verify this claim, let’s first find a formula for xk. At each stage of this process, we are multiplying by A and then by some number. Thus xk must be a multiple of A^k x0. Since the resulting vector has unit length, that number must be 1/‖A^k x0‖. Thus

xk = A^k x0 / ‖A^k x0‖

We know that A has a basis of eigenvectors v1, v2, . . . , vn. Order them so that |λ1| > |λ2| ≥ · · · ≥ |λn|. (We are assuming here that |λ1| ≠ |λ2|, otherwise the power method runs into difficulty.) We may expand our initial vector x0 in this basis:

x0 = c1v1 + c2v2 + · · · + cnvn

We need that c1 ≠ 0 for this method to work, but if x0 is chosen at random, this is almost always true.

Since A^k vi = λi^k vi we have

A^k x0 = c1λ1^k v1 + c2λ2^k v2 + · · · + cnλn^k vn
      = λ1^k ( c1v1 + c2(λ2/λ1)^k v2 + · · · + cn(λn/λ1)^k vn )
      = λ1^k ( c1v1 + ε_k )

where ε_k → 0 as k → ∞. This is because |λi/λ1| < 1 for every i > 1, so the powers tend to zero. Thus

‖A^k x0‖ = |λ1|^k ‖c1v1 + ε_k‖

so that

xk = A^k x0 / ‖A^k x0‖ = (λ1/|λ1|)^k (c1v1 + ε_k)/‖c1v1 + ε_k‖ → (±1)^k ( ±v1/‖v1‖ )

We have shown that xk converges, except for a possible sign flip at each stage, to a normalized eigenvector corresponding to λ1. The sign flip is present exactly when λ1 < 0. Knowing v1 (or a multiple of it) we can find λ1 with

λ1 = 〈v1, Av1〉 / ‖v1‖^2

This gives a method for finding the largest eigenvalue (in absolute value) and the corresponding eigenvector. Let’s try it out.


>A = [4 1 3;1 3 2; 3 2 5];

>x=rand(3,1);

>for k = [1:10]

>y=A*x;

>x=y/norm(y)

>end

x =

0.63023

0.38681

0.67319

x =

0.58690

0.37366

0.71828

x =

0.57923

0.37353

0.72455

x =

0.57776

0.37403

0.72546

x =

0.57745

0.37425

0.72559

x =

0.57738

0.37433

0.72561

x =


0.57736

0.37435

0.72562

x =

0.57735

0.37436

0.72562

x =

0.57735

0.37436

0.72562

x =

0.57735

0.37436

0.72562

This gives the eigenvector. We compute the eigenvalue with

>x'*A*x/norm(x)^2

ans = 8.4188

Let’s check:

>[V D] = eig(A)

V =

0.577350 0.577350 0.577350

0.441225 -0.815583 0.374359

-0.687013 -0.038605 0.725619

D =

1.19440 0.00000 0.00000

0.00000 2.38677 0.00000

0.00000 0.00000 8.41883


As expected, we have computed the largest eigenvalue and eigenvector. Of course, a serious program that uses this method would not just iterate a fixed number of times (above it was 10), but check for convergence, perhaps by checking whether ‖xk − xk−1‖ was less than some small number, and stopping when this was achieved.
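Such a convergence check might look like this (a sketch; the tolerance 1e-12 and the cap of 1000 iterations are arbitrary choices, and since the dominant eigenvalue of this A is positive we need not worry about the sign flip discussed above):

A = [4 1 3; 1 3 2; 3 2 5];
x = rand(3,1);
for k = 1:1000
  y = A*x;
  y = y/norm(y);
  if norm(y - x) < 1e-12   % stop once the iterates have settled down
    break
  end
  x = y;
end
lambda = x'*A*x/norm(x)^2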

So far, the power method only computes the eigenvalue with the largest absolute value, and the corresponding eigenvector. What good is that? Well, it turns out that with an additional twist we can compute the eigenvalue closest to any number s. The key observation is that the eigenvalues of (A − sI)^{−1} are exactly (λi − s)^{−1} (unless, of course, A − sI is not invertible. But then s is an eigenvalue itself and we can stop looking.) Moreover, the eigenvectors of A and (A − sI)^{−1} are the same.

Let’s see why this is true. If

Av = λv

then

(A − sI)v = (λ − s)v.

Now if we multiply both sides by (A − sI)^{−1} and divide by λ − s we get

(λ − s)^{−1}v = (A − sI)^{−1}v.

These steps can be run backwards to show that if (λ − s)^{−1} is an eigenvalue of (A − sI)^{−1} with eigenvector v, then λ is an eigenvalue of A with the same eigenvector.

Now start with an arbitrary vector x0 and define

x_{k+1} = (A − sI)^{−1}xk / ‖(A − sI)^{−1}xk‖.

Then xk will converge to the eigenvector vi of (A − sI)^{−1} for which |λi − s|^{−1} is the largest. But, since the eigenvectors of A and A − sI are the same, vi is also an eigenvector of A. And since |λi − s|^{−1} is largest when λi is closest to s, we have computed the eigenvector vi of A for which λi is closest to s. We can now compute λi by comparing Avi with vi.

Here is a crucial point: when computing (A − sI)^{−1}xk in this procedure, we should not actually compute the inverse. We don’t need to know the whole matrix (A − sI)^{−1}, but just the vector (A − sI)^{−1}xk. This vector is the solution y of the linear equation (A − sI)y = xk. In MATLAB/Octave we would therefore use something like (A - s*eye(n))\xk.

Let’s try to compute the eigenvalue of the matrix A above closest to 3.

>A = [4 1 3;1 3 2; 3 2 5];

>x=rand(3,1);

>for k = [1:10]

>y=(A-3*eye(3))\x;

>x=y/norm(y)

>end


x =

0.649008

-0.756516

0.080449

x =

-0.564508

0.824051

0.047657

x =

0.577502

-0.815593

-0.036045

x =

-0.576895

0.815917

0.038374

x =

0.577253

-0.815659

-0.038454

x =

-0.577311

0.815613

0.038562

x =

0.577338

-0.815593

-0.038590

x =


-0.577346

0.815587

0.038600

x =

0.577349

-0.815585

-0.038603

x =

-0.577350

0.815584

0.038604

This gives the eigenvector. Now we can find the eigenvalue

> lambda = x'*A*x/norm(x)^2

lambda = 2.3868

Comparing with the results of eig above, we see that we have computed the middle eigenvalue and eigenvector.


IV.3 Recursion Relations

Prerequisites and Learning Goals

After completing this section, you should be able to

• use matrix equations to solve a recurrence relation, for example the relation defining the Fibonacci numbers.

• determine initial values for which the solution of a recurrence relation will become large or small (depending on the eigenvalues of the associated matrix).

IV.3.1 Fibonacci numbers

Consider the sequence of numbers given by a multiple of powers of the golden ratio:

(1/√5) ((1 + √5)/2)^n,    n = 1, 2, 3, . . . .

When n is large, these numbers are almost integers:

>format long;

>((1+sqrt(5))/2)^30/sqrt(5)

ans = 832040.000000241

>((1+sqrt(5))/2)^31/sqrt(5)

ans = 1346268.99999985

>((1+sqrt(5))/2)^32/sqrt(5)

ans = 2178309.00000009

Why? To answer this question we introduce the Fibonacci sequence:

0 1 1 2 3 5 8 13 . . .

where each number in the sequence is obtained by adding the previous two. If you go far enough along in this sequence you will encounter

. . . 832040 1346269 2178309 . . .


and you can check (without using MATLAB/Octave, I hope) that the third number is the sum of the previous two.

But why should powers of the golden ratio be very nearly, but not quite, equal to Fibonacci numbers? The reason is that the Fibonacci sequence is defined by a recursion relation. For the Fibonacci sequence F0, F1, F2, . . . the recursion relation is

Fn+1 = Fn + Fn−1

This equation, together with the identity Fn = Fn can be written in matrix form as

[ F_{n+1} ]   [ 1  1 ] [ F_n     ]
[ F_n     ] = [ 1  0 ] [ F_{n−1} ]

Thus, taking n = 1, we find

[ F_2 ]   [ 1  1 ] [ F_1 ]
[ F_1 ] = [ 1  0 ] [ F_0 ]

Similarly

[ F_3 ]   [ 1  1 ] [ F_2 ]   [ 1  1 ]^2 [ F_1 ]
[ F_2 ] = [ 1  0 ] [ F_1 ] = [ 1  0 ]   [ F_0 ]

and continuing like this we find

[ F_{n+1} ]   [ 1  1 ]^n [ F_1 ]
[ F_n     ] = [ 1  0 ]   [ F_0 ]

Finally, since F0 = 0 and F1 = 1, we can write

[ F_{n+1} ]   [ 1  1 ]^n [ 1 ]
[ F_n     ] = [ 1  0 ]   [ 0 ]

We can diagonalize the matrix to get a formula for the Fibonacci numbers. The eigenvalues and eigenvectors of

[ 1  1 ]
[ 1  0 ]

are

λ1 = (1 + √5)/2,   v1 = [ (1 + √5)/2, 1 ]^T

and

λ2 = (1 − √5)/2,   v2 = [ (1 − √5)/2, 1 ]^T

This implies

[ λ1  λ2 ] [ λ1^n   0   ] [ λ1  λ2 ]^{−1}           [ λ1  λ2 ] [ λ1^n   0   ] [  1  −λ2 ]
[ 1    1 ] [  0    λ2^n ] [ 1    1 ]        = (1/√5) [ 1    1 ] [  0    λ2^n ] [ −1   λ1 ]


so that

[ F_{n+1} ]          [ λ1  λ2 ] [ λ1^n   0   ] [  1  −λ2 ] [ 1 ]          [ λ1^{n+1} − λ2^{n+1} ]
[ F_n     ] = (1/√5) [ 1    1 ] [  0    λ2^n ] [ −1   λ1 ] [ 0 ] = (1/√5) [ λ1^n − λ2^n          ]

In particular

Fn = (1/√5)(λ1^n − λ2^n)

Since λ2 ≈ −0.618 is smaller than 1 in absolute value, the powers λ2^n become small very quickly as n becomes large. This explains why

Fn ≈ (1/√5) λ1^n

for large n.

If we want to use MATLAB/Octave to compute Fibonacci numbers, we don’t need to bother diagonalizing the matrix.

>[1 1;1 0]^30*[1;0]

ans =

1346269

832040

produces the same Fibonacci numbers as above.

IV.3.2 Other recursion relations

The idea that was used to solve for the Fibonacci numbers can be used to solve other recursion relations. For example the three-step recursion

xn+1 = axn + bxn−1 + cxn−2

can be written as a matrix equation

[ x_{n+1} ]   [ a  b  c ] [ x_n     ]
[ x_n     ] = [ 1  0  0 ] [ x_{n−1} ]
[ x_{n−1} ]   [ 0  1  0 ] [ x_{n−2} ]

so given three initial values x0, x1 and x2 we can find the rest by computing powers of a 3 × 3 matrix, as in the sketch below.
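For instance, with hypothetical coefficients a = 1, b = 2, c = 3 and initial values x0 = x1 = x2 = 1 (numbers invented for the example), later terms can be computed like this:

>M = [1 2 3; 1 0 0; 0 1 0];
>v = [1; 1; 1];      % the vector [x2; x1; x0]
>M^5*v               % the vector [x7; x6; x5]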

In the next section we will solve a recurrence relation that arises in Quantum Mechanics.


IV.4 The Anderson Tight Binding Model

Prerequisites and Learning Goals

After completing this section, you should be able to

• describe a bound state with energy E for the discrete Schrodinger equation for a single electron moving in a one dimensional semi-infinite crystal.

• describe a scattering state with energy E.

• compute the energies for which a bound state exists and identify the conduction band, for a potential that has only one non-zero value.

• compute the conduction bands for a one dimensional crystal.

IV.4.1 Description of the model

Previously we studied how to approximate differential equations by matrix equations. If we apply this discretization procedure to the Schrodinger equation for an electron moving in a solid we obtain the Anderson tight binding model.

We will consider a single electron moving in a one dimensional semi-infinite crystal. The electron is constrained to live at discrete lattice points, numbered 0, 1, 2, 3, . . .. These can be thought of as the positions of the atoms. For each lattice point n there is a potential Vn that describes how much the atom at that lattice point attracts or repels the electron. Positive Vn’s indicate repulsion, whereas negative Vn’s indicate attraction. Typical situations studied in physics are where the Vn’s repeat the same pattern periodically (a crystal), or where they are chosen at random (disordered media). In fact, the term Anderson model usually refers to the random case, where the potentials are chosen at random, independently for each site.

The wave function for the electron is a sequence of complex numbers Ψ = {ψ0, ψ1, ψ2, . . .}. The sequence Ψ is called a bound state with energy E if it satisfies the following three conditions:

(1) The discrete version of the time independent Schrodinger equation

−ψn+1 − ψn−1 + Vnψn = Eψn,

(2) the boundary condition

ψ0 = 0,

(3) and the normalization condition

N^2 = Σ_{n=0}^{∞} |ψn|^2 < ∞.


These conditions are trivially satisfied if Ψ = {0, 0, 0, . . .}, so we specifically exclude this case. (In fact Ψ is actually the eigenvector of an infinite matrix, so this is just the condition that eigenvectors must be non-zero.)

Given an energy E, it is always possible to find a wave function Ψ to satisfy conditions (1) and (2). However for most energies E, none of these Ψ’s will be getting small for large n, so the normalization condition (3) will not be satisfied. There is only a discrete set of energies E for which a bound state satisfying all three conditions exists. In other words, the energy E is quantized.

If E is one of the allowed energy values and Ψ is the corresponding bound state, then the numbers |ψn|^2/N^2 are interpreted as the probabilities of finding an electron with energy E at the nth site. These numbers add up to 1, consistent with the interpretation as probabilities.

Notice that if Ψ is a bound state with energy E, then so is any non-zero multiple aΨ = {aψ0, aψ1, aψ2, . . .}. Replacing Ψ with aΨ has no effect on the probabilities because N changes to |a|N, so the a’s cancel in |ψn|^2/N^2.

IV.4.2 Recursion relation

The discrete Schrodinger equation (1) together with the initial condition (2) is a recursion relation that can be solved using the method we saw in the previous section. We have

[ ψ_{n+1} ]   [ Vn − E  −1 ] [ ψ_n     ]
[ ψ_n     ] = [ 1        0 ] [ ψ_{n−1} ]

so if we set

xn = [ ψ_{n+1}, ψ_n ]^T

and

A(z) = [ z  −1
         1   0 ]

then this implies

xn = A(Vn − E)A(V_{n−1} − E) · · · A(V1 − E)x0.

Condition (2) says that

x0 = [ ψ1, 0 ]^T,

since ψ0 = 0.

In fact, we may assume ψ1 = 1, since replacing Ψ with aΨ where a = 1/ψ1 results in a bound state where this is true. Dividing by ψ1 is possible unless ψ1 = 0. But if ψ1 = 0 then x0 = 0 and the recursion implies that every ψk = 0. This is not an acceptable bound state. Thus we may assume

x0 = [ 1, 0 ]^T


So far we are able to compute xn (and thus ψn) satisfying conditions (1) and (2) for any values of V1, V2, . . .. Condition (3) is a statement about the large n behaviour of ψn. This can be very difficult to determine, unless we know more about the values Vn.

IV.4.3 A potential with most values zero

We will consider the very simplest situation where V1 = −a and all the other Vn’s are equal to zero. Let us try to determine for what energies E a bound state exists.

In this situation

xn = A(−E)^{n−1}A(−a − E)x0 = A(−E)^{n−1}x1

where

x1 = A(−a − E)x0 = [ −a − E  −1 ] [ 1 ]   [ −(a + E) ]
                   [ 1        0 ] [ 0 ] = [ 1        ]

The large n behavior of xn can be computed using the eigenvalues and eigenvectors of A(−E). Suppose they are λ1, v1 and λ2, v2. Then we expand

x1 = a1v1 + a2v2

and conclude that

xn = A(−E)^{n−1}(a1v1 + a2v2) = a1λ1^{n−1}v1 + a2λ2^{n−1}v2

Keep in mind that all the quantities in this equation depend on E. Our goal is to choose E so that the xn become small for large n.

Before computing the eigenvalues of A(−E), let’s note that det(A(−E)) = 1. This implies that λ1λ2 = 1.

Suppose the eigenvalues are complex. Then, since A(−E) has real entries, they must be complex conjugates. Thus λ2 = λ̅1 and 1 = λ1λ2 = λ1λ̅1 = |λ1|^2. This means that λ1 and λ2 lie on the unit circle in the complex plane. In other words, λ1 = e^{iθ} and λ2 = e^{−iθ} for some θ. This implies that λ1^{n−1} = e^{i(n−1)θ} is also on the unit circle, and is not getting small. Similarly λ2^{n−1} is not getting small. So complex eigenvalues will never lead to bound states. In fact, complex eigenvalues correspond to scattering states, and the energy values for which the eigenvalues are complex are the energies at which the electron can move freely through the crystal.

Suppose the eigenvalues are real. If |λ1| > 1 then |λ2| = 1/|λ1| < 1 and vice versa. So one of the products λ1^{n−1}, λ2^{n−1} will be growing large, and one will be getting small. So the only way that xn can be getting small is if the coefficient a1 or a2 sitting in front of the growing product is zero.

Now let us actually compute the eigenvalues. They are

λ = ( −E ± √(E^2 − 4) ) / 2.


If −2 < E < 2 then the eigenvalues are complex, so there are no bound states. The interval [−2, 2] is the conduction band, where the electrons can move through the crystal.

If E = ±2 then there is only one eigenvalue, namely ∓1 (that is, λ = 1 for E = −2 and λ = −1 for E = 2). In this case there actually is only one eigenvector, so our analysis doesn’t apply. However there are no bound states in this case.

Now let us consider the case E < −2. Then the large eigenvalue is λ1 = (−E + √(E^2 − 4))/2 and the corresponding eigenvector is v1 = [ −1, E + λ1 ]^T. The small eigenvalue is λ2 = (−E − √(E^2 − 4))/2 and the corresponding eigenvector is v2 = [ −1, E + λ2 ]^T. We must now compute a1

and determine when it is zero. We have [v1|v2] [a1, a2]^T = x1. This is a 2 × 2 matrix equation that we can easily solve for [a1, a2]^T. A short calculation gives

a1 = (λ1 − λ2)^{−1}( −(a + E)(E + λ2) + 1 ).

Thus we see that a1 = 0 whenever

(a + E)(E − √(E^2 − 4)) − 2 = 0

Let’s consider the case a = 5 and plot this function on the interval [−10, −2]. To see if it crosses the axis, we also plot the function zero.

>N=500;

>E=linspace(-10,-2,N);

>ONE=ones(1,N);

>plot(E,(5*ONE+E).*(E-sqrt(E.^2 - 4*ONE)) - 2*ONE)

>hold on

>plot(E,zeros(1,N))

Here is the result


[Plot: the function (5 + E)(E − √(E^2 − 4)) − 2 on the interval [−10, −2], together with the zero line; the curve crosses zero exactly once, just below E = −5.]

We can see that there is a single bound state in this interval, just below −5. In fact, the solution is E = −5.2.
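One can confirm this root numerically with the built-in root finder fzero (a quick sketch):

>f = @(E) (5+E).*(E-sqrt(E.^2-4)) - 2;
>fzero(f, -5)

ans = -5.2000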

The case E > 2 is similar. This time we end up with

(a + E)(E + √(E^2 − 4)) − 2 = 0

When a = 5 this never has a solution for E > 2. In fact the left side of this equation is bigger than (5 + 2)(2 + 0) − 2 = 12 and so can never equal zero.

In conclusion, if V1 = −5, and all the other Vn’s are zero, then there is exactly one bound state with energy E = −5.2. Here is a diagram of the energy spectrum for this potential.

[Diagram: the E axis from −6 to 2, marking the bound state energy at E = −5.2 and the conduction band [−2, 2].]

For the bound state energy of E = −5.2, the corresponding wave function Ψ, and thus the probability that the electron is located at the nth lattice point, can now also be computed. The evaluation of the infinite sum that gives the normalization constant N^2 can be done using a geometric series.
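For example, a quick worked check using the quantities computed above: with a = 5 and E = −5.2 we have λ2 = (5.2 − √(27.04 − 4))/2 = (5.2 − 4.8)/2 = 0.2, and x1 = [0.2, 1]^T is proportional to v2, so with ψ1 = 1 we get ψn = (0.2)^{n−1} for n ≥ 1. The geometric series then gives

N^2 = Σ_{n=1}^{∞} (0.2)^{2(n−1)} = 1/(1 − 0.04) = 25/24,

so the probability of finding the electron at site n is (24/25)(0.04)^{n−1}.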


IV.4.4 Conduction bands for a crystal

The atoms in a crystal are arranged in a periodic array. We can model a one dimensional crystal in the tight binding model by considering potential values that repeat a fixed pattern. Let’s focus on the case where the pattern is 1, 2, 3, 4, so that the potential values are

V1, V2, V3, . . . = 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, . . .

In this case, if we start with the formula

xn = A(Vn − E)A(V_{n−1} − E) · · · A(V1 − E) [ 1, 0 ]^T

we can group the matrices into groups of four. The product

T (E) = A(V4 − E)A(V3 − E)A(V2 − E)A(V1 −E)

is repeated, so that

x_{4n} = T(E)^n [ 1, 0 ]^T

Notice that the matrix T(E) has determinant 1 since it is a product of matrices with determinant 1. So, as above, the eigenvalues λ1 and λ2 are either real with λ2 = 1/λ1, or complex conjugates on the unit circle. As before, the conduction bands are the energies E for which the eigenvalues of T(E) are complex conjugates. It turns out that this happens exactly when

|tr(T (E))| ≤ 2

To see this, start with the characteristic polynomial for T(E)

det(λI − T(E)) = λ^2 − tr(T(E))λ + det(T(E))

(see homework problem). Since det(T(E)) = 1 the eigenvalues are given by

λ = ( tr(T(E)) ± √(tr(T(E))^2 − 4) ) / 2.

When |tr(T(E))| < 2 the quantity under the square root sign is negative, and so the eigenvalues have a non-zero imaginary part.

Let’s use MATLAB/Octave to plot the values of tr(T(E)) as a function of E. For convenience we first define a function that computes the matrices A(z). To do this we type the following lines into a file called A.m in our working directory.

function A=A(Z)

A=[Z -1; 1 0];

end


Next we start with a range of E values and define another vector T that contains the corresponding values of tr(T(E)).

N=100;

E=linspace(-1,6,N);

T=[];

for e = E

T=[T trace(A(4-e)*A(3-e)*A(2-e)*A(1-e))];

end

Finally, we plot T against E. At the same time, we plot the two constant functions with values 2 and −2.

plot(E,T)

hold on

plot(E,2*ones(1,N));

plot(E,-2*ones(1,N));

axis([-1,6,-10,10])

On the resulting picture the energies where T (E) lies between −2 and 2 have been highlighted.

[Plot: tr(T(E)) as a function of E on the interval [−1, 6], together with the horizontal lines at ±2; the curve lies between −2 and 2 in four separate ranges of E.]

We see that there are four conduction bands for this crystal.


IV.5 Markov Chains

Prerequisites and Learning Goals

After completing this section, you should be able to

• write down the definition of a stochastic matrix and its properties.

• explain why the probabilities in a random walk approach limiting values.

• write down the stochastic matrix for a random walk and calculate the limiting probabilities.

• use stochastic matrices to solve practical Markov chain problems.

• write down the stochastic matrix associated with the Google PageRank algorithm for a given damping factor α, and compute the ranking of the sites for a specified internet.

• use the Metropolis algorithm to produce a stochastic matrix with a predetermined limiting probability distribution.

IV.5.1 Random walk

In the diagram below there are three sites labelled 1, 2 and 3. Think of a walker moving from site to site. At each step the walker either stays at the same site, or moves to one of the other sites according to a set of fixed probabilities. The probability of moving to the ith site from the jth site is denoted p_{i,j}. These numbers satisfy

0 ≤ pi,j ≤ 1

because they are probabilities (0 means “no chance” and 1 means “for sure”). On the diagram they label the arrows indicating the relevant transitions. Since the walker has to go somewhere at each step, the sum of all the probabilities leaving a given site must be one. Thus for every j,

Σ_i p_{i,j} = 1


[Diagram: three sites 1, 2 and 3, with a loop at each site labelled p_{1,1}, p_{2,2}, p_{3,3} and arrows between each pair of sites labelled by the transition probabilities p_{1,2}, p_{2,1}, p_{1,3}, p_{3,1}, p_{2,3}, p_{3,2}.]

Let x_{n,i} be the probability that the walker is at site i after n steps. We collect these probabilities into a sequence of vectors called state vectors. Each state vector contains the probabilities for the nth step in the walk:

xn = [ x_{n,1}, x_{n,2}, . . . , x_{n,k} ]^T

The probability that the walker is at site i after n + 1 steps can be calculated from the probabilities for the previous step. It is the sum over all sites of the probability that the walker was at that site, times the probability of moving from that site to the ith site. Thus

x_{n+1,i} = Σ_j p_{i,j} x_{n,j}

This can be written in matrix form as

xn = P x_{n−1}

where P = [p_{i,j}]. Using this relation repeatedly we find

xn = P^n x0

where x0 contains the probabilities at the beginning of the walk.

The matrix P has two properties:

1. All entries of P are non-negative.

2. The entries in each column of P sum to 1.


A matrix with these properties is called a stochastic matrix.

The goal is to determine where the walker is likely to be located after many steps. In other words, we want to find the large n behaviour of xn = P^n x0. Let’s look at an example. Suppose there are three sites, the transition probabilities are given by

P = [ 0.5  0.2  0.1
      0.4  0.2  0.8
      0.1  0.6  0.1 ]

and the walker starts at site 1, so that the initial state vector is

x0 = [ 1, 0, 0 ]^T

Now let’s use MATLAB/Octave to calculate the subsequent state vectors for n = 1, 10, 100, 1000.

>P=[.5 .2 .1; .4 .2 .8; .1 .6 .1];

>X0=[1; 0; 0];

>P*X0

ans =

0.50000

0.40000

0.10000

>P^10*X0

ans =

0.24007

0.43961

0.32032

>P^100*X0

ans =

0.24000

0.44000

0.32000

>P^1000*X0


ans =

0.24000

0.44000

0.32000

The state vectors converge. Let’s see what happens if the initial vector is different, say with equal probabilities of being at the second and third sites.

>X0 = [0; 0.5; 0.5];

>P^100*X0

ans =

0.24000

0.44000

0.32000

The limit is the same. Of course, we know how to compute high powers of a matrix using the eigenvalues and eigenvectors. A little thought would lead us to suspect that P has an eigenvalue of 1 that is largest in absolute value, and that the corresponding eigenvector is the limiting vector, up to a multiple. Let’s check.

>eig(P)

ans =

1.00000

0.35826

-0.55826

>P*[0.24000; 0.44000; 0.32000]

ans =

0.24000

0.44000

0.32000


IV.5.2 Properties of stochastic matrices

The fact that the matrix P in the example above has an eigenvalue of 1 and an eigenvector that is a state vector is no accident. Any stochastic matrix P has the following properties:

(1) If x is a state vector, so is Px.

(2) P has an eigenvalue λ1 = 1.

(3) The corresponding eigenvector v1 has all non-negative entries.

(4) The other eigenvalues λi have |λi| ≤ 1.

If P or some power P^k has all positive entries (that is, no zero entries) then

(3’) The eigenvector v1 has all positive entries.

(4’) The other eigenvalues λi have |λi| < 1.

(Since eigenvectors are only defined up to non-zero scalar multiples, strictly speaking, (3) and (3’) should say that after possibly multiplying v1 by −1 the entries are non-negative or positive.) These properties explain the convergence properties of the state vectors of the random walk. Suppose (3’) and (4’) hold and we expand the initial vector x0 in a basis of eigenvectors. (Here we are assuming that P is diagonalizable, which is almost always true.) Then

x0 = c1v1 + c2v2 + · · · + ckvk

so that

xn = P^n x0 = c1λ1^n v1 + c2λ2^n v2 + · · · + ckλk^n vk

Since λ1 = 1 and |λi| < 1 for i ≠ 1, we find

lim_{n→∞} xn = c1v1

Since each xn is a state vector, so is the limit c1v1. This allows us to compute c1 easily. It is the reciprocal of the sum of the entries of v1. In particular, if we chose v1 to be a state vector then c1 = 1.
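For the three-site example above we can carry this out directly from the output of eig (a sketch; in the run above the eigenvalue 1 happened to be listed first, so we take the first column of V):

>P=[.5 .2 .1; .4 .2 .8; .1 .6 .1];
>[V D] = eig(P);
>v = V(:,1);       % eigenvector for the eigenvalue 1
>v/sum(v)          % rescale so the entries sum to 1; gives 0.24, 0.44, 0.32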

Now we will go through the properties above and explain why they are true.

(1) P preserves state vectors:

Suppose x is a state vector, that is, x has non-negative entries which sum to 1. Then Px has non-negative entries too, since all the entries of P are non-negative. Also

Σ_i (Px)_i = Σ_i Σ_j P_{i,j} x_j = Σ_j Σ_i P_{i,j} x_j = Σ_j x_j Σ_i P_{i,j} = Σ_j x_j = 1

Thus the entries of Px also sum to one, and Px is a state vector.


(2) P has an eigenvalue λ1 = 1

The key point here is that P and P^T have the same eigenvalues. To see this recall that det(A) = det(A^T). This implies that

det(λI − P) = det((λI − P)^T) = det(λI^T − P^T) = det(λI − P^T)

So P and P^T have the same characteristic polynomial. Since the eigenvalues are the zeros of the characteristic polynomial, they must be the same for P and P^T. (Notice that this does not say that the eigenvectors are the same.)

Since P has columns adding up to 1, P^T has rows that add up to 1. This fact can be expressed as the matrix equation

P^T [ 1, 1, . . . , 1 ]^T = [ 1, 1, . . . , 1 ]^T.

But this equation says that 1 is an eigenvalue for P^T. Therefore 1 is an eigenvalue for P as well.

(4) Other eigenvalues of P have modulus ≤ 1

To show that this is true, we use the 1-norm introduced at the beginning of the course. Recall that the norm ‖ · ‖1 is defined by

‖ [ x1, x2, . . . , xn ]^T ‖1 = |x1| + |x2| + · · · + |xn|.

Multiplication by P never increases the length of a vector if we use this norm to measure length. In other words

‖Px‖1 ≤ ‖x‖1


for any vector x. This follows from the calculation below (almost the same as the one above, and again using the fact that the columns of P are non-negative and sum to 1):

‖Px‖1 = Σ_i |(Px)_i|
      = Σ_i |Σ_j P_{i,j} x_j|
      ≤ Σ_i Σ_j P_{i,j} |x_j|
      = Σ_j Σ_i P_{i,j} |x_j|
      = Σ_j |x_j| Σ_i P_{i,j}
      = Σ_j |x_j|
      = ‖x‖1

Now suppose that λ is an eigenvalue, so that Pv = λv for some non-zero v. Then

‖λv‖1 = ‖Pv‖1

Since ‖λv‖1 = |λ|‖v‖1 and ‖Pv‖1 ≤ ‖v‖1 this implies

|λ|‖v‖1 ≤ ‖v‖1

Finally, since v is not zero, ‖v‖1 > 0. Therefore we can divide by ‖v‖1 to obtain

|λ| ≤ 1

(3) The eigenvector v1 (or some multiple of it) has all non-negative entries

We can give a partial proof of this, in the situation where the eigenvalues other than λ1 = 1 obey the strict inequality |λi| < 1. In this case the power method implies that starting with any initial vector x0, the vectors P^n x0 converge to a multiple c1v1 of v1. If we choose the initial vector x0 to have only positive entries, then every vector in the sequence P^n x0 has only non-negative entries. This implies that the limiting vector must have non-negative entries.

(3) and (4) vs. (3’) and (4’) and P^n with all positive entries

Saying that P^n has all positive entries means that there is a non-zero probability of moving between any two sites in n steps. The fact that in this case all the eigenvalues other than λ1 = 1 obey the strict inequality |λi| < 1 follows from a famous theorem in linear algebra called the Perron–Frobenius theorem.

Although we won’t be able to prove the Perron–Frobenius theorem, we can give some examples to show that if the condition that P^n has all positive entries for some n is violated, then (3’) and (4’) need not hold.


The first example is the matrix

P = [ 0  1
      1  0 ]

This matrix represents a random walk with two sites that isn’t very random. Starting at site one, the walker moves to site two with probability 1, and vice versa. The powers P^n of P are equal to I or P depending on whether n is even or odd. So P^n [1, 0]^T doesn’t converge: it is equal to [1, 0]^T for even n and [0, 1]^T for odd n. The eigenvalues of P are easily computed to be 1 and −1. They both have modulus one.

For the second example, consider a random walk where the sites can be divided into two sets A and B and the probability of going from any site in B to any site in A is zero. In this case the (i, j) entries of P^n with the ith site in A and the jth site in B are always zero. Also, applying P to any state vector drains probability from A to B without sending any back. This means that in the limit P^n x0 (that is, the eigenvector v1) will have zero entries for all sites in A. For example, consider a three site random walk (the very first example above) where there is no chance of ever going back to site 1. The matrix

P = [ 1/3   0    0
      1/3  1/2  1/2
      1/3  1/2  1/2 ]

corresponds to such a walk. We can check

>P=[1/3 0 0;1/3 1/2 1/2; 1/3 1/2 1/2];

>[V D] = eig(P)

V =

0.00000 0.00000 0.81650

0.70711 0.70711 -0.40825

0.70711 -0.70711 -0.40825

D =

1.00000 0.00000 0.00000

0.00000 0.00000 0.00000

0.00000 0.00000 0.33333

The eigenvector corresponding to the eigenvalue 1 is [0, 1/2, 1/2]^T (after normalization to make it a state vector).


IV.5.3 Google PageRank

I’m going to refer you to the excellent article by David Austin:http://www.ams.org/featurecolumn/archive/pagerank.html

Here are the main points:

The sites are web pages and connections are links between pages. The random walk is a web surfer clicking links at random. The rank of a page is the probability that the surfer will land on that page in the limit of infinitely many clicks.

We assume that the surfer is equally likely to click on any link on a given page. In other words, we assign to each outgoing link a transition probability of

1/(total number of outgoing links for that page).

This rule doesn’t tell us what happens when the surfer lands on a page with no outgoing links (so-called dangling nodes). In this case, we assume that any page on the internet is chosen at random with equal probability.

The two rules above define a stochastic matrix

P = P1 + P2

where P1 contains the probabilities for the outgoing links and P2 contains the probabilities for the dangling nodes. The matrix P1 is very sparse, since each web page has typically about 10 outgoing links. This translates to 10 non-zero entries (out of about 2 billion) for each column of P1. The matrix P2 has a non-zero column for each dangling node. The entries in each non-zero column are all the same, and equal to 1/N where N is the total number of sites.

If x is a state vector, then P1x is easy to compute, because P1 is so sparse, and P2x is a vector with all entries equal to the total probability that the state vector x assigns to dangling nodes, divided by N, the total number of sites.

We could try to use the matrix P to define the rank of a page, by taking the eigenvector v1 corresponding to the eigenvalue 1 of P and defining the rank of a page to be the entry in v1 corresponding to that page. There are problems with this, though. Because P has so many zero entries, it is virtually guaranteed that P will have many eigenvalues with modulus 1 (or very close to 1), so we can’t use the power method to compute v1. Moreover, there probably will also be many web pages assigned a rank of zero.

To avoid these problems we choose a number α between 0 and 1 called the damping factor. We modify the behaviour of the surfer so that with probability α the rules corresponding to the matrix P above are followed, and with probability 1 − α the surfer picks a page at random. This behaviour is described by the stochastic matrix

S = (1 − α)Q + αP


where Q is the matrix where each entry is equal to 1/N (N is the total number of sites). If x is a state vector, then Qx is a vector with each entry equal to 1/N.

The value of α used in practice is about 0.85. The final ranking of a page can then be defined to be the entry in v1 for S corresponding to that page. The matrix S has all non-zero entries.
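As a toy illustration, here is the whole construction for a made-up four-page internet (a sketch; the link structure, with page 4 a dangling node, is invented for this example):

>% pages: 1 links to 2 and 3; 2 links to 3; 3 links to 1; 4 has no links
>N = 4;
>P1 = [0 0 1 0; 1/2 0 0 0; 1/2 1 0 0; 0 0 0 0];
>P2 = [zeros(N,3) ones(N,1)/N];   % dangling node 4: jump anywhere
>Q = ones(N,N)/N;
>alpha = 0.85;
>S = (1-alpha)*Q + alpha*(P1+P2);
>x = ones(N,1)/N;
>for k = 1:100
>x = S*x;          % power iteration; x converges to the page ranking
>end
>x

Since S is stochastic with all positive entries, the iteration converges, and the entries of the resulting x rank the four pages.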

IV.5.4 The Metropolis algorithm

So far we have concentrated on the situation where a stochastic matrix P is given, and we are interested in finding the invariant distribution (that is, the eigenvector with eigenvalue 1). Now we want to turn the situation around. Suppose we have a state vector π and we want to find a stochastic matrix that has this vector as its invariant distribution. The Metropolis algorithm, named after the mathematician and physicist Nicholas Metropolis, takes an arbitrary stochastic matrix P and modifies it to produce a stochastic matrix Q that has π as its invariant distribution.

This is useful in a situation where we have an enormous number of sites and some function p that gives a non-negative value for each site. Suppose that there is one site where p is very much larger than for any of the other sites, and that our goal is to find that site. In other words, we have a discrete maximization problem. We assume that it is not difficult to compute p for any given site, but the number of sites is too huge to simply compute p for every site and see where it is the largest.

To solve this problem let’s assume that the sites are labeled by integers 1, . . . , N. The vector p = [p_1, . . . , p_N]ᵀ has non-negative entries, and our goal is to find the largest one. We can form a state vector π (in principle) by normalizing p, that is, π = p/∑_i p_i. Then the state vector π gives a very large probability to the site we want to find. Now we can use the Metropolis algorithm to define a random walk with π as its invariant distribution. If we step from site to site according to this random walk, chances are high that after a while we end up at the site where π is large.

In practice we don’t want to compute the sum ∑_i p_i in the denominator of the expression for π, since the number of terms is huge. An important feature of the Metropolis algorithm is that this computation is not required.

You can learn more about the Metropolis algorithm in the article The Markov chain Monte Carlo revolution by Persi Diaconis at

http://www.ams.org/bull/2009-46-02/S0273-0979-08-01238-X/home.html.

In this article Diaconis presents an example where the Metropolis algorithm is used to solve a substitution cipher, that is, a code where each letter in a message is replaced by another letter. The “sites” in this example are all permutations of a given string of letters and punctuation. The function p is defined by analyzing adjacent pairs of letters. Every adjacent pair of letters has a certain probability of occurring in an English text. Knowing these probabilities, it is possible to construct a function p that is large on strings that are actually English. [The sample output of a random walk attempting to maximize this function is omitted here.]

IV.5.5 Description of the algorithm

To begin, notice that a square matrix P with non-negative entries is stochastic if Pᵀ1 = 1, where 1 = [1, 1, . . . , 1]ᵀ is the vector with entries all equal to 1. This is just another way of saying that the columns of P add up to 1.

Next, suppose we are given a state vector π = [π_1, π_2, . . . , π_n]ᵀ. Form the diagonal matrix Π that has these entries on the diagonal. Then, if P is a stochastic matrix, the condition

PΠ = ΠPᵀ

implies that π is the invariant distribution for P. To see this, notice that Π1 = π, so applying both sides of the equation to 1 yields Pπ = PΠ1 = ΠPᵀ1 = Π1 = π.

This condition can be written as a collection of conditions on the components of P .

p_{i,j} π_j = π_i p_{j,i}

Notice that for diagonal entries p_{i,i} this condition is always true. So we may make any changes we want to the diagonal entries without affecting this condition. For the off-diagonal entries (i ≠ j) there is one equation for each pair p_{i,j}, p_{j,i}.

Here is how the Metropolis algorithm works. We start with a stochastic matrix P where these equations are not necessarily true. For each off-diagonal pair p_{i,j}, p_{j,i} of matrix entries, we make the equation above hold by decreasing the value of one of the entries, while leaving the other entry alone. It is easy to see that the adjusted value will still be non-negative.


Let Q denote the adjusted matrix. Then QΠ = ΠQᵀ as required, but Q is not necessarily a stochastic matrix. However, we have the freedom to change the diagonal entries of Q. So set the diagonal entries of Q equal to

q_{i,i} = 1 − ∑_{j≠i} q_{j,i}

Then the columns of Q add up to 1 as required. The only thing left to check is that the entries we just defined are non-negative. Since we have always made the adjustment so that q_{i,j} ≤ p_{i,j}, we have

q_{i,i} ≥ 1 − ∑_{j≠i} p_{j,i} = p_{i,i} ≥ 0

In practice we may not be presented with the state vector π but with a positive vector p = [p_1, p_2, . . . , p_n]ᵀ whose maximal component we want to find. Although in principle it is easy to obtain π as π = p/∑_i p_i, in practice there may be so many terms in the sum that it is impossible to compute. However, notice that the crucial equation p_{i,j} π_j = π_i p_{j,i} is true for π_i and π_j if and only if it is true for p_i and p_j. Thus we may use the values of p instead.

Notice that if one of p_{i,j} or p_{j,i} is 0, then q_{i,j} = q_{j,i} = 0. So it is possible that this algorithm would yield Q = I. This is indeed a stochastic matrix where every state vector is an invariant distribution, but it is not very useful. To avoid examples like this, one could restrict the use of the Metropolis algorithm to matrices P where p_{i,j} and p_{j,i} are always either both zero or both nonzero.
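Here is a minimal sketch of this procedure as a MATLAB/Octave function file metropolis.m. It takes a column-stochastic matrix P and the positive weight vector p (rather than the normalized π, as discussed above); the function name and loop structure are my own, not part of the notes.

function Q = metropolis(P,p)
  % adjust P so that the distribution proportional to p is invariant
  n = length(p);
  Q = P;
  for i=1:n
    for j=i+1:n
      % enforce q_{i,j} p_j = p_i q_{j,i} by decreasing one entry
      if Q(i,j)*p(j) > p(i)*Q(j,i)
        Q(i,j) = p(i)*Q(j,i)/p(j);
      else
        Q(j,i) = Q(i,j)*p(j)/p(i);
      end
    end
  end
  for i=1:n
    % reset the diagonal entries so that each column sums to 1
    Q(i,i) = 1 - (sum(Q(:,i)) - Q(i,i));
  end
end

One can check that Q*p is proportional to p and that the columns of Q sum to 1; the estimate above guarantees that the new diagonal entries are non-negative.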

IV.5.6 An example

In this example we will use the Metropolis algorithm to try to maximize a function f(x, y) defined in the square −1 ≤ x ≤ 1, −1 ≤ y ≤ 1. We put down a uniform (2N+1)×(2N+1) grid and take the resulting lattice points to be our sites. The initial stochastic matrix P is the one that gives equal probabilities for travelling from a site to each neighbouring site. We use the Metropolis algorithm to modify this, obtaining a stochastic matrix Q whose invariant distribution is proportional to the sampled values of f. We then use Q to produce a random path that has a high probability of converging to the maximum of f.

To begin we write a MATLAB/Octave function f.m that defines a Gaussian function f with a peak at (0, 0).

function f=f(x,y)
  a = 10;
  f = exp(-(a*x)^2-(a*y)^2);
end


Next we write a function p.m such that p(i,j,k,l,N) defines the matrix element for the stochastic matrix P corresponding to the sites (i, j) and (k, l) when the grid size is (2N + 1) × (2N + 1).

function p = p(i,j,k,l,N)
  % [i,j] and [k,l] are two points on a grid with
  % -N <= i,j,k,l <= N. The probabilities are those for
  % a uniform random walk, taking into account the edges.
  % Note: with our convention, p(i,j,k,l,N) is the
  % probability of going from [k,l] to [i,j].
  % Note: if i,j,k,l are out of range, then p=0.
  if( ([k,l]==[N,N] | [k,l]==[N,-N] | [k,l]==[-N,N] | [k,l]==[-N,-N]) ...
      & abs(i-k)+abs(j-l)==1)
    % starting point [k,l] is a corner:
    p = 1/2;
  elseif( (k==N | k==-N | l==N | l==-N) & abs(i-k)+abs(j-l)==1)
    % starting point [k,l] is on the edge:
    p = 1/3;
  elseif(abs(i-k)+abs(j-l)==1)
    % starting point is inside:
    p = 1/4;
  else
    p = 0;
  end
  % handle out of range cases:
  if(i<-N | i>N | j<-N | j>N | k<-N | k>N | l<-N | l>N )
    p=0;
  end
end

The next function q.m implements the Metropolis algorithm to give the matrix elements of Q.

function qq = q(i,j,k,l,N)
  if(i~=k | j~=l)
    if(p(i,j,k,l,N)==0)
      qq = 0;
    elseif(p(i,j,k,l,N)*f(k/N,l/N) > f(i/N,j/N)*p(k,l,i,j,N))
      % decrease this entry so that the detailed balance condition
      % q(i,j,k,l)*f(k,l) = f(i,j)*q(k,l,i,j) holds even where p
      % is not symmetric (at edges and corners)
      qq = f(i/N,j/N)*p(k,l,i,j,N)/f(k/N,l/N);
    else
      qq = p(i,j,k,l,N);
    end
  else
    % diagonal entry: make the column for site [k,l] sum to 1
    qq = 1 - q(k+1,l,k,l,N) - q(k-1,l,k,l,N) - q(k,l+1,k,l,N) - q(k,l-1,k,l,N);
  end
end

Finally, here are the commands that compute a random walk defined by Q:

% set the grid size
N=50;
% set the number of iterations
niter=1000;
% pick the starting grid point [k,l] at random
% (discrete_rnd is provided by Octave's statistics package)
k=discrete_rnd(1,[-N:N],ones(1,2*N+1)/(2*N+1));
l=discrete_rnd(1,[-N:N],ones(1,2*N+1)/(2*N+1));
% or start in a corner (comment these out to keep the random start)
k=N;
l=N;
% initialize: X and Y contain the path
% F contains f values along the path
X=zeros(1,niter);
Y=zeros(1,niter);
F=zeros(1,niter);
% the main loop
for count=1:niter
  % pick the direction to go in the grid,
  % according to the probabilities in the stochastic matrix q
  probs = [q(k,l+1,k,l,N),q(k+1,l,k,l,N),q(k,l-1,k,l,N),q(k-1,l,k,l,N),q(k,l,k,l,N)];
  dn = discrete_rnd(1,[1,2,3,4,5],probs);
  switch dn
    case 1
      l=l+1;
    case 2
      k=k+1;
    case 3
      l=l-1;
    case 4
      k=k-1;
  end
  % calculate X, Y and F values for the new grid point
  X(count)=k/N;
  Y(count)=l/N;
  F(count)=f(k/N,l/N);
end
% plot the path
subplot(1,2,1);
plot(X,Y);
axis([-1,1,-1,1]);
axis equal
% plot the values of F along the path
subplot(1,2,2);
plot(F);

The resulting plot looks like this: [Figure: the left panel shows the path of the random walk in the square −1 ≤ x, y ≤ 1; the right panel shows the values of f along the path over the 1000 iterations.]


IV.6 The Singular Value Decomposition

Prerequisites and Learning Goals

After completing this section, you should be able to

• explain what the singular value decomposition of a matrix is

• explain why a Hermitian matrix always has real eigenvalues and an orthonormal basis of eigenvectors

• write down the relationship between the singular values of a matrix and its matrix norm

IV.6.1 The matrix norm and eigenvalues

Recall that the matrix norm ‖D‖ of a diagonal matrix

D = \begin{bmatrix} \lambda_1 & 0 & 0 & \cdots & 0 \\ 0 & \lambda_2 & 0 & \cdots & 0 \\ 0 & 0 & \lambda_3 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \lambda_n \end{bmatrix}

is the largest absolute value of a diagonal entry, that is, the largest value of |λ_i|. Since, for a diagonal matrix, the diagonal entries λ_i are also the eigenvalues of D, it is natural to conjecture that for any matrix A the matrix norm ‖A‖ is the largest absolute value of an eigenvalue of A.

Let’s test this conjecture on a 2 × 2 example.

A=[1 2; 3 4]

A =

   1   2
   3   4

eig(A)

ans =

  -0.37228
   5.37228

norm(A)

ans = 5.4650


Since 5.37228 ≠ 5.4650 we see that the conjecture is false. But before giving up on this idea, let’s try one more time.

B=[1 3; 3 4]

B =

   1   3
   3   4

eig(B)

ans =

  -0.85410
   5.85410

norm(B)

ans = 5.8541

This time the norm and the largest absolute value of an eigenvalue are both equal to 5.8541.

The difference between these two examples is that B is a Hermitian matrix (in fact, a real symmetric matrix) while A is not. It turns out that for any Hermitian matrix (recall this means A∗ = A) the matrix norm is equal to the largest eigenvalue in absolute value. This is related to the fact that every Hermitian matrix A can be diagonalized by a unitary matrix: A = UDU∗, where D is diagonal with the eigenvalues on the diagonal and U is unitary. The point is that multiplying A by a unitary matrix doesn’t change the norm. Thus D has the same norm as UD, which has the same norm as UDU∗ = A. But the norm of D is the largest eigenvalue in absolute value.

The singular value decomposition is a factorization similar to this that is valid for any matrix. For an arbitrary matrix it takes the form

A = UΣV∗

where U and V are unitary and Σ is a diagonal matrix with non-negative entries on the diagonal; the positive entries are called the singular values of A. As we will see, it is the largest singular value that is equal to the matrix norm. The matrix A does not have to be a square matrix. In this case, the unitary matrices U and V are not only different, but have different sizes. The matrix Σ has the same dimensions as A.

IV.6.2 Diagonalizing Hermitian matrices

We already know that

• The eigenvalues of any Hermitian matrix are real


• Eigenvectors belonging to distinct eigenvalues are orthogonal

This means that if all the eigenvalues of a Hermitian matrix A are distinct, we can choose the corresponding eigenvectors to form an orthonormal basis. The matrix U with this orthonormal basis as its columns is unitary (that is, U∗ = U⁻¹) and diagonalizes A. Thus U∗AU = U⁻¹AU = D, where D is a diagonal matrix with the eigenvalues of A as diagonal entries.

The goal of this section is to show that any Hermitian matrix A can be diagonalized by a unitary matrix. In other words, there exists a unitary matrix U such that U∗AU = U⁻¹AU = D, where D is a diagonal matrix. This equation is a compact way of saying that the columns of U form a basis of eigenvectors, with eigenvalues given by the diagonal entries of D. Since U is unitary, this basis of eigenvectors is orthonormal.

The argument we present works whether or not A has repeated eigenvalues, and also gives a new proof of the fact that the eigenvalues are real.

To begin we show that if A is any n×n matrix (not necessarily Hermitian), then there exists a unitary matrix U such that U∗AU is upper triangular. To see this, start with any eigenvector v of A with eigenvalue λ. (Every matrix has at least one eigenvalue.) Normalize v so that ‖v‖ = 1. Now choose an orthonormal basis q_2, . . . , q_n for the orthogonal complement of the subspace spanned by v, so that v, q_2, . . . , q_n form an orthonormal basis for all of Cⁿ. Form the matrix U_1 with these vectors as columns. Then, using ∗ to denote a number that is not necessarily 0, we have

U_1^* A U_1 = \begin{bmatrix} \mathbf{v}^* \\ \mathbf{q}_2^* \\ \vdots \\ \mathbf{q}_n^* \end{bmatrix} A \begin{bmatrix} \mathbf{v} \mid \mathbf{q}_2 \mid \cdots \mid \mathbf{q}_n \end{bmatrix} = \begin{bmatrix} \mathbf{v}^* \\ \mathbf{q}_2^* \\ \vdots \\ \mathbf{q}_n^* \end{bmatrix} \begin{bmatrix} \lambda\mathbf{v} \mid A\mathbf{q}_2 \mid \cdots \mid A\mathbf{q}_n \end{bmatrix}

= \begin{bmatrix} \lambda\|\mathbf{v}\|^2 & * & \cdots & * \\ \lambda\langle\mathbf{q}_2,\mathbf{v}\rangle & * & \cdots & * \\ \vdots & \vdots & & \vdots \\ \lambda\langle\mathbf{q}_n,\mathbf{v}\rangle & * & \cdots & * \end{bmatrix} = \begin{bmatrix} \lambda & * & \cdots & * \\ 0 & & & \\ \vdots & & A_2 & \\ 0 & & & \end{bmatrix},

where the last equality uses ‖v‖ = 1 and ⟨q_i, v⟩ = 0 for i = 2, . . . , n. Here A_2 is an (n − 1) × (n − 1) matrix.

Repeating the same procedure with A_2 we can construct an (n − 1) × (n − 1) unitary matrix U_2 with

U_2^* A_2 U_2 = \begin{bmatrix} \lambda_2 & * & \cdots & * \\ 0 & & & \\ \vdots & & A_3 & \\ 0 & & & \end{bmatrix}.

Now we use the (n − 1) × (n − 1) unitary matrix U_2 to construct an n × n unitary matrix

\tilde{U}_2 = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & & & \\ \vdots & & U_2 & \\ 0 & & & \end{bmatrix}.


Then it is not hard to see that

\tilde{U}_2^* U_1^* A U_1 \tilde{U}_2 = \begin{bmatrix} \lambda & * & * & \cdots & * \\ 0 & \lambda_2 & * & \cdots & * \\ 0 & 0 & & & \\ \vdots & \vdots & & A_3 & \\ 0 & 0 & & & \end{bmatrix}.

Continuing in this way, we find n × n unitary matrices \tilde{U}_3, \tilde{U}_4, . . . , \tilde{U}_{n−1} (each built from a smaller unitary matrix in the same way) so that

\tilde{U}_{n-1}^* \cdots \tilde{U}_2^* U_1^* A U_1 \tilde{U}_2 \cdots \tilde{U}_{n-1} = \begin{bmatrix} \lambda & * & \cdots & * \\ 0 & \lambda_2 & \cdots & * \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix}.

Define U = U_1 \tilde{U}_2 \cdots \tilde{U}_{n−1}. Then U is unitary, since the product of unitary matrices is unitary, and U^* = \tilde{U}_{n-1}^* \cdots \tilde{U}_2^* U_1^*. Thus the equation above says that U∗AU is upper triangular, that is,

U^* A U = \begin{bmatrix} \lambda & * & \cdots & * \\ 0 & \lambda_2 & \cdots & * \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix}

Notice that if we take the adjoint of this equation, we get

U^* A^* U = \begin{bmatrix} \bar{\lambda} & 0 & \cdots & 0 \\ * & \bar{\lambda}_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ * & * & \cdots & \bar{\lambda}_n \end{bmatrix}

Now let’s return to the case where A is Hermitian. Then A∗ = A, so the matrices appearing in the previous two equations are equal. Thus

\begin{bmatrix} \lambda & * & \cdots & * \\ 0 & \lambda_2 & \cdots & * \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix} = \begin{bmatrix} \bar{\lambda} & 0 & \cdots & 0 \\ * & \bar{\lambda}_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ * & * & \cdots & \bar{\lambda}_n \end{bmatrix}


This implies that all the entries denoted ∗ must actually be zero. It also shows that \bar{\lambda}_i = λ_i for every i. In other words, Hermitian matrices can be diagonalized by a unitary matrix, and all the eigenvalues are real.
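Both facts are easy to check numerically. The triangularization constructed above is known as the Schur decomposition, which MATLAB/Octave computes with the built-in schur command; the example matrices below are the arbitrary ones used earlier in this section.

% any square matrix can be unitarily triangularized
A = [1 2; 3 4];
[Q,T] = schur(A,'complex');  % Q unitary, T upper triangular
norm(Q'*A*Q - T)             % (close to) zero

% a Hermitian matrix is unitarily diagonalized, with real eigenvalues
B = [1 3; 3 4];
[V,D] = eig(B);              % columns of V are orthonormal eigenvectors
norm(V'*B*V - D)             % (close to) zero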

IV.6.3 The singular value decomposition

Let A be an m × n matrix. Then A can be factored as

A = UΣV∗

where U is an m × m unitary matrix, V is an n × n unitary matrix, and Σ is an m × n diagonal matrix with non-negative entries. (If n ≠ m, the diagonal of Σ starts at the top left corner and runs into one of the sides of the matrix at the other end.) Here is an example.

\begin{bmatrix} 1 & 1 \\ 1 & -1 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 1/\sqrt{3} & -1/\sqrt{2} & -1/\sqrt{6} \\ -1/\sqrt{3} & -1/\sqrt{2} & 1/\sqrt{6} \\ 1/\sqrt{3} & 0 & 2/\sqrt{6} \end{bmatrix} \begin{bmatrix} \sqrt{3} & 0 \\ 0 & \sqrt{2} \\ 0 & 0 \end{bmatrix} \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}

The positive diagonal entries of Σ are called the singular values of A.

Here is how to construct the singular value decomposition in the special case where A is an invertible square (n × n) matrix. In this case U, Σ and V will all be n × n as well.

To begin, notice that A∗A is Hermitian, since (A∗A)∗ = A∗A∗∗ = A∗A. Moreover, the (real) eigenvalues are all positive. To see this, suppose that A∗Av = λv. Then, taking the inner product of both sides with v, we obtain ⟨v, A∗Av⟩ = ⟨Av, Av⟩ = λ⟨v, v⟩. The eigenvector v is by definition not the zero vector, and because we have assumed that A is invertible, Av is not zero either. Thus λ = ‖Av‖²/‖v‖² > 0.

Write the positive eigenvalues of A∗A as σ_1², σ_2², . . . , σ_n², and let Σ be the matrix with σ_1, σ_2, . . . , σ_n on the diagonal. Notice that Σ is invertible, and Σ² has the eigenvalues of A∗A on the diagonal. Since A∗A is Hermitian, we can find a unitary matrix V such that A∗A = VΣ²V∗. Define U = AVΣ⁻¹. Then U is unitary, since U∗U = Σ⁻¹V∗A∗AVΣ⁻¹ = Σ⁻¹Σ²Σ⁻¹ = I.

With these definitions,

UΣV∗ = AVΣ⁻¹ΣV∗ = AVV∗ = A,

so we have produced the singular value decomposition.
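As a quick numerical check of this construction (the test matrix is arbitrary; note that eig may order the eigenvalues differently than svd, which lists the singular values in decreasing order):

A = [1 2; 3 4];
[V,D] = eig(A'*A);    % A'*A = V*Sigma^2*V' since A'*A is Hermitian
Sigma = sqrt(D);      % diagonal matrix of singular values
U = A*V/Sigma;        % U = A*V*inv(Sigma)
norm(U*Sigma*V' - A)  % (close to) zero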

Notice that U is in fact the unitary matrix that diagonalizes AA∗, since UΣ²U∗ = AVΣ⁻¹Σ²Σ⁻¹V∗A∗ = AVV∗A∗ = AA∗. Moreover, this formula shows that the eigenvalues of AA∗ are the same as those of A∗A, since they are the diagonal elements of Σ².

In MATLAB/Octave, the singular value decomposition is computed using [U,S,V]=svd(A). Let’s try this on the example above:


[U S V] = svd([1 1;1 -1;0 1])

U =

   0.57735  -0.70711  -0.40825
  -0.57735  -0.70711   0.40825
   0.57735   0.00000   0.81650

S =

   1.73205   0.00000
   0.00000   1.41421
   0.00000   0.00000

V =

   0  -1
   1  -0

IV.6.4 The matrix norm and singular values

The matrix norm of a matrix is equal to its largest singular value. This follows from the fact that multiplying a matrix by a unitary matrix doesn’t change its matrix norm. So, if A = UΣV∗, then the matrix norm ‖A‖ is the same as the matrix norm ‖Σ‖. But Σ is a diagonal matrix with the singular values of A along the diagonal. Thus the matrix norm of Σ is the largest singular value of A.
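We can check this with the matrix A from the beginning of this section:

A = [1 2; 3 4];
svd(A)     % singular values 5.4650 and 0.3660
norm(A)    % 5.4650, the largest singular value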

Actually, the Hilbert-Schmidt norm is also related to the singular values. Recall that for a matrix with real entries ‖A‖_HS = √(tr(AᵀA)). For a matrix with complex entries, the definition is ‖A‖_HS = √(tr(A∗A)). Now, if A = UΣV∗ is the singular value decomposition of A, then A∗A = VΣ²V∗. Thus tr(A∗A) = tr(VΣ²V∗) = tr(V∗VΣ²) = tr(Σ²). Here we used that tr(AB) = tr(BA) for any two matrices A and B, and that V∗V = I. Thus ‖A‖_HS = √(∑_i σ_i²), where the σ_i are the singular values.

Notice that if we define the vector of singular values σ = [σ_1, . . . , σ_n]ᵀ, then ‖A‖ = ‖σ‖_∞ and ‖A‖_HS = ‖σ‖_2. By taking other vector norms for σ, like the 1-norm, or the p-norm for other values of p, we can define a whole new family of norms for matrices. These norms are called Schatten p-norms and are useful in some situations.
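For example, in MATLAB/Octave the Schatten 1-norm (also called the trace norm or nuclear norm) and its relatives can all be computed from svd; the test matrix is arbitrary.

A = [1 2; 3 4];
s = svd(A);        % the vector of singular values
sum(s)             % the Schatten 1-norm
sqrt(sum(s.^2))    % the Schatten 2-norm, i.e. the Hilbert-Schmidt norm
max(s)             % the Schatten infinity-norm, i.e. the matrix norm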

IV.6.5 Applications of the SVD

The idea is that setting all but the largest singular values of a matrix to zero gives an approximation of the matrix that has low rank. More explanation will have to wait for the next version of these notes!
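Even so, the basic computation is easy to sketch in MATLAB/Octave; the random test matrix and the choice k = 2 are arbitrary.

A = rand(8,6);
[U,S,V] = svd(A);
k = 2;
% keep only the k largest singular values
Ak = U(:,1:k)*S(1:k,1:k)*V(:,1:k)';
rank(Ak)      % k
norm(A - Ak)  % the (k+1)-st singular value of A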
