
Lecture notes

December 16, 2015

1 Sep 18

1.1 Vector spaces

Let's start off with some examples of spaces we have been working with so far.

R, R2, . . . , Rn. R is the space of real numbers. R2 is the space of vectors with 2 real components; R2 can also be thought of as the x-y plane. R3 is the space of vectors with 3 real components; the physical space around us is a subset of R3. In general, Rn is the space of vectors with n real components.

What are the operations we can perform on vectors? Addition and multiplication by a scalar. Given two vectors

\[ \begin{pmatrix} a_1 \\ a_2 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} b_1 \\ b_2 \end{pmatrix} \]

in R2, we can add them to get

\[ \begin{pmatrix} a_1 + b_1 \\ a_2 + b_2 \end{pmatrix}, \]

or we can multiply by a scalar c to get

\[ \begin{pmatrix} c a_1 \\ c a_2 \end{pmatrix}, \]

and we still get a vector in R2. This leads us to the definition of what it means to be a vector space.

A vector space V is a collection of vectors, or a space of vectors, which satisfies the following two properties:

• If v and w are in V, then v + w ∈ V.

• If v is in V, then cv ∈ V.

1.2 Subspaces of vector spaces

Are there subsets of R2 (the x-y plane) which can be regarded as spaces in their own right? By spaces, I mean that the above two properties must be satisfied. So the question is: does there exist a subset U ⊂ R2 which satisfies

• If v and w are in U, then v + w ∈ U.

• If v is in U, then cv ∈ U.

The answer is yes, and we will call these spaces subspaces. In our example, U will be a subspace of R2. Vectors lying on any line passing through the origin satisfy this property. Consider the line x − 2y = 0. Every point on that line is a vector of the form

\[ \begin{pmatrix} 2c \\ c \end{pmatrix}, \]

where c is a real number. Check for yourself that the above two properties are satisfied.

Does this mean that every subset of R2 gets to be a subspace? No! Try and think of a few examples yourself.

How about R3? What do the different subspaces of R3 look like? A subspace can be the whole of R3, a plane passing through the origin, a line passing through the origin, or just the origin.

In general, given a vector space V and U ⊂ V, then U is a subspace of V if

• If v and w are in U, then v + w ∈ U.

• If v is in U, then cv ∈ U.

Exercises:

• Convince yourself that the set of 3 × 2 matrices forms a vector space, where addition is defined as entrywise addition and scalar multiplication as entrywise multiplication by the scalar.
• What are the different possible subspaces of R4?
• Can you think of an example of a subset of R3 which is not a vector space?
• If U and V are subspaces, is U ∪ V still a subspace? Can you give a counterexample in R2? Is U ∩ V always a subspace?
• Is the space of invertible 3 × 3 matrices a subspace of the space of 3 × 3 matrices?
• Harder: The space of all continuous functions C0 also forms a vector space. In fact, it is an infinite dimensional vector space. Prove that the definitions are satisfied.


Even harder: Prove that the space of all continuous periodic functions, i.e., those satisfying f(x + 1) = f(x), forms a subspace of C0.

Which of the following are subspaces? Which rule gets violated?

a) The plane of vectors b = (b1, b2, b3)^T such that b1 = 0?
b) The plane of vectors b such that b1 = 1?
c) The vectors b such that b2 b3 = 0?
d) All linear combinations of (1, 0, 1)^T and (0, 2, 1)^T?
e) The plane of vectors that satisfy b1 + 2b2 + b3 = 0?
f) The plane of vectors that satisfy b1 + 2b2 + 3b3 = 6?

1.3 Column space of a matrix

Consider the matrix

\[ A = \begin{pmatrix} 1 & 2 & 3 \\ 0 & 1 & 1 \\ 0 & 4 & 4 \\ 0 & 1 & 1 \end{pmatrix} . \qquad (1) \]

The columns of A are vectors in R4. Can we form a vector space which contains the columns of A? Do the columns by themselves form a subspace? No! Neither of the defining properties of a subspace is satisfied. We'd at least need to include multiples of the columns and their sums. In fact, we'd need to include all possible linear combinations of the columns of A. This is what we will define to be the column space of A, denoted by C(A). Given a bunch of vectors, that is the simplest way to create a subspace which contains those vectors. In our example, all vectors of the form

\[ v = c_1 \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix} + c_2 \begin{pmatrix} 2 \\ 1 \\ 4 \\ 1 \end{pmatrix} + c_3 \begin{pmatrix} 3 \\ 1 \\ 4 \\ 1 \end{pmatrix} \qquad (2) \]

form C(A).

How big is this space? Is it all of R4? A simple counting argument shows that it should not be. R4 has 4 independent numbers you can choose; here you can only choose 3, corresponding to the weights c1, c2, c3, and so, morally speaking, the column space should not fill all of R4.

So the question we want to answer is: when can we solve Ax = b,

\[ A \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \end{pmatrix} ? \qquad (3) \]

From the discussion above, it is clear that we cannot solve it for all b. The above equation can be solved if and only if b ∈ C(A).
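For instance, one numerical way to test whether a given b lies in C(A) is to compare the rank of A with the rank of the augmented matrix formed by appending b; the minimal Python/NumPy sketch below uses the matrix from equation (1), and the helper name in_column_space is only illustrative.

    import numpy as np

    A = np.array([[1, 2, 3],
                  [0, 1, 1],
                  [0, 4, 4],
                  [0, 1, 1]], dtype=float)

    def in_column_space(A, b, tol=1e-10):
        """b is in C(A) exactly when appending b does not increase the rank."""
        rank_A = np.linalg.matrix_rank(A, tol=tol)
        rank_Ab = np.linalg.matrix_rank(np.column_stack([A, b]), tol=tol)
        return rank_A == rank_Ab

    b_good = A @ np.array([1.0, 2.0, -1.0])   # a combination of the columns, so solvable
    b_bad = np.array([0.0, 0.0, 0.0, 1.0])    # not a combination of the columns

    print(in_column_space(A, b_good))  # True
    print(in_column_space(A, b_bad))   # False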

Exercises:

• Does the set of b for which you cannot solve Ax = b form a subspace?
• Does the set of solutions x to Ax = b for a fixed b form a subspace?


2 Sep 21

2.1 Null space of a matrix

Claim: The solutions x to Ax = 0 form a vector space. Verify this yourself.

The column space of A tells us for which b we can solve the problem Ax = b. The null space, on the other hand, tells us how many such solutions there are when they exist. Suppose we can find an xn ≠ 0 such that Axn = 0, and xp solves Axp = b. Then A(xn + xp) = b, which says that xp + xn is also a solution.

Recollect the two ways of creating subspaces:

1) Given vectors v1, v2, v3, . . . , v17 ∈ V, all linear combinations of the form \( \sum_{i=1}^{17} c_i v_i \) form a subspace of V.

2) Solutions to homogeneous linear algebraic equations Ax = 0.

2.2 Solving Ax=02.2.1 Echelon form

To understand all solutions to Ax = b, we first need to understand all solutions to Ax = 0. So how do we compute thesolutions to Ax = 0 when A is an m× n matrix. Let us look at the extension of Gaussian elimination to the general case. Infact, it is exactly the steps of Gaussian elimination, we just won’t stop when the algorithm fails us.

Algorithm 1: Computing the echelon form of a matrix when row exchanges are not required (similar to getting to U in Gaussian elimination)

Step 1: Set pivot row to 1.
Step 2: Find the first column which has a nonzero entry either in the pivot row position or below it. This will be a pivot column.
Step 3: Select a nonzero entry in the pivot column (the nonzero entry should be in the pivot row position or below it).
Step 4: Use row eliminations to create zeros in all positions below it.
Step 5: pivot row = pivot row + 1.
Step 6: Repeat steps 2-5 until you run out of rows or columns.

Algorithm in action for an example matrix

\[ A = \begin{pmatrix} 1 & 2 & 3 & 2 \\ 2 & 4 & 9 & 7 \\ 1 & 2 & -3 & -4 \end{pmatrix} \qquad (4) \]

Step 1: Pivot row is 1.
Step 2, 3: The first column has a nonzero entry in the pivot row position:

\[ A = \begin{pmatrix} 1 & 2 & 3 & 2 \\ 2 & 4 & 9 & 7 \\ 1 & 2 & -3 & -4 \end{pmatrix} \qquad (5) \]

Step 4: Eliminate:

\[ \begin{pmatrix} 1 & 2 & 3 & 2 \\ 2 & 4 & 9 & 7 \\ 1 & 2 & -3 & -4 \end{pmatrix} \xrightarrow{R_2 \to R_2 - 2R_1} \begin{pmatrix} 1 & 2 & 3 & 2 \\ 0 & 0 & 3 & 3 \\ 1 & 2 & -3 & -4 \end{pmatrix} \xrightarrow{R_3 \to R_3 - R_1} \begin{pmatrix} 1 & 2 & 3 & 2 \\ 0 & 0 & 3 & 3 \\ 0 & 0 & -6 & -6 \end{pmatrix} \]

Step 5: Pivot row = 2.
Step 6: Back to step 2.
Step 2: No pivot in column 2; move to column 3.
Step 3: Column 3 has a nonzero entry in the pivot row position, so column 3 is our new pivot column:

\[ \begin{pmatrix} 1 & 2 & 3 & 2 \\ 0 & 0 & 3 & 3 \\ 0 & 0 & -6 & -6 \end{pmatrix} \qquad (6) \]

Step 4: Eliminate:

\[ \begin{pmatrix} 1 & 2 & 3 & 2 \\ 0 & 0 & 3 & 3 \\ 0 & 0 & -6 & -6 \end{pmatrix} \xrightarrow{R_3 \to R_3 + 2R_2} \begin{pmatrix} 1 & 2 & 3 & 2 \\ 0 & 0 & 3 & 3 \\ 0 & 0 & 0 & 0 \end{pmatrix} = U \qquad (7) \]


Step 5: Pivot row = 3.
Step 6: Back to step 2.
Step 2: No more pivot columns.
End of algorithm.

Pivot columns: columns in which we found the pivots; in our example, columns 1 and 3.
Free columns: columns in which we could not find pivots; columns 2 and 4 in our case.
U is our echelon matrix (note that it is still upper triangular).

What have we done in all of this in matrix language?

\[ E_{32} E_{31} E_{21} A = U \qquad (8) \]

Question 1: What are E21, E31 and E32?
Question 2: Is A = LU, where

\[ L = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 1 & -2 & 1 \end{pmatrix} = E_{21}^{-1} E_{31}^{-1} E_{32}^{-1} ? \qquad (9) \]

An alternate way to ask the same question: is L still the matrix of multipliers?
Question: If A is square and invertible, is U the same as the upper triangular matrix from Gaussian elimination?
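For instance, a minimal Python/NumPy sketch of Algorithm 1 might look as follows; the function name echelon and the tolerance-based zero test are illustrative choices, not a prescribed implementation.

    import numpy as np

    def echelon(A, tol=1e-12):
        """Reduce A to an echelon form U by the steps of Algorithm 1.
        Returns U and the list of pivot columns."""
        U = A.astype(float).copy()
        m, n = U.shape
        pivot_row, pivot_cols = 0, []
        for col in range(n):
            if pivot_row >= m:
                break
            # Steps 2/3: look for a nonzero entry at or below the pivot row.
            rows = np.where(np.abs(U[pivot_row:, col]) > tol)[0]
            if rows.size == 0:
                continue                              # free column, move on
            r = pivot_row + rows[0]
            U[[pivot_row, r]] = U[[r, pivot_row]]     # row exchange if needed (not needed here)
            # Step 4: eliminate everything below the pivot.
            for i in range(pivot_row + 1, m):
                U[i] -= (U[i, col] / U[pivot_row, col]) * U[pivot_row]
            pivot_cols.append(col)
            pivot_row += 1                            # Step 5
        return U, pivot_cols

    A = np.array([[1, 2, 3, 2],
                  [2, 4, 9, 7],
                  [1, 2, -3, -4]])
    U, pivots = echelon(A)
    print(U)        # matches the U computed above
    print(pivots)   # [0, 2] -> columns 1 and 3 are the pivot columns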

2.2.2 Reduced row echelon form

Algorithm 2: Computing the reduced row echelon form from U

Step 1: Find the rightmost pivot column.
Step 2: Scale the pivot entry to 1.
Step 3: Perform row operations to eliminate all entries above the pivot.
Step 4: Find the first pivot column to the left of the current pivot column.
Step 5: Repeat steps 2-4 until you are done with all pivots.

The same example, continued.
Step 1: The rightmost pivot column is column 3.
Step 2: Rescale:

\[ \begin{pmatrix} 1 & 2 & 3 & 2 \\ 0 & 0 & 3 & 3 \\ 0 & 0 & 0 & 0 \end{pmatrix} \xrightarrow{R_2 \to R_2 / 3} \begin{pmatrix} 1 & 2 & 3 & 2 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix} \qquad (10) \]

Step 3: Eliminate:

\[ \begin{pmatrix} 1 & 2 & 3 & 2 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix} \xrightarrow{R_1 \to R_1 - 3R_2} \begin{pmatrix} 1 & 2 & 0 & -1 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix} \qquad (11) \]

Step 4: The next pivot column is column 1.
Step 5: Done with all pivots, as we do not need to eliminate anything using the pivot in column 1.

So our reduced row echelon form is

\[ R = \begin{pmatrix} 1 & 2 & 0 & -1 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix} \qquad (12) \]

Question 3: Let

\[ A = \begin{pmatrix} 1 & 3 & 2 \\ 2 & 9 & 7 \\ 1 & -3 & -3 \end{pmatrix} . \qquad (13) \]

What is the reduced row echelon form of A?
Question 4: What would R be if A were a nonsingular square matrix? Note that this is a generalization of Question 3.
Question: How are the solutions of Ax = 0, Ux = 0 and Rx = 0 related?
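As a quick way to check answers, SymPy's Matrix.rref computes the reduced row echelon form (and the pivot columns) in exact arithmetic; the snippet below is a minimal sketch applying it to our example and to the matrix of Question 3.

    from sympy import Matrix

    A = Matrix([[1, 2, 3, 2],
                [2, 4, 9, 7],
                [1, 2, -3, -4]])
    R, pivot_cols = A.rref()
    print(R)           # Matrix([[1, 2, 0, -1], [0, 0, 1, 1], [0, 0, 0, 0]])
    print(pivot_cols)  # (0, 2), i.e. columns 1 and 3

    A3 = Matrix([[1, 3, 2],
                 [2, 9, 7],
                 [1, -3, -3]])
    print(A3.rref()[0])  # try to predict this before running it (Question 3)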


2.2.3 Finding the null vectors

Algorithm 3: Finding the null vectors x

Step 1: Divide the variables into two groups. If column i is a pivot column then xi is a pivot variable; otherwise it is a free variable.
Step 2: Choose arbitrary values for the free variables and solve for the pivot variables. (If there are k free variables, then there is a k dimensional space of null vectors.)

Working it out in our example: the two equations in Rx = 0 read

\[ x_3 + x_4 = 0, \qquad x_1 + 2x_2 - x_4 = 0 . \qquad (14) \]

Step 1: x1 and x3 are our pivot variables; x2 and x4 are our free variables.
Step 2: Set x2 = 1 and x4 = 0 to compute one of the null vectors:

\[ x_3 = 0, \quad x_1 = -2, \qquad (15) \]

\[ x_{n_1} = \begin{pmatrix} -2 \\ 1 \\ 0 \\ 0 \end{pmatrix} . \qquad (16) \]

Step 2, continued: Set x2 = 0 and x4 = 1 to compute the second null vector:

\[ x_3 = -1, \quad x_1 = 1, \qquad (17) \]

\[ x_{n_2} = \begin{pmatrix} 1 \\ 0 \\ -1 \\ 1 \end{pmatrix} . \qquad (18) \]

The general null vector is then given by

\[ x_n = c_1 x_{n_1} + c_2 x_{n_2} = c_1 \begin{pmatrix} -2 \\ 1 \\ 0 \\ 0 \end{pmatrix} + c_2 \begin{pmatrix} 1 \\ 0 \\ -1 \\ 1 \end{pmatrix} . \qquad (19) \]
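The same computation can be checked with SymPy, whose nullspace method returns one basis vector per free variable; a minimal sketch:

    from sympy import Matrix

    A = Matrix([[1, 2, 3, 2],
                [2, 4, 9, 7],
                [1, 2, -3, -4]])
    for v in A.nullspace():          # one basis vector per free variable
        print(v.T)                   # the two null vectors found above (up to ordering)
        assert all(x == 0 for x in (A * v))   # each basis vector really satisfies Av = 0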


3 Sep 23

3.1 Null space - contd

Suppose there are more columns than rows, n > m. If we follow the algorithm, we will run out of pivot rows after at most m pivots, so there have to be at least n − m free columns. That means that the null space contains at least n − m independent vectors. The dimension of the null space equals the number of free columns.

3.2 Ax = b

So now we return to the question of when we can solve Ax = b. Let us return to our example and see what happens to our right hand side. Now we need to work with the augmented matrix. Suppose in our example we have a right hand side b = (b1, b2, b3):

\[ \left( \begin{array}{cccc|c} 1 & 2 & 3 & 2 & b_1 \\ 2 & 4 & 9 & 7 & b_2 \\ 1 & 2 & -3 & -4 & b_3 \end{array} \right) \xrightarrow{R_2 \to R_2 - 2R_1} \left( \begin{array}{cccc|c} 1 & 2 & 3 & 2 & b_1 \\ 0 & 0 & 3 & 3 & b_2 - 2b_1 \\ 1 & 2 & -3 & -4 & b_3 \end{array} \right) \xrightarrow{R_3 \to R_3 - R_1} \left( \begin{array}{cccc|c} 1 & 2 & 3 & 2 & b_1 \\ 0 & 0 & 3 & 3 & b_2 - 2b_1 \\ 0 & 0 & -6 & -6 & b_3 - b_1 \end{array} \right) \]

\[ \xrightarrow{R_3 \to R_3 + 2R_2} \left( \begin{array}{cccc|c} 1 & 2 & 3 & 2 & b_1 \\ 0 & 0 & 3 & 3 & b_2 - 2b_1 \\ 0 & 0 & 0 & 0 & b_3 + 2b_2 - 5b_1 \end{array} \right) \qquad (20) \]

In our problem, we knew to begin with that there are only two independent columns. The column space of A is a plane in 3 dimensions, so we should only be able to solve Ax = b when b lies in that plane. For the last equation to be consistent, we need b3 + 2b2 − 5b1 = 0. This is precisely the plane that b needs to be in for us to be able to solve the problem. Here is another way to see it: the columns of A also lie in this plane, whose general equation is given by −5x + 2y + z = 0.

Exercise: Verify that all the columns of A satisfy −5x + 2y + z = 0.

Let us pick one such vector, then. Suppose b = (1, 2, 1). Similar to Gaussian elimination, we can do a backsolve to find x.

Again we can use the same trick as when solving for the null vectors: we are free to choose the free variables. In our problem x2 and x4 are the free variables. If we set x2 = c1 and x4 = c2, we get

\[ 3x_3 + 3x_4 = 0, \qquad x_1 + 2x_2 + 3x_3 + 2x_4 = 1 . \]

Solving for x1 and x3, we get x3 = −c2 and x1 = 1 − 2c1 + c2. So the general solution is given by

\[ \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix} + c_1 \begin{pmatrix} -2 \\ 1 \\ 0 \\ 0 \end{pmatrix} + c_2 \begin{pmatrix} 1 \\ 0 \\ -1 \\ 1 \end{pmatrix} . \]

We see that the null vectors automatically show up, so we did not really need to compute them separately. This also shows us that the reduced row echelon form was not actually required: we could have used the echelon form U alone to compute the null vectors, to figure out when we can solve Ax = b, and to find the general solution in that case.
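A small numerical sanity check of this general solution, assuming NumPy: the particular vector maps to b = (1, 2, 1), and adding any combination of the two null vectors leaves Ax unchanged.

    import numpy as np

    A = np.array([[1, 2, 3, 2],
                  [2, 4, 9, 7],
                  [1, 2, -3, -4]], dtype=float)
    b = np.array([1.0, 2.0, 1.0])

    x_p = np.array([1.0, 0.0, 0.0, 0.0])        # particular solution
    n1 = np.array([-2.0, 1.0, 0.0, 0.0])        # null vectors from Sep 21
    n2 = np.array([1.0, 0.0, -1.0, 1.0])

    print(np.allclose(A @ x_p, b))               # True
    for c1, c2 in [(1.0, 0.0), (-3.0, 2.5), (7.0, -1.0)]:
        x = x_p + c1 * n1 + c2 * n2
        print(np.allclose(A @ x, b))             # True for every choice of c1, c2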

Question: Is the space of solutions a subspace?

The number of pivot variables is called the rank of the matrix. This is probably the most important thing you need to know about a matrix. Another deep observation is the fact that the number of nonzero rows in U or R is the same as the number of pivot columns, which is to say that the number of independent rows is exactly the same as the number of independent columns.

Question: What is the maximum rank that a matrix can have: m, n, max(m, n), or min(m, n)?
Question: What is the rank of a 5 × 5 identity matrix?
Question: What is the rank of a 7 × 7 invertible matrix?


4 Sep 25

4.1 Linear independence

Definition 1 Linear independence: A set of k vectors in Rn, v1, v2, . . . , vk, is linearly independent if and only if

\[ \sum_{i=1}^{k} c_i v_i = 0 \]

implies that c1 = c2 = . . . = ck = 0.

Let us look at a few examples in R3.

Example 1. The zero vector is linearly dependent.
Example 2. The columns of the identity matrix are linearly independent.
Example 3. The columns of L in the LU decomposition are always linearly independent.
Question. Let us look at the columns of A defined above: are they linearly independent or dependent?

An alternate way to look at the definition of linear independence is in terms of the null space of a matrix: stack the vectors up as columns to form a matrix A and examine N(A). Just for simplicity, let us consider 3 vectors v1, v2 and v3. The statement of linear independence says

\[ c_1 v_1 + c_2 v_2 + c_3 v_3 = 0 . \]

Let A be the matrix whose columns are v1, v2 and v3,

\[ A = \begin{pmatrix} v_1 & v_2 & v_3 \end{pmatrix} . \]

Then the above statement is the same as

\[ A \begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} . \]

Thus, if we find N(A) = {0}, then that implies that the vectors are linearly independent.
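In floating point arithmetic, the practical version of this test is to compare the rank of the stacked matrix with the number of vectors (full column rank means N(A) = {0}); a minimal NumPy sketch, with the illustrative helper name are_independent:

    import numpy as np

    def are_independent(vectors, tol=1e-10):
        """Stack the vectors as columns and test whether the rank equals their number."""
        A = np.column_stack(vectors)
        return np.linalg.matrix_rank(A, tol=tol) == A.shape[1]

    e1, e2, e3 = np.eye(3)                       # columns of the identity: independent
    print(are_independent([e1, e2, e3]))         # True
    print(are_independent([e1, e2, e1 + e2]))    # False: the third is a combination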

So which of the columns of A are linearly independent? Which rows of the echelon form of A are linearly independent?

4.2 Spanning a subspace

Recollect the definition of the column space of A: we defined it as all linear combinations of the columns of A. This subspace is defined to be the span of the columns of A. There is a distinction between the subspace and the vectors spanning this subspace. Let us return to our favorite matrix A.

\[ A = \begin{pmatrix} 1 & 2 & 3 & 2 \\ 2 & 4 & 9 & 7 \\ 1 & 2 & -3 & -4 \end{pmatrix} \qquad (21) \]

In our example, we saw that there are only two linearly independent columns. So the column space of A is a plane in R3; this is a subspace of R3. This subspace is also spanned by columns 1, 3, 4. This subspace is also spanned by columns 1, 3. This subspace is also spanned by columns 1, 4. The vectors spanning a subspace are different from the subspace itself. There can be many collections of vectors that span the same subspace. If it is a plane in 3 dimensions, we need at least 2 vectors to span it. If it is a 10 dimensional plane in R17, then we need at least 10 vectors to span it.

Let us consider another example: consider the plane in R4 which is all linear combinations of

\[ \begin{pmatrix} 1 \\ 1 \\ 1 \\ 0 \end{pmatrix}, \quad \begin{pmatrix} 1 \\ 1 \\ 0 \\ 0 \end{pmatrix} . \qquad (22) \]

This defines a plane in R4. This subspace is also spanned by

\[ \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix}, \quad \begin{pmatrix} 1 \\ 1 \\ 0 \\ 0 \end{pmatrix} . \qquad (23) \]

This subspace is also spanned by

\[ \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix}, \quad \begin{pmatrix} 1 \\ 1 \\ 1 \\ 0 \end{pmatrix} . \qquad (24) \]


This subspace is also spanned by

\[ \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix}, \quad \begin{pmatrix} 1 \\ 1 \\ 1 \\ 0 \end{pmatrix}, \quad \begin{pmatrix} 1 \\ 1 \\ 0 \\ 0 \end{pmatrix} . \qquad (25) \]

4.3 Basis for a vector space/subspace

Definition 2 Basis: A basis for a vector space V is a collection of vectors which satisfy:

• The vectors are linearly independent.

• They span the vector space V.

In the example above, the first three collections form a basis; the last one doesn't.

In our matrix example A, the columns of A taken all together do not form a basis for the column space of A. Columns 1, 3 form a basis for the column space of A. Columns 1, 4 form a basis for the column space of A. Columns 1, 3, 4 do not form a basis for the column space of A.

So what is the big deal about a basis? Given a vector v in a vector space V, there is exactly one way to express v as a combination of the basis vectors of the vector space.

Every vector space V has infinitely many bases.

4.4 Dimension

Definition 3 Dimension: The dimension of a vector space V is the number of basis vectors for that vector space.

Even though a given vector space has infinitely many bases, the point is that all of them have the same number of vectors.

Any linearly independent set in V can be extended to a basis by adding more vectors if necessary. Any spanning set in V can be reduced to a basis by discarding vectors if necessary.

Back to our two examples. Let us get back to the column space of our matrix A. If we were just given the first column in that vector space, we can form a basis by adding in column 3 or column 4. If we were given all the columns of A, we can form a basis by eliminating columns 2 and 4.

Question: What is the dimension of Rn?
Question: What is the dimension of the column space of A in our example?
Question: If the columns of A are linearly independent, then Ax = b has exactly one solution for every b?
Question: If the columns of A are linearly independent and A is square, then Ax = b has exactly one solution for every b?
Question: A 5 by 7 matrix never has linearly independent columns.
Question: True or false: If the columns of a matrix are dependent, so are the rows?
Question: True or false: The column space of a 2 by 2 matrix is the same as its row space?
Question: Suppose v1, v2, v3, . . . , v6 are 6 vectors in R4. Then
a) Those vectors (do)(do not)(might not) span R4.
b) Those vectors (are)(are not)(might be) linearly independent.
c) Any four of those vectors (are)(are not)(might be) a basis of R4.
d) If those vectors are the columns of A, then Ax = b (has)(does not have)(might not have) a solution.


5 Sep 28

5.1 Transpose of a matrix

Definition 4 Transpose of a matrix: If A is an m × n matrix, then the transpose of A, denoted by AT, is the n × m matrix formed by interchanging its rows and columns. Thus (AT)ij = aji.

Example 1 Transpose:

\[ A = \begin{pmatrix} 1 & 2 & 3 & 2 \\ 2 & 4 & 9 & 7 \\ 1 & 2 & -3 & -4 \end{pmatrix}, \qquad A^T = \begin{pmatrix} 1 & 2 & 1 \\ 2 & 4 & 2 \\ 3 & 9 & -3 \\ 2 & 7 & -4 \end{pmatrix} \qquad (26) \]

Example 2

\[ A = \begin{pmatrix} 1 & 3 & 2 \\ 2 & -7 & 5 \\ -1 & 2 & 12 \end{pmatrix}, \qquad A^T = \begin{pmatrix} 1 & 2 & -1 \\ 3 & -7 & 2 \\ 2 & 5 & 12 \end{pmatrix} \qquad (27) \]

Question 1 What is the transpose of a column vector?

Question 2 What is the following quantity: a number, a vector or a matrix? yTAx, where A is an m × n matrix, x is an n column vector and y is an m column vector.

5.2 Four fundamental subspaces of a matrix

• The column space of A, denoted by C(A), which has dimension r.

• The null space of A, denoted by N(A), which has dimension n − r.

• The row space of A, which is the column space of AT. It is C(AT), and it is spanned by the columns of AT, which is the same as the space spanned by the rows of A.

• The left null space of A, which is the null space of AT. It contains all vectors y such that AT y = 0 or, equivalently, yTA = 0.
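For checking hand computations, SymPy's columnspace, rowspace and nullspace methods return bases for these subspaces directly, and the left null space is just the null space of AT; a minimal sketch for our favorite matrix:

    from sympy import Matrix

    A = Matrix([[1, 2, 3, 2],
                [2, 4, 9, 7],
                [1, 2, -3, -4]])

    print("C(A):   ", A.columnspace())     # pivot columns of A
    print("N(A):   ", A.nullspace())       # one vector per free variable
    print("C(A^T): ", A.rowspace())        # nonzero rows of an echelon form
    print("N(A^T): ", A.T.nullspace())     # left null space
    print("rank:   ", A.rank())            # 2 = dimension of both C(A) and C(A^T)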

5.2.1 Column space of A

Back to our favorite example. We saw the column space of A and the null space of A. The column space of A was spanned by

\[ v_1 = \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix}, \quad v_2 = \begin{pmatrix} 3 \\ 9 \\ -3 \end{pmatrix} . \]

How does the computer figure out which columns are redundant? One way is to reduce A to the row echelon form U; the columns that had pivots are the linearly independent columns. We saw that columns 1 and 3 were the pivot columns, so the corresponding columns of A form a basis for the column space of A. Note that the corresponding columns of U also form a basis for the column space of U. But it is NOT true that columns 1 and 3 of U form a basis for the column space of A.

Question 3 Verify the above statement.

Let us consider another 2 × 2 matrix, this time with a nontrivial null space, to demonstrate what is going on. Let B be the matrix given by

\[ B = \begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix} . \]

The column space of B is spanned by

\[ v_1 = \begin{pmatrix} 1 \\ 2 \end{pmatrix} . \qquad (28) \]


5.2.2 Null space of A

We saw that the null space of A was spanned by the vectors

\[ v_1 = \begin{pmatrix} -2 \\ 1 \\ 0 \\ 0 \end{pmatrix}, \quad v_2 = \begin{pmatrix} 1 \\ 0 \\ -1 \\ 1 \end{pmatrix} . \qquad (29) \]

Question 4 Verify that these vectors are linearly independent. In fact, these form a basis for the null space of A as well.We can verify that every null vector can be expressed as a linear combination of these vectors.

How about the matrix B? It is easy to see that its null space is spanned by

\[ v_1 = \begin{pmatrix} -2 \\ 1 \end{pmatrix} . \qquad (30) \]

Theorem 1 rank of A + dimension of null space of A = n = number of columns of A

5.2.3 Row space of A or column space of AT

Theorem 2 Row rank = column rank

We can deduce the row space of A by thinking about the column space of AT. But we can also get that information from our row echelon form U. For our matrix A above, the row space is spanned by the rows of A. However, from the above theorem, we know that they do not form a basis, since we would need only two vectors to form a basis. So what are those vectors? We could just take the first two rows (as they are not multiples of each other). This is a small system and we can spot this by eye, but how do we automate the process? One strategy is to consider the column space of AT and apply the same strategy as for the column space of A.

Here is another strategy: the row space of A is exactly the same as the row space of U. This is because the rows of U are linear combinations of the rows of A and, since the elimination steps are invertible, the rows of A are also linear combinations of the rows of U; hence they span the same subspace. So a set of basis vectors for the row space of A is the set of nonzero rows of the echelon form of the matrix.

By strategy 1, the basis vectors are

\[ v_1 = \begin{pmatrix} 1 \\ 2 \\ 3 \\ 2 \end{pmatrix}, \quad v_2 = \begin{pmatrix} 2 \\ 4 \\ 9 \\ 7 \end{pmatrix} . \qquad (31) \]

By strategy 2, the basis vectors are

\[ v_1 = \begin{pmatrix} 1 \\ 2 \\ 3 \\ 2 \end{pmatrix}, \quad v_2 = \begin{pmatrix} 0 \\ 0 \\ 3 \\ 3 \end{pmatrix} . \qquad (32) \]

For the matrix B, a basis for the row space is given by

\[ v_1 = \begin{pmatrix} 1 \\ 2 \end{pmatrix} . \qquad (33) \]


6 Sep 30

6.1 Left null space of A

Even though it is not yet evident why it is important to know the left null space, we shall soon find out. For the sake of completeness, let us discuss ways to compute it.

Question 5 What is the dimension of N(AT)?

Computing the left null space: firstly, we observe that solving yTA = 0 is equivalent to solving AT y = 0 and then taking the transpose. So strategy 1, as before, is to just look at the transpose and compute its null space by reducing it to row echelon form.

However, we can also infer the left null space of A from the process of Gaussian elimination or, in the case of rectangular matrices, from reducing A to its row echelon form. In the general case we have PA = LU, which also means that L^{-1} P A = U. If the row echelon form has a row of zeros, then, thinking about multiplication on the left as taking linear combinations of rows, we see that the rows of L^{-1} P corresponding to the zero rows of U form a basis for the left null space of A.

For the matrix A, the elimination matrices we used were

\[ E_{21} = \begin{pmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad E_{31} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -1 & 0 & 1 \end{pmatrix}, \quad E_{32} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 2 & 1 \end{pmatrix} . \qquad (34) \]

Since L^{-1} = E_{32} E_{31} E_{21},

\[ L^{-1} = \begin{pmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ -5 & 2 & 1 \end{pmatrix} . \]

Moreover, if yT is the last row of L−1, then yTA is the last row of U which is the zero row in this case.

For the matrix B, the left null vector is given by

yT = (−2, 1) (35)

6.2 Existence of inversesDefinition 5 Left inverse The left inverse of a matrix A is a matrix B such that BA = I

Definition 6 Right inverse The right inverse of a matrix A is a matrix C such that AC = I

Question 6 If A is m× n, what is the dimension of I for the left inverse? What is its dimension for the right inverse?

The question we want to ask is: when can we find inverses for rectangular matrices? The rank of a matrix is what gives us this information, and the answer is: when the rank is the maximum it can be. We know that the row rank is the same as the column rank and measures the number of linearly independent rows and columns respectively. Thus, the rank r necessarily satisfies r ≤ m and r ≤ n. If r = m and m ≤ n then A has a right inverse C, and if r = n and n ≤ m then A has a left inverse B.

In the event that A has a right inverse, Ax = b always has at least one solution. This is because the column rank is also m and hence the columns of A span all of Rm, so there always exists a solution to Ax = b. However, if m < n the solution is not unique, as A then necessarily has a nontrivial null space.

In the event that A has a left inverse, Ax = b has at most one solution for every b. The subspace of Rm for which a solution exists is precisely C(A). In this case there can exist at most one solution to Ax = b, since A has no null vector other than zero.

There are simple formulae for the left and right inverses, given by

\[ B = (A^T A)^{-1} A^T, \qquad C = A^T (A A^T)^{-1} . \]

Example 3 Consider the simple 2 × 3 matrix of rank 2,

\[ A = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 3 & 0 \end{pmatrix} . \]

Question 7 Which inverse can we expect will exist?

Question 8 What can we say about solutions to Ax = b? Will there always exist a solution, and will it be unique?


Question 9 Is the right inverse that we compute unique?

Question 10 What would go wrong if we try to find a left inverse here?

All possible right inverses of this matrix are given by

\[ C = \begin{pmatrix} 1/2 & 0 \\ 0 & 1/3 \\ c_{31} & c_{32} \end{pmatrix} . \]

The specific right inverse that we end up picking with the formula A^T (A A^T)^{-1} is given by

\[ A^T (A A^T)^{-1} = \begin{pmatrix} 2 & 0 \\ 0 & 3 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} 1/4 & 0 \\ 0 & 1/9 \end{pmatrix} = \begin{pmatrix} 1/2 & 0 \\ 0 & 1/3 \\ 0 & 0 \end{pmatrix} . \]
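A minimal NumPy check of these formulas for the 2 × 3 example: C = A^T (A A^T)^{-1} satisfies AC = I, and changing the free last row (c31, c32) still gives a right inverse.

    import numpy as np

    A = np.array([[2.0, 0.0, 0.0],
                  [0.0, 3.0, 0.0]])

    C = A.T @ np.linalg.inv(A @ A.T)            # the particular right inverse picked above
    print(np.allclose(A @ C, np.eye(2)))        # True

    C_other = C.copy()
    C_other[2] = [5.0, -7.0]                    # arbitrary c31, c32 in the last row
    print(np.allclose(A @ C_other, np.eye(2)))  # still True: right inverses are not unique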

Question 11 What can you say about the left inverses of AT? Are they unique? Which one does our formula pick out? Can we solve ATx = b for all b?

The situation for a square n × n matrix is rather simpler. Here, if the rows are linearly independent, i.e., the row rank is n, then the column rank is also n, and the matrix A has both a left inverse and a right inverse, and both of them are the same matrix. In this case we have existence of solutions for all b, and the solution is unique. Here is a bunch of conditions which are equivalent to the invertibility of a square matrix.

• The columns span Rn, so Ax = b has at least one solution for every b

• The columns are independent. The null space contains only the zero vector

• The rows span Rn

• The rows are linearly independent

• Gaussian elimination can be completed with n pivots

• The determinant of A is not zero

The list can be extended further. If any of the above statements are true, then all of them have to be true.


7 Oct 2

7.1 Graphs and networks

We now look at the interpretation of the four subspaces in the context of a "graph". A graph is a collection of vertices V and edges E between them. A directed graph has a directionality to these edges. Consider a graph with 4 nodes and 5 edges. The edge-node incidence matrix is a 5 × 4 matrix with a row for every edge. If the edge goes from node j to node k, then that row has −1 in column j and 1 in column k. Consider a sample edge-node incidence matrix:

\[ A = \begin{pmatrix} -1 & 1 & 0 & 0 \\ -1 & 0 & 1 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 0 & -1 & 1 \\ 0 & -1 & 0 & 1 \end{pmatrix} \]

Applications: Google’s page rank algorithm, solving for currents and voltages in complicated electrical circuits, hydrodynam-ics ...

7.1.1 Interpretation of various spaces in the context of electrostatics

The matrix A: if x represents the potential at each node, then Ax returns the potential difference across each edge.
The matrix AT: if y represents the current through each edge, then AT y measures the net flow of current through each node.

7.1.2 Null space of A and the row space of A

We are looking for solutions of Ax = 0. If we think of x as the potentials at each of the nodes, then Ax measures the potential differences across the edges. So, given the potential differences b across the edges, can we find the potentials x uniquely which have those potential differences across the edges? The answer is obviously no, because if (x1, x2, x3, x4)^T satisfies Ax = b, then so does (x1 + c, x2 + c, x3 + c, x4 + c)^T. Raising all the potentials by the same amount does not change the potential differences across edges. So the null space of the matrix minimally contains the subspace spanned by (1, 1, 1, 1)^T. It turns out that the null space isn't any bigger.

Question 12 What is the rank of our matrix A?

Question 13 What is the subspace corresponding to the row space of A?

The equation AT y = b ends up expressing Kirchhoff's current law. If y1 is the current in edge 1, then it is leaving node 1 and entering node 2. Each current leaves one node and enters another, so the components of AT y always sum to 0, and if we are to be able to solve AT y = b, then b had better satisfy that condition too. Thus the column space of AT consists of all vectors whose components sum up to 0. This is not a surprise, as we know that the row space of A is orthogonal to the null space of A.

7.1.3 Column space of A and the null space of AT

Can we solve Ax = b for all b? Whenever the graph contains a closed circuit, the answer is NO. Let us return to the interpretation of Ax: Ax represents the potential difference across each edge. If a collection of edges forms a loop, then it must be the case that the sum of the potential differences around them adds up to zero. Let us look at the graph we have. In our case, edges 1, 3 and −2 form a loop; the nodes forming this loop are nodes 1, 2 and 3. Thus, when we add up rows 1 and 3 and subtract row 2 of Ax, we should get 0, as they correspond to a cyclic set of potential differences. The same must be true of any combination of rows which forms a loop. Since this is a necessary condition for the solvability of Ax = b, all columns of A must also satisfy this condition. In the language of electrostatics, the above observation corresponds to Kirchhoff's voltage law, which states that the sum of potential differences around a loop must be 0.

Question 14 What is the dimension of the null space of AT ?

Question 15 From the above argument, can we directly find a basis for the null vectors of AT ?

Physical significance of the left null space: the left null space corresponds to a set of currents that do not "pile up" at any node.


7.1.4 Solving an arbitrary network with resistors, current sources and batteries

All of this discussion is interesting, but it also has applications. Imagine now a circuit of resistors, batteries and currentsources. We can use the incidence matrix and its transpose discussed above to solve for all currents and voltages. Considerthe set up in example 1 in the text. In order to be able to solve the system uniquely, we need to ground one of the potentials.Let x4 be the potential that is grounded. In electrostatics, a resistor by definition is an object which induces a potential dropof IR where I is the current flowing through the resistor and R is its resistance. The resistance is a material property of theobject and the current is usually driven by an external source.

Convince yourself that Ohm's law, written for the potential difference across each edge, in this case reads Ry + Ax = b, where y is the vector of currents and x is the vector of potentials at the nodes. Here R is a diagonal matrix of the resistances on each edge,

\[ R = \begin{pmatrix} R_1 & & & & \\ & R_2 & & & \\ & & R_3 & & \\ & & & R_4 & \\ & & & & R_5 \end{pmatrix} . \]

To close the system, we write down Kirchhoff's current law, balancing the currents in and out at each of the nodes, which is given by AT y = f. Here b corresponds to the set of batteries on each edge and f corresponds to the set of current sources at each node. In block form, we can write down the system of equations as

\[ \begin{pmatrix} R & A \\ A^T & 0 \end{pmatrix} \begin{pmatrix} y \\ x \end{pmatrix} = \begin{pmatrix} b \\ f \end{pmatrix} . \]

We can perform the equivalent of Gaussian elimination on blocks: we can eliminate y from the lower set of equations by subtracting A^T R^{-1} times "Row 1" from "Row 2" to get the block upper triangular system

\[ \begin{pmatrix} R & A \\ 0 & -A^T R^{-1} A \end{pmatrix} \begin{pmatrix} y \\ x \end{pmatrix} = \begin{pmatrix} b \\ f - A^T R^{-1} b \end{pmatrix} . \]

As long as we ground one of the potentials, it is guaranteed that A^T R^{-1} A is invertible and we can always solve the system. We expected that from physics anyway, and also that the solution would be unique.
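As a minimal sketch of this block elimination in NumPy (the resistances, the battery vector b and the source vector f below are made-up illustrative values, and grounding x4 is done by dropping the last column of the incidence matrix):

    import numpy as np

    # Incidence matrix of the 5-edge, 4-node graph from Section 7.1,
    # with the column of the grounded node x4 removed.
    A_full = np.array([[-1, 1, 0, 0],
                       [-1, 0, 1, 0],
                       [0, -1, 1, 0],
                       [0, 0, -1, 1],
                       [0, -1, 0, 1]], dtype=float)
    A = A_full[:, :3]                         # ground x4 = 0

    R = np.diag([1.0, 2.0, 1.0, 3.0, 1.0])    # made-up edge resistances
    b = np.array([1.0, 0.0, 0.0, 0.0, 0.0])   # made-up battery on edge 1
    f = np.zeros(3)                           # no external current sources

    # Block elimination: (A^T R^{-1} A) x = A^T R^{-1} b - f, then R y = b - A x.
    Rinv = np.linalg.inv(R)
    x = np.linalg.solve(A.T @ Rinv @ A, A.T @ Rinv @ b - f)
    y = Rinv @ (b - A @ x)

    print("potentials:", x)       # potentials at nodes 1-3 (node 4 grounded to 0)
    print("currents:  ", y)
    print("KCL check: ", np.allclose(A.T @ y, f))   # net current at each node matches f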


8 Oct 5

8.1 Gaussian elimination

Things to remember: the elimination algorithm, what elimination matrices are, when Gaussian elimination works, and how we get to the LU decomposition from the elimination process.

Example 4

\[ A = \begin{pmatrix} 1 & 2 & -1 \\ 2 & 3 & -2 \\ 0 & 1 & 3 \end{pmatrix} \qquad (36) \]

\[ \begin{pmatrix} 1 & 2 & -1 \\ 2 & 3 & -2 \\ 0 & 1 & 3 \end{pmatrix} \xrightarrow{R_2 \to R_2 - 2R_1} \begin{pmatrix} 1 & 2 & -1 \\ 0 & -1 & 0 \\ 0 & 1 & 3 \end{pmatrix} \xrightarrow{R_3 \to R_3 + R_2} \begin{pmatrix} 1 & 2 & -1 \\ 0 & -1 & 0 \\ 0 & 0 & 3 \end{pmatrix} \]

The elimination/elementary matrices and the resultant upper triangular matrix in the above example are given by

\[ E_{2,1} = \begin{pmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad E_{3,2} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 1 \end{pmatrix}, \quad U = \begin{pmatrix} 1 & 2 & -1 \\ 0 & -1 & 0 \\ 0 & 0 & 3 \end{pmatrix} . \]

L is then just the matrix of multipliers used,

\[ L = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 0 & -1 & 1 \end{pmatrix}, \]

and

\[ A = LU . \qquad (37) \]

To solve Ax = b using Gaussian elimination, we carry out the elimination process on the augmented matrix. Once we have U, we can just use back substitution to solve for x.

On the other hand, if we are given A = LU and are asked to solve Ax = b, then we let Ux = y, solve Ly = b for y using forward substitution, and once we have y, we solve Ux = y for x using backward substitution.
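A minimal sketch of this procedure using SciPy's LU routines, applied to Example 4: lu_factor performs the elimination once, and lu_solve then does the forward and backward substitutions.

    import numpy as np
    from scipy.linalg import lu_factor, lu_solve

    A = np.array([[1.0, 2.0, -1.0],
                  [2.0, 3.0, -2.0],
                  [0.0, 1.0, 3.0]])
    b = np.array([1.0, 2.0, 3.0])

    lu, piv = lu_factor(A)       # one elimination, reusable for many right hand sides
    x = lu_solve((lu, piv), b)   # forward substitution with L, then back substitution with U
    print(x, np.allclose(A @ x, b))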

8.2 Inverses and transposes

Things to remember: how to compute an inverse using the Gauss-Jordan algorithm for square matrices, inverses of products of matrices, the definition of a transpose, transposes of products, the fact that vTu is a number while uvT is a matrix, and the different ways of interpreting the product of two matrices.

The columns of AB are linear combinations of the columns of A, so the column space of AB is still spanned by the columns of A (note that they may not form a basis). The rows of AB are linear combinations of the rows of B, so the row space of AB is spanned by the rows of B (these may also not form a basis).

Example 5

\[ A = \begin{pmatrix} 1 & 2 \\ 4 & 5 \\ 2 & 7 \end{pmatrix} \begin{pmatrix} 3 & 0 & 3 \\ 1 & 1 & 2 \end{pmatrix} \qquad (38) \]

A basis for the row space of A is given by

\[ \begin{pmatrix} 3 & 0 & 3 \end{pmatrix}, \quad \begin{pmatrix} 1 & 1 & 2 \end{pmatrix} . \qquad (39) \]

This happens because multiplication by a matrix on the left gives linear combinations of the rows of the matrix on the right. To make it more precise, the first row of the product is given by 1·(3 0 3) + 2·(1 1 2), the second row is given by 4·(3 0 3) + 5·(1 1 2), and so on. By similar reasoning, the column space of A is spanned by

\[ \begin{pmatrix} 1 \\ 4 \\ 2 \end{pmatrix}, \quad \begin{pmatrix} 2 \\ 5 \\ 7 \end{pmatrix} . \qquad (40) \]

8.3 Vector spaces and subspaces

Things to remember: the definition of a vector space, and the definition of a subspace of a vector space.

Examples of all possible subspaces of R3 are the 0 vector, any line passing through the origin, a plane passing through the origin, and all of R3.


8.4 Linear independence, span, basis, dimension

Things to remember: the span of a set of vectors in Rm forms a subspace of Rm, the definition of linear independence, the difference between a collection of spanning vectors and a basis, and the definition of the dimension of a subspace. There are infinitely many collections of vectors which span the same subspace, and infinitely many collections of vectors that form a basis for a subspace; however, the number of vectors that form a basis for a subspace is fixed, and that number is defined to be the dimension of the subspace.

Given a collection of vectors v1, v2, . . . , vn in Rm, one can find out if they are linearly independent or dependent by forming the matrix A whose columns are v1, v2, . . . , vn and solving for the null vectors of Ax = 0.

There are many different collections of vectors whose span constitutes the same subspace. For example, the 2-D plane in R4 corresponding to x3 = 0, x4 = 0 is spanned by

\[ \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \quad \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix} \qquad (41) \]

and is also spanned by

\[ \begin{pmatrix} 1 \\ 1 \\ 0 \\ 0 \end{pmatrix}, \quad \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix} \qquad (42) \]

and is also spanned by

\[ \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \quad \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix}, \quad \begin{pmatrix} 1 \\ 1 \\ 0 \\ 0 \end{pmatrix} . \qquad (43) \]

In the above example, in the first two cases the set of spanning vectors is linearly independent, so they form a basis of the subspace as well. However, the third collection of vectors, even though they span the subspace, does not form a basis, as they are not linearly independent. The dimension of the subspace is 2, as every collection of vectors which forms a basis of the subspace has 2 vectors.

Hard exercise: Can you think of every possible basis of this subspace?

8.5 Four fundamental subspaces

Things to remember: how to compute all four subspaces in the process of reducing the matrix to its echelon form, the two important theorems (row rank = column rank, and rank + nullity = number of columns), the ability to write down a basis for all four subspaces, conditions for a right inverse to exist, conditions for a left inverse to exist, what the relation between the rank, the number of rows and the number of columns tells you about solutions of Ax = b, and how you can deduce the rank of the matrix and the relation between the number of rows and columns from properties of solutions of Ax = b.

The process of reducing a matrix to its echelon form gives us all the information about bases for the subspaces that we need. Let us discuss that in the context of our favorite example, as usual.

\[ A = \begin{pmatrix} 1 & 2 & 3 & 2 \\ 2 & 4 & 9 & 7 \\ 1 & 2 & -3 & -4 \end{pmatrix} \xrightarrow{R_2 \to R_2 - 2R_1} \begin{pmatrix} 1 & 2 & 3 & 2 \\ 0 & 0 & 3 & 3 \\ 1 & 2 & -3 & -4 \end{pmatrix} \xrightarrow{R_3 \to R_3 - R_1} \begin{pmatrix} 1 & 2 & 3 & 2 \\ 0 & 0 & 3 & 3 \\ 0 & 0 & -6 & -6 \end{pmatrix} \]

\[ \xrightarrow{R_3 \to R_3 + 2R_2} \begin{pmatrix} 1 & 2 & 3 & 2 \\ 0 & 0 & 3 & 3 \\ 0 & 0 & 0 & 0 \end{pmatrix} = U \qquad (44) \]

The rank of the matrix is the number of pivot columns, which is 2 in our case. The row rank is the same as the column rank. A basis for the column space of A is given by the columns of the original matrix corresponding to the pivot columns. For our matrix, these are columns 1 and 3:

\[ \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix}, \quad \begin{pmatrix} 3 \\ 9 \\ -3 \end{pmatrix} . \]

When we perform elimination, the rows of U are still linear combinations of the rows of A. So the span of the rows of U is necessarily a subset of the span of the rows of A. However, if we can find the right number of linearly independent rows in U, we would


have found a basis for the row space of A, or equivalently the column space of AT. That will always be the case. Thus the nonzero rows of U form a basis for the row space of A. In our problem, the row space is spanned by

\[ \begin{pmatrix} 1 & 2 & 3 & 2 \end{pmatrix}, \quad \begin{pmatrix} 0 & 0 & 3 & 3 \end{pmatrix} . \qquad (45) \]

The dimensions of the left and right null spaces can be found using the rank-nullity theorem. For the matrix A, the dimension of the left null space is 1 and that of the right null space is 2. A basis for the right null space can be computed using the following algorithm. Firstly, we observe that the solutions of Ax = 0 are the same as the solutions of Ux = 0. Also, in the process of reducing A to U, we had a bunch of pivot variables and a bunch of free variables. Here, variables x2 and x4 were the free variables. The null vectors can be computed by assigning arbitrary values to the free variables and solving for the pivot variables in Ux = 0. To be guaranteed a basis, we can use the following strategy: set x2 = 1 and x4 = 0 to compute the first null vector, and x2 = 0 and x4 = 1 to compute the second. The null vectors are given by

\[ \begin{pmatrix} -2 \\ 1 \\ 0 \\ 0 \end{pmatrix}, \quad \begin{pmatrix} 1 \\ 0 \\ -1 \\ 1 \end{pmatrix} . \]

To compute the left null vectors, we carefully look at the process of elimination again. First observe that we still have A = LU, where L is still the matrix of multipliers used, and also L = E21^{-1} E31^{-1} E32^{-1}:

\[ \begin{pmatrix} 1 & & \\ -2 & 1 & \\ -5 & 2 & 1 \end{pmatrix} A = U \]

and also

\[ A = \begin{pmatrix} 1 & & \\ 2 & 1 & \\ 1 & -2 & 1 \end{pmatrix} U . \]

If we look at the first equation in this set and think about the row picture of matrix multiplication, we see that the last row of the equation says that −5 times row 1 of A + 2 times row 2 of A + row 3 of A = 0, which is equivalent to saying that

\[ \begin{pmatrix} -5 & 2 & 1 \end{pmatrix} \]

is in the left null space of A. In the general case, if the last k rows of U are 0, then the last k rows of the matrix L^{-1} form a basis for the left null space of A.

Connections to solutions of Ax = b. Connection between non-uniqueness and the existence of a null vector: suppose you knew that there was a b for which Ax = b had more than one solution; then the matrix necessarily has a nontrivial null vector.

Connection between non-existence and the existence of a left null vector: suppose there was a b for which Ax = b had no solution. This means that the column space of A is not all of Rm (if A is m × n), and since the left null space is the orthogonal complement of the column space of A, we know that the matrix has a nontrivial left null space.

If the column rank of the matrix A is m, then we know that a solution exists for all b. In this case, the matrix also has full row rank and we are guaranteed the existence of a right inverse, given by the formula A^T (A A^T)^{-1}.

If the columns of A are linearly independent, then the matrix has rank n. In this case, we are guaranteed the existence of a left inverse, given by the formula (A^T A)^{-1} A^T.

Both the left inverse and the right inverse are in general not unique, but these formulas pick out the "best" inverses in some sense.


9 Oct 9

9.1 Orthogonality

Recollect that a basis is a collection of linearly independent vectors which span a given subspace. The great thing about a basis is that, given any vector v in the subspace, there is a unique collection of coefficients c = (c1, c2, . . . , cl)^T such that v = ∑i ci vi, where the vi are the basis vectors. Finding this decomposition is equivalent to solving Ac = v, where A is the matrix whose columns are the vi. The fact that this is unique follows from the fact that the columns of A are linearly independent, so A has full column rank; thus, from our discussion before, A has only the trivial null space, and there exists a unique solution c whenever v is in the column space of A. Thus every vector can be described in terms of its components along a basis. So, to describe any vector in a 12 dimensional subspace of a 17 dimensional space, you need its components along 12 basis vectors, not all of its 17 components.

Usually, when we write a vector in Rn, we are expanding it in the standard basis. For example, in R3, a vector can be thought of as its expansion in the standard basis:

\[ \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix} = v_1 \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} + v_2 \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} + v_3 \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} . \]

The one thing we notice about the standard basis is its orthogonality. By orthogonality, we mean that the different basis vectors are perpendicular to each other. We know how to measure angles in 2D, but how do we generalize the idea to higher dimensions? For this we turn to the concept of inner products. Remember that yTx, which is the dot product of the vectors y and x, is also given by |x| |y| cos(θ), where θ is the "angle" between the two vectors. Thus, we will declare two vectors to be orthogonal, or perpendicular, if yTx = 0.

Question 16 Show that a collection of nonzero, mutually orthogonal vectors is linearly independent.

9.2 Orthogonal subspaces

Two subspaces U and V are said to be orthogonal if every vector u ∈ U is perpendicular to every vector v ∈ V.

Example 6 The z axis (subspace spanned by (0, 0, 1)T ) is perpendicular to the x-y plane.

Example 7 Let V be the subspace spanned by (1, 0, 0, 0)^T and (1, 1, 0, 0)^T, and let U be the subspace spanned by (0, 0, 3, 2)^T and (0, 0, 2, 3)^T. Then U is orthogonal to V.

Example 8 The row space of a matrix is orthogonal to the null space. Here is the proof. Suppose a vector v is in the row space; then, by definition, there exists a vector x such that v = ATx. Let y be in the null space of A, so Ay = 0. Then vT y = xTAy = 0.

Example 9 The column space of a matrix is orthogonal to the left null space. Proof: similar to the reasoning above.

We note that not only is the left null space orthogonal to the column space of A, but every vector orthogonal to thecolumn space of A must be in the left null space of A. So the left null space is the collection of all possible orthogonal vectorsto the column space of A.

Definition 7 Orthogonal complement: Given a subspace V of Rn, the space of all vectors orthogonal to V is called theorthogonal complement of V denoted by V⊥.

Question 17 If the dimension of V ⊆ Rn is r, then what is the dimension of the V⊥?

Let us make a distinction here. Consider the z axis and the x axis in R3. These two subspaces are orthogonal to each other, but the x axis is not the orthogonal complement of the z axis; the orthogonal complement is the x-y plane.

It is not just the case that the row space is orthogonal to the null space of a matrix, or that the column space of a matrix is orthogonal to the left null space. In fact, the row space is the orthogonal complement of the null space of the matrix, and vice versa. Similarly for the column space and the left null space.

Let us return to our favorite matrix

\[ A = \begin{pmatrix} 1 & 2 & 3 & 2 \\ 2 & 4 & 9 & 7 \\ 1 & 2 & -3 & -4 \end{pmatrix} . \]

A basis for the row space is

\[ \begin{pmatrix} 1 \\ 2 \\ 3 \\ 2 \end{pmatrix}, \quad \begin{pmatrix} 0 \\ 0 \\ 3 \\ 3 \end{pmatrix} . \]


A basis for the null space is

\[ \begin{pmatrix} -2 \\ 1 \\ 0 \\ 0 \end{pmatrix}, \quad \begin{pmatrix} 1 \\ 0 \\ -1 \\ 1 \end{pmatrix} . \]

We see clearly that both vectors in the row space are perpendicular to both vectors in the null space.

Similarly, a basis for the column space of A is given by

\[ \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix}, \quad \begin{pmatrix} 3 \\ 9 \\ -3 \end{pmatrix} . \]

And finally, a basis for the left null space of A is given by

\[ \begin{pmatrix} -5 \\ 2 \\ 1 \end{pmatrix} . \]

As we expected, the left null space is orthogonal to the column space of A.

Finally, if we draw the equivalent of the 2 × 2 picture we'd seen before, we get to see that the row space of A gets mapped to the column space of A. By this we mean the following: every vector x ∈ Rn can be decomposed into two components, a vector in the null space of A given by xn and a vector in the row space of A, say xr. The matrix A sends xn to 0 and xr to some vector in the column space. In the end, the matrix ends up sending the whole r dimensional row space to the whole r dimensional column space of A. In fact, if we interpret A as a linear operator from the row space to the column space, then it is invertible in that sense.


10 Oct 12 - Matrices as linear transformations

So far, we've been thinking about matrices as taking a vector as input and giving another vector as output. Alternatively, one can think of a matrix as a function which transforms Rn → Rm. If it is a square matrix, then the matrix maps Rn into itself.

Example 10 The identity matrix does nothing: it sends x → x. If instead we have cI, then the matrix just stretches the space by a factor c. It takes in any vector x and returns cx.

10.1 Rotations

Example 11 A rotation matrix rotates the whole space around the origin. Here is an example of a matrix that rotates space by π/2:

\[ A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} . \]

Example 12 The matrix which rotates all of R2 by an angle θ is given by

\[ A_\theta = \begin{pmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{pmatrix} . \]

Question 18 Verify that A−θ is the inverse of Aθ

Question 19 Verify that AθAφ = AφAθ = Aθ+φ (Note that verifying this allows us to automatically verify the first exercisein this section.)
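A quick numerical check of Questions 18 and 19, assuming NumPy; the helper name rotation is only illustrative.

    import numpy as np

    def rotation(theta):
        """2 x 2 matrix that rotates the plane counterclockwise by theta."""
        c, s = np.cos(theta), np.sin(theta)
        return np.array([[c, -s],
                         [s, c]])

    theta, phi = 0.7, -1.9
    print(np.allclose(rotation(theta) @ rotation(phi), rotation(theta + phi)))  # Question 19
    print(np.allclose(rotation(-theta) @ rotation(theta), np.eye(2)))           # Question 18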

10.2 Projections

Given a subspace U of a vector space V, a projection is a linear transformation (matrix) which maps every vector v ∈ V to the vector u ∈ U which minimizes the distance between u and v. Let us denote that projection by P. Thus, for any vector v which is already contained in U, the projection does nothing to it, i.e., Pv = v for all v ∈ U. Moreover, it is also known that to minimize the distance between v and u, we need (Pv − v) ⊥ U.

10.2.1 Projection onto a line

Using the definition above, let us consider the simple case of the projection operator onto a line in Rn whose direction vector is given by a. Equivalently, a basis vector for the subspace is given by a. To find the projection of x onto this subspace, we first note that Px = c_x a, where c_x is an unknown constant to be determined. Secondly, we use the fact that Px − x ⊥ a to get

\[ c_x = \frac{a^T x}{a^T a} . \]

This also tells us that P is the matrix given by

\[ P = \frac{a a^T}{a^T a} . \]
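A minimal NumPy sketch of this rank-one projection, with a made-up direction vector a: vectors on the line are unchanged, projecting twice changes nothing, and the error Px − x is orthogonal to a.

    import numpy as np

    a = np.array([1.0, 2.0, 2.0])                 # made-up direction vector
    P = np.outer(a, a) / (a @ a)                  # P = a a^T / (a^T a)

    x = np.array([3.0, -1.0, 4.0])
    print(np.allclose(P @ a, a))                  # vectors on the line are left alone
    print(np.allclose(P @ (P @ x), P @ x))        # projecting twice changes nothing
    print(np.isclose(a @ (P @ x - x), 0.0))       # the error is orthogonal to the line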

Question 20 The projection operator is independent of the choice of the basis. Suppose in our example above, we decidedto use ca as a basis vector of our subspace instead of a. Verify that the projection matrix does not change in either case.

Question 21 The projection operator is not invertible. For our example above, find a vector in the null space of the operatorP .

Question 22 Find a general description of the null space of the operator P (Think orthogonal complements)

Question 23 What is the dimension of the null space of the operator P?

10.3 Thinking of matrices as linear transformations

If we know the action of a matrix on all vectors of a basis for the space, then we know the action of the matrix on all vectors in the space. Suppose v1, v2, . . . , vn form a basis for the vector space V, and w1, w2, . . . , wm form a basis for the vector space W. Suppose A is a linear transformation from V → W. Moreover, suppose we know that

\[ A v_i = \sum_{j=1}^{m} a_{j,i} w_j . \]

Then the linear transformation A is encoded by the m × n matrix whose entries are a_{j,i}.

We saw earlier that every vector in a vector space can be uniquely represented as a linear combination of the basis vectors.

Thus, once we know the action of the transformation on the basis vectors, combining the fact above with the linearity of the operator, we know its action on all vectors in the space. The matrix encoding just takes each output and represents it by its unique decomposition in the basis vectors of the output space W.

Now, using this fact, we return to the invertibility of a matrix viewed as a map from the row space of the matrix to the column space of the matrix. We will do this in steps, in the form of a collection of exercises. Consider the following 3 × 3 matrix:

\[ A = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 4 & 9 \\ 1 & 2 & -3 \end{pmatrix} . \]


Question 24 First show that the matrix is not invertible.

Question 25 What is the dimension of the row space and the column space of the matrix?

Question 26 Find a basis for the row space of the matrix (write it as column vectors)

Question 27 Find a basis for the column space of the matrix

Given these two bases for the row and column space of the matrix A, and using the idea above, we construct a new matrix Ã which corresponds to a mapping from the row space of the matrix to the column space of the matrix. Let the two basis vectors computed for the row space be r1 and r2. Then any vector in the row space of our matrix is given by c1 r1 + c2 r2. The matrix Ã is then a mapping from the coefficients (c1, c2) to the column space of our matrix. To compute the matrix Ã, proceed with the following steps.

Question 28 Compute Ar1 = v1. Express v1 in terms of the basis vectors c1 and c2 of the column space; that is, find constants α1,1 and α2,1 such that v1 = α1,1 c1 + α2,1 c2.

Question 29 Compute Ar2 = v2. Express v2 in terms of the basis vectors c1 and c2 of the column space; that is, find constants α1,2 and α2,2 such that v2 = α1,2 c1 + α2,2 c2.

Question 30 Form the matrix Ã given by

\[ \tilde{A} = \begin{pmatrix} \alpha_{1,1} & \alpha_{1,2} \\ \alpha_{2,1} & \alpha_{2,2} \end{pmatrix} . \]

Now consider any vector r in the row space. Compute Ar. Next, express r in terms of the basis vectors r1 and r2 as β1 r1 + β2 r2. Now, using the matrix Ã, compute new coefficients given by

\[ \gamma = \begin{pmatrix} \gamma_1 \\ \gamma_2 \end{pmatrix} = \tilde{A} \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix} . \]

Verify that γ1 c1 + γ2 c2 = Ar.

Thus the above question shows that Ã encodes the action of A from the row space to the column space of A.

Question 31 Verify that Ã is invertible.


11 Oct 14 - Least squares and projection

In some applications, such as model parameter estimation, signal denoising/data smoothing, optimal input design, and estimating the mean and covariance of a least squares estimator, we need to solve an overdetermined system of equations which typically has full rank. An overdetermined system of equations is one where the number of equations is greater than the number of unknowns, in which case we are not necessarily guaranteed a solution. Let us look at the problem of model parameter estimation in a simple case. Suppose we are given data (xi, yi) from some experimental setup. Ideally, the physics of the problem says that y must be a linear function of x; however, due to the experimental setup, there is some noise in the data. Thus yi = m xi + c + εi. We want to determine the m, c which minimize the residual of the system,

e = \sum_{i=1}^{N} (y_i - m x_i - c)^2

This can be achieved by finding the best solution to

\begin{bmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_N & 1 \end{bmatrix} \begin{bmatrix} m \\ c \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}

Just like in the case of projecting onto a line, if we manage to project b onto the column space of the matrix A, then the error b − Ax is minimized in norm, as we discussed. Thus, finding m, c to minimize the error e is equivalent to finding the best projection of b onto the column space of A.

Just as in the case of the line, we find the projection of b onto the column space of A by enforcing that the error e = b − Ax is perpendicular to the column space of A. One way to ensure that a vector is orthogonal to a subspace is by ensuring that it is orthogonal to a basis of the subspace. Thus, in our problem we can enforce a_i^T (b − Ax) = 0 for each column a_i. In our case there are two columns,

\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{bmatrix} , \quad \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}

Enforcing the orthogonality condition is equivalent to

[x_1, x_2, \ldots, x_N] (b - Ax) = 0 \quad (46)

[1, 1, \ldots, 1] (b - Ax) = 0 \quad (47)

Stacking these equations together, we get

A^T (b - Ax) = 0

This is not too surprising. Let us connect it with the fact that the left null space of A is the orthogonal complement of the column space of A. We know that the error e = b − Ax is orthogonal to the column space of A. Thus, it must lie in the orthogonal complement of the column space of A, hence in the left null space of A, and we must have

A^T (b - Ax) = 0 .

Both of these correspond to the same equation for x, given by

A^T A x = A^T b

which we will call the normal equations. This is a much smaller system: the original system in our example was an N × 2 system, whereas this system is 2 × 2. The next natural question to ask is why we should still expect to be able to find such an x. Why would we expect the above system to be invertible? We said earlier that if A has full column rank then A^T A has full rank and is invertible. To show that, we will show that both A and A^T A have the same null space. Clearly if Ax = 0 then A^T A x = 0. For the other direction, let us assume that A^T A x = 0; then taking the dot product with x^T , we get

x^T A^T A x = 0 \implies \|Ax\|^2 = 0 \implies Ax = 0 .

Thus, if A has full column rank, the null space of A is just the 0 vector, and by the argument above the null space of A^T A is also just the 0 vector. The procedure for finding a minimizer does not really depend on the columns of A being full rank, but the argument in that case is slightly more technical.


Solving the normal equations for x we get x = (A^T A)^{-1} A^T b, which is the best estimate for minimizing the error. On the other hand, the projection of b onto the column space of A is given by Ax = A (A^T A)^{-1} A^T b, and the operator that projects any vector onto the column space of A is then given by A (A^T A)^{-1} A^T .
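Here is a small numerical sketch of the line-fit problem above (numpy assumed; the data is made up purely for illustration): solve the normal equations A^T A x = A^T b and form the projection A (A^T A)^{-1} A^T of b onto the column space of A.

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(0., 1., 20)
    y = 2.0 * x + 1.0 + 0.05 * rng.standard_normal(x.size)   # noisy line with m = 2, c = 1

    A = np.column_stack([x, np.ones_like(x)])                 # rows [x_i, 1]
    b = y

    # Normal equations: A^T A [m, c]^T = A^T b
    m, c = np.linalg.solve(A.T @ A, A.T @ b)
    print(m, c)                                               # close to 2 and 1

    # Projection of b onto the column space of A; the residual is orthogonal to it.
    P = A @ np.linalg.solve(A.T @ A, A.T)
    resid = b - P @ b
    print(np.allclose(A.T @ resid, 0.0))                      # True (up to roundoff)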

At no point did we need to restrict our basis functions to be linear. For example, in smoothing of data we are looking for coefficients c_i such that c_0 + c_1 t + c_2 t^2 + \ldots + c_m t^m is the best polynomial approximating our data. In applications in chemistry, it is usually sums of exponentials, i.e., find c_i such that c_1 e^{-λ_1 t} + c_2 e^{-λ_2 t} is the best double exponential approximating our data. However, it should be noted that in the given form we cannot take λ_1 and λ_2 to be the unknowns we are solving for. Least squares works when the problem is linear in the unknown coefficients we are solving for. In order to solve the λ_1, λ_2 problem, we can still use least squares, but a different version of it.


12 Oct 16 - Orthogonal bases and Gram-Schmidt

In the previous class, we saw how to solve the least squares problem using the normal equations. Given Ax = b where A has full column rank, we can find the solution x which minimizes the error ‖Ax − b‖^2 by solving the normal equations A^T A x = A^T b. On the computer, this is not the best strategy to solve the problem, not in terms of computational complexity but rather in terms of numerical stability.

The idea here will be not to go to the normal equations, but instead to form an orthogonal basis for the column space of A. Recall that a basis for a vector space is a collection of linearly independent vectors which span the space, and that it is not necessarily unique. Let v1, v2 . . . vk be our collection of basis vectors. A basis will be called orthogonal if all the vectors are mutually perpendicular, i.e. if vectors vi and vj are in the basis, then vi^T vj = 0 for all i ≠ j. The basis will further be called orthonormal if vi^T vi = 1. Let Q be the matrix whose columns are the vi; the column space of Q is then the same as the span of the basis vectors. If the basis is orthonormal, then Q^T Q = I. A matrix will be called an orthogonal matrix (used interchangeably with orthonormal matrix here) if its columns are mutually orthogonal and have norm 1. For such a matrix, its transpose is its left inverse.

We’ve already seen two classes of orthogonal matrices.

Example 13 Rotation matrices are orthogonal:

Q = \begin{bmatrix} \cos(θ) & -\sin(θ) \\ \sin(θ) & \cos(θ) \end{bmatrix} , \quad Q^T = Q^{-1} = \begin{bmatrix} \cos(θ) & \sin(θ) \\ -\sin(θ) & \cos(θ) \end{bmatrix}

Example 14 Permutation matrices are also orthogonal:

P = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix} , \quad P^T = P^{-1} = \begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}

In fact, the most general Q is the product of a rotation and a reflection.

For the rest of this section, we will assume that q1, q2 . . . qn are the column vectors of our orthonormal basis. We saw before that rotations and reflections preserve length. It is also the case that orthogonal matrices preserve length, and something stronger is true: not only does Q preserve the length of a vector, it also preserves the angle between two vectors. Mathematically, ‖Qx‖ = ‖x‖ and the angle between Qx and Qy is the same as the angle between x and y.

‖Qx‖^2 = (Qx)^T (Qx) = x^T Q^T Q x = x^T x = ‖x‖^2

Since Q preserves length, for it to preserve angles as well we just need to verify that the dot product between Qx and Qy is the same as the dot product between x and y:

(Qx)^T Qy = x^T Q^T Q y = x^T y

Given a vector b in a subspace, one way to express b in terms of its basis vectors is by solving the system Ax = b where the columns of A are the basis vectors. For an orthonormal basis, we still need to solve the same system, but we know a formula for the left inverse: Qx = b \implies x = Q^T b. Another way to express the same argument is the following: we are interested in computing scalars x1, x2 . . . xn such that b = x1 q1 + x2 q2 + \ldots + xn qn. Taking the inner product with qi and using orthonormality, i.e. qi^T qj = 0 if i ≠ j and qi^T qi = 1, we get

q_i^T b = x_1 q_i^T q_1 + x_2 q_i^T q_2 + \ldots + x_i q_i^T q_i + \ldots + x_n q_i^T q_n

which is equivalent to xi = qi^T b. This equation is the same as reading off the ith component of Q^T b.

Remark: The rows of a square orthonormal matrix are also orthonormal.

12.1 Least squares with orthogonal matrices

We first note that if Q is a rectangular m × n matrix with orthonormal columns and m > n (Q: why can’t we have n > m?), then Q^T Q, which is n × n, is the identity, while Q Q^T need not be the identity. We will discover its interpretation soon.

Suppose we want to solve Qx = b where the system is overdetermined. Just as before, we can proceed to the normal equations and multiply both sides by Q^T . We then get Q^T Q x = Q^T b. Since Q^T is the left inverse of Q, we have Q^T Q = I and hence x = Q^T b. Thus, we instantly obtain the solution to our least squares problem. The projection onto the subspace spanned by the columns of Q is then given by Qx = Q Q^T b. Thus, Q Q^T represents the projection onto the subspace spanned by the columns of Q. We could have derived both the best estimate and the projection matrix from the discussion of last class, x = (A^T A)^{-1} A^T b and P = A (A^T A)^{-1} A^T , by plugging in Q for A. Using Q^T Q = I, we end up with the same formulae as above. The action of P on the columns of Q is that of the identity, and it annihilates every direction perpendicular to the column space of Q.


Let us demonstrate this with an example. Suppose q1 = (1, 0, 0)^T and q2 = (0, 1, 0)^T . The column space of Q then is the x − y plane. Q is the matrix given by

Q = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}

Q^T Q = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}

Q Q^T is the projection onto the column space of Q, and thus in this case onto the x − y plane.

Q Q^T b = \begin{bmatrix} b_1 \\ b_2 \\ 0 \end{bmatrix} , \quad Q Q^T = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}

And this is what I meant by the action of Q Q^T being that of the identity along the columns of Q and annihilating the components perpendicular to them: it preserves b_1 and b_2 and kills b_3.

Note that for a general projection matrix as well, the action is that of the identity on the component along the column space and annihilation along the orthogonal complement. However, it is not true in general that P acts as the identity on each individual basis vector; it is the collective action on the component in the subspace that is the identity. For orthogonal matrices the statement is stronger: the action of Q Q^T on each column of Q is the identity, and hence so is the collective action on the subspace spanned by the columns of Q.

12.2 Finding an orthonormal basis for a given subspace - Gram-Schmidt algorithm and the QR factorization

Given a basis v1, v2 . . . vn for a subspace, we are interested in forming an orthonormal basis q1, q2 . . . qn for the same subspace. Intuitively, we know that this should be possible, so let us demonstrate it by constructing the basis itself.

We are going to construct this orthogonal basis iteratively, by taking every new basis vector and orthogonalizing it against the vectors we already have. In fact, we will also have that span(v1, v2 . . . vl) = span(q1, q2 . . . ql). The first vector v1 we keep as is, so we set q1 = v1/‖v1‖. The second vector v2 may not be orthogonal to q1, so we find the projection of v2 along q1 and subtract it off from v2, which can be written as

V_2 = v_2 - (q_1^T v_2) q_1 .

At the end of this step, we have removed the projection along q1 from the vector v2, since (q1^T v2) q1 represents the projection of v2 in the direction of q1. That is, the vector V_2 is now perpendicular to q1. We can verify this by taking the inner product with q1: q1^T V_2 = q1^T v2 − (q1^T v2) q1^T q1 = 0. We then set q2 = V_2/‖V_2‖. Also note that at this stage the vectors q1 and q2 are linear combinations of v1 and v2, and hence the span of q1, q2 is the same as the span of v1 and v2. Since v1, v2 are linear combinations of q1 and q2, there is an upper triangular matrix R such that

[v_1 | v_2] = [q_1 | q_2] \begin{bmatrix} r_{1,1} & r_{1,2} \\ 0 & r_{2,2} \end{bmatrix} .

It is not hard to verify that r_{1,1} = ‖v_1‖, r_{1,2} = q_1^T v_2 and r_{2,2} = ‖V_2‖.

Let us carry out one more step in detail. Now we get to v3. We already have the subspace spanned by v1 and v2, which is also spanned by q1 and q2. v3 might not be perpendicular to this subspace, but as before we can remove the projection onto this subspace by removing its projections along q1 and q2. Note that we can do this separately since q1 and q2 are orthogonal. Thus, we set

V_3 = v_3 - (q_1^T v_3) q_1 - (q_2^T v_3) q_2 .

The claim is that V_3 is orthogonal to q1 and q2:

q_1^T V_3 = q_1^T v_3 - (q_1^T v_3) q_1^T q_1 - (q_2^T v_3) q_1^T q_2 = 0

q_2^T V_3 = q_2^T v_3 - (q_1^T v_3) q_2^T q_1 - (q_2^T v_3) q_2^T q_2 = 0


And as before, we can now normalize V_3 to get q3 = V_3/‖V_3‖. At this stage the vectors q1, q2 and q3 are linear combinations of v1, v2 and v3, and hence the span of q1, q2, q3 is the same as the span of v1, v2, v3. Since v1, v2, v3 are linear combinations of q1, q2, q3, there is an upper triangular matrix R such that

[v_1 | v_2 | v_3] = [q_1 | q_2 | q_3] \begin{bmatrix} r_{1,1} & r_{1,2} & r_{1,3} \\ 0 & r_{2,2} & r_{2,3} \\ 0 & 0 & r_{3,3} \end{bmatrix} ,

where r_{1,1}, r_{1,2}, r_{2,2} are as before and r_{1,3} = q_1^T v_3, r_{2,3} = q_2^T v_3 and r_{3,3} = ‖V_3‖.
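Here is a minimal sketch of the procedure just described (numpy assumed; the input matrix V is made up): orthogonalize the columns one at a time, record the coefficients r_{i,j}, and check that V = QR with Q orthonormal and R upper triangular.

    import numpy as np

    def gram_schmidt(V):
        m, n = V.shape
        Q = np.zeros((m, n))
        R = np.zeros((n, n))
        for j in range(n):
            w = V[:, j].copy()
            for i in range(j):
                R[i, j] = Q[:, i] @ V[:, j]   # projection coefficient q_i^T v_j
                w -= R[i, j] * Q[:, i]        # subtract the projection along q_i
            R[j, j] = np.linalg.norm(w)       # length of what is left
            Q[:, j] = w / R[j, j]             # normalize
        return Q, R

    V = np.array([[1., 1., 0.],
                  [1., 0., 1.],
                  [0., 1., 1.]])
    Q, R = gram_schmidt(V)
    print(np.allclose(Q.T @ Q, np.eye(3)))    # columns are orthonormal
    print(np.allclose(Q @ R, V))              # V = QR
    print(np.allclose(R, np.triu(R)))         # R is upper triangular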


13 Least squares using QR and function spaces

In the last class, we formed a new decomposition of our matrix, A = QR, where Q is orthogonal and R is an upper triangular matrix. Moreover, the column space of A is the same as the column space of Q. If A is an m × n matrix, then Q is an m × n matrix and R is an n × n matrix. A^T A = R^T Q^T Q R = R^T R. The normal equations are then given by A^T A x = A^T b, i.e. R^T R x = R^T Q^T b. Since A has full column rank, the diagonal entries of R are nonzero and R is invertible. Thus, we can multiply by (R^T)^{-1} and get the equation R x = Q^T b, which is a much better system to be solving on the computer. In fact, if the matrix were square n × n, the same strategy would still work and we’d just need to do a matrix multiplication and solve an upper triangular system.
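A quick sketch (numpy assumed, with random made-up data) of solving the least squares problem through QR instead of the normal equations: factor A = QR, then solve R x = Q^T b.

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((50, 3))            # tall matrix with full column rank
    b = rng.standard_normal(50)

    Q, R = np.linalg.qr(A)                      # "reduced" QR: Q is 50x3, R is 3x3
    x_qr = np.linalg.solve(R, Q.T @ b)          # R is triangular, so this is cheap

    x_normal = np.linalg.solve(A.T @ A, A.T @ b)
    print(np.allclose(x_qr, x_normal))          # same least squares solution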

(NOTE: Everything from this point on in this section will not be tested on exams. This is just for mathematical entertainment.) These ideas also extend very easily to infinite dimensional function spaces, where we consider vectors with an infinite number of components. Let us first consider the space R^∞, the vector space of vectors with an infinite number of components. So now each vector is given by v = (v1, v2, . . .). The length of a vector, as before, is given by ‖v‖^2 = v1^2 + v2^2 + v3^2 + \ldots. Note that a vector can easily have infinite length; for example, the vector of all 1’s has infinite length in this space. So we shall only allow in this space the set of vectors which have finite length. If a vector v has finite length, then cv still has finite length, and it is in fact given by |c| times the original length. It can also be shown that ‖v + w‖ ≤ ‖v‖ + ‖w‖, and thus two vectors of finite length can be added and still give a vector of finite length. So the collection of vectors {v : ‖v‖ < ∞} does form a vector space. In fact, we can even extend the notion of angle between two vectors by defining the dot product in the standard way, v^T w = v1 w1 + v2 w2 + \ldots. As before, if v and w have finite length, then the dot product is also finite.

Let us make things more concrete and look at the space of functions on [0, 2π]. A few examples of functions in this space are x^n, sin(nx) and cos(nx). On this space, we can define the notion of “length” and “angle” between two “vectors” as well. The “length” of a function will be measured by the square root of the integral of its square,

‖f‖^2 = \int_0^{2π} (f(x))^2 \, dx .

Apologies, but we will now switch notation for the dot product: we will use (f, g) to denote the dot product between two vectors. The dot product between two “vectors” f and g is then given by \int_0^{2π} f(x) g(x) \, dx, and as before we still have (f, f) = ‖f‖^2. We shall define our infinite dimensional space as the collection of functions f such that ‖f‖^2 < ∞. By the abstract nonsense discussed earlier, we know that this collection of objects forms a vector space.

Two sets of basis functions for this infinite dimensional space are

• 1, x, x2 . . . , xn . . .

• 1, cos (x) , sin (x) , cos (2x) , sin (2x) , . . . , cos (nx) , sin (nx) . . .

In fact, it turns out that the second basis is orthogonal up to scaling, i.e. the vectors are mutually orthogonal but not necessarily of norm 1.

Question 32 Show that \int_0^{2π} \sin(nx) \cos(mx) \, dx = 0.

Question 33 Show that \int_0^{2π} \sin(nx) \sin(mx) \, dx = 0 if n ≠ m.

Question 34 Show that \int_0^{2π} \cos(nx) \cos(mx) \, dx = 0 if n ≠ m.

Trig identities will be your friends for the questions above.

Thus, given any function in our space, we can find coefficients ai and bi such that

f = a_1 \sin(x) + a_2 \sin(2x) + \ldots + a_n \sin(nx) + \ldots + b_0 + b_1 \cos(x) + b_2 \cos(2x) + \ldots + b_n \cos(nx) + \ldots

Since the basis of functions is orthogonal, we can just take the inner product with sin (nx) to compute an as

(f, sin (nx)) = a1 (sin (x) , sin (nx)) + a2 (sin (2x) , sin (nx)) + . . . an (sin (nx) , sin (nx)) + . . .+

b0 (1, sin (nx)) + b1 (cos (x) , sin (nx)) + b2 (cos (2x) , sin (nx)) + . . .+ bn (cos (nx) , sin (nx)) + . . .

All terms other than the one corresponding to an are 0 due to orthogonality. Thus, (f, sin(nx)) = an (sin(nx), sin(nx)), which gives us a formula for an.
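A quick numerical check of this coefficient formula (numpy assumed, simple trapezoid quadrature, and a made-up function f):

    import numpy as np

    x = np.linspace(0.0, 2.0 * np.pi, 4001)
    f = np.sin(x) + 3.0 * np.sin(2.0 * x) + 0.5 * np.cos(3.0 * x)

    def inner(g, h):
        # (g, h) = integral over [0, 2*pi] of g(x) h(x) dx
        return np.trapz(g * h, x)

    for n in range(1, 4):
        s = np.sin(n * x)
        a_n = inner(f, s) / inner(s, s)
        print(n, round(a_n, 6))    # recovers a_1 = 1, a_2 = 3, a_3 = 0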

We could similarly find coefficients cn such that f = c_0 + c_1 x + c_2 x^2 + \ldots + c_n x^n + \ldots, but the process would be a lot more complicated as the basis is not orthogonal.

We can apply the process of Gram-Schmidt to this basis as well to obtain an orthogonal set of polynomials called the Legendre polynomials. The Legendre polynomials are a very special class of polynomials with many other applications as well, including in electrostatics and in designing good algorithms for integrating functions accurately on the computer.

Question 35 Compute the first 3 Legendre polynomials. Verify that they are orthogonal to each other.


We can also do a least squares problem in such spaces. A sample question would be of the form: what is the best polynomial of degree 2 that approximates sin(x)? So the goal is to find the polynomial p(x) = c_0 + c_1 x + c_2 x^2 such that the error ‖sin(x) − p(x)‖^2 is minimized. The “rectangular matrix” in this setup would be

[1 \,|\, x \,|\, x^2] \begin{bmatrix} c_0 \\ c_1 \\ c_2 \end{bmatrix} = [\sin(x)]

The normal equations corresponding to this setup would be

A^T A \begin{bmatrix} c_0 \\ c_1 \\ c_2 \end{bmatrix} = \begin{bmatrix} (1, 1) & (1, x) & (1, x^2) \\ (x, 1) & (x, x) & (x, x^2) \\ (x^2, 1) & (x^2, x) & (x^2, x^2) \end{bmatrix} \begin{bmatrix} c_0 \\ c_1 \\ c_2 \end{bmatrix} = \begin{bmatrix} \int 1 & \int x & \int x^2 \\ \int x & \int x^2 & \int x^3 \\ \int x^2 & \int x^3 & \int x^4 \end{bmatrix} \begin{bmatrix} c_0 \\ c_1 \\ c_2 \end{bmatrix}

A^T \sin(x) = \begin{bmatrix} \int \sin(x) \\ \int x \sin(x) \\ \int x^2 \sin(x) \end{bmatrix}

(all integrals taken over [0, 2π]).

And as before, we could solve the normal equations. Alternatively, we could have found the best degree 2 polynomial by expressing it as a linear combination in the Legendre basis, c_0 P_0(x) + c_1 P_1(x) + c_2 P_2(x). But since this basis is orthogonal, A^T A would be the diagonal matrix of the norms ‖P_i(x)‖^2, which is easily inverted.


14 Oct 26 - Introduction to determinants

Before starting to discuss determinants, let me discuss a few reasons why we might care about them. Machine learning is a hot topic of research currently; some examples include image classification and robotic arm control, to name a few. The standard process in these applications is to estimate model parameters based on a training data set, which is a collection of data points. The goal is to determine the set of parameters for which the given set of data would be most likely. The idea is to approximate a given set of data with a function f(x), where x denotes the data points and the model is yi = f(xi) + εi. We already saw one approach to this problem: express f as a linear combination of a known set of basis functions and find the optimal parameters which minimize the least squares error. This is fine when we have an idea about the underlying model. However, in a different class of applications we have no idea what basis functions to use, and in that setup a method that is commonly used is to sample functions from an underlying “probability distribution” with some “covariance matrix”. In the process of estimating the best set of parameters of the covariance matrix, we need to compute the determinant of a matrix.

The determinant is a measure of the volume of a box in n dimensional space and is relevant when we switch coordinates for integrating functions. Suppose we need to compute \int_V f(x, y, z) \, dx \, dy \, dz. In many scenarios, these might not be the best coordinates to integrate in; it might be more convenient to use polar or spherical coordinates, as the computation might be easier in that basis. Let us recall the change of variables rule in 1D. Suppose we want to integrate \int_I f(x) \, dx. If we change coordinates to x = g(u), we get \int_{g^{-1}(I)} f(g(u)) g'(u) \, du, i.e. dx changes to (dx/du) du. In higher dimensions, the equivalent of this rule requires a determinant. To make the example more concrete, when we change coordinates from (x, y, z) → (r, θ, z), we set x = r cos θ, y = r sin θ and z = z. The stretching element is dx dy dz = J dr dθ dz, where J is the determinant of the Jacobian given by

J = \begin{vmatrix} \partial x/\partial r & \partial x/\partial θ & \partial x/\partial z \\ \partial y/\partial r & \partial y/\partial θ & \partial y/\partial z \\ \partial z/\partial r & \partial z/\partial θ & \partial z/\partial z \end{vmatrix} = \begin{vmatrix} \cos θ & -r \sin θ & 0 \\ \sin θ & r \cos θ & 0 \\ 0 & 0 & 1 \end{vmatrix} = r

Just to reiterate: the determinant is a measure of the volume of a box. The edges of the box correspond to the different rows of our matrix.

Another application, which we will see later, is in the determination of the eigenvalues of a matrix.

More than the explicit formulas for the determinant, it is the properties it possesses that make it interesting. Let me contradict myself right away and start off with the formula for the determinant of a 2 × 2 matrix, so that we can verify its properties. The determinant of a 2 × 2 matrix is given by

\det \begin{bmatrix} a & b \\ c & d \end{bmatrix} = ad - bc

The determinant is uniquely defined, in higher dimensions as well, by the following three properties:

1. The determinant changes sign when two rows are exchanged.

Question 36 What is the determinant of a permutation matrix?

2. The determinant depends linearly on the first row,

\begin{vmatrix} a + a' & b + b' \\ c & d \end{vmatrix} = \begin{vmatrix} a & b \\ c & d \end{vmatrix} + \begin{vmatrix} a' & b' \\ c & d \end{vmatrix} , \quad \text{and} \quad \begin{vmatrix} ta & tb \\ c & d \end{vmatrix} = t \begin{vmatrix} a & b \\ c & d \end{vmatrix}

It should be noted that det(B + C) ≠ det B + det C, and secondly that det(tA) ≠ t det A.

3. Scaling is required to define the determinant uniquely: the determinant of the identity matrix is 1.

Question 37 Does the determinant of a matrix change under unitary transformations?


15 Oct 30

More properties of the determinant. If two rows of A are equal, then det A = 0. This follows from the formula for a 2 × 2 determinant. The intuitive picture to remember is that the determinant measures the volume of a box. If two of the rows are equal, then there is no box, just a line, so the volume is 0.

Elementary row transformations do not change the determinant. Not the ones that scale a row of the matrix, but the ones that subtract a multiple of one row from another row. Suppose we subtract l times row 2 from row 1; we then have the new matrix

A = \begin{bmatrix} a - lc & b - ld \\ c & d \end{bmatrix}

Then, from the linearity of the determinant in the first row, we get

\det A = \det \begin{bmatrix} a & b \\ c & d \end{bmatrix} - l \det \begin{bmatrix} c & d \\ c & d \end{bmatrix}

and from the previous rule the second determinant is 0, so the determinant doesn’t change under these elementary row transformations.

If A has a row of zeros, then det A = 0. Again this follows from the formula, and the same picture as for equal rows tells us why the determinant must be 0.

If A is triangular, then det A is the product of the diagonal elements a11 a22 a33 . . . ann. Since the determinant is unchanged under elementary row transformations, we can take an upper triangular or a lower triangular matrix and, using the steps of elimination, reduce it to a diagonal matrix while the diagonal elements remain unchanged. The determinant of a diagonal matrix is the product of the diagonal elements. The picture is the following: in higher dimensions as well, the rows are along the coordinate axes, so the box is a rectangular box aligned with the axes. The volume is then just the product of the lengths of the sides, which is the same as the product of the diagonal elements.

The determinant of a singular matrix is 0. In fact, the determinant of a matrix is nonzero if and only if it is invertible. The picture to remember is this: if A is not invertible, then the column space of A is not all of Rn. The row rank is the same as the column rank, and thus the row space is also not all of Rn. This means that all the rows lie in a lower dimensional plane, and the “parallelepiped” formed in that space has 0 volume.

The determinant of a product is the product of the determinants, that is, det AB = det A det B. One can easily verify this for 2 × 2 matrices. For higher dimensions, we consider the following function of the matrix A: d(A) = det(AB)/det(B). To check that d(A) is the determinant, we just need to verify that d(A) satisfies the three defining properties listed above. Clearly, we see that d(I) = det(B)/det(B) = 1. When we exchange two rows of A, by the row picture of matrix multiplication the same two rows of AB are exchanged; from the row-exchange property of determinants, the sign of det(AB) changes, and thereby the sign of d(A) changes. A linear combination in the first row of A gives the same linear combination in the first row of AB; then linearity of the determinant of AB in its first row, divided by the fixed quantity det(B), gives linearity of d(A). Thus d(A) must be det A.

Question 38 If a 4× 4 matrix has det A = 0.5, find det(2A) , det(−A), det(A2), and det(A−1).

Question 39 If a 3× 3 matrix has det A = −1, find det(0.5A) , det(−A), det(A2), and det(A−1).


16 Nov 2 - Different ways of computing the determinant

Strategy 1: My default strategy, in the absence of any other information, for computing the determinant of a matrix would be to first find the LU decomposition of the matrix if row exchanges are not required, and the PA = LU decomposition in the event row exchanges are required. Let us review the strategy. Given any matrix A, we can always find a decomposition PA = LU, where P is a permutation matrix, L is a lower triangular matrix and U is an upper triangular matrix. We know how to compute the determinants of lower and upper triangular matrices. Thus, if we can compute the determinant of a permutation matrix, then using the product rule we can compute the determinant of the matrix A as det(A) = det(L) det(U)/det(P).

Determinants of a permutation matrix. A permutation matrix is an identity matrix whose rows/columns have been interchanged. Thus, by performing row exchange operations, we can get back to the identity matrix, so the determinant of a permutation matrix is the same as that of the identity matrix up to sign, i.e. det P = ± det I = ±1.
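Here is a sketch of strategy 1 on a small made-up matrix, assuming numpy and scipy are available. Note that scipy's lu routine uses the convention A = PLU rather than PA = LU; since det P = ±1 this makes no difference to the product of determinants.

    import numpy as np
    from scipy.linalg import lu

    A = np.array([[2., 1., 1.],
                  [4., 3., 3.],
                  [8., 7., 9.]])

    P, L, U = lu(A)                       # A = P @ L @ U
    det_P = round(np.linalg.det(P))       # +1 or -1
    det_L = np.prod(np.diag(L))           # 1, since L has a unit diagonal
    det_U = np.prod(np.diag(U))
    print(det_P * det_L * det_U)          # matches np.linalg.det(A)
    print(np.linalg.det(A))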

Thus, given the PLU decomposition of a matrix, we can compute the determinant. Using this result, we can show that det(A^T) = det(A). Consider the PLU decomposition of a matrix, PA = LU. Taking the transpose of both sides, we get (PA)^T = (LU)^T, thus A^T P^T = U^T L^T. Using the product rule for determinants, we get det(A^T) = det(U^T) det(L^T)/det(P^T). The diagonal elements of L^T and L are the same, and hence the determinant of L^T is the same as the determinant of L. Similarly, the determinant of U^T is the determinant of U. And lastly, P P^T = I, so by the product rule det(P^T) = 1/det(P); since det(P) = ±1, this gives det(P) = det(P^T). For two by two matrices, we can just verify this from the formula:

\det \begin{bmatrix} a & b \\ c & d \end{bmatrix} = a \cdot d - b \cdot c , \quad \det \begin{bmatrix} a & c \\ b & d \end{bmatrix} = a \cdot d - c \cdot b

Once we know that det(A) = det(A^T), we can easily compute the determinant of orthogonal matrices. We know that Q^T Q = I. By using the product formula for determinants, we get

det(Q) det(Q^T) = 1 \implies det(Q)^2 = 1 \implies det(Q) = ±1 .

Let us think about the volume picture of determinants. The rows of Q are orthogonal and of norm 1; thus the rows of Q correspond to the equivalent of a rotated and reflected cube in n dimensions with each side having length 1. Thus the volume of the cube has to be 1 up to sign. This is also consistent with our expectation that Q is a composition of a rotation and a reflection, which have determinants 1 and −1 respectively.

A projection whose column space is not all of Rn has to have determinant 0, as the projection operator maps the orthogonal complement of the subspace it projects onto to 0.

Even though in practice we would not use any strategy other than strategy 1 for computing determinants of generic matrices, it helps to study alternate formulae for the determinant, which allow us to derive interesting properties of block determinants and which come in handy for computing some determinants faster. With that goal in mind, I discuss two other strategies for computing the determinant which express the determinant in terms of the entries of A and do not rely on computing decompositions of the matrix A.

Strategy 2: For two by two matrices, we had a simple formula for the determinant of a matrix in terms of its entries, given by ad − bc. Let us look at another derivation of the same formula, which will allow us to extend it to general matrices. We can decompose the computation into the computation of the determinants of simpler matrices. Consider the matrices A1 and A2 given by

A_1 = \begin{bmatrix} a & 0 \\ c & d \end{bmatrix} , \quad A_2 = \begin{bmatrix} 0 & b \\ c & d \end{bmatrix}

Since the determinant is linear in the first row, we get

det(A) = det(A1) + det(A2)

We can then decompose A1 and A2 further to get

A_{11} = \begin{bmatrix} a & 0 \\ c & 0 \end{bmatrix} , \quad A_{12} = \begin{bmatrix} a & 0 \\ 0 & d \end{bmatrix}

and similarly

A_{21} = \begin{bmatrix} 0 & b \\ c & 0 \end{bmatrix} , \quad A_{22} = \begin{bmatrix} 0 & b \\ 0 & d \end{bmatrix}

Since the determinant is linear in the second row as well, we get

det(A_1) = det(A_{11}) + det(A_{12}) , \quad det(A_2) = det(A_{21}) + det(A_{22})


det(A) = det(A_{11}) + det(A_{12}) + det(A_{21}) + det(A_{22})

Clearly, the determinants of A_{11} and A_{22} are 0, as each has a column of 0’s. For the determinants that do survive, by using the linearity of the determinant in its rows, we get

det(A_{12}) = ad \, \det \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = ad \, det(P_{12}) , \quad det(A_{21}) = bc \, \det \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} = bc \, det(P_{21})

where P_{12} is the permutation matrix whose first row is the first row of the identity and whose second row is the second row of the identity (i.e. the identity itself), and P_{21} is the permutation matrix whose first row is the second row of the identity and whose second row is the first row of the identity. Thus the only matrices that survive are the ones for which we chose nonzero entries in a different row for each column, i.e. the ones which pick entries from different columns in different rows.

This idea generalizes to higher dimensions as well. The explicit formula in higher dimensions is

det(A) = \sum_{\text{all } P\text{'s}} (a_{1α} a_{2β} \ldots a_{nν}) \, det(P)

where (α, β, . . . , ν) is a permutation of the numbers (1, 2, . . . , n).

We can regroup the above formula in another way to obtain an alternate formula. Let us collect all the terms in the above formula that contain a_{11}; in such a term no other factor can come from row 1 or column 1. The sum of all these terms is a_{11} times the cofactor of a_{11},

C_{11} = \sum_{\text{all } P\text{'s}} a_{2β} \ldots a_{nν} \, det(P)

where now (β, . . . , ν) are permutations of (2, . . . , n). This is exactly the formula for the determinant of the submatrix of A obtained by deleting the first row and the first column of A. Similarly, a_{12} is multiplied by some smaller determinant C_{12}. Grouping all the terms that start with the same a_{1j}, the big formula becomes

det(A) = a_{11} C_{11} + a_{12} C_{12} + \ldots + a_{1n} C_{1n}
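A direct (and very slow, O(n!)) sketch of this cofactor expansion along the first row, assuming numpy and using a made-up test matrix:

    import numpy as np

    def det_cofactor(A):
        n = A.shape[0]
        if n == 1:
            return A[0, 0]
        total = 0.0
        for j in range(n):
            minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)  # drop row 1, column j
            cofactor = (-1) ** j * det_cofactor(minor)             # C_{1,j+1}
            total += A[0, j] * cofactor
        return total

    A = np.array([[2., 1., 3.],
                  [0., -1., 4.],
                  [5., 2., 1.]])
    print(det_cofactor(A), np.linalg.det(A))   # the two values agree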


17 Nov 4: Working with block matrices

When working with matrices, it is important to be able to do block linear algebra rather than handling things just entry by entry. That is, instead of entries in a matrix, we subdivide the matrix into blocks of matrices. Let us make things a little more precise. In this class, we will be dealing with objects of the form

M = \begin{bmatrix} A & B \\ C & D \end{bmatrix}

where M is an (m + n) × (m + n) matrix, A is an m × m matrix, B is an m × n matrix, C is an n × m matrix and D is an n × n matrix. For example, let

M_1 = \begin{bmatrix} 1 & 2 & 0 & 0 \\ 2 & 5 & 1 & 0 \\ 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & 1 \end{bmatrix}

with

A_1 = \begin{bmatrix} 1 & 2 \\ 2 & 5 \end{bmatrix} , \quad B_1 = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} , \quad C_1 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} , \quad D_1 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}

17.1 Block matrix multiplication

We’ve already seen in the past how block matrix multiplication works. Consider two matrices M1 and M2 given by

M_1 = \begin{bmatrix} A_1 & B_1 \\ C_1 & D_1 \end{bmatrix} , \quad M_2 = \begin{bmatrix} A_2 & B_2 \\ C_2 & D_2 \end{bmatrix}

Then,

M_1 M_2 = \begin{bmatrix} A_1 A_2 + B_1 C_2 & A_1 B_2 + B_1 D_2 \\ C_1 A_2 + D_1 C_2 & C_1 B_2 + D_1 D_2 \end{bmatrix}

Example time. Let M1 be the matrix above and M2 be the matrix given by

M_2 = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}

A_2 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} , \quad B_2 = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} , \quad C_2 = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} , \quad D_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}

If we observe M2, it is just a permutation matrix which, when multiplied by M1, exchanges columns 1 and 2 of M1. Thus,

M_1 M_2 = \begin{bmatrix} 2 & 1 & 0 & 0 \\ 5 & 2 & 1 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \end{bmatrix}


Let us verify that the block multiplication works. All terms involving B2 and C2 are zero since B2 = C2 = 0.

A_1 A_2 = \begin{bmatrix} 2 & 1 \\ 5 & 2 \end{bmatrix} , \quad B_1 D_2 = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} , \quad C_1 A_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} , \quad D_1 D_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}

17.2 Block Gaussian elimination

We can perform Gaussian elimination on block matrices as well. We can eliminate the block C1 in the matrix M1 by subtracting C1 A_1^{-1} times the first block row from the second block row. We can then form the block LU decomposition

M_1 = \begin{bmatrix} I & 0 \\ C_1 A_1^{-1} & I \end{bmatrix} \begin{bmatrix} A_1 & B_1 \\ 0 & D_1 - C_1 A_1^{-1} B_1 \end{bmatrix} = LU

We can verify this using the block matrix multiplication picture.
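A quick numerical check (numpy assumed) of this block elimination on the matrix M1: form L and U from the blocks and verify that M1 = LU.

    import numpy as np

    M1 = np.array([[1., 2., 0., 0.],
                   [2., 5., 1., 0.],
                   [0., 1., 1., 0.],
                   [1., 0., 0., 1.]])
    A1, B1 = M1[:2, :2], M1[:2, 2:]
    C1, D1 = M1[2:, :2], M1[2:, 2:]

    A1_inv = np.linalg.inv(A1)
    I2, Z2 = np.eye(2), np.zeros((2, 2))

    L = np.block([[I2,          Z2],
                  [C1 @ A1_inv, I2]])
    U = np.block([[A1, B1],
                  [Z2, D1 - C1 @ A1_inv @ B1]])   # Schur complement in the corner

    print(np.allclose(L @ U, M1))                  # True
    print(np.linalg.det(L) * np.linalg.det(U), np.linalg.det(M1))   # both are 0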

17.3 Determinants in block matrix land

One might be tempted to compute the determinant as for 2 × 2 matrices and claim the determinant to be

det(M_1) = det(A_1) det(D_1) - det(C_1) det(B_1) , \quad \text{OR} \quad det(M_1) = det(A_1 D_1 - C_1 B_1)

Firstly, these formulas treat the blocks as if they were scalar entries, so morally they do not deserve to be correct. We can verify this with our matrix M1. Here are the determinants of the matrices in question:

det(M_1) = 0 , \ det(A_1) = 1 , \ det(B_1) = 0 , \ det(C_1) = -1 , \ det(D_1) = 1 , \ det(A_1 D_1) = 1 , \ det(C_1 B_1) = 0 .

However, that idea does work for block triangular matrices. In the example above,

det(L) = det(I) \cdot det(I) = 1 , \quad det(U) = det(A_1) \cdot det(D_1 - C_1 A_1^{-1} B_1)

Question 40 Can you think of some moral justification for why the off-diagonal block should not affect the determinant? (Hint: Think of the permutation formula for the determinant.)


18 Nov 13 - Eigenvalues introduction

Let’s look at a system of differential equations. Suppose we are interested in solving the equation

\frac{du}{dt} = A u(t)

Here u is a vector of functions in Rn and A is an n × n matrix. Usually, the question is to solve an initial value problem with initial conditions u(0) = u0, where u0 is a known constant vector. Examples of such systems include spring-mass systems and gravitational systems, to name a few.

Let us look at the scalar version of this equation. We get

u'(t) = λ u(t)

We know the solution to this equation is given by

u(t) = u_0 e^{λt}

Using the same idea, we look for a similar solution to our matrix equation. Let u = v e^{λt}. Plugging this ansatz into the differential equation above, we get

e^{λt} λ v = e^{λt} A v

which is equivalent to

A v = λ v

The solution is uninteresting if v in the above equation turns out to be the zero vector. Thus, it is only interesting if for a given value λ we can find a non-zero vector v which satisfies the above equation. Here both λ and v need to be determined. This is not a standard Ax = b problem where b is a known vector and x is unknown; in the above equation, both v and λ are unknown. These problems are called eigenvalue problems and are important in various fields of study.

The above equation can alternatively be written as

(A− λI) v = 0

A value λ for which there exists a non-zero v is called an eigenvalue of the matrix A, and v is the corresponding eigenvector. If a non-zero vector v exists, then it is a null vector of the matrix A − λI. That gives us a characterization of the eigenvalue λ: λ is an eigenvalue if A − λI is a singular matrix, which is equivalent to det(A − λI) = 0. Note that setting det(A − λI) = 0, from the permutation formula, gives us a polynomial of degree n in λ. Thus to compute the eigenvalues of a matrix we will use the following algorithm.

1. Using your favorite formula for the determinant, compute the determinant of A − λI. This will be a polynomial of degree n in λ with leading term (−λ)^n.

2. Using your favorite method for finding the roots of a polynomial, find the roots of the above polynomial. These are theeigenvalues of the matrix.

3. For each eigenvalue, solve (A− λI) v = 0, i.e. find the null vector(s) for the matrix A− λI.
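Here is a sketch of these three steps (numpy assumed) for the 2 × 2 matrix of Example 15 below: form the characteristic polynomial, find its roots, then get each eigenvector as a null vector of A − λI (here via the SVD).

    import numpy as np

    A = np.array([[1., -2.],
                  [-2., 1.]])

    # Step 1: for a 2x2 matrix, det(A - lambda I) = lambda^2 - trace(A) lambda + det(A).
    coeffs = [1.0, -np.trace(A), np.linalg.det(A)]

    # Step 2: roots of the characteristic polynomial.
    eigvals = np.roots(coeffs)
    print(eigvals)                                   # 3 and -1

    # Step 3: null vector of A - lambda I (the last right singular vector).
    for lam in eigvals:
        _, _, Vt = np.linalg.svd(A - lam * np.eye(2))
        v = Vt[-1]
        print(lam, v, np.allclose(A @ v, lam * v))   # True for each pair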

Example 15 Consider the following system of differential equations

\frac{du_1}{dt} = u_1(t) - 2 u_2(t) \quad (48)

\frac{du_2}{dt} = -2 u_1(t) + u_2(t) \quad (49)

subject to the initial conditions u_1(0) = 2 and u_2(0) = 3.

Plugging in the ansatz u_1 = v_1 e^{λt} and u_2 = v_2 e^{λt}, we get the following system of equations

λ v_1 e^{λt} = v_1 e^{λt} - 2 v_2 e^{λt} \quad (50)

λ v_2 e^{λt} = -2 v_1 e^{λt} + v_2 e^{λt} \quad (51)

which corresponds to the eigenvalue problem

\begin{bmatrix} 1 - λ & -2 \\ -2 & 1 - λ \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}


To compute these eigenvalues, we follow our algorithm. Step 1 is to compute the determinant of A − λI. This is given by

\det \begin{bmatrix} 1 - λ & -2 \\ -2 & 1 - λ \end{bmatrix} = (1 - λ)^2 - (-2)^2 = λ^2 - 2λ - 3

The second step is to compute the roots of this polynomial. The roots of the above quadratic equation are given by

λ_{1,2} = \frac{2 ± \sqrt{4 + 4 \cdot 3}}{2} = 3, -1

To compute the eigenvector corresponding to λ_1 = 3, we find the null vector of A − λ_1 I:

A - λ_1 I = A - 3I = \begin{bmatrix} -2 & -2 \\ -2 & -2 \end{bmatrix}

The eigenvector associated with the eigenvalue λ_1 = 3 is a null vector of A − 3I, which can be computed by any of your favorite methods. Let v be the null vector. Then v solves

(A - 3I) v = 0 , \quad \begin{bmatrix} -2 & -2 \\ -2 & -2 \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}

It is easy to check that the eigenvector is

v = \begin{bmatrix} 1 \\ -1 \end{bmatrix}

Diagonal matrices: The eigenvalues of a diagonal matrix are given by its diagonal entries. Consider the diagonal matrix

D = \begin{bmatrix} d_1 & & & \\ & d_2 & & \\ & & \ddots & \\ & & & d_n \end{bmatrix}

The eigenvalues associated with the matrix are given by

\det(D - λ I) = \det \begin{bmatrix} d_1 - λ & & & \\ & d_2 - λ & & \\ & & \ddots & \\ & & & d_n - λ \end{bmatrix} = (d_1 - λ)(d_2 - λ) \ldots (d_n - λ)

The solutions to det(D − λI) = (d_1 − λ)(d_2 − λ) . . . (d_n − λ) = 0 are given by λ = d_1, d_2 . . . d_n.

Question 42 What is the eigenvector associated with d3?

Triangular matrices: The eigenvalues of a triangular matrix are also the diagonal elements of the matrix. Let us consider the example of an upper triangular matrix

U = \begin{bmatrix} u_{11} & x & x & x \\ & u_{22} & x & x \\ & & \ddots & x \\ & & & u_{nn} \end{bmatrix}

Here x indicates an arbitrary entry. The eigenvalues associated with the matrix are given by

\det(U - λ I) = \det \begin{bmatrix} u_{11} - λ & x & x & x \\ & u_{22} - λ & x & x \\ & & \ddots & x \\ & & & u_{nn} - λ \end{bmatrix} = (u_{11} - λ)(u_{22} - λ) \ldots (u_{nn} - λ)

The solutions to det(U − λI) = (u_{11} − λ)(u_{22} − λ) . . . (u_{nn} − λ) = 0 are given by λ = u_{11}, u_{22} . . . u_{nn}.


Projection matrices: Projection matrices have eigenvalues 0 or 1. Let us consider the simple example of projecting onto the line with direction vector v = (1, 1)^T . The corresponding projection is given by

P = \frac{v v^T}{v^T v} = \begin{bmatrix} 0.5 & 0.5 \\ 0.5 & 0.5 \end{bmatrix}

The eigenvalues of this matrix are found by setting det(P − λI) = 0:

\det(P - λ I) = \det \begin{bmatrix} 0.5 - λ & 0.5 \\ 0.5 & 0.5 - λ \end{bmatrix} = (0.5 - λ)^2 - (0.5)^2 = λ^2 - λ


19 Nov 16

Characteristic polynomial of a matrix. The determinant of A − λI is a polynomial of degree n in λ, called the characteristic polynomial. If the λ_i are the eigenvalues, then

\det(A - λ I) = \prod_{i=1}^{n} (λ_i - λ)

Computing eigenvalues is much more challenging than solving Ax = b. From the above argument, it is equivalent to computing the roots of a polynomial, and it has been shown that for polynomials of degree greater than or equal to 5 there cannot be a simple closed form formula for the roots. So there is no exact finite algorithm that would give us the eigenvalues of a general matrix. Towards the end of this class, we will see how to compute the eigenvalues approximately to high accuracy. However, some combinations of the eigenvalues are easy to compute. For example, the sum of the eigenvalues of a matrix is the sum of its diagonal elements, and the product of the eigenvalues is the determinant of the matrix:

a_{11} + a_{22} + \ldots + a_{nn} = λ_1 + λ_2 + \ldots + λ_n , \quad \det(A) = λ_1 \cdot λ_2 \cdots λ_n

19.1 Diagonalizing a matrix

If a matrix has n linearly independent eigenvectors and the columns of S are these eigenvectors, then S^{-1} A S is a diagonal matrix Λ whose diagonal entries are the eigenvalues. Here is a short proof.

Let x_1, x_2 . . . x_n be the eigenvectors corresponding to the eigenvalues λ_1, λ_2 . . . λ_n. Then

A S = A [x_1 | x_2 | \ldots | x_n] = [λ_1 x_1 | λ_2 x_2 | \ldots | λ_n x_n]

The matrix on the right hand side can be written as SΛ, where Λ is the matrix

Λ = \begin{bmatrix} λ_1 & & & \\ & λ_2 & & \\ & & \ddots & \\ & & & λ_n \end{bmatrix}

Thus,

A S = S Λ

Since the columns of S are linearly independent, S is invertible, and hence S^{-1} A S = Λ.

If a matrix has no repeated eigenvalues, then the corresponding eigenvectors are linearly independent: eigenvectors corresponding to different eigenvalues are linearly independent. Here is a short proof. Suppose v_1 and v_2 are eigenvectors corresponding to distinct eigenvalues λ_1, λ_2, and suppose c_1 and c_2 are scalars such that

c_1 v_1 + c_2 v_2 = 0 .

Here c_1, c_2 are scalars and v_1 and v_2 are vectors. Applying A to the above equation, we get

A c_1 v_1 + A c_2 v_2 = A 0 = 0 , \quad \text{i.e.} \quad λ_1 c_1 v_1 + λ_2 c_2 v_2 = 0 .

Multiplying the first equation by λ_2 and subtracting it from the above equation, we get

c_1 (λ_1 - λ_2) v_1 = 0 .

Since λ_1 ≠ λ_2 and v_1 ≠ 0, we get c_1 = 0, which also implies that c_2 = 0. Thus the vectors v_1 and v_2 are linearly independent.

The diagonalizing matrix is not unique, as we could scale any eigenvector by an arbitrary scalar. On the other hand, an arbitrary matrix S will not give a diagonal matrix S^{-1} A S; the columns need to be eigenvectors.

Some matrices are not diagonalizable. Consider the following matrix A,

A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}

The eigenvalues of the above matrix are 0, since it is an upper triangular matrix with 0’s on the diagonal; so both eigenvalues are 0. However, when we compute the eigenvectors, we just get one eigenvector

v_1 = \begin{bmatrix} c \\ 0 \end{bmatrix}


λ = 0 is a repeated eigenvalue. The algebraic multiplicity is the number of times the eigenvalue is repeated; thus λ = 0 has algebraic multiplicity 2 in this case. However, there is just one eigenvector corresponding to this eigenvalue, so the geometric multiplicity of this eigenvalue is 1. The minimum geometric multiplicity of an eigenvalue is 1. If the geometric multiplicity does not equal the algebraic multiplicity, then we cannot diagonalize the matrix.

Not enough eigenvectors implies that a matrix is not diagonalizable. Non-diagonalizability should not be confused with non-invertibility: a matrix is not invertible if it has a 0 eigenvalue.
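A numerical illustration (numpy assumed): S^{-1} A S is diagonal when there are n independent eigenvectors, while the matrix [[0, 1], [0, 0]] fails the test because its eigenvector matrix is (numerically) rank deficient.

    import numpy as np

    A = np.array([[1., -2.],
                  [-2., 1.]])
    lam, S = np.linalg.eig(A)            # columns of S are the eigenvectors
    print(np.linalg.inv(S) @ A @ S)      # diagonal matrix with 3 and -1

    B = np.array([[0., 1.],
                  [0., 0.]])
    lam_B, S_B = np.linalg.eig(B)
    print(lam_B)                          # both eigenvalues are 0
    print(np.linalg.matrix_rank(S_B))     # 1: only one independent eigenvector,
                                          # so B cannot be diagonalized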


20 Nov 18 - Dec 2

If A is diagonalizable, the decomposition A = SΛS^{-1} (or A = SDS^{-1}) is known as the eigenvalue decomposition of the matrix. Here is an alternate way to understand the eigenvalue decomposition. Given a vector x, we can compute Ax in the following fashion. We first find the components of x in the basis of the eigenvectors, i.e. we find coefficients c = (c_1, c_2, . . . c_n)^T such that x = c_1 v_1 + c_2 v_2 + . . . + c_n v_n, where v_1, v_2, . . . v_n are the eigenvectors of the matrix A and hence also the columns of S. This is equivalent to solving the following linear system for c:

S c = x \implies c = S^{-1} x .

We note that this is always possible for diagonalizable matrices, since a matrix is diagonalizable when it has n linearly independent eigenvectors, which implies that S is full rank, or equivalently that the span of the eigenvectors is all of Rn.

Once we have this decomposition of the vector x, the action of the matrix A along each eigendirection v_i is just to scale the corresponding eigenvector by λ_i, i.e. A v_i = λ_i v_i. Thus,

A x = c_1 A v_1 + c_2 A v_2 + \ldots + c_n A v_n \quad (52)
    = c_1 λ_1 v_1 + c_2 λ_2 v_2 + \ldots + c_n λ_n v_n \quad (53)

The last expression, from the column picture of matrix multiplication, is equivalent to Ax = S c_{scaled}, where c_{scaled} = (λ_1 c_1, λ_2 c_2, . . . λ_n c_n)^T . Furthermore, c_{scaled} = Λc, where Λ is the diagonal matrix whose diagonal entries are the eigenvalues of A. Combining all of these results, we get

A x = S c_{scaled} = S Λ c = S Λ S^{-1} x

So the action of A is encoded as: first find the decomposition in the eigenvector basis, then scale the components in that basis, and finally, to return to the standard basis, express the result as the linear combination of the eigenvectors with the scaled weights.

20.1 Recurrence relations and computing A^k

Linear recurrence relations are relations of the form x_k = A x_{k-1}, where x_k, k = 0, 1, 2 . . . are vectors in Rn and A is an n × n matrix. Here is an example. We know the nth Fibonacci number is the sum of the previous two numbers in the series. Let f_n be the nth Fibonacci number, so that f_n = f_{n-1} + f_{n-2}. This can be written as a linear recurrence relation by setting x_n = (f_n, f_{n-1})^T . Then

x_n = \begin{bmatrix} f_n \\ f_{n-1} \end{bmatrix} = \begin{bmatrix} f_{n-1} + f_{n-2} \\ f_{n-1} \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} f_{n-1} \\ f_{n-2} \end{bmatrix} = F x_{n-1}

where F is the matrix given by

F = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}

Thus to compute the nth Fibonacci number, we wish to compute F^{n-1} x_1, where x_1 is the vector of the first two Fibonacci numbers.
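A quick sketch of this recurrence (numpy assumed): repeatedly apply F to x_1 = (f_1, f_0) = (1, 0), and note that the growth rate is governed by the largest eigenvalue of F, the golden ratio.

    import numpy as np

    F = np.array([[1, 1],
                  [1, 0]])
    x = np.array([1, 0])                   # (f_1, f_0)
    for _ in range(9):
        x = F @ x
    print(x[0])                            # f_10 = 55

    # The largest eigenvalue of F is phi = (1 + sqrt(5)) / 2, so f_n grows like c1 * phi^n.
    phi = (1 + np.sqrt(5)) / 2
    print(np.linalg.eigvals(F.astype(float)), phi)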

Another example of such processes are Markov processes. The idea of these processes is that the future trajectory is completely determined by the current state and is independent of the past trajectory. For example, simulations of card games/board games, weather prediction models, queueing processes for computer servers and the page rank algorithm are based on Markov processes. Imagine you have a collection of “states”. In the weather prediction model, the states could be “clear”, “rainy”, “cloudy”, “sunny”, etc. In the page rank algorithm, each webpage corresponds to a state. Let s_n denote the state of the system at time n. Given the current state s_n, the system transitions to a new state at time n + 1 according to a probability transition matrix A. Let’s consider a deterministic example corresponding to a Markov matrix. Suppose the population of New Haven was x_1 and that of the outside world was y_1. Suppose every year 60 percent of New Haven’s population stays back in New Haven and 40 percent moves outside, while 30 percent of the outside world moves to New Haven and 70 percent of the outside world stays put. The populations at the end of the year, x_2 and y_2, are then given by

\begin{bmatrix} x_2 \\ y_2 \end{bmatrix} = \begin{bmatrix} 0.6 & 0.3 \\ 0.4 & 0.7 \end{bmatrix} \begin{bmatrix} x_1 \\ y_1 \end{bmatrix}

To compute the populations after k years, we are interested in computing A^k x, where A is the 2 × 2 matrix in the equation above and x is the initial state of populations.

If we observe the matrix above, all the entries are positive and the columns add up to 1. Such matrices are called Markov matrices. In probabilistic problems as well, the resulting matrices have these properties, and the probabilities of observing the various “states” after n steps are given by A^n x_0, where x_0 is the initial set of probabilities for each state. A second fact that is known about Markov matrices is that, since the columns sum up to 1, one of the eigenvalues of


a Markov matrix is 1, with the rest being less than or equal to 1 in magnitude. The eigenvector associated with this eigenvalue 1 is a relevant quantity for Markov matrices: it corresponds to the steady state distribution of the “states”, or the average time spent in each state by the Markov process. For example, the rank of a page is determined by computing this eigenvector, i.e. the steady state distribution associated with the corresponding Markov process. We’ve looked at the computation of eigenvalues and eigenvectors already, so let us first focus on computing A^n; we will revisit the problem of computing the steady state later in this section.

We shall assume for now that the matrix is diagonalizable. Let A = SDS^{-1} be the eigenvalue decomposition of our matrix, let the ith column of S be v_i, and let its associated eigenvalue be λ_i. Then A^k = S D^k S^{-1}, where D^k is still a diagonal matrix, with entries λ_i^k. This makes sense from the picture described above as well: given a vector, we first express it in the eigenvector basis of the matrix A (multiplication by S^{-1}), then the action of A^k is just multiplication by the scalar λ_i^k corresponding to the eigenvector v_i, and we convert back into the standard basis by multiplication by S.

Practical computation of A^k. We first compute the eigenvalues and the associated eigenvectors of the matrix A. Given a vector x, we then find c_1, c_2 . . . c_n such that x = c_1 v_1 + c_2 v_2 + . . . + c_n v_n. Then A x = c_1 λ_1 v_1 + c_2 λ_2 v_2 + . . . + c_n λ_n v_n. Note that in this equation the c_i are scalars, the λ_i are scalars, but the v_i are vectors. Similarly,

A^k x = c_1 λ_1^k v_1 + c_2 λ_2^k v_2 + \ldots + c_n λ_n^k v_n \quad (54)

Example: Let us consider our Markov matrix. The eigenvalues of the matrix are 1 and 0.3 and the corresponding eigenvectors are

v_1 = \begin{bmatrix} 3 \\ 4 \end{bmatrix} , \quad v_2 = \begin{bmatrix} -1 \\ 1 \end{bmatrix}

Suppose the initial population is x_1 and y_1. This vector can be expressed in the eigenvector basis as

\begin{bmatrix} x_1 \\ y_1 \end{bmatrix} = \frac{1}{7} (x_1 + y_1) v_1 + \frac{1}{7} (-4 x_1 + 3 y_1) v_2

\begin{bmatrix} x_{k+1} \\ y_{k+1} \end{bmatrix} = A^k \begin{bmatrix} x_1 \\ y_1 \end{bmatrix} = (1)^k \frac{1}{7} (x_1 + y_1) v_1 + (0.3)^k \frac{1}{7} (-4 x_1 + 3 y_1) v_2

In the limit k → ∞, we get the steady state populations of New Haven and the rest of the world, given by

\lim_{k \to \infty} A^k \begin{bmatrix} x_1 \\ y_1 \end{bmatrix} = \lim_{k \to \infty} \left( (1)^k \frac{1}{7} (x_1 + y_1) v_1 + (0.3)^k \frac{1}{7} (-4 x_1 + 3 y_1) v_2 \right) = \frac{1}{7} (x_1 + y_1) v_1

The components of the first eigenvector tell us the steady state proportions of the population in the two regions.
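A numerical sketch of this example (numpy assumed; the initial populations are made up): powers of the Markov matrix drive any starting population toward the eigenvector of eigenvalue 1, scaled to keep the total population fixed.

    import numpy as np

    A = np.array([[0.6, 0.3],
                  [0.4, 0.7]])
    x = np.array([100.0, 600.0])            # made-up initial populations (total 700)

    for _ in range(50):
        x = A @ x
    print(x)                                 # approaches (300, 400), i.e. (3/7, 4/7) of 700

    lam, V = np.linalg.eig(A)
    v1 = V[:, np.argmax(lam)]                # eigenvector for the eigenvalue 1
    print(v1 / v1.sum())                     # (3/7, 4/7), the steady state proportions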

20.2 Stability of recurrence relations

A recurrence relation x_k = A x_{k-1} is stable if in the limit k → ∞, x_k converges to the zero vector; neutrally stable if x_k remains bounded; and unstable if x_k goes to ∞. The stability behaviour of recurrence relations is completely characterized by the eigenvalues.

1. Stable: If all the eigenvalues are less than 1 in absolute value, then the recurrence relation is stable. In equation (54), if |λ_i| < 1, then λ_i^k → 0 as k → ∞ and all the terms in that expression go to zero. Here is an example of a matrix which leads to a stable recurrence relation:

A = \begin{bmatrix} 0.7 & 0.9 \\ 0 & 0.2 \end{bmatrix}

2. Neutrally stable: If at least one eigenvalue is 1 in absolute value, but the rest of the eigenvalues are less than 1 in magnitude. The Markov matrix is an example which leads to a neutrally stable recurrence.

Remark 1 If there exists a repeated eigenvalue λ_i with absolute value 1, and furthermore the matrix does not have sufficient eigenvectors corresponding to the eigenvalue λ_i, then the system will be unstable.

3. Unstable: If there exists at least one eigenvalue which is greater than 1 in magnitude. The Fibonacci matrix is a matrix which leads to an unstable recurrence relation.

Note: even in cases when the recurrence relation is unstable, for example in discrete models of evolution where populations of various species grow with time, the eigenvector corresponding to the largest eigenvalue describes the dominant behaviour of the system. To make this more concrete, suppose λ_1 is greater in magnitude than all the other eigenvalues; then our solution can be rewritten as

A^k x = λ_1^k \left( c_1 v_1 + c_2 \frac{λ_2^k}{λ_1^k} v_2 + \ldots + c_n \frac{λ_n^k}{λ_1^k} v_n \right) \quad (55)


In the limit k → ∞, the relative effect of the eigenvectors other than the first one, v_2, v_3 . . . v_n, is negligible. Their contribution might not be absolutely negligible, but in all relative measures the dominant behaviour of A^k x is like c_1 λ_1^k v_1. In the evolution example, if we divide A^k x by the total population of the system at time k (p_k), which is the sum of the components of the vector A^k x, then

\lim_{k \to \infty} \frac{1}{p_k} A^k x = c_1 v_1

Remark 2 This observation also leads to algorithms for determining the largest eigenvalue and the corresponding eigenvector of a matrix. The idea is to look at powers of the matrix applied to a random vector, A^k v, which in an appropriate measure converges to the eigenvector of the largest eigenvalue of the matrix.
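A bare-bones power iteration sketch in the spirit of Remark 2 (numpy assumed, made-up 2 × 2 matrix, and assuming one dominant eigenvalue): repeatedly apply A and renormalize; the iterate lines up with the eigenvector of the eigenvalue of largest magnitude.

    import numpy as np

    rng = np.random.default_rng(2)
    A = np.array([[2.0, 1.0],
                  [1.0, 3.0]])
    v = rng.standard_normal(2)

    for _ in range(100):
        v = A @ v
        v = v / np.linalg.norm(v)            # renormalize so the iterate stays O(1)

    lam_est = v @ A @ v                       # Rayleigh quotient estimate, since ||v|| = 1
    print(lam_est, np.max(np.linalg.eigvals(A)))   # both close to (5 + sqrt(5)) / 2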

20.3 Systems of differential equations using matrix exponentials

The eigenvalue decomposition also allows us to obtain solutions to systems of differential equations. Consider the following system of differential equations,

\frac{du}{dt} = A u

The solution of this system is given by

u(t) = e^{At} u_0 ,

where e^{At} is the matrix defined by

e^{At} = S e^{Dt} S^{-1} , \quad (56)

where e^{Dt} is the diagonal matrix with diagonal entries e^{λ_i t}, and the λ_i are the eigenvalues of the matrix A. We already noted in the first class that the general solution to this differential equation is given by

u(t) = c_1 e^{λ_1 t} v_1 + c_2 e^{λ_2 t} v_2 + \ldots + c_n e^{λ_n t} v_n \quad (57)

where the constants c_1, c_2 . . . c_n are unknown constants determined by the initial data, the λ_i are the eigenvalues and the v_i are the corresponding eigenvectors. This is exactly a rewriting of equation (56).

Here is an alternate derivation of the exponential of a matrix. From calculus, we know that the exponential function is defined as

e^x = 1 + x + x^2/2 + x^3/3! + \ldots + x^n/n! + \ldots \quad (58)

If we formally substitute a matrix B into the above expression, we get

e^B = I + B + B^2/2 + B^3/3! + \ldots + B^n/n! + \ldots

All the terms on the right are n × n matrices, and it can be shown that the infinite sum of matrices makes sense and we can add them up to obtain a meaningful result. Let us demonstrate this using the eigenvalue decomposition of the matrix. Let B = SDS^{-1}, where the columns of S are the eigenvectors of B and D is a diagonal matrix with the eigenvalues λ_i as the diagonal entries. Then

e^B = S S^{-1} + S D S^{-1} + S D^2 S^{-1}/2 + S D^3 S^{-1}/3! + \ldots + S D^n S^{-1}/n! + \ldots

e^B = S \left( I + D + D^2/2 + D^3/3! + \ldots + D^n/n! + \ldots \right) S^{-1}

The sum of matrices in the middle is still a diagonal matrix, where on each diagonal element we have the exponential power series defined in equation (58). Thus,

e^B = S e^D S^{-1}

where e^D is a diagonal matrix with diagonal entries e^{λ_i}.

Example. Let A be the matrix given by

A = \begin{bmatrix} -2 & 1 \\ 1 & -2 \end{bmatrix}

The eigenvalues of the matrix A are given by λ_1 = -3, λ_2 = -1, and the corresponding eigenvectors are given by

v_1 = \begin{bmatrix} -1 \\ 1 \end{bmatrix} , \quad v_2 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}

The exponential matrix is then given by the product of the following matrices

e^{At} = \begin{bmatrix} -1 & 1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} e^{-3t} & 0 \\ 0 & e^{-t} \end{bmatrix} \begin{bmatrix} -1 & 1 \\ 1 & 1 \end{bmatrix}^{-1}

and the solution of the differential equation is given by

u(t) = c_1 e^{-3t} \begin{bmatrix} -1 \\ 1 \end{bmatrix} + c_2 e^{-t} \begin{bmatrix} 1 \\ 1 \end{bmatrix}

where the constants c_1, c_2 are determined using the initial data u(0).
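A sketch of e^{At} built from the eigenvalue decomposition for this example (numpy assumed; scipy is used only as a cross-check, and the time t and initial data are made up):

    import numpy as np
    from scipy.linalg import expm

    A = np.array([[-2., 1.],
                  [1., -2.]])
    t = 0.7

    lam, S = np.linalg.eig(A)                       # lam = [-3, -1] (in some order)
    eAt = S @ np.diag(np.exp(lam * t)) @ np.linalg.inv(S)
    print(np.allclose(eAt, expm(A * t)))            # True

    u0 = np.array([2., 0.])
    print(eAt @ u0)                                 # u(t) for the initial data u0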


20.4 Stability of linear systems

A linear system of differential equations du/dt = Au is stable if in the limit t → ∞, u(t) converges to the zero vector; neutrally stable if u(t) remains bounded; and unstable if u(t) goes to ∞. The stability behaviour is again completely characterized by the eigenvalues.

1. Stable: If all the eigenvalues have negative real part, then the linear system is stable. In equation (57), if Re(λ_i) < 0, then e^{λ_i t} → 0 as t → ∞ and all the terms in that expression go to zero. The example above is a stable system.

2. Neutrally stable: If at least one eigenvalue has 0 real part, but the rest of the eigenvalues have negative real part, then the system is neutrally stable. Conservative systems, described in the next section, are usually neutrally stable.

Remark 3 If there exists a repeated eigenvalue λ_i with 0 real part, and furthermore the matrix does not have sufficient eigenvectors corresponding to the eigenvalue λ_i, then the system will be unstable.

3. Unstable: If there exists at least one eigenvalue which has positive real part, then the system is unstable.

Note: even in cases when the system of linear equations is unstable (for example, in continuous models of evolution, the populations of various species would grow with time), the eigenvector corresponding to the largest eigenvalue describes the dominant behaviour of the system. To make this concrete, suppose $\mathrm{Re}(\lambda_1)$ is greater than the real part of all the other eigenvalues. Then our solution can be rewritten as
$$u(t) = e^{\lambda_1 t}\left(c_1 v_1 + c_2 \frac{e^{\lambda_2 t}}{e^{\lambda_1 t}} v_2 + \ldots + c_n \frac{e^{\lambda_n t}}{e^{\lambda_1 t}} v_n\right) \qquad (59)$$
In the limit $t \to \infty$, the relative effect of the vectors other than the first eigenvector, $v_2, v_3, \ldots, v_n$, is negligible. Their contribution might not be absolutely negligible, but in all relative measures the dominant behaviour of $e^{At}u_0$ is like $c_1 e^{\lambda_1 t} v_1$. In the evolution example, if we divide $e^{At}u_0$ by the total population of the system at time $t$, $p(t)$, which is the sum of the components of the vector $e^{At}u_0$, then
$$\lim_{t\to\infty} \frac{1}{p(t)}\, e^{At}u_0 = \frac{v_1}{\text{sum of the components of } v_1},$$
i.e. the first eigenvector normalized so that its components sum to one. A numerical illustration is sketched below.
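The following sketch illustrates this numerically (the "population" matrix and initial vector are invented for illustration and are not from the notes). Dividing $e^{At}u_0$ by the sum of its components drives the ratio toward the dominant eigenvector, normalized so its components sum to one.

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[1.0, 0.5],
              [0.2, 0.8]])               # hypothetical growth matrix
u0 = np.array([1.0, 2.0])                # hypothetical initial populations
lam, V = np.linalg.eig(A)
v1 = V[:, np.argmax(lam.real)]           # eigenvector with the largest real part

for t in (5.0, 20.0, 50.0):
    u = expm(A * t) @ u0
    print(u / u.sum())                   # approaches v1 / sum(v1)
print(v1 / v1.sum())
```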

20.5 Conservative systems

Consider a simple spring-mass system, where an object of mass 1 is attached to a wall by a spring with spring constant 1. Let $x$ be the displacement from the equilibrium position. From Newton's laws it follows that the governing equation is
$$\frac{d^2x}{dt^2} + x = 0$$
If we declare $u = \begin{bmatrix} x \\ \frac{dx}{dt} \end{bmatrix}$, we get the following set of equations for $u$:
$$\frac{du}{dt} = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} u = Au$$
Linear systems of equations with a skew-symmetric matrix $A$ ($A^T = -A$) are called conservative systems. They are conservative because they preserve a certain energy. In this case $\|u(t)\|^2 = x(t)^2 + \left(\frac{dx}{dt}\right)^2 =$ energy in the spring + kinetic energy, which is conserved. The reason will become clearer in the next section. Every skew-symmetric matrix has eigenvalues with 0 real part and an orthogonal set of eigenvectors. In this event, the exponential of the matrix, $e^{At}$, is orthogonal. In our example above,
$$e^{At} = \begin{bmatrix} \cos(t) & \sin(t) \\ -\sin(t) & \cos(t) \end{bmatrix}$$
We've already seen before that an orthogonal matrix does not change the length of vectors, thus $\|u(t)\| = \|e^{At}u_0\| = \|u_0\|$.

21 Complex matrices and "Normal" matrices

In the previous section, we saw that even for real matrices we may have complex eigenvalues and associated eigenvectors. So it helps to extend the concepts we have defined for real vectors and real matrices to complex vectors and complex matrices. Instead of working with $\mathbb{R}^n$ and $\mathbb{R}^{n\times n}$ for $n$-dimensional vectors and $n \times n$ matrices, we will work with $\mathbb{C}^n$ and $\mathbb{C}^{n\times n}$, which are $n$-dimensional vectors with complex components and $n \times n$ matrices whose entries are complex.


We first focus on the vector space $\mathbb{C}^n$. Clearly, $\mathbb{C}^n$ forms a vector space. Let us make things concrete and consider $\mathbb{C}^3$. $\mathbb{C}^3$ is the collection of all vectors of the form
$$\begin{bmatrix} z_1 \\ z_2 \\ z_3 \end{bmatrix}$$
where $z_1, z_2, z_3$ are all complex numbers. We can add two such vectors in the standard way and also take linear combinations, such as
$$\alpha \begin{bmatrix} z_1 \\ z_2 \\ z_3 \end{bmatrix} + \beta \begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix} = \begin{bmatrix} \alpha z_1 + \beta w_1 \\ \alpha z_2 + \beta w_2 \\ \alpha z_3 + \beta w_3 \end{bmatrix}$$
In the expressions above, $z_1, z_2, z_3, w_1, w_2, w_3, \alpha, \beta$ are all complex numbers. Clearly, sums and products of complex numbers are still complex numbers, so the vector on the right-hand side is still in $\mathbb{C}^3$.

Okay, so we have a vector space. What are some other operations we performed on vectors? Well, we had the notion of the length of a vector, given by
$$\left| \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \right| = \sqrt{x_1^2 + x_2^2 + x_3^2}$$
We can similarly define the length of a vector in $\mathbb{C}^3$ by
$$\left| \begin{bmatrix} z_1 \\ z_2 \\ z_3 \end{bmatrix} \right| = \sqrt{|z_1|^2 + |z_2|^2 + |z_3|^2}$$
Here $|z_1| = \sqrt{x_1^2 + y_1^2}$, where $z_1 = x_1 + i y_1$ and $x_1, y_1$ are real numbers.

The other notion we had was that of an inner product, which measured the angle between two vectors. In real land, the inner product of two vectors $x, y$ was given by
$$(x, y) = x^T y = y^T x = x_1 y_1 + x_2 y_2 + x_3 y_3$$
where
$$x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}, \quad y = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}$$
An important property of the inner product was that $(x, x)$ is a real number $\geq 0$ and is also equal to $|x|^2$. Extending the definition above directly to complex land does not respect those properties. So for complex vectors, we define the inner product to be
$$(z, w) = z^* w = \bar{z}_1 w_1 + \bar{z}_2 w_2 + \bar{z}_3 w_3,$$
where
$$z^* = \bar{z}^T.$$
Note: $(z, w) = \overline{(w, z)} \neq (w, z)$ in general. It is easy to verify that in this case $(z, z) = |z|^2$:
$$(z, z) = \bar{z}_1 z_1 + \bar{z}_2 z_2 + \bar{z}_3 z_3 = |z_1|^2 + |z_2|^2 + |z_3|^2 = |z|^2$$
For real vectors, both the definition of length and the inner product reduce to our original definitions of length and inner product. These definitions are thus natural extensions of the corresponding definitions for $\mathbb{R}^3$, or in general $\mathbb{R}^n$.

As already noted above, the analog of the transpose for complex matrices is the conjugate transpose, $A^* = \bar{A}^T$. For example,
$$A = \begin{bmatrix} 1+i & 2+3i \\ 1-3i & 2-2i \end{bmatrix}, \quad A^* = \begin{bmatrix} 1-i & 1+3i \\ 2-3i & 2+2i \end{bmatrix}$$
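These definitions are easy to experiment with numerically; here is a small sketch (assuming numpy; the vectors are arbitrary examples) showing the conjugated inner product, the identity $(z, z) = |z|^2$, and the conjugate transpose of the matrix above.

```python
import numpy as np

z = np.array([1 + 1j, 2.0, -1j])             # arbitrary example vectors in C^3
w = np.array([0.5j, 1 - 1j, 3.0])

print(np.vdot(z, w))                          # np.vdot conjugates its first argument: z* w
print(np.vdot(z, z).real, np.linalg.norm(z)**2)   # (z, z) = |z|^2 is real

A = np.array([[1 + 1j, 2 + 3j],
              [1 - 3j, 2 - 2j]])
print(A.conj().T)                             # the conjugate transpose A*
```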

21.1 Normal matrices

We now wish to turn our attention to a special class of matrices for which the eigenvectors form an orthogonal basis. In this event, the eigenvalue decomposition of a matrix is given by $A = QDQ^*$, where $Q$ is an orthonormal matrix. A lot of matrices of interest have this feature, and let us look at a few examples.

Before we proceed to any further discussion about eigenvectors, let us introduce another tool to understand the distribution of eigenvalues better. The set of values $R = \{(x, Ax) : |x| = 1\}$ is the set of Rayleigh quotients associated with the matrix $A$. The lemma below shows that every eigenvalue is contained in the set $R$ of Rayleigh quotients.


Lemma 1 If λ is an eigenvalue, then λ ∈ R

Let $x$ be the eigenvector associated with the eigenvalue $\lambda$, and moreover suppose $|x| = 1$. By definition, $Ax = \lambda x$. Taking the inner product with $x$, we get $(x, Ax) = \lambda(x, x) = \lambda$. Thus $\lambda \in R$.

21.2 Hermitian matrices

These matrices are the analog of symmetric matrices in complex land. A Hermitian matrix is one for which $A = A^*$. Two results:

• The set of Rayleigh quotients for a Hermitian matrix is real, and hence the eigenvalues are also real.

• Eigenvectors corresponding to distinct eigenvalues are orthogonal.

Lemma 2 Rayleigh quotients for a Hermitian matrix are always real.

$$(x, Ax) = x^* A x = x^* A^* x = (Ax)^* x = (Ax, x) = \overline{(x, Ax)}$$

Thus $(x, Ax)$ is real. By the lemma above, all eigenvalues of a Hermitian matrix must be real.

Lemma 3 Eigenvectors corresponding to distinct eigenvalues are orthogonal.

Let $\lambda_1$ and $\lambda_2$ be two distinct eigenvalues and let $x, y$ be the associated eigenvectors. Then $Ax = \lambda_1 x$ and $Ay = \lambda_2 y$.
$$(x, Ay) = x^* A y = x^* \lambda_2 y = \lambda_2 x^* y$$
$$(x, Ay) = x^* A y = x^* A^* y = (Ax)^* y = (\lambda_1 x)^* y = \bar{\lambda}_1 x^* y = \lambda_1 x^* y$$
The last equality follows since the eigenvalue $\lambda_1$ is real. Thus we get $\lambda_1 (x, y) = \lambda_2 (x, y)$, which implies $(x, y) = 0$ since $\lambda_1 \neq \lambda_2$.

Example. Consider the matrix
$$A = \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix}$$
The eigenvalues of the matrix are 3 and 1. The associated eigenvectors are
$$v_1 = \begin{bmatrix} -1 \\ 1 \end{bmatrix}, \quad v_2 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$
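Numerically, both results can be checked with numpy's routine for Hermitian matrices (a sketch, using the example matrix above):

```python
import numpy as np

A = np.array([[2.0, -1.0],
              [-1.0, 2.0]])                   # real symmetric, hence Hermitian
lam, Q = np.linalg.eigh(A)                    # eigh is the Hermitian eigensolver

print(lam)                                    # real eigenvalues 1 and 3
print(np.allclose(Q.conj().T @ Q, np.eye(2)))          # orthonormal eigenvectors
print(np.allclose(Q @ np.diag(lam) @ Q.conj().T, A))   # A = Q D Q*
```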

21.3 Unitary matrices

Unitary matrices are the analog of orthogonal matrices in complex land. A unitary matrix is one for which $UU^* = U^*U = I$. Four results:

• $|Ux| = |x|$. Unitary matrices preserve length.

• $(x, y) = (Ux, Uy)$. Unitary matrices preserve angles.

• All eigenvalues have norm 1.

• Eigenvectors corresponding to distinct eigenvalues are orthogonal.

Lemma 4 The eigenvalues of a unitary matrix have norm 1.

Let $\lambda$ be an eigenvalue and $x$ be the associated eigenvector. Then $Ux = \lambda x$. Taking the norm on both sides, we get
$$|Ux| = |\lambda x| \implies |x| = |\lambda|\,|x| \implies |\lambda| = 1$$

Lemma 5 Eigenvectors corresponding to distinct eigenvalues are orthogonal.

Let $\lambda_1$ and $\lambda_2$ be two distinct eigenvalues and let $x, y$ be the associated eigenvectors. Then $Ux = \lambda_1 x$ and
$$Uy = \lambda_2 y \implies y = \lambda_2 U^* y \implies U^* y = \frac{1}{\lambda_2} y \implies U^* y = \bar{\lambda}_2 y$$
Here we used $UU^* = I$ and $\lambda_2 \bar{\lambda}_2 = |\lambda_2|^2 = 1$.
$$y^* U x = (y, Ux) = y^* \lambda_1 x = \lambda_1 y^* x$$
$$y^* U x = (U^* y)^* x = (\bar{\lambda}_2 y)^* x = \lambda_2 y^* x$$


Thus, we get $\lambda_1 (y, x) = \lambda_2 (y, x)$, which implies $(y, x) = 0$ since $\lambda_1 \neq \lambda_2$.

Example. Consider the matrix
$$A = \begin{bmatrix} \cos(t) & -\sin(t) \\ \sin(t) & \cos(t) \end{bmatrix}$$
The eigenvalues of the matrix are $\cos(t) + i\sin(t)$ and $\cos(t) - i\sin(t)$. The associated eigenvectors are
$$v_1 = \begin{bmatrix} 1 \\ -i \end{bmatrix}, \quad v_2 = \begin{bmatrix} 1 \\ i \end{bmatrix}$$
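A quick check of the unitary properties for the rotation matrix above (a sketch assuming numpy; the angle t is an arbitrary choice):

```python
import numpy as np

t = 0.8                                        # assumed rotation angle
U = np.array([[np.cos(t), -np.sin(t)],
              [np.sin(t),  np.cos(t)]])

lam, V = np.linalg.eig(U)
print(np.abs(lam))                             # both eigenvalues have norm 1

x = np.random.rand(2)
print(np.linalg.norm(U @ x), np.linalg.norm(x))   # |Ux| = |x|
```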

21.4 Skew-Hermitian matrices

These matrices are the analog of skew-symmetric matrices in complex land. A skew-Hermitian matrix is one for which $A = -A^*$. Two results:

• The set of Rayleigh quotients for a skew-Hermitian matrix is purely imaginary, and hence the eigenvalues are also purely imaginary.

• Eigenvectors corresponding to distinct eigenvalues are orthogonal.

Lemma 6 Rayleigh quotients for a skew-Hermitian matrix are always purely imaginary.

$$(x, Ax) = x^* A x = -x^* A^* x = -(Ax)^* x = -(Ax, x) = -\overline{(x, Ax)}$$

Thus $(x, Ax)$ is purely imaginary. By the lemma above, all eigenvalues of a skew-Hermitian matrix must be purely imaginary.

Lemma 7 Eigenvectors corresponding to distinct eigenvalues are orthogonal.

Let $\lambda_1$ and $\lambda_2$ be two distinct eigenvalues and let $x, y$ be the associated eigenvectors. Then $Ax = \lambda_1 x$ and $Ay = \lambda_2 y$.
$$(x, Ay) = x^* A y = x^* \lambda_2 y = \lambda_2 x^* y$$
$$(x, Ay) = x^* A y = -x^* A^* y = -(Ax)^* y = -(\lambda_1 x)^* y = -\bar{\lambda}_1 x^* y = \lambda_1 x^* y$$
The last equality follows since the eigenvalue $\lambda_1$ is purely imaginary. Thus, we get $\lambda_1 (x, y) = \lambda_2 (x, y)$, which implies $(x, y) = 0$ since $\lambda_1 \neq \lambda_2$.

Example. Consider the matrix
$$A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$$
The eigenvalues of the matrix are $i$ and $-i$. The associated eigenvectors are
$$v_1 = \begin{bmatrix} 1 \\ -i \end{bmatrix}, \quad v_2 = \begin{bmatrix} 1 \\ i \end{bmatrix}$$

22 Similarity transforms, change of basis, Jordan matrices

Two matrices $A$ and $B$ are similar when $B = M^{-1}AM$ for an invertible matrix $M$. Note that $A$ and $B$ are also similar if $B = NAN^{-1}$ for some invertible matrix $N$. Similar matrices naturally arise when we change variables in solving systems of differential equations. Consider the solution of the following system of differential equations,
$$\frac{du}{dt} = Au$$
If we set $u(t) = Mv(t)$, where $v(t)$ is the new set of unknowns and $M$ is some constant matrix, then the governing equations for $v(t)$ are
$$M\frac{dv}{dt} = AMv, \quad\text{or}\quad \frac{dv}{dt} = M^{-1}AMv$$
The matrix corresponding to the system of differential equations for $v$ is $M^{-1}AM$. Similarly, consider the difference equation $u_n = Au_{n-1}$. Let $u_n = Mv_n$, where $v_n$ are the new unknowns. The difference equation in the new unknowns is
$$Mv_n = AMv_{n-1}, \quad\text{or}\quad v_n = M^{-1}AMv_{n-1}$$
Similar matrices share the same eigenvalues: if $A$ and $B$ are similar, then they have the same eigenvalues. Suppose $x$ is an eigenvector of $A$ with eigenvalue $\lambda$, so that $Ax = \lambda x$. Then $M^{-1}x$ is an eigenvector of $B$ with eigenvalue $\lambda$:
$$BM^{-1}x = M^{-1}AMM^{-1}x = M^{-1}Ax = M^{-1}\lambda x = \lambda M^{-1}x$$


Proof that the eigenvalues of $B$ are the same as the eigenvalues of $A$: the eigenvalues of $A$ are the roots of the characteristic polynomial $\det(A - \lambda I)$. If we can show that $\det(B - \lambda I) = \det(A - \lambda I)$, then the eigenvalues of $B$ and $A$ are the same.
$$\det(B - \lambda I) = \det\left(M^{-1}AM - \lambda I\right) = \det\left(M^{-1}AM - \lambda M^{-1}M\right) = \det\left(M^{-1}(A - \lambda I)M\right) \qquad (60)$$
$$= \det\left(M^{-1}\right)\det(A - \lambda I)\det M = \det(A - \lambda I) \qquad (61)$$

Consider the following example. Let $A$ be the matrix below
$$A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}.$$
The eigenvalues of $A$ are 0 and 1. Each $B$ below is $M^{-1}AM$ for a different $M$. Case 1:
$$M = \begin{bmatrix} 1 & b \\ 0 & 1 \end{bmatrix} \quad\text{then}\quad B = \begin{bmatrix} 1 & b \\ 0 & 0 \end{bmatrix}$$
Case 2:
$$M = \begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix} \quad\text{then}\quad B = \begin{bmatrix} 0.5 & 0.5 \\ 0.5 & 0.5 \end{bmatrix}$$
In the first case $B$ is upper triangular, and in the second case $B$ is a projection matrix, but in both cases the eigenvalues are 0 and 1.
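Both cases are easy to reproduce numerically (a sketch assuming numpy; the value b = 2 in case 1 is an arbitrary choice):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 0.0]])
M1 = np.array([[1.0, 2.0],                    # case 1 with b = 2 (assumed value)
               [0.0, 1.0]])
M2 = np.array([[1.0, 1.0],                    # case 2
               [-1.0, 1.0]])

for M in (M1, M2):
    B = np.linalg.inv(M) @ A @ M
    print(B)
    print(np.sort(np.linalg.eigvals(B)))      # eigenvalues 0 and 1 in both cases
```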

22.1 Change of basis

Another interpretation of similarity transformations is that they correspond to a change of basis. Right after the first midterm, we spoke about thinking of matrices as linear transformations. A matrix is just the representation of a linear operator in a certain basis. So far, we've always been working with the standard basis $e_i$, the standard coordinate directions. However, sometimes it is advantageous to represent the same operator in another basis. The eigenvalue decomposition and its application to solving systems of differential equations and difference equations is one such example. When we write $A = SDS^{-1}$, it can be interpreted as follows. The matrix $A$ is the representation of a linear operator in the standard basis on the domain and the range. However, if we change our basis to the columns of $S$, which are the eigenvectors of the matrix, the action of the operator in this basis is diagonal. Let me demonstrate this with an example. Consider the following matrix

$$A = \begin{bmatrix} 1 & 3 \\ 2 & 2 \end{bmatrix}$$
The columns of the matrix correspond to the images of the coordinate vectors $e_i$. The first column of the matrix, $\begin{bmatrix} 1 \\ 2 \end{bmatrix} = 1e_1 + 2e_2$, is the result of $Ae_1$, where $e_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $e_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$. Similarly, the second column of $A$, $\begin{bmatrix} 3 \\ 2 \end{bmatrix} = 3e_1 + 2e_2$, is equal to $Ae_2$. The eigenvectors of this matrix are
$$v_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad v_2 = \begin{bmatrix} -3 \\ 2 \end{bmatrix}$$
Suppose we use this basis to represent vectors in the domain and range, i.e. the vector
$$\begin{bmatrix} 5 \\ -6 \end{bmatrix}_v = 5v_1 - 6v_2 = \begin{bmatrix} 23 \\ -7 \end{bmatrix} = 23e_1 - 7e_2$$
In this new basis, the action of the matrix $A$ is given by
$$A\begin{bmatrix} 1 \\ 0 \end{bmatrix}_v = Av_1 = 4v_1 = \begin{bmatrix} 4 \\ 0 \end{bmatrix}_v$$
Similarly,
$$A\begin{bmatrix} 0 \\ 1 \end{bmatrix}_v = Av_2 = -v_2 = \begin{bmatrix} 0 \\ -1 \end{bmatrix}_v$$
Thus, in the basis $v_1, v_2$, the action of the same matrix $A$ can be represented as
$$A_{vv} = \begin{bmatrix} 4 & 0 \\ 0 & -1 \end{bmatrix},$$
which is a diagonal matrix. Note that the input vectors for $A_{vv}$ are represented in the $v_1, v_2$ basis and the output vectors are also expressed in the $v_1, v_2$ basis. Thus similar matrices represent the same linear operator in different bases.
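The same computation in numpy (a sketch): placing the eigenvectors v1, v2 in the columns of S, the matrix S^{-1}AS is the representation A_vv of the operator in the new basis.

```python
import numpy as np

A = np.array([[1.0, 3.0],
              [2.0, 2.0]])
S = np.array([[1.0, -3.0],                    # columns are the eigenvectors v1 and v2
              [1.0,  2.0]])

A_vv = np.linalg.inv(S) @ A @ S               # representation of A in the v1, v2 basis
print(A_vv)                                   # approximately diag(4, -1)
```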


22.2 Jordan matrices

For diagonalizable matrices, we have $A = SDS^{-1}$. This is equivalent to saying that diagonalizable matrices are similar to a diagonal matrix $D$. The natural question to ask is whether all matrices are similar to a diagonal matrix, i.e. is there some basis in which the action is always diagonal? Unfortunately, the answer is no. We know for a fact that not all matrices are diagonalizable. Consider the following example,
$$A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$$
The matrix above has eigenvalues 0, 0. However, the matrix has only a one-dimensional null space, a basis of which is an eigenvector for the matrix, $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$. So clearly not all matrices are similar to a diagonal matrix, but the example above elucidates the worst we can do using similar matrices: every matrix is similar to a block Jordan matrix. We will use the following notation: $A \sim B$ whenever $A$ is similar to the matrix $B$. Thus every matrix $A \sim J$, where $J$ is the block matrix given by

$$J = \begin{bmatrix} J_1 & & & & \\ & J_2 & & & \\ & & \ddots & & \\ & & & J_{k-1} & \\ & & & & J_k \end{bmatrix}$$
where each $J_i$ is a Jordan block corresponding to eigenvalue $\lambda_i$, given by
$$J_i = \begin{bmatrix} \lambda_i & 1 & & & \\ & \lambda_i & 1 & & \\ & & \ddots & \ddots & \\ & & & \lambda_i & 1 \\ & & & & \lambda_i \end{bmatrix}$$

There are a few pieces of information available from this decomposition.

• The number of linearly independent eigenvectors is the number of Jordan blocks, i.e. there are $k$ linearly independent eigenvectors in the above example.

• The eigenvalue within each Jordan block has to be the same: in the example above, $J_i$ has the same $\lambda_i$ down its diagonal. There can be more than one Jordan block with the same eigenvalue. For example, a $3 \times 3$ matrix which has 0 as its only eigenvalue and has 2 eigenvectors is similar to the following matrix
$$\begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$
The $2 \times 2$ diagonal block corresponds to a Jordan block of size 2 associated with one eigenvector of the matrix, and the (3,3) element is the second Jordan block, of size 1, corresponding to the second eigenvector.

• Associated to each distinct eigenvalue there is at least one eigenvector. Moreover, eigenvectors corresponding to distinct eigenvalues are linearly independent. Thus matrices with distinct eigenvalues are always similar to diagonal matrices.

• Normal matrices are always guaranteed to be similar to diagonal matrices; they do not have Jordan blocks. Consider the following example
$$P = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
The eigenvalues of the matrix are $1, 1, -1$. Thus we necessarily have 2 linearly independent eigenvectors, but in fact, since the matrix is Hermitian, we are guaranteed to have 3 linearly independent eigenvectors, and they can be chosen to be orthogonal to each other. In the example above, the eigenvectors corresponding to the eigenvalues $1, 1, -1$ can be chosen to be
$$v_1 = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \quad v_2 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}, \quad v_3 = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix}$$


• Diagonalizability and invertibility are different ideas. Invertibility is about the presence of a 0 eigenvalue: if the matrix has a zero eigenvalue, then it is not invertible. Diagonalizability is about eigenvectors: if the matrix has $n$ linearly independent eigenvectors, then it is diagonalizable. In the example below, $A$ is diagonalizable and invertible, $B$ is not diagonalizable but invertible, $C$ is diagonalizable but not invertible, and $D$ is neither diagonalizable nor invertible.
$$A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \quad B = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} \quad C = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \quad D = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$$

A couple of applications of the eigenvalue decomposition were the computation of $A^n$ and $e^{At}$ for the solution of difference equations and differential equations respectively. If $A \sim J$, there exists a matrix $M$ such that $A = MJM^{-1}$. Then $A^n = MJ^nM^{-1}$ and, similarly, $e^{At} = Me^{Jt}M^{-1}$. Thus all we need to do is compute $J^n$ and $e^{Jt}$ for matrices that are not diagonalizable. For the matrix $J$ above, from the block matrix multiplication picture we get
$$J^n = \begin{bmatrix} J_1^n & & & & \\ & J_2^n & & & \\ & & \ddots & & \\ & & & J_{k-1}^n & \\ & & & & J_k^n \end{bmatrix}
\quad\text{and}\quad
e^{Jt} = \begin{bmatrix} e^{J_1 t} & & & & \\ & e^{J_2 t} & & & \\ & & \ddots & & \\ & & & e^{J_{k-1} t} & \\ & & & & e^{J_k t} \end{bmatrix}$$
Thus, to compute $J^n$ or $e^{Jt}$, we just need to compute it for a single Jordan block. Here is how they look for a $3 \times 3$ Jordan block with eigenvalue $\lambda$:
$$J_i^n = \begin{bmatrix} \lambda^n & n\lambda^{n-1} & \frac{n(n-1)}{2}\lambda^{n-2} \\ & \lambda^n & n\lambda^{n-1} \\ & & \lambda^n \end{bmatrix}
\quad\text{and}\quad
e^{J_i t} = \begin{bmatrix} e^{\lambda t} & t e^{\lambda t} & \frac{t^2}{2} e^{\lambda t} \\ & e^{\lambda t} & t e^{\lambda t} \\ & & e^{\lambda t} \end{bmatrix}$$
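The closed form for $e^{J_i t}$ can be verified numerically (a sketch assuming numpy and scipy; the eigenvalue and time below are arbitrary choices):

```python
import numpy as np
from scipy.linalg import expm

lam, t = 2.0, 0.3                             # assumed eigenvalue and time
J = np.array([[lam, 1.0, 0.0],
              [0.0, lam, 1.0],
              [0.0, 0.0, lam]])               # a 3 x 3 Jordan block

closed_form = np.exp(lam * t) * np.array([[1.0, t, t**2 / 2],
                                          [0.0, 1.0, t],
                                          [0.0, 0.0, 1.0]])
print(np.allclose(expm(J * t), closed_form))  # matches the formula above
```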
