Matrix Algebra Notes Section 1 About these notes - mysmu.edu · Matrix Algebra Notes Anthony Tay 1-1 Matrix Algebra Notes Section 1 About these notes These are notes on matrix algebra

Matrix Algebra Notes Anthony Tay

1-1

Matrix Algebra Notes

Section 1 About these notes

These are notes on matrix algebra that I have written up for use in different courses

that I teach, to be prescribed either as refreshers, main reading, supplements, or

background readings. These courses are primarily

Intermediate Math for Economics

Economic Forecasting

Advanced Mathematical Methods

various Econometrics courses

Perhaps others will find them useful too. They were designed for self-study, and

intended as a quick and dirty introduction to the essentials in matrix algebra.

If you are using this, please note:

- The exercises are an essential part of the notes. Many of the important concepts

and results are in the exercises. Do the exercises.

- These notes are not meant to replace any textbooks on matrix algebra. Please use

these only as a supplement to the ‘real’ textbooks.

- The individual sections are not stand-alone. Nor complete in its exposition; I

have made a number of assumptions regarding what you already know.

- I call these notes to “matrix algebra” as opposed to “linear algebra”, which gets

into vector spaces. The coverage in these notes is terribly naive in comparison. Please

see Strang (2009) or take a real Linear Algebra course.

- These notes are incomplete. There are many topics I plan to add, and will do so

from time to time, with no particular sense of urgency.

Anthony Tay

Singapore Management University

Matrix Algebra NotesAnthony Tay

2-1

Section 2 Systems of Equations

Solving economic models often requires solving systems of equations. Here wediscuss solving systems of linear equations, meaning that the equations in thesystem take the form

1 1 2 2 ... n na x a x a x c .

Solution Possibilities Consider the system

(i) 1 2

1 2

2 4

2 2

x x

x x

A solution of this system is a pair of numbers 1 2( , )x x that satisfy both equations atthe same time. In this example, the solution is easily found to be 1 2( , ) (2,0)x x .

The system (i) has exactly one solution, but this is not true for all systems. Some havean infinite number of solutions, others have none. Consider

(ii) 1 2

1 2

3 5 6

6 10 12

x x

x x

The second equation is simply two times the first – they are not independentequations. There is effectively only one equation in two unknowns, and any 1 2( , )x xthat satisfies the first equation will satisfy the second. The system has infinitely manysolutions – any 1 2( , )x x satisfying

1 22 (5/ 3)x x or 2 1(6 / 5) (3/ 5)x x

is a solution. The set of solutions is usually written as

1 2

5( , ) 2 ,

3

sx x s

or 1 2

6 3( , ) ,

5 5

sx x s

.

The symbol ‘ s ’ is sometimes called a ‘parameter’.

The fact that a system may have no solution is illustrated by the next example:

(iii) 1 2

1 2

3 5 6

3 5 7

x x

x x

If the first equation is satisfied, the second cannot be satisfied. This system has nosolution. The two equations are conflicting equations.

Drawing the equations on the x-y plane (more appropriately here, the “ 1 2-x x ” plane)shows clearly what happens in all three situations. The two lines intersect in (i),coincide in (ii), and are parallel to each other in (iii).


2-2

A system with at least one solution is called a consistent system. A system with nosolution is said to be inconsistent. Whether a system has one solution or manysolutions depends on whether there are as many independent equations as there arevariables to be solved.

What if there are three equations in two variables? Here we have all threepossibilities. Consider

(iv)1 2

1 2

1 2

2 3 18

3 4 7

2 10

x x

x x

x x

This system has no solution. Solving the last two only gives the unique solution

1 13/ 5x , 2 37 /10x .

But this solution cannot be reconciled with the first equation:

1 22 3 2(13/ 5) 3(37 /10) 163/10 18x x .

Drawing the three equations in the 1x - 2x plane shows clearly what is going on here.For there to be a solution, these three equations must intersect at the same point. Inthis example, they don’t.

Here is an example where the three equations do intersect at a single point.

(v)1 2

1 2

1 2

4 2 3

3 4 7

2 10

x x

x x

x x

System (v) has exactly one solution. Solving the last two gives

1 13/ 5x , 2 37 /10x

as in the previous case. Substituting this solution into the first equation gives

1 24 2 4(13/ 5) 2(37 /10) 15/ 5 3x x so the first equation is also satisfied.

Another way of looking at this system is to observe that although there are threeequations in two unknowns, there are actually only two independent equations. Anyone of the three equations can be written as a combination of the other two (e.g., 3rdequation is 1st minus 2nd; or 2nd equation is 1st minus 3rd, etc.) Solving any twotherefore solves the third. It is the number of independent equations that matter indetermining the number of solutions: two independent equations in two unknownsproduces a single solution.


2-3

If a three-equation two-variable system contains only one independent equation, theneffectively we will have only one equation in two unknowns, leading to infinitelymany solutions. This is the case with the following system:

(vi)1 2

1 2

1 2

2 4 20

3 6 30

2 10

x x

x x

x x

Here the 1st equation is twice that of the 3rd, and the second is three times that of the3rd. Therefore there is only one independent equation, in two unknowns. There areinfinitely many solutions 1 2( , ) (10 2 , )x x s s .

*******

A linear equation involving three variables 1x , 2x , and 3x , represents a plane in the3-dimensional space. This visualization helps you to understand what can happen inlinear systems with three variables.

If you have two planes, then either the planes are parallel, are coincident, or intersect.If they intersect, the intersection produces a line in three-dimensional space.Therefore two linear equations involving three variables will either have no solutions(if the two planes are parallel), or an infinite number of solutions represented by theentire plane (if the two planes coincide) or an infinite number of solutions representedby the line of intersection of the two planes.

The following system has infinitely many solutions represented by a single line.

(vii) 1 2 3

1 2 3

2 3 0

1

x x x

x x x

Subtracting the second from the first eliminates 3x , and gives 1 24 1x x . Thus forany 2x s , we have 1 4 1x s . Substituting into the second equation gives

3 1 21 1 4 1 5x x x s s s .

We have infinitely many solutions given by 1 2 3( , , ) (4 1, , 5 )x x x s s s .

The system

(viii) 1 2 3

1 2 3

2 3 0

4 6 2 1

x x x

x x x

however, has no solution. Dividing the second equation by 2 gives

1 2 32 3 1/ 2x x x . Thus if 1 2 3( , , )x x x satisfies the first, it cannot satisfy the second.These two planes are parallel.


2-4

The system

(ix) 1 2 3

1 2 3

2 3 1

4 6 2 2

x x x

x x x

has infinitely many solutions. It is easy to see that the two equations are identical.Every point on the plane is a solution. For any 1x s , 2x t , we have

3 1 2 3x s t . The solution is

1 2 3( , , ) ( , , 1 2 3 )x x x s t s t .

*******

In order to get a unique solution to a three-equation system, there must be threeindependent non-conflicting equations. The three planes must intersect at a singlepoint. If there are any dependence among the three equations, then we will end upwith infinitely many solutions (either a line or a plane), or if there are conflictingequations, no solutions. We will explore this further in later sections.

Solving larger systems of equations may seem a more daunting task, but the processof solving small systems is easily systematized. This process is called ‘elimination’using “elementary row operations”. Take the system:

1 2 32 2 4 4x x x eq1

1 2 33 1 2 2x x x eq2

1 2 35 2 1 7x x x eq3

You would generally proceed by using equation 1 to eliminate 1x from eq2 and eq3:

1 2 32 2 4 4x x x eq1

eq2 minus 3 / 2 eq1 1 2 30 2 4 4x x x eq2’

eq3 minus 5 / 2 eq1 1 2 30 3 9 3x x x eq3’

Then use eq2 to eliminate 2x from eq3, to get the “triangular” system

1 2 32 2 4 4x x x eq1

1 2 30 2 4 4x x x eq2’

eq3 minus 3 / 2 eq2’ 1 2 30 0 3 3x x x eq3’’


2-5

Finally, multiply the third equation by 1/ 3 to get 3 1x . Substituting 3 1x upwards gives 2 4x , and 1 0x .

You may have to switch the positions of your equations first. If the coefficient on thefirst variable in the first equation is zero, then switch that equation with another:

1 2 30 2 4 4x x x 1 2 33 1 2 2x x x

1 2 33 1 2 2x x x switch eq 1 with eq2 1 2 30 2 4 4x x x

1 2 35 2 1 7x x x 1 2 35 2 1 7x x x

and proceed as usual.

Notice the important role played by the boxed numbers. These numbers are called thepivots of the coefficient matrix. If a system of equations has a unique solution it willhave non-zero pivots. Notice also that we only used three “types” of operations insolving our system of equations: switching equations, multiplying by a (non-zero)constant, and using one equation to eliminate another. These operations are called“row-operations” and underlie much of matrix algebra.

********

Systems of Nonlinear Equations

We often also have to solve systems of non-linear equations, such as

(x) 4xy

2 2 8 x y

The ‘method’ for solving systems of non-linear equations is effectively also toeliminate variables, although how best to do this depends of the system; it is a case-by-case situation.

For the system above, we might approach it in the following way: the first equationgives 4 /x y . Thus

2 216 / 8 y y 4 216 8 y y

which gives 2 2( 4) 0 y .

The solutions are, therefore, ( , ) (2,2)x y and ( , ) ( 2, 2) x y .


2-6

It is easy to see that solving larger systems of non-linear equations can get reallytricky. Even small systems can be tricky, and you’ll have to be careful of extraneoussolutions and missing solutions. For example, take

(xi) y x

2 y x

Perhaps the obvious thing to do here is

2 x x 2 2(2 ) 4 4 x x x x 0 ( 4)( 1) x x .

So you might conclude that 1x or 4x , with 1y and 2y respectively.However, ( , ) (4,2)x y is not a solution, since this does not lie on the secondequation. [Qn: why is ( , ) (4, 2) x y not a solution?]

Exercises

1. Plot systems (i) to (vi), and (x) and (xi), each in their own diagrams.

2. Solve the system2y x

2 2x y a ,

for 0x , 0y , 0a . Plot the system on the x - y plane for different values of a .


3-1

Section 3 The Summation Notation

You will frequently deal with complicated expressions involving a large number of

additions. Often, these expressions are simplified using the ‘summation’ notation.Many students find difficulty in manipulating such expressions. The purpose of this

section is to introduce the notation to you, and to get you comfortable with it.

3.1 Definition and Rules for Sums

The uppercase sigma “ ” is used to denote summation. For an arbitrary set ofnumbers 1 2{ , ,...., }nx x x define

1 21

...n

i ni

x x x x

.

Example 3.1.1 The average of a set of numbers 1 2, ,..., nx x x can be written

1

1 nii

x xn

.

Example 3.1.2 Write the sum 4 + 8 + 12 + 16 + 20 + 24 in summation notation:

Ans:6

14

ii

.

Example 3.1.3 Suppose the following payments are to be made: 1a in the firstperiod, 2a in the second period, and so on until na in the n th period. At a fixedinterest rate of r per period, the present value of the payments is

1 22

1

...1 (1 ) (1 ) (1 )

nn i

n ii

a aa a

r r r r

.

In Example 3.1.1, the “index of summation” i enter as subscripts, but otherwise do

not enter into the computation of the terms of the summation. In Example 3.1.2, the

index is actually part of the computation of the terms. In Example 3.1.3, the index is

used both ways.


3-2

Example 3.1.4 Economists often use an aggregate price index to track the overallprice level in an economy relative to some base year. This is usually done by trackinga weighted average of prices of a certain set of commodities. Let

1,...,i n represent n commodities

0iq be the quantity of good i purchased in period 0 (the base year)

0ip be the price of good i in period 0

t iq be the quantity of good i purchased in period t

t ip be the price of good i in period t

The Laspeyres Price Index is01

0 01

nt i ii

ni ii

p q

p q

The Paasche Price Index is 1

01

nt i t ii

ni t ii

p q

p q

Expressions using summation notation are not unique; more than one expression canbe used to represent a given sum.

Example 3.1.5 Write 1 – 1/3 + 1/5 – 1/7 + 1/9 – 1/11 in summation notation:

Ans:6 1

1

1( 1)

2 1i

i i

. Alternate answer:5

0

1( 1)

2 1i

i i

3.2 Rules for Working with the Summation Notation

The summation notation greatly simplifies notation (once you get used to it), but thisis only helpful then you know how to manipulate expressions written in it. There areonly two rules to learn

(i)1 1 1

( )n n n

i i i ii i i

a b a b

, (ii)1 1

n n

i ii i

ca c a

, where c is a constant.

Example 3.2.11

n

ic nc

Example 3.2.21

( ) 0n

ii

x x

, where1

1 n

ii

x xn

.

Proof:1 1 1

( ) 0n n n

i ii i i

x x x x nx nx


3-3

Example 3.2.31 1 1

( )( ) ( ) ( )n n n

i i i i i ii i i

x x y y x x y y y x

,

where1

1 n

ii

x xn

, and1

1 n

ii

y yn

Proof We have1 1 1

( )( ) ( ) ( )n n n

i i i i ii i i

x x y y x x y x x y

But the second term is zero:1 1

( ) ( ) 0n n

i ii i

x x y y x x

,

which gives the desired result.

The proof of1 1

( )( ) ( )n n

i i i ii i

x x y y y y x

is similar.

3.3 Some useful formulas involving summations

For every integer n,

1

( 1)1 2 3 ...

2

n

i

n ni n

2 2 2 2 2

1

( 1)(2 1)1 2 3 ...

6

n

i

n n ni n

23 3 3 3 3

1 1

1 2 3 ...n n

i i

i n i

Arithmetic Series

1

0 1( ) ( ( 1) )

( ) ( 2 ) ... ( ( 1) )

( 1)

2

n n

i ia id a i d

a a d a d a n d

n n dna

where a and d are real numbers.

Geometric Series

1 1 2 10 1

(1 )...

(1 )

nn ni i ni i

rar ar a ar ar ar a

r

,

where a and r are real numbers.


3-4

3.4 Double summations

Suppose we have a rectangular array of numbers

11 12 1

21 22 2

1 2

n

n

m m mn

a a a

a a a

a a a

Let the total sum of these numbers be S. To get S we can first add up the rows andthen add the results,

i.e., 1 21 1 1 1 1

n n n m n

j j m j ijj j j i j

S a a a a

,

or first add up the columns,

i.e., 1 21 1 1 1 1

m m m n m

i i i n iji i i j i

S a a a a

.

The parenthesis makes it clear which summation is to be done first, but it isconventional to leave out the parenthesis and write

1 1

m n

iji j

a or

1 1

n m

ijj i

a

with the understanding that the summations are carried out from right to left, i.e.,from the inner summation to the outer.

Example 3.4.1 Expand 2

1 1

m n

i j

i j .

One way to do this is 2 2 2 2 2

1 1 1 1

(1 2 )(1 2 )m n m n

i j i j

i j i j m n

A more explicit argument…

2 2 2 2

1 1 1 1 1 1 1 1

2 2 2(1 2 )(1 2 )

m n m n m n m n

i j i j i j i j

i j i j i j i j

m n


3-5

In the examples so far, we could interchange the summation signs, i.e.,

1 1

m n

iji j

a =

1 1

n m

ijj i

a .

We can do this only if the limits of the outer summation do not depend on the limit ofany of the inner summations.

Example 3.4.2 Suppose we have a triangular array of numbers to be expressed insummation notation.

11

21 22

1 2m m mm

a

a a

a a a

We can write1 1

m iiji j

a . We cannot interchange the summation symbols because

the upper limit in the inner summation depends on the index of the outer summation.

Example 3.4.3 Write the sum of the following triangular array using summationnotation.

11

21 22

,1 ,2 ,

,1 ,2 ,

n n n n

m m m n

a

a a

a a a

a a a

where n m .

Solution: Let ,i ja be the typical element in the sum. Then the first column has1j , and i running from 1 to m ; the second column has 2j , and i running from

2 to m . In general we have j running from 1 to n , for the j th column, i runningfrom j to m . Thus the sum is

1

n m

ijj i ja

.


3-6

Exercises

1. Write out in full, then evaluate or simplify

a.4

12

ii b.

3

0 iiix c.

4

11( 1) ii

i x d.10

12

i

e.10

1(2 1)

i

i f.10

1( 1)

i

ig.

1i

jj h.

1i

ji

2. Write in summation notation

a. 1 3 5 7 9

b. 1 2 2 1...1 2 1

m m m m mm m m ma a b a b ab b

m m

c. 2 + 3/2 + 4/3 + 5/4 + 6/5

d. 1 1 2 2 ... i j i j i n n ja b a b a b

3. Prove by writing out in full

a.10 3 3 3

1[( 1) ] 11 1

k

k k b.5 5

1 1 i j i jj j

x x x x

c.3 2 2 3

1 1 1 1 i j i ji j j i

x x x x d. 3 2 3 2

1 1 1 1 i j i ji j i j

x x x x

e.3 3

1 1 i ii i

x x x x where31

3 1 ii

x x

4. Prove using the rules of summation:

a. 2

1 1 1 n i n

i j ii i b. 2

1 1 n n n

i j i ij i

5. Prove all the equality relations in the following:

2 2 2

1 1 1

( ) ( )

n n n

i i i ii i i

x x x x x x nx

6. Show that1 1

( )( )

n n

i i i ii i

x x y y x y n x y .


3-7

7. Evaluate by first simplifying, then applying the appropriate formulas

a.30

1( 2)

k

k k b.1

( 2)( 2)

n

kk k k

8. Let 11 2 10 3{ , ,..., } {1, 2, 6, , , , 0, 1, 2, 4.2 }x x x e . Verify using excel or

otherwise (calculator, manual computation, mental computation, as youplease) that

a.10

1( ) 0

ii

x x

b.10 102 2 2

1 1( ) 10

i ii i

x x x x

c.10 102

1 1( ) ( )

i i ii i

x x x x x

If in (i) you get an answer a little different from 0, explain why this occurs.

9. Express the following in the form 1 2 3 4 ax bx cx dx :

(i)4 2

1 1 ji ji x (ii)

4 4

1 ji j ii x

(i.e., you have to tell me what a , b , c , and d are in each case.)

10. Let 1 2{ , ,..., }nx x x be an arbitrary set of n real numbers, and let

11

nin i

x x

.

Prove that

1 1( )( 1) ( )( 10000)

n ni i i ii i

x x x x x x

.


4-1

Section 4 Matrix Algebra: Definitions and Basic Operations

Definitions

Analyzing economic models often involve working with large sets of linear

equations. Matrix algebra provides a set of tools for dealing with such objects.

A matrix is a rectangular collection of numbers

11 12 1

21 22 2

1 2

n

n

m m mn

a a a

a a a

a a a

A .

The number of rows m need not be equal to the number of columns n. A matrix with

m rows and n columns is said to have order (m,n) or dimension (m,n), or we simply

call it a ( )m n matrix. The number that appears in the (i, j)th position is called the (i,

j)th element or the (i, j)th entry of the matrix. If m n , the matrix is a square

matrix. If 1m and 1n , it is called a row vector. If 1m and 1n , we have a

column vector. If 1m n , then we have a scalar. The elements of a vector are

often called the components of the vector.

Example A row vector 1 2 nc c cc , a column vector

1

2

m

b

b

b

b .

By simply stating that b is a vector we will usually mean that b is a column vector,

but you need to be aware whether a row or column vector is being referred to.

Matrices are often written in unitalicized bold uppercase letters, vectors in

unitalicized bold lowercase letters.

It is often convenient to write a matrix as

( )ij m na A ,

and often convenient to refer to the (i, j)th element of A using

ij

A .

There is variation in notation from author to author, so be careful in your reading.

Two matrices of the same dimension m n are said to be equal if all of their

corresponding elements are equal, i.e.,

[ ] [ ] 1,2,..., , 1,2,..., .ij ij i m j n A B A B

Matrices of different dimensions cannot be equal.


4-2

Basic Operations (Addition, Scalar Multiplication, Subtraction, Transpose)

Addition Let ( )ij m na A and ( )ij m nb B be two arbitrary ( )m n matrices.

Define

( )ij ij m na b A B ,

i.e., addition of matrices is defined to be element by element addition.

Example

1 4 6 9 1 6 4 9 7 13

3 2 1 2 3 1 2 2 4 4

6 5 1 10 6 1 5 10 7 15

Matrices being added together obviously must have the same dimensions. It should

also be obvious that

A B B A

( ) ( ) A B C A B C

This means that as far as addition is concerned, we can manipulate matrices in the

same way we manipulate ordinary numbers (as long as they have the same

dimensions)

Scalar Multiplication Let A be a ( )m n matrix, and be a scalar. Then define

( )ij m na A

i.e., the product of a scalar and a matrix is defined to be the multiplication of each

element of the matrix by the scalar.

Example

11 12 11 12

21 22 21 22

31 32 31 32

a a ba ba

b a a ba ba

a a ba ba

We can use scalar multiplication to define matrix subtraction. Let A and B be ( )m n

matrices. Then ( 1) A B A B .

Example

1 4 6 9 1 6 4 9 5 5

3 2 1 2 3 1 2 2 2 0

6 5 1 10 6 1 5 10 5 5


4-3

Transpose An important operator is the transpose. When we transpose a matrix,

we write its rows as its column, and its columns as its rows. For example, denoting

the transpose of A by A we have

1 41 3 6

3 24 2 5

6 5

Put more succinctly, [ ] [ ]ij ji A A . Note that transposes are often denoted using

A instead of A .

One application of the transpose operator is in defining symmetric matrices. A

symmetric matrix is defined as one where A A .

Exercises

1. Let

7 13

4 4

7 15

A . What is the dimension of A ? What is 12[ ]A ? What is 31[ ]A ?

2. Suppose 2 4( )ija A where ija i j . Write out the matrix in full.

3. Write out in full the matrix

(i) 4 4( )ija where 1ija when i j , 0 otherwise.

(ii) 4 4( )ija where 0ija if i j . (Fill in the rest of the entries ‘*’)

(iii) 5 5( )ija where 0ija when i j . (Fill in the rest of the entries with ‘*’)

(vi) 5 5( )ija where 0ija when i j . (Fill in the rest of the entries with ‘*’)

These are all square matrices. Matrices (i) and (ii) are called diagonal matrices.

Matrix (iii) is a “lower triangular matrix”, and (iv) is an “upper triangular

matrix” (so we have in (iii) and (iv) matrices that are square and triangular!)

4. Give an example of a (4 4) matrix such that [ ] [ ]ij jiA A .

5. If

2 1 3 1 1 3

9 0 4 9 0

3 4 7 3 4 7

u v

u v

, what is u and v ?


4-4

6. Let 1v , 2v , 3v , 4v represent cities, and suppose there are one-way flights from

1v to 2v and 3v , from 2v to 3v and 4v , and two-way flights between 1v and 4v .

Write out a matrix A such that [ ] 1ij A if there is a flight from iv to jv , and zero

otherwise.

7. What is the dimension of the matrix

1 8 3

9 1 9

0 0 0

?

8. Let 0 0 0

0 0 0

A and

0 0

0 0

0 0

B . Is A B ?

Matrices with all zero entries are called zero matrices, and written ,m n0 , or n0

if square, or simply 0 if the dimensions can be easily obtained from context.

9. If

1 3 9 4 13

3 2 1 1 4 3

2 5 1 7

u

v w

, what is u , v , and w ?

10. If

3 4

2 2 8

1 5

A , what is A ? If

3 4 6 41

1 8 2 52

1 4 3 1

B , what is B ?

11. Which of the following matrices are symmetric?

(a)

1 2 3 5

2 5 4

3 4 3 3

5 3 1

b

b

(b)

1 2 3 5

0 5 4

0 0 3 3

0 0 0 1

b

(c)

1 0 0 0

0 5 0 0

0 0 0 0

0 0 0 1

(d)

1 1 3 5

2 5 4

3 4 3 3

5 3 1

b

b

(e)

1 1

1 1

1 1

1 1


4-5

12. True or False?

(a) Symmetric matrices must be square.

(b) A scalar is symmetric.

(c) If A is symmetric, then A is symmetric.

(d) The sum of symmetric matrices is symmetric.

(e) If ( ) A A , then A is symmetric.

13. (a) Find A and B if they satisfy

1 2 1

24 3 0

A B

4 2 3

25 1 1

A B

simultaenously.

(b) If A B C and 3 2 A B 0 simultaneously, find A and B in

terms of C .


5-1

Section 5 Matrix Algebra: Multiplication

Let A be an ( )m n matrix and B be ( )n p . These dimensions can be arbitrary, butthe number of columns of A and the number of rows of B must be the same. Theproduct AB is defined as the ( )m p matrix whose ( , )thi j element is defined by

1

nik k ji j k

a b

AB .

That is, the ( , )thi j element of the product AB is defined as the sum of the product ofthe elements of the thi row of A with the corresponding elements in the thj columnof B . For example, the (1,1)th element of AB is

1 1 11 11 12 21 13 31 1 11,1 1...

nk k n nk

a b a b a b a b a b

AB

The (2,3)th element is

2 3 21 13 22 23 23 33 2 32,3 1...

nk k n nk

a b a b a b a b a b

AB

Visually, for a product of a (3 3) matrix into a (3 2) matrix, we have

11 12 13 11 11 12 21 13 3111 12

21 22 23 21 22

31 32 33 31 32

a a a a b a b a bb b

a a a b b

a a a b b

11 12 13 11 11 12 21 13 31 11 12 12 22 13 3211 12

21 22 23 21 22

31 32 33 31 32

a a a a b a b a b a b a b a bb b

a a a b b

a a a b b

11 12 13 11 11 12 21 13 31 11 12 12 22 13 3211 12

21 22 23 21 22 21 11 22 21 23 31

31 3231 32 33

a a a a b a b a b a b a b a bb b

a a a b b a b a b a b

b ba a a

and so on...

Example Let

2 8

3 0

5 1

A and4 7

6 9

B . Then

2 8 (2)(4) (8)(6) (2)(7) (8)(9) 56 864 7

3 0 (3)(4) (0)(6) (3)(7) (0)(9) 12 216 9

5 1 (5)(4) (1)(6) (5)(7) (1)(9) 26 44

AB

Example Let6 5 1

1 0 4

A and

4 1

5 2

0 1

B . Then

4 16 5 1 (6)(4) (5)(5) ( 1)(0) (6)( 1) (5)(2) ( 1)(1) 49 3

5 21 0 4 (1)(4) (0)(5) (4)(0) (1)( 1) (0)(2) (4)(1) 4 3

0 1

AB


5-2

Example The simultaneous equations

1 2

1 2

2 4

2 2

x x

x x

can be written in matrix form by defining2 1

1 2

A , 1

2

x

x

x , and4

2

b . Then

we can write the system as

1

2

2 1 4

1 2 2

x

x

, or simply Ax b .

Exercises

Many of these exercises provide more than just practice with matrix multiplication;

many of the exercises illustrate properties of matrix multiplication which differ from

multiplication among ordinary numbers. As you do these exercises, ask yourself what

the lesson is.

1. Find AB when

(a)

0 2 0

3 0 4

2 3 0

A and

8 0

0 1

3 5

B (b)2 5 1

1 0 4

A and

4

5

1

B

2. Let

2 8

3 0

5 1

A ,2 0

3 8

B , and7 2

6 3

C

(i) Compute BC ; (ii) Compute CB ; (iii) Can BA be computed?

Remark: This exercise shows that for any two matrices A and B , AB BA . Wedistinguish between premultiplication and postmultiplication. In the product AB , wesay that B is premultiplied by A , or A is postmultiplied by B .

3. Let

1

3

2

2

d and 4 2 1 6f . Compute (i) fd (ii) df (iii) d d .


5-3

4. Show that for any vector

1

2

n

x

x

x

x

, the product 0 x x . When will 0 x x ?

5. (i) Compute2 4 2 4

1 2 1 2

(ii) Let1

11

b

b

A . Compute 21 1

1 11 1

b b

b b

A AA

Remark A matrix with all elements equal zero is called the zero matrix 0 .Obviously, A 0 A and A0 0A 0 . Note however that AB 0 does not implyA 0 or B 0 . In fact, the square of a non-zero matrix can be a zero matrix!

Matrix multiplication therefore does not behave like the usual multiplication ofnumbers: the order of multiplication is important, and AB 0 does not imply thateither A 0 or B 0 . Matrix multiplication, however, does follow associative anddistributive laws:

(AB)C = A(BC)

A(B + C) = AB + AC

(A + B)C = AC + BC

These are easy to prove.

6. Compute

(i)11 12

21 22

31 32

1 0 0

0 1 0

0 0 1

a a

a a

a a

(ii)11 12

21 22

31 32

1 0

0 1

a a

a a

a a

Remark The square matrix1 0 0

0 1 0

0 0 1

I

.


5-4

is called the identity matrix. It behaves like the number ‘1’ in regular multiplication:for any matrix A , AI IA A . To emphasize the dimension of an identity matrix,we sometimes write nI if it is ( )n n .

7. Compute

(i)11 11 12

22 21 22

33 31 32

0 0

0 0

0 0

b a a

b a a

b a a

(ii)11

11 12 1322

21 22 2333

0 0

0 0

0 0

ba a a

ba a a

b

Remark: Matrices B where 0i jB for all i j are called diagonal

matrices.

(iii)

11 12 131

21 22 232

31 32 333

41 42 43

a a ab

a a ab

a a ab

a a a

(iv) 11 12 13

21 22 231 2 3 4

31 32 33

41 42 43

a a a

a a ab b b b

a a a

a a a

8. Show that

11 12 13 1311 121

21 22 23 2321 222 1 2 3

31 32 33 3331 323

41 42 43 4341 42

a a a aa ab

a a a aa ab b b b

a a a aa ab

a a a aa a

9. Show that

11 12 13

21 22 231 2 3 4

31 32 33

41 42 43

1 11 12 13 2 21 22 23 3 31 32 33 4 41 42 43

a a a

a a ab b b b

a a a

a a a

b a a a b a a a b a a a b a a a

10. Write the simultaneous equations

4 4

19 3 3

7 1

x z

x y z

x y

in matrix notation.


5-5

11. For 1 2 3

4 5 6

a a a

a a a

A and1 2 3

4 5 6

7 8 9

b b b

b b b

b b b

B , prove that ( ) AB B A by

multiplying out the matrices.

Remark: This result holds generally. For any ( )m n matrix A and any

( )n p matrix B , we have ( ) AB B A . The proof is not difficult, and

provides a very good exercise in using the notation we developed earlier. We

want to show that the ( , )thi j element of ( )AB is equal to the ( , )thi j

element of B A . By the definition of the transpose, ( , )thi j element of

( )AB is the ( , )thj i element of AB , therefore

1

1

1

[( ) ] [ ]

.

i j j i

n

j k k ik

n

k i j kk

n

ik k jk

i j

a b

b a

AB AB

B A

B A

12. Prove that ( ) ABC C B A .

13. Let X be a general ( )n k matrix. Explain why X X is square. Explain whyX X is symmetric.

14. Given a ( )n n square matrix A , define the trace of the matrix to be

1( )

niii

trace a

A

That is, the trace of a square matrix is simply the sum of its diagonal elements. Showthat

( ) ( ) ( )trace trace trace A B A B (of course, both matrices must be the same size)

( ) ( )trace trace A A

( ) trace( )trace AB BA (here A and B need not be of the same size).

Hint for the last one: look at the proof given in question 11 and adapt it.


6-1

Section 6 Introduction to the Inverse Matrix

The inverse of a square matrix A is the matrix, denoted by 1A , such that

1 A A I .

Example The inverse of the matrix2 1

1 2

A is

1 2 11

1 25

A ,

since

1 2 1 2 1 5 0 1 01 1

1 2 1 2 0 5 0 15 5

A A

One application of matrix inverses is in solving simultaneous equations. Take forexample

1 2

1 2

2 4

2 2

x x

x x

which can be written in matrix form as Ax b , where

2 1

1 2

A , 1

2

x

x

x , and4

2

b .

Since we know 1A , we can simply (pre)multiply Ax b with 1A to get

1 1 A Ax A b .

Since 1 A A I , and Ix x , we have 1x A b . This is the solution to the system.

1 2 1 4 21

1 2 2 05

A b

You can verify on your own that 1 2x and 2 0x solves the equations.

The formula for the inverse of an arbitrary (2 2) matrix

11 12

21 22

a a

a a

A

is


6-2

22 121

21 11

1

| |

a a

a a

AA

where 11 22 12 21| | a a a a A .

To show this, multiply the two together:

22 12 11 12 11 22 12 21 12 22 12 22

21 11 21 22 12 22 12 22 11 22 12 21

1 1

| | | |

| | 01

0 | || |

1 0

0 1

a a a a a a a a a a a a

a a a a a a a a a a a a

A A

A

AA.

The formula for the inverse of a (2 2) matrix is worth committing to memory.

The expression | |A is called the ‘determinant’ of A , something we will discuss indetail in later sections. Note that if | | 0A , then the inverse will not exist (we saythat A is ‘singular’). If A is singular, the system will not have a unique solution.

Exercises

1. Find the inverse of the following matrices

(i)1 2

2 1

(ii)1 3

2 4

(iii)1 2

2 4

(iv)1 0

2 0

(v)0 0

2 1

(vi)1 1

1 1

(vii)2 2

2 2

(viii)3 2

2 1

(ix)1 0

0 1

(x)7 1

2 4

(xi)2 0

0 2

(xii)0 2

2 0

2. Find the inverse of the matrix

0

0

a

d

.

Verify by directly multiplication.


6-3

3. Make a guess as to the inverse of the ( )n n matrix

11

22

33

0 0 0

0 0 0

0 0 0

0 0 0 nn

a

a

a

a

Verify your conjecture by directly multiplication.

4. Find the inverse of the matrix0

0

b

c

.

5. Solve the following systems of equations by computing the inverse of thecoefficient matrix:

(i) 1 2

1 2

2 4

2 2

x x

x x

(ii) 1 2

1 2

3 5 6

6 10 12

x x

x x

(iii) 1 2

1 2

3 5 6

6 10 10

x x

x x

(iv)2 4

2 2

y x a

y x

(v) 1 2

1 2

2 3 0

2 0

x x

x x

(vi) 1 2

1 2

3 5 0

6 10 0

x x

x x

6. Let1 2

2 4

A ,2 3

1.5 2

B , and1 1

2 3

C .

(i) Show that AB AC .

(ii) Show that 1A does not exist.

Remark The example in (i) shows that AB AC does not imply that B Cin general. However, if 1A exists, then it must be that B C , since

1 1 AB AC A AB A AC B C .

7. We defined the inverse matrix of A as the matrix 1A such that 1 A A I .

Show that this implies that 1 AA I . That is, it doesn’t matter whether you

premultiplying or postmultiplying A with 1A , you still get I as a result.

8. Leta b

c d

A . Find the inverse of A (assume , , ,a b c d are such that the

inverse exists), and show that 1 1( ) ( ) A A .


6-4

9. Let A be an ( )n n matrix whose inverse exists. Show that 1 1( ) ( ) A A .(You don’t need to know how to compute an ( )n n for this. Start with the fact

that 1 A A I and take transposes.

10. Let 11 12

21 22

a a

a a

A and 11 12

21 22

b b

b b

B , and assume that their inverses exist.

Show that1 1 1( ) AB B A .

11. Let A and B be ( )n n matrices whose inverses exist. Show that1 1 1( ) AB B A .

12. Let0 1 2

2 0 3

A and

1 2

2 0

1 1

B . Find the inverse of AB . Does the

relationship 1 1 1( ) AB B A hold for these two matrices? Why?

13. Is it true that 1 1 1( ) A B A B ? Give a counterexample.


7-1

Section 7 Finding an Inverse using Elementary Row Operations

The formula for the inverse of (3 3) and larger square matrices is much morecomplicated. We will see them in a later section. For now, we show a practical (buttedious) way to find the inverse of a matrix using “elementary row operations”. Theseoperations are exactly the steps used in the “elimination” process for solving systemsof equations.

The idea here is that the three elementary row operators:

Switching rows

Subtracting a constant times one row from another row

Multiplying an entire row by some number

can all be replicated by premultiplication by certain matrices. For example, to switchrows 1 and 2 in the matrix

0 2 4

3 1 2

6 2 1

A

we can premultiply by0 1 0

1 0 0

0 0 1

E

which is obtained by switching the 1st and 2nd rows of the (3 3) identity matrix. Youcan verify that premultiplying A by E switches rows 1 and 2 of A :

0 1 0 0 2 4 3 1 2

1 0 0 3 1 2 0 2 4

0 0 1 5 2 1 5 2 1

EA

Similarly, premultiplying a matrix by

(2) (3)

1 0 0

0 0 1

0 1 0

E ,

which is obtained by switching the 2nd and 3rd rows of the (3 3) identity matrix,switches the first and third rows of the matrix:

1 0 0 3 1 2 3 1 2

0 0 1 0 2 4 5 2 1

0 1 0 5 2 1 0 2 4

This generalizes to switching other pairs of rows, and to larger matrices.


7-2

Another type of elementary row operation is to add/subtract a constant times one rowto another row. For example, you can use the ‘3’ in row 1 as a pivot to eliminate the‘5’ below it using this operation.

e.g. row2 = row2 - 53 row1 71

3 3

3 1 2 3 1 2

5 2 1 0

0 2 4 0 2 4

and use the resulting 13 in row 2 as a pivot to eliminate the 2 in row 3:

e.g. row3 = row3 – 6 row2 7 71 13 3 3 3

3 1 2 3 1 2

0 0

0 2 4 0 0 18

These operations can also be done by premultiplying with appropriate matrices. Tosubtract 5

3 of row 1 from row 2, premultiply the matrix by

53

1 0 0

1 0

0 0 1

which is obtained by taking an identity matrix and subtracting 53 of row1 from row2.

Verify that

5 713 3 3

1 0 0 3 1 2 3 1 2

1 0 5 2 1 0

0 0 1 0 2 4 0 2 4

To subtract 6 row 2 from row 3, premultiply by

1 0 0

0 1 0

0 6 1

which is obtained by taking an identity matrix and subtracting 6 row1 from row2.Verify that

7 71 13 3 3 3

1 0 0 3 1 2 3 1 2

0 1 0 0 0

0 6 1 0 2 4 0 0 18

The pattern should be clear: to find the appropriate matrix for executing anyparticular row operation, take the identity matrix and apply that same operation to it.


7-3

To multiple row three by 118 , premultiply by

118

1 0 0

0 1 0

0 0

.

We have 7 71 13 3 3 3

118

1 0 0 3 1 2 3 1 2

0 1 0 0 0

0 0 0 0 18 0 0 1

.

If the inverse of a matrix A exists, then there is a series of elementary row operatorsthat reduces A to the identity matrix. Suppose the required elementary row operatorsare (in order) 1E , 2E , ..., nE , then

2 1...n E E E A I

which means that 12 1...n

E E E A . Furthermore, because 2 1 2 1... ...n nE E E I E E E ,we can use the following technique:

Write A and I side-by-side. Then apply the same row operators to both A and Iuntil A is reduced to the identity matrix. At the same time, the identity matrix will be“reduced” to the inverse matrix.

A I

1 1E A E I

2 1 2 1E E A E E I

1

2 1 2 1

reduced to

... ...n nI A

E E E A E E E I

Here is the fully worked out example for our example matrix:

0 2 4 1 0 0

3 1 2 0 1 0

6 2 1 0 0 1

switch rows 1 and 2

3 1 2 0 1 0

0 2 4 1 0 0

6 2 1 0 0 1

row 3 minus 2 row 1,

3 1 2 0 1 0

0 2 4 1 0 0

0 0 3 0 2 1


7-4

1 23 38 43 3

3 1 0 0

0 2 0 1

0 0 3 0 2 1

1 16 3

1 4 22 3 3

2 13 3

1 0 0 0

0 1 0

0 0 1 0

row 2 minus 43 row 3



12

8 43 3

3 0 0 1 0

0 2 0 1

0 0 3 0 2 1

13 row 112 row 213 row 3

The inverse of

0 2 4

3 1 2

6 2 1

A is therefore

1 1 16 3

1 4 22 3 3

2 13 3

0 2 4 0

3 1 2

6 2 1 0

.

You should verify this by computing

1 16 3

1 4 22 3 3

2 13 3

0 0 2 4

3 1 2

0 6 2 1

or

1 16 3

1 4 22 3 3

2 13 3

0 2 4 0

3 1 2

6 2 1 0

and checking to see if you get the identity matrix.

If we are using this technique to solve the equation Ax b , i.e. to compute1x A b : write down A and b side-by-side and apply the row operations to redure

A into the identity matrix. The rationale for this is the same as previously:

A b

1 1E A E b

2 1 2 1E E A E E b

1

2 1 2 1

reduced to

... ...n nI A b

E E E A E E E b


7-5

Exercises


1 1 3

2 1 3

2 2 1

A .

2. The matrix

1 1 3

2 1 3

1 2 6

A has no inverse. Apply elementary row operations

to A as though finding the inverse. What happens?

3. Write down the simultaneous equations

1 2 3

1 2 3

1 3 3

3 3 2 6

2 3 3

5 2 4

x x x

x x x

x x x

in the form Ax b . Solve this by

(i) finding the inverse of A and computing 1x A b ;

(ii) writing down A and b side-by-side:

3 3 2 6

2 1 3 3

1 5 2 4

and applying the necessary row operations to reduce the left side of thematrix to the identity matrix.


2 2 3 4

0 1 0 11

1 1 0 3

2 0 1 3

D .


8-1

Section 8 An Introduction to Determinants and Cramer’s Rule

In this section, you are introduced to a formula for solving systems of simultaneousequations, called Cramer’s Rule. We begin with solutions to systems with twoequations in two unknowns, and move our way up to the general n equations in nunknowns case. The formula in the general case is demonstrated here but not derived;the derivation will be given later. The key ingredient of the formula for solvingsystems of equations is the determinant of a matrix.

Determinants of (2 2) Matrices

A system of two simultaneous equations in two unknowns

11 1 12 2 1

21 1 22 2 2

a x a x b

a x a x b

can be written as Ax b , where

11 12

21 22

a a

a a

A , 1

2

x

x

x , and 1

2

b

b

b .

We have seen two (intimately related) ways to solve system of equations. One way isby elimination; another way is by computing the inverse of 1A and then finding

1x A b . The elimination method for computing the inverse of A clearly shows theconnection between the two methods. In this section, we consider yet another method,centered on the concept of a determinant (which as you will see later, is also closelyconnected to the inverse of A ).

By solving the system in the usual way, we can show that the general solution to thesystem is

1 22 2 121

11 22 21 12

b a b ax

a a a a

and 2 11 1 21

211 22 21 12

b a b ax

a a a a

.

Some observations:

- The denominators of both 1x and 2x are the same, and are made up from onlythe elements of the coefficient matrix A . We call the expression in thedenominator the ‘determinant’ of A , denoted by

11 22 21 12| | a a a a A

Determinants are also sometimes denoted as

det( )A , or 11 12

21 22

a a

a a.


8-2

- If the denominator is zero, the system will not have a unique solution, since wecannote divide by zero (the system will have either no solution, or an infinitenumber of solutions). This expression therefore determines whether or not thesystem of equations has a unique solution; the system will have a unique solutiononly if | | 0A .

- Observe that the numerator of 1x can be expressed as the determinant of the

matrix 1 12

2 22

b a

b a

, which is just the matrix A with its first column replaced by

1

2

b

b

. Likewise, the numerator of 2x is the determinant of the matrix 11 1

21 2

a b

a b

,

which is the matrix A with its second column replaced by 1

2

b

b

.

In other words, the solutions can be written in terms of determinants as

1 12

2 221

11 12

21 22

b a

b ax

a a

a a

and

11 1

21 22

11 12

21 22

a b

a bx

a a

a a

Why is this important? Because this pattern extends to larger systems of equations (itis called ‘Cramer’s Rule’). Furthermore, by using the properties of ‘Determinants’,we can (i) find ways of helping us compute solutions of larger systems more easily,and (ii) we can often say a lot regarding the characteristics of solutions of systems ofequations, without actually solving them. More on all this later. For now, memorizethe formula for the determinant of a (2 2) matrix.

When will the determinant of the coefficient matrix be zero? There are the obviouscases where all the elements one or more row or one or more column are zero:

21 22

0 00

a a , 11

21

00

0

a

a ,

0 00

0 0 .

It might be a good exercise for you to write out the systems of equations thatcorrespond to these coefficient matrices.


8-3

A less obvious case is when one row is a multiple of the other (or is the same as theother):

11 1211 12 11 12

11 12

0a a

ca a ca aca ca

;

Take for example the simultaneous equations systems

1 2

1 2

2 4 1

3 6 1.5

x x

x x

1 2

1 2

2 4 1

3 6 2

x x

x x

The coefficients in the second equation in both systems of equations are 1.5 times ofthe first equation. You can quickly calculate the determinant of the coefficient matrixto be zero. For the system of equations on the left, the constant on the right hand sideof the equality for the second equation is also 1.5 times that of the first. In this case,the two equations are the essentially the same: dividing the second equationthroughout by 1.5 gives the first. There is effectively only one equation here, in twounknowns, so there are infinitely many solutions.

The system on the right, on the other hand, obviously has no solution.

One special case should be considered separately. Consider the system

11 1 12 2

21 1 22 2

0

0

a x a x

a x a x

Such a system is called a homogenous system. If the determinant of the coefficientmatrix

11 12

21 22

a a

a a

A

is non-zero, then there is a unique solution 1 0x and 2 0x . This is sometimesreferred to as the trivial solution (these are just two lines that intersect at the origin).The question then is: when will this system of equations have a non-trivial solution?It should be straightforward to see that non-trivial solutions will exist only if thedeterminant of the coefficient matrix is zero. (Cramer’s rule will not give you thesolutions in this case, however – you get a “0/0” result).


8-4

Determinants of (3 3) Matrices

The general 3 equation 3 unknown system

11 1 12 2 13 3 1

21 1 22 2 23 3 2

31 1 32 2 33 3 3

a x a x a x b

a x a x a x b

a x a x a x b

takes a bit more work to solve, but with a little bit of patience you can show that thesolution is

1 22 33 12 23 3 13 2 32 13 22 3 1 23 32 12 2 331

11 22 33 12 23 31 13 21 32 13 22 31 11 23 32 12 21 33

b a a a a b a b a a a b b a a a b ax

a a a a a a a a a a a a a a a a a a

11 2 33 1 23 31 13 21 3 13 2 31 11 23 3 1 21 332

11 22 33 12 23 31 13 21 32 13 22 31 11 23 32 12 21 33

a b a b a a a a b a b a a a b b a ax


11 22 3 12 2 31 1 21 32 1 22 31 11 2 32 12 21 33

11 22 33 12 23 31 13 21 32 13 22 31 11 23 32 12 21 33

a a b a b a b a a b a a a b a a a bx


Obviously you wouldn’t want to memorize this, at least not in this form. However,observe again that the denominators of all three are the same, and composed only ofthe coefficients (the a ’s). Again, the system will have a unique solution only if theexpression in the denominators does not equal zero; if it equals zero, then the systemwill either have infinitely many solutions, or none. The expression in the denominatoragain determines whether or not the system will have a unique solution, so we willcollect the coefficients into the matrix

11 12 13

21 22 23

31 32 33

a a a

a a a

a a a

A

and define the determinant of this (3 3) matrix to be

11 22 33 12 23 31 13 21 32 13 22 31 11 23 32 12 21 33| | a a a a a a a a a a a a a a a a a a A .

This formula is also not easily memorized. There is a shortcut that is sometimesuseful: extending the matrix to include the first two columns.

11 12 13 11 12

21 22 23 21 22

31 32 33 31 32

a a a a a

a a a a a

a a a a a


8-5

The formula is then easily remembered in terms of the “diagonals” where the termsunder the solid arrows are added, whereas the terms under the dashed arrows aresubtract.Next, observe that the numerators of 1x , 2x , and 3x can then be written as

1 12 13

2 22 23

3 32 33

b a a

b a a

b a a

,11 1 13

21 2 23

31 3 33

a b a

a b a

a b a

, and11 12 1

21 22 2

31 32 3

a a b

a a b

a a b

respectively, i.e. we can write

1 12 13

2 22 23

3 32 331

11 12 13

21 22 23

31 32 33

b a a

b a a

b a ax

a a a

a a a

a a a

,

11 1 13

21 2 23

31 3 332

11 12 13

21 22 23

31 32 33

a b a

a b a

a b ax

a a a

a a a

a a a

,

11 12 1

21 22 2

31 32 33

11 12 13

21 22 23

31 32 33

a a b

a a b

a a bx

a a a

a a a

a a a

.

We will save ourselves some tedium by using ( )iA b to represent the matrix A with

the i th column replaced by1

2

3

b

b

b

b , thus writing

11

( )

| |x

A b

A, 2

2

( )

| |x

A b

A, and 3

3

( )

| |x

A b

A.

Example Use Cramer’s Rule to solve

2 7

3 2

3 3 2 0

y z

x y z

x y z

The coefficient matrix is

0 2 1

1 1 3

3 3 2

A

Using the ‘shortcut’ rule for computing determinants, we have


8-6

0 2 1 0 2

1 1 3 1 1

3 3 2 3 3

which gives | | 0 ( 18) ( 3) 3 0 4 28 A .

Also, 1

7 2 1

( ) 2 1 3

0 3 2

A b , and the determinant of this matrix is

7 2 1 7 2

2 1 3 2 1

0 3 2 0 3

which gives 1| ( ) | ( 14) 0 ( 6) 0 ( 63) 8 35 A b , so the solution for x is

35

28x

Similarly, the matrices 2 ( )A b and 3( )A b are

2

0 7 1

( ) 1 2 3

3 0 2

A b and 3

0 2 7

( ) 1 1 2

3 3 0

A b

with determinants 2| ( ) | 71A b and 3| ( ) | 54 A b , so the full solution is

35

28x ,

71

28y , and

54 27

28 14z .

The visual trick for computing (3 3) matrices does not extend to larger square

matrices. In the next section, we will look at a different formula for the determinant,

which will apply generally.


8-7

Exercises

1. Find the determinants of the following matrices

(i)1 2

2 1

(ii)1 3

2 4

(iii)1 2

2 4

(iv)1 0

2 0

(v)0 0

2 1

(vi)1 1

1 1

(vii)2 2

2 2

(viii)3 2

2 1

(ix)1 0

0 1

(x)7 1

2 4

2. Solve the following systems of equations using Cramer’s Rule

(i) 1 2

1 2

2 4

2 2

x x

x x

(ii) 1 2

1 2

3 5 6

6 10 12

x x

x x

(iii) 1 2

1 2

3 5 6

6 10 10

x x

x x

(iv)2 4

2 2

y x a

y x

(v) 1 2

1 2

2 3 0

2 0

x x

x x

(vi) 1 2

1 2

3 5 0

6 10 0

x x

x x

For the systems that do not have a unique solution, find out whether it has zero orinfinitely many solutions.

3. Solve, using Cramer’s rule, the following system of equations for C and Y (takeeverything else as fixed)

C a bY , 0 1b

Y C G .

4. Suppose we have the following demand and supply equation

0 1dQ P , 1 0

0 1 2sQ P R , 1 0 , 2 0

where dQ and sQ are the quantities demanded (by consumers) and supplied (byfirms). Suppose that in equilibrium, P is the market price of the good, and R israinfall. In equilibrium, we have

( )d sQ Q Q .

Using Cramer’s Rule, solve this system of equations for the equilibrium price Pand quantity Q , treating everything else as fixed. What happens to equilibriumprice and quantity when rainfall R increases? How would your answer comparewith the case when demand is completely inelastic (i.e. 1 0 )?


8-8

5. Find the determinants of the following matrices

(i)

4 0 1

19 1 3

7 1 0

(ii)

0 2 0

3 0 4

2 3 0

6. Use Cramer’s Rule to solve

4 4

19 3 3

7 1

x z

x y z

x y

7 Let 1 2

3 4

a a

a a

A and 1 2

3 4

b b

b b

B . Prove that | | | | | |AB A B

This is a very useful result, and holds for the general case, not only for (2 2)case: if A and B are ( )n n matrices, then | | | | | |AB A B . Note that it isessential, however, for both A and B to be square matrices of the sizedimension (why?)

8. Find the determinants of

(i)0

a b

d

(ii)0a

c d

(iii)0

0

a

b

9. Find the determinants of

(i) 0

0 0

a b c

e f

i

(ii)

0 0

0

a

d e

g h i

10. Let

1 2 3

2 1 5

0 3 1

A .

(a) Show that the third equation can be written as a linear combination of the firsttwo rows (i.e., you can find 1c and 2c such that

1 21 2 3 2 1 5 0 3 1c c .

(b) Show that det 0A .


9-1

Section 9 The Laplace Expansion

In the last section, we defined the determinant of a (3 3) matrix

11 12 13

21 22 23

31 32 33

a a a

a a a

a a a

A

to be 11 22 33 12 23 31 13 21 32 13 22 31 11 23 32 12 21 33| | a a a a a a a a a a a a a a a a a a A .

In this section, we introduce a general formula for computing determinants. Rewriting

11 22 33 12 23 31 13 21 32 13 22 31 11 23 32 12 21 33

11 22 33 11 23 32 12 23 31 12 21 33 13 21 32 13 22 31

11 22 33 23 32 12 23 31 21 33 13 21 32 22 31

11 22 33 23 32

| |

( ) ( ) ( )

( )



a a a a a a a a a a a a a a a

a a a a a

A

12 21 33 23 31 13 21 32 22 31( ) ( )a a a a a a a a a a

note that the terms outside the brackets are the terms along the first row of the matrix

11 12 13

21 22 23

31 32 33

a a a

a a a

a a a

The term in brackets associated with 11a is the determinant of the (2 2) matrix after

deleting the 1st row and 1

st column of A :

11

22 33 23 32( )

a

a a a a

12a 13a

21a 22 23

31

a a

a 32 33a a



nd column of A :

11

21 33 23 31( )

a

a a a a

12a 13a

21 22a a 23

31 32

a

a a 33a



nd column of A :

11

21 32 22 31( )

a

a a a a

12a 13a

21 22 23a a a

31 32 33a a a

.


9-2

There is the matter of the minus sign in front of 12a . This can be achieved by

multiplying into each term ( 1)i j . Therefore, we can write

11 12 13

21 22 23

31 32 33

a a a

a a a

a a a

11

1 111 ( 1)

a

a

12a 13a

21a 22 23

31

a a

a

11

1 212

32 33

(1,1)th Minor of

(1,1)th Cofactor of

( 1)

a

a

a a

A

A

12a 13a

21 22a a 23

31 32

a

a a

11

1 313

33

(1,2)th Minor of

(1,2)th Cofactor of

( 1)

a

a

a

A

A

12a 13a

21 22 23a a a

31 32 33a a a

(1,3)th Minor of

(1,3)th Cofactor of

A

A

Some names have been introduced in the formula above: the determinant of the

matrix after removing the thi row and thj column is called the ( , )thi j “minor” of

A ; including the sign ( 1)i j gives us the ( , )thi j “cofactor” of A .

Example The determinant of the matrix

0 2 1

1 1 3

3 3 2

A

is

1 1 1 2 1 3

1 3 1 3 1 1| | (0)( 1) (2)( 1) ( 1)( 1) 28

3 2 3 2 3 3

A ;

The cofactor expansion for computing determinants is not unique. For instance, we

could have written the original formula as

11 22 33 12 23 31 13 21 32 13 22 31 11 23 32 12 21 33

12 21 33 13 21 32 11 22 33 13 22 31 11 23 32 12 23 31

21 12 33 13 32 22 11 33 13 31 23 11 32 12 31

| |

( ) ( ) ( )




A

which can be written as


9-3

11 12 13

21 22 23

31 32 33

a a a

a a a

a a a

11

2 121 ( 1)

a

a

12 13

21

a a

a 22a 23a

31a

11 12

2 222

32 33

(2,1)th Minor of

(2,1)th Cofactor of

( 1)

a a

a

a a

A

A

13

21

a

a 22a 23a

31 32a a

11 12 13

2 323

33

(2,2)th Minor of

(2,2)th Cofactor of

( 1)

a a a

a

a

A

A

21a 22a 23a

31 32 33a a a

(2,3)th Minor of

(2,3)th Cofactor of

A

A

Alternatively, we could have expanded along a column:

11 22 33 12 23 31 13 21 32 13 22 31 11 23 32 12 21 33

12 23 31 12 21 33 11 22 33 13 22 31 13 21 32 11 23 32

12 21 33 23 31 22 11 33 13 31 32 11 23 13 21

| |

( ) ( ) ( )




A

in which case, we have

11 12 13

21 22 23

31 32 33

a a a

a a a

a a a

11

1 212 ( 1)

a

a

12a 13a

21 22a a 23

31 32

a

a a

11 12

2 222

33

(1,2)th Minor of

(1,2)th Cofactor of

( 1)

a a

a

a

A

A

13

21

a

a 22a 23a

31 32a a

11 12

3 232

33

(2,2)th Minor of

(2,2)th Cofactor of

( 1)

a a

a

a

A

A

13

21 22

a

a a 23

31

a

a 32a 33a

(3,2)th Minor of

(3,2)th Cofactor of

A

A

Note that we achieved the correct signs by multiplying the ( , )thi j minor with

( 1)i j .

Expanding along any row or column would in fact give us the same expression; we

can write

3

1( 1)i j

ij ijja M

A for any row i

or 3

1( 1)i j

ij ijia M

A for any column j .

where Mij is the ( , )thi j minor of A. This formula is known as the Laplace Expansion.


9-4

The fact that you can expand along any row or column can simplify computations

substantially if there is a row or column with many zeros.

Example Compute the determinant of the matrix

1 0 11

2 0 3

5 4 6

A

using the Laplace Expansion (i) expanding along the first row, (ii) expanding down

the second column.

(i) 1 1 1 2 1 3

1 0 110 3 2 3 2 0

2 0 3 (1)( 1) (0)( 1) (11)( 1) 12 0 88 764 6 5 6 5 4

5 4 6

(ii) 1 2

1 0 112 3

2 0 3 (0)( 1)5 6

5 4 6

2 2 1 11(0)( 1)

5 6

3 2 1 11(4)( 1) 76

2 3

(nn) Determinants

Cramer’s Rule and the Laplace Expansion extend to larger systems. The determinant

for a general ( )n n matrix

11 12 1

21 22 2

1 2

n

n

n n nn

a a a

a a a

a a a

A

is 1

( 1)n i j

ij ijja M

A for any row i

or 1

( 1)n i j

ij ijia M

A for any column j

where the ( , )thi j minor Mij of an nn matrix A is the determinant of the (n1)(n1)

matrix that remains after removing the ith row and jth column of A.


9-5

Example

The determinant of

2 2 3 4

0 1 0 11

1 1 0 3

2 0 1 3

D is

1 3 2 3

39 0

3 3 4 3

0 42

0 1 11 2 2 4

| | 3( 1) 1 1 3 0( 1) 1 1 3 ...

2 0 3 2 0 3

2 2 4 2 2 4

0( 1) 0 1 11 ( 1)( 1) 0 1 11 3

2 0 3 1 1 3

D

where we have expanded down the third column.

The Laplace expansion includes the (2 2) case. Define the determinant of a single

number as | |a a . Note that in this context the symbol does not refer to

absolute values, e.g. | 2 | = 2, not 2. Then taking, say, a first row expansion, we

have

2 1 1 1 1 21 1 11 11 12 12 11 22 12 211

( 1) ( 1) ( 1)jj jj

a M a M a M a a a a

A .

Cramer’s Rule also extends to general ( )n n systems of equations. The solution to

11 1 12 2 1 1

21 1 22 2 2 2

1 1 2 2

...

...

...

...

n n

n n

n n nn n n

a x a x a x b

a x a x a x b

a x a x a x b

is | ( ) |

| |

iix

A b

A , 1,...,i n ,

where

11 12 1

21 22 2

1 2

n

n

n n nn

a a a

a a a

a a a

A ,

1

2

n

b

b

b

b

and ( )iA b is the matrix A with its thi column replaced by b .


9-6

Exercises

1. Find the determinants of the following matrices using the Laplace expansion.

(i)

4 0 1

19 1 3

7 1 0

(ii)

0 2 0

3 0 4

2 3 0

(iii) 21 22 23

31 32 33

41 42 43

a a a

a a a

a a a

(vi) 11

21 22

31 32 33

0 0

0

a

a a

a a a

(v) 11 12 13

22 23

33

0

0 0

a a a

a a

a

(vi) 11

22

33

0 0

0 0

0 0

a

a

a

(vii) 23

32

41

0 0

0 0

0 0

a

a

a

(viii)

4 3 2

6 4.5 3

7 1 0

(ix)

4 3 0

9 1 3

7 1 0

2. Use Cramer’s Rule to solve

4 4

19 3 3

7 1

x z

x y z

x y

3. Solve the following system of equations

1 2 3 4

2 4

1 2 4

1 3 4

2 2 3 4 2

11 3

3 1

2 3 2

x x x x

x x

x x x

x x x

4. Find the determinants using the Laplace expansion

(i) 11 41

21 22

31 33

41 42 43 44

0 0

0 0

0 0

a a

a a

a a

a a a a

(ii) 11

21 22

31 32 33

41 42 43 44

0 0 0

0 0

0

a

a a

a a a

a a a a

(iii) 11 21 31 41

22 32 42

33 43

44

0

0 0

0 0 0

a a a a

a a a

a a

a

(iv) 11

22

33

44

0 0 0

0 0 0

0 0 0

0 0 0

a

a

a

a

(v)

1 0 0 0

0 1 0 0

0 0 1 0

0 0 0 1

(vi)

0 0 0 1

0 0 1 0

0 1 0 0

1 0 0 0


9-7

5. Let A be an arbitrary ( )n n matrix. Show that if we multiply every element of

a single row or column by c , then the determinant of the new matrix is | |c A . What

is the determinant of cA ?

6. Let E be the matrix obtained by switching the 1st and last rows of an ( )n n

identity matrix, i.e.

0 0 0 0 0 1

0 1 0 0 0 0

0 0 1 0 0 0

0 0 0 1 0 0

0 0 0 0 1 0

1 0 0 0 0 0

E

(a) Show using the Laplace expansion that det 1 E ;

(b) Does your answer depend on whether n is even or odd?

(c) Use your result to prove that the determinant of the matrix formed by switching

any two rows of the identity matrix is 1 .

7. (a) Let

1 0 0

0 1 0

0 1a

E

and suppose A is some (3 )n matrix. Describe the rows of the product EA in

terms of the rows of A ; What is the determinant of the matrix E ?

(b) Let E be a matrix that carries out the elementary row operation of subtracting a

multiple of one row from another row. What is the determinant of E ?


10-1

Section 10 Properties of Determinants

A very important result, stated here without proof, is that

| | | | | |AB A B

if A and B are square matrices that can be multiplied together.

We can use this result to relate elementary row operations to the determinants ofsquare matrices, generating several important properties of determinants. This in turnprovides a way to simplify the computation of determinants (including determinantsof larger matrices).

(1) If two rows of A are interchanged, the sign of the determinant changes.

(2) If all the elements of a single row or column of A are multiplied by , thenthe determinant of the new matrix is | | A .

(3) If a multiple of a row (column) is added to another row (column), thedeterminant remains unchanged.

We illustrate these using examples. It should be easy for you to generalize from theseexamples. Throughout, we use

3 3 2

2 1 3

1 5 2

A

as an example. You can verify that the determinant of this matrix is 24 .

Recall that switching two rows of the matrix:

e.g. switching rows 1 and 2:

3 3 2 2 1 3

2 1 3 3 3 2

1 5 2 1 5 2

is equivalent to premultiplying A with that matrix

(1) (2)

0 1 0

1 0 0

0 0 1

E

You can verify this:

(1) (2)

0 1 0 3 3 2 2 1 3

1 0 0 2 1 3 3 3 2

0 0 1 1 5 2 1 5 2

E A


10-2

The matrix (1) (2)E is obtained by switching the 1st and 2nd rows of the (3 3)

identity matrix. Using the Laplace expansion, you can easily show that

(1) (2)

0 1 0

1 0 0 1

0 0 1 E

This is true in general: the determinant of the identity matrix is one; when two rowsof the identity matrix are switched, the determinant of the new matrix becomes 1 .

This means that switching two rows of a matrix results in a switch in the sign of itsdeterminant.

(1) (2) (1) (2)| | | | | | | | E A E A A

In our example:

(1) (2) (1) (2)

2 1 3

3 3 2 ( 1)( 24) 24

1 5 2 E A E A

In general, switching two rows of a matrix switches the sign of its determinant.

The second type of row operator is multiplying a row of a matrix by some number:

e.g. multiply row 1 of A by 1/3:

233 3 2 1 1

2 1 3 2 1 3

1 5 2 1 5 2

This is equivalent to premultiplying the matrix with

13

13

(1)

0 0

0 1 0

0 0 1

E

You can easily verify that the determinant of this matrix is 13 . Therefore

1 13 3

23

1(1) (1) 3

1 1

2 1 3 ( 24) 8

1 5 2 E A E A

Multiplying one row of a matrix by multiplies the determinant by .


10-3

Finally, the third type of elementary row operator is to add/subtract a constant timesone row to another row:

e.g. subtract half of row 2 of A from row 3:

3 3 2 3 3 2

2 1 3 2 1 3

1 5 2 0 4.5 0.5

This is equivalent to premultiplying the matrix with

12(3) (2) (3)

12

1 0 0

0 1 0

0 1

E

obtained by subtracting half of the second row of the identity matrix from row 3 ofthe identity matrix. It is obvious that the determinant of this matrix is one. Therefore

1 12 2(3) (2) (3) (3) (2) (3)

3 3 2

2 1 3 (1) 24

0 4.5 0.5 E A E A A

Adding/subtracting a constant times one row of a matrix to another row of thematrix does not change its determinant.

Since the determinant of a triangular matrix is simply the product of the elements inthe diagonal, these results imply that to compute a determinant, we can reduce amatrix to a triangular matrix, compute the determinant of that, and reverse the effectsof the row operations used.

Row Operation gives Effect on det.

3 3 2 1 0 0

2 1 3 0 1 0

1 5 2 0 0 1

switch rows 1 and 3, 1 5 2 0 0 1

2 1 3 0 1 0

3 3 2 1 0 0

[ ( 1) ]

row 2 minus 2 times row 1, 1 5 2 0 0 1

0 9 1 0 1 2

3 3 2 1 0 0

[no change]

row 3 minus 3 times row 1, 1 5 2 0 0 1

0 9 1 0 1 2

0 12 4 1 0 3

[no change]


10-4

Row Operation gives Effect on det.

row 2 minus row 3, 1 5 2 0 0 1

0 3 3 1 1 1

0 12 4 1 0 3

[no change]

row 3 plus 4 times row 2,1 5 2 0 0 1

0 3 3 1 1 1

0 0 8 3 4 1

[no change]

Determinant of this last triangular matrix is (1 3 8 ) = 24 .

Looking back at the row operations used, we find we switch the sign once. Therefore,the determinant of the original matrix is

3 3 2

2 1 3 24

1 5 2

.

Exercises

1. The following matrices are derived from the matrix

3 3 2

2 1 3

1 5 2

using elementary row operations. Find their determinants.

(i)

1 5 2

2 1 3

3 3 2

(ii)

3 3 2

4 2 6

1 5 2

(iii)

5 13 6

2 1 3

1 5 2

2. It is true generally that | | | |A A . Verify this for (2 2) and (3 3) matrices.

3. Find the determinant of the matrix

2 2 3 4

0 1 1 11

1 1 7 3

2 0 1 3

D

by first reducing to a triangular matrix.


11-1

Section 11 Rank and Linear Dependence

In an early section, we learnt that the number of solutions in a system of linearequations depends on the number of “independent” equations in the system relative tothe number of unknowns. From Cramer’s Rule, we learnt that a system of n -linearequations in n -unknowns will have a unique solution only when the determinant ofthe coefficient matrix is not zero. When will the determinant of a matrix be zero? Thishas a lot to do with the linear dependence in the coefficient matrix.

The presence of any zero row or column will produce a zero determinant (this is easyto verify: just use the Laplace expansion along that zero row or column).

In the (2 2) case we obtained a zero determinant when one row was a multiple ofthe other (or is the same as the other), and this remains true of the (3 3) case. Amore subtle case is when one row is a linear combination of the other two (i.e., wecan write one row as the sum of some multiple of a second row plus some multiple ofthe third). For instance, take the matrix

1 2 3

4 2 4

3 3 5

Α

which you can verify has a zero determinant. Here the third row is equal to the firstrow plus half of the second row:

123 3 5 1 2 3 4 2 4 .

(Alternatively, we can say that the first row is the third row minus half of the second

121 2 3 3 3 5 4 2 4 .

or that the second row is two times the third row minus two times the first.)

We say that the three row vectors that make up the matrix are “linearly dependent”.This “linear dependence” between the rows leads to a zero determinant because itmeans that row operations can be applied to ultimately create a zero row.

A (3 3) matrix will have a non-zero determinant only if it has three linearlyindependent rows (or columns). More generally, think of a (3 3) square matrix

11 12 13

21 22 23

31 32 33

a a a

a a a

a a a

A

as a stack of three row vectors


11-2

1 11 12 13

2 21 22 23

3 31 32 33

a a a

a a a

a a a

a

A a

a

We say that three (column or row) vectors 1a , 2a , and 3a are linearly dependent if

there are constants 1c , 2c , and 3c , not all zero, such that

1 1 2 2 3 3c c c a a a 0 .

If the only case where this is true is 1 2 3 0c c c , then the vectors are said to belinearly independent. For now, define the rank of the matrix is the number of linearlyindependent row vectors that it contains.

Examples

1. Let

1 2 3

2 4 6

2 1 3

A . Let 1 1 2 3a , 2 2 4 6a , and 3 2 1 3a .

Then these vectors are linearly dependent, since 1 2 32 1. 0. 0 0 0 a a a .The second row is just twice that of the first row (or the first is half of thesecond). Removing one of the dependent vectors, say 2 2 4 6a , theremaining vectors 1a and 3a are linearly independent: the only constants 1c and

2c such that 1 21 2 3 2 1 3 0 0 0c c are 1 2 0c c . The matrixhas two linear independent vectors, and is said to be of rank 2.

If we imagine these three vectors in the usual representation as ‘arrows’ in three-dimensional co-ordinate space, we will see that the vector 2a is just an extension of

1a . Combinations of the three vectors are effectively combinations of only twovectors.

1 1 2 2 3 3 1 1 2 1 3 3 1 2 1 3 32 ( 2 )c c c c c c c c c a a a a a a a a

Different combinations of the three vectors will therefore result in new vectors that alllie on a plane (a two-dimensional space), and cannot ‘span’ or cover the entire 3-dspace.

2. Let

1 2 3

3 3 6

2 1 3

A . Let 1 1 2 3a , 2 3 3 6a , and 3 2 1 3a .

Then these vectors are linearly dependent, since 1 2 31 1. 1. 0 0 0 a a a .The second row “depends” on the other two in that it is the sum of 1a and 3a .


11-3

(Equivalently, 3a “depends on” 1a and 2a because it is 2 1a a ). Removing oneof these ‘dependent’ vectors (any one of them) leaves two independent vectors.For instance, removing 3a , then the only constants 1c and 2c such that

1 1 2 2 1 21 2 3 3 3 6 0 0 0c c c c a a

are 1 2 0c c . The matrix has two linear independent vectors, and is said to beof rank 2.

If we imagine these three vectors geometrically in three-dimensional co-ordinatespace, we will see that although no one vector is an extension of another, all threevectors nonetheless lie in a single plane, so again combinations of the three vectorswill ‘create’ new vectors that also lie on that plane. The three vectors cannot ‘span’,or cover, the entire 3-d space.

3. Let

1 2 3

2 4 6

3 6 9

A . Let 1 1 2 3a , 2 2 4 6a , and 3 3 6 9a .

Then these vectors are linearly dependent, since 1 2 31 1. 1. 0 0 0 a a a . In

this case, there is only one linearly independent vector. If we remove 3a , we find

that the remaining two are still linearly dependent: 2a is twice that of 1a . Therank of this matrix is one: ( ) 1rank A .

Geometrically, all three vectors lie on a single line. The vector 2a is twice that of 1a ,

and 3a is three times that of 1a . Combinations of these vectors therefore will only

create new vectors that are also extensions of 1a .

1 1 2 2 3 3 1 1 2 1 3 1 1 2 3 12 3 ( 2 3 )c c c c c c c c c a a a a a a a .

4. Let

0 0 0

2 4 6

2 1 3

A . Let 1 0 0 0a , 2 2 4 6a , and 3 2 1 3a .

Then these vectors are linearly dependent, since 1 1 2 3. 0. 0. 0 0 0c a a afor any 1c . Removing vector 1a leaves two linearly independent vectors. Therank of this matrix is two.


11-4

5. Let

1 0 0

0 2 4

0 2 1

A . Let 1 1 0 0a , 2 0 2 4a , and 3 0 2 1a .

Then all three row vectors are linearly independent; it is impossible to find 1c , 2cand 3c , not all zero, such that 1 1 2 2 3 3 0 0 0c c c a a a . This matrix is of“full rank”.

A square matrix will have a non-zero determinant if and only if (‘iff’) it has full rank.

There are many ways to determine the rank of a matrix. One way is to do ‘GaussianElimation” (the row operations) and see how many non-zero pivots you get (seeSection 2 for ‘pivots’). Other ways will be discussed in later sections, or in moreadvanced classes. For the time being we will proceed by observation.

We conclude this section with several remarks regarding rank. Earlier, we viewed amatrix as a stack of row vectors. The rank concept that we discussed could be called‘row rank’. But we can also view a matrix as the concatenation (joining together) ofthree column vectors

11 12 13

1 2 3 21 22 23

31 23 33

a a a

a a a

a a a

A a a a

1. Everything that we said here follows for column vectors as for row vectors, andin fact a matrix will have the same number of linearly independent column vectors asthere are linearly independent row vectors: if A has two linearly independent rowvectors, it will also have two linearly independent column vectors; if it has only onelinearly independent row vector, then it will one have one linearly independentcolumn vector, and so on. A matrix’s row rank is the same as its “column rank”.

2. We can also speak of the rank of non-square matrices. Take for example

2 4 6

2 1 3

A

This is in fact the last two rows of an earlier example we were looking at. Looking atthe rows, we see that these two rows are linearly independent (verify!) The ‘row-rank’ is two. What if we view this matrix as a concatenation of three two-dimensionalvectors? Note that three two-dimensional vectors can only span a two-dimensional


11-5

space. Draw three two dimensional vectors on a “ x - y ” co-ordinate space on a pieceof paper. Can you generate a three-dimensional space from this? (The answer is no.)

If we observe the three vectors

2

2

4

1

6

3

we observe that these do span the entire two-dimensional plane. So the column rankof this matrix is also 2. The principle that a matrix’s row and column ranks are thesame continues to hold, and holds generally.

As another illustration, consider

2 4 6

1 2 3

A

It is easy to see that the row rank of this matrix is one, and the column rank of thismatrix is also one.

Because the row and column ranks are always the same, we can always speakunambiguously of “the rank of a matrix”. Furthermore,

( ) min(#rows, #columns)rank A

3. Finally, we often need to find the rank of a product. Here are two very usefulresults:

3a If A is ( )m n and B is ( )n p , then

( ) min( ( ), ( ))rank rank rankAB A B

Proof Let C AB . Every column of C is some linear combination of thecolumns of A , therefore the columns in C cannot span a space of dimensiongreater than the column rank of A . Similarly, every row of C is some linearcombination of the rows of B , therefore the rows in C cannot span a space ofdimension greater than the row rank of B . Since row rank and column rank arethe same, we have our result.

3b If A is ( )m n and B is ( )n n and full rank, then

( ) ( )rank rankAB A


11-6

Proof If min( , )x a b , then obviously x a and x b are both true. Since

( ) min( ( ), ( ))rank rank rankAB A B .

we have1 1

1

( ) ( ) [ exists since is full rank]

min( ( ), )

( )

min( ( ), ( ))

( )

rank rank

rank

rank

rank rank

rank

A ABB B B

AB B

AB

A B

A

This extends to the product CAB where both C and B are full rank:

( ) ( )rank rankCAB A .

Exercises

1. Find the determinants of the following matrices using the Laplace expansion. Ifyou find the determinant of a matrix to be zero, find out how many linearindependent rows the matrix has.

(i)4 0 1

19 1 3

7 1 0

(ii)4 3 0

19 1 0

7 1 0

(iii)4 3 0

1 1 2

7 6 6

(vi)1 3 1

2 6 2

4 12 4

2. Show using the formula

11 12 13

21 22 23 11 22 33 12 23 31 13 21 32 13 22 31 11 23 32 12 21 33

31 32 33

a a a

a a a a a a a a a a a a a a a a a a a a a

a a a

that the determinant of a (3 3) matrix will be zero if

(i) one row is a multiple of another;

(ii) one row is a linear combination of the other two.

3. Show that T( ) ( )rank rankA A A .


12-1

Section 12 A Formula for the Inverse

In this chapter, we develop a formula for the inverse of an ( )n n matrix, based oncofactors. We will not be using this formula for computing inverses – for that theelementary row operations approach is the most efficient. The objective in studyingthe formula for the inverse is, for us, to understand where Cramer’s Rule comes from.

Recall that the ( , )thi j “minor” of an ( )n n matrix A is the determinant of the( 1) ( 1)n n matrix after removing the thi row and thj column of A . We willsometimes refer to this as the minor associated with the ( , )i j th element of A . The( , )i j th times ( 1)i j gives us the ( , )thi j “cofactor” of A , or the cofactor associatedwith the ( , )i j th element of A .

For example, let

3 5 6 7

5 4 7 3

1 2 9 10

2 8 2 1

A .

The (2,1) th minor is 21

5 6 7

2 9 10

8 2 1

M .

The (2,1) th cofactor is 2 121

5 6 7

( 1) 2 9 10

8 2 1

C .


3 5 7

5 4 3

2 8 1

M .

The (3,3) th cofactor is 3 333

3 5 7

( 1) 5 4 3

2 8 1

C .


5 4 7

1 2 9

2 8 2

M .

The (1,4) cofactor is 1 414

5 4 7

( 1) 1 2 9

2 8 2

C .

3

A

5 6 7

5 4 7 3

1 2 9 10

2 8 2 1

3 5 6

A

7

5 4 7 3

1 2 9 10

2 8 2 1

3

A

5 6 7

5 4 7 3

1 2 9 10

2 8 2 1


12-2

We can collect all the cofactors of a matrix -- one cofactor for each element of thematrix – into a single cofactor matrix. For example, the cofactor matrix of A ,denoted ( )C A , is the (4 4) matrix

11 12 13 14 11 12 13 14

21 22 23 24 21 22 23 24

31 32 33 34 31 32 33 34

41 42 43 44 41 42 43 44

( )

4 7 3 5 7 3 5 4 3 5 4 7

2 9 10 1 9 10 1 2 10 1 2 9

8 2 1 2 2 1 2 8 1 2 8 2

5 6 7 3 6 7

2 9 10 1 9

8 2 1

C C C C M M M M

C C C C M M M MC

C C C C M M M M

C C C C M M M M

A

3 5 7 3 5 6

10 1 2 10 1 2 9

2 2 1 2 8 1 2 8 2

5 6 7 3 6 7 3 5 7 3 5 6

4 7 3 5 7 3 5 4 3 5 4 7

8 2 1 2 2 1 2 8 1 2 8 2

5 6 7 3 6 7 3 5 7 3 5 6

4 7 3 5 7 3 5 4 3 5 4 7

2 9 10 1 9 10 1 2 10 1 2 9

Obviously, computing the cofactor matrix of a large matrix by hand is a tediousaffair. Fortunately we won’t have to compute these matrices explicitly, but it isnonetheless useful in helping us to derive useful results.

Exercise

1. Compute the cofactor matrix of the matrix

3 1 2

5 2 0

4 6 9

A .

2. Compute the cofactor matrix of the matrix

3 1 2 4 2

5 2 0 9 5

4 6 9 0 8

10 3 2 7 1

0 2 4 1 11

A ,

leaving each element as a determinant of a (4 4) matrix – don’t compute them!


12-3

A Property of Cofactors

Recall the Laplace expansion for the determinant of a matrix:

1

nij iji

a C

A for any column j

where we have given only the expansion along a column (all of the followingstatements works if we replace ‘column’ by ‘row). That is, the determinant can becomputed as the sum of the product of the cofactors of a column with thecorresponding elements of that column. What happens if we were to take the sum ofthe product of the cofactors of a column with the corresponding elements of adifferent column? To take a specific example, consider a (4 4) matrix A and itscorresponding cofactor matrix ( )C A , and take an expansion along the second column

1311 12 14

2321 22 24

3331 32 34

4341 42 44

aa a a

aa a a

aa a a

aa a a

A ,

1311 12 14

2321 22 24

3331 32 34

4341 42 44

( )

CC C C

CC C CC

CC C C

CC C C

A

The expansion 12 12 22 22 32 32 42 42a C a C a C a C gives the determinant of A . Whathappens if we take, for example, the sum of the column 2 cofactors multiplied by thecolumn 1 elements:

11 12 21 22 31 32 41 42a C a C a C a C ?

To answer this question, take the following matrix A and the corresponding cofactormatrix

1311 11 14

2321 21 24

3331 31 34

4341 41 44

aa a a

aa a a

aa a a

aa a a

A and

11 12 13 14

21 22 23 24

31 32 33 34

41 42 43 44

( )

C C C C

C C C CC

C C C C

C C C C

A

and make the following observations:

(i) the determinant of A can be computed as

12 22 32 4211 21 31 41| | a C a C a C a C A ,

where we have expanded along the second column;

(ii) the cofactors associated with the second column of A are identical to thecofactors associated with the second column of A :

1212C C , 2222C C ,

3232C C ,

4242C C , since the second column is removed when computingthese cofactors. Therefore 11 12 21 22 31 32 41 42| | a C a C a C a C A .


12-4

(iii) The determinant of A is zero, because it has two identical rows.

Therefore

11 12 21 22 31 32 41 42 0a C a C a C a C .

In general

The sum of the product of the cofactors of one column and theelements of another column is zero.

Exercises

1. (a) Write down the three cofactors 13C , 23C , and 33C corresponding tothe three elements in the third column of the matrix

3 1 2

5 2 0

4 6 9

A .

and compute the determinant of A by expanding down the third column.

(b) Now find the sum of the products of the same three cofactors and thecorresponding elements of a different column, e.g compute

13 23 333 5 4C C C

and 13 23 331 2 6C C C

2. For the matrix

a b

c d

A ,

write down the cofactors 21C and 22C corresponding to the elements

of the second row. Find 21 22cC dC and 21 22aC bC .


12-5

A Formula for the Inverse, and Cramer’s Rule

The results from the preceding section can be used to develop a formula for theinverse. Suppose we premultiply A by the transpose of the cofactor matrix of A .What we get is

11 21 31 41 1311 12 14

12 22 32 42 2321 22 24T

13 23 33 43 3331 32 34

14 24 34 44 4341 42 44

| | 0 0 0

0 | | 0 0( )

0 0 | | 0

0 0 0 | |

C C C C aa a a

C C C C aa a aC

C C C C aa a a

C C C C aa a a

A

AA A

A

A

Each diagonal element in the right-most matrix is the sum of the product of thecofactors of a column of A and the elements of that column, and is therefore equal tothe determinant. One example is marked out. (To reiterate: the cofactor matrix herehas been transposed.) The other elements are the sum of the product of the cofactorsof a column of A and the elements of a different column of A , therefore is zero.Multiplying both sides by the reciprocal of the determinant, we get

T1( )

| |C A A I

A

and therefore:

11 21 31 41

12 22 32 421 T

13 23 33 43

14 24 34 44

1 1( )

| | | |

C C C C

C C C CC

C C C C

C C C C

A AA A

.

The transpose of the cofactor matrix is called the adjoint of A , ( )adj A . so the

formula for the inverse is often written as

1 1.( )

| |adj A A

A,

which is called the adjoint formula for the inverse.

Now we prove Cramer’s Rule. Take the general n equations in n unknowns systemof simultaneous equations.

11 1 12 2 1 1

21 1 22 2 2 2

1 1 2 2

...

...

...

...

n n

n n

n n nn n n

a x a x a x b

a x a x a x b

a x a x a x b


12-6

and write it as Ax b

where

11 12 1

21 22 2

1 2

n

n

n n nn

a a a

a a a

a a a

A

,

1

2

n

x

x

x

x

, and

1

2

n

b

b

b

b

.

The solution is, using the inverse 1x A b . Writing this using the adjoint formula:

11 21 11 1

12 22 22 21

1 2

1

| |

n

n

nn n n nn

C C Cx b

C C Cx b

bx C C C

A bA

Take as a specific example the solution for 2x . We have

1 12 2 22 22 | |

n nb C b C b Cx

A

Finally, observe that the numerator of this expression is the determinant of the matrix

11 1 1

21 2 22

1

( )

n

n

n n nn

a b a

a b a

a b a

A b

(recall again that in computing the cofactors of the column, that column is deleted.)

This same argument holds for any ix , therefore we get Cramer’s Rule:

| ( ) |

| |i

ix A b

A, 1,...,i n .

In other words, solving a system by Cramer’s Rule is exactly equivalent to solvingthe system by using the inverse. Cramer’s Rule is simply a shortcut for implementingthe inverse matrix approach.


12-7

Exercises

1. Under what condition will a system of n equations in n unknowns have aunique solution?

2. Earlier you computed the cofactor matrix and determinant of the matrix

3 1 2

5 2 0

4 6 9

A .

Use the adjoint formula to compute the inverse of A . Verify your answer by

computing the product 1A A .

3. Use the adjoint formula to compute the inverse of the matrix

a b

c d

A .

4. Find the inverse of matrix

2 0 0 0

0 3 0 0

0 0 6 0

0 0 0 5

A .

Can you generalize to arbitrary diagonal matrices of dimension ( )n n ?


12-8

Properties of the Inverse

We summarize here a few properties of matrix inverses that will be useful for us. Youwill notice from the proofs that these properties arise from the nature of what aninverse is, and that we do not have to appeal to the adjoint formula to obtain theresults. That is, all the proofs arise from the fact that by definition, the inverse (whereit exists) of a square matrix A is the square matrix 1A of similar size, such that

1 A A I .

(1) 1 1 1( ) AB B A when the inverses exist.

Premultiplying 1 1 B A by AB gives the identity matrix:

1 1 1 1 1

I

AB B A AB B A AA I .

But 1 1 AB B A I says that 1 1 B A is the inverse of AB .

(2) 1 T T 1( ) ( ) A A

This says that the inverse of a transpose is the transpose of the inverse. It doesn’tmatter whether you transpose before computing the inverse, or transpose theinverse. We can take 1 A A I as the starting point. Taking transpose, andnoting that the transpose of the identity matrix is itself, we have

T1 T 1 T( ) A A I A A I .

Now premultiply both sides by T 1( )A , we get

T 1 T 1 T T 1( ) ( ) ( )

I

A A A A I

which gives us 1 T T 1( ) ( ) A A . A corollary is that the inverse of a symmetric

matrix is symmetric: if T A A , then 1 T T 1 1( ) ( ) A A A .

(3) 1 1| |

| | A

A.

This arises from the fact that the determinant of a product is the product of the

determinints: 1| | | | | | A A I .


12-9

Exercises

1(a) Let1 1

3 1

A and5 2

5 1

B .

Find 1( )AB and verify that 1 1 1( ) AB B A .

(b) Let1 1 2

3 1 4

A and

5 2

5 1

2 1

B .

Find 1( )AB . Why would it inappropriate to write 1 1 1( ) AB B A ?

Note that an alternative notation for transpose is the ‘prime’, i.e., T X X . It isimportant to be comfortable with both notations. In the following questions, we willuse the ‘prime’ notation. Thereafter, we will switch between the two, depending onwhich looks better.

2. Let X be a matrix of dimension ( )n k such 1( )X X exists. We want to showthat 1 1( ) ( ) X X X X . Which of the following arguments are correct?

A. X X is symmetric ( ) X X X X X X . We know that the inverse ofa symmetric matrix is symmetric, therefore

1 1( ) ( ) X X X X .

B. 1 1 11 1 1 1 1( ) ( ) X X X X X X X X X X

C. 11 1 1( ) ( ) ( ) ( ) X X X X X X X X

(a) A only (b) B and C only (c) All three (d) A and C only.

Explain your choice.

3. Let

2 4 1 2

0 3 2 1

0 0 8 2

0 0 0 2

A . Find 1| |A .

Matrix Algebra NotesAnthony Tay 13-1

Section 13 Eigenvalues and Eigenvectors

Eigenvalues are numbers associated with matrices that are useful in manyapplications, including dynamic problems involving differential or differenceequations. Eigenvalues (and eigenvectors) are also intimately connected to othermatrix concepts such as the determinant, the rank, and definiteness.

13.1 Definitions

For any n n matrix A , a scalar is an eigenvalue of A if there isa nonzero vector x such that

Ax x (1.1)

The vector x is said to be an eigenvector of A associated with .

Example

The scalar 3 is an eigenvalue of1 2

3 0

A , and1

1

x is an eigenvector

associated with this eigenvalue, because and x satisfies equation (1.1):

1 2 1 13

3 0 1 1

.

The scalar 2 is also an eigenvalue of1 2

3 0

A , with associated eigenvector

23

1

x , because:

2 23 3

1 22

3 0 1 1

.

There are no other eigenvalues for this matrix.

Remark Geometrically, pre-multiplying a vector T1 2x xx by a matrix Achanges the direction of the vector. An eigenvector of A is a vector whose directionis not changed when pre-multiplied by A ; doing so merely scales the vector by afactor of .


To find the eigenvectors and eigenvalues of a n n matrix A :

We use the fact that

( ) Ax x Ax x 0 A I x 0 .

This system of equation has a non-trivial solution x 0 only if

det( ) 0 A I

(this is called the characteristic equation or characteristic polynomial of the matrixA ). Therefore, we can find the eigenvalues of A by finding all that satisfydet( ) 0 A I . Then, for each , we find the associated eigenvector x from theequation ( ) A I x 0 .

Example Find the eigenvalues of the matrix1 2

3 0

A .

The eigenvalues satisfy det( ) 0 A I . Since

2

1 2 1 0det( ) det

3 0 0 1

1 2det

3

(1 ) 6

6

( 3)( 2),

A I

the eigenvalues are 3 and 2 .

For 3 , the system ( ) A I x 0 is

1 1

2 2

1 2 2 2 0

3 3 3 0

x x

x x

which gives 1 2x x , i.e. any vector of the form1

1s

x is an eigenvector of A

associated with 3 .


Similarly, for 2 , the system ( ) A I x 0 is

1 1

2 2

1 2 3 2 0

3 3 2 0

x x

x x

which gives 21 23x x , i.e., any vector of the form

23

1s

x is an eigenvector of

A associated with 2 .

Remark Note that if T1 2x xx is an eigenvector associated with aneigenvalue , then the vector T1 2s sx sxx for any s is also an eigenvector of Aassociated with , since

( ) ( )s s Ax x A x x .

That is, for every eigenvalue, we have an entire ‘line’ of eigenvectors. For example,for the matrix

1 2

3 0

A ,

the vectors T1 1x , T2 2x , T3 3x , etc. are all eigenvectors associated

with the eigenvalue 3 .

Exercises

1. Find the eigenvalues and associated eigenvectors of the matrices

(a)5 2

4 1

;

Ans: Eigenvalues are 3 and 1 with eigenvectors1

1s

and1

2t

resp.

(b)1 1

4 1

(c)0.8 0.3

0.2 0.7

(d)1 2

2 4

(e)2 1

1 4

(f)

1 0 0

2 1 0

5 3 2


Exercise 1(d) shows that eigenvalues may take value zero. There is nothing unusualabout this, although it does say something important regarding the matrix (more onthat later).

In general, an ( )n n matrix will have n eigenvalues, but exercises 1(e) and (f)shows that these eigenvalues need not be distinct. The matrix in exercise 1(e) haseigenvalue 3 , occuring twice. In exercise 1(f), the eigenvalues are 1 , 1, and 2.We say that the eigenvalue 1 occurs with multiplicity 2.

There will usually be one ‘line’ of eigenvectors associated with each distincteigenvalue. In 1(f), there is only one line of eigenvectors

1

1s

x

associated with the eigenvalue 3 . In 1(e), the eigenvectors are

0

0

1

s

x (corresponding to 2 ) and

0

1

3

s

x corresponding to 1 .

However, there are some unusual cases, where multiple lines of eigenvectors areassociated with a single eigenvalue:

2. Find the eigenvalues of the matrix

2 0

0 2

A .

Show that any vector x 0 is an eigenvector associated with the eigenvalues. Doesthis make any geometric sense?

Note also that eigenvalues can take complex values!

3. Find the eigenvalues and associated eigenvectors of the matrix2 4

2 6

A .

You should find 4 2i . The eigenvectors can be expressed in many ways. One

possibility is1

1

is

and1

1

2

t i

respectively.


13.2 A detailed look at the (2× 2) case

We study the (2 2) case in detail. A few of the results obtained here apply only to

the (2 2) case, but many generalize to the ( )n n case (albeit with more

complicated proofs). Studying the (2 2) case will help us develop intuition for theseresults with a minimum of algebra.

Let 11 12

21 22

a a

a a

A . The characteristic polynomial is

11 12 211 22 12 21 11 22 11 22 12 21

21 22

( ) det ( )( ) ( ) ( )a a

a a a a a a a a a aa a

i.e., 2( ) ( ) det( )tr A A

Therefore the eigenvalues are

2

1,2

( ) ( ) 4det( )

2

tr tr

A A A

Useful results

1. The two eigenvalues of A are

real

identical

complex

if

2

2

2

( ) 4det( )

( ) 4det( )

( ) 4det( )

tr

tr

tr

A A

A A

A A

.

The eigenvalues of a (2 2) matrix can be identical only if they are real, sincecomplex roots of a polynomial always appear in conjugate pairs. However, matricesthat are (4 4) or larger can have repeated pairs of complex roots.

It should be clear from the characteristic polynomial that if A is triangular (ordiagonal), i.e. if 12 0a or 21 0a , then the eigenvalues of A are simply its diagonalelements:

1 11a and 2 22a .

Naturally, in this case the eigenvalues are real (we are only considering matrices ofreal numbers).


2. Another important case where the eigenvalues are guaranteed to be real iswhen the matrix is symmetric: the eigenvalues of a (2 2) symmetric matrix arealways real. If 12 21a a , then

2( ) 4det( )tr A A = 2 2 2 211 22 11 22 12 11 22 12( ) 4( ) ( ) 4 0a a a a a a a a

Note that the eigenvalues will also be distinct, unless 11 22a a and 12 21( ) 0a a .

3. The product of the eigenvalues of a (2 2) matrix gives its determinant. The sumof the eigenvalues gives the trace:

1 2det( ) A and 1 2( )tr A

In detail:

2 2

1 2

2 2

( ) ( ) 4det( ) ( ) ( ) 4det( )

2 2

1( ) ( ) 4det( )

4

det( )

tr tr tr tr

tr tr

A A A A A A

A A A

A

1 22 ( )

( )2

trtr

AA .

We will see that similar results hold for larger matrices. For the (2 2) case, oneconsequence of this result is that (when the eigenvalues are real):

both eigenvalues are positive det( ) 0A and ( ) 0tr A

both eigenvalues are negative det( ) 0A and ( ) 0tr A

the two eigenvalues have opposite signs det( ) 0A

4. If A is singular, i.e., det( ) 0A , then at least one of the eigenvalues are zero:

if det( ) 0A , then 1 ( )tr A , 2 0 ;

if ( ) 0tr A and det( ) 0A , then 1 2 0 .


5. If 1 and 2 are the eigenvalues of A , then the eigenvalues of 2 A AA are 21

and 22 . The associated eigenvectors remain the same.

We can see this without referring to the formula for the eigenvalues. An eigenvalue

1 and the associated eigenvector 1x satisfies

1 1 1Ax x

Pre-multiplying by A on both sides, we have

2 21 1 1 1 1 1 1 1( ) A x Ax x x .

This says that 21 is an eigenvalue of 2A , with associated eigenvector 1x . This result

obviously generalizes to arbitrary powers of A .

Note a very important special case when the matrix A is idempotent (that is, whenAA A ). In such cases, the eigenvalues can only take values 1 and 0. Since AA A ,

both AA and A have identical eigenvalues, i.e., 2i i , so 1 and 0 are the only

possible values.

6. If 1 and 2 are the eigenvalues of A , and A is invertible, then the eigenvaluesof 1A are 11/ and 21/ , with the same associated eigenvectors.

Starting with 1 1 1Ax x and pre-multiplying by 1A , we have

1 11 1 1 A Ax A x

But the LHS is 1x , therefore 11 1 1 A x x , from which we get

11 1 1(1 / ) A x x .

7. The eigenvalues of TA are the same as those of A .

That the eigenvalues of the transpose are the same as those of the original matrix iseasy to see – a matrix and its transpose share the same characteristic polynomial.

Diagonalization

The following important results come under the general topic of diagonalization ofmatrices:

8. Suppose that the eigenvalues of A are real and distinct, i.e. 1 2 . Then the

eigenvectors 1x and 2x are linearly independent, i.e.

1 1 2 2 1 2 0c c c c x x 0 .


Proof Starting with 1 1 2 2c c x x 0 , premultiply by A to get

1 1 2 2c c Ax Ax A0 0 .

Substituting 1 1 1Ax x and 2 2 2Ax x gives

1 1 1 2 2 2c c x x 0 .

Multiplying 1 1 2 2c c x x 0 throughout by 1 gives

1 1 1 2 1 2c c x x 0 .

Substituting 1 1 1 2 1 2c c x x into 1 1 1 2 2 2c c x x 0 gives

2 1 2 2 1 2c c x x 0 , or

2 2 1 2( )c x 0 .

Since 2 x 0 and 1 2 , we have 2 0c . A similar argument shows that 1 0c .

9. Suppose that the eigenvalues 1 and 2 of A are real and distinct, witheigenvectors 1x and 2x . Let

1 2P x x and 1

2

0

0

Λ .

Then1 P AP Λ , or equivalently 1A PΛP .

We say that A is diagonalizable.

We note first that 1P exists, since 1x and 2x are linearly independent. Theequations 1 1 1Ax x and 2 2 2Ax x can be put together into a single equationAP PΛ . The result follows.

Result (9) is extremely helpful when large powers of matrices have to be computed,because

1 1 1 1 1

1

1

...

...

n

n

A PΛP PΛP PΛP PΛP PΛP

PΛΛΛ ΛΛP

PΛ P

Furthermore, nΛ is computationally cheap because it is diagonal, so computing1n PΛ P tends to produce more accurate results than multiplying nA directly, as

relatively fewer computational steps are required. This result is also helpful forderiving theoretical results, e.g. for which matrices will n A 0 as n ? Ans:matrices whose eigenvalues are all less than one in absolute value.


10. Suppose A is symmetric (so we know its eigenvalues are real) with distincteigenvalues 1 2 . Then the eigenvectors 1x and 2x are orthogonal, i.e. 1 2 0 x x .

Proof

We have 1 1 1Ax x . Pre-multiplying the first by T2x gives T T

2 1 1 2 1x Ax x x . Takingtranspose gives T T T T T T

2 1 1 2 1 2 1 1 2( ) x Ax x A x x Ax x x , where we have used thefact that A is symmetric. We also have 2 2 2Ax x . Pre-multiplying by T

1x givesT T1 2 2 1 2x Ax x x . Therefore

T T T1 1 2 2 1 2 1 2 1 2( ) 0 x x x x x x .

Since 1 2 , we have T1 2 0x x . If the eigenvectors were chosen to have unit length,

then the eigenvectors are orthonormal:

T1 2 0x x and T

1 1 1x x .

If we pick the unit eigenvectors when constructing the matrix P in result (9), wewould have T P P I . In other words, we have 1 T P P . We say that the matrix P isorthonormal.

What about the case where the two eigenvalues are not distinct? For a (2 2)symmetric matrix A , 1 2 ( ) iff

0

0

A

which is already diagonal. (Or put differently, every vector x is an eigenvector, andwe are at liberty to pick a pair of orthonormal eigenvectors to construct the matrix P ,so we pick T1 1 0x and T2 0 1x , i.e., we pick P to be the identity matrix.)

We can therefore re-state result (10) as the next result:

11. Suppose A is symmetric (in which case we know its eigenvalues are real,though perhaps not distinct). Then there exists an orthonormal matrix P such that

1 P AP Λ , or equivalently 1A PΛP

where

1

2

0

0

Λ and 1 2P x x ,

and where 1x and 2x are eigenvectors with unit length associated with the

eigenvalues 1 and 2 respectively. This is the “Spectral Theorem for Symmetric

Matrices”, stated and proved here for (2 2) matrices. (The result applies to generalsymmetric matrices).


Exercises

1. The (non-symmetric) matrix5 2

4 1

A has (real, distinct) eigenvalues 1 3

and 2 1 .

(a) Verify that 1 2( )tr A and 1 2det( ) A ;

(b) Verify that the eigenvalues of 2A are 21 and 2

2 ;

(c) Verify that the eigenvalues of 1A are 11/ and 21/ ;

(d) Verify that the eigenvalues of A are the same as those of A . What are theassociated eigenvectors?

(e) Verify that the eigenvectors of A are linearly independent;

(f) Verify the diagonalization formula in result (10);

(g) Show that the eigenvectors are not orthogonal.

2. Find the eigenvalues and associated eigenvectors of1 2

2 4

A . Verify result

(11).

3. Eigenvectors are often normalized to length one. For instance, the eigenvector

1

1

x of the matrix1 2

3 0

A can be normalized to1/ 2

1/ 2

x so that

1/22 2| | (1 / 2) (1/ 2) 1 x .

Normalize the eigenvectors in Qn.1.


13.3 The general case

We take a brief look at the (3 3) case before moving to the general case. Let

11 12 13

21 22 23

31 32 33

a a a

a a a

a a a

A

Its characteristic equation is

22 23 21 23 21 2211 12 13

32 33 31 33 31 32

det( )

( )det det deta a a a a a

a a aa a a a a a

A I

(3.1)

where we have expanded the determinant along the first row. The eigenvalues of Aare the roots of this equation. As before, the roots may be real or complex, they mayall be distinct or there may be repeated roots. If a root is repeated once, we say it hasmultiplicity 2. In the special case where A is (upper or lower) triangular or diagonal,the characteristic equation is particular easy to compute, since the determinant of suchmatrices is simply the product of its diagonal. In such cases, the eigenvalues aresimply the diagonal elements of the matrix.

Without expanding the expansion any further, we note that the characteristic equationis an order 3 polynomial, which we can write as

3 23 2 1 0( ) b b b b (3.2)

The third power of appears only in the first term of the determinant expansion, and

has coefficient 3( 1) . An order three polynomial has three roots (the eigenvalues), sowe can write this equation as

31 2 3( ) ( 1) ( )( )( ) (3.3)

From (3.1), the determinant of A can be found by setting 0 . Doing so inequation (3.3) show that

1 2 3det( ) A

As in the (2 2) case, the det( ) 0A iff one or more of the eigenvalues are zero.


Observe that the second power of also appears only in the first term in theexpansion (3.1). Expanding the first term in (3.1) we have

11 22 33 23 32

11 22 33 11 23 32

( ) ( )( )

( )( )( ) ( )

a a a a a

a a a a a a

(3.4)

so the second power of in fact only appears in 11 22 33( )( )( )a a a .Expanding this expression further, you can easily verify that the coefficient on 2 is

11 22 33a a a , which is ( )tr A . Expanding (3.3), we see that the coefficient on 2there is 1 2 3 . Matching coefficients, we see that

1 2 3( )tr A .

All these results extend to the general ( )n n case: for an ( )n n matrix A we have

1 2 3det( ) ... n A and 1 2( ) ... ntr A .

The arguments for results (5)-(10) in Section 2 also carry over to the ( )n n casewith little or no amendment: if 1 is an eigenvalue of A , then 1 is also aneigenvalue of TA , 1

n is an eigenvalue of nA , and if A is invertible, then 11/ isan eigenvalue of 1A ; Eigenvectors associated with distinct eigenvalues are linearlyindependent, and A has n distinct eigenvalues 1 2, , ..., n , with associatedeigenvectors 1 2, , ..., nx x x , then it is diagonalizable: letting

1 2 nP x x x and

1

2

0 0

0 0

0 0 n

Λ

.

Then1 P AP Λ , or equivalently 1A PΛP .

The Spectral Theorem for Symmetric Matrices also continues to hold: if the( )n n matrix A is symmetric, then

(i) all its eigenvalues are real;

(ii) the eigenvectors the correspond to different eigenvalues are orthogonal,

(iii) there exists an orthonormal matrix P (i.e. 1 T P P ) comprising the uniteigenvectors such that

1 P AP Λ , or equivalently 1A PΛP


The proof in the case when the eigenvalues are distinct is a simply extrapolation ofthe argument following result (11). For the proof when there are repeated roots, seeStrang (2009) Section 6.4.

Exercises

1. Suppose Σ is symmetric, with non-zero eigenvalues. Use the spectral

theorem to define the matrix 1/2Σ such that 1/2 1/2Σ Σ Σ . Similarly, find 1/2Σ such

that 1 1/2 1/2 Σ Σ Σ .


Section 14 Applications of Eigenvalues

Eigenvalues and the Rank of a Matrix We often have to determine the rank of amatrix (the number of linear independent rows or columns contained in the matrix).The Spectral Theorem for symmetric matrices makes it very easy to do so. We usethe following result:

If A is ( )m n and B is ( )n n and full rank, then ( ) ( )rank rankAB A

which extends to the product CAB where both C and B are full rank:

( ) ( )rank rankCAB A .

This means that if A is a square diagonalizable matrix, then

1( ) ( ) ( )rank rank rank A PΛP Λ

The rank of Λ is simply the number of non-zero terms on the diagonal, i.e. thenumber of non-zero eigenvalues of A .

Furthermore, if A is symmetric and idempotent, then the eigenvalues of A take onvalues 1 or 0 only, i.e., the diagonal of Λ comprises only 1’s and 0’s. In this case,

( ) trace( )rank Λ Λ

(remember that the trace of a square matrix is just the sum of the elements on itsdiagonal.) Furthermore, because

1Λ P AP

we have1 1( ) ( ) ( ) ( )trace trace trace trace Λ P AP PP A A

where we have used the fact that ( ) ( )trace traceAB BA when both products exist.It is therefore very easy to find the rank of a symmetric idempotent matrix. Simplyadd up the elements in its diagonal – you don’t even have to find the eigenvalues!

The result1( ) ( ) ( )rank rank rank A P ΛP Λ

always works for symmetric matrices, since symmetric matrices are alwaysdiagonalizable. But this approach can also work with non-diagonalizable matrices,and even non-square matrices. Recall that for any matrix A ,

T( ) ( )rank rankA A A

Since TA A is symmetric, we can compute its rank, and therefore the rank of A , bycomputing and counting the number of non-zero eigenvalues possessed by TA A .


Exercises (incomplete)

1. Let X be an arbitrary ( )n k matrix such that 1( )X X exists. Because X Xis symmetric (why?), we know that 1( )X X is also symmetric (why?). Use this factto show that the matrix

( ) I X X X X

is symmetric and idempotent. Find its rank.


Positive Definiteness of Quadratic Forms A quadratic form is an expression ofthe form

TQ x Ax

When A is (2 2) , this expression is

11 12 1T 2 21 211 1 12 21 1 2 22 2

21 22 2( )

a a xx xa x a a x x a x

a a x

Q x Ax

For a general ( )n n matrix, we have

11 12 1 1

21 22 2 2T 1 2

1 1

1 2

nn n

nnij i j

i j

n n nn n

a a a x

a a a xx x xa x x

a a a x

Q x Ax

The following are examples of quadratic forms:

(i) 1 2 21 2

1 1 1 2 22

1 12

1 1

xx xq x x x x

x

(ii) 1 2 21 2

2 1 22

2 02

0 1

xx xq x x

x

(iii) 1 2 21 2

3 1 1 2 22

1 24 3

2 3

xx xq x x x x

x

(iv) 1 2 21 2

4 1 1 2 22

1 26 3

4 3

xx xq x x x x

x

The quadratic forms in (i), (ii) and (iii) involve symmetric matrices, whereas (iv) doesnot. Note, however, that for any quadratic form involving a non-symmetric matrix,there is an equivalent one using a symmetric matrix. In the (2 2) case, replace both

12a and 21a by 12 21( ) / 2a a . For example, corresponding to (iv) we have

1 2 21 24 1 1 2 2

2

1 36 3

3 3

xx xq x x x x

x

.

In other words, we can limit ourselves to studying quadratic forms involvingsymmetric matrices.

For larger ( )n n , matrices, replace A with T( ) / 2A A .


In applications, we often need to determine the sign of a quadratic form for arbitraryvalues of x 0 . For instance, the quadratic form 1q can be written as

2 2 21 1 1 2 2 1 22 ( )q x x x x x x

so we know that 1 0q no matter what values of 1x and 2x are chosen. We call suchquadratic forms positive semi-definite. The quadratic form 2q has a similar (butstronger) behavior. This quadratic form can be written as

2 22 1 22 0q x x

for all values of 1x and 2x , not both equal to zero at the same time. This quadraticform is said to be positive definite. The quadratic form 3q , however, is “indefinite”:writing

2 23 1 2 1 1 2 2( , ) 4 3q x x x x x x

we have 3( 2,1) 4 8 3 1q whereas 3(2,1) 4 8 3 15q . That is, thequadratic form is negative for some values of 1x and 2x , and positive for others.

A quadratic form TQ x Ax is said to be

-- positive definite if T 0x Ax for all x 0 ;

-- positive semi-definite if T 0x Ax for all x 0 ;

-- negative definite if T 0x Ax for all x 0 ;

-- negative semi-definite if T 0x Ax for all x 0 ;

Although definiteness pertain to the quadratic form TQ x Ax , we often apply theterm to the matrix A . One application of these concepts is in optimization theorywhere the second order condition often involves determining the definiteness of theHessian matrix. Another application is in ‘comparing matrices’.

One interesting fact about definite symmetric matrices is they can be factorized intothe product of a triangle matrix and its transpose. We state and explain this forpositive definite symmetric matrices:

Triangular Factorization Any positive definite symmetric( )n n matrix A has a unique representation of the form

TA LDL

where L is lower triangular with ones down the diagonal, and D isdiagonal with positive diagonal elements.


We demonstrate this for a positive definite symmetric (3 3) matrix

11 12 13

21 22 23

31 32 33

a a a

a a a

a a a

A .

Because A is positive definite, we have 11 0a (...pick T 1 0 0x ). We can

construct the elimination matrix

21

11

31

11

1

1 0 0

1 0

0 1

a

a

a

a

E

You can easily verify that

11T

1 1 22 23

32 33

0 0

0

0

a

b b

b b

E AE

where 1 1 11/ij ij i jb a a a a .

Note that if A is positive definite, then T1 1E AE must also be positive definite: define

y such that T1x E y which then allows us to write T T T

1 1 y E AE y x Ax . Because

1E is non-singular, x 0 y 0 . Therefore T 0x Ax for all x 0 implies

T T1 1 0y E AE y for all y 0 .

Because T1 1E AE is positive definite, 22 0b (... pick T [0 1 0]y ), so we can

define

32

22

2

1 0 0

0 1 0

0 1b

b

E

You can verify that

11T T

2 1 1 2 22

33

0 0

0 0

0 0

a

b

c

E E AE E

where2

32

2233 33

b

bc b .


Call the resulting diagonal matrix D . Because 1E and 2E are lower triangular withones down the diagonal, 2 1E E has the same structure. Furthermore, 1

2 1( )L E Eexists, and has the same structure. Therefore

T T T2 1 1 2 E E AE E D A LDL .

We can go one step further. Defining

11

22

33

0 0

0 0

0 0

a

b

c

D ,

we have T 1/2 1/2 T T A LDL LD D L CC where 1/2C LD is also lower triangular.This is the Cholesky factorization (or Cholesky decomposition) of A . Thesedecompositions are related to, but different from, the eigenvalue-eigenvectordecomposition discussed earlier.

These arguments readily extend to the general case.

Proving the indefiniteness of a matrix merely requires providing counter examples, aswe did for 3q above. To prove the definiteness of a matrix is much harder, becausewe have to show that the requisite sign holds for all x 0 . The objective here is todevelop criteria for evaluating the definiteness of a matrix. We provide two sets ofresults, one using principal minors, and the other using eigenvalues. It is worthwhilepreviewing the results with (2 2) matrices, before results for the general case aregiven.

For a given (2 2) symmetric matrix 11 12

12 22

a a

a a

A , let

T 2 21 2 11 1 12 1 2 22 2( , ) 2x x a x a x x a x Q x Ax ,

for arbitrary x 0 . Then

(i) 1 2( , )x xQ is positive semi-definite 11 0a , 22 0a , and 211 22 12 0a a a ;

(ii) 1 2( , )x xQ is positive definite 11 0a and 211 22 12 0a a a ;

(iii) 1 2( , )x xQ is negative semi-definite 11 0a , 22 0a , and 211 22 12 0a a a ;

(iv) 1 2( , )x xQ is negative definite 11 0a and 211 22 12 0a a a .


Proof:

(i) Suppose 11 0a , 22 0a , and 211 22 12 0a a a .

Case 1: 11 0a . Then 211 22 12 0a a a implies 12 0a , so the quadratic form is

21 2 22 2( , ) 0x x a x Q .

Case 2: 11 0a . Then

2 21 2 11 1 12 1 2 22 2

2 212 2211 1 1 2 2

11 11

2 22 212 22 12

11 1 2 2 211 11 11

2 2212 11 22 12

11 1 2 2211 11

( , ) 2

( 2 )

(*)

x x a x a x x a x

a aa x x x x

a a

a a aa x x x x

a a a

a a a aa x x x

a a

Q

With 11 0a and 211 22 12 0a a a , clearly 1 2( , ) 0x x Q for all 1 2,x x not both equal

to zero.

We now show the converse: Suppose 2 21 2 11 1 12 1 2 22 2( , ) 2 0x x a x a x x a x Q for

all 1 2,x x not both equal to zero. Then, in particular, we have

11(1,0) 0a Q , and 22(0,1) 0a Q .

If 11 0a , then 1 12 1 22( ,1) 2 0x a x a Q , which implies 12 0a , since if 12 0a ,we can make 1( ,1) 0x Q by choosing 1x a large enough negative number, and if

12 0a , we can make 1( ,1) 0x Q by choosing 1x to be a large enough positivenumber. Then 2

11 22 12 0a a a . If 11 0a , then we must have 211 22 12 0a a a ,

otherwise we can make 1 2( , ) 0x x Q by choosing 1x and 2x to make

121 2

110

ax x

a .

Result (iii) is proved in a similar fashion.

(ii) Suppose 11 0a and 211 22 12 0a a a . Then from (*) , 1 2( , ) 0x x Q .

Suppose 1 2( , ) 0x x Q for all 1 2,x x not both equal to zero. Then 11(1,0) 0a Q .Because 11 0a , we can write (*) . Then 2

12 11 11 22 12 11( / ,1) ( ) / 0a a a a a a Q ,which implies 2

11 22 12 0a a a .

Result (iv) is proved in similar fashion.


We re-state this result for general ( )n n symmetric matrices (without proof):

Theorem Consider the ( )n n symmetric matrix A and the associatedquadratic form

T

1 1

( )n n

ij i ji j

Q a x x

x x Ax ,

where x is an arbitrary non-zero n -dimensional vector. Let kD be the k-thleading principal minor of A , and k denote an arbitrary principal minors oforder k . Then ( )Q x is

(a) positive definite 0kD for 1,2,...,k n

(b) positive semi-definite 0k for 1,2,...,k n .

(c) negative definite ( 1) 0kkD for 1,2,...,k n .

(d) negative semi-definite ( 1) 0kk for 1,2,...,k n .

It should be clear that the results stated and proved for the (2 2) case is a specialcase of this theorem.

(Note: to add definition of principal minors and leading principal minors. For themoment, look it up.)

An Eigenvalue Approach Eigenvalues provide a more convenient way todetermine definiteness of quadratic forms. Because we are dealing with symmetricmatrices, which are diagonalizable, we can rewrite our quadratic form as

T T T T 2 2 2 21 1 2 2 1 1( ) ... n n n ny y y y Q x x Ax x PΛP x y Λy

where Ty P x . It should be clear that ( ) 0Q x for all x 0 is equivalent to

( ) 0Q x for all y 0 , which is in turn equivalent to 0r , 1,...,r n . Similarly,( ) 0Q x for all x 0 is equivalent to ( ) 0Q x for all y 0 , which is in turn

equivalent to 0r , 1,...,r n . That is,

For a given symmetric ( )n n matrix A , let T( ) Q x x Ax , for arbitrary x 0 .Then

(i) ( )Q x is positive semi-definite 0r , 1,...,r n

(ii) ( )Q x is positive definite 0r , 1,...,r n

(iii) ( )Q x is negative semi-definite 0r , 1,...,r n

(iv) ( )Q x is negative definite 0r , 1,...,r n


Exercises (not available yet)


15-1

Section 15 Vectors of Random Variables

When working with several random variables 1X , 2X , ..., nX , it is often convenientto arrange them in vector form

1

2

n

X

X

X

x

We can then make use of matrix algebra to help us organize and manipulate largenumbers of random variables simultaneously. We define the expectation of a randomvector as element-by-element expectation:

1

2

[ ]

[ ][ ]

[ ]n

E X

E XE

E X

x

.

If X is an ( )m n matrix of random variables, then [ ]E X is the ( )m n matrixwhere the ( , )thi j element is the mean of the ( , )thi j element of X , i.e.,

if

11 12 1

21 22 2

1 2

n

n

m m mn

X X X

X X X

X X X

X

, then

11 12 1

21 22 2

1 2

[ ] [ ] [ ]

[ ] [ ] [ ][ ]

[ ] [ ] [ ]

n

n

m m mn

E X E X E X

E X E X E XE

E X E X E X

X

.

These definitions provide a neat way for computing the variances and covariances ofthe variables in X “all at once”:

1 1 1 1 1 1 2 2 1 1

2 2 1 1 2 2 2 2 2 2

1 1 2 2

var[ ] [( [ ])( [ ]) ]

( [ ])( [ ]) ( [ ])( [ ]) ( [ ])( [ ])

( [ ])( [ ]) ( [ ])( [ ]) ( [ ])( [ ])

( [ ])( [ ]) ( [ ])( [ ]) (

n n

n n

n n n n n

E E E

X E X X E X X E X X E X X E X X E X

X E X X E X X E X X E X X E X X E XE

X E X X E X X E X X E X X

x x x x x

1 1 2 1

2 1 2 2

1 2

[ ])( [ ])

var[ ] cov[ , ] cov[ , ]

cov[ , ] var[ ] cov[ , ]

cov[ , ] cov[ , ] var[ ]

n n n

n

n

n n n

E X X E X

X X X X X

X X X X X

X X X X X

We call var[ ]x the variance-covariance matrix of x .


15-2

The formula var[ ] [( [ ])( [ ]) ]E E E x x x x x can be viewed as the matrix versionof the variance formula 2var[ ] [( [ ]) ]X E X E X for a single variable.

Sometimes we want to compute a ‘covariance matrix’ between two vectors of randomvariables x and y . We can compute

1 1 1 1 1 1 2 2 1 1

2 2 1 1 2 2 2 2 2 2

1 1 2 2

cov[ , ] [( [ ])( [ ]) ]

( [ ])( [ ]) ( [ ])( [ ]) ( [ ])( [ ])

( [ ])( [ ]) ( [ ])( [ ]) ( [ ])( [ ])

( [ ])( [ ]) ( [ ])( [ ]) (

n n

n n

n n n n

E E E

X E X Y E Y X E X Y E Y X E X Y E Y

X E X Y E Y X E X Y E Y X E X Y E YE

X E X Y E Y X E X Y E Y X

x y x x y y

1 1 1 2 1

2 1 2 2 2

1 2

[ ])( [ ])

cov[ , ] cov[ , ] cov[ , ]

cov[ , ] cov[ , ] cov[ , ]

cov[ , ] cov[ , ] cov[ , ]

n n n n

n

n

n n n n

E X Y E Y

X Y X Y X Y

X Y X Y X Y

X Y X Y X Y

Rules for dealing with the mean vector and the variance-covariance matrix

If x is an ( 1)n vector of random variables, X is an ( )m n matrix of randomvariables, b is an ( 1)m vector of constants, and A is an ( )m n matrix ofconstants, then

1. [ ] [ ]E E Ax b A x b

2. var -cov[ ] var[ ] Ax b A x A .

In particular,

1 1 2 2 1 1var[ ... ] var[ ] var[ ] cov[ , ]

n nn n i j i ji j

c X c X c X c c X X

c x c x c

3. A useful result is

11 22

11 22

[ [ ]] [ ... ]

[ ] [ ] ... [ ]

[ [ ]]

nn

nn

E tr E X X X

E X E X E X

tr E

X

X

.

The first of these is straightforward to show by simply writing out the expressionAx b in full and taking expectations. This formula is the matrix version of the

usual single variable result

[ ] [ ]E aX b aE X b


15-3

To show (2), plug Ax b into the variance formula:

var[ ] [( [ ])( [ ]) ]

[( [ ] )( [ ] ) ]

[( [ ])( [ ]) ]

[ ( [ ])( ( [ ])) ]

[ ( [ ])( [ ]) ]

[( [ ])( [ ]) ]

var[ ]

E E E

E E E

E E E

E E E

E E E

E E E

Ax b Ax b Ax b Ax b Ax b

Ax b A x b Ax b A x b

Ax A x Ax A x

A x x A x x

A x x x x A

A x x x x A

A x A

This is the matrix version of the single variable result

2var[ ] var[ ]aX b a X .

Note that a variance-covariance matrix must be positive-definite --

1 1 2 2var[ ... ] var[ ] var[ ]n nc X c X c X c x c x c

has to be positive for all c 0 , since variances must be positive.

Exercise Let 1

2

X

X

x be a vector of random variables, and

11 12

21 22

a a

a a

A and 1

2

b

b

b

be constants. Write out Ax b in full, and take expectations to show that

[ ] [ ]E E Ax b A x b

The Multivariate Normal Distribution

The random vector x follows the multivariate normal distribution with

mean

1

2[ ]

n

E

x μ

and variance-covariance matrix

2 2 211 12 12 2 221 22 2

2 2 21 2

n

n

n n nn

Σ

if its distribution has the form

/ 2 1/ 2 1( ) (2 ) | | exp{ (1/ 2)( ) ' ( )}nf x Σ x μ Σ x μ .


15-4

We denote this by ~ ( , )Nx μ Σ . This is analogous to the univariate normal pdf:

2

22

1 1 ( )( ) exp

22X

xf x

Exercise Write the joint pdf out without matrix notation for the bivariate case

1

2

X

X

x .

Then show that1 2 1 2, 1 2 1 2( , ) ( ) ( )X X X Xf x x f x f x when 12 21 0 . What does this

say?

Exercise Use computer software (say matlab) to plot the distribution of thebivariate normal for various parameter values.

The following are important properties of the multivariate normal:

4. The conditional distributions are also normal. In particular, if

1 1 11 12

2 2 21 22

~ ,N

x μ Σ Σx μ Σ Σ

where 1x and 1μ are 1( 1)n , 2x and 2μ are 2( 1)n , 11Σ is 1 1( )n n , 12Σ is

1 2( )n n , 21Σ is 2 1( )n n , and 2 2( )n n , then

a. the marginal distribution for 1x is 1 11( , )N μ Σ

b. the conditional distribution for 1x given 2x is 1|2 11|2( , )N μ Σ where

11|2 1 12 22 2 2( ) μ μ Σ Σ x μ and 1

11|2 11 12 22 21 Σ Σ Σ Σ Σ

Exercise In the bivariate case,2

1 1 11 12

22 2 21 22

~ ,X

NX

(4a) says that 21 1 11~ ( , )X N and 2

2 2 22~ ( , )X N . Write out the expressions in(4b). Note in particular that the conditional mean of 1X given 2X is a linear functionof 2X .


15-5

Exercise Using the expressions for the conditional mean and conditional varianceof 2X given 1X , show that

1 2 2 1 1, 1 2 | 2 1 1( , ) ( | ) ( )X X X X Xf x x f x x f x .

(You can make a similar argument for the general 1x and 2x case.)

5. If ~ ( , )Nx μ Σ , then ~ ( , )N Ax b Aμ b AΣA ; The expression Ax b isnormal because linear combinations of normal random variables remain normal. Theformulae for the mean and variance-covariance matrix are the usual ones.

6. If ~ ( , )Nx μ Σ and 2 2 21 2( , ,..., )ndiag Σ , then the random variables in x are

independent.

The following make use of the fact that

The square of a standard normal variable has a 21 distribution;

The sum of n independent 21 is a 2

n ;

If ~ (0,1)X N , 2~ nY and X and Y are independent, then ~/

nX

tY n

If 21 ~ nY , 2

2 ~ mY , and 1Y and 2Y are independent, then 1( , )

2

/~

/ n mY n

FY m

We have

7. If ~ ( , )Nx 0 I and A is symmetric, and idempotent with rank J , then the scalarrandom variable 2

( )~ Jx Ax . In particular, 2( )~ nx x .

ProofBecause A is symmetric, we can write A CΛC , with C C I . Note that

~ ( , )NC x 0 I because var( ) C x C IC C C I . That is, C x is a vector ofindependently distributed standard normal variables.

Write

21

ni iiy

x Ax x CΛC x y Λy

where y C x . Each 2iy is an independent chi-sq degree one, since the iy ’s are

independent standard normal variables. Because A is idempotent, there are J

i ’s equal to one, and ( )n J i ’s that are zero. Relabeling the iy ’s so that thefirst i ’s are equal to one, we have

21

Jii

y

x Ax

which is a sum of J independent 2J . Therefore 2

( )~ Jx Ax .


15-6

8. If ~ ( , )Nx μ Σ , then 1 2( )( ) ( ) ~ n x μ Σ x μ .

Proof

Σ is positive definite, symmetric, and full rank. Therefore we can write1 1/ 2 1/ 2 Σ Σ Σ . Note that 1/ 2 ( ) ~ ( , )N z Σ x μ 0 I , therefore

1 1/ 2 1/ 2 2( )( ) ( ) ( ( )) ( ) ~ n x μ Σ x μ Σ x μ Σ x μ z z .

9. If ~ ( , )Nx 0 I , and A and B are symmetric and idempotent, then x Ax andx Bx are independent if AB 0 .

Proof

Because A and B are symmetric and idempotent, we have A A A and B B B . Therefore we can write the quadratic forms as

( ) ( ) x Ax x A Ax Ax Ax . Because x is normal with mean 0 , Ax also normalwithmean 0 . For vectors of zero mean random variables, cov[ , ] [ ]E x y xy(why?). We have

cov , [ ]E E Ax Bx Axx B A xx B AB AB .

Therefore, AB 0 implies that Ax and Bx are normally distributed, withcovariance 0 . This implies that Ax and Bx are independent (why?), andtherefore the quadratic forms x Ax and x Bx are also independent.

It follows from (7) that[ / ( )]

[ / ( )]

rank

rank

x Ax Ax Bx B

is distributed ( ) , ( )rank rankF A B .

10. If ~ ( , )Nx 0 I , and A is symmetric and idempotent, then Lx and x Ax areindependent if LA 0 .

Proof

Same idea as in (9).

We use these results to prove some standard results in statistics. Suppose

1 2, ,..., nX X X are n independent draws from a 2( , )N distribution, i.e.

2~ ( , )N x μ I .

where μ is an ( 1)n vector

1

1

1

μ i

.


15-7

We know that the sample mean

1

1 n

ii

X Xn

is normally distributed (because a linear combination of normal variables is normal)with mean

1 1

1 1 1[ ]

n n

i ii i

E X E X E X nn n n

and variance

2

22 2

1 1

1 1 1var[ ] var var

n n

i ii i

X X X nn nn n

Furthermore,2

~ ,X Nn

implies that

2~ (0,1)

/

XN

n

Unfortunately, this result is not very helpful if, for instance, you want to test a

hypothesis on , since 2 is general unknown, and must be estimated. An unbiasedestimator is

2 2

1

1ˆ ( )1

n

ii

X Xn

To show this, note that 2 2 2

1 1

( )n n

i ii i

X X X nX

. Each iX has variance

2 2 2var[ ] [ ]i iX E X ,

so

2 2 2 2 2 2

1 1 1

n n n

i ii i i

E X E X n n

.

Also,2

22 2varE X X E Xn

. Putting all this together, we have:


15-8

2 2

1

2 2

1

22 2 2

2

2

1ˆ ( )1

1

1

1

1

1( 1)

1

.

n

ii

n

ii

E E X Xn

E X nXn

n n nn n

nn

One idea, then, is to substitute 2 with 2̂ in2 /

X

n

to get the ‘t-statistic’

2ˆ /

Xt

n

.

Unfortunately, the t -statistic does not have a standard normal distribution.

We use the results discussed earlier to derive the distribution of the t - statistic. We

begin by deriving a matrix expression for 2̂ . Observe first that

1

21

1 1

2

1

11 0 0 1 11 1 1 1 1 1

0 1 0 1 1( ( ) )

0 0 1 1 1

1 (1 / )

1

1

n

nii

n

X

X

X

X n X

X

X

X

I i i i i x

2

n

X

X X

X X


15-9

An interesting property of the matrix 1( ) M I i i i i is that it is symmetric andidempotent, with rank 1n .

Symmetric: 1 1 1( ) ( ) ( ) M I i i i i I i i i i I i i i i M

Idempotent:

1 1

1 1 1 1

cancels out

1 1 1

( ) ( )

( ) ( ) ( ) ( )

( ) ( ) ( )

MM I i i i i I i i i i

II i i i i I Ii i i i i i i i i i i i

I i i i i i i i i i i i i

M

Because M symmetric and idempotent, ( ) ( )rank traceM M , and

1 1 1( ) ( ) ( ( ) ) ( ( ) ) 1trace trace trace n trace n I i i i i I i i i i i i i i

Therefore

2 2

1

1 2 1

2

1ˆ ( )1

1

1

1

1

1

1

n

ii

n

n

X Xn

X X X X X X X X

X X

n

X X

n

n

x M Mx

x Mx

Furthermore, note that

1~ ( , )N

x μ 0 I

( ) Mx M x μ since Mμ 0 (why?)

Together with the fact that M is symmetric and idempotent with rank 1n , result (7)implies that

212 2

1 1( ) ( ) ~ n

x Mx x μ M x μ .


15-10

This result is consistent with the fact that 2 2ˆ[ ]E . The mean of a 21n is 1n ,

i.e.

2

1[ ] 1E n

x Mx .

Since 2 1ˆ1n

x Mx , the result follows. The fact that

22

12 2

ˆ( 1) 1~ n

n

x Mx ,

which we have just shown, is of course a much stronger one. The result that2 2ˆ[ ]E does not depend on the normality of x . If we have normality of x , then

we have the 2 result.

Finally, note that

11( ) ( ) ~ (0,1)

nX N

i i i x μ ,

and that

1 1 1 1 1 1

cancels out

( ) ( ) ( ) ( ) ( ) ( ) i i i M i i i I i i i i i i i i i i i i i i 0

which says that

11( ) ( ) ~ (0,1)

nX N

i i i x μ and

22

12 2

ˆ( 1) 1~ n

n

x Mx

are independent. Therefore,

12 221

2

( ) (0,1)~

ˆ / ˆ( 1)( 1)

1

n

n

nXX N

t tn n

nn

.

Of course, if iX ’s are not random draws from 2( , )N , then all of these results donot hold (except [ ]E X , 2var[ ] /X n and 2 2ˆ[ ]E , which does not requirethe normality assumption). Under reasonable conditions, the t -statistic will convergeto the normal as the sample size grows.


16-1

Section 16 Differentiation of Matrix Forms

There are no new ‘calculus’ results here. We merely have a few definitions so that wecan take derivatives of functions expressed in matrix form.

Definitions

Given 1 2( , ,..., )ny f x x x , let

1

2

n

x

x

x

x

, then define

1

2

/

/

/ n

y x

y xy

y x

x .

Example If 2 21 2 3y x x x , then

21 2 32

1 2 32 2

1 2

2

2

x x xy

x x x

x x

x

.

Example If 1 2 3 4y x x x x , then

2

1

4

3

x

xy

x

x

x

Example Let 1 21 4 2 3

3 4

detx x

y x x x xx x

, then

4

3

2

1

x

xy

x

x

x

The case of linear functions and quadratic forms are particularly important. We have:

Example If y c x where c and x are ( 1)n , theny

c

x.

To see this, simply expand and differentiate: 1 1 2 2 ... n ny c x c x c x c x , so

1 1

2 2

/

/

/ n n

y x c

y x cy

y x c

cx


16-2

Example If y x Ax where x is ( 1)n and A is ( )n n , then

( )y

A A xx

It is easiest to see this in the 2n case: let

11 12

21 22

a a

a a

A

Then

11 12 1 2 21 211 1 12 1 2 21 1 2 22 2

21 22 2

a a xx xy a x a x x a x x a x

a a x

Then

1 11 1 12 21 2 11 12 21 1

12 21 1 22 2 12 21 22 2

2

2 ( ) 2

( ) 2 2

y

x a x a a x a a a xy

a a x a x a a a xy

x

x,

and it is easily seen that 11 12 21

12 21 22

2

2

a a a

a a a

A A .

If A is symmetric, then A A , so

2y

Ax

x,

which bears considerable resemblance to the univariate case where / 2dy dx ax

when 2y ax . Remember that x Ax is the matrix version of the quadratic function.

**********

The general principle is that the shape of the derivative matrix follows that of thedenominator of the derivative, so that for 1 2( , ,..., )ny f x x x ,

1 2 n

y y y y

x x x

x

since 1 2 nx x xx .

Example If y c x where c and x are ( 1)n , theny

cx

.


16-3

Example If

1 21 4 2 3

3 4

detx x

y x x x xx x

,

we have 4 3 2 11 2 3 4

y f f f fx x x x

x x x x

x

.

This “row” form of the vector derivative is most often applied to situations where weare differentiating a vector of functions, i.e. when

1 1 1 2

2 2 1 2

1 2

( , ,..., )

( , ,..., )

( , ,..., )

n

n

m m n

y f x x x

y f x x x

y f x x x

y

,

in which case

yx

is the application of the1 2 n

y y y y

x x x

x

to each

individual iy to get

1 1 1 2 1

2 1 2 2 2

1 2

/ / /

/ / /

/ / /

n

n

m m m n

y x y x y x

y x y x y x

y x y x y x

yx

.

Example If y Ax where A is ( )m n and x is ( 1)n , then

yA

x.

11 12 1 1 11 1 12 2 1

21 22 2 2 21 1 22 2 2

1 2 1 1 2 2

n n n

n n n

m m mn n m m mn n

a a a x a x a x a x

a a a x a x a x a x

a a a x a x a x a x

y Ax

.

Then

1 1 1 2 1 11 12 1

2 1 2 2 2 21 22 2

1 2 1 2

/ / /

/ / /

/ / /

n n

n n

m m m n m m mn

y x y x y x a a a

y x y x y x a a a

y x y x y x a a a

yA

x

.


16-4

Another application is to get a matrix of second derivatives: if 1 2( , ,..., )ny f x x x ,we have

2 2 2

21 1 2 1

2 2 22

22 1 2 2

2 2 2

21 2

n

n

n n n

y y y

x x x x x

y y yy y

x x x x x

y y y

x x x x x

x x x x

.

This is called the Hessian of ( )f x and written ( )f x . Young’s Theorem says thatthis will be a symmetric matrix.

Example If y x Ax where x is ( 1)n and A is ( )n n , then ( )y

A A xx

,

and therefore

2

( )y

A Ax x

, 2 A if A is symmetric.

This should remind you of the fact that 2 2/ 2d y dx a when 2y ax .

Finally, we can use the principle that the shape of the derivative matrix follows that ofthe denominator of the derivative, to get derivatives with respect to matrices (asopposed to with respect to vectors):

Example If 1 2

3 4

x x

x x

X , then

1 2 4 3

2 1

3 4

det( ) det( )

det( )det( ) det( )

x x x x

x x

x x

X X

XX XX

.

Taking this further, consider the derivativeln(det )

XX

. Applying the chain rule:

1 2 4 3 1

2 1

3 4

det det

ln(det ) 1 1( )

det detdet det

x x x x

x x

x x

X X

XX

X XX X X

since 4 21

3 1

1

det

x x

x x

XX

.


16-5

This is true also for general ( )n n matrices. Note: to write down ln(det )X we oughtto have det 0X , which is true if X is positive-definite. Furthermore, if A issymmetric, then the inverse is symmetric, and we do not require the transposition ofthe inverse.

As a final example, consider the quadratic form y x Ax , and considerdifferentiating it with respect to A (earlier we differentiated with respect to x ). Tosimplify the exposition, we again focus on the (2 2) case, where

11 12 1 2 21 211 1 12 1 2 21 1 2 22 2

21 22 2

a a xx xy a x a x x a x x a x

a a x

We have

211 12 1 1 2 1 1 2

221 2 2

21 22

y y

a a x x x x x x

xy y x x xa a

x Axxx

A

There are other important forms. These will do for now.

Summary

c x

cx

;

( ) x Ax

A A xx

, 2 Ax if A is symmetric;

2 2 2

21 1 2 1

2 2 22

22 1 2 2

2 2 2

21 2

n

n

n n n

y y y

x x x x x

y y yy

x x x x x

y y y

x x x x x

x x

, ( )

x AxA A

x x, 2 A if A is symm.

AxA

x, 1ln(det )

( )

XX

X,

x Ax

xxA

Documents

Matrix Algebra Notes Section 1 About these notes - mysmu.edu · Matrix Algebra Notes Anthony Tay 1-1 Matrix Algebra Notes Section 1 About these notes These are notes on matrix algebra