
Linear Algebra Lecture Notes

Yoichiro Mori

March 27, 2013


Contents

1 Vectors and Matrices
  1.1 Vectors and the Vector Spaces R^n and C^n
  1.2 Matrices
  1.3 Matrix Multiplication
  1.4 Transpose
  1.5 Square Matrices
  1.6 Exercises

2 Linear Algebra in Dimension Two
  2.1 Linear Equations in Two Unknowns
  2.2 Linear Independence/Basis Vectors
  2.3 Linear Transformation
  2.4 Examples of Linear Transformations
    2.4.1 Scaling Transformation
    2.4.2 Rotation
    2.4.3 Reflection
    2.4.4 Orthogonal Projection
    2.4.5 Magnification and Rotation
  2.5 Exercises

3 Linear Equations
  3.1 A 3 × 3 Example
  3.2 Elementary Row Operations and the General Invertible Case
  3.3 A Non-invertible Example
  3.4 The General n × n Linear Equation
  3.5 n Linear Equations in m Unknowns
  3.6 Exercises

4 Linear Independence and Linear Transformations
  4.1 Linear Independence, Subspaces and Dimension
  4.2 Linear Transformations

5 The Determinant


Chapter 1

Vectors and Matrices

1.1 Vectors and the Vector Spaces R^n and C^n

Vectors in two and three dimensions are “arrows” in the plane and in space, respectively. By introducing a coordinate system, we may identify a two-dimensional vector v with a pair of real numbers, and a three-dimensional vector w with a triple of real numbers:

v = \begin{pmatrix} v_1 \\ v_2 \end{pmatrix}, \quad w = \begin{pmatrix} w_1 \\ w_2 \\ w_3 \end{pmatrix},   (1.1.1)

where v_1, v_2 and w_1, w_2, w_3 are real numbers. The set of all vectors in two dimensions is called R^2, and the corresponding set in three dimensions is called R^3. In general, an n-dimensional vector u is defined as a list of n numbers:

u = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}   (1.1.2)

and the set of all n-dimensional vectors is called R^n, n = 1, 2, \cdots, or the vector space R^n. Occasionally, we will consider the case when the components of u (that is, the numbers u_1, \cdots, u_n) are complex numbers rather than real numbers. The set of complex n-dimensional vectors is called C^n, or the vector space C^n. Since the complex numbers contain the real numbers, any vector in R^n may be considered a vector in C^n. The set of real numbers R is called the field of scalars for the vector space R^n, and the set of complex numbers C is the field of scalars for the vector space C^n. We shall sometimes use the word scalar to refer to a real or complex number, depending on whether we are considering R^n or C^n.

We will usually list the components of a vector vertically. It is also possible to list the components horizontally, so that u = (u_1, u_2, \cdots, u_n). When we need to make the distinction, we will call a vertically listed vector a column vector and a horizontally listed vector a row vector (for reasons that will become clear when we introduce matrices). Vectors will usually be written using boldface symbols.

Example 1. Consider the vectors:

v = \begin{pmatrix} 1 \\ -3 \\ 0 \end{pmatrix}, \quad w = \begin{pmatrix} 1 \\ 2 \\ 2 \\ 3 \end{pmatrix}, \quad u = \begin{pmatrix} 1+i \\ -2 \\ 3i \end{pmatrix}.   (1.1.3)

Here, v ∈ R^3, w ∈ R^4 and u ∈ C^3.

We now define scalar multiplication of vectors. For a scalar a and a vector v, we define scalar multiplication as:

av = a \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix} = \begin{pmatrix} av_1 \\ av_2 \\ \vdots \\ av_n \end{pmatrix}.   (1.1.4)

For two n-dimensional vectors u and w, we define vector addition by:

u + w = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix} + \begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{pmatrix} = \begin{pmatrix} u_1 + w_1 \\ u_2 + w_2 \\ \vdots \\ u_n + w_n \end{pmatrix}.   (1.1.5)

For vectors in R^2 and R^3, this should be familiar to many of you. Scalar multiplication just scales the vector, while vector addition is geometrically the usual parallelogram law. Addition of vectors of different sizes is not defined.

For two vectors u and v in R^n, we define the inner product or dot product as follows:

u \cdot v = u_1 v_1 + u_2 v_2 + \cdots + u_n v_n.   (1.1.6)


For vectors in R^2 and R^3, this is the familiar dot product you probably saw in coordinate geometry. For complex vectors, the inner product is somewhat different, and is given by:

u \cdot v = \bar{u}_1 v_1 + \bar{u}_2 v_2 + \cdots + \bar{u}_n v_n,   (1.1.7)

where \bar{\cdot} denotes the complex conjugate. We shall seldom make use of the inner product for complex vectors in these notes.
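As an illustration of these operations, here is a minimal numerical sketch in Python, assuming the numpy library (an assumption of these examples, not part of the notes). It checks the definitions above on small vectors, including the conjugated inner product for complex vectors.

    import numpy as np

    v = np.array([1.0, -3.0, 0.0])
    w = np.array([2.0, 1.0, 5.0])

    # Scalar multiplication and vector addition, as in (1.1.4) and (1.1.5).
    print(2 * v)         # [ 2. -6.  0.]
    print(v + w)         # [ 3. -2.  5.]

    # Real inner product, as in (1.1.6).
    print(np.dot(v, w))  # 1*2 + (-3)*1 + 0*5 = -1.0

    # Complex inner product, as in (1.1.7): np.vdot conjugates its
    # first argument before summing.
    u1 = np.array([1 + 1j, -2, 3j])
    u2 = np.array([1, 1j, 2])
    print(np.vdot(u1, u2))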

1.2 Matrices

An m × n matrix is a list of numbers arranged on an m × n grid:

A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}   (1.2.1)

where the a_{ij}, 1 ≤ i ≤ m, 1 ≤ j ≤ n, the components (or elements) of the matrix, are either real or complex numbers. Two matrices are equal if their sizes are the same and each of their components is the same. An m × n matrix has m rows and n columns. We thus have the m row vectors of the matrix A,

r_i = (a_{i1}, \cdots, a_{in}), \quad i = 1, \cdots, m,   (1.2.2)

and the n column vectors of the matrix A,

c_j = \begin{pmatrix} a_{1j} \\ \vdots \\ a_{mj} \end{pmatrix}, \quad j = 1, \cdots, n.   (1.2.3)

Matrices will usually be written using a capital letter. It is possible to view a vector in R^m or C^m as an m × 1 matrix.

Scalar multiplication of a matrix A by a scalar c is defined as:

cA = \begin{pmatrix} ca_{11} & ca_{12} & \cdots & ca_{1n} \\ ca_{21} & ca_{22} & \cdots & ca_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ ca_{m1} & ca_{m2} & \cdots & ca_{mn} \end{pmatrix}.   (1.2.4)


Matrix addition for two m × n matrices A and B is defined as (here, we let b_{ij} be the matrix components of B, as in (1.2.1) for A):

A + B = \begin{pmatrix} a_{11} + b_{11} & a_{12} + b_{12} & \cdots & a_{1n} + b_{1n} \\ a_{21} + b_{21} & a_{22} + b_{22} & \cdots & a_{2n} + b_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} + b_{m1} & a_{m2} + b_{m2} & \cdots & a_{mn} + b_{mn} \end{pmatrix}.   (1.2.5)

Addition of matrices of different sizes is not defined.

1.3 Matrix Multiplication

So far, we have only defined scalar multiplication and addition of matrices. If we only had these two operations, matrices would be just the same as vectors, except arranged in a rectangle instead of a line. What makes matrices very useful is the fact that we can multiply them. Suppose we have an m × n matrix A and an n × l matrix B. Then, we define matrix multiplication as follows. Let a_{ij}, 1 ≤ i ≤ m, 1 ≤ j ≤ n and b_{jk}, 1 ≤ j ≤ n, 1 ≤ k ≤ l be the matrix components of the matrices A and B respectively. Then,

AB = C = \begin{pmatrix} c_{11} & \cdots & c_{1l} \\ \vdots & \ddots & \vdots \\ c_{m1} & \cdots & c_{ml} \end{pmatrix}, \quad c_{ik} = \sum_{j=1}^{n} a_{ij} b_{jk}, \quad 1 ≤ i ≤ m, \ 1 ≤ k ≤ l.   (1.3.1)

The resulting matrix C is thus an m × l matrix, and its ik component c_{ik} is given by looking at the i-th row vector of A (which we call a_i) and the k-th column vector of B (which we call b_k). If the matrix components are all real numbers, then c_{ik} is given as the dot product of a_i and b_k:

c_{ik} = a_i \cdot b_k.   (1.3.2)

Note that the matrix product AB is only possible if the number of columns of A is equal to the number of rows of B (or the row vector length of A is the same as the column vector length of B).

We also define multiplication of an m × n matrix with an n-dimensional vector. This is done by thinking of the vector as an n × 1 matrix.

It is easier to understand all this by looking at examples.


Example 2. Let

A = \begin{pmatrix} 1 & 0 & 2 \\ -1 & 1 & 0 \\ 2 & -3 & 1 \end{pmatrix}, \quad B = \begin{pmatrix} 0 & 1 \\ -1 & 0 \\ 5 & 3 \end{pmatrix}, \quad C = \begin{pmatrix} -4 & 1 & 0 \\ -2 & 2 & 1 \end{pmatrix},   (1.3.3)

and

u = \begin{pmatrix} 1 \\ -1 \end{pmatrix}, \quad v = \begin{pmatrix} 1 \\ 0 \\ -2 \end{pmatrix}.   (1.3.4)

Then,

AB = \begin{pmatrix} 10 & 7 \\ -1 & -1 \\ 8 & 5 \end{pmatrix}, \quad BC = \begin{pmatrix} -2 & 2 & 1 \\ 4 & -1 & 0 \\ -26 & 11 & 3 \end{pmatrix}, \quad CB = \begin{pmatrix} -1 & -4 \\ 3 & 1 \end{pmatrix}, \quad CA = \begin{pmatrix} -5 & 1 & -8 \\ -2 & -1 & -3 \end{pmatrix}.   (1.3.5)

The products BA and AC are not defined. We also have:

Av = \begin{pmatrix} -3 \\ -1 \\ 0 \end{pmatrix}, \quad Bu = \begin{pmatrix} -1 \\ -1 \\ 2 \end{pmatrix}, \quad Cv = \begin{pmatrix} -4 \\ -4 \end{pmatrix}.   (1.3.6)

Other matrix vector products are not defined.
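The computations of Example 2 are easy to reproduce numerically; the following sketch (again assuming numpy) verifies some of the products and shows that an incompatible product raises an error.

    import numpy as np

    A = np.array([[1, 0, 2], [-1, 1, 0], [2, -3, 1]])
    B = np.array([[0, 1], [-1, 0], [5, 3]])
    C = np.array([[-4, 1, 0], [-2, 2, 1]])
    v = np.array([1, 0, -2])

    print(A @ B)  # [[10 7] [-1 -1] [8 5]], as in (1.3.5)
    print(C @ A)  # [[-5 1 -8] [-2 -1 -3]]
    print(A @ v)  # [-3 -1 0], as in (1.3.6)

    # B is 3 x 2 and A is 3 x 3, so BA is not defined:
    try:
        B @ A
    except ValueError as err:
        print("BA is not defined:", err)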

Multiplication and addition of matrices satisfy the following properties. Suppose the matrices A, B and C are such that the products AB and BC are defined. Then, one can check that:

(AB)C = A(BC).   (1.3.7)

We may thus compute products of matrices starting from any adjacent combination, and it is thus meaningful to write ABC instead of (AB)C or A(BC). In the same way, if AB is defined and Bv is defined for a vector v, we have:

A(Bv) = (AB)v.   (1.3.8)

Suppose we can define AB and AC, and B and C are of the same size, so that B + C is defined. Then, we can check that:

A(B + C) = AB +AC. (1.3.9)


Likewise, if AC and BC are defined and A and B are of the same size, we can check that:

(A+B)C = AC +BC. (1.3.10)

Analogous assertions are true for the matrix vector product:

A(v +w) = Av +Aw, (A+B)v = Av +Bv, (1.3.11)

whenever each of the operations above is well defined.

Some of you may recall that (1.3.7) is called associativity, and that (1.3.9) and (1.3.10) are called distributivity. These laws are also true for multiplication and addition of scalars (real or complex numbers). Another important property of multiplication of scalars is that multiplication is commutative. That is to say, if a and b are scalars, ab = ba. In the case of matrices, this is not true in general. Indeed, if you look at Example 2, we see that AB is a 3 × 2 matrix, but BA is not even well-defined. Both BC and CB are well-defined in (1.3.5), but they have different sizes, so that they are not equal. We shall see soon that even when AB and BA have the same size, the two products may be different.

It is sometimes useful to divide a matrix into blocks. For example:

P = \begin{pmatrix} 1 & 0 & 0 & 1 & 2 \\ 0 & 1 & 0 & -1 & 1 \\ 0 & 0 & 1 & 0 & 3 \\ 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} A_1 & B_1 \\ C_1 & D_1 \end{pmatrix},   (1.3.12)

Q = \begin{pmatrix} 1 & 1 & 3 & 0 & 0 \\ 1 & 2 & -1 & 0 & 0 \\ -2 & 4 & 5 & 0 & 0 \\ 0 & 0 & 0 & 1 & 3 \\ 0 & 0 & 0 & -2 & 1 \end{pmatrix} = \begin{pmatrix} A_2 & B_2 \\ C_2 & D_2 \end{pmatrix},   (1.3.13)

where A_1 and A_2 are the upper-left 3 × 3 blocks. The interesting thing here is that it is possible to perform block multiplication:

PQ = \begin{pmatrix} A_1A_2 + B_1C_2 & A_1B_2 + B_1D_2 \\ C_1A_2 + D_1C_2 & C_1B_2 + D_1D_2 \end{pmatrix} = \begin{pmatrix} 1 & 1 & 3 & -3 & 5 \\ 1 & 2 & -1 & -3 & -2 \\ -2 & 4 & 5 & -6 & 3 \\ -2 & 4 & 5 & 1 & 3 \\ 0 & 0 & 0 & -2 & 1 \end{pmatrix}.   (1.3.14)

This generalizes to the case when we have multiple blocks, so long as the matrix multiplications are well-defined.
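To convince yourself of (1.3.14) (this is also Exercise 4 below), a quick numerical check is possible. The sketch below assumes numpy and rebuilds PQ from the four block products.

    import numpy as np

    A1, B1 = np.eye(3), np.array([[1, 2], [-1, 1], [0, 3]])
    C1, D1 = np.array([[0, 0, 1], [0, 0, 0]]), np.eye(2)
    A2 = np.array([[1, 1, 3], [1, 2, -1], [-2, 4, 5]])
    B2, C2 = np.zeros((3, 2)), np.zeros((2, 3))
    D2 = np.array([[1, 3], [-2, 1]])

    P = np.block([[A1, B1], [C1, D1]])
    Q = np.block([[A2, B2], [C2, D2]])

    # Block formula for the product, as in (1.3.14).
    blockPQ = np.block([[A1 @ A2 + B1 @ C2, A1 @ B2 + B1 @ D2],
                        [C1 @ A2 + D1 @ C2, C1 @ B2 + D1 @ D2]])

    print(np.array_equal(P @ Q, blockPQ))  # True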


1.4 Transpose

We define one last operation, the transpose. Given a matrix A, the transpose of A, written A^T, is the matrix obtained by switching the rows and columns of A. For example:

A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix}, \quad A^T = \begin{pmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{pmatrix}.   (1.4.1)

The transpose will sometimes be used as a notational device to write a vertical vector in terms of a horizontal vector. For example,

v = \begin{pmatrix} 1 \\ -3 \\ 4 \end{pmatrix} = (1, -3, 4)^T.   (1.4.2)

One useful property to note about transposes is the following:

(AB)^T = B^T A^T.   (1.4.3)

The transpose of a product of two matrices is the product of the transposes, but in reverse order. This can be shown easily if you go back to the definition of a matrix product.
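Property (1.4.3) is also easy to spot-check numerically; the short sketch below (assuming numpy) compares both sides on random matrices.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((2, 3))
    B = rng.standard_normal((3, 4))

    # (AB)^T should equal B^T A^T, as in (1.4.3).
    print(np.allclose((A @ B).T, B.T @ A.T))  # True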

1.5 Square Matrices

A square matrix is a matrix with the same number of rows and columns. The important point to notice is that the product of two n × n square matrices is again an n × n square matrix.

Let O (called the zero matrix) be the n × n matrix whose components are all 0. For any n × n matrix A, we have:

A+O = A. (1.5.1)

Let I be the n× n matrix whose components are as follows:

I = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}.   (1.5.2)


The matrix I has all ones along the diagonal, and all off-diagonal components are 0. The matrix I is called the identity matrix for the following reason. Take any n × n matrix A. Then, it is easy to check that:

AI = IA = A. (1.5.3)

The identity matrix thus plays the role of 1 in scalar multiplication.

We see that addition and multiplication of n × n matrices are much like addition and multiplication of scalars (real or complex numbers). There are important differences, however, as we shall see in the following examples.

Example 3. Let

A = \begin{pmatrix} 1 & 2 \\ 2 & 3 \end{pmatrix}, \quad B = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.   (1.5.4)

Then,

AB = \begin{pmatrix} 1 & -2 \\ 2 & -3 \end{pmatrix}, \quad BA = \begin{pmatrix} 1 & 2 \\ -2 & -3 \end{pmatrix}.   (1.5.5)

We see that, in general, AB ≠ BA even for square matrices.

Because of this non-commutativity, some familiar formulas from algebra are not valid for matrices. Let A and B be two square matrices of the same size. Then,

(A + B)^2 = (A + B)(A + B) = A^2 + AB + BA + B^2.   (1.5.6)

Note that the last expression is in general not equal to A^2 + 2AB + B^2, because AB may not be equal to BA.

Given an n × n matrix A, if there is an n × n matrix B that satisfies:

AB = BA = I,   (1.5.7)

then the matrix B is called the inverse of A, and is written A^{-1}. Not all matrices have inverses. A matrix that has an inverse is said to be invertible.

Example 4. Let

A = \begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix}, \quad B = \begin{pmatrix} 4 & -2 \\ -2 & 1 \end{pmatrix}, \quad C = \begin{pmatrix} -2 & 1 \\ -4 & 2 \end{pmatrix}, \quad D = \begin{pmatrix} 1 & 2 \\ 3 & 7 \end{pmatrix}.   (1.5.8)

We see that

AB = O, \quad C^2 = O,   (1.5.9)


where O is the zero matrix. The matrices A, B and C do not have an inverse. The matrix D has an inverse:

D^{-1} = \begin{pmatrix} 7 & -2 \\ -3 & 1 \end{pmatrix}.   (1.5.10)

For scalars a, b and c, if ab = 0, then either a = 0 or b = 0, and if c^2 = 0, then c = 0. For a scalar d, if d ≠ 0, the inverse d^{-1} always exists. The above example shows that these properties are not true for matrices.
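The following sketch (assuming numpy) verifies Example 4, and shows how numpy reports the failure when one asks for the inverse of a singular matrix.

    import numpy as np

    A = np.array([[1, 2], [2, 4]])
    B = np.array([[4, -2], [-2, 1]])
    C = np.array([[-2, 1], [-4, 2]])
    D = np.array([[1, 2], [3, 7]])

    print(A @ B)             # the zero matrix, even though A, B != O
    print(C @ C)             # also the zero matrix
    print(np.linalg.inv(D))  # [[ 7. -2.] [-3.  1.]], as in (1.5.10)

    try:
        np.linalg.inv(A)     # det A = 0, so this fails
    except np.linalg.LinAlgError as err:
        print("A has no inverse:", err)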

1.6 Exercises

1. Consider

u = \begin{pmatrix} 3 \\ -2 \end{pmatrix}, \quad v = \begin{pmatrix} -1 \\ -1 \\ 0 \end{pmatrix}, \quad w = \begin{pmatrix} 1 \\ 0 \\ 2 \end{pmatrix},

A = \begin{pmatrix} 1 & 1 & 2 \\ 2 & -1 & 3 \end{pmatrix}, \quad B = \begin{pmatrix} 0 & 1 \\ 1 & -1 \\ 4 & 2 \end{pmatrix}, \quad C = \begin{pmatrix} 1 & -1 & 0 \\ -2 & 3 & 1 \\ 6 & 1 & 0 \end{pmatrix},

D = \begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix}, \quad E = \begin{pmatrix} 0 & 1 \\ 1 & 1 \\ 0 & 2 \end{pmatrix}, \quad F = \begin{pmatrix} -1 & 1 & 1 \\ 0 & -2 & 1 \end{pmatrix}.   (1.6.1)

Compute the following.

(a) Du, Bu, Cv, Av.

(b) (B + E)u, (A + F)(v + 2w).

(c) 3A, 2A + F, B + 2E − 2B + E.

(d) AB, BA, BA + C.

(e) CB, AC, ED.

(f) AB − AE, (2A + F)C.

(g) (B + E)D, DA − 2DF.

(h) ABu, ACw.

(i) ABD, DAB.

(j) BAC, CBA.

(k) A^T + B, B^T E.


2. Find an example of two square matrices A, B such that AB = BA. Find an example of two square matrices C, D such that CD ≠ DC.

3. Consider a = (a_1, \cdots, a_n) and b = (b_1, \cdots, b_n). Compute a^T b and a b^T.

4. Check the calculation (1.3.14) to convince yourself that block multiplication works.

5. Consider the 2× 2 matrix:

A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}.   (1.6.2)

Compute the matrix:

A^2 - (a + d)A + (ad - bc)I,   (1.6.3)

where I is the 2× 2 identity matrix.

6. Let

A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}.   (1.6.4)

Find all matrices B such that AB = BA.

7. Consider the matrix:

A = \begin{pmatrix} 0 & a & b \\ 0 & 0 & c \\ 0 & 0 & 0 \end{pmatrix}.   (1.6.5)

(a) Compute A^2 and A^3.

(b) Compute A^n for any n ≥ 1.

8. A square matrix like the one in the above problem, in which all elements along the diagonal and below are zero, is called a strictly upper triangular matrix. Suppose we have an n × n strictly upper triangular matrix A. Compute A^m for m ≥ n.

9. An upper triangular matrix is a matrix whose elements below (but not including) the diagonal are equal to 0. For example, a 3 × 3 upper triangular matrix A looks like:

A = \begin{pmatrix} a & b & c \\ 0 & d & e \\ 0 & 0 & f \end{pmatrix}.   (1.6.6)


(a) If two 3 × 3 matrices A and B are upper triangular, check that A + B and AB are also upper triangular.

(b) Does the above generalize to n× n matrices?

10. A square matrix is called diagonal if all elements except the diagonal elements are zero. A 3 × 3 diagonal matrix is given by:

A = \begin{pmatrix} a & 0 & 0 \\ 0 & b & 0 \\ 0 & 0 & c \end{pmatrix}.   (1.6.7)

Compute A^n for any n for the above 3 × 3 diagonal matrix.

11. Consider the matrix:

A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}.   (1.6.8)

(a) Compute A, A^2, A^3, A^4, A^5.

(b) Guess what A^n may be, and prove that your guess is correct by mathematical induction.

12. Consider the matrices:

A = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}, \quad B = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}.   (1.6.9)

(a) Compute A^2, A^3, B^2, AB, A^2B.

(b) What are A^{-1} and B^{-1}?

(c) Compute B^3 A^8 B^2 A^4 B.

13. Let A and B be square matrices such that their inverses A^{-1} and B^{-1} exist. Show that the inverse of AB is given by B^{-1}A^{-1}. Can you generalize this to the product of many matrices?

14. Suppose A is a 3 × 3 matrix with the following property: for any x ∈ R^3,

Ax = x.   (1.6.10)

(a) Plug x = (1, 0, 0)^T, (0, 1, 0)^T and (0, 0, 1)^T into the above relation. What does this tell us about A?

(b) Find what matrix A should be.

15. Generalize the conclusion of the above problem to n× n matrices.


Chapter 2

Linear Algebra in Dimension Two

We first study linear algebra (mostly) in dimension two. Calculations are simple here, and yet we can gain a great deal of intuition.

2.1 Linear Equations in Two Unknowns

Consider the linear equation:

ax+ by = e (2.1.1)

cx+ dy = f (2.1.2)

We would like to solve the above equations. We may multiply the first equation by d, multiply the second by b, and subtract, to find:

(ad− bc)x = de− bf. (2.1.3)

If ad − bc ≠ 0, then we may solve the above equation to find x; we may also find y:

x = \frac{de - bf}{ad - bc}, \quad y = \frac{-ce + af}{ad - bc}.   (2.1.4)

Let us use matrix notation for the above. The linear equation (2.1.1) may be written as follows:

\begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} e \\ f \end{pmatrix}.   (2.1.5)


We may write this simply as

Ax = b, \quad A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \quad x = \begin{pmatrix} x \\ y \end{pmatrix}, \quad b = \begin{pmatrix} e \\ f \end{pmatrix}.   (2.1.6)

If ad − bc ≠ 0, (2.1.4) may be expressed in matrix form as:

Bb = x, \quad B = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}.   (2.1.7)

It can be easily checked that:

AB = BA = I (2.1.8)

where I is the identity matrix. Recall from (1.5.7) that such a matrix B is the inverse matrix: B = A^{-1}. With matrix notation, we see that the solution to the equation Ax = b is simply given by x = A^{-1}b when ad − bc ≠ 0. The quantity ad − bc is called the determinant of the matrix A and is denoted by:

detA = |A| = ad− bc. (2.1.9)
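The explicit formulas (2.1.4) and (2.1.7) are short enough to code directly. Here is a minimal sketch in Python (the helper name solve_2x2 is ours, not from the notes).

    def solve_2x2(a, b, c, d, e, f):
        """Solve ax + by = e, cx + dy = f via (2.1.4)/(2.1.7)."""
        det = a * d - b * c  # the determinant (2.1.9)
        if det == 0:
            raise ValueError("det A = 0: no unique solution")
        x = (d * e - b * f) / det
        y = (-c * e + a * f) / det
        return x, y

    print(solve_2x2(1, 2, 2, 5, 1, 1))  # det = 1; returns (3.0, -1.0)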

What happens if detA = ad − bc = 0? Suppose for simplicity that both a and c are non-zero. Then, ad − bc = 0 implies that

b = λa, d = λc (2.1.10)

for some constant λ. Equation (2.1.1) thus looks like:

x + \lambda y = \frac{e}{a},   (2.1.11)

x + \lambda y = \frac{f}{c}.   (2.1.12)

The above equations have a solution only when e/a = f/c, in which case there are infinitely many solutions. If e/a ≠ f/c, there are no solutions to the above equations. The geometric interpretation of this is that lines with the same slope can have common point(s) if and only if they are the same line. Otherwise, they are two parallel lines and do not intersect. One way to write the condition e/a = f/c is:

\begin{pmatrix} a \\ c \end{pmatrix} \parallel \begin{pmatrix} e \\ f \end{pmatrix},   (2.1.13)


where ‖ means that either vector is a constant multiple of the other. Generalizing the above argument a little (the reader should try!), one comes to the following conclusion. Suppose

detA = 0, \quad A ≠ O,   (2.1.14)

where O is the zero matrix. Let the column vectors of A be:

a_1 = \begin{pmatrix} a \\ c \end{pmatrix}, \quad a_2 = \begin{pmatrix} b \\ d \end{pmatrix}.   (2.1.15)

Then,

Ax = b \begin{cases} \text{has infinitely many solutions} & \text{if } b \parallel a_1 \text{ or } b \parallel a_2, \\ \text{has no solution} & \text{otherwise.} \end{cases}   (2.1.16)

When A = O, there is not very much to say:

Ox = b \begin{cases} \text{has infinitely many solutions} & \text{if } b = 0, \\ \text{has no solution} & \text{otherwise,} \end{cases}   (2.1.17)

where 0 is the zero vector. One consequence of (2.1.17) and (2.1.16) is the following. If detA = 0, then

Ax = 0 has infinitely many solutions, (2.1.18)

and

there are some b for which Ax = b has no solution. (2.1.19)

Does A have an inverse if detA = 0? The answer is no, and we give two different proofs of this. Since detA = 0, we know from (2.1.18) that the equation Ax = 0 has infinitely many solutions. This implies that there must be a vector c such that

Ac = 0, \quad c ≠ 0.   (2.1.20)

Now, suppose there is a matrix B satisfying BA = I. Multiplying both sides of the above from the left by B, we see that

c = 0, (2.1.21)

but this is a contradiction, since c ≠ 0.


Another way to show that A^{-1} cannot exist is the following. Suppose there is a matrix B such that

AB = I. (2.1.22)

Let b be a vector for which Ax = b cannot be solved (such a vector exists by (2.1.19)). Given (2.1.22), we have

ABb = b. (2.1.23)

But this implies that Bb is a solution to Ax = b, which is a contradiction. There is thus no matrix B satisfying AB = I.

Some of you may have realized that the two arguments prove two slightly different statements: the first shows that a matrix B satisfying BA = I (a left inverse) does not exist, and the second shows that a matrix B satisfying AB = I (a right inverse) does not exist. Since we have taken the definition of the inverse matrix in (1.5.7) to be a matrix satisfying both AB = I and BA = I (a matrix that is both a left and a right inverse), negating either possibility shows that an inverse cannot exist.

Let us now summarize our results into a theorem.

Theorem 1. For a 2 × 2 matrix A and a vector b ∈ R^2, the following statements are equivalent.

• detA ≠ 0.

• Ax = b can be solved for any b.

• Ax = 0 has only one solution, x = 0.

• The matrix A has an inverse.

The same theorem can also be stated in the following way (can you see why the two theorems are equivalent?).

Theorem 2. For a 2 × 2 matrix A and a vector b ∈ R^2, the following statements are equivalent.

• detA = 0.

• There are some vectors b for which Ax = b does not have a solution.

• Ax = 0 has infinitely many solutions.

• The matrix A does not have an inverse.


Example 5. Consider the equation:

Ax = b, \quad A = \begin{pmatrix} 1 & 2 \\ 2 & 5 \end{pmatrix}, \quad x = \begin{pmatrix} x \\ y \end{pmatrix}, \quad b = \begin{pmatrix} e \\ f \end{pmatrix}.   (2.1.24)

For this equation, the determinant is detA = 1 ≠ 0, and therefore there is always a unique solution for any b. The inverse of A is:

A^{-1} = \begin{pmatrix} 5 & -2 \\ -2 & 1 \end{pmatrix}.   (2.1.25)

On the other hand, consider:

Ax = b, \quad A = \begin{pmatrix} 1 & 2 \\ 3 & 6 \end{pmatrix}, \quad x = \begin{pmatrix} x \\ y \end{pmatrix}, \quad b = \begin{pmatrix} e \\ f \end{pmatrix}.   (2.1.26)

In this case, detA = 0, and therefore there is no inverse. There are solutions if:

\begin{pmatrix} 1 \\ 3 \end{pmatrix} \parallel \begin{pmatrix} e \\ f \end{pmatrix}, \quad \text{or} \quad f - 3e = 0,   (2.1.27)

and in this case, the solutions are all x satisfying:

x+ 2y = e. (2.1.28)

2.2 Linear Independence/Basis Vectors

Given vectors v_1, \cdots, v_m ∈ R^n, a linear combination of the vectors is given by:

c_1 v_1 + c_2 v_2 + \cdots + c_m v_m,   (2.2.1)

where c_1, \cdots, c_m are scalars.

Definition 1 (linear independence). The vectors v_1, \cdots, v_m ∈ R^n are linearly independent if

c_1 v_1 + c_2 v_2 + \cdots + c_m v_m = 0 \quad \text{implies} \quad c_k = 0 \text{ for all } k = 1, \cdots, m.   (2.2.2)

If this is not the case, the vectors v_1, \cdots, v_m are linearly dependent.

One may also state linear dependence as follows. The vectors v_1, \cdots, v_m are linearly dependent if there are scalar constants c_1, \cdots, c_m, not all of which are equal to 0, satisfying

c_1 v_1 + c_2 v_2 + \cdots + c_m v_m = 0.   (2.2.3)


Since at least one of the c_k is not equal to 0, say c_j, we may divide by c_j to obtain:

v_j = -\frac{1}{c_j}(c_1 v_1 + \cdots + c_{j-1} v_{j-1} + c_{j+1} v_{j+1} + \cdots + c_m v_m).   (2.2.4)

Thus, linear dependence is equivalent to saying that one of the vectors can be written as a linear combination of the others.

Example 6. Consider the vectors:

v_1 = \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \quad v_2 = \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \quad v_3 = \begin{pmatrix} 2 \\ 4 \end{pmatrix}, \quad v_4 = \begin{pmatrix} 1 \\ -1 \end{pmatrix}, \quad v_5 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}.   (2.2.5)

Then, the zero vector v_1 together with any other combination of vectors is linearly dependent. For example,

1 \cdot v_1 + 0 \cdot v_2 = 0,   (2.2.6)

and therefore the two vectors v_1, v_2 are linearly dependent. The vectors v_2 and v_3 are linearly dependent because:

2v_2 - v_3 = 0.   (2.2.7)

Let us now examine v_2 and v_4. We must consider the expression:

c_1 v_2 + c_2 v_4 = c_1 \begin{pmatrix} 1 \\ 2 \end{pmatrix} + c_2 \begin{pmatrix} 1 \\ -1 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 2 & -1 \end{pmatrix} \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.   (2.2.8)

We may see the above as an equation for c_1 and c_2. We may use our result from Theorem 1 to see that c_1 = 0, c_2 = 0 is the only solution to the above equation. Therefore, v_2 and v_4 are linearly independent. Now, consider the vectors v_2, v_4 and v_5. In this case, we have:

v_2 - v_4 - 3v_5 = 0,   (2.2.9)

and therefore the three vectors are linearly dependent.

Let us now consider the general case of vectors in R^2. First, let us consider just one vector v_1. This vector is linearly independent if and only if v_1 ≠ 0.

Next, consider the case of two vectors

v_1 = \begin{pmatrix} a \\ c \end{pmatrix}, \quad v_2 = \begin{pmatrix} b \\ d \end{pmatrix}.   (2.2.10)


If the two vectors are linearly dependent, we have

c_1 v_1 + c_2 v_2 = 0,   (2.2.11)

where not both c_1 and c_2 are zero. If c_1 ≠ 0, this means that:

v_1 = -\frac{c_2}{c_1} v_2.   (2.2.12)

The vector v_1 is a constant multiple of v_2. If c_2 ≠ 0, then v_2 is a constant multiple of v_1. Thus, the geometric meaning of linear dependence of two vectors is that the two vectors are colinear (we shall think of the zero vector as being colinear to all vectors). In fact, it is easy to convince ourselves that two vectors are colinear, or linearly dependent, if and only if:

ad− bc = 0. (2.2.13)

If we introduce the matrix

A = \begin{pmatrix} a & b \\ c & d \end{pmatrix},   (2.2.14)

this condition is just detA = 0. An equivalent statement is that the two vectors are linearly independent if and only if detA ≠ 0.

Another way to see this is the following. We are interested in looking for scalar constants c_1 and c_2 that satisfy:

c_1 v_1 + c_2 v_2 = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.   (2.2.15)

The above can be seen as an equation for c_1 and c_2. From Theorem 1, we see that the only solution to the above is c_1 = 0, c_2 = 0 if and only if detA ≠ 0.

Suppose v_1, v_2 are linearly independent. Let us consider the following problem. Given any vector v_3, can we write v_3 as a linear combination of v_1 and v_2? We want to find constants c_1 and c_2 that satisfy

v_3 = c_1 v_1 + c_2 v_2.   (2.2.16)

Let

v_1 = \begin{pmatrix} a \\ c \end{pmatrix}, \quad v_2 = \begin{pmatrix} b \\ d \end{pmatrix}, \quad v_3 = \begin{pmatrix} e \\ f \end{pmatrix}.   (2.2.17)

Finding c_1 and c_2 satisfying (2.2.16) is equivalent to solving the following linear equation:

\begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} = \begin{pmatrix} e \\ f \end{pmatrix}.   (2.2.18)


Since v_1 and v_2 are linearly independent, ad − bc ≠ 0. This equation thus has a unique solution. Thus, any vector in R^2 can be written uniquely as a linear combination of the linearly independent vectors v_1 and v_2. A set of linearly independent vectors in R^n is called a set of basis vectors if any vector in R^n can be expressed as a linear combination of these vectors. We thus see that any two linearly independent vectors form a set of basis vectors in R^2.

We may use this fact to show that a set of 3 or more vectors in R^2 can never be linearly independent. Suppose that the vectors v_1, \cdots, v_m, m ≥ 3, are linearly independent. Then, in particular, the two vectors v_1 and v_2 are linearly independent. Since v_1 and v_2 are linearly independent, they form a basis, and therefore v_3 can be written as a linear combination of v_1 and v_2. This shows that the three vectors v_1, v_2 and v_3 are linearly dependent. This is a contradiction.

Let us state our observations as a theorem.

Theorem 3. Two vectors

v_1 = \begin{pmatrix} a \\ c \end{pmatrix}, \quad v_2 = \begin{pmatrix} b \\ d \end{pmatrix},   (2.2.19)

are linearly independent if and only if

detA ≠ 0, \quad A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}.   (2.2.20)

Any two linearly independent vectors in R^2 form a basis of R^2. The maximum number of linearly independent vectors in R^2 is 2.

The first part of the above theorem can also be seen as saying that detA ≠ 0 is equivalent to the column vectors of A being linearly independent. It is in fact easy to see that the row vectors are also linearly independent if and only if detA ≠ 0. We thus have two more conditions we can add to the equivalent conditions in Theorem 1.

Theorem 4. For a 2× 2 matrix A, the following statements are equivalent.

• detA ≠ 0.

• The column vectors of A are linearly independent.

• The row vectors of A are linearly independent.


2.3 Linear Transformation

An important way to view square matrices is as linear transformations. Take a 2 × 2 matrix A and a vector x ∈ R^2. We may view A as a map that takes the vector x to the vector Ax. Letting

A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \quad x = \begin{pmatrix} x \\ y \end{pmatrix},   (2.3.1)

we see that the vector x is mapped to:

Ax = x v_1 + y v_2, \quad v_1 = \begin{pmatrix} a \\ c \end{pmatrix}, \quad v_2 = \begin{pmatrix} b \\ d \end{pmatrix}.   (2.3.2)

The unit vectors e_1 = (1, 0)^T and e_2 = (0, 1)^T are thus mapped to the vectors v_1 and v_2. The unit coordinate square spanned by e_1 and e_2 gets mapped to the parallelogram spanned by the vectors v_1 and v_2. As a whole, the orthogonal coordinate plane R^2 gets mapped to a slanted coordinate plane with coordinate axes aligned with v_1 and v_2.

Let us interpret the results of the previous sections in terms of this linear transformation. First, suppose detA ≠ 0. From a geometric point of view, this means that v_1 and v_2 are not parallel (they are linearly independent). From this, it is graphically clear that R^2 gets mapped one-to-one onto R^2. This implies that A should have an inverse, and that Ax = b should be uniquely solvable for every b. Note also that the only point that gets mapped to the origin 0 is 0.

Now, suppose detA = 0. This means that (a, c)^T and (b, d)^T are colinear, or linearly dependent. There are two cases to consider.

First, suppose at least one of v_1 and v_2 is a nonzero vector (say, v_1). Then, we see that A maps R^2 to a line ℓ that goes through the origin and is parallel to v_1. Since

v_2 = \lambda v_1 \quad \text{for some } \lambda,   (2.3.3)

we see that points satisfying x + \lambda y = 0 get sent to the origin. The equation Ax = b will have a solution only if b lies on the line ℓ.

The second case is when a = b = c = d = 0. In this case, all points in R^2 get sent to the origin.

We thus see that, depending on the matrix A, the linear transformation will map the plane either to the whole plane R^2, to a line, or to the origin. The set to which the plane is mapped is called the image of A and is written ImA. We thus see that matrices can be classified by the dimension of the image (we shall later define dimension, but for now, think of it in the usual intuitive way: a plane has dimension 2, a line has dimension 1, etc.), which we call the rank of the matrix. If the matrix A maps the plane to the plane, the rank of A is 2. If A maps the plane to a line, the rank of A is 1. If A maps the plane to the origin, the rank of A is 0.

A notion that is intimately related to the image is the nullspace (or kernel) of a matrix A, written KerA. The nullspace is the set of points that are sent to the origin. If the matrix A has rank 2, then the only point sent to the origin is the origin. The kernel is the origin alone, and the dimension of the kernel is 0. If the matrix A has rank 1, then a whole line gets sent to the origin. The dimension of the kernel is 1. When the rank of A is 0, then the whole plane gets mapped to the origin, and therefore the kernel is the whole plane and the dimension of the kernel is 2.

We describe this as a theorem:

Theorem 5. Consider a linear transformation given by a 2× 2 matrix A.

1. If detA ≠ 0, then ImA is the whole plane and the kernel is just the origin. The rank of A is 2 and the dimension of the kernel is 0.

2. If detA = 0 and A ≠ O, then ImA is a line and the kernel is a line. The rank of A is 1 and the dimension of the kernel is 1.

3. If A = O, then ImA is the origin, and the kernel is the whole plane. The rank of A is 0 and the dimension of the kernel is 2.

An interesting consequence of this is that:

rankA + dim KerA = 2,   (2.3.4)

where rankA is the rank of A and dim KerA is the dimension of the kernel of A.

There is an interesting geometric meaning to the determinant. It is the signed area of the parallelogram spanned by the vectors v_1 and v_2. It is signed in the sense that:

\det \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = 1, \quad \det \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = -1.   (2.3.5)

Even though the parallelogram (square) spanned by (1, 0)^T and (0, 1)^T is the same as that spanned by (0, 1)^T and (1, 0)^T, the latter determinant has a negative sign. The sign carries information on orientation. Suppose you turn the first vector anti-clockwise until it overlaps with the second vector. If you can do this by turning the first vector less than 180 degrees (or π radians), then the determinant is positive. If more than 180 degrees is required, then the determinant is negative.

The “parallelogram area” interpretation of the determinant makes it clear why detA detects whether the vectors v_1 and v_2 are linearly independent. If the two vectors are non-parallel (linearly independent), then the parallelogram formed by the two vectors has non-zero area. If the two vectors are parallel (linearly dependent), the two vectors collapse onto a line, and the resulting parallelogram does not have any area.

The determinant can also be thought of as a scaling factor for area. As we saw, the unit coordinate square gets mapped to the parallelogram spanned by v_1 and v_2. Therefore, the area of the unit square is scaled by a factor of detA. It is not difficult to convince oneself that this should actually be true for any shape. For example, any circle will be mapped to an ellipse (it turns out) whose area is detA times that of the original circle.

An important property of the determinant is its compatibility with matrix multiplication. Suppose A and B are both 2 × 2 matrices. Then,

det(AB) = (detA)(detB). (2.3.6)

This can be checked by a somewhat tedious algebraic computation. The geometric reason why (2.3.6) holds is the following. Suppose we apply the matrix AB to R^2. For a vector v ∈ R^2, we have:

(AB)v = A(Bv)   (2.3.7)

by the associative law. This implies that the linear transformation given by AB is the same as applying the linear transformations given by B and then A in succession. The magnifying factor for area when applying the linear transformation AB should be det(AB). Applying the linear transformation AB to R^2 is the same as applying the linear transformation B and then A to R^2. In this case, the magnifying factor for area should be detA × detB. The two magnifying factors should be the same, and we thus have (2.3.6).

Example 7. Consider the matrices:

A = \begin{pmatrix} 1 & 2 \\ 2 & 5 \end{pmatrix}, \quad B = \begin{pmatrix} 1 & 2 \\ 3 & 6 \end{pmatrix}.   (2.3.8)

We see that detA = 1 ≠ 0, and therefore the matrix A maps the plane to the plane. The kernel is just the origin.

The matrix B has determinant equal to 0. Since the matrix is non-zero, it must have rank 1. To find the image and kernel of B, we may apply B to the vector (x, y)^T:

B \begin{pmatrix} x \\ y \end{pmatrix} = (x + 2y) \begin{pmatrix} 1 \\ 3 \end{pmatrix}.   (2.3.9)

Therefore, the kernel is the line:

x+ 2y = 0 (2.3.10)

and the image is the line parallel to (1, 3)^T passing through the origin. In other words, the image is the line:

y = 3x. (2.3.11)

2.4 Examples of Linear Transformations

Here, we take a look at some examples of linear transformations. Before we proceed, we make the following useful observation. Suppose

Au_1 = v_1 \quad \text{and} \quad Au_2 = v_2,   (2.4.1)

where u_1 and u_2 are linearly independent. Let U be the matrix formed by taking u_1 and u_2 to be the column vectors, and V be the matrix formed by taking v_1 and v_2 to be the column vectors. Then, (2.4.1) can be written as:

AU = V. (2.4.2)

Since the column vectors of U are linearly independent, U has an inverse, and therefore,

A = V U^{-1}.   (2.4.3)

A linear transformation A can thus be found if we know where two linearly independent vectors are sent.

2.4.1 Scaling Transformation

The matrix:

A = \begin{pmatrix} a & 0 \\ 0 & b \end{pmatrix}   (2.4.4)

scales the x axis by a factor of a and the y axis by a factor of b. There are many matrices that have a similar character. For example, consider the matrix:

A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}.   (2.4.5)


At first glance, this does not look like (2.4.4), but,

A \begin{pmatrix} 1 \\ 1 \end{pmatrix} = 3 \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \quad A \begin{pmatrix} 1 \\ -1 \end{pmatrix} = \begin{pmatrix} 1 \\ -1 \end{pmatrix}.   (2.4.6)

So this matrix scales the (1, 1)^T direction by a factor of 3 and the (1, -1)^T direction by a factor of 1.

2.4.2 Rotation

Rotation of the plane about the origin is a linear transformation. To see what the matrix may be, we examine where the vectors (1, 0)^T and (0, 1)^T are mapped by rotation about the origin by an angle θ in the counter-clockwise direction. We see that

\begin{pmatrix} 1 \\ 0 \end{pmatrix} \mapsto \begin{pmatrix} \cos(\theta) \\ \sin(\theta) \end{pmatrix}, \quad \begin{pmatrix} 0 \\ 1 \end{pmatrix} \mapsto \begin{pmatrix} -\sin(\theta) \\ \cos(\theta) \end{pmatrix}.   (2.4.7)

Therefore, the rotation matrix R(θ) about the origin is given by:

R(\theta) = \begin{pmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{pmatrix}.   (2.4.8)

For example, rotation by π/2, or 90 degrees is given by:

R(\pi/2) = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}.   (2.4.9)

Note that the determinant of the matrix R(θ) is equal to 1. Rotation does not change the area of anything. The inverse of a rotation by angle θ should be a rotation by angle −θ. Indeed, it can be checked that

R(θ)R(−θ) = R(−θ)R(θ) = I. (2.4.10)

One interesting thing we can do with rotation matrices is the following. A rotation by θ followed by a rotation by ψ should give us a rotation by θ + ψ around the origin. Therefore, we should have:

R(\theta + \psi) = R(\psi)R(\theta).   (2.4.11)

From this, it is easy to deduce the addition rules for sines and cosines (see Exercise 10 of Section 2.5).
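Here is a small numerical illustration of (2.4.10) and (2.4.11), assuming numpy (the helper name R is ours).

    import numpy as np

    def R(theta):
        """Rotation matrix, as in (2.4.8)."""
        return np.array([[np.cos(theta), -np.sin(theta)],
                         [np.sin(theta),  np.cos(theta)]])

    th, psi = 0.3, 1.1
    print(np.allclose(R(th) @ R(-th), np.eye(2)))    # (2.4.10)
    print(np.allclose(R(th + psi), R(psi) @ R(th)))  # (2.4.11)
    print(np.isclose(np.linalg.det(R(th)), 1.0))     # det R = 1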


2.4.3 Reflection

Take the line y = mx, and let us consider reflection across this line. This is also a linear transformation. Let us call this matrix A(m). We need to know the images of two linearly independent vectors. The vector (1, m)^T must be mapped to itself, since it lies on the line. The vector (m, -1)^T, which is perpendicular to the line y = mx, must be mapped to (-m, 1)^T. We thus have:

A(m) \begin{pmatrix} 1 \\ m \end{pmatrix} = \begin{pmatrix} 1 \\ m \end{pmatrix}, \quad A(m) \begin{pmatrix} m \\ -1 \end{pmatrix} = \begin{pmatrix} -m \\ 1 \end{pmatrix}.   (2.4.12)

We thus have:

A(m) \begin{pmatrix} 1 & m \\ m & -1 \end{pmatrix} = \begin{pmatrix} 1 & -m \\ m & 1 \end{pmatrix},   (2.4.13)

We thus see that

A(m) = \begin{pmatrix} 1 & -m \\ m & 1 \end{pmatrix} \cdot \frac{-1}{1 + m^2} \begin{pmatrix} -1 & -m \\ -m & 1 \end{pmatrix} = \frac{1}{1 + m^2} \begin{pmatrix} 1 - m^2 & 2m \\ 2m & m^2 - 1 \end{pmatrix}.   (2.4.14)

We can produce another derivation of this same result using rotations. The line y = mx subtends an angle θ (m = tan θ) with respect to the x axis. Thus, reflection across the line y = mx can be seen as the succession of the following operations.

1. Rotate around the origin by an angle −θ.

2. Reflect across the x-axis. Since the point (x, y)^T gets mapped to (x, -y)^T, reflection across the x-axis is given by the matrix:

A(0) = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.   (2.4.15)

3. Rotate around the origin by an angle θ.

Therefore, reflection across the line y = mx is given by:

A(m) = R(\theta)A(0)R(-\theta) = \begin{pmatrix} \cos^2(\theta) - \sin^2(\theta) & 2\cos(\theta)\sin(\theta) \\ 2\cos(\theta)\sin(\theta) & -(\cos^2(\theta) - \sin^2(\theta)) \end{pmatrix} = \begin{pmatrix} \cos(2\theta) & \sin(2\theta) \\ \sin(2\theta) & -\cos(2\theta) \end{pmatrix}.   (2.4.16)


It is not difficult to show that the above and (2.4.14) are the same matrix, using m = tan(θ) and some trigonometric identities.

An interesting fact about reflection matrices is that their determinant is −1. A reflection does not change area, but it reverses orientation.
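The following sketch (assuming numpy; the helper name reflect is ours) builds A(m) from formula (2.4.14) and checks the facts just mentioned: points on the line are fixed, reflecting twice gives the identity, and the determinant is −1.

    import numpy as np

    def reflect(m):
        """Reflection across y = m x, formula (2.4.14)."""
        return np.array([[1 - m**2, 2 * m],
                         [2 * m, m**2 - 1]]) / (1 + m**2)

    A = reflect(2.0)                      # reflection about y = 2x
    print(A @ np.array([1.0, 2.0]))       # [1. 2.]: fixed point on the line
    print(np.allclose(A @ A, np.eye(2)))  # True: reflecting twice is the identity
    print(np.linalg.det(A))               # -1.0 (up to rounding)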

2.4.4 Orthogonal Projection

Consider the line y = mx, and let P(m) be the linear transformation that takes a point in the plane to the nearest point on the line y = mx. This is called the orthogonal projection. The vector (1, m)^T is mapped to itself, and the vector (m, -1)^T is mapped to the origin. Therefore,

P(m) \begin{pmatrix} 1 & m \\ m & -1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ m & 0 \end{pmatrix}.   (2.4.17)

We see that

P(m) = \frac{1}{1 + m^2} \begin{pmatrix} 1 & m \\ m & m^2 \end{pmatrix}.   (2.4.18)

Note that the determinant of this matrix is 0, as it should be. The whole plane gets mapped to the line y = mx. We also see that:

P(m)^2 = P(m).   (2.4.19)

Applying a projection twice produces the same result.
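A quick check of (2.4.18) and (2.4.19), again assuming numpy (the helper name project is ours):

    import numpy as np

    def project(m):
        """Orthogonal projection onto y = m x, formula (2.4.18)."""
        return np.array([[1, m], [m, m**2]]) / (1 + m**2)

    P = project(2.0)
    print(np.allclose(P @ P, P))  # True: projecting twice changes nothing
    print(np.linalg.det(P))       # 0.0: the plane collapses onto a line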

2.4.5 Magnification and Rotation

The following matrix occurs pretty often:

A = \begin{pmatrix} a & -b \\ b & a \end{pmatrix},   (2.4.20)

where (a, b) ≠ (0, 0). To understand what this matrix does, we write it in the following form:

A = \sqrt{a^2 + b^2} \begin{pmatrix} \frac{a}{\sqrt{a^2 + b^2}} & \frac{-b}{\sqrt{a^2 + b^2}} \\ \frac{b}{\sqrt{a^2 + b^2}} & \frac{a}{\sqrt{a^2 + b^2}} \end{pmatrix}.   (2.4.21)

Setting θ so that

\cos(\theta) = \frac{a}{\sqrt{a^2 + b^2}}, \quad \sin(\theta) = \frac{b}{\sqrt{a^2 + b^2}},   (2.4.22)


we may write A as:

A = \sqrt{a^2 + b^2}\, R(\theta),   (2.4.23)

where R(θ) is the rotation matrix. We thus see that A may be seen as rotation by an angle θ around the origin followed by magnification by a factor of \sqrt{a^2 + b^2} (or vice-versa). For example, the matrix:

A = \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}   (2.4.24)

has the effect of rotating by π/4 (45 degrees) and magnifying the whole plane by a factor of \sqrt{2}.

2.5 Exercises

1. Consider the linear equations:

Ax = b, \quad x = \begin{pmatrix} x \\ y \end{pmatrix}, \quad b = \begin{pmatrix} e \\ f \end{pmatrix}.   (2.5.1)

Discuss the solvability of this equation for the following choices of the 2 × 2 matrix A. Does the equation have a solution for every b? If not, for what right hand sides b does the equation have a solution? If there is a solution, find what the solution is.

\begin{pmatrix} 1 & -1 \\ 2 & 1 \end{pmatrix}, \quad \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}, \quad \begin{pmatrix} 2 & 4 \\ -1 & -2 \end{pmatrix}, \quad \begin{pmatrix} 5 & 5 \\ 1 & 3 \end{pmatrix}, \quad \begin{pmatrix} 9 & 3 \\ 3 & 1 \end{pmatrix},

\begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}, \quad \begin{pmatrix} 3 & 5 \\ 2 & 3 \end{pmatrix}, \quad \begin{pmatrix} 2 & 5 \\ -4 & -10 \end{pmatrix}, \quad \begin{pmatrix} 4 & 7 \\ 1 & 2 \end{pmatrix}, \quad \begin{pmatrix} 6 & 2 \\ 1 & 1/2 \end{pmatrix}.   (2.5.2)

2. Find the inverse matrices of Exercise 1 if they exist.

3. Describe the image and kernel of the matrices of Exercise 1.

4. Examine whether the following vectors are linearly independent. If they are linearly dependent, exhibit a nontrivial linear combination that gives the zero vector.

(a) (1, 1)^T, (1, 2)^T.

(b) (2, 3)^T, (4, 6)^T.

(c) (0, 0)^T, (1, 1)^T.

(d) (1, 0)^T, (0, 3)^T.

(e) (2, 1)^T, (1, 2)^T, (0, 3)^T.

(f) (3, 1)^T, (1, -1)^T, (1, 1)^T.

(g) (1, 1)^T, (2, 2)^T, (3, 4)^T.

(h) (-1, 1)^T, (2, 1)^T, (3, 3)^T.

5. Prove the product formula (2.3.6) for the determinant of 2×2 matrices.

6. A matrix A that satisfies the property:

A^2 = A   (2.5.3)

is called a projection (note that the orthogonal projection is a special kind of projection, see (2.4.19)). What can the determinant of a projection be? Hint: Use the product rule for the determinant of matrices.

7. Consider matrices A and B that are inverses of each other, that is to say, AB = BA = I.

(a) Show that (detA)(detB) = 1.

(b) Suppose the components of both A and B are integers. What can you say about detA and detB? Hint: Since A and B are integer matrices, their determinants must also be integers. Therefore, detA and detB are two integers that, if multiplied, give the value 1.

8. 2× 2 matrices A that satisfy the relation:

A^T \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} A = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}   (2.5.4)

are important in Einstein’s theory of relativity. What are the possible values of the determinant of A? Hint: Use the product rule. Note that the determinants of A and A^T are equal.

9. Find the linear transformation with the following properties.

(a) Maps (1, 0)^T to (1, 1)^T and (0, 1)^T to (0, 2)^T.

(b) Maps (2, 1)^T to (1, 1)^T and (3, 1)^T to (0, 1)^T.

(c) Maps (1, -1)^T to (0, 1)^T and (-2, 1)^T to (1, 2)^T.

(d) Rotation around the origin by 3π/4 in the counter-clockwise direction.

(e) Rotation around the origin by π/3 in the counter-clockwise direction.

(f) Rotation around the origin by π/6 in the counter-clockwise direction, followed by a magnification by a factor of 2.

(g) Reflection about the line y = 2x.

(h) Reflection about the line y = x followed by a reflection about the line y = 2x.

(i) Orthogonal projection onto the line y = -2x.

(j) Orthogonal projection onto the line y = x followed by an orthogonal projection onto the line y = -x.

10. Compute both sides of (2.4.11) to deduce the addition laws for sines and cosines.

11. Suppose A(m) is the matrix for reflection about y = mx, and A(n) is the matrix for reflection about y = nx. Show that A(m)A(n) is a rotation matrix (that is to say, successive reflection across two lines through the origin is in fact a rotation about the origin). What is the rotation angle? Hint: Use expression (2.4.16) and some trigonometric identities.

12. Take any vector v ∈ R^2. Show that the length of the vector does not change after application of either a rotation matrix or a reflection matrix.

13. Consider the matrix (2.4.20) and write this as:

A = aI + bJ, \quad I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad J = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}.   (2.5.5)

(a) Show that IJ = JI.

(b) Using this, compute (aI + bJ)(cI + dJ).

(c) Compare this with the multiplication of two complex numbers a + bi and c + di, where a, b, c and d are real numbers.


Chapter 3

Linear Equations

Our goal from here on is to generalize our experience in dimension 2 to dimension n. In this chapter, we consider linear equations. In a later chapter, we generalize the determinant.

3.1 A 3× 3 Example

We first consider linear equations. Let us begin with the following linear system:

\begin{aligned} x + y + z &= 2 \\ 2x + 5y + 7z &= 5 \\ x - y + 3z &= -4 \end{aligned}, \qquad \begin{pmatrix} 1 & 1 & 1 \\ 2 & 5 & 7 \\ 1 & -1 & 3 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 2 \\ 5 \\ -4 \end{pmatrix}.   (3.1.1)

We will write the equations in two ways: one without matrices and the other with matrices. The 3 × 3 matrix here is called the coefficient matrix of the linear equation. It is useful to introduce the following augmented coefficient matrix:

\begin{aligned} x + y + z &= 2 \\ 2x + 5y + 7z &= 5 \\ x - y + 3z &= -4 \end{aligned}, \qquad \begin{pmatrix} 1 & 1 & 1 & 2 \\ 2 & 5 & 7 & 5 \\ 1 & -1 & 3 & -4 \end{pmatrix}.   (3.1.2)

The augmented coefficient matrix is obtained by placing the right hand side of the equation next to the coefficient matrix. Let us solve this equation by the method of elimination. First, we will use the first equation to eliminate x from the last two equations. We will write the augmented coefficient matrix alongside the equations. We find:

\begin{aligned} x + y + z &= 2 \\ 3y + 5z &= 1 \\ -2y + 2z &= -6 \end{aligned}, \qquad \begin{pmatrix} 1 & 1 & 1 & 2 \\ 0 & 3 & 5 & 1 \\ 0 & -2 & 2 & -6 \end{pmatrix}.   (3.1.3)

In matrix notation, this procedure amounts to multiplying the first row by a constant and subtracting it from the second and third rows to produce zeros in the first column. Next, let us divide the third equation through by −2 and reorder the last two equations:

\begin{aligned} x + y + z &= 2 \\ y - z &= 3 \\ 3y + 5z &= 1 \end{aligned}, \qquad \begin{pmatrix} 1 & 1 & 1 & 2 \\ 0 & 1 & -1 & 3 \\ 0 & 3 & 5 & 1 \end{pmatrix}.   (3.1.4)

In terms of matrices, this procedure corresponds to first dividing the last row by −2 and then interchanging the last two rows. Now, we take the second equation and eliminate y from the first and third equations:

\begin{aligned} x + 2z &= -1 \\ y - z &= 3 \\ 8z &= -8 \end{aligned}, \qquad \begin{pmatrix} 1 & 0 & 2 & -1 \\ 0 & 1 & -1 & 3 \\ 0 & 0 & 8 & -8 \end{pmatrix}.   (3.1.5)

In terms of matrices, we have multiplied the second row by suitable constants and subtracted these from the first and third rows to produce zeros in the second column. Next, we divide the last equation by 8, and eliminate z from the other equations:

\begin{aligned} x &= 1 \\ y &= 2 \\ z &= -1 \end{aligned}, \qquad \begin{pmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 2 \\ 0 & 0 & 1 & -1 \end{pmatrix}.   (3.1.6)

The effect on the matrix is that we have divided the last row by 8, multiplied the last row by constants, and subtracted these from the first two rows to produce zeros in the last column. The coefficient matrix has now been transformed to the identity matrix, and we have solved our linear equation.

Let us review what we have done. The operations we performed on the equations are of three types:

1. Multiply one equation by a scalar and add (or subtract) from another equation.


2. Interchange the order of two equations.

3. Multiply (or divide) an equation by a nonzero scalar.

In terms of matrices, these correspond to the following operations.

1. Multiply one row by a scalar and add to a different row.

2. Interchange two rows.

3. Multiply a row by a nonzero scalar.

These three operations are called elementary row operations. The procedure of solving a linear equation may thus be seen as applying these operations repeatedly to produce zeros in all off-diagonal elements, column by column, until the coefficient matrix becomes the identity matrix.

It is an important fact that all these operations may be seen as matrix multiplication. To see what this means, consider how the matrix changes from (3.1.2) to (3.1.3):

E_1 = \begin{pmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad E_2 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -1 & 0 & 1 \end{pmatrix},   (3.1.7)

E_2 E_1 \begin{pmatrix} 1 & 1 & 1 & 2 \\ 2 & 5 & 7 & 5 \\ 1 & -1 & 3 & -4 \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 & 2 \\ 0 & 3 & 5 & 1 \\ 0 & -2 & 2 & -6 \end{pmatrix}.   (3.1.8)

The augmented coefficient matrix changes from (3.1.2) to (3.1.3) by an application of two matrices. The matrix E_1 adds −2 times the first row to the second row, and the matrix E_2 adds −1 times the first row to the last row. The procedure from (3.1.3) to (3.1.4) may be written as:

E_3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1/2 \end{pmatrix}, \quad E_4 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix},   (3.1.9)

E_4 E_3 \begin{pmatrix} 1 & 1 & 1 & 2 \\ 0 & 3 & 5 & 1 \\ 0 & -2 & 2 & -6 \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 & 2 \\ 0 & 1 & -1 & 3 \\ 0 & 3 & 5 & 1 \end{pmatrix}.   (3.1.10)

The matrix E_3 multiplies the last row by −1/2, and E_4 interchanges rows 2 and 3. Following through in a similar fashion, we find that, from (3.1.4) to (3.1.5), we have:

E_5 = \begin{pmatrix} 1 & -1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad E_6 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -3 & 1 \end{pmatrix},   (3.1.11)

E_6 E_5 \begin{pmatrix} 1 & 1 & 1 & 2 \\ 0 & 1 & -1 & 3 \\ 0 & 3 & 5 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 2 & -1 \\ 0 & 1 & -1 & 3 \\ 0 & 0 & 8 & -8 \end{pmatrix}.   (3.1.12)

Finally, from (3.1.5) to (3.1.6), we have:

E_7 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1/8 \end{pmatrix}, \quad E_8 = \begin{pmatrix} 1 & 0 & -2 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad E_9 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix},   (3.1.13)

E_9 E_8 E_7 \begin{pmatrix} 1 & 0 & 2 & -1 \\ 0 & 1 & -1 & 3 \\ 0 & 0 & 8 & -8 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 2 \\ 0 & 0 & 1 & -1 \end{pmatrix}.   (3.1.14)

All combined, we may write this as

E_9 E_8 E_7 E_6 E_5 E_4 E_3 E_2 E_1 \begin{pmatrix} 1 & 1 & 1 & 2 \\ 2 & 5 & 7 & 5 \\ 1 & -1 & 3 & -4 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 2 \\ 0 & 0 & 1 & -1 \end{pmatrix}.   (3.1.15)

We see that a sequence of 9 elementary row operations allowed us to reduce the coefficient matrix A to the identity matrix I:

BA = I, \quad A = \begin{pmatrix} 1 & 1 & 1 \\ 2 & 5 & 7 \\ 1 & -1 & 3 \end{pmatrix}, \quad B = E_9 E_8 E_7 E_6 E_5 E_4 E_3 E_2 E_1.   (3.1.16)

The fact that BA = I suggests that B may be the inverse matrix of A. To find out, we must check that AB = I as well. This can be done as follows. Each E_k, k = 1, \cdots, 9, has an inverse matrix. For example,

E_1^{-1} = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad E_3^{-1} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -2 \end{pmatrix}, \quad E_4^{-1} = E_4.   (3.1.17)

Therefore, multiplying both sides of BA = I from the left sequentially by E_9^{-1}, E_8^{-1}, down to E_1^{-1}, we have:

A = E_1^{-1} E_2^{-1} \cdots E_9^{-1}.   (3.1.18)

From this, it is easy to check that AB = I. Therefore, B is the inverse of A, and B = A^{-1}.


3.2 Elementary Row Operations and the General Invertible Case

In general, we may proceed as follows. Suppose we have a system of n equations in n unknowns:

Ax = b, \quad A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}, \quad x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \quad b = \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix}.   (3.2.1)

The matrix A is the coefficient matrix, and the matrix (A|b) is the augmented coefficient matrix. We shall also consider the corresponding homogeneous equation:

Ax = 0, where 0 is the zero vector. (3.2.2)

To solve equation (3.2.1) (or (3.2.2)), we sequentially perform the following elementary row operations, whose definition we repeat here for convenience.

Definition 2 (Elementary Row Operations). 1. Multiply row i by a scalar c and add to row j, i ≠ j.

2. Interchange rows i and j, i ≠ j.

3. Multiply row i by a nonzero scalar c.

The process of sequentially applying elementary row operations to simplify a matrix is called row reduction.

Elementary row operation 1 can be expressed as multiplying the following matrix from the left onto the (augmented) coefficient matrix:

P(ij; c) = \begin{pmatrix} 1 & & & \\ & \ddots & & \\ & c & \ddots & \\ & & & 1 \end{pmatrix}.   (3.2.3)

In other words, this is an n × n matrix with 1 along the diagonal and 0 everywhere else, except that the (j, i) element is equal to c. In the example above, the matrices E_1, E_2, E_5, E_6, E_8, E_9 are of this type. Note that this matrix is invertible:

P(ij; c)^{-1} = P(ij; -c).   (3.2.4)

This just says that, in order to undo elementary row operation 1, we must add −c times the i-th row to the j-th row.

Elementary row operation 2 corresponds to:

Q(ij) = \begin{pmatrix} 1 & & & & \\ & 0 & \cdots & 1 & \\ & \vdots & \ddots & \vdots & \\ & 1 & \cdots & 0 & \\ & & & & 1 \end{pmatrix}.   (3.2.5)

This is an n × n matrix whose diagonal is equal to 1, except at the (i, i) and (j, j) elements, where it is 0; the (i, j) and (j, i) elements are equal to 1, and all other elements are equal to 0. The matrix E_4 is of this type. Note that Q(ij) is invertible and

Q(ij)^{-1} = Q(ij).   (3.2.6)

Exchanging two rows can be undone by exchanging the two rows once again.

Elementary row operation 3 corresponds to:

R(i; c) = \begin{pmatrix} 1 & & & & \\ & \ddots & & & \\ & & c & & \\ & & & \ddots & \\ & & & & 1 \end{pmatrix}.   (3.2.7)

This is an n × n matrix whose diagonal is equal to 1, except at position i, where it is equal to c. All off-diagonal elements are 0. The matrices E_3 and E_7 are of this type. Note that

R(i; c)^{-1} = R(i; 1/c).   (3.2.8)

This says that we may undo row operation 3 by multiplying the i-th row by 1/c.


We can successfully solve (3.2.1) if we can sequentially apply elementary row operations E_1, \cdots, E_N to the (augmented) coefficient matrix so that the coefficient matrix is reduced to the identity matrix I:

E_N E_{N-1} \cdots E_2 E_1 (A|b) = (I|b^*).   (3.2.9)

The vector x = b^* is then the solution to our equations, and

A^{-1} = E_N E_{N-1} \cdots E_2 E_1.   (3.2.10)

This suggests the following algorithm to compute the inverse matrix. One successively applies row reduction to the matrix:

(A|I),   (3.2.11)

where I is the n × n identity matrix. Then,

E_N E_{N-1} \cdots E_2 E_1 (A|I) = (I|B),   (3.2.12)

but B = E_N E_{N-1} \cdots E_2 E_1, and therefore B = A^{-1}.
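The row reduction of (A|I) described above translates almost line by line into code. Below is a minimal Gauss-Jordan sketch in Python (assuming numpy; the helper name is ours, and apart from a zero-pivot test there are no numerical safeguards, so this is an illustration rather than production code).

    import numpy as np

    def gauss_jordan_inverse(A):
        """Row-reduce (A|I) to (I|B); then B = A^{-1}, as in (3.2.12)."""
        n = A.shape[0]
        M = np.hstack([A.astype(float), np.eye(n)])  # the matrix (A|I)
        for i in range(n):
            # Row operation 2: swap in a row with a nonzero pivot.
            p = i + np.argmax(np.abs(M[i:, i]))
            if M[p, i] == 0:
                raise ValueError("matrix is not invertible")
            M[[i, p]] = M[[p, i]]
            # Row operation 3: scale the pivot row so the pivot is 1.
            M[i] /= M[i, i]
            # Row operation 1: clear the rest of column i.
            for j in range(n):
                if j != i:
                    M[j] -= M[j, i] * M[i]
        return M[:, n:]

    A = np.array([[1, 1, 1], [2, 5, 7], [1, -1, 3]])
    B = gauss_jordan_inverse(A)
    print(np.allclose(A @ B, np.eye(3)))  # True: B is the matrix B of (3.1.16)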

3.3 A Non-invertible Example

Can any square matrix A be reduced to the identity matrix by elementary row operations? The answer must be no. As we saw above, if this can be done, it automatically implies that A has an inverse. But we already know from our experience in dimension two that there are matrices with no inverse. The question now becomes: how far can we simplify A or (A|b) with elementary row operations? To get an idea of what can be accomplished, let us look at another example. Consider the equation:

\begin{aligned} -y - z &= -2 \\ x + 2y + z &= 4 \\ 3x + 5y + 2z + w &= 11 \\ 2x + 2y - w &= 3 \end{aligned}, \qquad \begin{pmatrix} 0 & -1 & -1 & 0 & -2 \\ 1 & 2 & 1 & 0 & 4 \\ 3 & 5 & 2 & 1 & 11 \\ 2 & 2 & 0 & -1 & 3 \end{pmatrix}.   (3.3.1)

We now perform elementary row operations on the equation. We will only write out the augmented coefficient matrix. Our procedure is the same as before: we try to sequentially introduce zeros, column by column, in the off-diagonal elements.

\begin{pmatrix} 0 & -1 & -1 & 0 & -2 \\ 1 & 2 & 1 & 0 & 4 \\ 3 & 5 & 2 & 1 & 11 \\ 2 & 2 & 0 & -1 & 3 \end{pmatrix} \longrightarrow \begin{pmatrix} 1 & 2 & 1 & 0 & 4 \\ 0 & -1 & -1 & 0 & -2 \\ 3 & 5 & 2 & 1 & 11 \\ 2 & 2 & 0 & -1 & 3 \end{pmatrix} \longrightarrow

\begin{pmatrix} 1 & 2 & 1 & 0 & 4 \\ 0 & -1 & -1 & 0 & -2 \\ 0 & -1 & -1 & 1 & -1 \\ 0 & -2 & -2 & -1 & -5 \end{pmatrix} \longrightarrow \begin{pmatrix} 1 & 0 & -1 & 0 & 0 \\ 0 & 1 & 1 & 0 & 2 \\ 0 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & -1 & -1 \end{pmatrix}.   (3.3.2)

At this point, we are stuck. We would like to proceed further to introduce a 1 in the (3, 3) position and make the (1, 3) and (2, 3) elements zero. But we cannot do so, since both the (3, 3) and (4, 3) elements are zero. We cannot introduce a nonzero element at (3, 3) by exchanging rows. We therefore leave the third column as is and move on to the fourth column:

\begin{pmatrix} 1 & 0 & -1 & 0 & 0 \\ 0 & 1 & 1 & 0 & 2 \\ 0 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & -1 & -1 \end{pmatrix} \longrightarrow \begin{pmatrix} 1 & 0 & -1 & 0 & 0 \\ 0 & 1 & 1 & 0 & 2 \\ 0 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}.   (3.3.3)

Going back to the equations, the above corresponds to

\begin{aligned} x - z &= 0, \\ y + z &= 2, \\ w &= 1, \\ 0 &= 0. \end{aligned}   (3.3.4)

The solution to (3.3.1) is:

\begin{pmatrix} x \\ y \\ z \\ w \end{pmatrix} = c \begin{pmatrix} 1 \\ -1 \\ 1 \\ 0 \end{pmatrix} + \begin{pmatrix} 0 \\ 2 \\ 0 \\ 1 \end{pmatrix},   (3.3.5)

where c is an arbitrary constant. Therefore, (3.3.1) has infinitely many solutions.

What happens for other values of the right hand side b? When the right hand side is the zero vector, a linear equation always has a solution.


It is easy to see that, in this case, the solutions are:

\begin{pmatrix} x \\ y \\ z \\ w \end{pmatrix} = c \begin{pmatrix} 1 \\ -1 \\ 1 \\ 0 \end{pmatrix}.   (3.3.6)

If the right hand side of (3.3.1) is changed to (-2, 4, 11, 4)^T, then the final augmented coefficient matrix will be:

\begin{pmatrix} 1 & 0 & -1 & 0 & 0 \\ 0 & 1 & 1 & 0 & 2 \\ 0 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}.   (3.3.7)

The last row corresponds to the equation:

0 = 1 (3.3.8)

which can never be satisfied. Therefore, in this case, the linear equation has no solution.

3.4 The General n× n Linear Equation

The general situation is as follows. Let us return to (3.2.1). By performing a series of elementary row operations, it is always possible to transform the n × n coefficient matrix A into a row echelon matrix by row reduction. A row echelon matrix is a matrix with the following properties:

1. If row i is all zeros, then all rows below it (rows j > i) are all zeros as well.

2. If row i is not all zeros, then its first non-zero element is 1. This is called a pivot.

3. The elements above a pivot are all 0.

4. If row i + 1 is not all zeros, then the pivot of row i + 1 is to the right of the pivot of row i.

The final coefficient matrix in (3.3.3) is in row echelon form, and the pivots are in boldface:

\begin{pmatrix} \mathbf{1} & 0 & -1 & 0 \\ 0 & \mathbf{1} & 1 & 0 \\ 0 & 0 & 0 & \mathbf{1} \\ 0 & 0 & 0 & 0 \end{pmatrix}.   (3.4.1)


We now define the rank of a matrix.

Definition 3 (Rank of a Matrix). If a matrix A is reduced to a row echelon matrix R, the number of pivots, that is, the number of non-zero rows of the row echelon matrix R, is called the rank of the matrix A.

In the case of the coefficient matrix of (3.3.1), the corresponding row echelon matrix is (3.4.1), and therefore the rank is 3. For this definition to make sense, it has to be shown that a matrix A cannot be reduced to two different row echelon forms with different numbers of pivots. In fact, it turns out that the row echelon matrix for any matrix is unique (we will not prove this fact, but see the discussion below (3.4.4)).

We are now ready to discuss the general solvability of (3.2.1) and the associated homogeneous equation (3.2.2).

Suppose, by row reduction, we have reduced the coefficient matrix A to row echelon form R, and suppose the number of non-zero rows, that is, the rank of A, is equal to r. For the homogeneous equation (3.2.2), x = 0 is always a solution. If the row echelon matrix R is in fact the identity matrix (r = n), then x = 0 is the only solution.

Suppose r < n, and let us consider the homogeneous equation. We only have r equations that are non-zero, and the remaining n − r equations are just 0 = 0. In the case of equation (3.3.1), for example, the rank of the coefficient matrix is 3, and therefore we only have 3 equations, the fourth and last equation being 0 = 0. In the case of (3.3.1) with 0 right hand side, we found that z can be chosen arbitrarily (see (3.3.4)), and the solution to the homogeneous system was given by (3.3.6). There is one arbitrary constant in the solution.

In general, we can read off the solution structure from the row echelon matrix R as follows. Column i of the matrix R corresponds to an unknown $x_i$. Let $i_1 < i_2 < \cdots < i_r$ be the columns with a pivot and $j_1 < j_2 < \cdots < j_{n-r}$ be the columns without a pivot. In the case of (3.4.1), columns 1, 2, 4 correspond to $i_1, i_2, i_3$ and column 3 corresponds to $j_1$. The unknowns $x_{j_1}, \cdots, x_{j_{n-r}}$ can be chosen freely. They will be the arbitrary constants in the solution. The unknowns $x_{i_k}$ will be written in terms of the $x_{j_\ell}$ with $j_\ell > i_k$. In (3.4.1), $x_{j_1} = x_3 = z$ can be chosen arbitrarily. The unknowns $x_{i_1} = x_1 = x$ and $x_{i_2} = x_2 = y$ depend on $x_{j_1} = z$. The unknown $x_{i_3} = x_4 = w$ is simply equal to 0 since there is no $j_\ell$ such that $j_\ell > i_3$. Suppose we let:

$$x_{j_\ell} = c_\ell, \qquad \ell = 1, \cdots, n-r. \qquad (3.4.2)$$

The solution to the homogeneous equation can be written as:

$$\mathbf{x} = c_1\mathbf{a}_1 + \cdots + c_{n-r}\mathbf{a}_{n-r}, \qquad (3.4.3)$$


where the vectors $\mathbf{a}_1, \cdots, \mathbf{a}_{n-r}$ are given as follows. Let $(\mathbf{a}_\ell)_k$ be the k-th component of the vector $\mathbf{a}_\ell$ and $R_{ij}$ be the elements of the row echelon matrix R:

$$(\mathbf{a}_\ell)_k = \begin{cases} -R_{q j_\ell} & \text{if } k = i_q,\ i_q \le j_\ell, \\ 1 & \text{if } k = j_\ell, \\ 0 & \text{otherwise.} \end{cases} \qquad (3.4.4)$$

An important point here is that $\mathbf{a}_1, \cdots, \mathbf{a}_{n-r}$ are linearly independent. This can be seen by noting that $\mathbf{a}_\ell$ is the only vector among the n − r vectors with a non-zero value at the $j_\ell$ position.
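Formula (3.4.4) translates almost verbatim into code. The sketch below builds the vectors $\mathbf{a}_\ell$ from a row echelon matrix R and its pivot columns; it assumes the hypothetical rref helper sketched after (3.4.1), and the function name nullspace_basis is likewise our own:

import numpy as np

def nullspace_basis(R, pivots, m):
    # One vector a_ell per free (non-pivot) column j_ell, following (3.4.4):
    # entry 1 in position j_ell, entry -R[q, j_ell] in each pivot position i_q.
    free_cols = [j for j in range(m) if j not in pivots]
    basis = []
    for j in free_cols:
        a = np.zeros(m)
        a[j] = 1.0
        for q, i_q in enumerate(pivots):
            a[i_q] = -R[q, j]   # automatically 0 when the pivot lies to the right of j
        basis.append(a)
    return basis

# For the row echelon matrix (3.4.1), with pivot columns [0, 1, 3]:
R = np.array([[1., 0., -1., 0.], [0., 1., 1., 0.],
              [0., 0., 0., 1.], [0., 0., 0., 0.]])
print(nullspace_basis(R, [0, 1, 3], 4))   # [array([ 1., -1., 1., 0.])], as in (3.3.6)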

From the expression (3.4.4), it is not too difficult to show that the row echelon matrix must be unique. If two different row echelon matrices were to correspond to the same coefficient matrix A, then we would get two different expressions for the solutions of Ax = 0. One then shows that it is impossible to have two different expressions satisfying both (3.4.3) and (3.4.4). This in turn establishes the fact that the rank is well-defined.

What if b ≠ 0? In the case when r = n, or equivalently, when the coefficient matrix can be reduced to the identity matrix, we know that there is always a solution and that it is unique.

Consider the case when the rank r < n. The solution to the homogeneous equation is given by (3.4.3). There can be two different cases depending on the right hand side b. In the case of (3.3.1), if b = (−2, 4, 11, 3)^T, then the solutions were given by (3.3.5), whereas if b = (−2, 4, 11, 4)^T, then there was no solution.

Suppose, after row reduction, the augmented coefficient matrix (A|b) has been reduced to (R|b*). If the last n − r elements of b* are all 0, then there is a solution. If not, there is no solution. It is easy to see that any choice of b* can be obtained starting from some right hand side b: to produce such a b, we have only to apply the elementary row operations on b* in reverse.

Suppose the last n − r elements of b* are zero, so that there is a solution. Take one solution, say x_0:

$$A\mathbf{x}_0 = \mathbf{b}. \qquad (3.4.5)$$

Any other solution x must satisfy Ax = b, so we have

$$A(\mathbf{x} - \mathbf{x}_0) = \mathbf{0}. \qquad (3.4.6)$$

Therefore, x − x_0 satisfies the homogeneous equation, whose solution is given by (3.4.3). The solutions to (3.2.1) can be expressed, in this case, as:

$$\mathbf{x} = \mathbf{x}_0 + c_1\mathbf{a}_1 + \cdots + c_{n-r}\mathbf{a}_{n-r}. \qquad (3.4.7)$$


We may now state what we found as a theorem.

Theorem 6. The following statements are equivalent for an n × n matrix A.

1. The matrix A can be reduced to the identity matrix through row reduction.

2. The matrix A has an inverse.

3. The equation Ax = 0 has a unique solution.

4. The equation Ax = b has a solution for every b.

Proof. We have already seen that item 1 implies item 2. Given item 2, Ax = 0 has a unique solution because:

$$A^{-1}A\mathbf{x} = \mathbf{x} = A^{-1}\mathbf{0} = \mathbf{0}. \qquad (3.4.8)$$

Likewise, if item 2 holds, Ax = b has a (unique) solution x = A^{-1}b for every b. Suppose item 3 is true but item 1 is not true. Then, the matrix A will be reduced to a row echelon matrix that is not the identity matrix. In this case, as we argued above, Ax = 0 will have infinitely many solutions parametrized by a number of constants. Therefore, item 3 implies item 1. In much the same way, item 4 implies item 1: if the row echelon matrix is not the identity, its last row is all zeros, and we saw above that there are then right hand sides b for which there is no solution.

An equivalent statement is the following.

Theorem 7. The following statements are equivalent for an n × n matrix A:

1. The matrix A reduces to a row echelon form that is not the identity matrix.

2. The matrix A does not have an inverse.

3. The equation Ax = 0 has infinitely many solutions.

4. There are vectors b for which the equation Ax = b does not have a solution.
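As a quick numerical illustration of Theorems 6 and 7 (a sketch only, using numpy's built-in routines rather than the row reduction sketched earlier): the coefficient part of the matrix in (3.3.3) has rank 3 < 4, so it cannot be reduced to the identity, and the inversion routine rejects it.

import numpy as np

A = np.array([[1., 0., -1., 0.],
              [0., 1., 1., 0.],
              [0., 0., 0., 1.],
              [0., 0., 0., -1.]])    # coefficient part of (3.3.3)
print(np.linalg.matrix_rank(A))      # 3, not 4: item 1 of Theorem 7 holds
try:
    np.linalg.inv(A)
except np.linalg.LinAlgError:
    print("A is singular")           # item 2: no inverse exists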


3.5 n Linear Equations in m Unknowns

The foregoing discussion on the rank and the row echelon form can be applied to n linear equations in m unknowns, where n and m may not necessarily be equal:

$$A\mathbf{x} = \mathbf{b}, \quad A = \begin{pmatrix} a_{11} & \cdots & a_{1m} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nm} \end{pmatrix}, \quad \mathbf{x} = \begin{pmatrix} x_1 \\ \vdots \\ x_m \end{pmatrix}, \quad \mathbf{b} = \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix}. \qquad (3.5.1)$$

To solve this linear system, we apply the elementary row operations and reduce to row echelon form. The resulting solution can be obtained in exactly the same way. Let us look at an example. Consider:

$$\begin{pmatrix} 1 & 2 & 3 \\ 1 & -1 & 0 \\ -1 & 3 & 2 \\ 5 & 3 & 8 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}. \qquad (3.5.2)$$

After some computation, we see that the reduced row echelon form of the coefficient matrix is:

$$\begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}. \qquad (3.5.3)$$

Therefore, the solution to the above homogeneous equation is given by:

$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = c \begin{pmatrix} -1 \\ -1 \\ 1 \end{pmatrix}. \qquad (3.5.4)$$

Let us take another example:

$$\begin{pmatrix} 1 & 2 & 3 & -1 & 1 \\ -1 & 3 & 2 & -4 & 3 \\ 2 & -2 & 0 & 4 & 2 \\ 3 & 3 & 6 & 0 & 0 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}. \qquad (3.5.5)$$

The row echelon matrix is given by:

$$\begin{pmatrix} 1 & 0 & 1 & 1 & 0 \\ 0 & 1 & 1 & -1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}. \qquad (3.5.6)$$


Therefore, the solution is given by:

$$\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{pmatrix} = c_1 \begin{pmatrix} -1 \\ -1 \\ 1 \\ 0 \\ 0 \end{pmatrix} + c_2 \begin{pmatrix} -1 \\ 1 \\ 0 \\ 1 \\ 0 \end{pmatrix}. \qquad (3.5.7)$$

In the above, we only considered homogeneous equations, but the results are the same for non-homogeneous equations. Let us put these observations into a theorem.

Theorem 8. Suppose we have n linear equations in m unknowns (3.5.1). Suppose the augmented coefficient matrix (A|b) has been reduced to row echelon form (R|b*). Let r be the number of pivots of R (or equivalently, the rank of A).

1. If b = 0, the solution can be written as:

$$\mathbf{x} = c_1\mathbf{a}_1 + \cdots + c_{m-r}\mathbf{a}_{m-r}, \qquad (3.5.8)$$

where the $\mathbf{a}_\ell$ are given by (3.4.4). In particular, the $\mathbf{a}_\ell$ are linearly independent.

2. If the last n − r elements of b* are all equal to 0, then (3.5.1) has a solution. If x_0 is a particular solution, then the general solution is given by:

$$\mathbf{x} = \mathbf{x}_0 + c_1\mathbf{a}_1 + \cdots + c_{m-r}\mathbf{a}_{m-r}. \qquad (3.5.9)$$

3. If not all of the last n − r elements of b* are zero, then there is no solution.
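The case analysis of Theorem 8 packages neatly into a small solver. The sketch below reuses the hypothetical rref and nullspace_basis helpers from the earlier sketches; it is meant to mirror the theorem, not to be a robust numerical method:

import numpy as np

def solve_general(A, b, tol=1e-12):
    # Row-reduce the augmented matrix (A|b) to (R|b*) and read off the answer.
    n, m = A.shape
    Rb, pivots = rref(np.hstack([A, b.reshape(n, 1)]), tol)
    if m in pivots:             # pivot in the b* column: some row reads 0 = 1
        return None             # item 3: no solution
    R, bstar = Rb[:, :m], Rb[:, m]
    x0 = np.zeros(m)            # particular solution: set all free variables to 0
    for q, i_q in enumerate(pivots):
        x0[i_q] = bstar[q]
    # items 1 and 2: the general solution is x0 plus the span of the basis vectors
    return x0, nullspace_basis(R, pivots, m)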

Let us also state a simple but useful consequence of the above theorem.

Proposition 1. Consider (3.5.1) with n linear equations in m unknowns, and let b = 0. If m > n, there is a nontrivial solution (a solution x ≠ 0) to this system.

Proof. By the form of the row echelon matrix, the rank r of A is always smaller than or equal to n. Therefore, r ≤ n < m. By item 1 of Theorem 8, there must be a non-trivial solution, since there is at least one arbitrary constant (m − r ≥ 1).
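For instance (a small check with the hypothetical helpers sketched earlier), a 2 × 3 homogeneous system always has a free column and hence a nontrivial solution:

import numpy as np

A = np.array([[2., 2., 5.],
              [1., 1., -2.]])           # 2 equations, 3 unknowns
R, pivots = rref(A)                     # the rank is at most 2 < 3
print(nullspace_basis(R, pivots, 3))    # [array([-1., 1., 0.])]: nontrivial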


3.6 Exercises

1. Put the following matrices in row echelon form. Write out all the elementary row operations you used in terms of matrices.

$$\begin{pmatrix} 3 & 2 & -3 \\ 2 & -1 & 1 \\ 1 & 1 & 1 \end{pmatrix}, \quad \begin{pmatrix} 4 & 8 & -12 \\ 2 & 2 & -2 \\ 5 & -5 & 5 \end{pmatrix},$$

$$\begin{pmatrix} 4 & -12 & 32 & 4 \\ 1 & -1 & -1 & 1 \\ 1 & 10 & 0 & 1 \end{pmatrix}, \quad \begin{pmatrix} 4 & -8 & 16 \\ 1 & -3 & 6 \\ 2 & 1 & 1 \end{pmatrix}, \quad \begin{pmatrix} 2 & 2 & 5 & 6 \\ 1 & 1 & -2 & 2 \end{pmatrix}$$

2. Find the inverse matrix, if any, of the following matrices.

$$\begin{pmatrix} -3 & 2 & 2 \\ -2 & 2 & 1 \\ 2 & -1 & -1 \end{pmatrix}, \quad \begin{pmatrix} 2 & 1 & -1 \\ 1 & -2 & 2 \\ 2 & 1 & 1 \end{pmatrix}, \quad \begin{pmatrix} 3 & -4 & -3 \\ -2 & 2 & 1 \\ 2 & -1 & 0 \end{pmatrix}$$

3. Find the solution to the equations Ax = 0 where A is given by the matrices in Exercise 1.

4. Find the solution to the following system of equations.

$$\begin{pmatrix} 1 & -3 & 1 \\ 2 & 1 & 3 \\ 1 & 5 & 5 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 2 \\ 3 \\ 2 \end{pmatrix}, \quad \begin{pmatrix} 1 & 2 & 3 & 4 \\ 2 & 3 & 4 & 0 \\ 3 & 4 & 0 & 1 \\ 4 & 0 & 1 & 2 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \\ w \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \\ 2 \\ 3 \end{pmatrix}, \quad \begin{pmatrix} 1 & 2 & 2 \\ 2 & 1 & 1 \\ 3 & 2 & 2 \\ 0 & 1 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \\ 3 \\ 0 \end{pmatrix}$$

5. Given two square matrices A and B, suppose AB is invertible. Show that A and B must both be invertible. (Hint: use the fact that, if A or B is not invertible, Ax = 0 or Bx = 0 must have a non-trivial solution.)

6. Using the above, show that, if AB = I for two square matrices, then BA = I as well.


7. Suppose an n × n matrix A is such that the sum of the elements in each row is equal to 0. That is to say, if the elements of the matrix A are given by $a_{ij}$, then $\sum_{j=1}^{n} a_{ij} = 0$ for every i. Show that the matrix is not invertible. (Hint: Let v = (1, · · · , 1)^T. Compute Av.)


Chapter 4

Linear Independence and Linear Transformations

4.1 Linear Independence, Subspaces and Dimension

Suppose we are given vectors $\mathbf{v}_1, \cdots, \mathbf{v}_m$ in $\mathbb{R}^n$. We would like to know whether these vectors are linearly independent. This amounts to asking whether

$$x_1\mathbf{v}_1 + \cdots + x_m\mathbf{v}_m = \mathbf{0} \qquad (4.1.1)$$

has a non-trivial solution (the trivial solution is $x_1 = \cdots = x_m = 0$; anything else is a non-trivial solution). Let A be the n × m matrix whose column vectors are given by the $\mathbf{v}_\ell$. Then, the above question is equivalent to asking whether

$$A\mathbf{x} = \mathbf{0}, \qquad \mathbf{x} = (x_1, \cdots, x_m)^T \qquad (4.1.2)$$

has a non-trivial solution. This is nothing other than (3.5.1). Theorem 8 tells us exactly when there are non-trivial solutions. If the rank r is equal to m, then there is only the trivial solution, x = 0, and thus, the vectors $\mathbf{v}_1, \cdots, \mathbf{v}_m$ are linearly independent. Otherwise, the vectors are linearly dependent.

We state the above observation as a proposition.

Proposition 2. Let $\mathbf{v}_1, \cdots, \mathbf{v}_m$ be vectors in $\mathbb{R}^n$, and let A be the n × m matrix whose column vectors are given by $\mathbf{v}_1, \cdots, \mathbf{v}_m$. If the rank r of A is equal to m, then the vectors $\mathbf{v}_1, \cdots, \mathbf{v}_m$ are linearly independent. If not, the vectors are linearly dependent.
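Proposition 2 is also an effective computational test. A minimal sketch (here using numpy's built-in rank routine in place of counting pivots by hand):

import numpy as np

def linearly_independent(vectors):
    # Stack the vectors as the columns of an n x m matrix and
    # compare its rank with m, exactly as in Proposition 2.
    A = np.column_stack(vectors)
    return np.linalg.matrix_rank(A) == len(vectors)

print(linearly_independent([np.array([1., 2., 4.]),
                            np.array([1., 1., 0.])]))    # True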


As an immediate consequence of Proposition 2, we have the following.

Proposition 3. It is impossible to have more than n linearly independent vectors in $\mathbb{R}^n$.

Proof. Suppose we have m > n linearly independent vectors. This implies that the n × m matrix A formed by these vectors must have rank m, by Proposition 2. But the row echelon matrix is an n × m matrix, and therefore can have at most n pivot columns. Therefore, its rank is at most n. Since m > n, this is a contradiction.

Another consequence of Proposition 2 is the following.

Theorem 9. For an n × n matrix A, the following statements are equivalent.

1. A is invertible.

2. The column vectors of A are linearly independent.

3. The row vectors of A are linearly independent.

Proof. Item 2 is equivalent to the statement that

$$A\mathbf{x} = \mathbf{0} \qquad (4.1.3)$$

has only the trivial solution. The equivalence of item 1 and item 2 thus follows from Theorem 6. Since the row vectors of A are the column vectors of $A^T$, item 3 is equivalent to $A^T$ being invertible. So we have only to show that the invertibility of A is equivalent to the invertibility of $A^T$. Suppose A is invertible. Then, there is a matrix B such that

$$AB = BA = I. \qquad (4.1.4)$$

Let us now take the transpose of the above, and use (1.4.3):

$$B^T A^T = A^T B^T = I^T = I. \qquad (4.1.5)$$

We see therefore that $B^T$ is the inverse of $A^T$, and therefore $A^T$ is invertible. Since the transpose of $A^T$ is A, we can repeat the same argument to show that the invertibility of $A^T$ implies the invertibility of A.

Now, suppose the rank r of A in (4.1.2) is smaller than m. Let us look at the situation a little further. As we did in Section 3.4, we label the columns with pivots as $i_1 < i_2 < \cdots < i_r$ and the columns without pivots as


$j_1 < j_2 < \cdots < j_{m-r}$. According to Theorem 8, the solution to (4.1.2) can be written as:

$$\mathbf{x} = c_1\mathbf{a}_1 + \cdots + c_{m-r}\mathbf{a}_{m-r}. \qquad (4.1.6)$$

Take any $\mathbf{a}_\ell$, $\ell = 1, \cdots, m-r$. Then $\mathbf{x} = \mathbf{a}_\ell$ is a solution to (4.1.2), and this implies that the vector $\mathbf{v}_{j_\ell}$ can be written as a linear combination of the $\mathbf{v}_{i_q}$ with $i_q < j_\ell$. On the other hand, the vectors $\mathbf{v}_{i_1}, \cdots, \mathbf{v}_{i_r}$ are linearly independent. Indeed, if $\mathbf{v}_{i_1}, \cdots, \mathbf{v}_{i_r}$ were linearly dependent, there would be a nontrivial solution to (4.1.2) such that $x_{j_\ell} = 0$ for all $j_\ell$. But $x_{j_\ell} = 0$ implies $c_\ell = 0$ in (4.1.6), a contradiction. Let us put this into a proposition.

Proposition 4. Suppose we have an n × m matrix A whose rank is r, and suppose we reduce A to row echelon form R. The r column vectors of A corresponding to the pivot columns of R are linearly independent, and the rest of the column vectors of A can be written as linear combinations of these r vectors.

Example 8. Consider the column vectors of the 4 × 5 matrix (3.5.5). The rank of this matrix is 3, as can be seen by (3.5.6). Looking at the row echelon form of this matrix (3.5.6), we see that the column vectors 1, 2, 5 are linearly independent:

$$\mathbf{v}_1 = \begin{pmatrix} 1 \\ -1 \\ 2 \\ 3 \end{pmatrix}, \quad \mathbf{v}_2 = \begin{pmatrix} 2 \\ 3 \\ -2 \\ 3 \end{pmatrix}, \quad \mathbf{v}_5 = \begin{pmatrix} 1 \\ 3 \\ 2 \\ 0 \end{pmatrix}. \qquad (4.1.7)$$

The column vectors $\mathbf{v}_3$ and $\mathbf{v}_4$ are expressed as:

$$\mathbf{v}_3 = \mathbf{v}_1 + \mathbf{v}_2, \qquad \mathbf{v}_4 = \mathbf{v}_1 - \mathbf{v}_2. \qquad (4.1.8)$$
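These relations are easy to confirm numerically (a quick check in the same sketch style as before; the column vectors are read off from (3.5.5)):

import numpy as np

v1 = np.array([1., -1., 2., 3.])
v2 = np.array([2., 3., -2., 3.])
v3 = np.array([3., 2., 0., 6.])     # column 3 of (3.5.5)
v4 = np.array([-1., -4., 4., 0.])   # column 4 of (3.5.5)
print(np.allclose(v3, v1 + v2), np.allclose(v4, v1 - v2))   # True True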

To proceed further, let us introduce the notion of a subspace.

Definition 4 (Subspace of $\mathbb{R}^n$). A subspace V of $\mathbb{R}^n$ is a subset of $\mathbb{R}^n$ with the following two properties.

1. For a vector v ∈ V and an arbitrary scalar c, cv also belongs to V.

2. For two vectors v, w ∈ V, v + w is also in V.

Given m vectors $\mathbf{v}_1, \cdots, \mathbf{v}_m$, the set of all vectors:

$$c_1\mathbf{v}_1 + \cdots + c_m\mathbf{v}_m \qquad (4.1.9)$$

forms a subspace. If all vectors in a subspace V can be written as linear combinations of vectors $\mathbf{v}_1, \cdots, \mathbf{v}_m$, we say that the vectors $\mathbf{v}_1, \cdots, \mathbf{v}_m$ span V.


Proposition 5. Every subspace V of $\mathbb{R}^n$ is spanned by a finite number of linearly independent vectors.

Proof. If the subspace consists of just the 0 vector, there is nothing to prove. Suppose otherwise. Pick a non-zero vector $\mathbf{v}_1$ in V. Consider the span:

$$c_1\mathbf{v}_1, \qquad c_1 \in \mathbb{R}. \qquad (4.1.10)$$

If this spans all of V, we are done. If not, there must be a vector $\mathbf{v}_2$ that cannot be expressed in the above form. Therefore, $\mathbf{v}_1$ and $\mathbf{v}_2$ are linearly independent, and every vector

$$c_1\mathbf{v}_1 + c_2\mathbf{v}_2, \qquad c_1, c_2 \in \mathbb{R} \qquad (4.1.11)$$

must belong to V. If this spans all of V, we are done. If not, we add another vector $\mathbf{v}_3$ not expressible as above, which is thus linearly independent of the rest. This process has to stop before we add the (n+1)-st vector, since there are at most n linearly independent vectors in $\mathbb{R}^n$, according to Proposition 3.

Definition 5 (Basis). Suppose a subspace V of $\mathbb{R}^n$ is spanned by linearly independent vectors $\mathbf{v}_1, \cdots, \mathbf{v}_m$. We say that such vectors are a set of basis vectors of V.

Proposition 5 thus states that every subspace has a basis.

Example 9. Consider the following subset V of $\mathbb{R}^3$:

$$c_1 \begin{pmatrix} 1 \\ 2 \\ 4 \end{pmatrix} + c_2 \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}, \qquad (4.1.12)$$

where $c_1$ and $c_2$ are arbitrary constants. This is a subspace of $\mathbb{R}^3$ spanned by $(1, 2, 4)^T$ and $(1, 1, 0)^T$. The two vectors are linearly independent, and therefore, the two vectors form a basis of V and the dimension of V is 2. It is also possible to express the same subspace as:

$$c_1 \begin{pmatrix} 2 \\ 3 \\ 4 \end{pmatrix} + c_2 \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}. \qquad (4.1.13)$$

There are thus many different ways of expressing the same subspace. It is also true that V can be expressed as:

$$c_1 \begin{pmatrix} 1 \\ 2 \\ 4 \end{pmatrix} + c_2 \begin{pmatrix} 2 \\ 3 \\ 4 \end{pmatrix} + c_3 \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}, \qquad (4.1.14)$$


but in this case, the three vectors are not linearly independent.

As we have seen above, there are various choices for the basis vectors of a subspace, but the number of basis vectors is always the same.

Proposition 6. Two sets of basis vectors of a subspace always have the same number of vectors.

Proof. Suppose otherwise. Then, there are basis vectors $\mathbf{v}_1, \cdots, \mathbf{v}_m$ and $\mathbf{w}_1, \cdots, \mathbf{w}_q$ with m ≠ q. Suppose m < q. Then, each vector $\mathbf{w}_k$ can be written as:

$$\mathbf{w}_k = a_{1k}\mathbf{v}_1 + a_{2k}\mathbf{v}_2 + \cdots + a_{mk}\mathbf{v}_m, \qquad (4.1.15)$$

where the $a_{jk}$ are scalar constants. To examine the linear independence of the $\mathbf{w}_k$, we must examine the expression

$$\sum_{k=1}^{q} x_k \mathbf{w}_k = \sum_{j=1}^{m} \left( \sum_{k=1}^{q} a_{jk} x_k \right) \mathbf{v}_j = \mathbf{0}, \qquad (4.1.16)$$

where the $x_k$ are scalars. Since the $\mathbf{v}_j$ are linearly independent, we have:

$$\sum_{k=1}^{q} a_{jk} x_k = 0 \quad \text{for } j = 1, \cdots, m. \qquad (4.1.17)$$

This is a linear homogeneous equation with m equations in q unknowns $x_1, \cdots, x_q$. Since m < q, by Proposition 1 there is a non-trivial solution. This contradicts the assumption that $\mathbf{w}_1, \cdots, \mathbf{w}_q$ are linearly independent. The case q < m can be handled in exactly the same manner.

The above proposition allows us to define the dimension of a subspace.

Definition 6. The dimension of a subspace V in $\mathbb{R}^n$ is the number of basis vectors of the subspace.

Example 10. The solution set (3.5.7) of the homogeneous equation (3.5.5) is a subspace of $\mathbb{R}^5$: it is spanned by the two linearly independent vectors appearing in (3.5.7), which therefore form a basis. The dimension of this subspace is 2.

4.2 Linear Transformations

An n × n matrix A can be seen as a map from $\mathbb{R}^n$ to $\mathbb{R}^n$. This is called a linear transformation. We define two important concepts for a linear transformation.


Definition 7 (Kernel and Image). The kernel or nullspace of a matrix A is the set of vectors $\mathbf{v} \in \mathbb{R}^n$ that satisfy:

$$A\mathbf{v} = \mathbf{0}. \qquad (4.2.1)$$

The kernel of A is written as ker A. The image of a matrix A is the set of vectors $\mathbf{v} \in \mathbb{R}^n$ for which

$$A\mathbf{x} = \mathbf{v} \qquad (4.2.2)$$

has a solution x. The image of A is written as Im A.

Both the kernel and the image are subspaces of $\mathbb{R}^n$. This can be seen as follows. Suppose v and w are in ker A. Then,

$$A(c\mathbf{v}) = cA\mathbf{v} = \mathbf{0}, \qquad A(\mathbf{v} + \mathbf{w}) = A\mathbf{v} + A\mathbf{w} = \mathbf{0}. \qquad (4.2.3)$$

Therefore, cv and v + w are in the kernel of A. Take two vectors v and w in the image of A. This means that there are vectors x and y such that

$$A\mathbf{x} = \mathbf{v}, \qquad A\mathbf{y} = \mathbf{w}. \qquad (4.2.4)$$

Therefore, we have

$$A(c\mathbf{x}) = cA\mathbf{x} = c\mathbf{v}, \qquad A(\mathbf{x} + \mathbf{y}) = A\mathbf{x} + A\mathbf{y} = \mathbf{v} + \mathbf{w}, \qquad (4.2.5)$$

so that cv and v + w are also in the image of A.

Since the kernel and image are both subspaces of $\mathbb{R}^n$, we can consider their dimension.

Definition 8. The rank of an n × n matrix A, as defined in Definition 3, is equal to the dimension of the image of A.

The main result of this chapter is the following.

Theorem 10. Suppose A is an n × n square matrix. Then,

$$\operatorname{rank} A + \dim \ker A = n, \qquad (4.2.6)$$

where $\dim \ker A$ is the dimension of the kernel of A.

Proof. Let r be the rank of A. By Theorem 8, the solutions of Ax = 0 are exactly the linear combinations of the n − r linearly independent vectors $\mathbf{a}_1, \cdots, \mathbf{a}_{n-r}$ of (3.4.4). These vectors therefore form a basis of ker A, so that $\dim \ker A = n - r$. On the other hand, by Proposition 4, the r column vectors of A corresponding to the pivot columns are linearly independent, and every column of A is a linear combination of them. Since the image of A consists of all linear combinations of the columns of A, these r vectors form a basis of Im A, so that the dimension of the image is r. Adding the two dimensions gives (4.2.6).
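A numerical sanity check of (4.2.6) is also straightforward (a sketch only; numpy's rank routine stands in for the pivot count, and the kernel dimension is counted independently via the singular values):

import numpy as np

A = np.array([[1., 0., -1., 0.],
              [0., 1., 1., 0.],
              [0., 0., 0., 1.],
              [0., 0., 0., -1.]])
rank = np.linalg.matrix_rank(A)                                     # 3
dim_ker = int(np.sum(np.linalg.svd(A, compute_uv=False) < 1e-12))   # 1
print(rank + dim_ker == A.shape[0])                                 # True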


Chapter 5

The Determinant

Given an n × n matrix A, we would like to define its determinant. We already have a definition for the 2 × 2 matrix. We define the determinant of an n × n matrix recursively.
