Part IB - Linear Algebra - Maths Lecture Notes · 0 Introduction IB Linear Algebra 0 Introduction In IA Vectors and Matrices, we have learnt about vectors (and matrices) in a rather

Part IB — Linear Algebra

Based on lectures by S. J. WadsleyNotes taken by Dexter Chua

Michaelmas 2015

These notes are not endorsed by the lecturers, and I have modified them (oftensignificantly) after lectures. They are nowhere near accurate representations of what

was actually lectured, and in particular, all errors are almost surely mine.

Definition of a vector space (over R or C), subspaces, the space spanned by a subset.Linear independence, bases, dimension. Direct sums and complementary subspaces.

[3]

Linear maps, isomorphisms. Relation between rank and nullity. The space of linearmaps from U to V , representation by matrices. Change of basis. Row rank and columnrank. [4]

Determinant and trace of a square matrix. Determinant of a product of two matricesand of the inverse matrix. Determinant of an endomorphism. The adjugate matrix. [3]

Eigenvalues and eigenvectors. Diagonal and triangular forms. Characteristic andminimal polynomials. Cayley-Hamilton Theorem over C. Algebraic and geometricmultiplicity of eigenvalues. Statement and illustration of Jordan normal form. [4]

Dual of a finite-dimensional vector space, dual bases and maps. Matrix representation,rank and determinant of dual map. [2]

Bilinear forms. Matrix representation, change of basis. Symmetric forms and their linkwith quadratic forms. Diagonalisation of quadratic forms. Law of inertia, classificationby rank and signature. Complex Hermitian forms. [4]

Inner product spaces, orthonormal sets, orthogonal projection, V = W ⊕W⊥. Gram-

Schmidt orthogonalisation. Adjoints. Diagonalisation of Hermitian matrices. Orthogo-

nality of eigenvectors and properties of eigenvalues. [4]

1

Contents IB Linear Algebra

Contents

0 Introduction 3

1 Vector spaces 41.1 Definitions and examples . . . . . . . . . . . . . . . . . . . . . . . 41.2 Linear independence, bases and the Steinitz exchange lemma . . 61.3 Direct sums . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

2 Linear maps 152.1 Definitions and examples . . . . . . . . . . . . . . . . . . . . . . . 152.2 Linear maps and matrices . . . . . . . . . . . . . . . . . . . . . . 192.3 The first isomorphism theorem and the rank-nullity theorem . . . 212.4 Change of basis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242.5 Elementary matrix operations . . . . . . . . . . . . . . . . . . . . 27

3 Duality 293.1 Dual space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293.2 Dual maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

4 Bilinear forms I 37

5 Determinants of matrices 41

6 Endomorphisms 496.1 Invariants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 496.2 The minimal polynomial . . . . . . . . . . . . . . . . . . . . . . . 53

6.2.1 Aside on polynomials . . . . . . . . . . . . . . . . . . . . 536.2.2 Minimal polynomial . . . . . . . . . . . . . . . . . . . . . 54

6.3 The Cayley-Hamilton theorem . . . . . . . . . . . . . . . . . . . 576.4 Multiplicities of eigenvalues and Jordan normal form . . . . . . . 63

7 Bilinear forms II 727.1 Symmetric bilinear forms and quadratic forms . . . . . . . . . . . 727.2 Hermitian form . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80

8 Inner product spaces 838.1 Definitions and basic properties . . . . . . . . . . . . . . . . . . . 838.2 Gram-Schmidt orthogonalization . . . . . . . . . . . . . . . . . . 868.3 Adjoints, orthogonal and unitary maps . . . . . . . . . . . . . . . 898.4 Spectral theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92

2

0 Introduction IB Linear Algebra

0 Introduction

In IA Vectors and Matrices, we have learnt about vectors (and matrices) in arather applied way. A vector was just treated as a “list of numbers” representinga point in space. We used these to represent lines, conics, planes and manyother geometrical notions. A matrix is treated as a “physical operation” onvectors that stretches and rotates them. For example, we studied the propertiesof rotations, reflections and shears of space. We also used matrices to expressand solve systems of linear equations. We mostly took a practical approach inthe course.

In IB Linear Algebra, we are going to study vectors in an abstract setting.Instead of treating vectors as “lists of numbers”, we view them as things wecan add and scalar-multiply. We will write down axioms governing how theseoperations should behave, just like how we wrote down the axioms of grouptheory. Instead of studying matrices as an array of numbers, we instead look atlinear maps between vector spaces abstractly.

In the course, we will, of course, prove that this abstract treatment of linearalgebra is just “the same as” our previous study of “vectors as a list of numbers”.Indeed, in certain cases, results are much more easily proved by working withmatrices (as an array of numbers) instead of abstract linear maps, and we don’tshy away from doing so. However, most of the time, looking at these abstractlywill provide a much better fundamental understanding of how things work.

3

1 Vector spaces IB Linear Algebra

1 Vector spaces

1.1 Definitions and examples

Notation. We will use F to denote an arbitrary field, usually R or C.

Intuitively, a vector space V over a field F (or an F-vector space) is a spacewith two operations:

– We can add two vectors v1,v2 ∈ V to obtain v1 + v2 ∈ V .

– We can multiply a scalar λ ∈ F with a vector v ∈ V to obtain λv ∈ V .

Of course, these two operations must satisfy certain axioms before we cancall it a vector space. However, before going into these details, we first look at afew examples of vector spaces.

Example.

(i) Rn = {column vectors of length n with coefficients in R} with the usualaddition and scalar multiplication is a vector space.

An m× n matrix A with coefficients in R can be viewed as a linear mapfrom Rm to Rn via v 7→ Av.

This is a motivational example for vector spaces. When confused aboutdefinitions, we can often think what the definition means in terms of Rnand matrices to get some intuition.

(ii) Let X be a set and define RX = {f : X → R} with addition (f + g)(x) =f(x) + g(x) and scalar multiplication (λf)(x) = λf(x). This is a vectorspace.

More generally, if V is a vector space, X is a set, we can define V X = {f :X → V } with addition and scalar multiplication as above.

(iii) Let [a, b] ⊆ R be a closed interval, then

C([a, b],R) = {f ∈ R[a,b] : f is continuous}

is a vector space, with operations as above. We also have

C∞([a, b],R) = {f ∈ R[a,b] : f is infinitely differentiable}

(iv) The set of m× n matrices with coefficients in R is a vector space, usingcomponentwise addition and scalar multiplication, is a vector space.

Of course, we cannot take a random set, define some random operations calledaddition and scalar multiplication, and call it a vector space. These operationshave to behave sensibly.

Definition (Vector space). An F-vector space is an (additive) abelian group Vtogether with a function F× V → V , written (λ,v) 7→ λv, such that

(i) λ(µv) = λµv for all λ, µ ∈ F, v ∈ V (associativity)

(ii) λ(u + v) = λu + λv for all λ ∈ F, u,v ∈ V (distributivity in V )

4


(iii) (λ+ µ)v = λv + µv for all λ, µ ∈ F, v ∈ V (distributivity in F)

(iv) 1v = v for all v ∈ V (identity)

We always write 0 for the additive identity in V , and call this the identity.By abuse of notation, we also write 0 for the trivial vector space {0}.

In a general vector space, there is no notion of “coordinates”, length, angleor distance. For example, it would be difficult to assign these quantities to thevector space of real-valued continuous functions in [a, b].

From the axioms, there are a few results we can immediately prove.

Proposition. In any vector space V , 0v = 0 for all v ∈ V , and (−1)v = −v,where −v is the additive inverse of v.

Proof is left as an exercise.In mathematics, whenever we define “something”, we would also like to define

a “sub-something”. In the case of vector spaces, this is a subspace.

Definition (Subspace). If V is an F-vector space, then U ⊆ V is an (F-linear)subspace if

(i) u,v ∈ U implies u + v ∈ U .

(ii) u ∈ U, λ ∈ F implies λu ∈ U .

(iii) 0 ∈ U .

These conditions can be expressed more concisely as “U is non-empty and ifλ, µ ∈ F,u,v ∈ U , then λu + µv ∈ U”.

Alternatively, U is a subspace of V if it is itself a vector space, inheriting theoperations from V .

We sometimes write U ≤ V if U is a subspace of V .

Example.

(i) {(x1, x2, x3) ∈ R3 : x1 + x2 + x3 = t} is a subspace of R3 iff t = 0.

(ii) Let X be a set. We define the support of f in FX to be supp(f) ={x ∈ X : f(x) 6= 0}. Then the set of functions with finite supportforms a vector subspace. This is since supp(f + g) ⊆ supp(f) ∪ supp(g),supp(λf) = supp(f) (for λ 6= 0) and supp(0) = ∅.

If we have two subspaces U and V , there are several things we can do withthem. For example, we can take the intersection U ∩ V . We will shortly showthat this will be a subspace. However, taking the union will in general notproduce a vector space. Instead, we need the sum:

Definition (Sum of subspaces). Suppose U,W are subspaces of an F vectorspace V . The sum of U and V is

U +W = {u + w : u ∈ U,w ∈W}.

Proposition. Let U,W be subspaces of V . Then U + W and U ∩ W aresubspaces.

5


Proof. Let ui + wi ∈ U +W , λ, µ ∈ F. Then

λ(u1 + w1) + µ(u2 + w2) = (λu1 + µu2) + (λw1 + µw2) ∈ U +W.

Similarly, if vi ∈ U ∩ W , then λv1 + µv2 ∈ U and λv1 + µv2 ∈ W . Soλv1 + µv2 ∈ U ∩W .

Both U ∩W and U +W contain 0, and are non-empty. So done.

In addition to sub-somethings, we often have quotient-somethings as well.

Definition (Quotient spaces). Let V be a vector space, and U ⊆ V a subspace.Then the quotient group V/U can be made into a vector space called the quotientspace, where scalar multiplication is given by (λ,v + U) = (λv) + U .

This is well defined since if v + U = w + U ∈ V/U , then v −w ∈ U . Hencefor λ ∈ F, we have λv − λw ∈ U . So λv + U = λw + U .

1.2 Linear independence, bases and the Steinitz exchangelemma

Recall that in Rn, we had the “standard basis” made of vectors of the formei = (0, · · · , 0, 1, 0, · · · , 0), with 1 in the ith component and 0 otherwise. Wecall this a basis because everything in Rn can be (uniquely) written as a sumof (scalar multiples of) these basis elements. In other words, the whole Rn isgenerated by taking sums and multiples of the basis elements.

We would like to capture this idea in general vector spaces. The mostimportant result in this section is to prove that for any vector space V , any twobasis must contain the same number of elements. This means we can define the“dimension” of a vector space as the number of elements in the basis.

While this result sounds rather trivial, it is a very important result. We willin fact prove a slightly stronger statement than what was stated above, and thisensures that the dimension of a vector space is well-behaved. For example, thesubspace of a vector space has a smaller dimension than the larger space (atleast when the dimensions are finite).

This is not the case when we study modules in IB Groups, Rings and Modules,which are generalizations of vector spaces. Not all modules have basis, whichmakes it difficult to define the dimension. Even for those that have basis, thebehaviour of the “dimension” is complicated when, say, we take submodules.The existence and well-behavedness of basis and dimension is what makes linearalgebra different from modules.

Definition (Span). Let V be a vector space over F and S ⊆ V . The span of Sis defined as

〈S〉 =

{n∑i=1

λisi : λi ∈ F, si ∈ S, n ≥ 0

}This is the smallest subspace of V containing S.

Note that the sums must be finite. We will not play with infinite sums, sincethe notion of convergence is not even well defined in a general vector space.

Example.

6


(i) Let V = R3 and S =

1

00

,

011

,

122

. Then

〈S〉 =

abb

: a, b ∈ R

.

Note that any subset of S of order 2 has the same span as S.

(ii) Let X be a set, x ∈ X. Define the function δx : X → F by

δx(y) =

{1 y = x

0 y 6= x.

Then 〈δx : x ∈ X〉 is the set of all functions with finite support.

Definition (Spanning set). Let V be a vector space over F and S ⊆ V . S spansV if 〈S〉 = V .

Definition (Linear independence). Let V be a vector space over F and S ⊆ V .Then S is linearly independent if whenever

n∑i=1

λisi = 0 with λi ∈ F, s1, s2, · · · , sn ∈ S distinct,

we must have λi = 0 for all i.If S is not linearly independent, we say it is linearly dependent.

Definition (Basis). Let V be a vector space over F and S ⊆ V . Then S is abasis for V if S is linearly independent and spans V .

Definition (Finite dimensional). A vector space is finite dimensional if there isa finite basis.

Ideally, we would want to define the dimension as the number of vectors inthe basis. However, we must first show that this is well-defined. It is certainlyplausible that a vector space has a basis of size 7 as well as a basis of size 3. Wemust show that this can never happen, which is something we’ll do soon.

We will first have an example:

Example. Again, let V = R3 and S =

1

00

,

011

,

122

. Then S is

linearly dependent since

1

100

+ 2

011

+ (−1)

122

= 0.

S also does not span V since

001

6∈ 〈S〉.7


Note that no linearly independent set can contain 0, as 1 · 0 = 0. We alsohave 〈∅〉 = {0} and ∅ is a basis for this space.

There is an alternative way in which we can define linear independence.

Lemma. S ⊆ V is linearly dependent if and only if there are distinct s0, · · · , sn ∈S and λ1, · · · , λn ∈ F such that

n∑i=1

λisi = s0.

Proof. If S is linearly dependent, then there are some λ1, · · · , λn ∈ F not allzero and s1, · · · , sn ∈ S such that

∑λisi = 0. Wlog, let λ1 6= 0. Then

s1 =

n∑i=2

− λiλ1

si.

Conversely, if s0 =∑ni=1 λisi, then

(−1)s0 +

n∑i=1

λisi = 0.

So S is linearly dependent.

This in turn gives an alternative characterization of what it means to be abasis:

Proposition. If S = {e1, · · · , en} is a subset of V over F, then it is a basis ifand only if every v ∈ V can be written uniquely as a finite linear combinationof elements in S, i.e. as

v =

n∑i=1

λiei.

Proof. We can view this as a combination of two statements: it can be spannedin at least one way, and it can be spanned in at most one way. We will see thatthe first part corresponds to S spanning V , and the second part corresponds toS being linearly independent.

In fact, S spanning V is defined exactly to mean that every item v ∈ V canbe written as a finite linear combination in at least one way.

Now suppose that S is linearly independent, and we have

v =

n∑i=1

λiei =

n∑i=1

µiei.

Then we have

0 = v − v =

n∑i=1

(λi − µi)ei.

Linear independence implies that λi − µi = 0 for all i. Hence λi = µi. So v canbe expressed in a unique way.

On the other hand, if S is not linearly independent, then we have

0 =

n∑i=1

λiei

8


where λi 6= 0 for some i. But we also know that

0 =

n∑i=1

0 · ei.

So there are two ways to write 0 as a linear combination. So done.

Now we come to the key theorem:

Theorem (Steinitz exchange lemma). Let V be a vector space over F, andS = {e1, · · · , en} a finite linearly independent subset of V , and T a spanningsubset of V . Then there is some T ′ ⊆ T of order n such that (T \ T ′) ∪ S stillspans V . In particular, |T | ≥ n.

What does this actually say? This says if T is spanning and S is independent,there is a way of grabbing |S| many elements away from T and replace themwith S, and the result will still be spanning.

In some sense, the final remark is the most important part. It tells us thatwe cannot have a independent set larger than a spanning set, and most of ourcorollaries later will only use this remark.

This is sometimes stated in the following alternative way for |T | <∞.

Corollary. Let {e1, · · · , en} be a linearly independent subset of V , and sup-pose {f1, · · · , fm} spans V . Then there is a re-ordering of the {fi} such that{e1, · · · , en, fn+1, · · · , fm} spans V .

The proof is going to be slightly technical and notationally daunting. So ithelps to give a brief overview of what we are going to do in words first. The ideais to do the replacement one by one. The first one is easy. Start with e1. SinceT is spanning, we can write

e1 =∑

λiti

for some ti ∈ T, λi ∈ F non-zero. We then replace with t1 with e1. The result isstill spanning, since the above formula allows us to write t1 in terms of e1 andthe other ti.

We continue inductively. For the rth element, we again write

er =∑

λiti.

We would like to just pick a random ti and replace it with er. However, wecannot do this arbitrarily, since the lemma wants us to replace something in Twith with er. After all that replacement procedure before, some of the ti mighthave actually come from S.

This is where the linear independence of S kicks in. While some of the timight be from S, we cannot possibly have all all of them being from S, or elsethis violates the linear independence of S. Hence there is something genuinelyfrom T , and we can safely replace it with er.

We now write this argument properly and formally.

Proof. Suppose that we have already found T ′r ⊆ T of order 0 ≤ r < n such that

Tr = (T \ T ′r) ∪ {e1, · · · , er}

9


spans V .(Note that the case r = 0 is trivial, since we can take T ′r = ∅, and the case

r = n is the theorem which we want to achieve.)Suppose we have these. Since Tr spans V , we can write

er+1 =

k∑i=1

λiti, λi ∈ F, ti ∈ Tr.

We know that the ei are linearly independent, so not all ti’s are ei’s. So there issome j such that tj ∈ (T \ T ′r). We can write this as

tj =1

λjer+1 +

∑i6=j

−λiλj

ti.

We let T ′r+1 = T ′r ∪ {tj} of order r + 1, and

Tr+1 = (T \ T ′r+1) ∪ {e1, · · · , er+1} = (Tr \ {tj}} ∪ {er+1}

Since tj is in the span of Tr ∪ {er+1}, we have tj ∈ 〈Tr+1〉. So

V ⊇ 〈Tr+1〉 ⊇ 〈Tr〉 = V.

So 〈Tr+1〉 = V .Hence we can inductively find Tn.

From this lemma, we can immediately deduce a lot of important corollaries.

Corollary. Suppose V is a vector space over F with a basis of order n. Then

(i) Every basis of V has order n.

(ii) Any linearly independent set of order n is a basis.

(iii) Every spanning set of order n is a basis.

(iv) Every finite spanning set contains a basis.

(v) Every linearly independent subset of V can be extended to basis.

Proof. Let S = {e1, · · · , en} be the basis for V .

(i) Suppose T is another basis. Since S is independent and T is spanning,|T | ≥ |S|.The other direction is less trivial, since T might be infinite, and Steinitzdoes not immediately apply. Instead, we argue as follows: since T islinearly independent, every finite subset of T is independent. Also, S isspanning. So every finite subset of T has order at most |S|. So |T | ≤ |S|.So |T | = |S|.

(ii) Suppose now that T is a linearly independent subset of order n, but〈T 〉 6= V . Then there is some v ∈ V \ 〈T 〉. We now show that T ∪ {v} isindependent. Indeed, if

λ0v +

m∑i=1

λiti = 0

10


with λi ∈ F, t1, · · · , tm ∈ T distinct, then

λ0v =

m∑i=1

(−λi)ti.

Then λ0v ∈ 〈T 〉. So λ0 = 0. As T is linearly independent, we haveλ0 = · · · = λm = 0. So T ∪ {v} is a linearly independent subset of size> n. This is a contradiction since S is a spanning set of size n.

(iii) Let T be a spanning set of order n. If T were linearly dependent, thenthere is some t0, · · · , tm ∈ T distinct and λ1, · · · , λm ∈ F such that

t0 =∑

λiti.

So t0 ∈ 〈T \ {t0}〉, i.e. 〈T \ {t0}〉 = V . So T \ {t0} is a spanning set oforder n− 1, which is a contradiction.

(iv) Suppose T is any finite spanning set. Let T ′ ⊆ T be a spanning set of leastpossible size. This exists because T is finite. If |T ′| has size n, then doneby (iii). Otherwise by the Steinitz exchange lemma, it has size |T ′| > n.So T ′ must be linearly dependent because S is spanning. So there is somet0, · · · , tm ∈ T distinct and λ1, · · · , λm ∈ F such that t0 =

∑λiti. Then

T ′ \ {t0} is a smaller spanning set. Contradiction.

(v) Suppose T is a linearly independent set. Since S spans, there is someS′ ⊆ S of order |T | such that (S \S′)∪T spans V by the Steinitz exchangelemma. So by (ii), (S \ S′) ∪ T is a basis of V containing T .

Note that the last part is where we actually use the full result of Steinitz.Finally, we can use this to define the dimension.

Definition (Dimension). If V is a vector space over F with finite basis S, thenthe dimension of V , written

dimV = dimF V = |S|.

By the corollary, dimV does not depend on the choice of S. However, it doesdepend on F. For example, dimC C = 1 (since {1} is a basis), but dimR C = 2(since {1, i} is a basis).

After defining the dimension, we can prove a few things about dimensions.

Lemma. If V is a finite dimensional vector space over F, U ⊆ V is a propersubspace, then U is finite dimensional and dimU < dimV .

Proof. Every linearly independent subset of V has size at most dimV . So letS ⊆ U be a linearly independent subset of largest size. We want to show that Sspans U and |S| < dimV .

If v ∈ V \ 〈S〉, then S ∪{v} is linearly independent. So v 6∈ U by maximalityof S. This means that 〈S〉 = U .

Since U 6= V , there is some v ∈ V \ U = V \ 〈S〉. So S ∪ {v} is a linearlyindependent subset of order |S|+ 1. So |S|+ 1 ≤ dimV . In particular, dimU =|S| < dimV .

11


Proposition. If U,W are subspaces of a finite dimensional vector space V , then

dim(U +W ) = dimU + dimW − dim(U ∩W ).

The proof is not hard, as long as we manage to pick the right basis to do theproof. This is our slogan:

When you choose a basis, always choose the right basis.

We need a basis for all four of them, and we want to compare the bases. So wewant to pick bases that are compatible.

Proof. Let R = {v1, · · · ,vr} be a basis for U ∩W . This is a linearly independentsubset of U . So we can extend it to be a basis of U by

S = {v1, · · · ,vr,ur+1, · · · ,us}.

Similarly, for W , we can obtain a basis

T = {v1, · · · ,vr,wr+1, · · · ,wt}.

We want to show that dim(U + W ) = s + t − r. It is sufficient to prove thatS ∪ T is a basis for U +W .

We first show spanning. Suppose u + w ∈ U + W , u ∈ U,w ∈ W . Thenu ∈ 〈S〉 and w ∈ 〈T 〉. So u + w ∈ 〈S ∪ T 〉. So U +W = 〈S ∪ T 〉.

To show linear independence, suppose we have a linear relation

r∑i=1

λivi +

s∑j=r+1

µjuj +

t∑k=r+1

νkwk = 0.

So ∑λivi +

∑µjuj = −

∑νkwk.

Since the left hand side is something in U , and the right hand side is somethingin W , they both lie in U ∩W .

Since S is a basis of U , there is only one way of writing the left hand vectoras a sum of vi and uj . However, since R is a basis of U ∩W , we can write theleft hand vector just as a sum of vi’s. So we must have µj = 0 for all j. Thenwe have ∑

λivi +∑

νkwk = 0.

Finally, since T is linearly independent, λi = νk = 0 for all i, k. So S ∪ T islinearly independent.

Proposition. If V is a finite dimensional vector space over F and U ∪ V is asubspace, then

dimV = dimU + dimV/U.

We can view this as a linear algebra version of Lagrange’s theorem. Combinedwith the first isomorphism theorem for vector spaces, this gives the rank-nullitytheorem.

12


Proof. Let {u1, · · · ,um} be a basis for U and extend this to a basis {u1, · · · ,um,vm+1, · · · ,vn} for V . We want to show that {vm+1 + U, · · · ,vn + U} is a basisfor V/U .

It is easy to see that this spans V/U . If v + U ∈ V/U , then we can write

v =∑

λiui +∑

µivi.

Thenv + U =

∑µi(vi + U) +

∑λi(ui + U) =

∑µi(vi + U).

So done.To show that they are linearly independent, suppose that∑

λi(vi + U) = 0 + U = U.

Then this requires ∑λivi ∈ U.

Then we can write this as a linear combination of the ui’s. So∑λivi =

∑µjuj

for some µj . Since {u1, · · · ,um,vn+1, · · · ,vn} is a basis for V , we must haveλi = µj = 0 for all i, j. So {vi + U} is linearly independent.

1.3 Direct sums

We are going to define direct sums in many ways in order to confuse students.

Definition ((Internal) direct sum). Suppose V is a vector space over F andU,W ⊆ V are subspaces. We say that V is the (internal) direct sum of U andW if

(i) U +W = V

(ii) U ∩W = 0.

We write V = U ⊕W .Equivalently, this requires that every v ∈ V can be written uniquely as u+w

with u ∈ U,w ∈W . We say that U and W are complementary subspaces of V .

You will show in the example sheets that given any subspace U ⊆ V , U musthave a complementary subspace in V .

Example. Let V = R2, and U = 〈(

01

)〉. Then 〈

(11

)〉 and 〈

(10

)〉 are both

complementary subspaces to U in V .

Definition ((External) direct sum). If U,W are vector spaces over F, the(external) direct sum is

U ⊕W = {(u,w) : u ∈ U,w ∈W},

with addition and scalar multiplication componentwise:

(u1,w1) + (u2,w2) = (u1 + u2,w1 + w2), λ(u,w) = (λu, λw).

13


The difference between these two definitions is that the first is decomposingV into smaller spaces, while the second is building a bigger space based on twospaces.

Note, however, that the external direct sum U ⊕W is the internal directsum of U and W viewed as subspaces of U ⊕W , i.e. as the internal direct sumof {(u,0) : u ∈ U} and {(0,v) : v ∈ V }. So these two are indeed compatiblenotions, and this is why we give them the same name and notation.

Definition ((Multiple) (internal) direct sum). If U1, · · · , Un ⊆ V are subspacesof V , then V is the (internal) direct sum

V = U1 ⊕ · · · ⊕ Un =

n⊕i=1

Ui

if every v ∈ V can be written uniquely as v =∑

ui with ui ∈ Ui.This can be extended to an infinite sum with the same definition, just noting

that the sum v =∑

ui has to be finite.

For more details, see example sheet 1 Q. 10, where we prove in particularthat dimV =

∑dimUi.

Definition ((Multiple) (external) direct sum). If U1, · · · , Un are vector spacesover F, the external direct sum is

U1 ⊕ · · · ⊕ Un =

n⊕i=1

Ui = {(u1, · · · ,un) : ui ∈ Ui},

with pointwise operations.This can be made into an infinite sum if we require that all but finitely many

of the ui have to be zero.

14

2 Linear maps IB Linear Algebra

2 Linear maps

In mathematics, apart from studying objects, we would like to study functionsbetween objects as well. In particular, we would like to study functions thatrespect the structure of the objects. With vector spaces, the kinds of functionswe are interested in are linear maps.

2.1 Definitions and examples

Definition (Linear map). Let U, V be vector spaces over F. Then α : U → Vis a linear map if

(i) α(u1 + u2) = α(u1) + α(u2) for all ui ∈ U .

(ii) α(λu) = λα(u) for all λ ∈ F,u ∈ U .

We write L(U, V ) for the set of linear maps U → V .

There are a few things we should take note of:

– If we are lazy, we can combine the two requirements to the single require-ment that

α(λu1 + µu2) = λα(u1) + µα(u2).

– It is easy to see that if α is linear, then it is a group homomorphism (if weview vector spaces as groups). In particular, α(0) = 0.

– If we want to stress the field F, we say that α is F-linear. For example,complex conjugation is a map C→ C that is R-linear but not C-linear.

Example.

(i) Let A be an n×m matrix with coefficients in F. We will write A ∈Mn,m(F).Then α : Fm → Fn defined by v→ Av is linear.

Recall matrix multiplication is defined by: if Aij is the ijth coefficient ofA, then the ith coefficient of Av is Aijvj . So we have

α(λu + µv)i =

m∑j=1

Aij(λu + µv)j

= λ

m∑j=1

Aijuj + µ

m∑j=1

Aijvj

= λα(u)i + µα(v)i.

So α is linear.

(ii) Let X be a set and g ∈ FX . Then we define mg : FX → FX by mg(f)(x) =g(x)f(x). Then mg is linear. For example, f(x) 7→ 2x2f(x) is linear.

(iii) Integration I : (C([a, b]),R)→ (C([a, b]),R) defined by f 7→∫ xaf(t) dt is

linear.

(iv) Differentiation D : (C∞([a, b]),R)→ (C∞([a, b]),R) by f 7→ f ′ is linear.

15


(v) If α, β ∈ L(U, V ), then α+β defined by (α+β)(u) = α(u) +β(u) is linear.

Also, if λ ∈ F, then λα defined by (λα)(u) = λ(α(u)) is also linear.

In this way, L(U, V ) is also a vector space over F.

(vi) Composition of linear maps is linear. Using this, we can show that manythings are linear, like differentiating twice, or adding and then multiplyinglinear maps.

Just like everything else, we want to define isomorphisms.

Definition (Isomorphism). We say a linear map α : U → V is an isomorphismif there is some β : V → U (also linear) such that α ◦ β = idV and β ◦ α = idU .

If there exists an isomorphism U → V , we say U and V are isomorphic, andwrite U ∼= V .

Lemma. If U and V are vector spaces over F and α : U → V , then α is anisomorphism iff α is a bijective linear map.

Proof. If α is an isomorphism, then it is clearly bijective since it has an inversefunction.

Suppose α is a linear bijection. Then as a function, it has an inverseβ : V → U . We want to show that this is linear. Let v1,v2 ∈ V , λ, µ ∈ F. Wehave

αβ(λv1 + µv2) = λv1 + µv2 = λαβ(v1) + µαβ(v2) = α(λβ(v1) + µβ(v2)).

Since α is injective, we have

β(λv1 + µv2) = λβ(v1) + µβ(v2).

So β is linear.

Definition (Image and kernel). Let α : U → V be a linear map. Then theimage of α is

imα = {α(u) : u ∈ U}.

The kernel of α iskerα = {u : α(u) = 0}.

It is easy to show that these are subspaces of V and U respectively.

Example.

(i) Let A ∈Mm,n(F) and α : Fn → Fm be the linear map v 7→ Av. Then thesystem of linear equations

m∑j=1

Aijxj = bi, 1 ≤ i ≤ n

has a solution iff (b1, · · · , bn) ∈ imα.

The kernel of α contains all solutions to∑j Aijxj = 0.

16


(ii) Let β : C∞(R,R)→ C∞(R,R) that sends

β(f)(t) = f ′′(t) + p(t)f ′(t) + q(t)f(t).

for some p, q ∈ C∞(R,R).

Then if y(t) ∈ imβ, then there is a solution (in C∞(R,R)) to the differentialequation

f ′′(t) + p(t)f ′(t) + q(t)f(t) = y(t).

Similarly, kerβ contains the solutions to the homogeneous equation

f ′′(t) + p(t)f ′(t) + q(t)f(t) = 0.

If two vector spaces are isomorphic, then it is not too surprising that theyhave the same dimension, since isomorphic spaces are “the same”. Indeed this iswhat we are going to show.

Proposition. Let α : U → V be an F-linear map. Then

(i) If α is injective and S ⊆ U is linearly independent, then α(S) is linearlyindependent in V .

(ii) If α is surjective and S ⊆ U spans U , then α(S) spans V .

(iii) If α is an isomorphism and S ⊆ U is a basis, then α(S) is a basis for V .

Here (iii) immediately shows that two isomorphic spaces have the samedimension.

Proof.

(i) We prove the contrapositive. Suppose that α is injective and α(S) is linearlydependent. So there are s0, · · · , sn ∈ S distinct and λ1, · · · , λn ∈ F not allzero such that

α(s0) =

n∑i=1

λiα(si) = α

(n∑i=1

λisi

).

Since α is injective, we must have

s0 =

n∑i=1

λisi.

This is a non-trivial relation of the si in U . So S is linearly dependent.

(ii) Suppose α is surjective and S spans U . Pick v ∈ V . Then there is someu ∈ U such that α(u) = v. Since S spans U , there is some s1, · · · , sn ∈ Sand λ1, · · · , λn ∈ F such that

u =

n∑I=1

λisi.

Then

v = α(u) =

n∑i=1

λiα(si).

So α(S) spans V .

17


(iii) Follows immediately from (i) and (ii).

Corollary. If U and V are finite-dimensional vector spaces over F and α : U → Vis an isomorphism, then dimU = dimV .

Note that we restrict it to finite-dimensional spaces since we’ve only shownthat dimensions are well-defined for finite dimensional spaces. Otherwise, theproof works just fine for infinite dimensional spaces.

Proof. Let S be a basis for U . Then α(S) is a basis for V . Since α is injective,|S| = |α(S)|. So done.

How about the other way round? If two vector spaces have the same di-mension, are they necessarily isomorphic? The answer is yes, at least forfinite-dimensional ones.

However, we will not just prove that they are isomorphic. We will show thatthey are isomorphic in many ways.

Proposition. Suppose V is a F-vector space of dimension n <∞. Then writinge1, · · · , en for the standard basis of Fn, there is a bijection

Φ : {isomorphisms Fn → V } → {(ordered) basis(v1, · · · ,vn) for V },

defined byα 7→ (α(e1), · · · , α(en)).

Proof. We first make sure this is indeed a function — if α is an isomorphism,then from our previous proposition, we know that it sends a basis to a basis. So(α(e1), · · · , α(en)) is indeed a basis for V .

We now have to prove surjectivity and injectivity.Suppose α, β : Fn → V are isomorphism such that Φ(α) = Φ(β). In other

words, α(ei) = β(ei) for all i. We want to show that α = β. We have

α

x1...xn

= α

(n∑i=1

xiei

)=∑

xiα(ei) =∑

xiβ(ei) = β

x1...xn

.

Hence α = β.Next, suppose that (v1, · · · ,vn) is an ordered basis for V . Then define

α

x1...xn

=

∑xivi.

It is easy to check that this is well-defined and linear. We also know that α isinjective since (v1, · · · ,vn) is linearly independent. So if

∑xivi =

∑yivi, then

xi = yi. Also, α is surjective since (v1, · · · ,vn) spans V . So α is an isomorphism,and by construction Φ(α) = (v1, · · · ,vn).

18


2.2 Linear maps and matrices

Recall that our first example of linear maps is matrices acting on Fn. We willshow that in fact, all linear maps come from matrices. Since we know that allvector spaces are isomorphic to Fn, this means we can represent arbitrary linearmaps on vector spaces by matrices.

This is a useful result, since it is sometimes easier to argue about matricesthan linear maps.

Proposition. Suppose U, V are vector spaces over F and S = {e1, · · · , en} is abasis for U . Then every function f : S → V extends uniquely to a linear mapU → V .

The slogan is “to define a linear map, it suffices to define its values on abasis”.

Proof. For uniqueness, first suppose α, β : U → V are linear and extend f : S →V . We have sort-of proved this already just now.

If u ∈ U , we can write u =∑ni=1 uiei with ui ∈ F since S spans. Then

α(u) = α(∑

uiei

)=∑

uiα(ei) =∑

uif(ei).

Similarly,

β(u) =∑

uif(ei).

So α(u) = β(u) for every u. So α = β.For existence, if u ∈ U , we can write u =

∑uiei in a unique way. So defining

α(u) =∑

uif(ei)

is unambiguous. To show linearity, let λ, µ ∈ F, u,v ∈ U . Then

α(λu + µv) = α(∑

(λui + µvi)ei

)=∑

(λui + µvi)f(ei)

= λ(∑

uif(ei))

+ µ(∑

vif(ei))

= λα(u) + µα(v).

Moreover, α does extend f .

Corollary. If U and V are finite-dimensional vector spaces over F with bases(e1, · · · , em) and (f1, · · · , fn) respectively, then there is a bijection

Matn,m(F)→ L(U, V ),

sending A to the unique linear map α such that α(ei) =∑ajifj .

We can interpret this as follows: the ith column of A tells us how to writeα(ei) in terms of the fj .

We can also draw a fancy diagram to display this result. Given a basise1, · · · , em, by our bijection, we get an isomorphism s(ei) : U → Fm. Similarly,we get an isomorphism s(fi) : V → Fn.

19


Since a matrix is a linear map A : Fm → Fn, given a matrix A, we canproduce a linear map α : U → V via the following composition

U Fm Fn V.s(ei) A s(fi)

−1

We can put this into a square:

Fm Fn

U V

A

s(ei)

α

s(fi)

Then the corollary tells us that every A gives rise to an α, and every α correspondsto an A that fit into this diagram.

Proof. If α is a linear map U → V , then for each 1 ≤ i ≤ m, we can write α(ei)uniquely as

α(ei) =

n∑j=1

ajifj

for some aji ∈ F. This gives a matrix A = (aij). The previous proposition tellsus that every matrix A arises in this way, and α is determined by A.

Definition (Matrix representation). We call the matrix corresponding to alinear map α ∈ L(U, V ) under the corollary the matrix representing α withrespect to the bases (e1, · · · , em) and (f1, · · · , fn).

It is an exercise to show that the bijection Matn,m(F)→ L(U, V ) is an iso-morphism of the vector spaces and deduce that dimL(U, V ) = (dimU)(dimV ).

Proposition. Suppose U, V,W are finite-dimensional vector spaces over F withbases R = (u1, · · · ,ur) , S = (v1, ..,vs) and T = (w1, · · · ,wt) respectively.

If α : U → V and β : V → W are linear maps represented by A and Brespectively (with respect to R, S and T ), then βα is linear and represented byBA with respect to R and T .

Fr Fs Ft

U V W

A B

s(R)

α

s(S)

β

s(T )

Proof. Verifying βα is linear is straightforward. Next we write βα(ui) as a linear

20


combination of w1, · · · ,wt:

βα(ui) = β

(∑k

Akivk

)=∑k

Akiβ(vk)

=∑k

Aki∑j

Bjkwj

=∑j

(∑k

BjkAki

)wj

=∑j

(BA)jiwj

2.3 The first isomorphism theorem and the rank-nullitytheorem

The main theorem of this section is the rank-nullity theorem, which relates thedimensions of the kernel and image of a linear map. This is in fact an easycorollary of a stronger result, known as the first isomorphism theorem, whichdirectly relates the kernel and image themselves. This first isomorphism is anexact analogy of that for groups, and should not be unfamiliar. We will alsoprovide another proof that does not involve quotients.

Theorem (First isomorphism theorem). Let α : U → V be a linear map. Thenkerα and imα are subspaces of U and V respectively. Moreover, α induces anisomorphism

α : U/ kerα→ imα

(u + kerα) 7→ α(u)

Note that if we view a vector space as an abelian group, then this is exactlythe first isomorphism theorem of groups.

Proof. We know that 0 ∈ kerα and 0 ∈ imα.Suppose u1,u2 ∈ kerα and λ1, λ2 ∈ F. Then

α(λ1u1 + λ2u2) = λ1α(u1) + λ2α(u2) = 0.

So λ1u1 + λ2u2 ∈ kerα. So kerα is a subspace.Similarly, if α(u1), α(u2) ∈ imα, then λα(u1)+λ2α(u2) = α(λ1u1 +λ2u2) ∈

imα. So imα is a subspace.Now by the first isomorphism theorem of groups, α is a well-defined isomor-

phism of groups. So it remains to show that α is a linear map. Indeed, wehave

α(λ(u + kerα)) = α(λu) = λα(u) = λ(α(u + kerα)).

So α is a linear map.

21


Definition (Rank and nullity). If α : U → V is a linear map between finite-dimensional vector spaces over F (in fact we just need U to be finite-dimensional),the rank of α is the number r(α) = dim imα. The nullity of α is the numbern(α) = dim kerα.

Corollary (Rank-nullity theorem). If α : U → V is a linear map and U isfinite-dimensional, then

r(α) + n(α) = dimU.

Proof. By the first isomorphism theorem, we know that U/ kerα ∼= imα. So wehave

dim imα = dim(U/ kerα) = dimU − dim kerα.

So the result follows.

We can also prove this result without the first isomorphism theorem, and saya bit more in the meantime.

Proposition. If α : U → V is a linear map between finite-dimensional vectorspaces over F, then there are bases (e1, · · · , em) for U and (f1, · · · , fn) for Vsuch that α is represented by the matrix(

Ir 00 0

),

where r = r(α) and Ir is the r × r identity matrix.In particular, r(α) + n(α) = dimU .

Proof. Let ek+1, · · · , em be a basis for the kernel of α. Then we can extend thisto a basis of the (e1, · · · , em).

Let fi = α(ei) for 1 ≤ i ≤ k. We now show that (f1, · · · , fk) is a basis forimα (and thus k = r). We first show that it spans. Suppose v ∈ imα. Then wehave

v = α

(m∑i=1

λiei

)for some λi ∈ F. By linearity, we can write this as

v =

m∑i=1

λiα(ei) =

k∑i=1

λifi + 0.

So v ∈ 〈f1, · · · , fk〉.To show linear dependence, suppose that

k∑i=1

µifi = 0.

So we have

α

(k∑i=1

µiei

)= 0.

22


So∑ki=1 µiei ∈ kerα. Since (ek+1, · · · , em) is a basis for kerα, we can write

k∑i=1

µiei =

m∑i=k+1

µiei

for some µi (i = k + 1, · · · ,m). Since (e1, · · · , em) is a basis, we must haveµi = 0 for all i. So they are linearly independent.

Now we extend (f1, · · · , fr) to a basis for V , and

α(ei) =

{fi 1 ≤ i ≤ k0 k + 1 ≤ i ≤ m

.

Example. Let

W = {x ∈ R5 : x1 + x2 + x3 = 0 = x3 − x4 − x5}.

What is dimW? Well, it clearly is 3, but how can we prove it?We can consider the map α : R5 → R2 given byx1...

x5

7→ (x1 + x2 + x5x3 − x4 − x5.

)

Then kerα = W . So dimW = 5 − r(α). We know that α(1, 0, 0, 0, 0) = (1, 0)and α(0, 0, 1, 0, 0) = (0, 1). So r(α) = dim imα = 2. So dimW = 3.

More generally, the rank-nullity theorem gives that m linear equations in nhave a space of solutions of dimension at least n−m.

Example. Suppose that U and W are subspaces of V , all of which are finite-dimensional vector spaces of F. We let

α : U ⊕W → V

(u,w) 7→ u + w,

where the ⊕ is the external direct sum. Then imα = U +W and

kerα = {(u,−u) : u ∈ U ∩W} ∼= dim(U ∩W ).

Then we have

dimU + dimW = dim(U ⊕W ) = r(α) + n(α) = dim(U +W ) + dim(U ∩W ).

This is a result we’ve previously obtained through fiddling with basis and horriblestuff.

Corollary. Suppose α : U → V is a linear map between vector spaces over Fboth of dimension n <∞. Then the following are equivalent

(i) α is injective;

(ii) α is surjective;

23


(iii) α is an isomorphism.

Proof. It is clear that, (iii) implies (i) and (ii), and (i) and (ii) together implies(iii). So it suffices to show that (i) and (ii) are equivalent.

Note that α is injective iff n(α) = 0, and α is surjective iff r(α) = dimV = n.By the rank-nullity theorem, n(α) + r(α) = n. So the result follows immediately.

Lemma. Let A ∈ Mn,n(F) = Mn(F) be a square matrix. The following areequivalent

(i) There exists B ∈Mn(F) such that BA = In.

(ii) There exists C ∈Mn(F) such that AC = In.

If these hold, then B = C. We call A invertible or non-singular, and writeA−1 = B = C.

Proof. Let α, β, γ, ι : Fn → Fn be the linear maps represented by matricesA,B,C, In respectively with respect to the standard basis.

We note that (i) is equivalent to saying that there exists β such that βα = ι.This is true iff α is injective, which is true iff α is an isomorphism, which is trueiff α has an inverse α−1.

Similarly, (ii) is equivalent to saying that there exists γ such that αγ = ι.This is true iff α is injective, which is true iff α is isomorphism, which is true iffα has an inverse α−1.

So these are the same things, and we have β = α−1 = γ.

2.4 Change of basis

Suppose we have a linear map α : U → V . Given a basis {ei} for U , and a basis{fi} for V , we can obtain a matrix A.

U V

Fm Fn

α

A

(ei) (fi)

We now want to consider what happens when we have two different basis {ui}and {ei} of U . These will then give rise to two different maps from Fm to ourspace U , and the two basis can be related by a change-of-basis map P . We canput them in the following diagram:

U U

Fm Fm

ιU

P

(ui) (ei)

24


where ιU is the identity map. If we perform a change of basis for both U and V ,we can stitch the diagrams together as

U U V V

Fm Fm Fn Fn

ιU α ιV

P

(ui)

B

A

(ei) (fi)

Q

(vi)

Then if we want a matrix representing the map U → V with respect to bases(ui) and (vi), we can write it as the composition

B = Q−1AP.

We can write this as a theorem:

Theorem. Suppose that (e1, · · · , em) and (u1, · · · ,um) are basis for a finite-dimensional vector space U over F, and (f1, · · · , fn) and (v1, · · · ,vn) are basisof a finite-dimensional vector space V over F.

Let α : U → V be a linear map represented by a matrix A with respect to(ei) and (fi) and by B with respect to (ui) and (vi). Then

B = Q−1AP,

where P and Q are given by

ui =

m∑k=1

Pkiek, vi =

n∑k=1

Qkifk.

Note that one can view P as the matrix representing the identity map iUfrom U with basis (ui) to U with basis (ei), and similarly for Q. So both areinvertible.

Proof. On the one hand, we have

α(ui) =

n∑j=1

Bjivj =∑j

∑`

BjiQ`jf` =∑`

[QB]ìf`.

On the other hand, we can write

α(ui) = α

(m∑k=1

Pkiek

)=

m∑k=1

Pki∑`

A`kf` =∑`

[AP ]ìf`.

Since the f` are linearly independent, we conclude that

QB = AP.

Since Q is invertible, we get B = Q−1AP .

Definition (Equivalent matrices). We say A,B ∈ Matn,m(F) are equivalent ifthere are invertible matrices P ∈ Matm(F), Q ∈ Matn(F) such that B = Q−1AP .

25


Since GLK(F) = {A ∈ Matk(F) : A is invertible} is a group, for each k ≥ 1,this is indeed an equivalence relation. The equivalence classes are orbits underthe action of GLm(F)×GLn(F), given by

GLm(F)×GLn(F)×Matn,m(F)→ Mat(F)

(P,Q,A) 7→ QAP−1.

Two matrices are equivalent if and only if they represent the same linear mapwith respect to different basis.

Corollary. If A ∈ Matn,m(F), then there exists invertible matrices P ∈GLm(F), Q ∈ GLn(F) so that

Q−1AP =

(Ir 00 0

)for some 0 ≤ r ≤ min(m,n).

This is just a rephrasing of the proposition we had last time. But this tellsus there are min(m,n) + 1 orbits of the action above parametrized by r.

Definition (Column and row rank). If A ∈ Matn,m(F), then

– The column rank of A, written r(A), is the dimension of the subspace ofFn spanned by the columns of A.

– The row rank of A, written r(A), is the dimension of the subspace of Fmspanned by the rows of A. Alternatively, it is the column rank of AT .

There is no a priori reason why these should be equal to each other. However,it turns out they are always equal.

Note that if α : Fm → Fn is the linear map represented by A (with respect tothe standard basis), then r(A) = r(α), i.e. the column rank is the rank. Moreover,since the rank of a map is independent of the basis, equivalent matrices havethe same column rank.

Theorem. If A ∈ Matn,m(F), then r(A) = r(AT ), i.e. the row rank is equivalentto the column rank.

Proof. We know that there are some invertible P,Q such that

Q−1AP =

(Ir 00 0

),

where r = r(A). We can transpose this whole equation to obtain

(Q−1AP )T = PTAT (QT )−1 =

(Ir 00 0

)So r(AT ) = r.

26


2.5 Elementary matrix operations

We are now going to re-prove our corollary that we can find P,Q such that

Q−1AP =

(Ir 00 0

)in a way that involves matrices only. This will give a

concrete way to find P and Q, but is less elegant.To do so, we need to introduce elementary matrices.

Definition (Elementary matrices). We call the following matrices of GLn(F)elementary matrices:

Snij =

1. . .

10 1

1. . .

11 0

1. . .

1

This is called a reflection, where the rows we changed are the ith and jth row.

Enij(λ) =

1. . .

1 λ. . .

1. . .

1

This is called a shear, where λ appears at the i, jth entry.

Tni (λ) =

1. . .

1λ

1. . .

1

This is called a shear, where λ 6= 0 appears at the ith column.

Observe that if A is a m× n matrix, then

(i) ASnij is obtained from A by swapping the i and j columns.

27


(ii) AEnIj(λ) is obtained by adding λ× column i to column j.

(iii) ATni (λ) is obtained from A by rescaling the ith column by λ.

Multiplying on the left instead of the right would result in the same operationsperformed on the rows instead of the columns.

Proposition. If A ∈ Matn,m(F), then there exists invertible matrices P ∈GLm(F), Q ∈ GLn(F) so that

Q−1AP =

(Ir 00 0

)for some 0 ≤ r ≤ min(m,n).

We are going to start with A, and then apply these operations to get it intothis form.

Proof. We claim that there are elementary matrices Em1 , · · · , Ema and Fn1 , · · · , Fnb(these E are not necessarily the shears, but any elementary matrix) such that

Em1 · · ·Ema AFn1 · · ·Fnb =

(Ir 00 0

)This suffices since the Emi ∈ GLM (F) and Fnj ∈ GLn(F). Moreover, to prove theclaim, it suffices to find a sequence of elementary row and column operationsreducing A to this form.

If A = 0, then done. If not, there is some i, j such that Aij 6= 0. By swappingrow 1 and row i; and then column 1 and column j, we can assume A11 6= 0. Byrescaling row 1 by 1

A11, we can further assume A11 = 1.

Now we can add −A1j times column 1 to column j for each j 6= 1, and thenadd −Ai1 times row 1 to row i 6= 1. Then we now have

A =

1 0 · · · 00... B0

Now B is smaller than A. So by induction on the size of A, we can reduce B toa matrix of the required form, so done.

It is an exercise to show that the row and column operations do not changethe row rank or column rank, and deduce that they are equal.

28

3 Duality IB Linear Algebra

3 Duality

Duality is a principle we will find throughout mathematics. For example, in IBOptimisation, we considered the dual problems of linear programs. Here we willlook for the dual of vector spaces. In general, we try to look at our question in a“mirror” and hope that the mirror problem is easier to solve than the originalmirror.

At first, the definition of the dual might see a bit arbitrary and weird. Wewill try to motivate it using what we will call annihilators, but they are muchmore useful than just for these. Despite their usefulness, though, they can beconfusing to work with at times, since the dual space of a vector space V will beconstructed by considering linear maps on V , and when we work with maps ondual spaces, things explode.

3.1 Dual space

To specify a subspace of Fn, we can write down linear equations that its elements

satisfy. For example, if we have the subspace U = 〈

121

〉 ⊆ F3, we can specify

this by saying

x1x2x3

∈ U if and only if

x1 − x3 = 0

2x1 − x2 = 0.

However, characterizing a space in terms of equations involves picking someparticular equations out of the many possibilities. In general, we do not likemaking arbitrary choices. Hence the solution is to consider all possible suchequations. We will show that these form a subspace in some space.

We can interpret these equations in terms of linear maps Fn → F. For

example x1 − x3 = 0 if and only if

x1x2x3

∈ ker θ, where θ : F3 → F is defined

by

x1x2x3

7→ x1 − x3.

This works well with the vector space operations. If θ1, θ2 : Fn → F vanish onsome subspace of Fn, and λ, µ ∈ F, then λθ1 +µθ2 also vanishes on the subspace.So the set of all maps Fn → F that vanishes on U forms a vector space.

To formalize this notion, we introduce dual spaces.

Definition (Dual space). Let V be a vector space over F. The dual of V isdefined as

V ∗ = L(V,F) = {θ : V → F : θ linear}.

Elements of V ∗ are called linear functionals or linear forms.

By convention, we use Roman letters for elements in V , and Greek lettersfor elements in V ∗.

29


Example.

– If V = R3 and θ : V → R that sends

x1x2x3

7→ x1 − x3, then θ ∈ V ∗.

– Let V = FX . Then for any fixed x, θ : V → F defined by f 7→ f(x) is inV ∗.

– Let V = C([0, 1],R). Then f 7→∫ 1

0f(t) dt ∈ V ∗.

– The trace tr : Mn(F)→ F defined by A 7→∑ni=1Aii is in Mn(F)∗.

It turns out it is rather easy to specify how the dual space looks like, at leastin the case where V is finite dimensional.

Lemma. If V is a finite-dimensional vector space over f with basis (e1, · · · , en),then there is a basis (ε1, · · · , εn) for V ∗ (called the dual basis to (e1, · · · , en))such that

εi(ej) = δij .

Proof. Since linear maps are characterized by their values on a basis, there existsunique choices for ε1, · · · , εn ∈ V ∗. Now we show that (ε1, · · · , εn) is a basis.

Suppose θ ∈ V ∗. We show that we can write it uniquely as a combination ofε1, · · · , εn. We have θ =

∑ni=1 λiεi if and only if θ(ej) =

∑ni=1 εi(ej) (for all j)

if and only if λj = θ(ej). So we have uniqueness and existence.

Corollary. If V is finite dimensional, then dimV = dimV ∗.

When V is not finite dimensional, this need not be true. However, weknow that the dimension of V ∗ is at least as big as that of V , since the abovegives a set of dimV many independent vectors in V ∗. In fact for any infinitedimensional vector space, dimV ∗ is strictly larger than dimV , if we manage todefine dimensions for infinite-dimensional vector spaces.

It helps to come up with a more concrete example of how dual spaces looklike. Consider the vector space Fn, where we treat each element as a columnvector (with respect to the standard basis). Then we can regard elements of V ∗

as just row vectors (a1, · · · , an) =∑nj=1 ajεj with respect to the dual basis. We

have

(∑ajεj

)(∑xi

ei

)=∑i,j

ajxiδij =

n∑i=1

aixi =(a1 · · · an

)x1...xn

.

This is exactly what we want.Now what happens when we change basis? How will the dual basis change?

Proposition. Let V be a finite-dimensional vector space over F with bases(e1, · · · , en) and (f1, · · · , fn), and that P is the change of basis matrix so that

fi =

n∑k=1

Pkiek.

30


Let (ε1, · · · , εn) and (η1, · · · , ηn) be the corresponding dual bases so that

εi(ej) = δij = ηi(fj).

Then the change of basis matrix from (ε1, · · · , εn) to (η1, · · · , ηn) is (P−1)T , i.e.

εi =

n∑`=1

PTì η`.

Proof. For convenience, write Q = P−1 so that

ej =

n∑k=1

Qkjfk.

So we can compute( ∞∑`=1

Pi`η`

)(ej) =

( ∞∑`=1

Pi`η`

)(n∑k=1

Qkjfk

)=∑k,`

Pi`δ`kQkj

=∑k,`

Pi`Q`j

= [PQ]ij

= δij .

So εi =∑n`=1 P

Tì η`.

Now we’ll return to our original motivation, and think how we can definesubspaces of V ∗ in terms of subspaces of V , and vice versa.

Definition (Annihilator). Let U ⊆ V . Then the annihilator of U is

U0 = {θ ∈ V ∗ : θ(u) = 0,∀u ∈ U}.

If W ⊆ V ∗, then the annihilator of W is

W 0 = {v ∈ V : θ(v) = 0,∀θ ∈W}.

One might object that W 0 should be a subset of V ∗∗ and not V . We willlater show that there is a canonical isomorphism between V ∗∗ and V , and thiswill all make sense.

Example. Consider R3 with standard basis e1, e2, e3; (R3)∗ with dual basis(ε1, ε2, ε3). If U = 〈e1 + 2e2 + e3〉 and W = 〈ε1 − ε3, 2ε1 − ε2〉, then U0 = Wand W 0 = U .

We see that the dimension of U and U0 add up to three, which is thedimension of R3. This is typical.

Proposition. Let V be a vector space over F and U a subspace. Then

dimU + dimU0 = dimV.

31


We are going to prove this in many ways.

Proof. Let (e1, · · · , ek) be a basis for U and extend to (e1, · · · , en) a basis forV . Consider the dual basis for V ∗, say (ε1, · · · , εn). Then we will show that

U0 = 〈εk+1, · · · , εn〉.

So dimU0 = n− k as required. This is easy to prove — if j > k, then εj(ei) = 0for all i ≤ k. So εk+1, · · · , εn ∈ U0. On the other hand, suppose θ ∈ U0. Thenwe can write

θ =

n∑j=1

λjεj .

But then 0 = θ(ei) = λi for i ≤ k. So done.

Proof. Consider the restriction map V ∗ → U∗, given by θ 7→ θ|U . This isobviously linear. Since every linear map U → F can be extended to V → F, thisis a surjection. Moreover, the kernel is U0. So by rank-nullity theorem,

dimV ∗ = dimU0 + dimU∗.

Since dimV ∗ = dimV and dimU∗ = dimU , we’re done.

Proof. We can show that U0 ' (V/U)∗, and then deduce the result. Details areleft as an exercise.

3.2 Dual maps

Since linear algebra is the study of vector spaces and linear maps between them,after dualizing vector spaces, we should be able to dualize linear maps as well.If we have a map α : V → W , then after dualizing, the map will go the otherdirection, i.e. α∗ : W ∗ → V ∗. This is a characteristic common to most dualizationprocesses in mathematics.

Definition (Dual map). Let V,W be vector spaces over F and α : V → W ∈L(V,W ). The dual map to α, written α∗ : W ∗ → V ∗ is given by θ 7→ θ ◦ α.Since the composite of linear maps is linear, α∗(θ) ∈ V ∗. So this is a genuinemap.

Proposition. Let α ∈ L(V,W ) be a linear map. Then α∗ ∈ L(W ∗, V ∗) is alinear map.

This is not the same as what we remarked at the end of the definition ofthe dual map. What we remarked was that given any θ, α∗(θ) is a linear map.What we want to show here is that α∗ itself as a map W ∗ → V ∗ is linear.

Proof. Let λ, µ ∈ F and θ1, θ2 ∈W ∗. We want to show

α∗(λθ1 + µθ2) = λα∗(θ1) + µα∗(θ2).

To show this, we show that for every v ∈ V , the left and right give the sameresult. We have

α∗(λθ1 + µθ2)(v) = (λθ1 + µθ2)(αv)

= λθ1(α(v)) + µθ2(α(v))

= (λα∗(θ1) + µα∗(θ2))(v).

So α∗ ∈ L(W ∗, V ∗).

32


What happens to the matrices when we take the dual map? The answer isthat we get the transpose.

Proposition. Let V,W be finite-dimensional vector spaces over F and α : V →W be a linear map. Let (e1, · · · , en) be a basis for V and (f1, · · · , fm) be a basisfor W ; (ε1, · · · , εn) and (η1, · · · , ηm) the corresponding dual bases.

Suppose α is represented by A with respect to (ei) and (fi) for V and W .Then α∗ is represented by AT with respect to the corresponding dual bases.

Proof. We are given that

α(ei) =

m∑k=1

Akifk.

We must compute α∗(ηi). To do so, we evaluate it at ej . We have

α∗(ηi)(ej) = ηi(α(ej)) = ηi

(m∑k=1

Akjfk

)=

m∑k=1

Akjδik = Aij .

We can also write this as

α∗(ηi)(ej) =

n∑k=1

Aikεk(ej).

Since this is true for all j, we have

α∗(ηi)

n∑k=1

Aikεk =

n∑k=1

ATkiεk.

So done.

Note that if α : U → V and β : V →W , θ ∈W ∗, then

(βα)∗(θ) = θβα = α∗(θβ) = α∗(β∗(θ)).

So we have (βα)∗ = α∗β∗. This is obviously true for the finite-dimensional case,since that’s how transposes of matrices work.

Similarly, if α, β : U → V , then (λα+ µβ)∗ = λα∗ + µβ∗.What happens when we change basis? If B = Q−1AP for some invertible P

and Q, then

BT = (Q−1AP )T = PTAT (Q−1)T = ((P−1)T )−1AT (Q−1)T .

So in the dual space, we conjugate by the dual of the change-of-basis matrices.As we said, we can use dualization to translate problems about a vector space

to its dual. The following lemma gives us some good tools to do so:

Lemma. Let α ∈ L(V,W ) with V,W finite dimensional vector spaces over F.Then

(i) kerα∗ = (imα)0.

(ii) r(α) = r(α∗) (which is another proof that row rank is equal to columnrank).

33


(iii) imα∗ = (kerα)0.

At first sight, (i) and (iii) look quite similar. However, (i) is almost trivial toprove, but (iii) is rather hard.

Proof.

(i) If θ ∈W ∗, then

θ ∈ kerα∗ ⇔ α∗(θ) = 0

⇔ (∀v ∈ V ) θα(v) = 0

⇔ (∀w ∈ imα) θ(w) = 0

⇔ θ ∈ (imα)0.

(ii) As imα ≤W , we’ve seen that

dim imα+ dim(imα)0 = dimW.

Using (i), we seen(α∗) = dim(imα)0.

Sor(α) + n(α∗) = dimW = dimW ∗.

By the rank-nullity theorem, we have r(α) = r(α∗).

(iii) The proof in (i) doesn’t quite work here. We can only show that oneincludes the other. To draw the conclusion, we will show that the twospaces have the dimensions, and hence must be equal.

Let θ ∈ imα∗. Then θ = φα for some φ ∈W ∗. If v ∈ kerα, then

θ(v) = φ(α(v)) = φ(0) = 0.

So imα∗ ⊆ (kerα)0.

But we knowdim(kerα)0 + dim kerα = dimV,

So we have

dim(kerα)0 = dimV − n(α) = r(α) = r(α∗) = dim imα∗.

Hence we must have imα∗ = (kerα)0.

Not only do we want to get from V to V ∗, we want to get back from V ∗ to V .We can take the dual of V ∗ to get a V ∗∗. We already know that V ∗∗ is isomorphicto V , since V ∗ is isomorphic to V already. However, the isomorphism betweenV ∗ and V are not “natural”. To define such an isomorphism, we needed to picka basis for V and consider a dual basis. If we picked a different basis, we wouldget a different isomorphism. There is no natural, canonical, uniquely-definedisomorphism between V and V ∗.

However, this is not the case when we want to construct an isomorphismV → V ∗∗. The construction of this isomorphism is obvious once we think hardwhat V ∗∗ actually means. Unwrapping the definition, we know V ∗∗ = L(V ∗,F).

34


Our isomorphism has to produce something in V ∗∗ given any v ∈ V . This isequivalent to saying given any v ∈ V and a function θ ∈ V ∗, produce somethingin F.

This is easy, by definition θ ∈ V ∗ is just a linear map V → F. So given vand θ, we just return θ(v). We now just have to show that this is linear and isbijective.

Lemma. Let V be a vector space over F. Then there is a linear map ev : V →(V ∗)∗ given by

ev(v)(θ) = θ(v).

We call this the evaluation map.

We call this a “canonical” map since this does not require picking a particularbasis of the vector spaces. It is in some sense a “natural” map.

Proof. We first show that ev(v) ∈ V ∗∗ for all v ∈ V , i.e. ev(v) is linear for anyv. For any λ, µ ∈ F, θ1, θ2 ∈ V ∗, then for v ∈ V , we have

ev(v)(λθ1 + µθ2) = (λθ1 + µθ2)(v)

= λθ1(v) + µθ2(v)

= λ ev(v)(θ1) + µ ev(v)(θ2).

So done. Now we show that ev itself is linear. Let λ, µ ∈ F, v1,v2 ∈ V . Wewant to show

ev(λv1 + µv2) = λ ev(v1) + µ ev(v2).

To show these are equal, pick θ ∈ V ∗. Then

ev(λv1 + µv2)(θ) = θ(λv1 + µv2)

= λθ(v1) + µθ(v2)

= λ ev(v1)(θ) + µ ev(v2)(θ)

= (λ ev(v1) + µ ev(v2))(θ).

So done.

In the special case where V is finite-dimensional, this is an isomorphism.

Lemma. If V is finite-dimensional, then ev : V → V ∗∗ is an isomorphism.

This is very false for infinite dimensional spaces. In fact, this is true onlyfor finite-dimensional vector spaces (assuming the axiom of choice), and some(weird) people use this as the definition of finite-dimensional vector spaces.

Proof. We first show it is injective. Suppose ev(v) = 0 for some v ∈ V . Thenθ(v) = ev(v)(θ) = 0 for all θ ∈ V ∗. So dim〈v〉0 = dimV ∗ = dimV . Sodim〈v〉 = 0. So v = 0. So ev is injective. Since V and V ∗∗ have the samedimension, this is also surjective. So done.

From now on, we will just pretend that V and V ∗∗ are the same thing, atleast when V is finite dimensional.

Note that this lemma does not just say that V is isomorphic to V ∗∗ (wealready know that since they have the same dimension). This says there is acompletely canonical way to choose the isomorphism.

In general, if V is infinite dimensional, then ev is injective, but not surjective.So we can think of V as a subspace of V ∗∗ in a canonical way.

35


Lemma. Let V,W be finite-dimensional vector spaces over F after identifying(V and V ∗∗) and (W and W ∗∗) by the evaluation map. Then we have

(i) If U ≤ V , then U00 = U .

(ii) If α ∈ L(V,W ), then α∗∗ = α.

Proof.

(i) Let u ∈ U . Then u(θ) = θ(u) = 0 for all θ ∈ U0. So u annihilateseverything in U0. So u ∈ U00. So U ⊆ U00. We also know that

dimU = dimV − dimU0 = dimV − (dimV − dimU00) = dimU00.

So we must have U = U00.

(ii) The proof of this is basically — the transpose of the transpose is theoriginal matrix. The only work we have to do is to show that the dual ofthe dual basis is the original basis.

Let (e1, · · · , en) be a basis for V and (f1, · · · , fm) be a basis for W , and let(ε1, · · · , εn) and (η1, · · · , ηn) be the corresponding dual basis. We knowthat

ei(εj) = δij = εj(ei), fi(ηj) = δij = ηj(fi).

So (e1, · · · , en) is dual to (ε1, · · · , εn), and similarly for f and η.

If α is represented by A, then α∗ is represented by AT . So α∗∗ is representedby (AT )T = A. So done.

Proposition. Let V be a finite-dimensional vector space F and U1, U2 aresubspaces of V . Then we have

(i) (U1 + U2)0 = U01 ∩ U0

2

(ii) (U1 ∩ U2)0 = U01 + U0

2

Proof.

(i) Suppose θ ∈ V ∗. Then

θ ∈ (U1 + U2)0 ⇔ θ(u1 + u2) = 0 for all ui ∈ Ui⇔ θ(u) = 0 for all u ∈ U1 ∪ U2

⇔ θ ∈ U01 ∩ U0

2 .

(ii) We have

(U1 ∩ U2)0 = ((U01 )0 ∩ (U0

2 )0)0 = (U01 + U0

2 )00 = U01 + U0

2 .

So done.

36

4 Bilinear forms I IB Linear Algebra

4 Bilinear forms I

So far, we have been looking at linear things only. This can get quite boring.For a change, we look at bi linear maps instead. In this chapter, we will look atbilinear forms in general. It turns out there isn’t much we can say about them,and hence this chapter is rather short. Later, in Chapter 7, we will study somespecial kinds of bilinear forms which are more interesting.

Definition (Bilinear form). Let V,W be vector spaces over F. Then a functionφ : V ×W → F is a bilinear form if it is linear in each variable, i.e. for eachv ∈ V , φ(v, · ) : W → F is linear; for each w ∈W , φ( · ,w) : V → F is linear.

Example. The map defined by

V × V ∗ → F(v, θ) 7→ θ(v) = ev(v)(θ)

is a bilinear form.

Example. Let V = W = Fn. Then the function (v,w) =∑ni=1 viwi is bilinear.

Example. If V = W = C([0, 1],R), then

(f, g) 7→∫ a

0

fg dt

is a bilinear form.

Example. Let A ∈ Matm,n(F). Then

φ : Fm × Fn → F(v,w) 7→ vTAw

is bilinear. Note that the (real) dot product is the special case of this, wheren = m and A = I.

In fact, this is the most general form of bilinear forms on finite-dimensionalvector spaces.

Definition (Matrix representing bilinear form). Let (e1, · · · , en) be a basis forV and (f1, · · · , fm) be a basis for W , and ψ : V ×W → F. Then the matrix Arepresenting ψ with respect to the basis is defined to be

Aij = ψ(ei, fj).

Note that if v =∑λiei and w =

∑µjfj , then by linearity, we get

ψ(v,w) = ψ(∑

λiei,w)

=∑i

λiψ(ei,w)

=∑i

λiψ(ei,∑

µjfj

)=∑i,j

λiµjψ(ei, fj)

= λTAµ.

37


So ψ is determined by A.We have identified linear maps with matrices, and we have identified bilinear

maps with matrices. However, you shouldn’t think linear maps are bilinear maps.They are, obviously, two different things. In fact, the matrices representingmatrices and bilinear forms transform differently when we change basis.

Proposition. Suppose (e1, · · · , en) and (v1, · · · ,vn) are basis for V such that

vi =∑

Pkiek for all i = 1, · · · , n;

and (f1, · · · , fm) and (w1, · · · ,wm) are bases for W such that

wi =∑

Q`jf` for all j = 1, · · · ,m.

Let ψ : V × W → F be a bilinear form represented by A with respect to(e1, · · · , en) and (f1, · · · , fm), and by B with respect to the bases (v1, · · · ,vn)and (w1, · · · ,wm). Then

B = PTAQ.

The difference with the transformation laws of matrices is this time we aretaking transposes, not inverses.

Proof. We have

Bij = φ(vi,wj)

= φ(∑

Pkiek,∑

Q`jf`

)=∑

PkiQ`jφ(ek, f`)

=∑k,`

PTikAk`Q`j

= (PTAQ)ij .

Note that while the transformation laws for bilinear forms and linear mapsare different, we still get that two matrices are representing the same bilinearform with respect to different bases if and only if they are equivalent, since ifB = P−1AQ, then B = ((P−1)T )TAQ.

If we are given a bilinear form ψ : V ×W → F, we immediately get two linearmaps:

ψL : V →W ∗, ψR : W → V ∗,

defined by ψL(v) = ψ(v, · ) and ψR(w) = ψ( · ,w).For example, if ψ : V × V ∗ → F, is defined by (v, θ) 7→ θ(v), then ψL : V →

V ∗∗ is the evaluation map. On the other hand, ψR : V ∗ → V ∗ is the identitymap.

Lemma. Let (ε1, · · · , εn) be a basis for V ∗ dual to (e1, · · · , en) of V ; (η1, · · · , ηn)be a basis for W ∗ dual to (f1, · · · , fn) of W .

If A represents ψ with respect to (e1, · · · , en) and (f1, · · · , fm), then A alsorepresents ψR with respect to (f1, · · · , fm) and (ε1, · · · , εn); and AT representsψL with respect to (e1, · · · , en) and (η1, · · · , ηm).

38


Proof. We just have to compute

ψL(ei)(fj) = Aij =(∑

Ai`η`

)(fj).

So we get

ψL(ei) =∑

ATìη`.

So AT represents ψL.We also have

ψR(fj)(ei) = Aij .

SoψR(fj) =

∑Akjεk.

Definition (Left and right kernel). The kernel of ψL is left kernel of ψ, whilethe kernel of ψR is the right kernel of ψ.

Then by definition, v is in the left kernel if ψ(v,w) = 0 for all w ∈W .More generally, if T ⊆ V , then we write

T⊥ = {w ∈W : ψ(t,w) = 0 for all t ∈ T}.

Similarly, if U ⊆W , then we write

⊥U = {v ∈ V : ψ(v,u) = 0 for all u ∈ U}.

In particular, V ⊥ = kerψR and ⊥W = kerψL.If we have a non-trivial left (or right) kernel, then in some sense, some

elements in V (or W ) are “useless”, and we don’t like these.

Definition (Non-degenerate bilinear form). ψ is non-degenerate if the left andright kernels are both trivial. We say ψ is degenerate otherwise.

Definition (Rank of bilinear form). If ψ : V → W is a bilinear form F on afinite-dimensional vector space V , then the rank of V is the rank of any matrixrepresenting φ. This is well-defined since r(PTAQ) = r(A) if P and Q areinvertible.

Alternatively, it is the rank of ψL (or ψR).

Lemma. Let V and W be finite-dimensional vector spaces over F with bases(e1, · · · , en) and (f1, · · · , fm) be their basis respectively.

Let ψ : V ×W → F be a bilinear form represented by A with respect to thesebases. Then φ is non-degenerate if and only if A is (square and) invertible. Inparticular, V and W have the same dimension.

We can understand this as saying if there are too many things in V (or W ),then some of them are bound to be useless.

Proof. Since ψR and ψL are represented by A and AT (in some order), they bothhave trivial kernel if and only if n(A) = n(AT ) = 0. So we need r(A) = dimVand r(AT ) = dimW . So we need dimV = dimW and A have full rank, i.e. thecorresponding linear map is bijective. So done.

39


Example. The map

F2 × F2 → F(ac

),

(bd

)7→ ad− bc

is a bilinear form. This, obviously, corresponds to the determinant of a 2-by-2matrix. We have ψ(v,w) = −ψ(w,v) for all v,w ∈ F2.

40

5 Determinants of matrices IB Linear Algebra

5 Determinants of matrices

We probably all know what the determinant is. Here we are going to give aslightly more abstract definition, and spend quite a lot of time trying motivatethis definition.

Recall that Sn is the group of permutations of {1, · · · , n}, and there is aunique group homomorphism ε : Sn → {±1} such that ε(σ) = 1 if σ can bewritten as a product of an even number of transpositions; ε(σ) = −1 if σ can bewritten as an odd number of transpositions. It is proved in IA Groups that thisis well-defined.

Definition (Determinant). Let A ∈ Matn,n(F). Its determinant is

detA =∑σ∈Sn

ε(σ)

n∏i=1

Aiσ(i).

This is a big scary definition. Hence, we will spend the first half of thechapter trying to understand what this really means, and how it behaves. Wewill eventually prove a formula that is useful for computing the determinant,which is probably how you were first exposed to the determinant.

Example. If n = 2, then S2 = {id, (1 2)}. So

detA = A11A22 −A12A21.

When n = 3, then S3 has 6 elements, and

detA = A11A22A33 +A12A23A31 +A13A21A32

−A11A23A32 −A22A31A13 −A33A12A21.

We will first prove a few easy and useful lemmas about the determinant.

Lemma. detA = detAT .

Proof.

detAT =∑σ∈Sn

ε(σ)

n∏i=1

Aσ(i)i

=∑σ∈Sn

ε(σ)

n∏j=1

Ajσ−1(j)

=∑τ∈Sn

ε(τ−1)

n∏j=1

Ajτ(j)

Since ε(τ) = ε(τ−1), we get

=∑τ∈Sn

ε(τ)

n∏j=1

Ajτ(j)

= detA.

41


Lemma. If A is an upper triangular matrix, i.e.

A =

a11 a12 · · · a1n0 a22 · · · a2n...

.... . .

...0 0 · · · ann

Then

detA =

n∏i=1

aii.

Proof. We have

detA =∑σ∈Sn

ε(σ)

n∏i=1

Aiσ(i)

But Aiσ(i) = 0 whenever i > σ(i). So

n∏i=1

Aiσ(i) = 0

if there is some i ∈ {1, · · · , n} such that i > σ(i).However, the only permutation in which i ≤ σ(i) for all i is the identity. So

the only thing that contributes in the sum is σ = id. So

detA =

n∏i=1

Aii.

To motivate this definition, we need a notion of volume. How can we definevolume on a vector space? It should be clear that the “volume” cannot beuniquely determined, since it depends on what units we are using. For example,saying the volume is “1” is meaningless unless we provide the units, e.g. cm3.So we have an axiomatic definition for what it means for something to denote a“volume”.

Definition (Volume form). A volume form on Fn is a function d : Fn×· · ·×Fn →F that is

(i) Multilinear, i.e. for all i and all v1, · · · ,vi−1,vi+1, · · · ,vn ∈ Fn, we have

d(v1, · · · ,vi−1, · ,vi+1, · · · ,vn) ∈ (Fn)∗.

(ii) Alternating, i.e. if vi = vj for some i 6= j, then

d(v1, · · · ,vn) = 0.

We should think of d(v1, · · · ,vn) as the n-dimensional volume of the paral-lelopiped spanned by v1, · · · ,vn.

We can view A ∈ Matn(F) as n-many vectors in Fn by considering its columnsA = (A(1) A(2) · · · A(n)), with A(i) ∈ Fn. Then we have

Lemma. detA is a volume form.

42


Proof. To see that det is multilinear, it is sufficient to show that each

n∏i=1

Aiσ(i)

is multilinear for all σ ∈ Sn, since linear combinations of multilinear forms aremultilinear. But each such product is contains precisely one entry from eachcolumn, and so is multilinear.

To show it is alternating, suppose now there are some k, ` distinct such thatA(k) = A(`). We let τ be the transposition (k `). By Lagrange’s theorem, wecan write

Sn = An q τAn,

where An = ker ε and q is the disjoint union. We also know that

∑σ∈An

n∏i=1

Aiσ(i) =∑σ∈An

n∏i=1

Ai,τσ(i),

since if σ(i) is not k or l, then τ does nothing; if σ(i) is k or l, then τ just swapsthem around, but A(k) = A(l). So we get

∑σ∈An

n∏i=1

Aiσ(i) =∑

σ′∈τAn

n∏i=1

Aiσ′(i),

But we know thatdetA = LHS− RHS = 0.

So done.

We have shown that determinants are volume forms, but is this the onlyvolume form? Well obviously not, since 2 detA is also a valid volume form.However, in some sense, all volume forms are “derived” from the determinant.Before we show that, we need the following

Lemma. Let d be a volume form on Fn. Then swapping two entries changesthe sign, i.e.

d(v1, · · · ,vi, · · · ,vj , · · · ,vn) = −d(v1, · · · ,vj , · · · ,vi, · · · ,vn).

Proof. By linearity, we have

0 = d(v1, · · · ,vi + vj , · · · ,vi + vj , · · · ,vn)

= d(v1, · · · ,vi, · · · ,vi, · · · ,vn)

+ d(v1, · · · ,vi, · · · ,vj , · · · ,vn)

+ d(v1, · · · ,vj , · · · ,vi, · · · ,vn)

+ d(v1, · · · ,vj , · · · ,vj , · · · ,vn)

= d(v1, · · · ,vi, · · · ,vj , · · · ,vn)

+ d(v1, · · · ,vj , · · · ,vi, · · · ,vn).

So done.

43


Corollary. If σ ∈ Sn, then

d(vσ(1), · · · ,vσ(n)) = ε(σ)d(v1, · · · ,vn)

for any vi ∈ Fn.

Theorem. Let d be any volume form on Fn, and let A = (A(1) · · · A(n)) ∈Matn(F). Then

d(A(1), · · · , A(n)) = (detA)d(e1, · · · , en),

where {e1, · · · , en} is the standard basis.

Proof. We can compute

d(A(1), · · · , A(n)) = d

(n∑i=1

Ai1ei, A(2), · · · , A(n)

)

=

n∑i=1

Ai1d(ei, A(2), · · · , A(n))

=

n∑i,j=1

Ai1Aj2d(ei, ej , A(3), · · · , A(n))

=∑

i1,··· ,in

d(ei1 , · · · , ein)

n∏j=1

Aijj .

We know that lots of these are zero, since if ik = ij for some k, j, then the termis zero. So we are just summing over distinct tuples, i.e. when there is some σsuch that ij = σ(j). So we get

d(A(1), · · · , A(n)) =∑σ∈Sn

d(eσ(1), · · · , eσ(n))n∏j=1

Aσ(j)j .

However, by our corollary up there, this is just

d(A(1), · · · , A(n)) =∑σ∈Sn

ε(σ)d(e1, · · · , en)

n∏j=1

Aσ(j)j = (detA)d(e1, · · · , en).

So done.

We can rewrite the formula as

d(Ae1, · · · , Aen) = (detA)d(e1, · · · , en).

It is not hard to see that the same proof gives for any v1, · · · ,vn, we have

d(Av1, · · · , Avn) = (detA)d(v1, · · · ,vn).

So we know that detA is the volume rescaling factor of an arbitrary parallelopiped,and this is true for any volume form d.

Theorem. Let A,B ∈ Matn(F). Then det(AB) = det(A) det(B).

44


Proof. Let d be a non-zero volume form on Fn (e.g. the “determinant”). Thenwe can compute

d(ABe1, · · · , ABen) = (detAB)d(e1, · · · , en),

but we also have

d(ABe1, · · · , ABen) = (detA)d(Be1, · · · , Ben) = (detA)(detB)d(e1, · · · , en).

Since d is non-zero, we must have detAB = detAdetB.

Corollary. If A ∈ Matn(F) is invertible, then detA 6= 0. In fact, when A isinvertible, then det(A−1) = (detA)−1.

Proof. We have

1 = det I = det(AA−1) = detAdetA−1.

So done.

Definition (Singular matrices). A matrix A is singular if detA = 0. Otherwise,it is non-singular.

We have just shown that if detA = 0, then A is not invertible. Is the conversetrue? If detA 6= 0, then can we conclude that A is invertible? The answer isyes. We are now going to prove it in an abstract and clean way. We will laterprove this fact again by constructing an explicit formula for the inverse, whichinvolves dividing by the determinant. So if the determinant is non-zero, then weknow an inverse exists.

Theorem. Let A ∈ Matn(F). Then the following are equivalent:

(i) A is invertible.

(ii) detA 6= 0.

(iii) r(A) = n.

Proof. We have proved that (i)⇒ (ii) above, and the rank-nullity theorem implies(iii) ⇒ (i). We will prove (ii) ⇒ (iii). In fact we will show the contrapositive.Suppose r(A) < n. By rank-nullity theorem, n(A) > 0. So there is some

x =

λ1...λn

such that Ax = 0. Suppose λk 6= 0. We define B as follows:

B =

1 λ1. . .

...1 λk−1

λkλk+1 1...

. . .

λn 1

So AB has the kth column identically zero. So det(AB) = 0. So it is sufficientto prove that det(B) 6= 0. But detB = λk 6= 0. So done.

45


We are now going to come up with an alternative formula for the determinant(which is probably the one you are familiar with). To do so, we introduce thefollowing notation:

Notation. Write Aij for the matrix obtained from A by deleting the ith rowand jth column.

Lemma. Let A ∈ Matn(F). Then

(i) We can expand detA along the jth column by

detA =

n∑i=1

(−1)i+jAij det Aij .

(ii) We can expand detA along the ith row by

detA =

n∑j=1

(−1)i+jAij det Aij .

We could prove this directly from the definition, but that is messy and scary,so let’s use volume forms instead.

Proof. Since detA = detAT , (i) and (ii) are equivalent. So it suffices to provejust one of them. We have

detA = d(A(1), · · · , A(n)),

where d is the volume form induced by the determinant. Then we can write as

detA = d

(A(1), · · · ,

n∑i=1

Aijei, · · · , A(n)

)

=

n∑i=1

Aijd(A(1), · · · , ei, · · · , A(n))

The volume form on the right is the determinant of a matrix with the jth columnreplaced with ei. We can move our columns around so that our matrix becomes

B =

(Aij 0stuff 1

)We get that detB = det Aij , since the only permutations that give a non-zerosum are those that send n to n. In the row and column swapping, we have maden− j column transpositions and n− i row transpositions. So we have

detA =

n∑i=1

Aij(−1)n−j(−1)n−i detB

=

n∑i=1

Aij(−1)i+j det Aij .

This is not only useful for computing determinants, but also computinginverses.

46


Definition (Adjugate matrix). Let A ∈ Matn(F). The adjugate matrix of A,written adjA, is the n× n matrix such that (adjA)ij = (−1)i+j det Aji.

The relevance is the following result:

Theorem. If A ∈ Matn(F), then A(adjA) = (detA)In = (adjA)A. In particu-lar, if detA 6= 0, then

A−1 =1

detAadjA.

Note that this is not an efficient way to compute the inverse.

Proof. We compute

[(adjA)A]jk =

n∑i=1

(adjA)jiAik =

n∑i=1

(−1)i+j det AijAik. (∗)

So if j = k, then [(adjA)A]jk = detA by the lemma.Otherwise, if j 6= k, consider the matrix B obtained from A by replacing the

jth column by the kth column. Then the right hand side of (∗) is just detB bythe lemma. But we know that if two columns are the same, the determinant iszero. So the right hand side of (∗) is zero. So

[(adjA)A]jk = detAδjk

The calculation for [A adjA] = (detA)In can be done in a similar manner, or byconsidering (A adjA)T = (adjA)TAT = (adj(AT ))AT = (detA)In.

Note that the coefficients of (adjA) are just given by polynomials in theentries of A, and so is the determinant. So if A is invertible, then its inverse isgiven by a rational function (i.e. ratio of two polynomials) in the entries of A.

This is very useful theoretically, but not computationally, since the polyno-mials are very large. There are better ways computationally, such as Gaussianelimination.

We’ll end with a useful tricks to compute the determinant.

Lemma. Let A,B be square matrices. Then for any C, we have

det

(A C0 B

)= (detA)(detB).

Proof. Suppose A ∈ Matk(F), and B ∈ Mat`(F), so C ∈ Matk,`(F). Let

X =

(A C0 B

).

Then by definition, we have

detX =∑

σ∈Sk+`

ε(σ)

k+∏i=1

Xiσ(i).

If j ≤ k and i > k, then Xij = 0. We only want to sum over permutations σ suchthat σ(i) > k if i > k. So we are permuting the last j things among themselves,and hence the first k things among themselves. So we can decompose this into

47


σ = σ1σ2, where σ1 is a permutation of {1, · · · , k} and fixes the remaining things,while σ2 fixes {1, · · · , k}, and permutes the remaining. Then

detX =∑

σ=σ1σ2

ε(σ1σ2)

k∏i=1

Xiσ1(i)

∏j=1

Xk+j σ2(k+j)

=

( ∑σ1∈Sk

ε(σ1)

k∏i=1

Aiσ1(i)

) ∑σ2∈S`

ε(σ2)∏j=1

Bjσ2(j)

= (detA)(detB)

Corollary.

det

A1 stuff

A2

. . .

0 An

=

n∏i=1

detAi

48

6 Endomorphisms IB Linear Algebra

6 Endomorphisms

Endomorphisms are linear maps from a vector space V to itself. One mightwonder — why would we want to study these linear maps in particular, whenwe can just work with arbitrary linear maps from any space to any other space?

When we work with arbitrary linear maps, we are free to choose any basisfor the domain, and any basis for the co-domain, since it doesn’t make sense torequire they have the “same” basis. Then we proved that by choosing the rightbases, we can put matrices into a nice form with only 1’s in the diagonal.

However, when working with endomorphisms, we can require ourselves to usethe same basis for the domain and co-domain, and there is much more we cansay. One major objective is to classify all matrices up to similarity, where twomatrices are similar if they represent the same endomorphism under differentbases.

6.1 Invariants

Definition. If V is a (finite-dimensional) vector space over F. An endomorphismof V is a linear map α : V → V . We write End(V ) for the F-vector space of allsuch linear maps, and I for the identity map V → V .

When we think about matrices representing an endomorphism of V , we’lluse the same basis for the domain and the range. We are going to study someproperties of these endomorphisms that are not dependent on the basis we pick,known as invariants.

Lemma. Suppose (e1, · · · , en) and (f1, · · · , fn) are bases for V and α ∈ End(V ).If A represents α with respect to (e1, · · · , en) and B represents α with respectto (f1, · · · , fn), then

B = P−1AP,

where P is given by

fi =

n∑j=1

Pjiej .

Proof. This is merely a special case of an earlier more general result for arbitrarymaps and spaces.

Definition (Similar matrices). We say matrices A and B are similar or conjugateif there is some P invertible such that B = P−1AP .

Recall that GLn(F), the group of invertible n× n matrices. GLn(F) acts onMatn(F) by conjugation:

(P,A) 7→ PAP−1.

We are conjugating it this way so that the associativity axiom holds (otherwisewe get a right action instead of a left action). Then A and B are similar iff theyare in the same orbit. Since orbits always partition the set, this is an equivalencerelation.

Our main goal is to classify the orbits, i.e. find a “nice” representative foreach orbit.

49


Our initial strategy is to identify basis-independent invariants for endomor-phisms. For example, we will show that the rank, trace, determinant andcharacteristic polynomial are all such invariants.

Recall that the trace of a matrix A ∈ Matn(F) is the sum of the diagonalelements:

Definition (Trace). The trace of a matrix of A ∈ Matn(F) is defined by

trA =

n∑i=1

Aii.

We want to show that the trace is an invariant. In fact, we will show astronger statement (as well as the corresponding statement for determinants):

Lemma.

(i) If A ∈ Matm,n(F) and B ∈ Matn,m(F), then

trAB = trBA.

(ii) If A,B ∈ Matn(F) are similar, then trA = trB.

(iii) If A,B ∈ Matn(F) are similar, then detA = detB.

Proof.

(i) We have

trAB =

m∑i=1

(AB)ii =

m∑i=1

n∑j=1

AijBji =

n∑j=1

m∑i=1

BjiAij = trBA.

(ii) Suppose B = P−1AP . Then we have

trB = tr(P−1(AP )) = tr((AP )P−1) = trA.

(iii) We have

det(P−1AP ) = detP−1 detAdetP = (detP )−1 detAdetP = detA.

This allows us to define the trace and determinant of an endomorphism.

Definition (Trace and determinant of endomorphism). Let α ∈ End(V ), and Abe a matrix representing α under any basis. Then the trace of α is trα = trA,and the determinant is detα = detA.

The lemma tells us that the determinant and trace are well-defined. Wecan also define the determinant without reference to a basis, by defining moregeneral volume forms and define the determinant as a scaling factor.

The trace is slightly more tricky to define without basis, but in IB AnalysisII example sheet 4, you will find that it is the directional derivative of thedeterminant at the origin.

To talk about the characteristic polynomial, we need to know what eigenvaluesare.

50


Definition (Eigenvalue and eigenvector). Let α ∈ End(V ). Then λ ∈ F is aneigenvalue (or E-value) if there is some v ∈ V \ {0} such that αv = λv.

v is an eigenvector if α(v) = λv for some λ ∈ F.When λ ∈ F, the λ-eigenspace, written Eα(λ) or E(λ) is the subspace of V

containing all the λ-eigenvectors, i.e.

Eα(λ) = ker(λι− α).

where ι is the identity function.

Definition (Characteristic polynomial). The characteristic polynomial of α isdefined by

χα(t) = det(tι− α).

You might be used to the definition χα(t) = det(α− tι) instead. These twodefinitions are obviously equivalent up to a factor of −1, but this definitionhas an advantage that χα(t) is always monic, i.e. the leading coefficient is 1.However, when doing computations in reality, we often use det(α− tι) instead,since it is easier to negate tι than α.

We know that λ is an eigenvalue of α iff n(α− λι) > 0 iff r(α− λι) < dimViff χα(λ) = det(λι − α) = 0. So the eigenvalues are precisely the roots of thecharacteristic polynomial.

If A ∈ Matn(F), we can define χA(t) = det(tI −A).

Lemma. If A and B are similar, then they have the same characteristic polyno-mial.

Proof.det(tI − P−1AP ) = det(P−1(tI −A)P ) = det(tI −A).

Lemma. Let α ∈ End(V ) and λ1, · · · , λk distinct eigenvalues of α. Then

E(λ1) + · · ·+ E(λk) =

k⊕i=1

E(λi)

is a direct sum.

Proof. Supposek∑i=1

xi =

k∑i=1

yi,

with xi,yi ∈ E(λi). We want to show that they are equal. We are going to findsome clever map that tells us what xi and yi are. Consider βj ∈ End(V ) definedby

βj =∏r 6=j

(α− λrι).

Then

βj

(k∑i=1

xi

)=

k∑i=1

∏r 6=j

(α− λrι)(xi)

=

k∑i=1

∏r 6=j

(λi − λr)(xi).

51


Each summand is zero, unless i 6= j. So this is equal to

βj

(k∑i=1

xi

)=∏r 6=j

(λj − λr)(xj).

Similarly, we obtain

βj

(k∑i=1

yi

)=∏r 6=j

(λj − λr)(yj).

Since we know that∑

xi =∑

yi, we must have∏r 6=j

(λj − λr)xj =∏r 6=j

(λj − λr)yj .

Since we know that∏r 6=j(λr − λj) 6= 0, we must have xi = yi for all i.

So each expression for∑

xi is unique.

The proof shows that any set of non-zero eigenvectors with distinct eigenvaluesis linearly independent.

Definition (Diagonalizable). We say α ∈ End(V ) is diagonalizable if there issome basis for V such that α is represented by a diagonal matrix, i.e. all termsnot on the diagonal are zero.

These are in some sense the nice matrices we like to work with.

Theorem. Let α ∈ End(V ) and λ1, · · · , λk be distinct eigenvalues of α. WriteEi for E(λi). Then the following are equivalent:

(i) α is diagonalizable.

(ii) V has a basis of eigenvectors for α.

(iii) V =⊕k

i=1Ei.

(iv) dimV =∑ki=1 dimEi.

Proof.

– (i) ⇔ (ii): Suppose (e1, · · · , en) is a basis for V . Then

α(ei) = Ajiej ,

where A represents α. Then A is diagonal iff each ei is an eigenvector. Sodone

– (ii) ⇔ (iii): It is clear that (ii) is true iff∑Ei = V , but we know that this

must be a direct sum. So done.

– (iii) ⇔ (iv): This follows from example sheet 1 Q10, which says that

V =⊕k

i=1Ei iff the bases for Ei are disjoint and their union is a basis ofV .

52


6.2 The minimal polynomial

6.2.1 Aside on polynomials

Before we talk about minimal polynomials, we first talk about polynomials ingeneral.

Definition (Polynomial). A polynomial over F is an object of the form

f(t) = amtm + am−1t

m−1 + · · ·+ a1t+ a0,

with m ≥ 0, a0, · · · , am ∈ F.We write F[t] for the set of polynomials over F.

Note that we don’t identify a polynomial f with the corresponding functionit represents. For example, if F = Z/pZ, then tp and t are different polynomials,even though they define the same function (by Fermat’s little theorem/Lagrange’stheorem). Two polynomials are equal if and only if they have the same coeffi-cients.

However, we will later see that if F is R or C, then polynomials are equalif and only if they represent the same function, and this distinction is not asimportant.

Definition (Degree). Let f ∈ F[t]. Then the degree of f , written deg f is thelargest n such that an 6= 0. In particular, deg 0 = −∞.

Notice that deg fg = deg f + deg g and deg f + g ≤ max{deg f, deg g}.

Lemma (Polynomial division). If f, g ∈ F[t] (and g 6= 0), then there existsq, r ∈ F[t] with deg r < deg g such that

f = qg + r.

Proof is omitted.

Lemma. If λ ∈ F is a root of f , i.e. f(λ) = 0, then there is some g such that

f(t) = (t− λ)g(t).

Proof. By polynomial division, let

f(t) = (t− λ)g(t) + r(t)

for some g(t), r(t) ∈ F[t] with deg r < deg(t− λ) = 1. So r has to be constant,i.e. r(t) = a0 for some a0 ∈ F. Now evaluate this at λ. So

0 = f(λ) = (λ− λ)g(λ) + r(λ) = a0.

So a0 = 0. So r = 0. So done.

Definition (Multiplicity of a root). Let f ∈ F[t] and λ a root of f . We say λhas multiplicity k if (t− λ)k is a factor of f but (t− λ)k+1 is not, i.e.

f(t) = (t− λ)kg(t)

for some g(t) ∈ F[t] with g(λ) 6= 0.

53


We can use the last lemma and induction to show that any non-zero f ∈ F[t]can be written as

f = g(t)

k∏i=1

(t− λi)ai ,

where λ1, · · · , λk are all distinct, ai > 1, and g is a polynomial with no roots inF.

Hence we obtain the following:

Lemma. A non-zero polynomial f ∈ F[t] has at most deg f roots, counted withmultiplicity.

Corollary. Let f, g ∈ F[t] have degree < n. If there are λ1, · · · , λn distinct suchthat f(λi) = g(λi) for all i, then f = g.

Proof. Given the lemma, consider f − g. This has degree less than n, and(f − g)(λi) = 0 for i = 1, · · · , n. Since it has at least n ≥ deg(f − g) roots, wemust have f − g = 0. So f = g.

Corollary. If F is infinite, then f and g are equal if and only if they agree onall points.

More importantly, we have the following:

Theorem (The fundamental theorem of algebra). Every non-constant polyno-mial over C has a root in C.

We will not prove this.We say C is an algebraically closed field.It thus follows that every polynomial over C of degree n > 0 has precisely n

roots, counted with multiplicity, since if we write f(t) = g(t)∏

(t − λi)ai andg has no roots, then g is constant. So the number of roots is

∑ai = deg f ,

counted with multiplicity.It also follows that every polynomial in R factors into linear polynomials and

quadratic polynomials with no real roots (since complex roots of real polynomialscome in complex conjugate pairs).

6.2.2 Minimal polynomial

Notation. Given f(t) =∑mi=0 ait

i ∈ F[t], A ∈ Matn(F) and α ∈ End(V ), wecan write

f(A) =

m∑i=0

aiAi, f(α) =

m∑i=0

aiαi

where A0 = I and α0 = ι.

Theorem (Diagonalizability theorem). Suppose α ∈ End(V ). Then α is diago-nalizable if and only if there exists non-zero p(t) ∈ F[t] such that p(α) = 0, andp(t) can be factored as a product of distinct linear factors.

Proof. Suppose α is diagonalizable. Let λ1, · · · , λk be the distinct eigenvaluesof α. We have

V =

k⊕i=1

E(λi).

54


So each v ∈ V can be written (uniquely) as

v =

k∑i=1

vi with α(vi) = λivi.

Now let

p(t) =

k∏i=1

(t− λi).

Then for any v, we get

p(α)(v) =

k∑i=1

p(α)(vi) =

k∑i=1

p(λi)vi = 0.

So p(α) = 0. By construction, p has distinct linear factors.Conversely, suppose we have our polynomial

p(t) =

k∏i=1

(t− λi),

with λ1, · · · , λk ∈ F distinct, and p(α) = 0 (we can wlog assume p is monic, i.e.the leading coefficient is 1). We will show that

V =

k∑i=1

Eα(λi).

In other words, we want to show that for all v ∈ V , there is some vi ∈ Eα(λi)for i = 1, · · · , k such that v =

∑vi.

To find these vi out, we let

qj(t) =∏i 6=j

t− λiλj − λi

.

This is a polynomial of degree k − 1, and qj(λi) = δij .Now consider

q(t) =

k∑i=1

qi(t).

We still have deg q ≤ k − 1, but q(λi) = 1 for any i. Since q and 1 agree on kpoints, we must have q = 1.

Let πj : V → V be given by πj = qj(α). Then the above says that

k∑j=1

πj = ι.

Hence given v ∈ V , we know that v =∑πjv.

We now check that πjv ∈ Eα(λj). This is true since

(α− λjι)πjv =1∏

i 6=j(λj − λi)

k∏i=1

(α− λι)(v) =1∏

i 6=j(λj − λi)p(α)(v) = 0.

55


Soαvj = λjvj .

So done.

In the above proof, if v ∈ Eα(λi), then πj(v) = δijv. So πi is a projectiononto the Eα(λi).

Definition (Minimal polynomial). The minimal polynomial of α ∈ End(V ) isthe non-zero monic polynomial Mα(t) of least degree such that Mα(α) = 0.

The monic requirement is just for things to look nice, since we can alwaysdivide by the leading coefficient of a polynomial to get a monic version.

Note that if A represents α, then for all p ∈ F[t], p(A) represents p(α).Thus p(α) is zero iff p(A) = 0. So the minimal polynomial of α is the minimalpolynomial of A if we define MA analogously.

There are two things we want to know — whether the minimal polynomialexists, and whether it is unique.

Existence is always guaranteed in finite-dimensional cases. If dimV = n <∞,then dim End(V ) = n2. So ι, α, α2, · · · , αn2

are linearly dependent. So there aresome λ0, · · · , λn2 ∈ F not all zero such that

n2∑i=0

λiαi = 0.

So degMα ≤ n2. So we must have a minimal polynomial.To show that the minimal polynomial is unique, we will prove the following

stronger characterization of the minimal polynomial:

Lemma. Let α ∈ End(V ), and p ∈ F[t]. Then p(α) = 0 if and only if Mα(t) isa factor of p(t). In particular, Mα is unique.

Proof. For all such p, we can write p(t) = q(t)Mα(t) + r(t) for some r of degreeless than degMα. Then

p(α) = q(α)Mα(α) + r(α).

So if r(α) = 0 iff p(α) = 0. But deg r < degMα. By the minimality of Mα, wemust have r(α) = 0 iff r = 0. So p(α) = 0 iff Mα(t) | p(t).

So if M1 and M2 are both minimal polynomials for α, then M1 | M2 andM2 | M1. So M2 is just a scalar multiple of M1. But since M1 and M2 aremonic, they must be equal.

Example. Let V = F2, and consider the following matrices:

A =

(1 00 1

), B =

(1 10 1

).

Consider the polynomial p(t) = (t− 1)2. We can compute p(A) = p(B) = 0. SoMA(t) and MB(t) are factors of (t− 1)2. There aren’t many factors of (t− 1)2.So the minimal polynomials are either (t− 1) or (t− 1)2. Since A− I = 0 andB − I 6= 0, the minimal polynomial of A is t− 1 and the minimal polynomial ofB is (t− 1)2.

56


We can now re-state our diagonalizability theorem.

Theorem (Diagonalizability theorem 2.0). Let α ∈ End(V ). Then α is diago-nalizable if and only if Mα(t) is a product of its distinct linear factors.

Proof. (⇐) This follows directly from the previous diagonalizability theorem.(⇒) Suppose α is diagonalizable. Then there is some p ∈ F[t] non-zero such

that p(α) = 0 and p is a product of distinct linear factors. Since Mα divides p,Mα also has distinct linear factors.

Theorem. Let α, β ∈ End(V ) be both diagonalizable. Then α and β aresimultaneously diagonalizable (i.e. there exists a basis with respect to whichboth are diagonal) if and only if αβ = βα.

This is important in quantum mechanics. This means that if two operatorsdo not commute, then they do not have a common eigenbasis. Hence we havethe uncertainty principle.

Proof. (⇒) If there exists a basis (e1, · · · , en) for V such that α and β are repre-sented by A and B respectively, with both diagonal, then by direct computation,AB = BA. But AB represents αβ and BA represents βα. So αβ = βα.

(⇐) Suppose αβ = βα. The idea is to consider each eigenspace of α individu-ally, and then diagonalize β in each of the eigenspaces. Since α is diagonalizable,we can write

V =

k⊕i=1

Eα(λi),

where λi are the eigenvalues of V . We write Ei for Eα(λi). We want to showthat β sends Ei to itself, i.e. β(Ei) ⊆ Ei. Let v ∈ Ei. Then we want β(v) to bein Ei. This is true since

α(β(v)) = β(α(v)) = β(λiv) = λiβ(v).

So β(v) is an eigenvector of α with eigenvalue λi.Now we can view β|Ei

∈ End(Ei). Note that

Mβ(β|Ei) = Mβ(β)|Ei

= 0.

Since Mβ(t) is a product of its distinct linear factors, it follows that β|Ei isdiagonalizable. So we can choose a basis Bi of eigenvectors for β|Ei . We can dothis for all i.

Then since V is a direct sum of the Ei’s, we know that B =⋃ki=1Bi is a

basis for V consisting of eigenvectors for both α and β. So done.

6.3 The Cayley-Hamilton theorem

We will first state the theorem, and then prove it later.Recall that χα(t) = det(tι− α) for α ∈ End(V ). Our main theorem of the

section (as you might have guessed from the title) is

Theorem (Cayley-Hamilton theorem). Let V be a finite-dimensional vectorspace and α ∈ End(V ). Then χα(α) = 0, i.e. Mα(t) | χα(t). In particular,degMα ≤ n.

57


We will not prove this yet, but just talk about it first. It is tempting to provethis by substituting t = α into det(tι− α) and get det(α− α) = 0, but this ismeaningless, since what the statement χα(t) = det(tι− α) tells us to do is toexpand the determinant of the matrix

t− a11 a12 · · · a1na21 t− a22 · · · a2n...

.... . .

...an1 an2 · · · t− ann

to obtain a polynomial, and we clearly cannot substitute t = A in this expression.However, we can later show that we can use this idea to prove it, but just be abit more careful.

Note also that if ρ(t) ∈ F[t] and

A =

λ1 . . .

λn

,

then

ρ(A) =

ρ(λ1). . .

ρ(λn)

.

Since χA(t) is defined as∏ni=1(t − λi), it follows that χA(A) = 0. So if α is

diagonalizable, then the theorem is clear.This was easy. Diagonalizable matrices are nice. The next best thing we can

look at is upper-triangular matrices.

Definition (Triangulable). An endomorphism α ∈ End(V ) is triangulable ifthere is a basis for V such that α is represented by an upper triangular matrix

a11 a12 · · · a1n0 a22 · · · a2n...

.... . .

...0 0 · · · ann

.

We have a similar lemma telling us when matrices are triangulable.

Lemma. An endomorphism α is triangulable if and only if χα(t) can be writtenas a product of linear factors, not necessarily distinct. In particular, if F = C(or any algebraically closed field), then every endomorphism is triangulable.

Proof. Suppose that α is triangulable and represented byλ1 ∗ · · · ∗0 λ2 · · · ∗...

.... . .

...0 0 · · · λn

.

58


Then

χα(t) = det

t− λ1 ∗ · · · ∗

0 t− λ2 · · · ∗...

.... . .

...0 0 · · · t− λn

=

n∏i=1

(t− λi).

So it is a product of linear factors.We are going to prove the converse by induction on the dimension of our

space. The base case dimV = 1 is trivial, since every 1 × 1 matrix is alreadyupper triangular.

Suppose α ∈ End(V ) and the result holds for all spaces of dimensions< dimV , and χα is a product of linear factors. In particular, χα(t) has a root,say λ ∈ F.

Now let U = E(λ) 6= 0, and let W be a complementary subspace to U in V ,i.e. V = U ⊕W . Let u1, · · · ,ur be a basis for U and wr+1, · · · ,wn be a basisfor W so that u1, · · · ,ur,wr+1, · · · ,wn is a basis for V , and α is represented by(

λIr stuff0 B

)We know χα(t) = (t− λ)rχB(t). So χB(t) is also a product of linear factors. Welet β : W →W be the map defined by B with respect to wr+1, · · · ,wn.

(Note that in general, β is not α|W in general, since α does not necessarilymap W to W . However, we can say that (α−β)(w) ∈ U for all w ∈W . This canbe much more elegantly expressed in terms of quotient spaces, but unfortunatelythat is not officially part of the course)

Since dimW < dimV , there is a basis vr+1, · · · ,vn for W such that β isrepresented by C, which is upper triangular.

For j = 1, · · · , n− r, we have

α(vj+r) = u +

n−r∑k=1

Ckjvk+r

for some u ∈ U . So α is represented by(λIr stuff0 C

)with respect to (u1, · · · ,ur,vr+1, · · · ,vn), which is upper triangular.

Example. Consider the real rotation matrix(cos θ sin θ− sin θ cos θ

).

This is not similar to a real upper triangular matrix (if θ is not an integermultiple of π). This is since the eigenvalues are e±iθ and are not real. On theother hand, as a complex matrix, it is triangulable, and in fact diagonalizablesince the eigenvalues are distinct.

For this reason, in the rest of the section, we are mostly going to work in C.We can now prove the Cayley-Hamilton theorem.

59


Theorem (Cayley-Hamilton theorem). Let V be a finite-dimensional vectorspace and α ∈ End(V ). Then χα(α) = 0, i.e. Mα(t) | χα(t). In particular,degMα ≤ n.

Proof. In this proof, we will work over C. By the lemma, we can choose a basis{e1, · · · , en} is represented by an upper triangular matrix.

A =

λ1 ∗ · · · ∗0 λ2 · · · ∗...

.... . .

...0 0 · · · λn

.

We must prove that

χα(α) = χA(α) =

n∏i=1

(α− λiι) = 0.

Write Vj = 〈e1, · · · , ej〉. So we have the inclusions

V0 = 0 ⊆ V1 ⊆ · · · ⊆ Vn−1 ⊆ Vn = V.

We also know that dimVj = j. This increasing sequence is known as a flag.Now note that since A is upper-triangular, we get

α(ei) =

i∑k=1

Akiek ∈ Vi.

So α(Vj) ⊆ Vj for all j = 0, · · · , n.Moreover, we have

(α− λjι)(ej) =

j−1∑k=1

Akjek ⊆ Vj−1

for all j = 1, · · · , n. So every time we apply one of these things, we get to asmaller space. Hence by induction on n− j, we have

n∏i=j

(α− λiι)(Vn) ⊆ Vj−1.

In particular, when j = 1, we get

n∏i=1

(α− λiι)(V ) ⊆ V0 = 0.

So χα(α) = 0 as required.Note that if our field F is not C but just a subfield of C, say R, we can just

pretend it is a complex matrix, do the same proof.

60


We can see this proof more “visually” as follows: for simplicity of expression,we suppose n = 4. In the basis where α is upper-triangular, the matrices A−λiIlook like this

A− λ1I =

0 ∗ ∗ ∗0 ∗ ∗ ∗0 0 ∗ ∗0 0 0 ∗

A− λ2I =

∗ ∗ ∗ ∗0 0 ∗ ∗0 0 ∗ ∗0 0 0 ∗

A− λ3I =

∗ ∗ ∗ ∗0 ∗ ∗ ∗0 0 0 ∗0 0 0 ∗

A− λ4I =

∗ ∗ ∗ ∗0 ∗ ∗ ∗0 0 ∗ ∗0 0 0 0

Then we just multiply out directly (from the right):

4∏i=1

(A− λiI) =

0 ∗ ∗ ∗0 ∗ ∗ ∗0 0 ∗ ∗0 0 0 ∗

∗ ∗ ∗ ∗0 0 ∗ ∗0 0 ∗ ∗0 0 0 ∗

∗ ∗ ∗ ∗0 ∗ ∗ ∗0 0 0 ∗0 0 0 ∗

∗ ∗ ∗ ∗0 ∗ ∗ ∗0 0 ∗ ∗0 0 0 0

=

0 ∗ ∗ ∗0 ∗ ∗ ∗0 0 ∗ ∗0 0 0 ∗

∗ ∗ ∗ ∗0 0 ∗ ∗0 0 ∗ ∗0 0 0 ∗

∗ ∗ ∗ ∗0 ∗ ∗ ∗0 0 0 00 0 0 0

=

0 ∗ ∗ ∗0 ∗ ∗ ∗0 0 ∗ ∗0 0 0 ∗

∗ ∗ ∗ ∗0 0 0 00 0 0 00 0 0 0

=

0 0 0 00 0 0 00 0 0 00 0 0 0

.

This is exactly what we showed in the proof — after multiplying out the first kelements of the product (counting from the right), the image is contained in thespan of the first n− k basis vectors.

Proof. We’ll now prove the theorem again, which is somewhat a formalizationof the “nonsense proof” where we just substitute t = α into det(α− tι).

Let α be represented by A, and B = tI −A. Then

B adjB = detBIn = χα(t)In.

But we know that adjB is a matrix with entries in F[t] of degree at most n− 1.So we can write

adjB = Bn−1tn−1 +Bn−2t

n−2 + · · ·+B0,

with Bi ∈ Matn(F). We can also write

χα(t) = tn + an−1tn−1 + · · ·+ a0.

61


Then we get the result

(tIn −A)(Bn−1tn−1 +Bn−2t

n−2 + · · ·+B0) = (tn + an−1tn−1 + · · ·+ a0)In.

We would like to just throw in t = A, and get the desired result, but in all thesederivations, t is assumed to be a real number, and, tIn −A is the matrix

t− a11 a12 · · · a1na21 t− a22 · · · a2n...

.... . .

...an1 an2 · · · t− ann

It doesn’t make sense to put our A in there.

However, what we can do is to note that since this is true for all values of t,the coefficients on both sides must be equal. Equating coefficients in tk, we have

−AB0 = a0In

B0 −AB1 = a1In

...

Bn−2 −ABn−1 = an−1In

ABn−1 − 0 = In

We now multiply each row a suitable power of A to obtain

−AB0 = a0In

AB0 −A2B1 = a1A

...

An−1Bn−2 −AnBn−1 = an−1An−1

AnBn−1 − 0 = An.

Summing this up then gives χα(A) = 0.

This proof suggests that we really ought to be able to just substitute in t = αand be done. In fact, we can do this, after we develop sufficient machinery. Thiswill be done in the IB Groups, Rings and Modules course.

Lemma. Let α ∈ End(V ), λ ∈ F. Then the following are equivalent:

(i) λ is an eigenvalue of α.

(ii) λ is a root of χα(t).

(iii) λ is a root of Mα(t).

Proof.

– (i) ⇔ (ii): λ is an eigenvalue of α if and only if (α − λι)(v) = 0 has anon-trivial root, iff det(α− λι) = 0.

– (iii) ⇒ (ii): This follows from Cayley-Hamilton theorem since Mα | χα.

62


– (i) ⇒ (iii): Let λ be an eigenvalue, and v be a corresponding eigenvector.Then by definition of Mα, we have

Mα(α)(v) = 0(v) = 0.

We also know thatMα(α)(v) = Mα(λ)v.

Since v is non-zero, we must have Mα(λ) = 0.

– (iii) ⇒ (i): This is not necessary since it follows from the above, butwe could as well do it explicitly. Suppose λ is a root of Mα(t). ThenMα(t) = (t − λ)g(t) for some g ∈ F[t]. But deg g < degMα. Hence byminimality of Mα, we must have g(α) 6= 0. So there is some v ∈ V suchthat g(α)(v) 6= 0. Then

(α− λι)g(α)(v) = Mα(α)v = 0.

So we must have α(g(α)(v)) = λg(α)(v). So g(α)(v) ∈ Eα(λ) \ {0}. So (i)holds.

Example. What is the minimal polynomial of

A =

1 0 −20 1 10 0 2

?

We can compute χA(t) = (t−1)2(t−2). So we know that the minimal polynomialis one of (t− 1)2(t− 2) and (t− 1)(t− 2).

By direct and boring computations, we can find (A− I)(A− 2I) = 0. So weknow that MA(t) = (t− 1)(t− 2). So A is diagonalizable.

6.4 Multiplicities of eigenvalues and Jordan normal form

We will want to put our matrices in their “Jordan normal forms”, which isa unique form for each equivalence class of similar matrices. The followingproperties will help determine which Jordan normal form a matrix can have.

Definition (Algebraic and geometry multiplicity). Let α ∈ End(V ) and λ aneigenvalue of α. Then

(i) The algebraic multiplicity of λ, written aλ, is the multiplicity of λ as aroot of χα(t).

(ii) The geometric multiplicity of λ, written gλ, is the dimension of the corre-sponding eigenspace, dimEα(λ).

(iii) cλ is the multiplicity of λ as a root of the minimal polynomial mα(t).

We will look at some extreme examples:

Example.

63


– Let

A =

λ 1 · · · 0

0 λ. . .

......

.... . . 1

0 0 · · · λ

.

We will later show that aλ = n = cλ and gλ = 1.

– Consider A = λI. Then aλ = gλ = n, cλ = 1.

Lemma. If λ is an eigenvalue of α, then

(i) 1 ≤ gλ ≤ aλ

(ii) 1 ≤ cλ ≤ aλ.

Proof.

(i) The first inequality is easy. If λ is an eigenvalue, then E(λ) 6= 0. Sogλ = dimE(λ) ≥ 1. To prove the other inequality, if v1, · · · ,vg is a basisfor E(λ), then we can extend it to a basis for V , and then α is representedby (

λIg ∗0 B

)So χα(t) = (t− λ)gχB(t). So aλ > g = gλ.

(ii) This is straightforward since Mα(λ) = 0 implies 1 ≤ cλ, and since Mα(t) |χα(t), we know that cλ ≤ αλ.

Lemma. Suppose F = C and α ∈ End(V ). Then the following are equivalent:

(i) α is diagonalizable.

(ii) gλ = aλ for all eigenvalues of α.

(iii) cλ = 1 for all λ.

Proof.

– (i) ⇔ (ii): α is diagonalizable iff dimV =∑

dimEα(λi). But this isequivalent to

dimV =∑

gλi≤∑

aλi= degχα = dimV.

So we must have∑gλi

=∑aλi

. Since each gλiis at most aλi

, they mustbe individually equal.

– (i) ⇔ (iii): α is diagonalizable if and only if Mα(t) is a product of distinctlinear factors if and only if cλ = 1 for all eigenvalues λ.

Definition (Jordan normal form). We say A ∈ MatN (C) is in Jordan normalform if it is a block diagonal of the form

Jn1(λ1) 0

Jn2(λ2)

. . .

0 Jnk(λk)

64


where k ≥ 1, n1, · · · , nk ∈ N such that n =∑ni, λ1, · · · , λk not necessarily

distinct, and

Jm(λ) =

λ 1 · · · 0

0 λ. . .

......

.... . . 1

0 0 · · · λ

is an m×m matrix. Note that Jm(λ) = λIm + Jm(0).

For example, it might look something like

λ1 1λ1 1

λ1 0λ2 0

λ3 1λ3 0

. . .. . .

λn 1λn

with all other entries zero.

Then we have the following theorem:

Theorem (Jordan normal form theorem). Every matrix A ∈ Matn(C) is similarto a matrix in Jordan normal form. Moreover, this Jordan normal form matrixis unique up to permutation of the blocks.

This is a complete solution to the classification problem of matrices, at leastin C. We will not prove this completely. We will only prove the uniqueness part,and then reduce the existence part to a special form of endomorphisms. Theremainder of the proof is left for IB Groups, Rings and Modules.

We can rephrase this result using linear maps. If α ∈ End(V ) is an endo-morphism of a finite-dimensional vector space V over C, then the theorem saysthere exists a basis such that α is represented by a matrix in Jordan normalform, and this is unique as before.

Note that the permutation thing is necessary, since if two matrices in Jordannormal form differ only by a rearrangement of blocks, then they are similar, bypermuting the basis.

Example. Every 2× 2 matrix in Jordan normal form is one of the three types:

Jordan normal form χA MA(λ 00 λ

)(t− λ)2 (t− λ)(

λ 00 µ

)(t− λ)(t− µ) (t− λ)(t− µ)(

λ 10 λ

)(t− λ)2 (t− λ)2

65


with λ, µ ∈ C distinct. We see that MA determines the Jordan normal form ofA, but χA does not.

Every 3× 3 matrix in Jordan normal form is one of the six types. Here λ1, λ2and λ3 are distinct complex numbers.

Jordan normal form χA MAλ1 0 00 λ2 00 0 λ3

(t− λ1)(t− λ2)(t− λ3) (t− λ1)(t− λ2)(t− λ3)

λ1 0 00 λ1 00 0 λ2

(t− λ1)2(t− λ2) (t− λ1)(t− λ2)

λ1 1 00 λ1 00 0 λ2

(t− λ1)2(t− λ2) (t− λ1)2(t− λ2)

λ1 0 00 λ1 00 0 λ1

(t− λ1)3 (t− λ1)

λ1 1 00 λ1 00 0 λ1

(t− λ1)3 (t− λ1)2

λ1 1 00 λ1 10 0 λ1

(t− λ1)3 (t− λ1)3

Notice that χA and MA together determine the Jordan normal form of a 3× 3complex matrix. We do indeed need χA in the second case, since if we are givenMA = (t− λ1)(t− λ2), we know one of the roots is double, but not which one.

In general, though, even χA and MA together does not suffice.We now want to understand the Jordan normal blocks better. Recall the

definition

Jn(λ) =

λ 1 · · · 0

0 λ. . .

......

.... . . 1

0 0 · · · λ

= λIn + Jn(0).

If (e1, · · · , en) is the standard basis for Cn, we have

Jn(0)(e1) = 0, Jn(0)(ei) = ei−1 for 2 ≤ i ≤ n.

Thus we know

Jn(0)k(ei) =

{0 i ≤ kei−k k < i ≤ n

In other words, for k < n, we have

(Jn(λ)− λI)k = Jn(0)k =

(0 In−k0 0

).

66


If k ≥ n, then we have (Jn(λ)− λI)k = 0. Hence we can summarize this as

n((Jm(λ)− λIm)r) = min{r,m}.

Note that if A = Jn(λ), then χA(t) = MA(t) = (t − λ)n. So λ is the onlyeigenvalue of A. So aλ = cλ = n. We also know that n(A−λI) = n−r(A−λI) =1. So gλ = 1.

Recall that a general Jordan normal form is a block diagonal matrix of Jordanblocks. We have just studied individual Jordan blocks. Next, we want to look atsome properties of block diagonal matrices in general. If A is the block diagonalmatrix

A =

A1

A2

. . .

Ak

,

then

χA(t) =

k∏i=1

χAi(k).

Moreover, if ρ ∈ F[t], then

ρ(A) =

ρ(A1)

ρ(A2). . .

ρ(Ak)

.

HenceMA(t) = lcm(MA1

(t), · · · ,MAk(t)).

By the rank-nullity theorem, we have

n(ρ(A)) =

k∑i=1

n(ρ(Ai)).

Thus if A is in Jordan normal form, we get the following:

– gλ is the number of Jordan blocks in A with eigenvalue λ.

– aλ is the sum of sizes of the Jordan blocks of A with eigenvalue λ.

– cλ is the size of the largest Jordan block with eigenvalue λ.

We are now going to prove the uniqueness part of Jordan normal formtheorem.

Theorem. Let α ∈ End(V ), and A in Jordan normal form representing α. Thenthe number of Jordan blocks Jn(λ) in A with n ≥ r is

n((α− λι)r)− n((α− λι)r−1).

This is clearly independent of the choice of basis. Also, given this information,we can figure out how many Jordan blocks of size exactly n by doing the rightsubtraction. Hence this tells us that Jordan normal forms are unique up topermutation of blocks.

67


Proof. We work blockwise for

A =

Jn1(λ1)

Jn2(λ2)

. . .

Jnk(λk)

.

We have previously computed

n((Jm(λ)− λIm)r) =

{r r ≤ mm r > m

.

Hence we know

n((Jm(λ)− λIm)r)− n((Jm(λ)− λIm)r−1) =

{1 r ≤ m0 otherwise.

It is also easy to see that for µ 6= λ,

n((Jm(µ)− λIm)r) = n(Jm(µ− λ)r) = 0

Adding up for each block, for r ≥ 1, we have

n((α−λι)r)−n((α−λι)r−1) = number of Jordan blocks Jn(λ) with n ≥ r.

We can interpret this result as follows: if r ≤ m, when we take an additional

power of Jm(λ)−λIm, we get from

(0 Im−r0 0

)to

(0 Im−r−10 0

). So we kill off

one more column in the matrix, and the nullity increase by one. This happensuntil (Jm(λ)− λIm)r = 0, in which case increasing the power no longer affectsthe matrix. So when we look at the difference in nullity, we are counting thenumber of blocks that are affected by the increase in power, which is the numberof blocks of size at least r.

We have now proved uniqueness, but existence is not yet clear. To showthis, we will reduce it to the case where there is exactly one eigenvalue. Thisreduction is easy if the matrix is diagonalizable, because we can decompose thematrix into each eigenspace and then work in the corresponding eigenspace. Ingeneral, we need to work with “generalized eigenspaces”.

Theorem (Generalized eigenspace decomposition). Let V be a finite-dimensionalvector space C such that α ∈ End(V ). Suppose that

Mα(t) =

k∏i=1

(t− λi)ci ,

with λ1, · · · , λk ∈ C distinct. Then

V = V1 ⊕ · · · ⊕ Vk,

where Vi = ker((α− λiι)ci) is the generalized eigenspace.

68


This allows us to decompose V into a block diagonal matrix, and then eachblock will only have one eigenvalue.

Note that if c1 = · · · = ck = 1, then we recover the diagonalizability theorem.Hence, it is not surprising that the proof of this is similar to the diagonalizabilitytheorem. We will again prove this by constructing projection maps to each ofthe Vi.

Proof. Let

pj(t) =∏i6=j

(t− λi)ci .

Then p1, · · · , pk have no common factors, i.e. they are coprime. Thus by Euclid’salgorithm, there exists q1, · · · , qk ∈ C[t] such that∑

piqi = 1.

We now define the endomorphism

πj = qj(α)pj(α)

for j = 1, · · · , k.Then

∑πj = ι. Since Mα(α) = 0 and Mα(t) = (t− λjι)cjpj(t), we get

(α− λjι)cjπj = 0.

So imπj ⊆ Vj .Now suppose v ∈ V . Then

v = ι(v) =

k∑j=1

πj(v) ∈∑

Vj .

SoV =

∑Vj .

To show this is a direct sum, note that πiπj = 0, since the product containsMα(α) as a factor. So

πi = ιπi =(∑

πj

)πi = π2

i .

So π is a projection, and πj |Vj= ιVj

. So if v =∑

vi, then applying πi to bothsides gives vi = πi(v). Hence there is a unique way of writing v as a sum ofthings in Vi. So V =

⊕Vj as claimed.

Note that we didn’t really use the fact that the vector space is over C, exceptto get that the minimal polynomial is a product of linear factors. In fact, forarbitrary vector spaces, if the minimal polynomial of a matrix is a product oflinear factors, then it can be put into Jordan normal form. The converse is alsotrue — if it can be put into Jordan normal form, then the minimal polynomialis a product of linear factors, since we’ve seen that a necessary and sufficientcondition for the minimal polynomial to be a product of linear factors is forthere to be a basis in which the matrix is upper triangular.

Using this theorem, by restricting α to its generalized eigenspaces, we canreduce the existence part of the Jordan normal form theorem to the case Mα(t) =(t− λ)c. Further by replacing α by α− λι, we can reduce this to the case where0 is the only eigenvalue.

69


Definition (Nilpotent). We say α ∈ End(V ) is nilpotent if there is some r suchthat αr = 0.

Over C, α is nilpotent if and only if the only eigenvalue of α is 0. This issince α is nilpotent if the minimal polynomial is tr for some r.

We’ve now reduced the problem of classifying complex endomorphisms to theclassifying nilpotent endomorphisms. This is the point where we stop. For theremainder of the proof, see IB Groups, Rings and Modules. There is in fact anelementary proof of it, and we’re not doing it not because it’s hard, but becausewe don’t have time.

Example. Let

A =

3 −2 01 0 01 0 1

We know we can find the Jordan normal form by just computing the minimalpolynomial and characteristic polynomial. But we can do better and try to finda P such that P−1AP is in Jordan normal form.

We first compute the eigenvalues of A. The characteristic polynomial is

det

t− 3 −2 01 t 01 0 t− 1

= (t− 1)((t− 3)t+ 2) = (t− 1)2(t− 2).

We now compute the eigenspaces of A. We have

A− I =

2 −2 01 −1 01 0 0

We see this has rank 2 and hence nullity 1, and the eigenspace is the kernel

EA(1) =

⟨001

⟩

We can also compute the other eigenspace. We have

A− 2I =

1 −2 01 −2 01 0 −1

This has rank 2 and

EA(2) =

⟨212

⟩ .Since

dimEA(1) + dimEA(2) = 2 < 3,

this is not diagonalizable. So the minimal polynomial must also be MA(t) =χA(t) = (t − 1)2(t − 2). From the classificaion last time, we know that A issimilar to 1 1 0

0 1 00 0 2

70


We now want to compute a basis that transforms A to this. We want a basis(v1,v2,v3) of C3 such that

Av1 = v1, Av2 = v1 + v2, Av3 = 2v3.

Equivalently, we have

(A− I)v1 = 0, (A− I)v2 = v1, (A− 2I)v3 = 0.

There is an obvious choices v3, namely the eigenvector of eigenvalue 2.To find v1 and v2, the idea is to find some v2 such that (A− I)v2 6= 0 but

(A− I)2v2 = 0. Then we can let v1 = (A− I)v1.We can compute the kernel of (A− I)2. We have

(A− I)2 =

2 −2 01 −1 02 −2 0

The kernel of this is

ker(A− I)2 =

⟨001

,

110

⟩ .We need to pick our v2 that is in this kernel but not in the kernel of A − I(which is the eigenspace E1 we have computed above). So we have

v2 =

110

, v1 =

001

, v3 =

212

.

Hence we have

P =

0 1 20 1 11 0 2

and

P−1AP =

1 1 00 1 00 0 2

.

71

7 Bilinear forms II IB Linear Algebra

7 Bilinear forms II

In Chapter 4, we have looked at bilinear forms in general. Here, we want to lookat bilinear forms on a single space, since often there is just one space we areinterested in. We are also not looking into general bilinear forms on a singlespace, but just those that are symmetric.

7.1 Symmetric bilinear forms and quadratic forms

Definition (Symmetric bilinear form). Let V is a vector space over F. A bilinearform φ : V × V → F is symmetric if

φ(v,w) = φ(w,v)

for all v,w ∈ V .

Example. If S ∈ Matn(F) is a symmetric matrix, i.e. ST = S, the bilinear formφ : Fn × Fn → F defined by

φ(x,y) = xTSx =

n∑i,j=1

xiSijyj

is a symmetric bilinear form.

This example is typical in the following sense:

Lemma. Let V be a finite-dimensional vector space over F, and φ : V × V → Fis a symmetric bilinear form. Let (e1, · · · , en) be a basis for V , and let M bethe matrix representing φ with respect to this basis, i.e. Mij = φ(ei, ej). Thenφ is symmetric if and only if M is symmetric.

Proof. If φ is symmetric, then

Mij = φ(ei, ej) = φ(ej , ei) = Mji.

So MT = M . So M is symmetric.If M is symmetric, then

φ(x,y) = φ(∑

xiei,∑

yjej

)=∑i,j

xiMijyj

=

n∑i,j

yjMjixi

= φ(y,x).

We are going to see what happens when we change basis. As in the case ofendomorphisms, we will require to change basis in the same ways on both sides.

Lemma. Let V is a finite-dimensional vector space, and φ : V × V → F abilinear form. Let (e1, · · · , en) and (f1, · · · , fn) be bases of V such that

fi =

n∑k=1

Pkiek.

72


If A represents φ with respect to (ei) and B represents φ with respect to (fi),then

B = PTAP.

Proof. Special case of general formula proven before.

This motivates the following definition:

Definition (Congruent matrices). Two square matrices A,B are congruent ifthere exists some invertible P such that

B = PTAP.

It is easy to see that congruence is an equivalence relation. Two matricesare congruent if and only if represent the same bilinear form with respect todifferent bases.

Thus, to classify (symmetric) bilinear forms is the same as classifying (sym-metric) matrices up to congruence.

Before we do the classification, we first look at quadratic forms, which aresomething derived from bilinear forms.

Definition (Quadratic form). A function q : V → F is a quadratic form if thereexists some bilinear form φ such that

q(v) = φ(v,v)

for all v ∈ V .

Note that quadratic forms are not linear maps (they are quadratic).

Example. Let V = R2 and φ be represented by A with respect to the standardbasis. Then

q

((xy

))=(x y

)(A11 A12

A21 A22

)(xy

)= A11x

2 + (A12 +A21)xy +A22y2.

Notice that if A is replaced the symmetric matrix

1

2(A+AT ),

then we get a different φ, but the same q. This is in fact true in general.

Proposition (Polarization identity). Suppose that charF 6= 2, i.e. 1 + 1 6= 0on F (e.g. if F is R or C). If q : V → F is a quadratic form, then there exists aunique symmetric bilinear form φ : V × V → F such that

q(v) = φ(v,v).

Proof. Let ψ : V × V → F be a bilinear form such that ψ(v,v) = q(v). Wedefine φ : V × V → F by

φ(v,w) =1

2(ψ(v,w) + ψ(w,v))

for all v,w ∈ F. This is clearly a bilinear form, and it is also clearly symmetricand satisfies the condition we wants. So we have proved the existence part.

73


To prove uniqueness, we want to find out the values of φ(v,w) in terms ofwhat q tells us. Suppose φ is such a symmetric bilinear form. We compute

q(v + w) = φ(v + w,v + w)

= φ(v,v) + φ(v,w) + φ(w,v) + φ(w,w)

= q(v) + 2φ(v,w) + q(w).

So we have

φ(v,w) =1

2(q(v + w)− q(v)− q(w)).

So it is determined by q, and hence unique.

Theorem. Let V be a finite-dimensional vector space over F, and φ : V ×V → Fa symmetric bilinear form. Then there exists a basis (e1, · · · , en) for V suchthat φ is represented by a diagonal matrix with respect to this basis.

This tells us classifying symmetric bilinear forms is easier than classifyingendomorphisms, since for endomorphisms, even over C, we cannot always makeit diagonal, but we can for bilinear forms over arbitrary fields.

Proof. We induct over n = dimV . The cases n = 0 and n = 1 are trivial, sinceall matrices are diagonal.

Suppose we have proven the result for all spaces of dimension less than n.First consider the case where φ(v,v) = 0 for all v ∈ V . We want to show thatwe must have φ = 0. This follows from the polarization identity, since this φinduces the zero quadratic form, and we know that there is a unique bilinearform that induces the zero quadratic form. Since we know that the zero bilinearform also induces the zero quadratic form, we must have φ = 0. Then φ willbe represented by the zero matrix with respect to any basis, which is triviallydiagonal.

If not, pick e1 ∈ V such that φ(e1, e1) 6= 0. Let

U = kerφ(e1, · ) = {u ∈ V : φ(e1,u) = 0}.

Since φ(e1, · ) ∈ V ∗ \ {0}, we know that dimU = n − 1 by the rank-nullitytheorem.

Our objective is to find other basis elements e2, · · · , en such that φ(e1, ej) = 0for all j > 1. For this to happen, we need to find them inside U .

Now consider φ|U×U : U × U → F, a symmetric bilinear form. By theinduction hypothesis, we can find a basis e2, · · · , en for U such that φ|U×U isrepresented by a diagonal matrix with respect to this basis.

Now by construction, φ(ei, ej) = 0 for all 1 ≤ i 6= j ≤ n and (e1, · · · , en) isa basis for V . So we’re done.

Example. Let q be a quadratic form on R3 given by

q

xyz

= x2 + y2 + z2 + 2xy + 4yz + 6xz.

We want to find a basis f1, f2, f3 for R3 such that q is of the form

q(af1 + bf2 + cf3) = λa2 + µb2 + νc2

74


for some λ, µ, ν ∈ R.There are two ways to do this. The first way is to follow the proof we just had.

We first find our symmetric bilinear form. It is the bilinear form represented bythe matrix

A =

1 1 31 1 23 2 1

.

We then find f1 such that φ(f1, f1) 6= 0. We note that q(e1) = 1 6= 0. So we pick

f1 = e1 =

100

.

Then

φ(e1,v) =(1 0 0

)1 1 31 1 23 2 1

v1v2v3

= v1 + v2 + 3v3.

Next we need to pick our f2. Since it is in the kernel of φ(f1, · ), it must satisfy

φ(f1, f2) = 0.

To continue our proof inductively, we also have to pick an f2 such that

φ(f2, f2) 6= 0.

For example, we can pick

f2 =

30−1

.

Then we have q(f2) = −8.Then we have

φ(f2,v) =(3 0 −1

)1 1 31 1 23 2 1

v1v2v3

= v2 + 8v3

Finally, we want φ(f1, f3) = φ(f2, f3) = 0. Then

f3 =

5−81

works. We have q(f3) = 8.

With these basis elements, we have

q(af1 + bf2 + cf3) = φ(af1 + bf2 + cf3, af1 + bf2 + cf3)

= a2q(f1) + b2q(f2) + c2q(f3)

= a2 − 8b2 + 8c2.

75


Alternatively, we can solve the problem by completing the square. We have

x2 + y2 + z2 + 2xy + 4yz + 6xz = (x+ y + 3z)2 − 2yz − 8z2

= (x+ y + 3z)2 − 8(z +

y

8

)2+

1

8y2.

We now see

φ

xyz

,

x′y′z′

= (x+ y + 3z)(x′ + y′ + 3z′)− 8(z +

y

8

)(z′ +

y′

8

)+

1

8yy′.

Why do we know this? This is clearly a symmetric bilinear form, and this alsoclearly induces the q given above. By uniqueness, we know that this is the rightsymmetric bilinear form.

We now use this form to find our f1, f2, f3 such that φ(fi, fj) = δij .To do so, we just solve the equations

x+ y + 3z = 1

z +1

8y = 0

y = 0.

This gives our first vector as

f1 =

100

.

We then solve

x+ y + 3z = 0

z +1

8y = 1

y = 0.

So we have

f2 =

−301

.

Finally, we solve

x+ y + 3z = 0

z +1

8y = 0

y = 1.

This gives

f3 =

−5/81

−1/8.

.

Then we can see that the result follows, and we get

q(af1 + bf2 + cf3) = a2 − 8b2 +1

8c2.

76


We see that the diagonal matrix we get is not unique. We can re-scale ourbasis by any constant, and get an equivalent expression.

Theorem. Let φ be a symmetric bilinear form over a complex vector space V .Then there exists a basis (v1, · · · ,vm) for V such that φ is represented by(

Ir 00 0

)with respect to this basis, where r = r(φ).

Proof. We’ve already shown that there exists a basis (e1, · · · , en) such thatφ(ei, ej) = λiδij for some λij . By reordering the ei, we can assume thatλ1, · · · , λr 6= 0 and λr+1, · · · , λn = 0.

For each 1 ≤ i ≤ r, there exists some µi such that µ2i = λi. For r+1 ≤ r ≤ n,

we let µi = 1 (or anything non-zero). We define

vi =eiµi.

Then

φ(vi,vj) =1

µiµjφ(ei, ej) =

{0 i 6= j or i = j > r

1 i = j < r.

So done.

Note that it follows that for the corresponding quadratic form q, we have

q

(n∑i=1

aivi

)=

r∑i=1

a2i .

Corollary. Every symmetric A ∈ Matn(C) is congruent to a unique matrix ofthe form (

Ir 00 0

).

Now this theorem is a bit too strong, and we are going to fix that next lecture,by talking about Hermitian forms and sesquilinear forms. Before that, we dothe equivalent result for real vector spaces.

Theorem. Let φ be a symmetric bilinear form of a finite-dimensional vectorspace over R. Then there exists a basis (v1, · · · ,vn) for V such that φ isrepresented Ip −Iq

0

,

with p+ q = r(φ), p, q ≥ 0. Equivalently, the corresponding quadratic forms isgiven by

q

(n∑i=1

aivi

)=

p∑i=1

a2i −p+q∑j=p+1

a2j .

77


Note that we have seen these things in special relativity, where the Minkowskiinner product is given by the symmetric bilinear form represented by

−1 0 0 00 1 0 00 0 1 00 0 0 1

,

in units where c = 1.

Proof. We’ve already shown that there exists a basis (e1, · · · , en) such thatφ(ei, ej) = λiδij for some λ1, · · · , λn ∈ R. By reordering, we may assume

λi > 0 1 ≤ i ≤ pλi < 0 p+ 1 ≤ i ≤ rλi = 0 i > r

We let µi be defined by

µi =

√λi 1 ≤ i ≤ p√−λi p+ 1 ≤ i ≤ r

1 i > r

Defining

vi =1

µiei,

we find that φ is indeed represented byIp −Iq0

,

We will later show that this form is indeed unique. Before that, we will havea few definitions, that really only make sense over R.

Definition (Positive/negative (semi-)definite). Let φ be a symmetric bilinearform on a finite-dimensional real vector space V . We say

(i) φ is positive definite if φ(v,v) > 0 for all v ∈ V \ {0}.

(ii) φ is positive semi-definite if φ(v,v) ≥ 0 for all v ∈ V .

(iii) φ is negative definite if φ(v,v) < 0 for all v ∈ V \ {0}.

(iv) φ is negative semi-definite if φ(v,v) ≤ 0 for all v ∈ V .

We are going to use these notions to prove uniqueness. It is easy to see thatif p = 0 and q = n, then we are negative definite; if p = 0 and q 6= n, then weare negative semi-definite etc.

78


Example. Let φ be a symmetric bilinear form on Rn represented by(Ip 00 0n−p

).

Then φ is positive semi-definite. φ is positive definite if and only if n = p.If instead φ is represented by(

−Ip 00 0n−p

),

then φ is negative semi-definite. φ is negative definite precisely if n = q.

We are going to use this to prove the uniqueness part of our previous theorem.

Theorem (Sylvester’s law of inertia). Let φ be a symmetric bilinear form on afinite-dimensional real vector space V . Then there exists unique non-negativeintegers p, q such that φ is represented byIp 0 0

0 −Iq 00 0 0

with respect to some basis.

Proof. We have already proved the existence part, and we just have to proveuniqueness. To do so, we characterize p and q in a basis-independent way. Wealready know that p+ q = r(φ) does not depend on the basis. So it suffices toshow p is unique.

To see that p is unique, we show that p is the largest dimension of a subspaceP ⊆ V such that φ|P×P is positive definite.

First we show we can find such at P . Suppose φ is represented byIp 0 00 −Iq 00 0 0

with respect to (e1, · · · , en). Then φ restricted to 〈e1, · · · , ep〉 is represented byIp with respect to e1, · · · , ep. So φ restricted to this is positive definite.

Now suppose P is any subspace of V such that φ|P×P is positive definite. Toshow P has dimension at most p, we find a subspace complementary to P withdimension n− p.

Let Q = 〈ep+1, · · · , en〉. Then φ restricted to Q×Q is represented by(−Iq 0

0 0

).

Now if v ∈ P ∩Q \ {0}, then φ(v,v) > 0 since v ∈ P \ {0} and φ(v,v) ≤ 0 sincev ∈ Q, which is a contradiction. So P ∩Q = 0.

We have

dimV ≥ dim(P +Q) = dimP + dimQ = dimP + (n− p).

Rearranging givesdimP ≤ p.

A similar argument shows that q is the maximal dimension of a subspace Q ⊆ Vsuch that φ|Q×Q is negative definite.

79


Definition (Signature). The signature of a bilinear form φ is the number p− q,where p and q are as above.

Of course, we can recover p and q from the signature and the rank of φ.

Corollary. Every real symmetric matrix is congruent to precisely one matrixof the form Ip 0 0

0 −Iq 00 0 0

.

7.2 Hermitian form

The above result was nice for real vector spaces. However, if φ is a bilinear formon a C-vector space V , then φ(iv, iv) = −φ(v,v). So there can be no goodnotion of positive definiteness for complex bilinear forms. To make them workfor complex vector spaces, we need to modify the definition slightly to obtainHermitian forms.

Definition (Sesquilinear form). Let V,W be complex vector spaces. Then asesquilinear form is a function φ : V ×W → C such that

(i) φ(λv1 + µv2,w) = λφ(v1,w) + µφ(v2,w).

(ii) φ(v, λw1 + µw2) = λφ(v,w1) + µφ(vw2).

for all v,v1,v2 ∈ V , w,w1,w2 ∈W and λ, µ ∈ C.

Note that some people have an opposite definition, where we have linearityin the first argument and conjugate linearity in the second.

These are called sesquilinear since “sesqui” means “one and a half”, and thisis linear in the second argument and “half linear” in the first.

Alternatively, to define a sesquilinear form, we can define a new complexvector space V structure on V by taking the same abelian group (i.e. the sameunderlying set and addition), but with the scalar multiplication C × V → Vdefined as

(λ,v) 7→ λv.

Then a sesquilinear form on V ×W is a bilinear form on V ×W . Alternatively,this is a linear map W → V ∗.

Definition (Representation of sesquilinear form). Let V,W be finite-dimensionalcomplex vector spaces with basis (v1, · · · ,vn) and (w1, · · · ,wm) respectively,and φ : V ×W → C be a sesquilinear form. Then the matrix representing φwith respect to these bases is

Aij = φ(vi,wj).

for 1 ≤ i ≤ n, 1 ≤ j ≤ m.

As usual, this determines the whole sesquilinear form. This follows fromthe analogous fact for the bilinear form on V ×W → C. Let v =

∑λivi and

W =∑µjwj . Then we have

φ(v,w) =∑i,j

λiµjφ(vi,wj) = λ†Aµ.

80


We now want the right definition of symmetric sesquilinear form. We cannotjust require φ(v,w) = φ(w,v), since φ is linear in the second variable andconjugate linear on the first variable. So in particular, if φ(v,w) 6= 0, we haveφ(iv,w) 6= φ(v, iw).

Definition (Hermitian sesquilinear form). A sesquilinear form on V × V isHermitian if

φ(v,w) = φ(w,v).

Note that if φ is Hermitian, then φ(v,v) = φ(v,v) ∈ R for any v ∈ V . Soit makes sense to ask if it is positive or negative. Moreover, for any complexnumber λ, we have

φ(λv, λv) = |λ|2φ(v,v).

So multiplying by a scalar does not change the sign. So it makes sense to talkabout positive (semi-)definite and negative (semi-)definite Hermitian forms.

We will prove results analogous to what we had for real symmetric bilinearforms.

Lemma. Let φ : V × V → C be a sesquilinear form on a finite-dimensionalvector space over C, and (e1, · · · , en) a basis for V . Then φ is Hermitian if andonly if the matrix A representing φ is Hermitian (i.e. A = A†).

Proof. If φ is Hermitian, then

Aij = φ(ei, ej) = φ(ej , ei) = A†ij .

If A is Hermitian, then

φ(∑

λiei,∑

µjej

)= λ†Aµ = µ†A†λ = φ

(∑µjej ,

∑λjej

).

So done.

Proposition (Change of basis). Let φ be a Hermitian form on a finite di-mensional vector space V ; (e1, · · · , en) and (v1, · · · ,vn) are bases for V suchthat

vi =

n∑k=1

Pkiek;

and A,B represent φ with respect to (e1, . . . , en) and (v1, · · · ,vn) respectively.Then

B = P †AP.

Proof. We have

Bij = φ(vi,vj)

= φ(∑

Pkiek,∑

P`je`

)=

n∑k,`=1

PkiP`jAk`

= (P †AP )ij .

81


Lemma (Polarization identity (again)). A Hermitian form φ on V is determinedby the function ψ : v 7→ φ(v,v).

The proof this time is slightly more involved.

Proof. We have the following:

ψ(x + y) = φ(x,x) + φ(x,y) + φ(y,x) + φ(y,y)

−ψ(x− y) = −φ(x,x) + φ(x,y) + φ(y,x)− φ(y,y)

iψ(x− iy) = iφ(x,x) + φ(x,y)− φ(y,x) + iφ(y,y)

−iψ(x + iy) = −iφ(x,x) + φ(x,y)− φ(y,x)− iφ(y,y)

So

φ(x,y) =1

4(ψ(x + y)− ψ(x− y) + iψ(x− iy)− iψ(x + iy)).

Theorem (Hermitian form of Sylvester’s law of inertia). Let V be a finite-dimensional complex vector space and φ a hermitian form on V . Then thereexists unique non-negative integers p and q such that φ is represented byIp 0 0

0 −Iq 00 0 0

with respect to some basis.

Proof. Same as for symmetric forms over R.

82

8 Inner product spaces IB Linear Algebra

8 Inner product spaces

Welcome to the last chapter, where we discuss inner products. Technically, aninner product is just a special case of a positive-definite symmetric bilinear orhermitian form. However, they are usually much more interesting and useful.Many familiar notions such as orthogonality only make sense when we have aninner product.

In this chapter, we will adopt the convention that F always means either Ror C, since working with other fields doesn’t make much sense here.

8.1 Definitions and basic properties

Definition (Inner product space). Let V be a vector space. An inner producton V is a positive-definite symmetric bilinear/hermitian form. We usually write(x, y) instead of φ(x, y).

A vector space equipped with an inner product is an inner product space.

We will see that if we have an inner product, then we can define lengths anddistances in a sensible way.

Example.

(i) Rn or Cn with the usual inner product

(x, y) =

n∑i=1

xiyi

forms an inner product space.

In some sense, this is the only inner product on finite-dimensional spaces,by Sylvester’s law of inertia. However, we would not like to think so, andinstead work with general inner products.

(ii) Let C([0, 1],F) be the vector space of real/complex valued continuousfunctions on [0, 1]. Then the following is an inner product:

(f, g) =

∫ 1

0

f(t)g(t) dt.

(iii) More generally, for any w : [0, 1]→ R+ continuous, we can define the innerproduct on C([0, 1],F) as

(f, g) =

∫ 1

0

w(t)f(t)g(t) dt.

If V is an inner product space, we can define a norm on V by

‖v‖ =√

(v,v).

This is just the usual notion of norm on Rn and Cn. This gives the notion oflength in inner product spaces. Note that ‖v‖ > 0 with equality if and only ifv = 0.

83


Note also that the norm ‖ · ‖ determines the inner product by the polarizationidentity.

We want to see that this indeed satisfies the definition of a norm, as you mighthave seen from Analysis II. To prove this, we need to prove the Cauchy-Schwarzinequality.

Theorem (Cauchy-Schwarz inequality). Let V be an inner product space andv,w ∈ V . Then

|(v,w)| ≤ ‖v‖‖w‖.

Proof. If w = 0, then this is trivial. Otherwise, since the norm is positivedefinite, for any λ, we get

0 ≤ (v − λw,v − λw) = (v,v)− λ(w,v)− λ(v,w) + |λ|2(w,w).

We now pick a clever value of λ. We let

λ =(w,v)

(w,w).

Then we get

0 ≤ (v,v)− |(w,v)|2

(w,w)− |(w,v)|2

(w,w)+|(w,v)|2

(w,w).

So we get|(w,v)|2 ≤ (v,v)(w,w).

So done.

With this, we can prove the triangle inequality.

Corollary (Triangle inequality). Let V be an inner product space and v,w ∈ V .Then

‖v + w‖ ≤ ‖v‖+ ‖w‖.

Proof. We compute

‖v + w‖2 = (v + w,v + w)

= (v,v) + (v,w) + (w,v) + (w,w)

≤ ‖v‖2 + 2‖v‖‖w‖+ ‖w‖2

= (‖v‖+ ‖w‖)2.

So done.

The next thing we do is to define orthogonality. This generalizes the notionof being “perpendicular”.

Definition (Orthogonal vectors). Let V be an inner product space. Thenv,w ∈ V are orthogonal if (v,w) = 0.

Definition (Orthonormal set). Let V be an inner product space. A set {vi :i ∈ I} is an orthonormal set if for any i, j ∈ I, we have

(vi,vj) = δij

84


It should be clear that an orthonormal set must be linearly independent.

Definition (Orthonormal basis). Let V be an inner product space. A subset ofV is an orthonormal basis if it is an orthonormal set and is a basis.

In an inner product space, we almost always want orthonormal basis only. Ifwe pick a basis, we should pick an orthonormal one.

However, we do not know there is always an orthonormal basis, even in thefinite-dimensional case. Also, given an orthonormal set, we would like to extendit to an orthonormal basis. This is what we will do later.

Before that, we first note that given an orthonormal basis, it is easy to findthe coordinates of any vector in this basis. Suppose V is a finite-dimensionalinner product space with an orthonormal basis v1, · · · ,vn. Given

v =

n∑i=1

λivi,

we have

(vj ,v) =

n∑i=1

λi(vj ,vi) = λj .

So v ∈ V can always be written as

n∑i=1

(vi,v)vi.

Lemma (Parseval’s identity). Let V be a finite-dimensional inner product spacewith an orthonormal basis v1, · · · ,vn, and v,w ∈ V . Then

(v,w) =

n∑i=1

(vi,v)(vi,w).

In particular,

‖v‖2 =

n∑i=1

|(vi,v)|2.

This is something we’ve seen in IB Methods, for infinite dimensional spaces.However, we will only care about finite-dimensional ones now.

Proof.

(v,w) =

n∑i=1

(vi,v)vi,

n∑j=1

(vj ,w)vj

=

n∑i,j=1

(vi,v)(vj ,w)(vi,vj)

=

n∑i,j=1

(vi,v)(vj ,w)δij

=

n∑i=1

(vi,v)(vi,w).

85


8.2 Gram-Schmidt orthogonalization

As mentioned, we want to make sure every vector space has an orthonormalbasis, and we can extend any orthonormal set to an orthonormal basis, at leastin the case of finite-dimensional vector spaces. The idea is to start with anarbitrary basis, which we know exists, and produce an orthonormal basis out ofit. The way to do this is the Gram-Schmidt process.

Theorem (Gram-Schmidt process). Let V be an inner product space ande1, e2, · · · a linearly independent set. Then we can construct an orthonormal setv1,v2, · · · with the property that

〈v1, · · · ,vk〉 = 〈e1, · · · , ek〉

for every k.

Note that we are not requiring the set to be finite. We are just requiring itto be countable.

Proof. We construct it iteratively, and prove this by induction on k. The basecase k = 0 is contentless.

Suppose we have already found v1, · · · ,vk that satisfies the properties. Wedefine

uk+1 = ek+1 −k∑i=1

(vi, ei+1)vi.

We want to prove that this is orthogonal to all the other vi’s for i ≤ k. We have

(vj ,uk+1) = (vj , ek+1)−k∑i=1

(vi, ek+1)δij = (vj , ek+1)− (vj , ek+1) = 0.

So it is orthogonal.We want to argue that uk+1 is non-zero. Note that

〈v1, · · · ,vk,uk+1〉 = 〈v1, · · · ,vk, ek+1〉

since we can recover ek+1 from v1, · · · ,vk and uk+1 by construction. We alsoknow

〈v1, · · · ,vk, ek+1〉 = 〈e1, · · · , ek, ek+1〉

by assumption. We know 〈e1, · · · , ek, ek+1〉 has dimension k+ 1 since the ei arelinearly independent. So we must have uk+1 non-zero, or else 〈v1, · · · ,vk〉 willbe a set of size k spanning a space of dimension k + 1, which is clearly nonsense.

Therefore, we can define

vk+1 =uk+1

‖uk+1‖.

Then v1, · · · ,vk+1 is orthonormal and 〈v1, · · · ,vk+1〉 = 〈e1, · · · , ek+1〉 as re-quired.

Corollary. If V is a finite-dimensional inner product space, then any orthonor-mal set can be extended to an orthonormal basis.

86


Proof. Let v1, · · · ,vk be an orthonormal set. Since this is linearly independent,we can extend it to a basis (v1, · · · ,vk,xk+1, · · · ,xn).

We now apply the Gram-Schmidt process to this basis to get an orthonormalbasis of V , say (u1, · · · ,un). Moreover, we can check that the process does notmodify our v1, · · · ,vk, i.e. ui = vi for 1 ≤ i ≤ k. So done.

Definition (Orthogonal internal direct sum). Let V be an inner product spaceand V1, V2 ≤ V . Then V is the orthogonal internal direct sum of V1 and V2 if itis a direct sum and V1 and V2 are orthogonal. More precisely, we require

(i) V = V1 + V2

(ii) V1 ∩ V2 = 0

(iii) (v1,v2) = 0 for all v1 ∈ V1 and v2 ∈ V2.

Note that condition (iii) implies (ii), but we write it for the sake of explicitness.We write V = V1 ⊥ V2.

Definition (Orthogonal complement). If W ≤ V is a subspace of an innerproduct space V , then the orthogonal complement of W in V is the subspace

W⊥ = {v ∈ V : (v,w) = 0,∀w ∈W}.

It is true that the orthogonal complement is a complement and orthogonal,i.e. V is the orthogonal direct sum of W and W⊥.

Proposition. Let V be a finite-dimensional inner product space, and W ≤ V .Then

V = W ⊥W⊥.

Proof. There are three things to prove, and we know (iii) implies (ii). Also, (iii)is obvious by definition of W⊥. So it remains to prove (i), i.e. V = W +W⊥.

Let w1, · · · ,wk be an orthonormal basis for W , and pick v ∈ V . Now let

w =

k∑i=1

(wi,v)wi.

Clearly, we have w ∈W . So we need to show v −w ∈W⊥. For each j, we cancompute

(wj ,v −w) = (wj ,v)−k∑i=1

(wi,v)(wj ,wi)

= (wj ,v)−k∑i=1

(wi,v)δij

= 0.

Hence for any λi, we have (∑λjwj ,v −w

)= 0.

So we have v −w ∈W⊥. So done.

87


Definition (Orthogonal external direct sum). Let V1, V2 be inner product spaces.The orthogonal external direct sum of V1 and V2 is the vector space V1⊕V2 withthe inner product defined by

(v1 + v2,w1 + w2) = (v1,w1) + (v2,w2),

with v1,w1 ∈ V1, v2,w2 ∈ V2.Here we write v1 + v2 ∈ V1 ⊕ V2 instead of (v1,v2) to avoid confusion.

This external direct sum is equivalent to the internal direct sum of {(v1,0) :v1 ∈ V1} and {(0,v2) : v2 ∈ V2}.

Proposition. Let V be a finite-dimensional inner product space and W ≤ V .Let (e1, · · · , ek) be an orthonormal basis of W . Let π be the orthonormalprojection of V onto W , i.e. π : V →W is a function that satisfies kerπ = W⊥,π|W = id. Then

(i) π is given by the formula

π(v) =

k∑i=1

(ei,v)ei.

(ii) For all v ∈ V,w ∈W , we have

‖v − π(v)‖ ≤ ‖v −w‖,

with equality if and only if π(v) = w. This says π(v) is the point on Wthat is closest to v.

W⊥

w

v

π(v)

Proof.

(i) Let v ∈ V , and define

w =∑i=1

(ei,v)ei.

We want to show this is π(v). We need to show v −w ∈ W⊥. We cancompute

(ej ,v −w) = (ej ,v)−k∑i=1

(ei,v)(ej , ei) = 0.

So v −w is orthogonal to every basis vector in w, i.e. v −w ∈W⊥.So

π(v) = π(w) + π(v −w) = w

as required.

88


(ii) This is just Pythagoras’ theorem. Note that if x and y are orthogonal,then

‖x + y‖2 = (x + y,x + y)

= (x,x) + (x,y) + (y,x) + (y.y)

= ‖x‖2 + ‖y‖2.

We apply this to our projection. For any w ∈W , we have

‖v −w‖2 = ‖v − π(v)‖2 + ‖π(v)−w‖2 ≥ ‖v − π(v)‖2

with equality if and only if ‖π(v)−w‖ = 0, i.e. π(v) = w.

8.3 Adjoints, orthogonal and unitary maps

Adjoints

Lemma. Let V and W be finite-dimensional inner product spaces and α : V →W is a linear map. Then there exists a unique linear map α∗ : W → V such that

(αv,w) = (v, α∗w) (∗)

for all v ∈ V , w ∈W .

Proof. There are two parts. We have to prove existence and uniqueness. We’llfirst prove it concretely using matrices, and then provide a conceptual reason ofwhat this means.

Let (v1, · · · ,vn) and (w1, · · · ,wm) be orthonormal basis for V and W .Suppose α is represented by A.

To show uniqueness, suppose α∗ : W → V satisfies (αv,w) = (v, α∗w) forall v ∈ V , w ∈W , then for all i, j, by definition, we know

(vi, α∗(wj)) = (α(vi),wj)

=

(∑k

Akiwk,wj

)=∑k

Aki(wk,wj) = Aji.

So we get

α∗(wj) =∑i

(vi, α∗(wj))vi =

∑i

Ajivi.

Hence α∗ must be represented by A†. So α∗ is unique.To show existence, all we have to do is to show A† indeed works. Now let α∗

be represented by A†. We can compute the two sides of (∗) for arbitrary v,w.We have (

α(∑

λivi

),∑

µjwj

)=∑i,j

λiµj(α(vi),wj)

=∑i,j

λiµj

(∑k

Akiwk,wj

)=∑i,j

λiAjiµj .

89


We can compute the other side and get

(∑λivi, α

∗(∑

µjwj

))=∑i,j

λiµj

(vi,∑k

A†kjvk

)=∑i,j

λiAjiµj .

So done.

What does this mean, conceptually? Note that the inner product V definesan isomorphism V → V ∗ by v 7→ ( · ,v). Similarly, we have an isomorphismW → W ∗. We can then put them in the following diagram:

V W

V ∗ W ∗

α

∼= ∼=

α∗

Then α∗ is what fills in the dashed arrow. So α∗ is in some sense the “dual” ofthe map α.

Definition (Adjoint). We call the map α∗ the adjoint of α.

We have just seen that if α is represented by A with respect to some or-thonormal bases, then α∗ is represented by A†.

Definition (Self-adjoint). Let V be an inner product space, and α ∈ End(V ).Then α is self-adjoint if α = α∗, i.e.

(α(v),w) = (v, α(w))

for all v,w.

Thus if V = Rn with the usual inner product, then A ∈ Matn(R) is self-adjoint if and only if it is symmetric, i.e. A = AT . If V = Cn with the usualinner product, then A ∈ Matn(C) is self-adjoint if and only if A is Hermitian,i.e. A = A†.

Self-adjoint endomorphisms are important, as you may have noticed from IBQuantum Mechanics. We will later see that these have real eigenvalues with anorthonormal basis of eigenvectors.

Orthogonal maps

Another important class of endomorphisms is those that preserve lengths. Wewill first do this for real vector spaces, since the real and complex versions havedifferent names.

Definition (Orthogonal endomorphism). Let V be a real inner product space.Then α ∈ End(V ) is orthogonal if

(α(v), α(w)) = (v,w)

for all v,w ∈ V .

90


By the polarization identity, α is orthogonal if and only if ‖α(v)‖ = ‖v‖ forall v ∈ V .

A real square matrix (as an endomorphism of Rn with the usual inner product)is orthogonal if and only if its columns are an orthonormal set.

There is also an alternative way of characterizing these orthogonal maps.

Lemma. Let V be a finite-dimensional space and α ∈ End(V ). Then α isorthogonal if and only if α−1 = α∗.

Proof. (⇐) Suppose α−1 = α∗. If α−1 = α∗, then

(αv, αv) = (v, α∗αv) = (v, α−1αv) = (v,v).

(⇒) If α is orthogonal and (v1, · · · ,vn) is an orthonormal basis for V , then for1 ≤ i, j ≤ n, we have

δij = (vi,vj) = (αvi, αvj) = (vi, α∗αvj).

So we know

α∗α(vj) =

n∑i=1

(vi, α∗αvj))vi = vj .

So by linearity of α∗α, we know α∗α = idV . So α∗ = α−1.

Corollary. α ∈ End(V ) is orthogonal if and only if α is represented by anorthogonal matrix, i.e. a matrix A such that ATA = AAT = I, with respect toany orthonormal basis.

Proof. Let (e1, · · · , en) be an orthonormal basis for V . Then suppose α isrepresented by A. So α∗ is represented by AT . Then A∗ = A−1 if and only ifAAT = ATA = I.

Definition (Orthogonal group). Let V be a real inner product space. Then theorthogonal group of V is

O(V ) = {α ∈ End(V ) : α is orthogonal}.

It follows from the fact that α∗ = α−1 that α is invertible, and it is clearfrom definition that O(V ) is closed under multiplication and inverses. So this isindeed a group.

Proposition. Let V be a finite-dimensional real inner product space and(e1, · · · , en) is an orthonormal basis of V . Then there is a bijection

O(V )→ {orthonormal basis for V }α 7→ (α(e1, · · · , en)).

This is analogous to our result for general vector spaces and general bases,where we replace O(V ) with GL(V ).

Proof. Same as the case for general vector spaces and general bases.

91


Unitary maps

We are going to study the complex version of orthogonal maps, known as unitarymaps. The proofs are almost always identical to the real case, and we will notwrite the proofs again.

Definition (Unitary map). Let V be a finite-dimensional complex vector space.Then α ∈ End(V ) is unitary if

(α(v), α(w)) = (v,w)

for all v,w ∈ V .

By the polarization identity, α is unitary if and only if ‖α(v)‖ = ‖v‖ for allv ∈ V .

Lemma. Let V be a finite dimensional complex inner product space and α ∈End(V ). Then α is unitary if and only if α is invertible and α∗ = α−1.

Corollary. α ∈ End(V ) is unitary if and only if α is represented by a unitarymatrix A with respect to any orthonormal basis, i.e. A−1 = A†.

Definition (Unitary group). Let V be a finite-dimensional complex inner prod-uct space. Then the unitary group of V is

U(V ) = {α ∈ End(V ) : α is unitary}.

Proposition. Let V be a finite-dimensional complex inner product space. Thenthere is a bijection

U(V )→ {orthonormal basis of V }α 7→ {α(e1), · · · , α(en)}.

8.4 Spectral theory

We are going to classify matrices in inner product spaces. Recall that for generalvector spaces, what we effectively did was to find the orbits of the conjugationaction of GL(V ) on Matn(F). If we have inner product spaces, we will want tolook at the action of O(V ) or U(V ) on Matn(F). In a more human language,instead of allowing arbitrary basis transformations, we only allow transformingbetween orthonormal basis.

We are not going to classify all endomorphisms, but just self-adjoint andorthogonal/unitary ones.

Lemma. Let V be a finite-dimensional inner product space, and α ∈ End(V )self-adjoint. Then

(i) α has a real eigenvalue, and all eigenvalues of α are real.

(ii) Eigenvectors of α with distinct eigenvalues are orthogonal.

Proof. We are going to do real and complex cases separately.

92


(i) Suppose first V is a complex inner product space. Then by the fundamentaltheorem of algebra, α has an eigenvalue, say λ. We pick v ∈ V \ {0} suchthat αv = λv. Then

λ(v,v) = (λv,v) = (αv,v) = (v, αv) = (v, λv) = λ(v,v).

Since v 6= 0, we know (v,v) 6= 0. So λ = λ.

For the real case, we pretend we are in the complex case. Let e1, · · · , en bean orthonormal basis for V . Then α is represented by a symmetric matrixA (with respect to this basis). Since real symmetric matrices are Hermitianviewed as complex matrices, this gives a self-adjoint endomorphism of Cn.By the complex case, A has real eigenvalues only. But the eigenvalues ofA are the eigenvalues of α and MA(t) = Mα(t). So done.

Alternatively, we can prove this without reducing to the complex case. Weknow every irreducible factor of Mα(t) in R[t] must have degree 1 or 2,since the roots are either real or come in complex conjugate pairs. Supposef(t) were an irreducible factor of degree 2. Then(

mα

f

)(α) 6= 0

since it has degree less than the minimal polynomial. So there is somev ∈ V such that (

Mα

f

)(α)(v) 6= 0.

So it must be that f(α)(v) = 0. Let U = 〈v, α(v)〉. Then this is anα-invariant subspace of V since f has degree 2.

Now α|U ∈ End(U) is self-adjoint. So if (e1, e2) is an orthonormal basis ofU , then α is represented by a real symmetric matrix, say(

a bb a

)But then χα|U (t) = (t− a)2 − b2, which has real roots, namely a± b. Thisis a contradiction, since Mα|U = f , but f is irreducible.

(ii) Now suppose αv = λv, αw = µw and λ 6= µ. We need to show (v,w) = 0.We know

(αv,w) = (v, αw)

by definition. This then gives

λ(v,w) = µ(v,w)

Since λ 6= µ, we must have (v,w) = 0.

Theorem. Let V be a finite-dimensional inner product space, and α ∈ End(V )self-adjoint. Then V has an orthonormal basis of eigenvectors of α.

Proof. By the previous lemma, α has a real eigenvalue, say λ. Then we can findan eigenvector v ∈ V \ {0} such that αv = λv.

93


Let U = 〈v〉⊥. Then we can write

V = 〈v〉 ⊥ U.

We now want to prove α sends U into U . Suppose u ∈ U . Then

(v, α(u)) = (αv,u) = λ(v,u) = 0.

So α(u) ∈ 〈v〉⊥ = U . So α|U ∈ End(U) and is self-adjoint.By induction on dimV , U has an orthonormal basis (v2, · · · ,vn) of α eigen-

vectors. Now letv1 =

v

‖v‖.

Then (v1,v2, · · · ,vn) is an orthonormal basis of eigenvectors for α.

Corollary. Let V be a finite-dimensional vector space and α self-adjoint. ThenV is the orthogonal (internal) direct sum of its α-eigenspaces.

Corollary. Let A ∈ Matn(R) be symmetric. Then there exists an orthogonalmatrix P such that PTAP = P−1AP is diagonal.

Proof. Let ( · , · ) be the standard inner product on Rn. Then A is self-adjointas an endomorphism of Rn. So Rn has an orthonormal basis of eigenvectors forA, say (v1, · · · ,vn). Taking P = (v1 v2 · · · vn) gives the result.

Corollary. Let V be a finite-dimensional real inner product space and ψ :V × V → R a symmetric bilinear form. Then there exists an orthonormal basis(v1, · · · ,vn) for V with respect to which ψ is represented by a diagonal matrix.

Proof. Let (u1, · · · ,un) be any orthonormal basis for V . Then ψ is representedby a symmetric matrix A. Then there exists an orthogonal matrix P such thatPTAP is diagonal. Now let vi =

∑Pkiuk. Then (v1, · · · ,vn) is an orthonormal

basis since

(vi,vj) =(∑

Pkiuk,∑

P`ju`

)=∑

PTik(uk,u`)P`j

= [PTP ]ij

= δij .

Also, ψ is represented by PTAP with respect to (v1, · · · ,vn).

Note that the diagonal values of PTAP are just the eigenvalues of A. So thesignature of ψ is just the number of positive eigenvalues of A minus the numberof negative eigenvalues of A.

Corollary. Let V be a finite-dimensional real vector space and φ, ψ symmetricbilinear forms on V such that φ is positive-definite. Then we can find a basis(v1, · · · ,vn) for V such that both φ and ψ are represented by diagonal matriceswith respect to this basis.

Proof. We use φ to define an inner product. Choose an orthonormal basis for V(equipped with φ) (v1, · · · ,vn) with respect to which ψ is diagonal. Then φ isrepresented by I with respect to this basis, since ψ(vi,vj) = δij . So done.

94


Corollary. If A,B ∈ Matn(R) are symmetric and A is positive definitive (i.e.vTAv > 0 for all v ∈ Rn \ {0}). Then there exists an invertible matrix Q suchthat QTAQ and QTBQ are both diagonal.

We can deduce similar results for complex finite-dimensional vector spaces,with the same proofs. In particular,

Proposition.

(i) If A ∈ Matn(C) is Hermitian, then there exists a unitary matrix U ∈Matn(C) such that

U−1AU = U†AU

is diagonal.

(ii) If ψ is a Hermitian form on a finite-dimensional complex inner productspace V , then there is an orthonormal basis for V diagonalizing ψ.

(iii) If φ, ψ are Hermitian forms on a finite-dimensional complex vector spaceand φ is positive definite, then there exists a basis for which φ and ψ arediagonalized.

(iv) Let A,B ∈ Matn(C) be Hermitian, and A positive definitive (i.e. v†Av > 0for v ∈ V \ {0}). Then there exists some invertible Q such that Q†AQ andQ†BQ are diagonal.

That’s all for self-adjoint matrices. How about unitary matrices?

Theorem. Let V be a finite-dimensional complex vector space and α ∈ U(V )be unitary. Then V has an orthonormal basis of α eigenvectors.

Proof. By the fundamental theorem of algebra, there exists v ∈ V \ {0} andλ ∈ C such that αv = λv. Now consider W = 〈v〉⊥. Then

V = W ⊥ 〈v〉.

We want to show α restricts to a (unitary) endomorphism of W . Let w ∈ W .We need to show α(w) is orthogonal to v. We have

(αw,v) = (w, α−1v) = (w, λ−1v) = 0.

So α(w) ∈ W and α|W ∈ End(W ). Also, α|W is unitary since α is. So byinduction on dimV , W has an orthonormal basis of α eigenvectors. If we addv/‖v‖ to this basis, we get an orthonormal basis of V itself comprised of αeigenvectors.

This theorem and the analogous one for self-adjoint endomorphisms have acommon generalization, at least for complex inner product spaces. The key factthat leads to the existence of an orthonormal basis of eigenvectors is that α and α∗

commute. This is clearly a necessary condition, since if α is diagonalizable, thenα∗ is diagonal in the same basis (since it is just the transpose (and conjugate)),and hence they commute. It turns out this is also a sufficient condition, as youwill show in example sheet 4.

95


However, we cannot generalize this in the real orthogonal case. For example,(cos θ sin θ− sin θ cos θ

)∈ O(R2)

cannot be diagonalized (if θ 6∈ πZ). However, in example sheet 4, you will find aclassification of O(V ), and you will see that the above counterexample is theworst that can happen in some sense.

96

Documents

Part IB - Linear Algebra - Maths Lecture Notes · 0 Introduction IB Linear Algebra 0 Introduction In IA Vectors and Matrices, we have learnt about vectors (and matrices) in a rather