



MATH2201 Lecture Notes

Andrei Yafaev (based on notes by Richard Hill, John Talbot and Minhyong Kim)

December 8, 2009

If you find a mistake then please email me ([email protected]).

Contents

1 Number Theory
1.1 Euclid's algorithm
1.2 Factorization into primes
1.3 Congruences

2 Polynomial Rings
2.1 Irreducible elements in Rings
2.2 Euclid's algorithm in k[X]

3 Jordan Canonical Form
3.1 Revision of linear algebra
3.2 Matrix representation of linear maps
3.3 Minimal polynomials
3.4 Generalized Eigenspaces
3.5 Jordan Bases in the one eigenvalue case
3.6 Jordan Canonical (or Normal) Form in the one eigenvalue case
3.7 Jordan canonical form in general

4 Bilinear and Quadratic Forms
4.1 Matrix Representation
4.2 Symmetric bilinear forms and quadratic forms
4.3 Orthogonality and diagonalization
4.4 Examples of Diagonalising
4.5 Canonical forms over C
4.6 Canonical forms over R

5 Inner Product Spaces
5.1 Geometry of Inner Product Spaces
5.2 Gram–Schmidt Orthogonalization
5.3 Adjoints
5.4 Isometries
5.5 Orthogonal Diagonalization


Lecture 1

Sketch of the course:

• Number theory (prime numbers, factorization, congruences);

• Polynomials (factorization);

• Jordan canonical form (generalization of diagonalizing);

• Quadratic and bilinear forms;

• Euclidean and Hermitian spaces.

Prerequisites (you should know all this from first year algebra courses)

• Fields and vector spaces (bases, linear independence, span, subspaces).

• Linear maps (rank, nullity, kernel and image, matrix representation).

• Matrix algebra (row reduction, determinants, eigenvalues and eigenvectors, diagonaliza-tion).

1 Number Theory

For this chapter and the next (Polynomials), I recommend the book 'A Concrete Introduction to Higher Algebra' by Lindsay Childs.

Number theory is the theory of Z = {0,±1,±2, . . .}. Recall also the notation N = {1, 2, 3, 4, . . .}.

1.1 Euclid’s algorithm

• We say that a ∈ Z divides b ∈ Z iff there exists c ∈ Z such that b = ac. We write a|b.

• A common divisor of a and b is an integer d that divides both a and b.

• The greatest common divisor or highest common factor of a and b is a common factor d of a and b such that any other common factor is smaller than d. This is written d = gcd(a, b) = hcf(a, b) = (a, b).

Note that every a ∈ Z is a factor of 0, since 0 = 0 × a. Therefore every number is a common factor of 0 and 0, so there is no such thing as hcf(0, 0). However if a, b ∈ Z are not both zero, then they have a highest common factor. In particular if a > 0 then hcf(a, 0) = a. Euclid's algorithm is a method for calculating highest common factors.

Note that if a divides b, then also −a divides b, −a divides −b, and a divides −b. To remove this ambiguity, we will usually work with positive integers.

The following obvious remark is often used in the proofs: any positive divisor of a positive integer a is at most a.


1.1.1 Euclidean division Let a ≥ b > 0 be two integers. There exists a UNIQUE pair of integers (q, r) satisfying

a = qb + r

and 0 ≤ r < b.

Proof. Two things need to be proved: the existence of (q, r) and its uniqueness.
Let us prove the existence. Consider the set

S = {x : x is an integer ≥ 0 and a − xb ≥ 0}.

The set S is not empty: 1 belongs to S (since a ≥ b). The set S is bounded: any element x of S satisfies x ≤ a/b. Therefore, S being a bounded set of non-negative integers, S is finite and hence contains a maximal element. Let q be this maximal element and let r := a − qb.
We need to prove that 0 ≤ r < b. By definition r ≥ 0. To prove that r < b, let us argue by contradiction. Suppose that r ≥ b. Then, replacing r by a − qb, we get

a − (q + 1)b ≥ 0.

This means that q + 1 ∈ S but q + 1 > q. This contradicts the maximality of q. Therefore r < b and the existence is proved.

Let us now prove the uniqueness. Again we argue by contradiction. Suppose that there exists a pair (q′, r′) satisfying a = q′b + r′ with 0 ≤ r′ < b and such that q′ ≠ q. Subtracting the inequality 0 ≤ r < b from 0 ≤ r′ < b, we get −b < r′ − r < b, i.e.

|r − r′| < b.

Now, subtracting a = q′b + r′ from a = qb + r and taking the modulus, we get

|r − r′| = |q − q′|b.

By assumption q ≠ q′, hence |q − q′| ≥ 1 and we get the inequality

|r − r′| ≥ b.

The two inequalities satisfied by r − r′ contradict each other, hence q = q′. And now the equality |r − r′| = |q − q′|b gives r = r′. The uniqueness is proved. □

We now prove the following property which lies at the heart of Euclid’s algorithm.

1.1.2 Let a ≥ b > 0 be two integers and (q, r) such that

a = bq + r, 0 ≤ r < b.

Then
hcf(a, b) = hcf(b, r).

Proof. Let A := hcf(a, b) and B := hcf(b, r). As r = a − bq and A divides a and b, A divides r. Therefore A is a common factor of b and r. As B is the highest common factor of b and r, A ≤ B.


In exactly the same way, one proves (left to the reader) that B ≤ A, and therefore A = B. □

This proposition leads to the following algorithm (Euclid's algorithm). Let a ≥ b ≥ 0 be two integers, not both zero. We wish to calculate hcf(a, b).
The method is this: set r_1 = a and r_2 = b. If r_{i−1} ≠ 0 then define, for i ≥ 3,

r_{i−2} = q_i r_{i−1} + r_i, 0 ≤ r_i < r_{i−1}.

The fact that r_i < r_{i−1} (strict inequality!) implies that there will be an integer n such that r_n = 0.

Then by the theorem we have

hcf(a, b) = hcf(r_1, r_2) = hcf(r_2, r_3) = . . . = hcf(r_{n−1}, 0) = r_{n−1}.

1.1.3 Remark When performing Euclid's algorithm, at each step divide r_{i−2} by the remainder r_{i−1}, not by the quotient q_i. Dividing by the quotient instead of the remainder is a mistake that is very easy to make.

1.1.4 Example Take a = 27 and b = 7. We have

27 = 3 × 7 + 6

7 = 1 × 6 + 1

6 = 6 × 1 + 0.

Therefore

hcf(27, 7) = hcf(7, 6) = hcf(6, 1) = hcf(1, 0) = 1.

Euclid's algorithm is very easy to implement on a computer. Suppose that you have some standard computer language (Basic, Pascal, Fortran, ...) with an instruction r := a mod b which returns the remainder of the Euclidean division of a by b.

The implementation of the algorithm would be something like this:

Procedure hcf(a, b)
  If a < b then Swap(a, b)
  While b ≠ 0
  Begin
    r := a mod b
    a := b
    b := r
  End
  Return a
End
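For instance, the pseudocode above can be transcribed directly into Python, where the `%` operator plays the role of `a mod b` (this transcription is my illustration, not part of the original notes):

```python
def hcf(a, b):
    """Highest common factor by Euclid's algorithm (mirrors the pseudocode above)."""
    if a < b:
        a, b = b, a          # Swap(a, b)
    while b != 0:
        r = a % b            # remainder of the Euclidean division of a by b
        a, b = b, r
    return a

print(hcf(27, 7))    # 1, as in the example above
print(hcf(666, 153)) # 9
```

Note that `hcf(a, 0)` correctly returns `a`, matching the remark that hcf(a, 0) = a for a > 0.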

The following lemma is very important for what follows. It is essentially Euclid's algorithm run backwards.


1.1.5 Bezout's Lemma As usual, let a ≥ b ≥ 0 be integers, not both zero. Let d = hcf(a, b). Then there are integers h, k ∈ Z such that

d = ha + kb.

Note that in this lemma the integers h and k are not both positive; in fact exactly one of them is negative or zero. Prove it!

Proof. Consider the sequence given by Euclid’s algorithm:

a = r_1, b = r_2, r_3, r_4, . . . , r_{n−1} = d.

In fact we'll show that each of the integers r_i may be expressed in the form ha + kb. We prove this by induction on i: it is certainly the case for i = 1, 2, since r_1 = 1 × a + 0 × b and r_2 = 0 × a + 1 × b. For the inductive step, assume it is the case for r_{i−1} and r_{i−2}, i.e.

r_{i−1} = ha + kb, r_{i−2} = h′a + k′b.

We have

r_{i−2} = q_i r_{i−1} + r_i.

Therefore

r_i = h′a + k′b − q_i(ha + kb) = (h′ − q_i h)a + (k′ − q_i k)b.

□
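The inductive computation in this proof can be carried out by a short program that tracks the coefficients of a and b alongside the remainders; here is a Python sketch (the name `bezout` is mine, not from the notes):

```python
def bezout(a, b):
    """Return (d, h, k) with d = hcf(a, b) and d = h*a + k*b."""
    h, k = 1, 0       # coefficients expressing r1 = a in the form h*a + k*b
    h2, k2 = 0, 1     # coefficients expressing r2 = b
    while b != 0:
        q, r = divmod(a, b)          # one step of Euclid's algorithm
        a, b = b, r
        h, h2 = h2, h - q * h2       # update coefficients as in the inductive step
        k, k2 = k2, k - q * k2
    return a, h, k

d, h, k = bezout(27, 7)
print(d, h, k)  # 1 -1 4, matching 1 = (-1)*27 + 4*7 in Example 1.1.6
```

The invariant maintained by the loop is exactly the statement proved by induction above: at every step the current remainder equals h·a + k·b.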

1.1.6 Example Again we take a = 27 and b = 7.

27 = 3 × 7 + 6

7 = 1 × 6 + 1

6 = 6 × 1 + 0.

Therefore

1 = 7 − 1 × 6

= 7 − 1 × (27 − 3 × 7)

= 4 × 7 − 1 × 27.

So we take h = −1 and k = 4.

We now apply Euclid's algorithm and Bezout's lemma to the solution of linear diophantine equations.

Let a, b, c be three positive integers. A linear diophantine equation (in two variables) is theequation

ax + by = c

A solution is a pair (x, y) of integers (not necessarily positive) satisfying this relation.
Such an equation may or may not have solutions. For example, consider 2x + 4y = 5. Quite clearly, if there were a solution, then 2 would divide the left-hand side 2x + 4y, and hence 2 would divide 5. This is not the case; therefore this equation has no solution.


On the other hand, the equation 2x + 4y = 6 has many solutions: (1, 1), (5, −1), .... This suggests that the existence of solutions depends on whether or not c is divisible by hcf(a, b), and that if such is the case, there are many solutions to the equation. This indeed is the case, as shown in the following theorem.

Before we prove the theorem, let us prove a couple of preliminary lemmas.

1.1.7 Lemma Let a and b be two positive integers. Let d := hcf(a, b). Then hcf(a/d, b/d) = 1.

Proof. Use Bezout's lemma: there exist h, k such that ah + kb = d. Divide by d to get

(a/d)h + (b/d)k = 1.

Any common divisor of a/d and b/d divides 1, hence it is 1. □

Two integers a and b such that hcf(a, b) = 1 are called coprime or relatively prime.

1.1.8 Lemma Suppose a divides bc and hcf(a, b) = 1, then a divides c.

Proof. Write bc = ga. As usual, 1 = ha + kb. Multiply by c to get

c = h(ac) + k(bc) = h(ac) + k(ga) = a(hc + kg).

Hence a divides c. 2

1.1.9 Solution to linear diophantine equations Let a, b, c be three positive integers, let d := hcf(a, b) and consider the equation

ax + by = c.

1. This equation has a solution if and only if d divides c.

2. Suppose that d|c and let (x_0, y_0) be a solution. The set of all solutions is (x_0 + n(b/d), y_0 − n(a/d)), where n runs through the set of all integers (positive and negative).

Proof. For the 'only if' part: suppose there is a solution (x, y). Then d divides ax + by. But ax + by = c, so d divides c.

For the 'if' part: suppose that d divides c and write c = dm for some integer m. By Bezout's lemma there exist integers h, k such that

d = ha + kb.

Multiplying this relation by m gives

c = dm = (mh)a + (mk)b.

This shows that (x_0 = mh, y_0 = mk) is a solution to the equation. That finishes the 'if' part.


Let us now suppose that the equation has a solution (x_0, y_0) (in particular d divides c). Let (x, y) be any other solution. Subtracting ax + by = c from ax_0 + by_0 = c, we get

a(x_0 − x) + b(y_0 − y) = 0.

Divide by d to get

(a/d)(x_0 − x) = −(b/d)(y_0 − y).

This relation shows that a/d divides (b/d)(y_0 − y); but the integers a/d and b/d are coprime (Lemma 1.1.7), hence a/d divides y_0 − y (Lemma 1.1.8). Therefore, there exists an integer n such that

y = y_0 − n(a/d).

Now plug this into the equality (a/d)(x_0 − x) = −(b/d)(y_0 − y) to get

x = x_0 + n(b/d).

□

The proof of this theorem gives a procedure for finding solutions. It is as follows:

1. Calculate d = hcf(a, b). If d does not divide c, then there are no solutions and you're done. If d divides c, say c = md, then there are solutions.

2. Run Euclid's algorithm backwards to get h, k such that d = ha + kb. Then (x_0 = mh, y_0 = mk) is a solution.

3. All solutions are

(x_0 + n(b/d), y_0 − n(a/d)),

where n runs through all integers.
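The three steps of this procedure can be sketched in Python. The function names are mine; `bezout` is a hypothetical helper implementing Euclid's algorithm together with its backwards run:

```python
def bezout(a, b):
    # extended Euclid: returns (d, h, k) with d = hcf(a, b) = h*a + k*b
    h, k, h2, k2 = 1, 0, 0, 1
    while b != 0:
        q, r = divmod(a, b)
        a, b, h, h2, k, k2 = b, r, h2, h - q * h2, k2, k - q * k2
    return a, h, k

def diophantine(a, b, c):
    """Solve a*x + b*y = c over the integers.

    Returns None if there is no solution, otherwise a particular
    solution (x0, y0); all solutions are (x0 + n*(b//d), y0 - n*(a//d)).
    """
    d, h, k = bezout(a, b)
    if c % d != 0:
        return None          # step 1: no solutions unless d divides c
    m = c // d
    return m * h, m * k      # step 2: particular solution (mh, mk)

print(diophantine(27, 7, 5))      # (-5, 20), as in Example 1.1.10
print(diophantine(666, 153, 43))  # None: hcf = 9 does not divide 43
```

Step 3 (the full solution set) is returned implicitly via d: the general solution is obtained by adding multiples of b//d and subtracting multiples of a//d.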

1.1.10 Example Take a = 27, b = 7, c = 5. We have found that hcf(a, b) = 1 (in particular there will be solutions for any c) and that 1 = 4 × 7 − 1 × 27, hence h = −1 and k = 4.

Our procedure gives a particular solution (−5, 20), and the general one is (−5 + 7n, 20 − 27n).

Take a = 666, b = 153, c = 43. We have found that hcf(a, b) = 9; it does not divide 43, hence there are no solutions.

Take c = 45 = 5 × 9. There will be solutions. We had 9 = 3 × 666 − 13 × 153. A particular solution is (15, −65) and the general one is (15 + 17n, −65 − 74n).



Lecture 2

1.2 Factorization into primes

1.2.11 Definition An integer p ∈ Z, not equal to ±1, is prime iff the only divisors of p are ±1 and ±p.

As usual, we will work with positive integers, in which case the definition becomes: p is prime if and only if its only positive divisors are 1 and p.

1.2.12 Euclid’s Theorem If p is a prime and p|ab then p|a or p|b.

Proof. Suppose that p ∤ a. Then, since the only positive divisors of p are 1 and p, we have hcf(a, p) = 1. By Lemma 1.1.8, p divides b. □

1.2.13 Example Prove that if a and b divide n and a and b are coprime, then ab divides n.

1.2.14 Corollary If p|a1 · · · an then there exists 1 ≤ i ≤ n such that p|ai.

Proof. This is true for n = 1, 2, so suppose it is true for n − 1 and suppose that p|a_1 · · · a_n. Let A = a_1 · · · a_{n−1} and B = a_n; then p|AB implies p|A or p|B. In the latter case we are done, and in the former case the inductive hypothesis implies that p|a_i for some 1 ≤ i ≤ n − 1. □

1.2.15 Unique Factorisation Theorem If a ≥ 2 is an integer then there are primes pi > 0such that

a = p1p2 · · · ps.

Moreover this factorisation is unique in the sense that if

a = q1q2 · · · qt

for primes q_j > 0, then

s = t

and

{p_1, . . . , p_s} = {q_1, . . . , q_t}

(equality of sets). In other words, the p_i's and the q_j's are the same prime numbers up to reordering.

Proof. For existence, suppose the result does not hold. Then there is an integer which cannot be written as a product of primes. Among all such integers there is a smallest one (the integers under consideration are at least two!). Let a be this smallest integer which is not a product of primes. Certainly a is not prime, so a = bc with 1 < b, c < a. As b and c are strictly smaller than a, they are products of primes. Write

b = p_1 · · · p_k


and

c = p_{k+1} · · · p_l,

hence

a = p_1 · · · p_l,

a contradiction; hence the factorisation exists.
For uniqueness, suppose that we have an example where there are two distinct factorisations. Again we can choose a smallest integer with two different factorisations

a = p_1 · · · p_s = q_1 · · · q_t.

Then p_1|q_1 · · · q_t, so by Corollary 1.2.14 we have p_1|q_j for some 1 ≤ j ≤ t; then since p_1 and q_j are primes we have p_1 = q_j. But then dividing a by p_1 we obtain a smaller integer with two distinct factorisations, a contradiction. □

1.2.16 Remark Of course, the primes in the factorisation a = p_1 · · · p_s need not be distinct.

For example: 4 = 2^2, here p_1 = p_2 = 2. Similarly 8 = 2^3, p_1 = p_2 = p_3 = 2. Also 12 = 3 × 2^2, p_1 = 3, p_2 = p_3 = 2.

In fact, for any integer a ≥ 2 there exist s distinct primes p_1, . . . , p_s and s integers e_i ≥ 1 such that

a = p_1^{e_1} · · · p_s^{e_s}.

1.2.17 Example

1000 = 2^3 × 5^3

144 = 2^4 × 3^2

975 = 3 × 5^2 × 13

Factoring a given integer is hard, as there is no procedure like the Euclidean algorithm. One usually does it by trial and error. The following trivial lemma helps.

1.2.18 Lemma (Square root test) Let n be a composite (not prime) integer. Then n has a prime divisor p ≤ √n.

Proof. Write n = ab with a, b > 1. If a ≤ √n, then any prime divisor of a is ≤ √n. Otherwise a > √n, so n = ab > √n · b, hence b < √n, and therefore any prime divisor of b is < √n. □

For example, suppose you were to factor 3372. Clearly it's divisible by 2: 3372 = 2 × 1686. Now, 1686 is again divisible by two: 1686 = 2 × 843 and 3372 = 2^2 × 843. Now we notice that 3 divides 843: 843 = 3 × 281. The primes ≤ √281 are 2, 3, 5, 7, 11, 13, and 281 is not divisible by any of these. Hence 281 is prime and we get a factorisation:

3372 = 2^2 · 3 · 281
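The trial-and-error factoring described above, combined with the square root test, can be sketched in Python (this snippet is my illustration, not part of the notes):

```python
def factorize(n):
    """Return the prime factorization of n >= 2 as a list of primes (with repeats)."""
    factors = []
    p = 2
    while p * p <= n:        # square root test: only trial-divide up to sqrt(n)
        while n % p == 0:    # divide out every copy of p
            factors.append(p)
            n //= p
        p += 1
    if n > 1:                # whatever remains has no divisor <= its square root,
        factors.append(n)    # so it is prime
    return factors

print(factorize(3372))  # [2, 2, 3, 281], i.e. 3372 = 2^2 * 3 * 281
```

The square root test is what allows the loop to stop at p² ≤ n: if nothing up to √n divides the remaining n, that n must be prime.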

How many primes are there? Here is the answer.


1.2.19 Euclid’s Theorem There exist infinitely many primes.

Proof. Suppose not, and let p_1, . . . , p_n be all the primes there are. Consider Q = p_1 p_2 · · · p_n + 1. Since Q has a prime factorisation, there is a prime P that divides Q. But P cannot be in our list of primes, since Q leaves a remainder of 1 when divided by any p_i. □

The idea we used here is this: supposing the set of all primes is finite, we construct an integer that is not divisible by any of the primes from this set. This is a contradiction.

Can we use the same idea to prove that there are infinitely many primes of a certain form?
Quite clearly Euclid's theorem shows that there are infinitely many odd primes, since the only even prime is 2. Put another way, it shows that there are infinitely many primes of the form 2k + 1.

Let's look at primes of the form 4k + 3. Are there infinitely many of them?
Suppose there are finitely many and list them p_1, . . . , p_r. Note that p_1 = 3. Consider

Q = 4p_2 · · · p_r + 3

(note that we started at p_2!).
The integer Q is clearly not divisible by 3, because 3 divides the second term but not 4p_2 · · · p_r (as p_i ≠ 3 for all i > 1).
None of the p_i with i ≥ 2 divides Q. Indeed, suppose some p_i with i ≥ 2 divides Q. Then

4p_2 · · · p_r + 3 = p_i k,

which shows that p_i divides 3, which is not the case.
To get a contradiction, we need to prove that Q is divisible by a prime of the form 4k + 3, for it will then necessarily be one of the p_i's. This is precisely what we now prove.

1.2.20 Lemma Every integer of the form 4k + 3 has a prime factor of the form 4k + 3.

Proof. Let N = 4k + 3. If N is prime, then take N itself as the factor. Let us proceed by induction: suppose that the result is true for all integers strictly less than N. We can and do assume that N is composite. As N is odd, it factors as a product of two odd numbers. Any odd number is of the form 4k + 1 or 4k + 3. We have the following possibilities:

1. N = (4a + 1)(4b + 1) = 4(4ab + a + b) + 1.

2. N = (4a + 1)(4b + 3) = 4(4ab + 3a + b) + 3.

3. N = (4a + 3)(4b + 3) = 4(4ab + 3a + 3b + 2) + 1.

Notice that only case 2 can occur: cases 1 and 3 are not of the form 4k + 3. In case 2 the factor 4b + 3 is strictly smaller than N, so by induction it has a prime factor of the form 4k + 3, which is also a prime factor of N. □

Note that the proof does not work if you try to prove that there are infinitely many primes of the form 4k + 1. Here is where it fails. The first prime of this form is 5 = 4 × 1 + 1, but when you try to construct your Q, you get Q = 4 × 5 + 1 = 21 = 3 × 7. The prime divisors of Q are of the form 4k + 3, not 4k + 1.

In other words, the method fails because a number of the form 4k + 1 need not have any prime divisor of the form 4k + 1.

It is however true that there are infinitely many primes of the form 4k + 1; in fact, there is the following spectacular theorem:


1.2.21 Dirichlet’s theorem on primes in arithmetic progressions Let a and d be twocoprime integers. There exist infinitely many primes of the form a + kd.

The proof of this theorem is well beyond the scope of this course.
To conclude, here are some questions to think about:

1. Is every even integer greater than 2 a sum of two prime numbers?

For example : 8 = 3 + 5, 80 = 37 + 43, 800 = 379 + 421,...

2. Are there infinitely many primes p such that p + 2 is prime ?

Ex. (3, 5), (17, 19), (881, 883), (1997, 1999),...


Lecture 3

1.3 Congruences

We define a ≡ b mod m iff m|(a − b). We say a is congruent to b modulo m.
The congruency class of a is the set of numbers congruent to a modulo m. This is written [a]. Every integer is congruent to one of the numbers 0, 1, . . . , m − 1, so the set of all congruency classes is {[0], . . . , [m − 1]}. This is written Z/mZ.

Ex. Take m = 3; then [8] = [5] = [2] = [−1] = [−4] = . . .
For an integer k, 4k + 1 ≡ 1 mod 4, 4k + 3 ≡ 3 mod 4 and 4k ≡ 0 mod 4.
An integer is even if and only if it is zero mod 2. An integer is odd if and only if it is one mod 2.
Let a ≥ b be two positive integers and let (q, r) be such that a = bq + r with 0 ≤ r < b. Then a ≡ r mod b. It may help to think of congruences as remainders of the Euclidean division.

1.3.22 Proposition If a ≡ b mod m then b ≡ a mod m.If a ≡ a′ mod m and b ≡ b′ mod m then a + b ≡ a′ + b′ mod m and ab ≡ a′b′ mod m.

Proof. Easy exercise. □

We can rewrite this proposition by simply saying:

[a] + [b] = [a + b] and [a][b] = [ab]

The proposition says that these operations + and × are well-defined operations on Z/mZ.
Ex. Write down addition and multiplication tables in Z/3Z, Z/4Z and Z/6Z.
By an inverse of a modulo m we mean a number c such that ac ≡ 1 mod m. This is written c ≡ a^{−1} mod m.
An element may or may not have an inverse mod m. Take m = 6. [5] has an inverse in Z/6Z:

[5] × [5] = [25] = [1]

While [3] does not have an inverse: in Z/6Z we have [3][2] = [6] = [0]. So if [3] had an inverse, say [a], we would have [3][a] = [1], and by multiplying by [2] we would get [0] = [2], which is not the case.

This suggests that the existence of the inverse of a mod m has something to do with common factors of a and m. This is indeed the case, as shown in the following lemma.

1.3.23 Lemma An integer a has an inverse modulo m if and only if a and m are coprime(hcf(a, m) = 1).

Proof. The integer a has an inverse mod m if and only if the equation

ax + my = 1

has a solution. This equation has a solution if and only if hcf(a, m) divides 1, which happens exactly when hcf(a, m) = 1. □


As usual, the proof of the lemma gives a procedure for finding inverses. Use the Euclidean algorithm to calculate hcf(a, m). If it is not one, there is no inverse. If it is one, run the algorithm backwards to find h and k such that ah + mk = 1; then

[a]^{−1} = [h]
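This inverse-finding procedure can be sketched in Python. The names are mine; `bezout` performs Euclid's algorithm and its backwards substitution in one pass:

```python
def bezout(a, b):
    # extended Euclid: returns (d, h, k) with d = hcf(a, b) = h*a + k*b
    h, k, h2, k2 = 1, 0, 0, 1
    while b != 0:
        q, r = divmod(a, b)
        a, b, h, h2, k, k2 = b, r, h2, h - q * h2, k2, k - q * k2
    return a, h, k

def inverse_mod(a, m):
    """Return the inverse of [a] in Z/mZ, or None if hcf(a, m) != 1."""
    d, h, _ = bezout(a, m)
    if d != 1:
        return None      # no inverse when a and m are not coprime
    return h % m         # a*h + m*k = 1, so [a][h] = [1]

print(inverse_mod(43, 7))   # 1, as in Example 1.3.24
print(inverse_mod(3, 6))    # None: hcf(3, 6) = 3
```

The final `h % m` simply replaces h by its representative in {0, 1, . . . , m − 1}.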

1.3.24 Example Find 43^{−1} mod 7. Euclid's algorithm:

43 = 6 × 7 + 1

They are coprime and 1 = 1 × 43 − 6 × 7. Hence 43^{−1} ≡ 1 mod 7.
Same with 32^{−1} mod 7:

32 = 4 × 7 + 4

7 = 1 × 4 + 3

4 = 1 × 3 + 1

And

1 = 4 − 3 = 2 × 4 − 7 = 2 × 32 − 9 × 7

Hence 32^{−1} ≡ 2 mod 7.
Same with 49^{−1} mod 15:

49 = 3 × 15 + 4

15 = 3 × 4 + 3

4 = 1 × 3 + 1

And we get

1 = 4 − 3 = 4 × 4 − 15 = 4 × 49 − 13 × 15

Hence 49^{−1} ≡ 4 mod 15.

More generally, suppose we want to solve an equation

ax ≡ c mod b.

This is equivalent to the existence of an integer y such that

ax + by = c.

And we know how to solve this! This equation has a solution if and only if hcf(a, b) divides c, and we know how to find all the solutions.


1.3.25 Example Solve 6x ≡ 4 mod 10. Here hcf(6, 10) = 2, which divides 4, so there are solutions. Running Euclid's algorithm backwards gives 2 = 2 × 6 − 1 × 10; multiplying by 2 gives 4 = 4 × 6 − 2 × 10, so x = 4 is a solution. All solutions are x = 4 + 5n, i.e. x ≡ 4 or 9 mod 10.

1.3.26 Corollary Z/pZ = F_p is a field. (Recall that F_p = {0, 1, . . . , p − 1} with addition and multiplication defined modulo p.)

Proof. This was proved last year. The only axiom which is not trivial to check is the one which states that every non-zero element has an inverse; this follows from the previous lemma, since p is prime. □

1.3.27 Corollary F_p^× = {1, 2, . . . , p − 1} is a group under multiplication.

Proof. A group is a set with a binary operation (in this case multiplication) such that (i) the operation is associative; (ii) there is an identity element; (iii) every element has an inverse. Clearly [1] is the identity element, and the Lemma says that every element has an inverse. □

1.3.28 Fermat’s Little Theorem If p is prime and a ∈ Z then

a^p ≡ a mod p.

Hence if p ∤ a then a^{p−1} ≡ 1 mod p.

Proof. If p|a then a ≡ 0 mod p and a^p ≡ 0 mod p, so suppose p ∤ a, and so a ∈ F_p^×. Recall that by a corollary to Lagrange's Theorem, the order of an element of a group divides the order of the group. Let n be the order of a, so a^n ≡ 1 mod p. By the corollary to Lagrange's theorem, n | p − 1, say p − 1 = nl. Then a^{p−1} = (a^n)^l ≡ 1 mod p, and multiplying by a gives a^p ≡ a mod p. □
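Fermat's little theorem is easy to check numerically with Python's built-in three-argument `pow`, which computes modular powers efficiently (this snippet is my illustration, not part of the notes):

```python
# Fermat's little theorem: if p is prime and p does not divide a,
# then a^(p-1) is congruent to 1 mod p.  pow(a, e, m) computes a^e mod m.
for p in [23, 103, 13]:
    print(pow(3, p - 1, p))  # 1 in each case, since these p are prime and p does not divide 3

# The same tool checks computations like those in the examples that follow:
print(pow(45, 35, 13))  # 11, i.e. -2 mod 13
print(pow(43, 42, 13))  # 1
```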

Example What is 3^{22} mod 23? 23 is prime, so 3^{22} ≡ 1 mod 23.
How about 3^{101} mod 103? Well, 103 is prime, so 3^{102} ≡ 1 mod 103, and hence 3^{101} ≡ 3^{−1} mod 103. To find 3^{−1} mod 103, use Euclid's algorithm:

103 = 3 × 34 + 1.

So 1 = 103 − 3 × 34, i.e. 3 × (−34) ≡ 1 mod 103, hence 3^{−1} ≡ −34 ≡ 69 mod 103 and therefore 3^{101} ≡ 69 mod 103.
Another example: 32^5 mod 7. By Fermat, 32^6 ≡ 1 mod 7, so 32^5 ≡ 32^{−1} mod 7. It suffices to calculate 32^{−1} mod 7. We get

32 = 4 × 7 + 4

7 = 1 × 4 + 3

4 = 1 × 3 + 1

and

1 = 4 − 3 = 2 × 4 − 7 = 2 × 32 − 9 × 7

Hence 32^{−1} ≡ 2 mod 7, and so 32^5 ≡ 2 mod 7.
Yet another example: 45^{35} mod 13.


We have 45^{12} ≡ 1 mod 13 (13 is prime), and 35 = 2 × 12 + 11, hence 45^{35} = (45^{12})^2 × 45^{11} ≡ 45^{11} mod 13. As 45^{12} ≡ 1 mod 13, we have 45^{11} ≡ 45^{−1} mod 13.
We need to calculate 45^{−1} mod 13. Euclidean algorithm:

45 = 3 × 13 + 6

13 = 2 × 6 + 1

and

1 = 13 − 2 × 6 = 7 × 13 − 2 × 45

Hence 45^{−1} ≡ −2 mod 13 and 45^{35} ≡ −2 ≡ 11 mod 13.
Let's do 43^{42} mod 13. We have 43^{36} = (43^{12})^3 ≡ 1 mod 13, hence 43^{42} ≡ 43^6 mod 13. Now 43 ≡ 4 mod 13, hence 43^{42} ≡ 4^6 mod 13. Now 4^2 = 16 ≡ 3 mod 13, hence 4^6 = (4^2)^3 ≡ 3^3 = 27 ≡ 1 mod 13. Hence 43^{42} ≡ 1 mod 13.
And now we get to yet another application of Bezout's lemma.

1.3.29 Chinese Remainder Theorem Suppose m and n are coprime, and let x and y be two integers. Then there is a unique [z] ∈ Z/mnZ such that z ≡ x mod m and z ≡ y mod n.

Proof. (existence) By Bezout’s Lemma, we can find h, k ∈ Z such that

hn + km = 1.

Given x, y we choose z by

z = hnx + kmy.

Clearly z ≡ hnx ≡ x mod m (since hn ≡ 1 mod m), and similarly z ≡ y mod n.
(Uniqueness) Suppose z′ is another solution. Then z ≡ z′ mod n and z ≡ z′ mod m. Hence there exist integers r, s such that

z − z′ = nr = ms.

Since hn + km = 1 we have

z − z′ = (z − z′)hn + (z − z′)km = mshn + nrkm = nm(sh + rk).

Hence z ≡ z′ mod nm. □

As usual, the proof gives you a procedure to find z: find h and k as in Bezout's lemma (run the Euclidean algorithm backwards); then z = hnx + kmy.
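A Python sketch of this procedure (`bezout` is a hypothetical extended-Euclid helper returning d, h, k with d = h·n + k·m; the names are mine):

```python
def bezout(a, b):
    # extended Euclid: returns (d, h, k) with d = hcf(a, b) = h*a + k*b
    h, k, h2, k2 = 1, 0, 0, 1
    while b != 0:
        q, r = divmod(a, b)
        a, b, h, h2, k, k2 = b, r, h2, h - q * h2, k2, k - q * k2
    return a, h, k

def crt(x, m, y, n):
    """Return the unique z mod m*n with z = x (mod m) and z = y (mod n).

    Requires hcf(m, n) = 1, as in the Chinese Remainder Theorem."""
    d, h, k = bezout(n, m)                 # h*n + k*m = 1
    assert d == 1, "m and n must be coprime"
    return (h * n * x + k * m * y) % (m * n)

print(crt(3, 7, 9, 11))  # 31, as in Example 1.3.30
```

The final reduction mod m·n picks out the representative of [z] in {0, 1, . . . , mn − 1}.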

1.3.30 Example Find the unique solution of x ≡ 3 mod 7 and x ≡ 9 mod 11 satisfying 0 ≤ x ≤ 76.

Solution. Find h, k such that 7h + 11k = 1, using Euclid:

11 = 7 + 4
7 = 4 + 3
4 = 3 + 1

So 1 = 4 − 3 = 4 − (7 − 4) = 2 × 4 − 7 = 2 × (11 − 7) − 7 = 2 × 11 − 3 × 7.
Hence h = −3 and k = 2, so take x = (2 × 11) × 3 + (−3 × 7) × 9 = 66 − 189 = −123 ≡ 31 mod 77.


Lecture 4

2 Polynomial Rings

2.1 Irreducible elements in Rings

2.1.31 Definition A ring is a triple (R, +, ·), where R is a set and +, · are binary operations, such that (R, +) is an Abelian group, (R, ·) is a monoid, and multiplication is distributive over addition. In detail:

• ∀a, b, c ∈ R (a + b) + c = a + (b + c),

• ∃0 ∈ R ∀a ∈ R a + 0 = a = 0 + a,

• ∀a ∈ R∃ − a ∈ R a + (−a) = 0 = (−a) + a,

• ∀a, b ∈ R a + b = b + a,

• ∀a, b, c ∈ R (ab)c = a(bc),

• ∃1 ∈ R ∀a ∈ R 1 · a = a = a · 1,

• ∀a, b, c ∈ R a(b + c) = ab + ac,

• ∀a, b, c ∈ R (b + c)a = ba + ca.

2.1.32 Example There are lots of examples of rings:

• Z is a ring;

• Z/n is a ring;

• Q and R and C are rings;

• More generally every field is a ring. Conversely, if R is a ring in which 0 ≠ 1, xy is always the same as yx, and every non-zero element has a multiplicative inverse, then R is a field.

• The set Mn(R) of real n × n matrices is a ring;

• More generally, given any ring R, the set Mn(R) is a ring.

• The set R[x] of all polynomials in x with coefficients in R is a ring. Note that a polynomial is an expression of the form

a_0 + a_1 x + . . . + a_n x^n, a_0, . . . , a_n ∈ R.

• More generally, for any ring R, the set R[x] of polynomials with coefficients in R is a ring. Addition and multiplication are defined as one expects: if f(X) = ∑ an X^n and g(X) = ∑ bn X^n then we define

(f + g)(X) = ∑ (an + bn) X^n,


(fg)(X) = ∑ cn X^n,

where

cn = ∑_{i=0}^{n} ai b_{n−i}.
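These operations can be sketched with coefficient lists (a minimal illustration, not from the notes; poly[i] holds the coefficient of X^i):

```python
def poly_add(f, g):
    # (f + g)_n = a_n + b_n, padding the shorter list with zeros
    n = max(len(f), len(g))
    f = f + [0] * (n - len(f))
    g = g + [0] * (n - len(g))
    return [a + b for a, b in zip(f, g)]

def poly_mul(f, g):
    # c_n = sum over i of a_i * b_{n-i}: the convolution formula above
    c = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            c[i + j] += a * b
    return c
```

For instance poly_mul([1, 1], [1, -1]) is [1, 0, -1], i.e. (1 + X)(1 − X) = 1 − X^2.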

We'll actually study polynomial rings k[X] over a field k. If f = ∑ an X^n is a non-zero polynomial in k[X], then the degree of f is the largest n such that an ≠ 0. We also define deg(0) = −∞. The point of this definition is that we always have:

deg(f × g) = deg(f) + deg(g)

(we are using the convention that −∞ + n = −∞). If f = ∑ an X^n ≠ 0 has degree d, then the coefficient ad is called the leading coefficient of f. If f has leading coefficient 1 then f is called monic.

2.1.33 Example f(X) = X3 + X + 2 has degree 3, and is monic.

2.1.34 Definition Let R be any ring. There are three kinds of element of R:

• An element a ∈ R is a unit if there exists a−1 ∈ R such that aa−1 = a−1a = 1. The set ofunits of R is denoted by R×.

• An element a ∈ R is reducible if it factorizes as a = bc with neither b nor c a unit.

• If a is neither a unit nor reducible then a is called irreducible.

2.1.35 Example If R = Z then Z× = {−1, 1}. The irreducible elements are ±p with p prime.

2.1.36 Example If k is a field then k× = k \ {0}. The element 0 is reducible since 0 = 0 × 0.

2.1.37 Proposition The units in k[X] are precisely the polynomials of degree 0, i.e. the non-zero constant polynomials.

Proof. Clearly if a is a non-zero constant polynomial then it is a unit in k[X]. Conversely, suppose ab = 1. Then we have deg(a) + deg(b) = 0. Hence deg(a) = deg(b) = 0. 2

The question of which polynomials are irreducible is much harder, and depends on the field. For example X^2 − 2 factorizes in R[X] as (X + √2)(X − √2), but is irreducible in Q[X] (since √2 is irrational). The only general statement about irreducible polynomials is the following:


2.1.38 Proposition If deg(f) = 1 then f is irreducible.

Proof. Suppose f = gh. Then deg(g) + deg(h) = 1. Therefore the degrees of g and h are 0 and 1 (in some order), so one of them is a unit. 2

Note that the converse to the above is false, as we have already seen with X^2 − 2 in Q[X]. Note also that even in R[X], the polynomial X^2 + 1 is irreducible, although it factorizes in C[X] as (X + i)(X − i). One might ask whether there are similar phenomena for C and bigger fields, but in fact we have:

2.1.39 Fundamental Theorem of Algebra Let f ∈ C[X] be a non-zero polynomial. Then f factorizes as a product of linear factors:

f(X) = c(X − λ1) · · · (X − λd),

where c is the leading coefficient of f .

Proof. This is proved in a complex analysis course. Here is a sketch of the proof. Let f be a non-constant polynomial and suppose f has no roots. Define

g(z) = 1/f(z).

As f has no root, g is a holomorphic function. The function g is bounded because |f(z)| → ∞ as |z| → ∞ (this is because we assumed that f is non-constant). A bounded holomorphic function C → C is constant (Liouville's theorem), hence f is constant, which is a contradiction. Hence f has a root. 2

In the notation of this course, the theorem means that in C[X] the irreducible polynomials are exactly the polynomials of degree 1, with no exceptions. In R[X] the description of the irreducible polynomials is a little more complicated.

In Q[X] things are much more complicated and it can take some time to determine whethera polynomial is irreducible or not.


Lecture 5

2.2 Euclid’s algorithm in k[X]

The rings Z and k[X] are very similar. This is because in both rings we are able to divide with remainder in such a way that the remainder is smaller than the element we divided by. In Z, if we divide a by b > 0 we find:

a = qb + r, 0 ≤ r < b.

In k[X], we have something identical:

2.2.40 Division Algorithm Given a, b ∈ k[X] with b ≠ 0, there exist unique q, r ∈ k[X] such that

a = qb + r and deg(r) < deg(b).
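Division with remainder in Q[X] can be sketched as follows (an illustrative coefficient-list implementation, not part of the notes; index i holds the coefficient of X^i):

```python
from fractions import Fraction

def poly_divmod(a, b):
    # returns (q, r) with a = q*b + r and deg(r) < deg(b), over Q
    a = [Fraction(c) for c in a]
    b = [Fraction(c) for c in b]
    q = [Fraction(0)] * max(len(a) - len(b) + 1, 1)
    while len(a) >= len(b):
        d = len(a) - len(b)       # degree gap
        coef = a[-1] / b[-1]      # cancel the leading term of a
        q[d] = coef
        for i, c in enumerate(b):
            a[i + d] -= coef * c
        while a and a[-1] == 0:   # drop the cancelled leading terms
            a.pop()
    return q, a
```

For example, poly_divmod([1, 2, 1, 2, 1], [1, 1, 1]) divides X^4 + 2X^3 + X^2 + 2X + 1 by X^2 + X + 1 and returns quotient [-1, 1, 1] (i.e. X^2 + X − 1) and remainder [2, 2] (i.e. 2X + 2).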

This allows us to prove the same theorems for k[X] as we proved for Z. We have the following corollary of the fundamental theorem of algebra and Euclidean division.

2.2.41 Corollary No polynomial f(x) in R[X] of degree > 2 is irreducible in R[X].

Proof. Let f ∈ R[X] be a polynomial of degree > 2. By the fundamental theorem of algebra f has a root in C, call it α. If α is real, then (x − α) divides f and f is reducible, so suppose α is not real. Then ᾱ (the complex conjugate) is another root (because f ∈ R[X]). Let

p(x) = (x − α)(x − ᾱ).

Write α = a + bi and expand to get

p(x) = x^2 − 2ax + (a^2 + b^2).

The polynomial p is in R[X] and is irreducible (if it were reducible it would have a real root). Divide f by p:

f(x) = p(x)q(x) + r(x)

with deg(r) ≤ 1. We can write r(x) = sx + t with s, t ∈ R. But r(α) = f(α) − p(α)q(α) = 0 − 0 = 0. As α is not real, we must have s = t = 0. This implies that p divides f, and deg(p) = 2 < deg(f), so q is not a unit. It follows that f is not irreducible. 2

2.2.42 Example In Q[X] divide f = X^4 + 2X^3 + X^2 + 2X + 1 by g = X^2 + X + 1. We find

f − X^2 g = X^3 + 2X + 1.

The degree is still ≥ deg(g), hence we do it again:

X^3 + 2X + 1 − Xg = −X^2 + X + 1,

and one more time:

−X^2 + X + 1 + g = 2(X + 1).


We found something of degree strictly smaller than deg(g). Collecting the steps, we get

f = (X^2 + X − 1)g + 2X + 2.

Hence q = X^2 + X − 1 and r = 2X + 2.

Another example: f = x^3 + x + 2 and g = x^2 − 1. We get

f − xg = 2x + 2.

This is of degree strictly smaller than deg(g), hence q = x and r = 2x + 2.

2.2.43 Example In F5[X] divide ...... etc. Note that in F5 we have 2−1 = 3, 3−1 = 2 and4−1 = 4. .... etc.

2.2.44 The Remainder Theorem If f ∈ k[X] and a ∈ k then

f(a) = 0 ⇐⇒ (X − a)|f.

Proof. If (X − a)|f then there exists g ∈ k[X] such that f(X) = (X − a)g(X). Then f(a) = (a − a)g(a) = 0 · g(a) = 0.

Conversely, suppose f(a) = 0. By the Division Algorithm we have q, r ∈ k[X] with deg(r) < deg(X − a) = 1 such that f(X) = q(X)(X − a) + r(X). So r(X) is a constant, r ∈ k. Then

r = r(a) = f(a) − q(a)(a − a) = 0 − 0 = 0.

Hence (X − a)|f. 2

We can also use the division algorithm to calculate highest common factors as before:

2.2.45 Definition Let f, g ∈ k[X], not both zero. A highest common factor of f and g is a monic polynomial h such that:

• h|f and h|g.

• if a|f and a|g then deg(a) ≤ deg(h).


2.2.47 Proposition Let f = qg + r. Then h is a hcf of f and g iff h is a hcf of g and r.

Proof. Exactly the same as with the integers. 2

Note that hcf(f, 0) = (1/c)f, where c is the leading coefficient of f.


Lecture 6

2.2.48 Bezout's Lemma Let f, g ∈ k[X], not both zero. Then there exist a, b ∈ k[X] such that

hcf(f, g) = af + bg.

Again the proof is the same as in the case of integers.

Let's do an example: calculate hcf(f, g) and find a, b such that hcf(f, g) = af + bg, with f = x^4 + 1 and g = x^2 + x.

We write: f − x^2 g = −x^3 + 1, then f − x^2 g + xg = x^2 + 1, and f − x^2 g + xg − g = 1 − x, and we are finished. We find:

f = (x^2 − x + 1)g + (1 − x).

And then

x^2 + x = (−x + 1)(−x − 2) + 2.

As 2 is invertible, we find that the hcf is 1!

Now, we do it backwards:

2 = (x^2 + x) − (−x + 1)(−x − 2) = g + (1 − x)(x + 2) = g + (f − (x^2 − x + 1)g)(x + 2),

so

1 = (1/2)(x + 2)(x^4 + 1) + (1/2)(−x^3 − x^2 + x − 1)(x^2 + x),

hence a = (1/2)(x + 2) and b = (1/2)(−x^3 − x^2 + x − 1).
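One can verify the identity af + bg = 1 numerically; the helpers below (illustrative, not from the notes) multiply and add coefficient lists over Q:

```python
from fractions import Fraction

def pmul(f, g):
    # coefficient-list product: c[i+j] += f[i] * g[j]
    c = [Fraction(0)] * (len(f) + len(g) - 1)
    for i, x in enumerate(f):
        for j, y in enumerate(g):
            c[i + j] += Fraction(x) * Fraction(y)
    return c

def padd(f, g):
    n = max(len(f), len(g))
    return [Fraction(f[i] if i < len(f) else 0) +
            Fraction(g[i] if i < len(g) else 0) for i in range(n)]

half = Fraction(1, 2)
f = [1, 0, 0, 0, 1]                    # x^4 + 1
g = [0, 1, 1]                          # x^2 + x
a = [2 * half, half]                   # (1/2)(x + 2)
b = [-half, half, -half, -half]        # (1/2)(-x^3 - x^2 + x - 1)
bezout = padd(pmul(a, f), pmul(b, g))  # should be the constant polynomial 1
```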

2.2.49 Lemma Let p ∈ k[X] be irreducible. If p|ab then p|a or p|b.

Proof. Exactly as in the case of the integers. 2

2.2.50 Unique Factorisation Theorem Let f ∈ k[X] be monic. Then there exist monic irreducibles p1, p2, . . . , pr ∈ k[X] such that

f = p1 p2 · · · pr.

If q1, . . . , qs are monic and irreducible and f = q1 · · · qs, then r = s and (after reordering) p1 = q1, . . . , pr = qr.

Proof. (Existence): We prove the existence by induction on deg(f). If f is linear then it is irreducible and the result holds. So suppose the result holds for polynomials of smaller degree. Either f is irreducible, and so the result holds, or f = gh for g, h non-constant polynomials of smaller degree. By our inductive hypothesis g and h can be factorized into irreducibles and hence so can f.

(Uniqueness): Factorization is obviously unique for linear polynomials (or, more generally, irreducible polynomials). For the inductive step, assume all polynomials of smaller degree than f have unique factorization. Let

f = g1 · · · gs = h1 · · ·ht,

with gi, hj monic irreducible.


Now g1 is irreducible and g1|h1 · · · ht. By the Lemma, there is 1 ≤ j ≤ t such that g1|hj. This implies g1 = hj since they are both monic irreducibles. After reordering, we can assume j = 1, so

g2 · · · gs = h2 · · · ht

is a polynomial of smaller degree than f. By the inductive hypothesis, this has unique factorization, i.e. we can reorder things so that s = t and

g2 = h2, . . . , gs = hs.

2

The fundamental theorem of algebra tells you exactly that any monic polynomial in C[x] is a product of irreducibles of degree one (recall that polynomials of degree one are irreducible).

A consequence of the factorisation theorem and the fundamental theorem of algebra is the following: any polynomial of odd degree in R[x] has a root in R. Indeed, in the decomposition into irreducibles over R we can only have polynomials of degree one and two. Because the degree is odd, there must be a factor of degree one, hence a root.

Another example: x^2 + 2x + 1 = (x + 1)^2 in k[X].

Look at x^2 + 1. This is irreducible in R[x], but in C[x] it is reducible and decomposes as (x + i)(x − i), and in F2[x] it is also reducible: x^2 + 1 = (x + 1)(x − 1) = (x + 1)^2 in F2[x]. In F5[x] we have 2^2 = 4 = −1, hence x^2 + 1 = (x + 2)(x − 2) (check: (x − 2)(x + 2) = x^2 − 4 = x^2 + 1 in F5).

In fact one can show that, for p odd, x^2 + 1 is reducible in Fp[x] if and only if p ≡ 1 mod 4. In Fp[x], the polynomial x^p − x decomposes as a product of polynomials of degree one.

Suppose you want to decompose x^4 + 1 in R[x]. It is not irreducible, since its degree is > 2.

Also, x^4 + 1 does not have a root in R, but it does in C. The idea is to decompose into factors of the form (x − a) in C[x] and then group the conjugate factors.

This is in general how you decompose a polynomial into irreducibles in R[x]! So here, the roots are

a1 = e^{iπ/4}, a2 = e^{3iπ/4}, a3 = e^{5iπ/4}, a4 = e^{7iπ/4}.

Now note that ā1 = a4, and the polynomial (x − a1)(x − a4) is irreducible over R. The middle coefficient is −(a1 + a4) = −2 cos(π/4) = −√2. Hence we find: (x − a1)(x − a4) = x^2 − √2 x + 1.

Similarly ā2 = a3 and (x − a2)(x − a3) = x^2 + √2 x + 1. We get the decomposition into irreducibles over R:

x^4 + 1 = (x^2 − √2 x + 1)(x^2 + √2 x + 1).
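A quick floating-point check of this real factorization of x^4 + 1 (an illustration, not part of the notes):

```python
import math

root2 = math.sqrt(2.0)
p = [1.0, -root2, 1.0]   # x^2 - sqrt(2) x + 1, coefficient of x^i at index i
q = [1.0, root2, 1.0]    # x^2 + sqrt(2) x + 1
prod = [0.0] * 5
for i, a in enumerate(p):
    for j, b in enumerate(q):
        prod[i + j] += a * b
# prod is (numerically) [1, 0, 0, 0, 1], i.e. x^4 + 1
```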

In Q[x] one can show that x^4 + 1 is irreducible.

In F2[x] we can also decompose x^4 + 1 into irreducibles. Indeed:

x^4 + 1 = x^4 − 1 = (x^2 − 1)^2 = (x − 1)^4.


Lecture 7

3 Jordan Canonical Form

3.1 Revision of linear algebra

• Fields. A field is a commutative ring with 1 such that every non-zero element has aninverse. Examples: Q, R, C, Fp. If k is any field then k(X) (the field of rational functions)is a field.

• Vector spaces, subspaces, direct sums. A vector space over a field k is a set V with two operations: addition and scalar multiplication by elements of k. Elements of V are called vectors, and elements of k are called scalars. The axioms are:

– (V, +) is an abelian (commutative) group.

– (xy)v = x(yv) for x, y ∈ k, v ∈ V .

– x(v + w) = xv + xw and (x + y)v = xv + yv for x, y ∈ k, v, w ∈ V .

– 1v = v.

A typical example of a vector space is the space k^n of n-tuples of elements in k. In particular k itself is a vector space over itself.

Another example is k[X]: the set of polynomials with coefficients in k is a vector space. Fix n ≥ 0 and let k[X]n be the set of polynomials of degree less than or equal to n. This is a vector space (although it is not a ring!). If n = 0, then this vector space is just k.

Take k = R and let C be the set of all continuous functions from [0, 1] to R. Then C is an R-vector space.

Similarly, take k = C and let H be the set of all holomorphic functions from the unit ball to C. This is a C-vector space. Of course it is also an R-vector space.

Another example. Let a, b ∈ R and consider the set of all twice differentiable functions f such that

d^2f/dx^2 + a df/dx + bf = 0.

This is an R vector space (exercise).

• A linear combination of {v1, . . . , vn} is a vector of the form x1v1 + . . . + xnvn.

For example, consider the vector space k[x]n as before. This vector space is in fact the set of all linear combinations of the elements 1, x, . . . , x^n.

• The span of a set of vectors is the set of linear combinations of those vectors.

As above, k[x]n is the span of the set {1, x, . . . , x^n}. We say that the vectors 1, x, . . . , x^n span or generate this vector space.

Consider k^2 and the vectors

e1 = (1, 0) and e2 = (0, 1)

(written as column vectors). Then the set of vectors {e1, e2} spans k^2.

• Let V be a k-vector space. A subset A of V is said to generate V if V is the span of A.

In the examples above {1, x, . . . , x^n} generates k[x]n, but {1}, {1, x}, {1, x, x^2}, . . . , {1, x, x^2, . . . , x^{n−1}} do not.

The set {e1, e2} certainly generates R2 while {e1} or {e2} do not.

• Let V be a k-vector space. A subset W ⊂ V is called a subspace if any linear combination of elements of W is in W . In other words, a subspace is a subset which is a vector space with the same addition and scalar multiplication as V .

Let V be a k-vector space and take v ∈ V . The set kv of all multiples of v by elements of k is a subspace. More generally, take any set A ⊂ V ; then the set of linear combinations of elements of A is a vector subspace.

As an (easy) exercise, prove that given any collection Wi, i ∈ I (I is some set, finite or infinite) of subspaces of V , the intersection ∩i∈I Wi is a subspace. The union is not in general! For example, let V = k^2, W1 = ke1 and W2 = ke2. It is quite clear that W1 ∪ W2 is not a subspace: for example e1 + e2 is not in it.

Let C be the set of continuous functions [0, 1] −→ R. We have seen that this is an R-vector space. Let W = {f ∈ C : f(0) = 0}. This is a subspace (easy exercise).

We have seen that k[x] is a vector space. The space k[x]n is a vector subspace.

• Vectors v1, . . . , vn ∈ V are called linearly independent if whenever ∑_{i=1}^{n} λi vi = 0 (for some λi ∈ k), then λi = 0 for all i.

For example, in k^2, the vectors e1, e2 are linearly independent. Clearly e1 and 2e1 are not linearly independent.

In k[x], the vectors 1, x, x^2, . . . are linearly independent.

• A set {v1, . . . , vn} of vectors is a basis for V if it is linearly independent and its span is V . If this is the case then every vector has a unique expression as a linear combination of v1, . . . , vn. For example {e1, e2} is a basis of R^2. The set {1, x, x^2, . . . , x^n} is a basis of k[x]n.

The set {1, x, x^2, . . .} is a basis of k[x].

• The dimension of a vector space is the number of vectors in a basis. This does not depend on the basis: any two bases have the same number of elements.

For example k^n has dimension n, k[x]n has dimension n + 1, while k[x] is infinite dimensional and so is C.

R viewed as a vector space over itself has dimension 1, but viewed as a vector space over Q it is infinite dimensional.


C viewed as a vector space over itself has dimension 1, but as a vector space over R it has dimension 2: a basis is {1, i}. In this course we will be mainly concerned with finite dimensional vector spaces.

Let V, W be vector spaces. A function T : V → W is a linear map if

• T (v + w) = T (v) + T (w),

• T (xv) = xT (v).

or equivalently, T(v + xw) = T(v) + xT(w). A bijective linear map is called an isomorphism of vector spaces.

For example, the map T : C −→ C that sends z to its complex conjugate z̄ is not a linear map of C-vector spaces: T(λz) = λ̄T(z), which is not λT(z) in general. But it is a map of real vector spaces: if λ ∈ R, then λ̄ = λ.

If T is linear, we define its kernel and image:

ker(T ) = {v ∈ V : T (v) = 0},

Im(T) = {T(v) : v ∈ V }.

The rank of T is the dimension of the image of T , and the nullity of T is the dimension of the kernel of T .

We then have the following:

3.1.51 Rank-Nullity Theorem Let T : V → W be a linear map. Then

rank(T ) + null(T ) = dimV.

Proof. Let {v1, . . . , vr} be a basis of ker(T) and {w1, . . . , ws} be a basis of Im(T). By definition of the image, there exist vectors u1, . . . , us of V such that

T(ui) = wi.

We claim that {u1, . . . , us, v1, . . . , vr} form a basis of V , which will conclude the proof.

First we show linear independence. Suppose that

a1v1 + · · · + arvr + b1u1 + · · · + bsus = 0.

Apply T ; we get

0 = T(0) = b1T(u1) + · · · + bsT(us) = b1w1 + · · · + bsws

(note that T(a1v1 + · · · + arvr) = 0 because the vi are in the kernel of T ). Now, as {w1, . . . , ws} is a basis of Im(T) (in particular it is linearly independent), we get that bi = 0 for all i.

So we have a1v1 + · · · + arvr = 0 and, because the vi form a basis of ker(T) (and in particular are linearly independent), we get that ai = 0 for all i.

We have shown that the ai and bi are all zero, hence {u1, . . . , us, v1, . . . , vr} is linearly independent.

It remains to show that {u1, . . . , us, v1, . . . , vr} spans V .


Let x ∈ V . By the choice of {w1, . . . , ws} as a basis of the image of T , we have

T(x) = ∑_{i=1}^{s} ai wi = ∑_{i=1}^{s} ai T(ui) = T(∑_{i=1}^{s} ai ui).

Therefore

T(x − ∑_{i=1}^{s} ai ui) = 0

and hence

x − ∑_{i=1}^{s} ai ui ∈ ker(T),

and now, by the choice of {vj} as a basis of ker(T), there exist bj such that

x = ∑_{i=1}^{s} ai ui + ∑_{j=1}^{r} bj vj,

which shows that {u1, . . . , us, v1, . . . , vr} generates V . This finishes the proof. 2
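As a numerical illustration (using numpy, which is of course not part of the notes), one can check rank(T) + null(T) = dim V for a concrete matrix:

```python
import numpy as np

# projection of R^3 onto the plane z = 0
A = np.array([[1., 0., 0.],
              [0., 1., 0.],
              [0., 0., 0.]])
rank = int(np.linalg.matrix_rank(A))
nullity = A.shape[1] - rank   # rank-nullity: null(T) = dim V - rank(T)
```

Here rank is 2 and nullity is 1, and their sum is 3 = dim R^3.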

Here are some consequences of this theorem.

3.1.52 Definition A linear map T : V −→ W is called an isomorphism if there exist

1. T1 : W −→ V such that TT1 = IW (the identity of W ),

2. T2 : W −→ V such that T2T = IV (the identity of V ).

In particular, a linear map T : V −→ V is an isomorphism if there exists T^{−1} : V −→ V such that T^{−1}T = TT^{−1} is the identity.

It is easy (and left as an exercise) to see that T : V −→ W is an isomorphism if and only if T is both surjective and injective. (For the converse you will need to construct T1 and T2 as maps and then show that they are linear.)

3.1.53 Corollary Let T : V −→ W be a linear map with dim V = dim W . If T is injective, then T is an isomorphism. If T is surjective, then T is an isomorphism.

Proof. If T is injective, then ker(T) = {0}. By the above theorem, dim(Im(T)) = dim(V ) = dim(W ) and hence Im(T) = W and T is surjective. Injective + surjective = isomorphism.

Similarly, if T is surjective, then dim(Im(T)) = dim(W ) = dim(V ), and hence dim(ker(T)) = 0. It follows that T is injective. Injective + surjective = isomorphism. 2

3.1.54 Corollary Let V and W be two vector spaces of the same dimension. Then V is isomorphic to W .

Proof. Let r = dim(V ) = dim(W ), let {v1, . . . , vr} be a basis of V and {w1, . . . , wr} be a basis of W . Define T by T(vi) = wi, extended linearly. By construction T is surjective, and by the theorem it is also injective, hence an isomorphism. 2


Lecture 8

3.2 Matrix representation of linear maps

Let V and W be two finite dimensional vector spaces over a field k. Suppose that V is of dimension r and W is of dimension s.

Let B = {b1, . . . , br} be a basis for V and B′ = {b′1, . . . , b′s} be a basis for W .

For any vector v ∈ V we shall write [v]B for the column vector of coefficients of v with respect to the basis B (in the future we will, by abuse of notation, simply call this column vector v when it is obvious which basis we are referring to), i.e.

[v]B = (x1, . . . , xr)^T, where v = x1b1 + . . . + xrbr.

Given a linear map T : V → W we have

T(v) = T(∑_{i=1}^{r} xi bi) = ∑_{i=1}^{r} xi T(bi).

Now write

T(bi) = ∑_{j=1}^{s} aji b′j.

We get:

T(v) = ∑_{1≤i≤r, 1≤j≤s} xi aji b′j.

In other words, [T(v)]B′ is the product of the s × r matrix, usually denoted by AT, with entries

(AT)ji = aji,

with the column vector [v]B. One also writes [T]B,B′ for this matrix AT.

In practice, to write the matrix of T with respect to given bases, decompose each T(bi) in the basis of W and write the results as column vectors; these columns give the matrix AT.

The matrix AT, or [T]B,B′, is called the matrix of T with respect to the bases B and B′. A LINEAR TRANSFORMATION IS REPRESENTED BY A MATRIX WITH RESPECT TO SPECIFIED BASES OF THE SOURCE AND THE TARGET SPACES.

Example. V = W = R^3 with canonical basis {e1, e2, e3}, and

T(x, y, z) = (x, y, 0)

(notice that this is the projection onto the plane z = 0). One finds

AT =
( 1 0 0 )
( 0 1 0 )
( 0 0 0 )


One can find the kernel and the image. In this case, clearly the image is the span of e1 and e2, hence dim(Im T) = 2. By the rank-nullity theorem, dim ker(T) = 1 and quite clearly it is generated by e3.

Let us look at T : R^3 → R^3 given by

T(x, y, z) = (2x − y + 3z, 4x − 2y + 6z, −6x + 3y − 9z).

One finds:

AT =
(  2 −1  3 )
(  4 −2  6 )
( −6  3 −9 )

Quite clearly, the first column vector in this matrix is −2 times the second, and the third is the first minus the second; therefore Im(T) is one dimensional and spanned by (2, 4, −6). The rank-nullity theorem implies that the dimension of ker(T) is 2. To find ker(T) one needs to solve AT v = 0. By elimination, one easily shows that the kernel has equation 2x − y + 3z = 0, hence it is spanned by the vectors (1, 2, 0) and (0, 3, 1).
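These claims can be checked numerically (a numpy sketch, not part of the notes):

```python
import numpy as np

A = np.array([[ 2., -1.,  3.],
              [ 4., -2.,  6.],
              [-6.,  3., -9.]])
rank = int(np.linalg.matrix_rank(A))     # dimension of Im(T): should be 1
# A v should be 0 for both proposed kernel vectors
kernel_residuals = [float(np.abs(A @ v).max())
                    for v in (np.array([1., 2., 0.]), np.array([0., 3., 1.]))]
```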

Another example: k[x]n −→ k[x]n−1 sending f to its derivative f′. Quite clearly it is a linear map. Find its matrix, image and kernel.

Same question with k[x]n −→ k[x]n sending f to f + f′.

Let us consider the transformation T : R^2 −→ R^2,

T(x, y) = (x + y, x − y),

and B1 = B2 = {v1 = (1, −1), v2 = (−3, 2)}.

One calculates T(v1) = (0, 2) = −6v1 − 2v2 and T(v2) = (−1, −5) = 17v1 + 6v2.

The matrix of T with respect to these bases is

( −6 17 )
( −2  6 )

In the canonical basis, of course, the matrix is:

( 1  1 )
( 1 −1 )

When one changes the bases, the matrix gets multiplied on the left and on the right by appropriate 'base change' matrices.

More precisely, let T : V −→ W be a linear map. Let B1 = {v1, . . . , vr} be a basis for V and let B′1 = {v′1, . . . , v′r} be another basis for V .


Similarly let B2 = {w1, . . . , ws} be a basis for W and let B′2 = {w′1, . . . , w′s} be another basis for W . Let AT = [T]B1,B2 be the matrix of T in the bases B1 and B2.

Let

[v]B′1 = (x1, . . . , xr)^T, v = x1v′1 + . . . + xrv′r,

be a vector of V written in the basis B′1. Now write

v′i = ∑_{j=1}^{r} bji vj,

the expression of the vector v′i from B′1 in the basis B1. Thus we obtain an r × r matrix B = (bji), whose i-th column is the coordinate vector of v′i in the basis B1; it has the property that

[v]B1 = B[v]B′1

for all v ∈ V . This matrix B is called the transition (or base change) matrix from the basis B1 to B′1.

Then AT B[v]B′1 = [T(v)]B2, the coordinate vector of T(v) ∈ W in the basis B2. Now write

wi = ∑_{k=1}^{s} aki w′k.

Then the s × s matrix A = (aki) is such that A·AT·B·[v]B′1 is the column vector that expresses the coordinates of T(v) in the basis B′2. We summarise:

[T]B′1,B′2 = A [T]B1,B2 B,

where A is the s × s matrix whose columns are the coordinates of the wi in the basis B′2, and B is the r × r matrix whose columns are the coordinates of the v′i in the basis B1.

In the particular case where r = s, V = W , B1 = B2 and B′1 = B′2, we get that A = B^{−1} and

[T]B′1,B′2 = B^{−1} [T]B1,B2 B.

Example. Let

T(x, y) = (x − 2y, 2x + y)

and B1 = B2 = {v1 = (1, −2), v2 = (3, 2)}.

One calculates T(v1) = (5, 0) = (5/4)(v1 + v2) and T(v2) = (−1, 8) = −(13/4)v1 + (3/4)v2.


The matrix of T with respect to these bases is

[T]B1,B2 =
( 5/4 −13/4 )
( 5/4   3/4 )

In the canonical basis B the matrix is

[T]B,B =
( 1 −2 )
( 2  1 )

The transition matrix from B to B1 is

A =
(  1 3 )
( −2 2 )

and the transition matrix from B1 to B is

A^{−1} = (1/8) ·
( 2 −3 )
( 2  1 )

One easily checks that [T]B1,B2 = A^{−1} [T]B,B A.
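The base-change identity can be verified with numpy (an illustrative check, not part of the notes):

```python
import numpy as np

T = np.array([[1., -2.],
              [2.,  1.]])          # [T] in the canonical basis
A = np.array([[ 1., 3.],
              [-2., 2.]])          # columns: v1, v2 in the canonical basis
T_B1 = np.linalg.inv(A) @ T @ A    # [T] in the basis {v1, v2}
```

T_B1 comes out as [[5/4, −13/4], [5/4, 3/4]], agreeing with the computation in the example.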

To summarise what we have seen:

Let T : V −→ W be a linear map; let r be the dimension of V and s the dimension of W . Let B = {v1, . . . , vr} be a basis of V and B′ a basis of W .

The matrix AT of T in the bases B and B′ is the matrix ([T(v1)]B′, . . . , [T(vr)]B′), whose column vectors are the coordinates of the T(vi) in the basis B′. This is an s × r matrix.

Let v be a vector in V and write it as a column vector (r × 1 matrix) of its coordinates in the basis B. Then AT v (matrix multiplication!) is the column vector (s × 1 matrix) which represents the coordinates of T(v) ∈ W in the basis B′.

Suppose B1 is another basis of V and B′1 is another basis of W . Then T is represented in the bases B1 and B′1 by the matrix AT multiplied on the left and on the right by appropriate base change matrices.

It can be shown that the only matrices which do not depend on the choice of basis are the scalar matrices, i.e. diagonal matrices with all diagonal coefficients equal (for example the identity and zero).

We also have the following:

3.2.55 Proposition Let T1 : V −→ W and T2 : W −→ U be two linear maps, and suppose we are given bases B, B1, B2 of the vector spaces V , W , U . Then for the composed map T2 ◦ T1 (usually simply denoted by T2T1) the matrix is

[T2T1]B,B2 = [T2]B1,B2 [T1]B,B1.

In particular if T : V −→ V (such a map is called an endomorphism) and B is a basis for V , then

[T^n]B = ([T]B)^n

and, if we suppose that T is an isomorphism,

[T^{−1}]B = ([T]B)^{−1}.

In particular, if B and B′ are two bases of V , then the transition matrix from B to B′ is the inverse of the transition matrix from B′ to B.


Lecture 9

In what follows V is a vector space of dimension n and B is a basis. Let A be the matrix representing T in the basis B. Because A is an n × n matrix, we can form powers A^k for any k, with the convention that A^0 = In. Note that A^k represents the transformation T composed k times, as seen above.

Notice for example that when the matrix is diagonal with coefficients λi on the diagonal, then A^n is diagonal with coefficients λi^n. Notice also that such a matrix is invertible if and only if all the λi are non-zero, in which case A^{−1} is the diagonal matrix with coefficients λi^{−1} on the diagonal.

3.2.56 Definition Let f(X) = ∑ ai X^i ∈ k[X]. We define

f(T) = ∑ ai T^i,

where we define T^0 = id. This is a linear transformation. If A ∈ Mn(k) is a matrix then we define

f(A) = ∑ ai A^i.

This matrix f(A) represents f(T) in the basis B.

What this means is that we can 'evaluate' a polynomial at an n × n matrix and get another n × n matrix. We write [f(T)]B for this matrix in the basis B; obviously it is the same as f([T]B).

Let's look at an example. Take

A =
( −1 3 )
(  4 7 )

and f(x) = x^2 − 5x + 3. Then

f(A) = A^2 − 5A + 3I2 =
( 21  3 )
(  4 29 )
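Evaluating a polynomial at a matrix can be sketched as follows (a numpy illustration, not part of the notes; the helper name is mine):

```python
import numpy as np

def poly_at_matrix(coeffs, A):
    # coeffs[i] is the coefficient of x^i; the i = 0 term uses A^0 = I
    result = np.zeros_like(A, dtype=float)
    power = np.eye(A.shape[0])
    for c in coeffs:
        result += c * power
        power = power @ A
    return result

A = np.array([[-1., 3.],
              [ 4., 7.]])
fA = poly_at_matrix([3., -5., 1.], A)   # f(x) = x^2 - 5x + 3
```

Here fA equals [[21, 3], [4, 29]], agreeing with the example.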

3.2.57 Definition Recall that the characteristic polynomial of an n × n matrix A is defined by

chA(X) = det(X · In − A).

This is a monic polynomial of degree n over k. Now suppose T : V → V is a linear map. We can define chT to be ch[T]B, but we need to check that this does not depend on the basis B. If C is another basis with transition matrix M then we have:

ch[T ]C (X) = det(X · In − M−1[T ]BM)

= det(M−1(X · In − [T ]B)M)

= det(M)−1 det(X · In − [T ]B) det(M)

= det(X · In − [T ]B)

= ch[T ]B (X)

In other words, the characteristic polynomial does not depend on the choice of the basis in which we write our matrix.

The following (important !) theorem was proved in the first year courses.


3.2.58 Cayley–Hamilton Theorem For any n × n matrix A we have chA(A) = 0.

We therefore have:

3.2.59 Cayley–Hamilton Theorem For any T : V −→ V linear map, we have chT (T ) = 0.

3.2.60 Example Take

A =
( λ1  0 )
(  0 λ2 )

Then chA(x) = (x − λ1)(x − λ2) and clearly chA(A) = 0.

Take

A =
( 1 2 )
( 3 4 )

Calculate chA(x) and check that chA(A) = 0.
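The suggested check can be done directly: for this A the characteristic polynomial is x^2 − 5x − 2 (trace 5, determinant −2). A numpy sketch (not part of the notes):

```python
import numpy as np

A = np.array([[1., 2.],
              [3., 4.]])
# ch_A(x) = x^2 - 5x - 2, so ch_A(A) = A^2 - 5A - 2I should vanish
chA_at_A = A @ A - 5 * A - 2 * np.eye(2)
```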

There are plenty of polynomials f such that f(A) = 0: all the multiples of chA, for example. What can also happen is that some divisor g of chA is already such that g(A) = 0. Take the identity In for example. Its characteristic polynomial is (X − 1)^n, but in fact g = X − 1 already satisfies g(In) = 0. This leads us to the notion of minimal polynomial.


Lecture 10

In the last lecture, we showed that given a linear transformation T there is a polynomial f (namely the characteristic polynomial) such that f(T) = 0. Among all polynomials with this property, there is one of minimal degree.

3.3 Minimal polynomials

3.3.61 Definition Let V be a finite dimensional vector space over a field k and T : V → V a linear map. A minimal polynomial of T is a monic polynomial m ∈ k[X] such that

• m(T) = 0;

• if f(T) = 0 and f ≠ 0 then deg f ≥ deg m.

3.3.62 Theorem Every linear map T : V → V has a unique minimal polynomial mT . Furthermore, f(T) = 0 iff mT |f .

Proof. Firstly, the Cayley-Hamilton theorem implies that there exists a polynomial f satisfying f(T) = 0, namely f = chT, so a monic annihilating polynomial of minimal degree exists. Suppose that mT = m is not unique; then there exists another monic polynomial n(x) ∈ k[X] satisfying deg(m) = deg(n) and n(T) = 0.

If f(x) = m(x) − n(x) then

f(T) = m(T) − n(T) = 0,

and also f ≠ 0 with deg(f) < deg(m) = deg(n), a contradiction.

Suppose f ∈ k[X] and f(T) = 0. By the Division Algorithm for polynomials there exist unique q, r ∈ k[X] with deg(r) < deg(m) and

f = qm + r.

Then

r(T) = f(T) − q(T)m(T) = 0 − q(T) · 0 = 0.

So r is the zero polynomial (by the minimality of deg(m)). Hence f = qm and so m|f .

Conversely, if f ∈ k[X] and m|f , then f = qm for some q ∈ k[X], and so f(T) = q(T)m(T) = q(T) · 0 = 0. 2

3.3.63 Corollary If T : V → V is a linear map then mT |chT .

Proof. By the Cayley-Hamilton Theorem chT (T ) = 0. 2

Using the corollary we can calculate the minimal polynomial as follows:

• Calculate chT and factorize it into irreducibles.

• Make a list of all the factors.

• Find the monic factor m of smallest degree such that m(T ) = 0.


3.3.64 Example Suppose T is represented by the matrix

( 2 1 0 )
( 0 2 0 )
( 0 0 2 )

The characteristic polynomial is chT(X) = (X − 2)^3. The monic factors of this are:

1, (X − 2), (X − 2)^2, (X − 2)^3.

The minimal polynomial is (X − 2)^2.

In fact this method can be sped up: certain factors of the characteristic polynomial cannot arise. To explain this we recall the definition of an eigenvalue.

3.3.65 Definition Recall that a scalar λ ∈ k is called an eigenvalue of T if there is a non-zero vector v satisfying

T(v) = λ · v.

The non-zero vector v is called an eigenvector.

3.3.66 Remark It is important that an eigenvector be non-zero. If you allow zero to be aneigenvector, then any λ would be an eigenvalue.

3.3.67 Proposition Let v be an eigenvector of T with eigenvalue λ ∈ k. Then for anypolynomial f ∈ k[X],

(f(T ))(v) = f(λ) · v.

Proof. Using T(v) = λv repeatedly gives T^n(v) = λ^n · v for all n ≥ 0; the claim follows by linearity. 2

3.3.68 Theorem If T : V → V is linear and λ ∈ k then the following are equivalent:

(i) λ is an eigenvalue of T .

(ii) mT (λ) = 0.

(iii) chT (λ) = 0.

Proof. (i) ⇒ (ii): Assume T(v) = λv with v ≠ 0. Then by the proposition,

(mT(T))(v) = mT(λ) · v.

But mT(T) = 0, so mT(λ) · v = 0. Since v ≠ 0, this implies mT(λ) = 0.

(ii) ⇒ (iii): This is immediate, since we have already shown that mT is a factor of chT.

(iii) ⇒ (i): Suppose chT(λ) = 0, i.e. det(λ · id − T) = 0. Then λ · id − T is not invertible, so there is a non-zero solution v to (λ · id − T)(v) = 0. But then T(v) = λ · v. 2


Now suppose the characteristic polynomial of T factorizes into irreducibles as

chT(X) = ∏_{i=1}^{r} (X − λi)^{a_i}.

By the fundamental theorem of algebra, if k = C then chT can always be factorized like this. The minimal polynomial then has the form

mT(X) = ∏_{i=1}^{r} (X − λi)^{b_i}, with 1 ≤ b_i ≤ a_i.

This makes it much quicker to calculate the minimal polynomial; in practice the number of factors and the a_i are quite small.

3.3.69 Example Suppose T is represented by the matrix diag(2, 2, 3). The characteristic polynomial is

chT(X) = (X − 2)^2 (X − 3).

The possibilities for the minimal polynomial are:

(X − 2)(X − 3) and (X − 2)^2 (X − 3).

The minimal polynomial is (X − 2)(X − 3).
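The procedure above (test the candidate factors of chT in increasing degree) can be checked mechanically. The following plain-Python sketch (no external libraries; the helper names are mine) verifies that (A − 2I)(A − 3I) already annihilates A = diag(2, 2, 3):

```python
# Minimal-polynomial check for A = diag(2, 2, 3): the smallest monic
# factor of ch_A(X) = (X-2)^2 (X-3) that annihilates A.

def mat_mul(P, Q):
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def shifted(A, lam):
    # A - lam * I
    n = len(A)
    return [[A[i][j] - (lam if i == j else 0) for j in range(n)]
            for i in range(n)]

def is_zero(M):
    return all(x == 0 for row in M for x in row)

A = [[2, 0, 0], [0, 2, 0], [0, 0, 3]]

f1 = mat_mul(shifted(A, 2), shifted(A, 3))
print(is_zero(f1))   # True: (A - 2I)(A - 3I) = 0, so m_A(X) = (X-2)(X-3)
```

Since the lowest-degree candidate already vanishes at A, no higher-degree factor needs to be tested.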


Lecture 11

3.4 Generalized Eigenspaces

3.4.70 Definition Let V be a finite dimensional vector space over a field k, and let λ ∈ k be an eigenvalue of a linear map T : V → V. For t ∈ N we define the t-th generalized eigenspace by:

Vt(λ) = ker((λ · id − T )t).

Note that V1(λ) is the usual eigenspace (i.e. the set of eigenvectors together with zero).

3.4.71 Remark We obviously have

V1(λ) ⊆ V2(λ) ⊆ . . .

and by definition,

dim Vt(λ) = null((λ · id − T)^t).

3.4.72 Example Let

A =
( 2 2 2 )
( 0 2 2 )
( 0 0 2 )

We have chA(X) = (X − 2)^3, so 2 is the only eigenvalue. We'll now calculate the generalized eigenspaces Vt(2):

V1(2) = ker
( 0 2 2 )
( 0 0 2 )
( 0 0 0 )

We calculate the kernel by row-reducing the matrix:

V1(2) = ker
( 0 1 0 )
( 0 0 1 )
( 0 0 0 )
= span{ (1, 0, 0)^t }.

Similarly, row-reducing (A − 2I)^2,

V2(2) = ker
( 0 0 1 )
( 0 0 0 )
( 0 0 0 )
= span{ (1, 0, 0)^t, (0, 1, 0)^t },

and

V3(2) = ker(0) = span{ (1, 0, 0)^t, (0, 1, 0)^t, (0, 0, 1)^t },

the whole space.
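The dimensions dim Vt(2) = null((A − 2I)^t) can be checked mechanically. Below is a plain-Python sketch (no external libraries; the helper names are mine) that computes the nullities as 3 − rank((A − 2I)^t), using Gaussian elimination over the rationals:

```python
# Nullities of (A - 2I)^t for t = 1, 2, 3, via rank computation.
from fractions import Fraction

def rank(M):
    M = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(M[0])):
        piv = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                f = M[i][c] / M[r][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

def mat_mul(P, Q):
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[2, 2, 2], [0, 2, 2], [0, 0, 2]]
N = [[A[i][j] - (2 if i == j else 0) for j in range(3)] for i in range(3)]

M = [[1 if i == j else 0 for j in range(3)] for i in range(3)]  # (A-2I)^0
nullities = []
for t in range(1, 4):
    M = mat_mul(M, N)          # now M = (A - 2I)^t
    nullities.append(3 - rank(M))
print(nullities)               # [1, 2, 3]
```

The output reproduces the chain dim V1(2) = 1, dim V2(2) = 2, dim V3(2) = 3 computed by hand above.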


3.4.73 Example Let

A =
( 1 1 −2 )
( 1 1 −2 )
( 1 1 −2 )

Let V be a vector space and U and W two subspaces. Then (exercise on Sheet 4) U + W is a subspace of V. If furthermore U ∩ W = {0}, then we call this subspace the direct sum of U and W and denote it by U ⊕ W. In this case

dim(U ⊕ W) = dim U + dim W.

3.4.74 Primary Decomposition Theorem If V is a finite dimensional vector space over C and T : V → V is linear, with distinct eigenvalues λ1, . . . , λr and minimal polynomial

mT(X) = ∏_{i=1}^{r} (X − λi)^{b_i},

then

V = V_{b1}(λ1) ⊕ · · · ⊕ V_{br}(λr).

3.4.75 Lemma Let k be a field. If f, g ∈ k[x] satisfy hcf(f, g) = 1 and T is as above then

ker(fg(T )) = ker(f(T )) ⊕ ker(g(T )).

Proof of the Theorem. By definition of mT we have mT(T) = 0, so ker(mT(T)) = V. We have a factorization of mT into pairwise coprime factors of the form (x − λi)^{b_i} (this is where the fact that the ground field is C is used), so the lemma implies that

V = ker(mT(T)) = ker( ∏_{i=1}^{r} (T − λi · id)^{b_i} )
  = ker((T − λ1 · id)^{b_1}) ⊕ · · · ⊕ ker((T − λr · id)^{b_r})
  = V_{b1}(λ1) ⊕ · · · ⊕ V_{br}(λr).

2

Proof of the Lemma. Let f, g ∈ k[x] satisfy hcf(f, g) = 1.

Firstly, if v ∈ ker f(T) + ker g(T), say v = w1 + w2 with w1 ∈ ker f(T) and w2 ∈ ker g(T), then

fg(T)v = fg(T)w1 + fg(T)w2.

Now f and g are polynomials in k[x], hence fg = gf, so

fg(T)w1 = g(T)(f(T)w1) = 0

because w1 ∈ ker(f(T)), and similarly fg(T)w2 = f(T)(g(T)w2) = 0 because w2 ∈ ker(g(T)). Hence fg(T)v = 0.

37

Page 38: MATH2201 Lecture Notes - UCLucahaya/2201notesNew.pdf · 2009-12-08 · • Quadratic and bilinear forms; • Euclidean and Hermitian spaces. Prerequisites (you should know all this

Therefore ker(f(T)) + ker(g(T)) ⊆ ker(fg(T)). To prove equality we use that hcf(f, g) = 1: there exist a, b ∈ k[x] such that

af + bg = 1.

So

a(T)f(T) + b(T)g(T) = 1 (the identity map).

Let v ∈ ker(fg(T)). If

v1 = a(T)f(T)v, v2 = b(T)g(T)v,

then v = v1 + v2 and

g(T)v1 = (gaf)(T)v = a(T)(fg(T)v) = a(T)0 = 0.

So v1 ∈ ker(g(T)). Similarly v2 ∈ ker(f(T)) since

f(T)v2 = (fbg)(T)v = b(T)(fg(T)v) = b(T)0 = 0.

Hence ker fg(T) = ker f(T) + ker g(T). Moreover, if v ∈ ker f(T) ∩ ker g(T) then v1 = 0 = v2, so v = 0. Hence

ker fg(T) = ker f(T) ⊕ ker g(T).

2
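The identity af + bg = 1 can be made concrete. The following plain-Python sketch (the matrix A and the factors are my own illustrative choices, not from the notes) uses f = x − 1, g = x − 2, for which a = 1, b = −1 works since (x − 1) − (x − 2) = 1. As in the proof, v1 = a(T)f(T)v lands in ker g(T) and v2 = b(T)g(T)v in ker f(T):

```python
# Splitting an arbitrary vector v as v1 + v2 with v1 in ker(A - 2I)
# and v2 in ker(A - I), for a matrix with m_A = (x - 1)(x - 2).

def mat_vec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

A = [[1, 1], [0, 2]]           # eigenvalues 1 and 2

def poly_shift(lam, v):        # (A - lam*I) v
    return [a - lam * b for a, b in zip(mat_vec(A, v), v)]

v = [5, 7]                     # an arbitrary vector
v1 = poly_shift(1, v)                  # a(A)f(A)v with a = 1
v2 = [-x for x in poly_shift(2, v)]    # b(A)g(A)v with b = -1

print([a + b for a, b in zip(v1, v2)])  # [5, 7]: v1 + v2 = v
print(poly_shift(2, v1))                # [0, 0]: v1 in ker(A - 2I)
print(poly_shift(1, v2))                # [0, 0]: v2 in ker(A - I)
```

This is exactly the decomposition used in the proof, with fg(T) = (A − I)(A − 2I) = 0 making both kernel memberships automatic.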


Lecture 12

3.4.76 Definition Recall that a linear map T : V → V is diagonalizable if there is a basis B of V such that [T]_B is a diagonal matrix. This is equivalent to saying that the basis vectors in B are all eigenvectors.

3.4.77 Theorem Let V be a finite dimensional vector space over a field k and let T : V → V be a linear map with eigenvalues λ1, . . . , λr ∈ k. Then T is diagonalizable if and only if we have (in k[X]):

mT(X) = (X − λ1) . . . (X − λr).

Proof. First suppose that T is diagonalizable, and let B be a basis of eigenvectors. For brevity let f(X) = (X − λ1) . . . (X − λr). We already know that f | mT, so to prove that f = mT we just have to check that f(T) = 0. For this it suffices to check that f(T)(v) = 0 for each basis vector v ∈ B. Suppose v ∈ B, so v is an eigenvector with some eigenvalue λi. Then we have

f(T)(v) = f(λi) · v = 0 · v = 0.

Therefore mT = f.

Conversely, if mT = f, then by the primary decomposition theorem we have

V = V1(λ1) ⊕ . . . ⊕ V1(λr).

Let Bi be a basis for V1(λi). Then the elements of Bi are eigenvectors, and B = B1 ∪ . . . ∪ Br is a basis of V. Therefore T is diagonalizable. 2

This gives a practical criterion for checking whether a given matrix is diagonalisable: calculate the minimal polynomial and factor it over C. If it has no multiple roots then the matrix is diagonalisable; if it does, then it is not.

3.4.78 Example Let k = C and let

A =
( 4 2 )
( 3 3 )

The characteristic polynomial is (x − 1)(x − 6). The minimal polynomial is the same, so the matrix is diagonalisable.

One finds that a basis of eigenvectors is

(2, −3)^t, (1, 1)^t.

In fact this matrix is diagonalisable over R, or even over Q.
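Both claims in this example are easy to verify numerically. A plain-Python sketch (helper names are mine):

```python
# Check the eigenvectors of A = [[4, 2], [3, 3]] and that
# (A - I)(A - 6I) = 0, i.e. m_A(x) = (x - 1)(x - 6).

def mat_vec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

A = [[4, 2], [3, 3]]
print(mat_vec(A, [2, -3]))   # [2, -3]: eigenvalue 1
print(mat_vec(A, [1, 1]))    # [6, 6]:  eigenvalue 6

AmI  = [[3, 2], [3, 2]]      # A - I
Am6I = [[-2, 2], [3, -3]]    # A - 6I
prod = [[sum(AmI[i][k] * Am6I[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)]
print(prod)                  # [[0, 0], [0, 0]]
```

Since the minimal polynomial is a product of distinct linear factors, the diagonalisability criterion above applies directly.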


3.4.79 Example Let k = R and let

A =
( 0 −1 )
( 1 0 )

The characteristic polynomial is x^2 + 1, which is irreducible over R. The minimal polynomial is the same. The matrix is not diagonalisable over R; however, over C we have x^2 + 1 = (x − i)(x + i), and the matrix is diagonalisable.

3.4.80 Example Let k = C and let

A =
( 1 −1 )
( 1 −1 )

The characteristic polynomial is X^2. Since A ≠ 0, the minimal polynomial is also X^2. Since this is not a product of distinct linear factors, A is not diagonalizable over C.

3.4.81 Example Let k = C and let

A =
( 1 0 )
( 1 1 )

The minimal polynomial is (x − 1)^2, so A is not diagonalisable.

3.5 Jordan Bases in the one eigenvalue case

Let T : V → V be a linear map. Fix a basis for V and let A be the matrix of T in this basis. As we have seen above, it is not always the case that T can be diagonalised; i.e. there is not always a basis of V consisting of eigenvectors of T. In the case that there is no basis of eigenvectors, the best kind of basis is a Jordan basis. We shall define a Jordan basis in several steps.

Suppose λ ∈ k is the only eigenvalue of a linear map T : V → V. We have defined generalized eigenspaces:

V1(λ) ⊆ V2(λ) ⊆ . . . ⊆ Vb(λ),

where b is the power of X − λ in the minimal polynomial mT.

We can choose a basis B1 for V1(λ). Then we can choose B2 so that B1 ∪ B2 is a basis for V2(λ), etc. Eventually we end up with a basis B1 ∪ . . . ∪ Bb for Vb(λ). We'll call such a basis a pre-Jordan basis.

3.5.82 Example Consider

A =
( 3 −2 )
( 8 −5 )

One calculates the characteristic polynomial and finds (x + 1)^2, hence λ = −1 is the only eigenvalue. The eigenspace is one-dimensional, spanned by v1 = (1, 2)^t, so V1(−1) = span(v1). Of course we have


(A − λI2)^2 = (A + I2)^2 = 0, hence V2(−1) = C^2, and we complete the eigenvector v1 = (1, 2)^t to a basis of C^2 = V2(−1) by taking, for example, v2 = (0, 1)^t. We have Av2 = −2v1 − v2, and hence in the basis {v1, v2} the matrix of A is

( −1 −2 )
( 0 −1 )

The basis {v1, v2} is a pre-Jordan basis for A, and in this basis the matrix of A is upper triangular.

This is a general fact:

A Vk(λ) ⊆ Vk(λ).

Indeed, if v ∈ Vk(λ), we have

(A − λIn)^k Av = A(A − λIn)^k v = 0,

hence Av ∈ Vk(λ), and therefore in a pre-Jordan basis the matrix of A is upper triangular.

3.5.83 Example Let

A =
( 2 1 −2 )
( 1 2 −2 )
( 1 1 −1 )

We have chA(X) = (X − 1)^3 and mA(X) = (X − 1)^2. There is only one eigenvalue, λ = 1, and we have generalized eigenspaces

V1(1) = ker( 1 1 −2 ), V2(1) = ker(0) = C^3.

So we can choose a pre-Jordan basis as follows:

B1 = { (1, −1, 0)^t, (0, 2, 1)^t },  B2 = { (1, 0, 0)^t }.

This in fact also works over R.

Now note the following:

3.5.84 Lemma If v ∈ Vt(λ) with t > 1 then

(T − λ · id)(v) ∈ Vt−1(λ).

Proof. Clear from the definition of the generalised eigenspaces. 2

Now suppose we have a pre-Jordan basis B1 ∪ . . . ∪ Bb. We call this a Jordan basis if in addition we have the condition:

(T − λ · id)Bt ⊂ Bt−1, t = 2, 3, . . . , b.

If we have a pre-Jordan basis B1 ∪ . . . ∪ Bb, then to find a Jordan basis, we do the following:

41

Page 42: MATH2201 Lecture Notes - UCLucahaya/2201notesNew.pdf · 2009-12-08 · • Quadratic and bilinear forms; • Euclidean and Hermitian spaces. Prerequisites (you should know all this

• For each basis vector v ∈ Bb, replace one of the vectors in Bb−1 by (T − λ · id)(v). When choosing which vector to replace, we just need to take care that we still have a basis at the end.

• For each basis vector v ∈ Bb−1, replace one of the vectors in Bb−2 by (T − λ · id)(v), taking the same care.

• etc.

• For each basis vector v ∈ B2, replace one of the vectors in B1 by (T − λ · id)(v), taking the same care.

We’ll prove later that this method always works.

3.5.85 Example Let's look again at

A =
( 3 −2 )
( 8 −5 )

We have seen that {v1, v2} is a pre-Jordan basis, where v2 is the second vector in the standard basis.

Replace v1 by the vector (A + I2)v2 = (−2, −4)^t. Then {v1, v2} still forms a basis of C^2. This is a Jordan basis for A.

We have Av1 = −v1 and Av2 = v1 − v2 (no calculation is needed: just use (A + I2)v2 = v1). Hence in the new basis the matrix is

( −1 1 )
( 0 −1 )
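The chain relations are easy to confirm. A plain-Python sketch (helper names are mine); note that computing (A + I)e2 directly gives (−2, −4)^t, a scalar multiple of the eigenvector (1, 2)^t:

```python
# Verify the Jordan-basis relations for A = [[3, -2], [8, -5]]:
# with v1 = (A + I)v2 = (-2, -4) and v2 = (0, 1), we expect
# Av1 = -v1 and Av2 = v1 - v2, i.e. the Jordan block [[-1, 1], [0, -1]].

def mat_vec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

A = [[3, -2], [8, -5]]
v1 = [-2, -4]
v2 = [0, 1]

print(mat_vec(A, v1))   # [2, 4]   = -v1
print(mat_vec(A, v2))   # [-2, -5] = v1 - v2
```

So the matrix of A in the basis {v1, v2} has columns (−1, 0)^t and (1, −1)^t, as claimed.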

3.5.86 Example In Example 3.5.83 above, we replace one of the vectors in B1 by

(A − I3)(1, 0, 0)^t = (1, 1, 1)^t.

So we can choose a Jordan basis as follows:

B1 = { (1, −1, 0)^t, (1, 1, 1)^t },  B2 = { (1, 0, 0)^t }.


Lecture 13

3.5.87 Example Take k = R and

A =
( 1 −1 )
( 1 −1 )

Here we have seen that the characteristic and minimal polynomials are both x^2, so 0 is the only eigenvalue.

Clearly v1 = (1, 1)^t generates the eigenspace, and V2(0) = R^2. We complete the basis by taking v2 = (0, 1)^t; this gives a pre-Jordan basis.

Let's construct a Jordan basis. Replace v1 by Av2 = (−1, −1)^t. This is a Jordan basis, and the matrix of A in the new basis is

( 0 1 )
( 0 0 )

3.5.88 Example Let

A =
( 2 2 2 )
( 0 2 2 )
( 0 0 2 )

Clearly the characteristic polynomial is (x − 2)^3, and it is equal to the minimal polynomial; 2 is the only eigenvalue.

V1(2) has equations y = z = 0, hence it is spanned by (1, 0, 0)^t. V2(2) is given by z = 0, and V3(2) is R^3. Therefore the standard basis v1 = e1, v2 = e2, v3 = e3 is a pre-Jordan basis. We have

(A − 2I3)^2 =
( 0 0 4 )
( 0 0 0 )
( 0 0 0 )

and

A − 2I3 =
( 0 2 2 )
( 0 0 2 )
( 0 0 0 )

We have

(A − 2I3)v3 = (2, 2, 0)^t

and we replace v2 by this vector.


Now (A − 2I3)v2 = (4, 0, 0)^t, and we replace v1 by it. We get:

v1 = (4, 0, 0)^t, v2 = (2, 2, 0)^t, v3 = (0, 0, 1)^t.

We have

Av1 = 2v1, Av2 = v1 + 2v2, Av3 = v2 + 2v3.

In this basis the matrix of A is:

( 2 1 0 )
( 0 2 1 )
( 0 0 2 )

This is a Jordan basis.
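The three chain relations can be checked in a few lines of plain Python (helper names are mine):

```python
# Verify Av1 = 2v1, Av2 = v1 + 2v2, Av3 = v2 + 2v3 for the
# Jordan basis of A = [[2, 2, 2], [0, 2, 2], [0, 0, 2]].

def mat_vec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

A  = [[2, 2, 2], [0, 2, 2], [0, 0, 2]]
v1 = [4, 0, 0]
v2 = [2, 2, 0]
v3 = [0, 0, 1]

print(mat_vec(A, v1))   # [8, 0, 0]: 2*v1
print(mat_vec(A, v2))   # [8, 4, 0]: v1 + 2*v2
print(mat_vec(A, v3))   # [2, 2, 2]: v2 + 2*v3
```

Each relation corresponds to one column of the single 3 × 3 Jordan block above.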

3.6 Jordan Canonical (or Normal) Form in the one eigenvalue case

The Jordan canonical form of a linear map T : V → V is essentially the matrix of T with respect to a Jordan basis; we just need to order the vectors appropriately. Everything is over a field k; often k will be C.

Suppose for the moment that T has only one eigenvalue λ and choose a Jordan basis:

B = B1 ∪ . . . ∪ Bb,

where mT(x) = (x − λ)^b. Of course, as (T − λ·id)^b = 0, we have Vb(λ) = V. We have a chain of subspaces

V1(λ) ⊂ V2(λ) ⊂ · · · ⊂ Vb(λ) = V,

and the pre-Jordan basis was constructed by starting with a basis of V1(λ) and completing it successively to get a basis of Vb(λ) = V. We then altered this basis so that

(T − λ·id)Bi ⊂ Bi−1.

Notice that we can arrange the basis elements in chains: starting with a vector v ∈ Bb we get a chain

v, (T − λ·id)v, (T − λ·id)^2 v, . . . , (T − λ·id)^{b−1} v.

The last vector w = (T − λ·id)^{b−1} v is in V1(λ). Indeed

(T − λ·id)^b v = 0,

hence (T − λ·id)w = 0, therefore Tw = λw, and so w ∈ V1(λ). We have the following:


3.6.89 Lemma For any v ∈ Bb (in particular v ∉ Bi for i < b!), the vectors

v, (T − λ·id)v, (T − λ·id)^2 v, . . . , (T − λ·id)^{b−1} v

are linearly independent.

Proof. Suppose that

∑_{i=0}^{b−1} μi (T − λ·id)^i v = 0.

Then

μ0 v + (T − λ·id)w = 0,

where w is a linear combination of the vectors (T − λ·id)^k v with 0 ≤ k ≤ b − 2. Multiplying by (T − λ·id)^{b−1}, we get

μ0 (T − λ·id)^{b−1} v = 0,

but, as v ∉ Vb−1(λ), we see that

(T − λ·id)^{b−1} v ≠ 0,

hence μ0 = 0. Repeating the process inductively, we get μi = 0 for all i, and the vectors we consider are linearly independent. 2

Let us number the vectors in this chain as

vi = (T − λ·id)^{b−i} v, i = 1, . . . , b,

so that v1 = (T − λ·id)^{b−1} v ∈ V1(λ) and vb = v. Then

(T − λ·id)vi = vi−1, i.e. Tvi = λvi + vi−1 for i ≥ 2, and Tv1 = λv1.

In other words, in the basis formed by this chain, the coordinate vector of T(vi) is

(0, . . . , 0, 1, λ, 0, . . . , 0)^t,

with λ in the i-th position and 1 in the (i − 1)-th. This gives a Jordan block, i.e. the b × b matrix Jb(λ) with λ in each diagonal entry, 1 in each entry directly above the diagonal, and 0 elsewhere; for example

J3(λ) =
( λ 1 0 )
( 0 λ 1 )
( 0 0 λ )


In this way, we arrange our Jordan basis in chains starting in Bi (for i = b, b − 1, . . . , 1) and terminating in V1(λ).

By putting the chains together, we see that in the Jordan basis the matrix of T is block diagonal, with one Jordan block per chain. We can write it as

[T]_B = diag(J_{h1}(λ), . . . , J_{hw}(λ)),

where J_{hi}(λ) is the block corresponding to a chain of length hi.

We can actually say more; in fact, the following result determines the number of blocks.

3.6.90 Lemma The number of blocks is the dimension of the eigenspace V1(λ).

Proof. Let (v1, . . . , vk) be the part of the Jordan basis spanning the subspace U corresponding to one block. It is a chain: we have

Tv1 = λv1 and Tvi = λvi + vi−1 for 2 ≤ i ≤ k.

Let v ∈ U be an eigenvector: Tv = λv. Write v = ∑_{i=1}^{k} ci vi. Then

Tv = c1 λ v1 + ∑_{i≥2} ci (λvi + vi−1) = λv + ∑_{i≥2} ci vi−1.

It follows that Tv = λv if and only if ∑_{i≥2} ci vi−1 = 0, which implies c2 = · · · = ck = 0, and hence v is in the subspace generated by v1. Therefore each block determines exactly one eigenvector (up to scalar) for the eigenvalue λ. As eigenvectors from different blocks are linearly independent (they are members of a basis), the number of blocks is exactly the dimension of the eigenspace V1(λ). 2

SUMMARY: To summarise what we have seen so far. Suppose T has one eigenvalue λ, and let mT(x) = (x − λ)^b be its minimal polynomial. We construct a pre-Jordan basis by choosing a basis B1 for the eigenspace V1(λ), and then completing it by B2 (a certain number of vectors in V2(λ)) and


then by B3, . . . , Bb. Note that Vb(λ) = V. We get a pre-Jordan basis B = B1 ∪ · · · ∪ Bb; in a pre-Jordan basis the matrix of T is upper triangular.

Now we alter the pre-Jordan basis as follows. Start with a vector vb ∈ Bb and replace one of the vectors in Bb−1 by vb−1 = (T − λ·id)vb, making sure that this vb−1 is linearly independent of the other vectors in Bb−1. Then replace a vector in Bb−2 by vb−2 = (T − λ·id)vb−1 (again choosing the vector to replace so that vb−2 is linearly independent of the others), and continue until you reach V1(λ). The last vector will be v1 ∈ V1(λ), i.e. v1 is an eigenvector.

We obtain a chain of vectors

v1 = (T − λ·id)v2, v2 = (T − λ·id)v3, . . . , vb−1 = (T − λ·id)vb, vb.

Hence in particular

Tvk = vk−1 + λvk.

The subspace U spanned by this chain is T-stable (because Tvk = vk−1 + λvk), and the chain is linearly independent, hence it forms a basis of U. In restriction to U, and with respect to this basis, the matrix of T is the b × b Jordan block Jb(λ), with λ on the diagonal and 1 on the superdiagonal.

One constructs such chains with all elements of Bb. Once done, one looks for elements of Bb−1 which are not in the previously constructed chains, and constructs chains with them; then with Bb−2, etc.

In the end, the union of the chains will be a Jordan basis, and in it the matrix of T is of the form

diag(J_{h1}(λ), . . . , J_{hw}(λ)).

Notice the following two observations:

1. There is always a block of size b × b. Hence, by knowing the degree of the minimal polynomial, in some cases it is possible to determine the shape of the Jordan normal form.

2. The number of blocks is the dimension of the eigenspace V1(λ).

Here are some examples. Suppose you have a matrix such that

chA(x) = (x − λ)^5


and

mA(x) = (x − λ)^4.

There is always a block of size 4 × 4, hence the Jordan normal form has one 4 × 4 block and one 1 × 1 block.

Suppose chA is the same but mA(x) = (x − λ)^3. Here you need to know more: there is one 3 × 3 block, and then either two 1 × 1 blocks or one 2 × 2 block. This is determined by the dimension of V1(λ): if it is three, the first possibility; if it is two, the second.

Consider

A =
( 0 1 0 1 )
( 0 1 0 0 )
( −1 1 1 1 )
( −1 1 0 2 )

One calculates that chA(x) = (x − 1)^4. We have

A − I =
( −1 1 0 1 )
( 0 0 0 0 )
( −1 1 0 1 )
( −1 1 0 1 )

Clearly the rank of A − I is 1, hence dim V1(1) = 3. This means that the Jordan normal form will have three blocks: two of size 1 × 1 and one of size 2 × 2. The Jordan normal form is

( 1 0 0 0 )
( 0 1 0 0 )
( 0 0 1 1 )
( 0 0 0 1 )
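The block count rests entirely on one rank computation, which can be checked mechanically. A plain-Python sketch (helper names are mine), using Gaussian elimination over the rationals:

```python
# Number of Jordan blocks for eigenvalue 1: dim V1(1) = 4 - rank(A - I).
from fractions import Fraction

def rank(M):
    M = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(M[0])):
        piv = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                f = M[i][c] / M[r][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

AmI = [[-1, 1, 0, 1],
       [ 0, 0, 0, 0],
       [-1, 1, 0, 1],
       [-1, 1, 0, 1]]
print(rank(AmI), 4 - rank(AmI))   # rank 1, so dim V1(1) = 3: three blocks
```

Rank 1 forces nullity 3, so there are three blocks, matching the Jordan normal form displayed above.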

Another example:

A =
( −2 0 −1 1 )
( 0 −2 1 0 )
( 0 0 −2 0 )
( 0 0 0 −2 )

One calculates that chA(x) = (x + 2)^4. We have

A + 2I =
( 0 0 −1 1 )
( 0 0 1 0 )
( 0 0 0 0 )
( 0 0 0 0 )

We see that (A + 2I)^2 = 0, and therefore mA(x) = (x + 2)^2. As there is always a block of size two, there are two possibilities: either two 2 × 2 blocks, or one 2 × 2 block and two 1 × 1 blocks.

To decide which one it is, we see that the rank of A + 2I is 2, hence the dimension of the kernel is 2. There are therefore two blocks, and the Jordan normal form is

( −2 1 0 0 )
( 0 −2 0 0 )
( 0 0 −2 1 )
( 0 0 0 −2 )


Lecture 14

3.7 Jordan canonical form in general.

Once we know how to determine the Jordan canonical form in the one eigenvalue case, the general case is easy. Let T be a linear transformation and λ1, . . . , λr its eigenvalues. Suppose that the minimal polynomial decomposes as

mT(x) = ∏_{i=1}^{r} (x − λi)^{b_i}

(recall again that this is always possible over C). Then we have seen that

V = V_{b1}(λ1) ⊕ · · · ⊕ V_{br}(λr),

and each V_{bi}(λi) is stable under T. Therefore, in restriction to each V_{bi}(λi), T is a linear transformation with one eigenvalue, namely λi, and the minimal polynomial of T restricted to V_{bi}(λi) is (x − λi)^{b_i}. One gets a Jordan basis by taking the union of Jordan bases for each V_{bi}(λi), which are constructed as previously.

Let's look at an example.

A =
( −1 1 1 )
( −2 2 1 )
( −1 1 1 )

One calculates that chA(x) = x(x − 1)^2. Then 0 and 1 are the only eigenvalues, and

V = V1(0) ⊕ V2(1).

One finds that V1(0) is generated by

v0 = (1, 1, 0)^t.

That will be the first vector of the basis. For λ = 1 we have

A − I =
( −2 1 1 )
( −2 1 1 )
( −1 1 0 )

We find that V1(1) is spanned by

v1 = (1, 1, 1)^t.

Then

(A − I)^2 =
( 1 0 −1 )
( 1 0 −1 )
( 0 0 0 )


and V2(1) is spanned by

v1 = (1, 1, 1)^t and v2 = (0, 1, 0)^t.

Notice that (A − I)v2 = v1, and therefore this is already a Jordan basis. The matrix in the basis (v0, v1, v2) is

( 0 0 0 )
( 0 1 1 )
( 0 0 1 )

Another example:

A =
( 5 4 2 )
( 4 5 2 )
( 2 2 2 )

One calculates that chA(x) = (x − 1)^2 (x − 10). Then 1 and 10 are the only eigenvalues. One finds

V1(1) = span( u1 = (−1, 0, 2)^t, u2 = (−1, 1, 0)^t ).

The dimension is two, therefore there will be two blocks of size 1 × 1 corresponding to the eigenvalue 1. For V1(10), one finds

V1(10) = span( u3 = (2, 2, 1)^t ).

In the basis (u1, u2, u3), the matrix is

( 1 0 0 )
( 0 1 0 )
( 0 0 10 )

It is diagonal: the matrix is diagonalisable, and in fact mA(x) = (x − 1)(x − 10).

And a last example: find a Jordan basis and the normal form of:

A =
( 4 0 1 0 )
( 2 2 3 0 )
( −1 0 2 0 )
( 4 0 1 2 )

One finds that the characteristic polynomial is chA(x) = (x − 2)^2 (x − 3)^2. Hence 2 and 3 are the eigenvalues, and we have


A − 2I =
( 2 0 1 0 )
( 2 0 3 0 )
( −1 0 0 0 )
( 4 0 1 0 )

Clearly the kernel has dimension 2 and is spanned by e2 and e4. The eigenspace for 2 has dimension two, so we will have two blocks of size 1 × 1 corresponding to the eigenvalue 2. For the eigenvalue 3:

A − 3I =
( 1 0 1 0 )
( 2 −1 3 0 )
( −1 0 −1 0 )
( 4 0 1 −1 )

Rows one and three are proportional, and the other rows are linearly independent. It follows that the eigenspace is one-dimensional, spanned by

u = (1, −1, −1, 3)^t.

We will have one block. Let us calculate:

(A − 3I)^2 =
( 0 0 0 0 )
( −3 1 −4 0 )
( 0 0 0 0 )
( −1 0 2 1 )

We take the vector v = (0, 4, 1, −2)^t to complete u to a basis of ker(A − 3I)^2. Now, we have (A − 3I)v = u, hence we already have a Jordan basis.

The basis (e2, e4, u, v) is a Jordan basis, and in this basis the matrix is

( 2 0 0 0 )
( 0 2 0 0 )
( 0 0 3 1 )
( 0 0 0 3 )
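All four columns of the claimed Jordan form can be verified directly (a plain-Python sketch; helper names are mine):

```python
# Check the Jordan basis (e2, e4, u, v): we expect
# Ae2 = 2e2, Ae4 = 2e4, Au = 3u and Av = u + 3v.

def mat_vec(M, x):
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]

A = [[ 4, 0, 1, 0],
     [ 2, 2, 3, 0],
     [-1, 0, 2, 0],
     [ 4, 0, 1, 2]]
e2 = [0, 1, 0, 0]
e4 = [0, 0, 0, 1]
u  = [1, -1, -1, 3]
v  = [0, 4, 1, -2]

print(mat_vec(A, e2))   # [0, 2, 0, 0]:   2*e2
print(mat_vec(A, e4))   # [0, 0, 0, 2]:   2*e4
print(mat_vec(A, u))    # [3, -3, -3, 9]: 3*u
print(mat_vec(A, v))    # [1, 11, 2, -3]: u + 3*v
```

The four relations give exactly the columns of diag(2, 2, J2(3)) displayed above.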


Lecture 15

4 Bilinear and Quadratic Forms

4.1 Matrix Representation

4.1.91 Definition Let V be a vector space over k. A bilinear form on V is a function f : V × V → k such that

• f(u + λv, w) = f(u, w) + λf(v, w);

• f(u, v + λw) = f(u, v) + λf(u, w).

I.e. f(v, w) is linear in both v and w.

4.1.92 Example An obvious example is the following: take V = R and f : R × R → R defined by f(x, y) = xy. Notice here the difference between linear and bilinear: f(x, y) = x + y is linear, while f(x, y) = xy is bilinear. More generally, f(x, y) = λxy is bilinear for any λ ∈ R.

More generally still, given a matrix A ∈ Mn(k), the following is a bilinear form on k^n:

f(v, w) = v^t A w = ∑_{i,j} vi a_{i,j} wj,  v = (v1, . . . , vn)^t, w = (w1, . . . , wn)^t.

We'll see that in fact all bilinear forms are of this form.

4.1.93 Example If

A =
( 1 2 )
( 3 4 )

then the corresponding bilinear form is

f( (x1, y1)^t, (x2, y2)^t ) = ( x1 y1 ) A ( x2 y2 )^t = x1x2 + 2x1y2 + 3y1x2 + 4y1y2.
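The formula f(v, w) = v^t A w is a two-line function in plain Python (a sketch; the function name is mine), applied here to the example above:

```python
# Evaluate the bilinear form v^t A w for A = [[1, 2], [3, 4]].

def bilinear(A, v, w):
    return sum(v[i] * A[i][j] * w[j]
               for i in range(len(v)) for j in range(len(w)))

A = [[1, 2], [3, 4]]
# f((x1, y1), (x2, y2)) = x1x2 + 2x1y2 + 3y1x2 + 4y1y2
print(bilinear(A, [1, 0], [0, 1]))   # 2: the coefficient of x1*y2
print(bilinear(A, [1, 2], [3, 4]))   # 3 + 8 + 18 + 32 = 61
```

Note that evaluating on pairs of standard basis vectors recovers the entries of A, which is exactly the content of the definition of [f]_B below.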

Recall that if B = {b1, . . . , bn} is a basis for V and v = ∑ xi bi, then we write [v]_B for the column vector

[v]_B = (x1, . . . , xn)^t.

4.1.94 Definition If f is a bilinear form on V and B = {b1, . . . , bn} is a basis for V, then we define the matrix of f with respect to B by

[f]_B = ( f(bi, bj) )_{1 ≤ i,j ≤ n},

the n × n matrix whose (i, j) entry is f(bi, bj).


4.1.95 Proposition Let B be a basis for a finite dimensional vector space V over k, with dim(V) = n. Any bilinear form f on V is determined by the matrix [f]_B. Moreover, for v, w ∈ V,

f(v, w) = [v]_B^t [f]_B [w]_B.

Proof. Write [f]_B = (a_{i,j}), so a_{i,j} = f(bi, bj). Let

v = x1 b1 + x2 b2 + · · · + xn bn,

so

[v]_B = (x1, . . . , xn)^t,

and similarly suppose [w]_B = (y1, . . . , yn)^t. Then

f(v, w) = f( ∑_{i=1}^{n} xi bi, w )
  = ∑_{i=1}^{n} xi f(bi, w)
  = ∑_{i=1}^{n} xi f( bi, ∑_{j=1}^{n} yj bj )
  = ∑_{i=1}^{n} ∑_{j=1}^{n} xi yj f(bi, bj)
  = ∑_{i=1}^{n} ∑_{j=1}^{n} a_{i,j} xi yj
  = [v]_B^t [f]_B [w]_B.

2

Let us give an example. Suppose that f : R^2 × R^2 → R is given by

f( (x1, y1)^t, (x2, y2)^t ) = 2x1x2 + 3x1y2 + x2y1.

Let us write the matrix of f in the standard basis.


We have

f(e1, e1) = 2, f(e1, e2) = 3, f(e2, e1) = 1, f(e2, e2) = 0,

hence the matrix in the standard basis is

( 2 3 )
( 1 0 )

Now suppose B = {b1, . . . , bn} and C = {c1, . . . , cn} are two bases for V. We may write one basis in terms of the other:

ci = ∑_{j=1}^{n} λ_{j,i} bj.

The matrix

M = (λ_{j,i}), with λ_{j,i} in row j, column i,

is called the transition matrix from B to C. It is always an invertible matrix: its inverse is the transition matrix from C to B. Recall that for any vector v ∈ V we have

[v]_B = M [v]_C,

and for any linear map T : V → V we have

[T]_C = M^{−1} [T]_B M.

We’ll now describe how bilinear forms behave under change of basis.

4.1.96 Change of Basis Formula Let f be a bilinear form on a finite dimensional vector space V over k. Let B and C be two bases for V and let M be the transition matrix from B to C. Then

[f]_C = M^t [f]_B M.

Proof. Let u, v ∈ V with [u]_B = x, [v]_B = y, [u]_C = s and [v]_C = t. Let A = (a_{i,j}) be the matrix representing f with respect to B. By Proposition 4.1.95 we have

f(u, v) = x^t A y.

Now x = Ms and y = Mt, so

f(u, v) = (Ms)^t A (Mt) = (s^t M^t) A (Mt) = s^t (M^t A M) t.

Taking u = ci and v = cj shows f(ci, cj) = (M^t A M)_{i,j}. Hence

[f]_C = M^t A M = M^t [f]_B M.


2

For example, let f be the bilinear form from the previous example. It is given by

( 2 3 )
( 1 0 )

in the standard basis. We want to write this matrix in the basis

(1, 1)^t, (1, 0)^t.

The transition matrix is

M =
( 1 1 )
( 1 0 )

and its transpose is the same matrix. The matrix of f in the new basis is

( 6 3 )
( 5 2 )
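The computation M^t [f]_B M can be checked mechanically (a plain-Python sketch; helper names are mine):

```python
# Change of basis for a bilinear form: [f]_C = M^t [f]_B M.

def mat_mul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def transpose(M):
    return [list(row) for row in zip(*M)]

B = [[2, 3], [1, 0]]      # [f]_B in the standard basis
M = [[1, 1], [1, 0]]      # transition matrix to the new basis

C = mat_mul(mat_mul(transpose(M), B), M)
print(C)                  # [[6, 3], [5, 2]]
```

Note the transpose on the left, in contrast to the M^{−1} appearing in the change-of-basis formula for linear maps.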


Lecture 16

4.2 Symmetric bilinear forms and quadratic forms

As before let V be a finite dimensional vector space over a field k.

4.2.97 Definition A bilinear form f on V is called symmetric if it satisfies f(v, w) = f(w, v) for all v, w ∈ V.

4.2.98 Definition Given a symmetric bilinear form f on V, the associated quadratic form is the function q(v) = f(v, v).

Notice that q has the property that q(λv) = λ^2 q(v). For example, take the bilinear form f defined by

( 6 0 )
( 0 5 )

The corresponding quadratic form is

q( (x, y)^t ) = 6x^2 + 5y^2.

4.2.99 Proposition Let f be a bilinear form on V and let B = {b1, . . . , bn} be a basis for V. Then f is a symmetric bilinear form if and only if [f]_B is a symmetric matrix (that means a_{i,j} = a_{j,i}).

Proof. If f is symmetric then a_{i,j} = f(bi, bj) = f(bj, bi) = a_{j,i}. Conversely, if [f]_B is symmetric, then f(v, w) = f(w, v) follows from the formula f(v, w) = [v]_B^t [f]_B [w]_B. 2

4.2.100 Polarization Theorem If 1 + 1 ≠ 0 in k, then for any quadratic form q the underlying symmetric bilinear form is unique.

Proof. If u, v ∈ V then

q(u + v) = f(u + v, u + v) = f(u, u) + 2f(u, v) + f(v, v) = q(u) + q(v) + 2f(u, v).

So f(u, v) = (1/2)(q(u + v) − q(u) − q(v)). 2

Let's look at an example. Consider

A =
( 2 1 )
( 1 0 )

a symmetric matrix. Let f be the corresponding bilinear form. We have

f((x1, y1), (x2, y2)) = 2x1x2 + x1y2 + x2y1

and

q(x, y) = 2x^2 + 2xy = f((x, y), (x, y)).

Let u = (x1, y1), v = (x2, y2), and let us calculate:

(1/2)(q(u + v) − q(u) − q(v))
  = (1/2)( 2(x1 + x2)^2 + 2(x1 + x2)(y1 + y2) − 2x1^2 − 2x1y1 − 2x2^2 − 2x2y2 )
  = (1/2)( 4x1x2 + 2(x1y2 + x2y1) )
  = f((x1, y1), (x2, y2)).

If A = (a_{i,j}) is a symmetric matrix, then the corresponding bilinear form is

f(x, y) = ∑_i a_{i,i} xi yi + ∑_{i<j} a_{i,j} (xi yj + xj yi)

and the corresponding quadratic form is

q(x) = ∑_{i=1}^{n} a_{i,i} xi^2 + 2 ∑_{i<j} a_{i,j} xi xj.

Conversely, given such a quadratic form, the symmetric matrix A = (a_{i,j}) is the matrix representing the underlying bilinear form f.
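The polarization identity can be tested numerically. A plain-Python sketch (helper names and the test vectors are mine), using the symmetric matrix from the example above:

```python
# Recover f(u, v) from q alone via polarization, for A = [[2, 1], [1, 0]].
from fractions import Fraction

A = [[2, 1], [1, 0]]

def f(u, v):
    return sum(u[i] * A[i][j] * v[j] for i in range(2) for j in range(2))

def q(v):
    return f(v, v)

u, v = [1, 3], [2, -1]
lhs = Fraction(q([u[0] + v[0], u[1] + v[1]]) - q(u) - q(v), 2)
print(lhs, f(u, v))   # 9 9: both sides agree
```

The division by 2 is exactly where the hypothesis 1 + 1 ≠ 0 is used; over a field of characteristic 2 this recovery fails.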

4.2.101 Example Let q(x1, x2, x3) = x1^2 + 3x2^2 + 5x3^2 + 4x1x2 + 6x1x3 + 8x2x3. The matrix of this quadratic form is

A =
( 1 2 3 )
( 2 3 4 )
( 3 4 5 )

The underlying bilinear form is represented by the same matrix. Let us write down the matrix of this form in the basis

{ (1, 0, 0)^t, (2, −1, 0)^t, (1, −2, 1)^t }.

Consider the transition matrix

M =
( 1 2 1 )
( 0 −1 −2 )
( 0 0 1 )

Then the matrix we are looking for is

M^t A M =
( 1 0 0 )
( 0 −1 0 )
( 0 0 0 )

Notice that in this new basis the quadratic form is

q((x1, x2, x3)) = x1^2 − x2^2.


4.3 Orthogonality and diagonalization

4.3.102 Definition Let V be a vector space over k with a symmetric bilinear form f. We call two vectors v, w ∈ V orthogonal if f(v, w) = 0; it is a good idea to imagine that this means v and w are at right angles to each other. This is written v ⊥ w. If S ⊂ V, then the orthogonal complement of S is defined to be

S⊥ = {v ∈ V : ∀w ∈ S, w ⊥ v}.

4.3.103 Proposition S⊥ is a subspace of V.

Proof. Let v, w ∈ S⊥ and λ ∈ k. Then for any u ∈ S we have

f(v + λw, u) = f(v, u) + λf(w, u) = 0.

Therefore v + λw ∈ S⊥. 2

4.3.104 Definition A basis B is called an orthogonal basis if any two distinct basis vectors are orthogonal. Thus B is an orthogonal basis if and only if [f]B is diagonal.

4.3.105 Diagonalisation Theorem Let f be a symmetric bilinear form on a finite dimensional vector space V over a field k in which 1 + 1 ≠ 0. Then there is an orthogonal basis B for V; i.e. a basis such that [f]B is a diagonal matrix.

Notice that the existence of an orthogonal basis is indeed equivalent to the matrix being diagonal.

Let B = {v1, . . . , vn} be an orthogonal basis. By definition f(vi, vj) = 0 if i ≠ j, hence the only possible non-zero values are the f(vi, vi), i.e. the diagonal entries.

And of course the converse holds: if the matrix is diagonal, then f(vi, vj) = 0 if i ≠ j. The quadratic form associated to such a bilinear form is

q(x1, . . . , xn) = Σi λi xi²

where the λi are the elements on the diagonal.


Lecture 17

4.3.106 Recall Let U, W be two subspaces of V . The sum of U and W is the subspace

U + W = {u + w : u ∈ U, w ∈ W}.

We call this a direct sum U ⊕ W if U ∩ W = {0}. This is the same as saying that every element of U + W can be written uniquely as u + w with u ∈ U and w ∈ W.

4.3.107 Key Lemma Let v ∈ V and assume that q(v) ≠ 0. Then

V = span{v} ⊕ {v}⊥.

Proof. For w ∈ V, let

w1 = (f(v, w)/f(v, v)) v,   w2 = w − (f(v, w)/f(v, v)) v.

Clearly w = w1 + w2 and w1 ∈ span{v}. Note also that

f(w2, v) = f(w − (f(v, w)/f(v, v)) v, v) = f(w, v) − (f(v, w)/f(v, v)) f(v, v) = 0.

Therefore w2 ∈ {v}⊥. It follows that span{v} + {v}⊥ = V. To prove that the sum is direct, suppose that w ∈ span{v} ∩ {v}⊥. Then w = λv for some λ ∈ k and we have f(w, v) = 0. Hence λf(v, v) = 0. Since q(v) = f(v, v) ≠ 0 it follows that λ = 0, so w = 0. 2

Proof of the theorem. We use induction on dim(V) = n. If n = 1 then the theorem is true, since any 1 × 1 matrix is diagonal. So suppose the result holds for vector spaces of dimension less than n = dim(V).

If f(v, v) = 0 for every v ∈ V then using Theorem 5.3 for any basis B we have [f]B = [0], which is diagonal. [This is true since

f(ei, ej) = (1/2)(f(ei + ej, ei + ej) − f(ei, ei) − f(ej, ej)) = 0.]

So we can suppose there exists v ∈ V such that f(v, v) ≠ 0. By the Key Lemma we have

V = span{v} ⊕ {v}⊥.

Since span{v} is 1-dimensional, it follows that {v}⊥ is (n − 1)-dimensional. Hence by the inductive hypothesis there is an orthogonal basis {b1, . . . , bn−1} of {v}⊥.

Now let B = {b1, . . . , bn−1, v}. This is a basis for V. Any two of the vectors bi are orthogonal by definition. Furthermore bi ∈ {v}⊥, so bi ⊥ v. Hence B is an orthogonal basis. 2


4.4 Examples of Diagonalising

4.4.108 Definition Two matrices A, B ∈ Mn(k) are congruent if there is an invertible matrix P such that

B = P tAP.

We have shown that if B and C are two bases then for a bilinear form f , the matrices [f ]B and[f ]C are congruent.

4.4.109 Theorem Let A ∈ Mn(k) be symmetric, where k is a field in which 1 + 1 ≠ 0. Then A is congruent to a diagonal matrix.

Proof. This is just the matrix version of the previous theorem. 2

We shall next see how to calculate a diagonal matrix congruent to a given symmetric matrix.

4.4.110 Recall There are three kinds of row operation:

• swap rows i and j;

• multiply row(i) by λ 6= 0;

• add λ × row(i) to row(j).

To each row operation there is a corresponding elementary matrix E; the matrix E is the resultof doing the row operation to In. The row operation transforms a matrix A into EA.

We may also define three corresponding column operations:

• swap columns i and j;

• multiply column(i) by λ 6= 0;

• add λ × column(i) to column(j).

Doing a column operation to A is the same as doing the corresponding row operation to Aᵗ and transposing back: we obtain (EAᵗ)ᵗ = AEᵗ.

4.4.111 Definition By a double operation we shall mean a row operation followed by thecorresponding column operation.

If E is the corresponding elementary matrix then the double operation transforms a matrix A into EAEᵗ.

4.4.112 Lemma If we do a double operation to A then we obtain a matrix congruent to A.

Proof. E is invertible, so EAEᵗ = (Eᵗ)ᵗAEᵗ is congruent to A. 2


Lecture 18

Recall that symmetric bilinear forms are represented by symmetric matrices. If we change the basis then we obtain a congruent matrix. We've seen that if we do a double operation to a matrix A then we obtain a congruent matrix. This corresponds to the same quadratic form with respect to a different basis. We can always do a sequence of double operations to transform any symmetric matrix into a diagonal matrix.

4.4.113 Example Consider the quadratic form q((x, y)ᵗ) = x² + 4xy + 3y².

A =
( 1 2 )    ( 1 2 )    ( 1 0 )
( 2 3 ) → ( 0 −1 ) → ( 0 −1 ).

This shows that there is a basis B = {b1, b2} such that

q(xb1 + yb2) = x2 − y2.

Notice that when we have done the first operation, we have multiplied A on the left by

E2,1(−2) =
( 1 0 )
( −2 1 )

and when we have done the second, we have multiplied on the right by

E2,1(−2)ᵗ =
( 1 −2 )
( 0 1 )

We find that

E2,1(−2) A E2,1(−2)ᵗ =
( 1 0 )
( 0 −1 )

Hence in the basis

( 1 )   ( −2 )
( 0 ) , ( 1 )

the matrix of the corresponding quadratic form is

( 1 0 )
( 0 −1 )
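A double operation is easy to mimic in software: build the elementary matrix E and form EAEᵗ. A quick check of the example above in Python with numpy (not part of the notes):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 3.0]])

# Elementary matrix E_{2,1}(-2): add -2 * row 1 to row 2 of the identity.
E = np.array([[1.0, 0.0],
              [-2.0, 1.0]])

# A double operation replaces A by E A E^t, which is congruent to A.
D = E @ A @ E.T
assert np.array_equal(D, np.diag([1.0, -1.0]))
```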

4.4.114 Example Consider the quadratic form q((x, y)ᵗ) = 4xy + y².

( 0 2 )    ( 2 1 )    ( 1 2 )    ( 1 2 )     ( 1 0 )     ( 1 0 )     ( 1 0 )
( 2 1 ) → ( 0 2 ) → ( 2 0 ) → ( 0 −4 ) → ( 0 −4 ) → ( 0 −2 ) → ( 0 −1 )

(First swap rows 1 and 2, then columns 1 and 2; then the double operation adding −2 × row 1 to row 2; finally the double operation multiplying row 2, then column 2, by 1/2.)

This shows that there is a basis {b1, b2} such that

q(xb1 + yb2) = x2 − y2.

The last step in the previous example transformed the -4 into a -1. In general, once we havea diagonal matrix we are free to multiply or divide the diagonal entries by squares:


4.4.115 Lemma For µ1, . . . , µn ∈ k× = k \ {0} and λ1, . . . , λn ∈ k,

D(λ1, . . . , λn) is congruent to D(µ1²λ1, . . . , µn²λn).

Proof. Since µ1, . . . , µn ∈ k \ {0} we have µ1 · · · µn ≠ 0. So

P = D(µ1, . . . , µn)

is invertible. Then

PᵗD(λ1, . . . , λn)P = D(µ1, . . . , µn)D(λ1, . . . , λn)D(µ1, . . . , µn) = D(µ1²λ1, . . . , µn²λn).

2

4.4.116 Definition Two bilinear forms f, f ′ are equivalent if they are the same up to a changeof basis.

4.4.117 Definition The rank of a bilinear form f is the rank of [f]B for any basis B. This is well defined because congruent matrices have the same rank.

Clearly if f and f′ have different ranks then they are not equivalent.


Lecture 19

4.5 Canonical forms over C

4.5.118 Definition Let q be a quadratic form on a vector space V over C, and suppose there is a basis B of V such that

[q]B =
( Ir 0 )
( 0  0 )

We call this matrix a canonical form of q (over C).

4.5.119 Canonical forms over C Let V be a finite dimensional vector space over C and let q be a quadratic form on V. Then q has exactly one canonical form.

Proof. (Existence) We first choose an orthogonal basis B = {b1, . . . , bn}. After reordering the basis we may assume that q(b1), . . . , q(br) ≠ 0 and q(br+1), . . . , q(bn) = 0. Since every complex number has a square root in C, we may divide bi by a square root of q(bi) if i ≤ r; the resulting basis gives the matrix above.

(Uniqueness) Row and column operations do not change the rank of a matrix. Hence congruent matrices have the same rank, so r is determined by q. 2

4.5.120 Corollary Two quadratic forms over C are equivalent iff they have the same canon-ical form.

4.6 Canonical forms over R

4.6.121 Definition Let q be a quadratic form on a vector space V over R, and suppose there is a basis B of V such that

[q]B =
( Ir  0   0 )
( 0  −Is  0 )
( 0   0   0 )

We call this matrix a canonical form of q (over R).

4.6.122 Sylvester’s Law of Inertia Let V be a finite dimensional vector space over R andlet q be a quadratic form on V . Then q has exactly one (real) canonical form.

Proof. (Existence) Let B = {b1, . . . , bn} be an orthogonal basis. We can reorder the basis so that

q(b1), . . . , q(br) > 0,   q(br+1), . . . , q(br+s) < 0,   q(br+s+1), . . . , q(bn) = 0.

Then define a new basis by

ci = (1/√|q(bi)|) bi   if i ≤ r + s,   ci = bi   if i > r + s.


The matrix of q with respect to C is a canonical form.

(Uniqueness) Suppose we have two bases B and C with

[q]B =
( Ir  0   0 )
( 0  −Is  0 )
( 0   0   0 )
,   [q]C =
( Ir′  0    0 )
( 0   −Is′  0 )
( 0    0    0 )

By comparing the ranks we know that r + s = r′ + s′. It's therefore sufficient to prove that r = r′. Define two subspaces of V by

U = span{b1, . . . , br},   W = span{cr′+1, . . . , cn}.

If u is a non-zero vector of U then we have u = x1b1 + . . . + xrbr. Hence

q(u) = x1² + . . . + xr² > 0.

Similarly if w ∈ W then w = yr′+1cr′+1 + . . . + yncn, and

q(w) = −(yr′+1)² − . . . − (yr′+s′)² ≤ 0.

It follows that U ∩ W = {0}. Therefore

U + W = U ⊕ W ⊂ V.

From this we have

dim U + dim W ≤ dim V.

Hence

r + (n − r′) ≤ n.

This implies r ≤ r′. A similar argument shows that r′ ≤ r, so we have r = r′. 2

4.6.123 Definition The rank of a quadratic form is the rank of the corresponding matrix. In the complex case it is the integer r appearing in the canonical form; in the real case it is r + s.

For a real quadratic form, the signature is the pair (r, s).

A real form q is positive definite if its signature is (n, 0), where n = dim V; in this case q(v) > 0 for all non-zero vectors v. It is negative definite if its signature is (0, n); in this case q(v) < 0 for all non-zero vectors v.

For a non-degenerate real form, there exists a non-zero vector v such that q(v) = 0 if and only if the signature is (r, s) with r > 0 and s > 0.
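One concrete way to read off the signature in software (not the double-operation method of the notes, but valid because orthogonal diagonalization is a congruence) is to count the signs of the eigenvalues of the symmetric matrix. A sketch in Python with numpy (names here are our own, not from the notes):

```python
import numpy as np

def signature(A, tol=1e-10):
    # A is real symmetric. Orthogonal diagonalization Q^t A Q = diag(eigenvalues)
    # is a congruence, so the signs of the eigenvalues give (r, s).
    eig = np.linalg.eigvalsh(A)
    r = int(np.sum(eig > tol))
    s = int(np.sum(eig < -tol))
    return r, s

# x^2 - y^2 has signature (1, 1); so does the form of Example 4.4.114.
assert signature(np.diag([1.0, -1.0])) == (1, 1)
assert signature(np.array([[0.0, 2.0], [2.0, 1.0]])) == (1, 1)
```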


Lecture 20

Examples of canonical forms over R and C.


Lecture 21

5 Inner Product Spaces

5.1 Geometry of Inner Product Spaces

5.1.124 Definition Let V be a vector space over R and let 〈−,−〉 be a symmetric bilinear form on V. We shall call the form positive definite if for all non-zero vectors v ∈ V we have

〈v, v〉 > 0.

5.1.125 Remark A symmetric bilinear form is positive definite if and only if its canonical form (over R) is In.

Proof. Clearly the form with matrix In, namely x1² + . . . + xn², is positive definite on Rⁿ. Conversely, suppose the form is positive definite and B is a basis such that the matrix with respect to B is the canonical form. Each diagonal entry satisfies 〈bi, bi〉 > 0, and since the diagonal entries of a canonical form are 1, −1 or 0, each must equal 1. 2

5.1.126 Definition Let V be a vector space over C. A Hermitian form on V is a function 〈−,−〉 : V × V → C such that:

• For all u, v, w ∈ V and all λ ∈ C,

〈u + λv, w〉 = 〈u, w〉 + λ〈v, w〉;

• For all u, v ∈ V, 〈u, v〉 is the complex conjugate of 〈v, u〉.

5.1.127 Example The simplest example is the following: take V = C; then 〈z, w〉 = zw̄ is a Hermitian form on C.

A matrix A ∈ Mn(C) is called a Hermitian matrix if Āᵗ = A. Here Ā is the matrix obtained from A by applying complex conjugation to the entries.

If A is a Hermitian matrix then the following is a Hermitian form on Cⁿ:

〈v, w〉 = vᵗAw̄.

In fact every Hermitian form on Cⁿ is one of these. To see why, suppose we are given a Hermitian form 〈−,−〉. Choose a basis B = (b1, . . . , bn). Let v = Σi λibi and w = Σj µjbj. We calculate

〈v, w〉 = 〈Σi λibi, Σj µjbj〉 = Σi,j λi µ̄j 〈bi, bj〉 = vᵗAw̄


where A = (〈bi, bj〉). Of course Āᵗ = A because 〈bi, bj〉 is the complex conjugate of 〈bj, bi〉.

Note that a Hermitian form is conjugate-linear in the second variable, i.e.

〈u, v + λw〉 = 〈u, v〉 + λ̄〈u, w〉.

Note also that by the second axiom 〈u, u〉 ∈ R.

5.1.128 Definition A Hermitian form is positive definite if for all non-zero vectors v we have

〈v, v〉 > 0.

Clearly, the form zw̄ is positive definite.

5.1.129 Definition By an inner product space we shall mean one of the following:

either A finite dimensional vector space V over R with a positive definite symmetric bilinearform;

or A finite dimensional vector space V over C with a positive definite Hermitian form.

We shall often write K to mean the field R or C, depending on which is relevant.

5.1.130 Example Consider the vector space V of all continuous functions [0, 1] −→ C. Then we can define

〈f, g〉 = ∫₀¹ f(x) ḡ(x) dx.

This defines an inner product on V (easy exercise).

Another example. Let V = Mn(R), the vector space of n × n matrices with real entries. Then

〈A, B〉 = tr(ABᵗ)

is an inner product on V. Similarly, if V = Mn(C) then 〈A, B〉 = tr(AB̄ᵗ) is an inner product.
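The trace form on Mn(R) is easy to test numerically: tr(ABᵗ) is just the sum of the entrywise products, so symmetry and positivity are visible directly. A sketch in Python with numpy (not part of the notes; the random matrices are only for illustration):

```python
import numpy as np

def ip(A, B):
    # <A, B> = tr(A B^t) on M_n(R): the sum of entrywise products.
    return np.trace(A @ B.T)

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))

# Symmetry, and positivity: tr(A A^t) is the sum of squares of all entries.
assert np.isclose(ip(A, B), ip(B, A))
assert np.isclose(ip(A, A), np.sum(A**2))
assert ip(A, A) > 0
```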

5.1.131 Definition Let V be an inner product space. We define the norm of a vector v ∈ V by

||v|| = √〈v, v〉.

5.1.132 Lemma For λ ∈ K we have λλ̄ = |λ|², and for v ∈ V we have ||λv|| = |λ| ||v||.

Proof. Easy. 2


5.1.133 Cauchy-Schwarz inequality If V is an inner product space then

∀u, v ∈ V |〈u, v〉| ≤ ||u|| · ||v||.

Proof. If v = 0 then the result holds, so suppose v ≠ 0. We have, for all λ ∈ K,

〈u − λv, u − λv〉 ≥ 0.

Expanding this out we have:

||u||² − λ〈v, u〉 − λ̄〈u, v〉 + |λ|²||v||² ≥ 0.

Setting λ = 〈u, v〉/||v||² we have:

||u||² − (〈u, v〉/||v||²)〈v, u〉 − (〈v, u〉/||v||²)〈u, v〉 + (|〈u, v〉|²/||v||⁴)||v||² ≥ 0.

Since 〈u, v〉〈v, u〉 = |〈u, v〉|², multiplying by ||v||² we get

||u||²||v||² − 2|〈u, v〉|² + |〈u, v〉|² ≥ 0.

Hence

||u||²||v||² ≥ |〈u, v〉|².

Taking the square root of both sides we get the result. 2

5.1.134 Triangle inequality If V is an inner product space with norm || · || then

∀u, v ∈ V ||u + v|| ≤ ||u|| + ||v||.

Proof. We have

||u + v||² = 〈u + v, u + v〉 = ||u||² + 2ℜ〈u, v〉 + ||v||².

Since ℜ〈u, v〉 ≤ |〈u, v〉|, the Cauchy–Schwarz inequality implies that

||u + v||² ≤ ||u||² + 2||u|| ||v|| + ||v||² = (||u|| + ||v||)².

Hence

||u + v|| ≤ ||u|| + ||v||.

2
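Both inequalities are easy to check numerically for the standard Hermitian inner product on Cⁿ. A sketch in Python with numpy (not part of the notes; the convention below matches the one above, linear in the first variable):

```python
import numpy as np

rng = np.random.default_rng(1)
u = rng.standard_normal(4) + 1j * rng.standard_normal(4)
v = rng.standard_normal(4) + 1j * rng.standard_normal(4)

def ip(x, y):
    # <x, y> = sum_i x_i * conj(y_i): linear in the first variable,
    # conjugate-linear in the second.
    return np.sum(x * np.conj(y))

def norm(x):
    return np.sqrt(ip(x, x).real)

# Cauchy-Schwarz and the triangle inequality (tolerance for rounding).
assert abs(ip(u, v)) <= norm(u) * norm(v) + 1e-12
assert norm(u + v) <= norm(u) + norm(v) + 1e-12
```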

5.1.135 Definition Two vectors v, w in an inner product space are called orthogonal if〈v, w〉 = 0.


5.1.136 Pythagoras’ Theorem Let (V, <, >) be an inner product space. If v, w ∈ V areorthogonal, then

||v||2 + ||w||2 = ||v + w||2

Proof. We have

||v + w||² = 〈v + w, v + w〉 = ||v||² + 2ℜ〈v, w〉 + ||w||²,

so if 〈v, w〉 = 0 then

||v||² + ||w||² = ||v + w||². 2


Lecture 22

5.2 Gram–Schmidt Orthogonalization

5.2.137 Definition Let V be an inner product space. We shall call a basis B of V an orthonormal basis if 〈bi, bj〉 = δi,j.

5.2.138 Proposition If B is an orthonormal basis then for v, w ∈ V we have:

〈v, w〉 = [v]Bᵗ · conj([w]B),

where conj denotes entrywise complex conjugation (and is the identity over R).

Proof. If the basis B = (b1, . . . , bn) is orthonormal, then the matrix of 〈−,−〉 in this basis is the identity In. The proposition follows. 2

5.2.139 Gram–Schmidt Orthogonalization Let B be any basis. Then the basis C defined by

c1 = b1
c2 = b2 − (〈b2, c1〉/〈c1, c1〉) c1
c3 = b3 − (〈b3, c1〉/〈c1, c1〉) c1 − (〈b3, c2〉/〈c2, c2〉) c2
...
cn = bn − Σ(r=1 to n−1) (〈bn, cr〉/〈cr, cr〉) cr

is orthogonal. Furthermore the basis D defined by

dr = (1/||cr||) cr

is orthonormal.

Proof. Clearly each bi is a linear combination of c1, . . . , ci, so C spans V. Hence C is a basis. It follows also that D is a basis. We'll prove by induction that {c1, . . . , cr} is orthogonal. Clearly any one vector is orthogonal. Suppose {c1, . . . , cr−1} are orthogonal. Then for s < r we have

〈cr, cs〉 = 〈br, cs〉 − Σ(t=1 to r−1) (〈br, ct〉/〈ct, ct〉)〈ct, cs〉.

By the inductive hypothesis only the t = s term survives, so

〈cr, cs〉 = 〈br, cs〉 − (〈br, cs〉/〈cs, cs〉)〈cs, cs〉 = 〈br, cs〉 − 〈br, cs〉 = 0.

This shows that {c1, . . . , cr} are orthogonal. Hence C is an orthogonal basis. It follows easily that D is orthonormal. 2
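The formulas above translate directly into code. A sketch of the Gram–Schmidt process in Python with numpy (not part of the notes; the function name and the sample basis are our own):

```python
import numpy as np

def gram_schmidt(basis):
    # basis: list of 1-d numpy arrays (real or complex), assumed independent.
    # Returns an orthonormal list spanning the same space, following
    # c_r = b_r - sum_{t<r} (<b_r, c_t>/<c_t, c_t>) c_t, then normalizing.
    cs = []
    for b in basis:
        c = b.astype(complex)
        for t in cs:
            c = c - (np.sum(b * np.conj(t)) / np.sum(t * np.conj(t))) * t
        cs.append(c)
    return [c / np.sqrt(np.sum(c * np.conj(c)).real) for c in cs]

D = gram_schmidt([np.array([1.0, 1.0, 0.0]),
                  np.array([1.0, 0.0, 1.0]),
                  np.array([0.0, 1.0, 1.0])])

# The Gram matrix of the result should be the identity.
G = np.array([[np.sum(di * np.conj(dj)) for dj in D] for di in D])
assert np.allclose(G, np.eye(3))
```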


5.2.140 Example A few examples.

This theorem shows in particular that an orthonormal basis always exists. Indeed, take any basis and turn it into an orthonormal one by applying the Gram–Schmidt process to it.

5.2.141 Proposition If V is an inner product space with an orthonormal basis B = {e1, . . . , en}, then any v ∈ V can be written as v = Σ(i=1 to n) 〈v, ei〉ei.

Proof. We have v = Σ(i=1 to n) λiei for some λi ∈ K, and 〈v, ej〉 = Σ(i=1 to n) λi〈ei, ej〉 = λj. 2


Lecture 23

5.2.142 Definition Let S be a subspace of an inner product space V. The orthogonal complement of S is defined to be

S⊥ = {v ∈ V : ∀w ∈ S 〈v, w〉 = 0}.

5.2.143 Theorem If V is a Euclidean space and W is a subspace of V then

V = W ⊕ W⊥,

and hence any v ∈ V can be written as

v = w + w⊥,

for unique w ∈ W and w⊥ ∈ W⊥.

Proof. We show first that V = W + W⊥.

Let E = {e1, . . . , en} be an orthonormal basis for V such that {e1, . . . , er} is a basis for W. This can be constructed by Gram–Schmidt orthogonalisation: choose a basis {b1, . . . , br} for W, complete it to a basis {b1, . . . , bn} of V, then apply the Gram–Schmidt process. Notice that in the Gram–Schmidt process each ck lies in the span of c1, . . . , ck−1, bk, so span{e1, . . . , ek} = span{b1, . . . , bk} for each k. It follows that the process gives an orthonormal basis e1, . . . , en such that e1, . . . , er is an orthonormal basis of W.

If v ∈ V then

v = Σ(i=1 to r) λiei + Σ(i=r+1 to n) λiei.

Now

Σ(i=1 to r) λiei ∈ W.

If w ∈ W then there exist µi ∈ R such that

w = Σ(i=1 to r) µiei.

So

〈w, Σ(j=r+1 to n) λjej〉 = Σ(i=1 to r) Σ(j=r+1 to n) µiλj〈ei, ej〉 = 0.

Hence

Σ(i=r+1 to n) λiei ∈ W⊥.


Therefore V = W + W⊥.

Next suppose v ∈ W ∩ W⊥. Then 〈v, v〉 = 0 and so v = 0.

Hence V = W ⊕ W⊥ and so any vector v ∈ V can be expressed uniquely as

v = w + w⊥,

where w ∈ W and w⊥ ∈ W⊥. 2
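The proof is constructive: given an orthonormal basis of W, the W-component of v is w = Σ 〈v, ei〉ei. A sketch of this orthogonal projection in Python with numpy (not part of the notes; the function name and vectors are our own):

```python
import numpy as np

def project(v, W_basis):
    # Orthogonal projection of v onto W = span(W_basis), where W_basis
    # is a list of orthonormal vectors: w = sum_i <v, e_i> e_i.
    return sum(np.sum(v * np.conj(e)) * e for e in W_basis)

e1 = np.array([1.0, 0.0, 0.0])
e2 = np.array([0.0, 1.0, 0.0])
v = np.array([3.0, 4.0, 5.0])

w = project(v, [e1, e2])
w_perp = v - w

# w lies in W, and w_perp is orthogonal to every basis vector of W.
assert np.allclose(w, [3.0, 4.0, 0.0])
assert np.isclose(np.dot(w_perp, e1), 0.0)
assert np.isclose(np.dot(w_perp, e2), 0.0)
```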

5.3 Adjoints

5.3.144 Definition An adjoint of a linear map T : V → V is a linear map T ∗ such that〈T (u), v〉 = 〈u, T ∗(v)〉 for all u, v ∈ V .

5.3.145 Existence and uniqueness Every linear map T : V → V has a unique adjoint. If T is represented by A (w.r.t. an orthonormal basis) then T* is represented by Āᵗ.

Proof. (Existence) Let T* be the linear map represented by Āᵗ. Writing [w̄] for the entrywise conjugate of [w], we prove that T* is an adjoint of T:

〈Tv, w〉 = (A[v])ᵗ[w̄] = [v]ᵗAᵗ[w̄] = [v]ᵗ · conj(Āᵗ[w]) = 〈v, T*w〉.

Notice that here we have used that the basis is orthonormal: the matrix of 〈−,−〉 in this basis is the identity.

(Uniqueness) Let T*, T′ be two adjoints. Then we have

〈u, (T* − T′)v〉 = 0

for all u, v ∈ V. In particular, taking u = (T* − T′)v gives ||(T* − T′)v|| = 0, hence T*(v) = T′(v) for all v ∈ V. Therefore T* = T′. 2
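In coordinates this says the adjoint is the conjugate transpose. A numerical check in Python with numpy (not part of the notes; the random matrix and vectors are only for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
A_star = np.conj(A).T  # matrix of the adjoint in an orthonormal basis

u = rng.standard_normal(3) + 1j * rng.standard_normal(3)
v = rng.standard_normal(3) + 1j * rng.standard_normal(3)

def ip(x, y):
    # <x, y>, linear in x and conjugate-linear in y.
    return np.sum(x * np.conj(y))

# <T u, v> = <u, T* v>
assert np.isclose(ip(A @ u, v), ip(u, A_star @ v))
```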

5.3.146 Example Consider V = C² with the standard orthonormal basis and let T be represented by

A =
( 1 i )
( −i 1 )

Then T* = T (such a linear map is called self-adjoint). Notice that T being self-adjoint is equivalent to the matrix representing it being Hermitian.

Now consider

A =
( 2i 1 + i )
( −1 + i i )

Then T* = −T.

We also see that T** = T (using that T* is represented by Āᵗ).


5.4 Isometries.

5.4.147 Theorem Let T : V → V be a linear map of an inner product space V. Then the following are equivalent.

(i) TT* = id (i.e. T* = T⁻¹).

(ii) ∀u, v ∈ V, 〈Tu, Tv〉 = 〈u, v〉 (i.e. T preserves the inner product).

(iii) ∀v ∈ V, ||Tv|| = ||v|| (i.e. T preserves the norm).

5.4.148 Definition If T satisfies any of the above (and so all of them) then T is called an isometry.

Proof. (i) =⇒ (ii): Let u, v ∈ V. Then

〈Tu, Tv〉 = 〈u, T*Tv〉 = 〈u, v〉,

since T* = T⁻¹.

(ii) =⇒ (iii): If v ∈ V then ||Tv||² = 〈Tv, Tv〉, so by (ii)

||Tv||² = 〈v, v〉 = ||v||².

Hence ||Tv|| = ||v||, so (iii) holds.

(iii) =⇒ (ii): We just show that the form can be recovered from the norm. We have

2ℜ〈u, v〉 = ||u + v||² − ||u||² − ||v||²,   ℑ〈v, w〉 = ℜ〈v, iw〉.

For the second equality, notice that

2ℜ〈v, iw〉 = 〈v, iw〉 + conj(〈v, iw〉) = −i〈v, w〉 + i · conj(〈v, w〉) = −i(〈v, w〉 − conj(〈v, w〉)) = −i · 2iℑ〈v, w〉 = 2ℑ〈v, w〉.

Now suppose ||Tv|| = ||v|| for all v. Take u, v ∈ V. We have ||T(u + v)|| = ||u + v||, ||T(u)|| = ||u|| and ||T(v)|| = ||v||. It follows that

2ℜ〈T(u), T(v)〉 = ||T(u) + T(v)||² − ||T(u)||² − ||T(v)||² = ||u + v||² − ||u||² − ||v||² = 2ℜ〈u, v〉,

and the second equality above shows that

ℑ〈T(u), T(v)〉 = ℜ〈T(u), iT(v)〉 = ℜ〈T(u), T(iv)〉 = ℜ〈u, iv〉 = ℑ〈u, v〉.

Hence

〈T(u), T(v)〉 = 〈u, v〉.


(ii) implies (i):

〈T*Tu, v〉 = 〈Tu, Tv〉 = 〈u, v〉.

Therefore 〈(T*T − I)u, v〉 = 0 for all v. In particular, take v = (T*T − I)u; then 〈(T*T − I)u, (T*T − I)u〉 = 0, so (T*T − I)u = 0 for all u. Therefore T*T = I, and hence also TT* = I. 2

Notice that in an orthonormal basis, an isometry is represented by a matrix A such that Āᵗ = A⁻¹.


Lecture 24

We let On(R) be the set of n×n real matrices satisfying AAt = In (in other words At = A−1).

5.4.149 Proposition (a) If A ∈ On(R) then det A = ±1.
(b) On(R) is a subgroup of GLn(R).

Proof. If A ∈ On(R) then Aᵗ = A⁻¹, so

det A = det Aᵗ = det(A⁻¹) = (det A)⁻¹.

Therefore (det A)² = 1 and det A = ±1.

Clearly On(R) is a subset of GLn(R), so to show that it is a subgroup it is enough to show that if A, B ∈ On(R) then AB⁻¹ ∈ On(R). Let A, B ∈ On(R). Then

(AB⁻¹)⁻¹ = BA⁻¹ = BAᵗ = (ABᵗ)ᵗ = (AB⁻¹)ᵗ.

Hence AB⁻¹ ∈ On(R) and so On(R) is a subgroup of GLn(R). 2

5.4.150 Theorem If A ∈ GLn(R) then the following are equivalent.

(i) A ∈ On(R).

(ii) The columns of A form an orthonormal basis for Rn (for the standard inner product onRn).

(iii) The rows of A form an orthonormal basis for Rn.

Proof. We prove (i) ⇐⇒ (ii) (the proof of (i) ⇐⇒ (iii) is identical).

Consider AᵗA. If A = [C1, . . . , Cn], so the jth column of A is Cj, then the (i, j)th entry of AᵗA is CiᵗCj.

So AᵗA = In ⇐⇒ CiᵗCj = δi,j ⇐⇒ 〈Ci, Cj〉 = δi,j ⇐⇒ {C1, . . . , Cn} is an orthonormal basis for Rⁿ. 2

For example take the matrix:

( 1/√2 −1/√2 )
( 1/√2  1/√2 )

This matrix is in O2(R). In fact it is the matrix of rotation by the angle π/4.


5.4.151 Theorem Let V be a Euclidean space with orthonormal basis E = {e1, . . . , en}. IfF = {f1, . . . , fn} is a basis for V and P is the transition matrix from E to F , then

P ∈ On(R) ⇐⇒ F is an orthonormal basis for V .

Proof. The jth column of P is [fj]E so

fj = Σ(k=1 to n) pk,j ek.

Hence

〈fi, fj〉 = 〈Σ(k=1 to n) pk,i ek, Σ(l=1 to n) pl,j el〉 = Σ(k=1 to n) Σ(l=1 to n) pk,i pl,j 〈ek, el〉 = Σ(k=1 to n) pk,i pk,j = (PᵗP)i,j.

So F is an orthonormal basis for V ⇐⇒ 〈fi, fj〉 = δi,j ⇐⇒ PᵗP = In ⇐⇒ P ∈ On(R). 2

Notice that it is NOT true that matrices in On(R) are diagonalisable over R. Indeed, take

( cos(θ)  sin(θ) )
( −sin(θ) cos(θ) )

where θ is not a multiple of π. The characteristic polynomial is x² − 2 cos(θ)x + 1. As cos(θ)² − 1 < 0, the discriminant is negative, so there are no real eigenvalues and the matrix is not diagonalisable over R.

Notice that for a given matrix A, it is easy to check whether the columns are orthonormal. If they are, then A is in On(R) and it is easy to calculate the inverse: A⁻¹ = Aᵗ.


Lecture 25

5.5 Orthogonal Diagonalization

5.5.152 Definition Let V be an inner product space. A linear map T : V −→ V is self-adjoint if

T* = T.

Notice that in an orthonormal basis, T is represented by a matrix A such that Āᵗ = A. In particular if V is real, then A is symmetric.

5.5.153 Theorem If A ∈ Mn(C) is Hermitian then all the eigenvalues of A are real.

Proof. Recall that Hermitian means that A = Āᵗ, and that this implies 〈Au, v〉 = 〈u, Av〉 for all u, v. Let λ be an eigenvalue of A and let v ≠ 0 be a corresponding eigenvector, so that

Av = λv.

It follows that

〈Av, v〉 = λ〈v, v〉   and   〈v, Av〉 = λ̄〈v, v〉.

Since 〈Av, v〉 = 〈v, Av〉, we get λ〈v, v〉 = λ̄〈v, v〉. As v ≠ 0 we have 〈v, v〉 = ||v||² ≠ 0, so we can divide by it. It follows that λ = λ̄, i.e. λ ∈ R. 2

In particular a real symmetric matrix always has a real eigenvalue: take a complex eigenvalue (one always exists!); by the above theorem it is real.
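This is easy to observe numerically: the eigenvalues of a Hermitian matrix come out real up to rounding. A check in Python with numpy, using the Hermitian matrix from Example 5.3.146 (not part of the notes):

```python
import numpy as np

A = np.array([[1.0, 1j],
              [-1j, 1.0]])
assert np.allclose(A, np.conj(A).T)  # A is Hermitian

eig = np.linalg.eigvals(A)
# Eigenvalues of a Hermitian matrix are real (imaginary parts ~ 0).
assert np.allclose(eig.imag, 0.0)
assert np.allclose(sorted(eig.real), [0.0, 2.0])
```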

5.5.154 Spectral Theorem Let T : V → V be a self-adjoint linear map of an inner product space V. Then V has an orthonormal basis of eigenvectors.

Proof. We use induction on dim(V) = n. The result is true for n = 1, so suppose it holds in dimension n − 1 and let dim(V) = n.

Since T is self-adjoint, if E is an orthonormal basis for V and A is the matrix representing T in E then

A = Āᵗ.

So A is Hermitian. Hence by the previous theorem A has a real eigenvalue λ. So there is a vector e1 ∈ V \ {0} such that Te1 = λe1. Normalizing (dividing by ||e1||) we can assume that ||e1|| = 1.

Let W = span{e1}; then by Theorem 5.2.143 we have V = W ⊕ W⊥. Now

n = dim(V) = dim(W) + dim(W⊥) = 1 + dim(W⊥),

so dim(W⊥) = n − 1.

We claim that T maps W⊥ to itself, i.e. T(W⊥) ⊆ W⊥. Let w = µe1 ∈ W, µ ∈ K, and v ∈ W⊥. Then

〈w, Tv〉 = 〈T*w, v〉 = 〈Tw, v〉 = 〈T(µe1), v〉 = 〈µTe1, v〉 = 〈µλe1, v〉 = 0,


since µλe1 ∈ W . Hence T : W⊥ → W⊥.By induction there exists an orthonormal basis of eigenvectors {e2, . . . , en} for W⊥. But

V = W ⊕ W⊥ so E = {e1, . . . , en} is a basis for V and 〈e1, ei〉 = 0 for 2 ≤ i ≤ n and ||e1|| = 1.Hence E is an orthonormal basis of eigenvectors for V . 2

5.5.155 Theorem Let T : V → V be a self-adjoint linear map of a Euclidean space V . Ifλ, µ are distinct eigenvalues of T then

∀u ∈ Vλ ∀v ∈ Vµ 〈u, v〉 = 0.

Proof. If u ∈ Vλ and v ∈ Vµ then

λ 〈u, v〉 = 〈λu, v〉 = 〈Tu, v〉 = 〈u, T ∗v〉 = 〈u, Tv〉 = 〈u, µv〉 = µ 〈u, v〉 .

So (λ − µ) 〈u, v〉 = 0, with λ 6= µ. Hence 〈u, v〉 = 0. 2

5.5.156 Example Let

A =
( 1 i )
( −i 1 )

This matrix is self-adjoint. One calculates the characteristic polynomial and finds t(t − 2) (this is also the minimal polynomial, so you know the matrix is diagonalisable for other reasons than being self-adjoint).

For the eigenvalue 0, one finds the eigenvector (−i, 1)ᵗ. For the eigenvalue 2, one finds (i, 1)ᵗ.

Then we normalise the vectors: v1 = (1/√2)(−i, 1)ᵗ and v2 = (1/√2)(i, 1)ᵗ. We let

P = (1/√2)
( −i i )
( 1 1 )

and

P⁻¹AP =
( 0 0 )
( 0 2 )

In general the procedure for orthogonal diagonalisation is as follows. Let A be an n × n self-adjoint matrix. Find the eigenvalues λi and the eigenspaces V(λi). Because A is diagonalisable, you will have

V = V(λ1) ⊕ · · · ⊕ V(λr).

Choose a basis for V as a union of bases of the V(λi), and apply Gram–Schmidt to the basis of each V(λi). Since eigenvectors for distinct eigenvalues are already orthogonal (Theorem 5.5.155), the union is an orthonormal basis.


For example:

A =
( 1 −2 2 )
( −2 4 −4 )
( 2 −4 4 )

This matrix is symmetric, hence self-adjoint. One calculates the characteristic polynomial and finds λ²(λ − 9).

For V(9), one finds v1 = (1, −2, 2)ᵗ. To make this a unit vector, divide by the norm, i.e. replace v1 by (1/3)v1.

For V(0), one finds V(0) = span(v2, v3) with

v2 = (2, 1, 0)ᵗ   and   v3 = (−2, 0, 1)ᵗ.

By the Gram–Schmidt process we replace v2 by

(1/√5)(2, 1, 0)ᵗ

and v3 by

(1/(3√5))(−2, 4, 5)ᵗ.

Let

P =
( 1/3   2/√5  −2/(3√5) )
( −2/3  1/√5   4/(3√5) )
( 2/3   0      5/(3√5) )

We have

P⁻¹AP =
( 9 0 0 )
( 0 0 0 )
( 0 0 0 )
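In software this whole procedure is a single call: numpy's eigh routine for symmetric (or Hermitian) matrices returns the eigenvalues together with an orthonormal basis of eigenvectors, i.e. an orthogonal P with PᵗAP diagonal. A check of the example above (not part of the notes):

```python
import numpy as np

A = np.array([[1.0, -2.0, 2.0],
              [-2.0, 4.0, -4.0],
              [2.0, -4.0, 4.0]])

# eigh returns eigenvalues in ascending order and orthonormal eigenvectors
# as the columns of P, so P^t = P^(-1) and P^t A P is diagonal.
eigvals, P = np.linalg.eigh(A)
assert np.allclose(P.T @ P, np.eye(3))
assert np.allclose(P.T @ A @ P, np.diag(eigvals))
assert np.allclose(sorted(eigvals), [0.0, 0.0, 9.0])
```

The columns of P need not match those computed by hand above, since the eigenspace V(0) is 2-dimensional and any orthonormal basis of it will do.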
