
Lecture Notes 1: Matrix Algebra
Part D: Similar Matrices and Diagonalization

Peter J. Hammond

Minor revision: 2020 September 26th

University of Warwick, EC9A0 Maths for Economists


Outline

Eigenvalues and Eigenvectors
  • Real Case
  • The Complex Case
  • Linear Independence of Eigenvectors

Diagonalizing a General Matrix
  • Similar Matrices

Properties of Adjoint and Symmetric Matrices
  • A Self-Adjoint Matrix has only Real Eigenvalues

Diagonalizing a Symmetric Matrix
  • Orthogonal Matrices
  • Orthogonal Projections
  • Rayleigh Quotient
  • The Spectral Theorem

Quadratic Forms and Their Definiteness
  • Quadratic Forms
  • The Eigenvalue Test of Definiteness
  • Sylvester’s Criterion for Definiteness


Definitions in the Real Case

Definition. Consider any n × n matrix A.

The scalar λ ∈ R is an eigenvalue of A just in case the equation Ax = λx has a non-zero solution.

In this case the solution x ∈ R^n \ {0} is an eigenvector, and the pair (λ, x) is an eigenpair.

The spectrum of the matrix A is the set S_A of its eigenvalues.

Let S_A^R denote the subset of its real eigenvalues, and let S_A^C denote the subset of its complex eigenvalues, which satisfies S_A^C = S_A \ S_A^R.


Summary of Main Properties

We will be demonstrating the following properties:

1. S_A^R ⊆ S_A and #S_A ≤ n.

2. The number #S_A^C of complex eigenvalues is even, and the members of S_A^C come in complex conjugate pairs λ ± µi.

3. S_A^R = ∅ is possible in case n is even, but not if n is odd.

4. In case A is symmetric, one has S_A^C = ∅ and S_A^R = S_A.


The Eigenspace

Given any eigenvalue λ, let E_λ := {x ∈ R^n \ {0} | Ax = λx} denote the associated set of eigenvectors.

Given any two eigenvectors x, y ∈ E_λ and any two scalars α, β ∈ R, note that

A(αx + βy) = αAx + βAy = αλx + βλy = λ(αx + βy)

Hence the linear combination αx + βy, unless it is 0, is also an eigenvector in E_λ.

It follows that the set E_λ ∪ {0} is a linear subspace of R^n, which we call the eigenspace associated with the eigenvalue λ.


Powers of a Matrix

Theorem. Suppose that (λ, x) is an eigenpair of the n × n matrix A. Then A^m x = λ^m x for all m ∈ N.

Proof. By definition, Ax = λx. Premultiplying each side of this equation by the matrix A gives

A²x = A(Ax) = A(λx) = λ(Ax) = λ(λx) = λ²x

As the induction hypothesis, suppose that A^{m−1} x = λ^{m−1} x for some m ∈ {2, 3, . . .}. Premultiplying each side of this last equation by the matrix A gives

A^m x = A(A^{m−1} x) = A(λ^{m−1} x) = λ^{m−1}(Ax) = λ^{m−1}(λx) = λ^m x

This completes the proof by induction on m.
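As a quick numerical check of this theorem, here is a NumPy sketch; the matrix A and the eigenpair (3, (1, 1)) are illustrative choices, not taken from the notes:

```python
import numpy as np

# If (lam, x) is an eigenpair of A, then A^m x = lam^m x for every m.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
lam = 3.0
x = np.array([1.0, 1.0])                 # A x = (3, 3) = 3 x, so (3, x) is an eigenpair

m = 5
lhs = np.linalg.matrix_power(A, m) @ x   # A^5 x
rhs = lam**m * x                         # 3^5 x
assert np.allclose(lhs, rhs)
```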

Characteristic Equation

The equation Ax = λx holds for x ≠ 0 if and only if x ≠ 0 solves (A − λI)x = 0.

This holds iff the matrix A − λI is singular, which holds iff λ is a characteristic root — i.e., it solves the characteristic equation |A − λI| = 0.

Equivalently, λ is a zero of the polynomial |A − λI| of degree n.

Suppose that |A − λI| = 0 has k distinct real roots λ1, λ2, . . . , λk, whose multiplicities are respectively m1, m2, . . . , mk, and that these account for all the roots. This means that

|A − λI| = (−1)^n (λ − λ1)^{m1} (λ − λ2)^{m2} · · · (λ − λk)^{mk} = (−1)^n ∏_{j=1}^{k} (λ − λj)^{mj}

The polynomial has degree m1 + m2 + · · · + mk, which equals n.

This implies that k ≤ n, so there can be at most n distinct real eigenvalues.

Eigenvalues of a 2 × 2 Matrix

Consider the 2 × 2 matrix A = ( a11 a12 ; a21 a22 ) (rows separated by semicolons).

The characteristic equation for its eigenvalues is

|A − λI| = | a11 − λ  a12 ; a21  a22 − λ | = 0

Evaluating the determinant gives the equation

0 = (a11 − λ)(a22 − λ) − a12 a21
  = λ² − (a11 + a22)λ + (a11 a22 − a12 a21)
  = λ² − (tr A)λ + |A| = (λ − λ1)(λ − λ2)

where the two roots λ1 and λ2 of the quadratic equation have:

• a sum λ1 + λ2 equal to the trace tr A of A (the sum of its diagonal elements);
• a product λ1 · λ2 equal to the determinant |A| of A.

Let Λ denote the diagonal matrix diag(λ1, λ2) whose diagonal elements are the eigenvalues. Note that tr A = tr Λ and |A| = |Λ|.
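The trace and determinant relations are easy to confirm numerically (a NumPy sketch; the matrix is an arbitrary example, not one from the notes):

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

# The eigenvalues are the roots of lambda^2 - (tr A) lambda + |A| = 0.
eigvals = np.linalg.eigvals(A)
assert np.isclose(eigvals.sum(), np.trace(A))         # lambda1 + lambda2 = tr A
assert np.isclose(eigvals.prod(), np.linalg.det(A))   # lambda1 * lambda2 = |A|
```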

The Case of a Diagonal Matrix, I

For the diagonal matrix D = diag(d1, d2, . . . , dn), the characteristic equation |D − λI| = 0 takes the degenerate form ∏_{k=1}^{n} (dk − λ) = 0.

So the spectrum S_D equals {d1, d2, . . . , dn}, the set of diagonal elements. #S_D could be any number between 1 and n.

The ith component of the vector equation Dx = d_k x takes the form d_i x_i = d_k x_i, which has a non-trivial solution if and only if d_i = d_k.

The kth vector e_k = (δ_jk)_{j=1}^{n} of the canonical orthonormal basis of R^n always solves the equation Dx = d_k x, and so is an eigenvector associated with the eigenvalue d_k.

The Case of a Diagonal Matrix, II

Apart from non-zero multiples of e_k, there are other eigenvectors associated with d_k only if a different element d_i of the diagonal also equals d_k.

In fact, the eigenspace associated with each eigenvalue d_k equals the space spanned by the set {e_i | d_i = d_k} of canonical basis vectors.

Example. In case D = diag(1, 1, 0) the spectrum is {0, 1}, with:

• the one-dimensional eigenspace E_0 = {x3 (0, 0, 1)⊤ | x3 ∈ R};
• the two-dimensional eigenspace E_1 = {x1 (1, 0, 0)⊤ + x2 (0, 1, 0)⊤ | (x1, x2) ∈ R²}.

Characterizing 2 × 2 Orthogonal Matrices

By definition, an orthogonal matrix P satisfies P⊤P = PP⊤ = I.

In the 2 × 2 case when P = (p_ij)_{2×2}, the matrix PP⊤ equals

( p11 p12 ; p21 p22 )( p11 p21 ; p12 p22 ) = ( (p11)² + (p12)²  p11 p21 + p12 p22 ; p21 p11 + p22 p12  (p21)² + (p22)² )

Here we use trigonometric identities and put p11 = cos θ, p22 = cos η, p12 = − sin θ, p21 = sin η. This makes PP⊤ equal

( cos θ  − sin θ ; sin η  cos η )( cos θ  sin η ; − sin θ  cos η ) = ( 1  sin(η − θ) ; sin(η − θ)  1 )

This equals the identity matrix ( 1 0 ; 0 1 ) if and only if sin(η − θ) = 0. Taking η = θ makes P equal to the rotation matrix

Rθ := ( cos θ  − sin θ ; sin θ  cos θ )

(The other solution, η = θ + π, makes P a reflection rather than a rotation.)

Rotations Illustrated

Illustrating Pθ := Rθ (1, 0)⊤ = ( cos θ  − sin θ ; sin θ  cos θ )(1, 0)⊤ = (cos θ, sin θ)⊤.

Also Pθ+½π = Rθ (0, 1)⊤ = ( cos θ  − sin θ ; sin θ  cos θ )(0, 1)⊤ = (− sin θ, cos θ)⊤.

Rotations in Polar Coordinates

Recall that a 2-dimensional rotation matrix takes the form

Rθ := ( cos θ  − sin θ ; sin θ  cos θ )

for θ ∈ R, which is the angle of rotation measured in radians.

The rotation Rθ transforms any vector x = (x1, x2) ∈ R² to

Rθ x = ( cos θ  − sin θ ; sin θ  cos θ )(x1, x2)⊤ = (x1 cos θ − x2 sin θ, x1 sin θ + x2 cos θ)⊤

Introduce polar coordinates (r, η), where x = (x1, x2) = r(cos η, sin η). Then

Rθ x = r (cos η cos θ − sin η sin θ, cos η sin θ + sin η cos θ)⊤ = r (cos(η + θ), sin(η + θ))⊤

This makes it easy to verify that Rθ+2kπ = Rθ for all θ ∈ R and k ∈ Z, and that Rθ Rη = Rη Rθ = Rθ+η for all θ, η ∈ R.
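The composition rule Rθ Rη = Rθ+η can be confirmed numerically (a NumPy sketch; the angles are arbitrary):

```python
import numpy as np

def rotation(theta):
    """The 2-D rotation matrix R_theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s],
                     [s,  c]])

theta, eta = 0.7, 1.9
# Composing rotations adds the angles, in either order.
assert np.allclose(rotation(theta) @ rotation(eta), rotation(theta + eta))
assert np.allclose(rotation(eta) @ rotation(theta), rotation(theta + eta))
# A full turn changes nothing: R_{theta + 2 pi} = R_theta.
assert np.allclose(rotation(theta + 2 * np.pi), rotation(theta))
```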

Does a Rotation Matrix Have Real Eigenvalues?

The characteristic equation |Rθ − λI| = 0 takes the form

0 = | cos θ − λ  − sin θ ; sin θ  cos θ − λ | = (cos θ − λ)² + sin² θ = 1 − 2λ cos θ + λ²

1. A degenerate case occurs when θ = kπ for some k ∈ Z, so that sin θ = 0 and cos θ = ±1. Then Rθ reduces to ±I2, whose only eigenvalue is the real number ±1.

2. Otherwise sin θ ≠ 0, and the real matrix Rθ has no real eigenvalues. Indeed, the characteristic equation then has the two distinct complex conjugate roots λ = cos θ ± i sin θ = e^{±iθ}.

The associated eigenspaces must both consist of complex eigenvectors.

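Again numerically (a NumPy sketch with an arbitrary angle): for θ not a multiple of π, the eigenvalues of Rθ are the complex conjugate pair e^{±iθ}.

```python
import numpy as np

theta = 1.0                                # not a multiple of pi
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

eigvals = np.linalg.eigvals(R)             # complex, even though R is real
expected = np.array([np.exp(-1j * theta), np.exp(1j * theta)])
assert np.allclose(np.sort_complex(eigvals), np.sort_complex(expected))
```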

Complex Eigenvalues

To consider complex eigenvalues properly, we need to leave R^n and consider instead the linear space C^n, whose elements are n-vectors with complex coordinates.

That is, we consider a linear space whose field of scalars is the plane C of complex numbers, rather than the line R of real numbers.

Suppose A is any n × n matrix whose elements may be real or complex.

The complex scalar λ ∈ C is an eigenvalue just in case the equation Ax = λx has a non-zero solution, in which case that solution x ∈ C^n \ {0} is an eigenvector.


Fundamental Theorem of Algebra

Theorem. Let P(λ) = λ^n + ∑_{k=0}^{n−1} p_k λ^k be a polynomial function of λ of degree n in the complex plane C. Then there exists at least one root λ ∈ C such that P(λ) = 0.

Corollary. The polynomial P(λ) can be factorized as the product P_n(λ) ≡ ∏_{r=1}^{n} (λ − λ_r) of exactly n linear terms.

Proof. The proof will be by induction on n.

When n = 1 one has P1(λ) = λ + p0, whose only root is λ = −p0.

Suppose the result is true when n = m − 1.

By the fundamental theorem of algebra, there exists λ̄ ∈ C such that P_m(λ̄) = 0. Polynomial division then gives P_m(λ) ≡ P_{m−1}(λ)(λ − λ̄), etc.


Characteristic Roots as Eigenvalues

Theorem. Every n × n matrix A ∈ C^{n×n} with complex elements has exactly n eigenvalues (real or complex), corresponding to the roots, counting multiple roots, of the characteristic equation |A − λI| = 0.

Proof. The characteristic equation can be written in the form P_n(λ) = 0, where P_n(λ) ≡ |λI − A| is a polynomial of degree n.

By the fundamental theorem of algebra, together with its corollary, the polynomial |λI − A| equals the product ∏_{r=1}^{n} (λ − λ_r) of n linear terms.

For any of these roots λ_r, the matrix A − λ_r I is singular. So there exists x ≠ 0 such that (A − λ_r I)x = 0, or Ax = λ_r x, implying that λ_r is an eigenvalue.


Linear Independence of Eigenvectors

The following theorem tells us that eigenvectors associated with distinct eigenvalues must be linearly independent.

Theorem. Let {λ_k}_{k=1}^{m} = {λ1, λ2, . . . , λm} be any collection of m ≤ n distinct eigenvalues. Then any corresponding set {x_k}_{k=1}^{m} of associated eigenvectors must be linearly independent.

The proof will be by induction on m.

Because x1 ≠ 0, the set {x1} is linearly independent. So the result is evidently true when m = 1.

As the induction hypothesis, suppose the result holds for m − 1.


Completing the Proof by Induction, I

Suppose that one solution of the equation Ax = λ_m x, which may be zero, is the linear combination x = ∑_{k=1}^{m−1} α_k x_k of the preceding m − 1 eigenvectors. Hence

Ax = λ_m x = ∑_{k=1}^{m−1} α_k λ_m x_k

Then the hypothesis that {(λ_k, x_k)}_{k=1}^{m−1} is a collection of eigenpairs implies that this x satisfies

Ax = ∑_{k=1}^{m−1} α_k A x_k = ∑_{k=1}^{m−1} α_k λ_k x_k

Subtracting this equation from the prior equation gives

0 = ∑_{k=1}^{m−1} α_k (λ_m − λ_k) x_k


Completing the Proof by Induction, II

So we have

0 = ∑_{k=1}^{m−1} α_k (λ_m − λ_k) x_k

The induction hypothesis is that the set {x_k}_{k=1}^{m−1} of distinct eigenvectors is linearly independent, implying that

α_k (λ_m − λ_k) x_k = 0 for k = 1, . . . , m − 1

But we are assuming that λ_m ∉ {λ_k}_{k=1}^{m−1}, so λ_m − λ_k ≠ 0 for k = 1, . . . , m − 1.

It follows that α_k = 0 for k = 1, . . . , m − 1.

We have proved that if x = ∑_{k=1}^{m−1} α_k x_k solves Ax = λ_m x, then x = 0, so x is not an eigenvector.

This completes the proof by induction that no eigenvector x ∈ E_{λ_m} can be a linear combination of the eigenvectors x_k ∈ E_{λ_k} (k = 1, . . . , m − 1).


Similar Matrices

Definition. The two n × n matrices A and B are similar just in case there exists an invertible n × n matrix S such that the following three equivalent statements all hold:

B = S⁻¹AS ⇐⇒ SB = AS ⇐⇒ A = SBS⁻¹

in which case we write A ∼ B.


Similarity is an Equivalence Relation

Theorem. The similarity relation is an equivalence relation — i.e., ∼ is:

• reflexive: A ∼ A;
• symmetric: A ∼ B ⇐⇒ B ∼ A;
• transitive: A ∼ B & B ∼ C =⇒ A ∼ C.

Proof. The proofs that ∼ is reflexive and symmetric are elementary.

Suppose that A ∼ B and B ∼ C. By definition, there exist invertible matrices S and T such that B = S⁻¹AS and C = T⁻¹BT.

Define U := ST, which is invertible with U⁻¹ = T⁻¹S⁻¹.

Then C = T⁻¹(S⁻¹AS)T = (T⁻¹S⁻¹)A(ST) = U⁻¹AU. So A ∼ C.


Similar Matrices Have Identical Spectra

Theorem. If A ∼ B then S_A = S_B.

Proof. Suppose that A = SBS⁻¹ and that (λ, x) is an eigenpair of A. Then x ≠ 0 solves Ax = SBS⁻¹x = λx.

Premultiplying each side of the equation SBS⁻¹x = λx by S⁻¹, it follows that y := S⁻¹x solves By = λy.

Moreover, because S⁻¹ has the inverse S, the equation S⁻¹x = y would have only the trivial solution x = Sy = 0 in case y = 0. Hence y ≠ 0, implying that (λ, y) is an eigenpair of B.

A symmetric argument shows that if (λ, y) is an eigenpair of B = S⁻¹AS, then (λ, Sy) is an eigenpair of A.

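A numerical check that similarity preserves the spectrum (a NumPy sketch with randomly generated A and S, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
S = rng.standard_normal((4, 4))       # almost surely invertible
B = np.linalg.inv(S) @ A @ S          # B = S^{-1} A S, so A ~ B

# A and B have the same eigenvalues (sorted here to compare as multisets).
eig_A = np.sort_complex(np.linalg.eigvals(A))
eig_B = np.sort_complex(np.linalg.eigvals(B))
assert np.allclose(eig_A, eig_B)
```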

Diagonalization

Definition. An n × n matrix A is diagonalizable just in case it is similar to a diagonal matrix Λ = diag(λ1, λ2, . . . , λn).

Theorem. Given any diagonalizable n × n matrix A:

1. The columns of any matrix S that diagonalizes A must consist of n linearly independent eigenvectors of A.

2. The matrix A is diagonalizable if and only if it has a set of n linearly independent eigenvectors.

3. The matrix A and its diagonalization Λ = S⁻¹AS have the same set of eigenvalues.


Proof of Part 1

Suppose that AS = SΛ, where A = (a_ij)_{n×n}, S = (s_ij)_{n×n}, and Λ = diag(λ1, λ2, . . . , λn).

Then for each i, k ∈ {1, 2, . . . , n}, equating the elements in row i and column k of the equal matrices AS and SΛ implies that

∑_{j=1}^{n} a_ij s_jk = ∑_{j=1}^{n} s_ij δ_jk λ_k = s_ik λ_k

It follows that A s_k = λ_k s_k, where s_k = (s_ik)_{i=1}^{n} denotes the kth column of the matrix S.

Because S must be invertible:

• each column s_k must be non-zero, and so an eigenvector of A;
• the set of all these n columns must be linearly independent.


Proofs of Parts 2 and 3

Proof of Part 2: By Part 1, if the diagonalizing matrix S exists, its columns must form a set of n linearly independent eigenvectors of the matrix A.

Conversely, suppose that A does have a set {x1, x2, . . . , xn} of n linearly independent eigenvectors, with Ax_k = λ_k x_k for k = 1, 2, . . . , n.

Now define S as the n × n matrix whose kth column is the eigenvector x_k, for each k = 1, 2, . . . , n. Then it is easy to check that AS = SΛ, where Λ = diag(λ1, λ2, . . . , λn).

Proof of Part 3: This follows from the general property that similar matrices have the same spectrum of eigenvalues.

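Parts 1 and 2 can be illustrated numerically (a NumPy sketch; the matrix is an arbitrary example with distinct eigenvalues):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# np.linalg.eig returns the eigenvalues and a matrix S whose columns are
# corresponding eigenvectors; with distinct eigenvalues these columns are
# linearly independent, so S is invertible.
lam, S = np.linalg.eig(A)
assert np.linalg.matrix_rank(S) == 2

Lam = np.linalg.inv(S) @ A @ S           # S^{-1} A S ...
assert np.allclose(Lam, np.diag(lam))    # ... is the diagonal matrix of eigenvalues
```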

Complex Conjugates and Adjoint Matrices

Recall that any complex number c ∈ C can be expressed as a + ib, with a ∈ R as the real part and b ∈ R as the imaginary part.

The complex conjugate of c is c̄ = a − ib.

Note that c c̄ = c̄ c = (a + ib)(a − ib) = a² + b² = |c|², where |c| is the modulus of c.

Any m × n complex matrix C = (c_jk)_{m×n} ∈ C^{m×n} can be written as A + iB, where A and B are real m × n matrices.

The adjoint of the m × n complex matrix C = A + iB is the n × m complex matrix C* := (A − iB)⊤ = A⊤ − iB⊤.

This is the transpose of the matrix A − iB whose elements are the complex conjugates c̄_jk of the corresponding elements of C. That is, each element of C* is given by c*_jk = a_kj − i b_kj.

In the case of a real matrix A, whose imaginary part is 0, its adjoint is simply the transpose A⊤.


Self-Adjoint and Symmetric Matrices

An n × n complex matrix C = A + iB is self-adjoint just in case C* = C, which holds if and only if A⊤ − iB⊤ = A + iB, and so if and only if:

• the real part A is symmetric;
• the imaginary part B is anti-symmetric, in the sense that B⊤ = −B.

Of course, a real matrix is self-adjoint if and only if it is symmetric.

Theorem. Any eigenvalue of a self-adjoint complex matrix is a real scalar.

Corollary. Any eigenvalue of a symmetric real matrix is a real scalar.


Proof that Eigenvalues of a Self-Adjoint Matrix are Real

Suppose that the scalar λ ∈ C and vector x ∈ C^n together satisfy the eigenvalue equation Ax = λx for any A ∈ C^{n×n}.

Taking adjoints throughout, one has x*A* = λ̄ x*.

By the associative law of complex matrix multiplication, one has x*Ax = x*(Ax) = x*(λx) = λ(x*x), as well as x*A*x = (x*A*)x = (λ̄ x*)x = λ̄(x*x).

In case A is self-adjoint, so that A* = A, subtracting the second equation from the first gives

x*Ax − x*A*x = x*(A − A*)x = 0 = (λ − λ̄)(x*x)

But in case x is an eigenvector, one has x = (x_i)_{i=1}^{n} ∈ C^n \ {0}, and so x*x = ∑_{i=1}^{n} |x_i|² > 0.

Because 0 = (λ − λ̄) x*x, it follows that the eigenvalue λ satisfies λ − λ̄ = 0, implying that λ is real.

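A numerical illustration (a NumPy sketch; the self-adjoint matrix below is an arbitrary example, not one from the notes):

```python
import numpy as np

# Self-adjoint: real part symmetric, imaginary part anti-symmetric.
C = np.array([[2.0 + 0j, 1.0 - 3j],
              [1.0 + 3j, 5.0 + 0j]])
assert np.allclose(C, C.conj().T)        # C* = C

eigvals = np.linalg.eigvals(C)
# Despite the complex entries, every eigenvalue is real.
assert np.allclose(eigvals.imag, 0.0)
```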

Orthogonal and Orthonormal Sets of Vectors

Recall our earlier definition:

Definition. A set of k vectors {x1, x2, . . . , xk} ⊂ R^n is said to be:

• pairwise orthogonal just in case x_i · x_j = 0 whenever j ≠ i;
• orthonormal just in case, in addition, each ‖x_i‖ = 1 — i.e., all k elements of the set are vectors of unit length.

Equivalently, the set of k vectors {x1, x2, . . . , xk} ⊂ R^n is orthonormal just in case x_i · x_j = δ_ij for all pairs i, j ∈ {1, 2, . . . , k}.


Orthogonal Matrices: Recall

Definition. An n × n matrix is orthogonal just in case its n columns (or rows) form an orthonormal set.

Theorem. Given any n × n matrix P, the following are equivalent:

1. P is orthogonal;
2. PP⊤ = P⊤P = I;
3. P⁻¹ = P⊤;
4. P⊤ is orthogonal.


The Complex Case: Self-Adjoint and Unitary Matrices

We briefly consider matrices with complex elements.

Recall that the adjoint A* of an m × n matrix A is the matrix formed from the transpose A⊤ by taking the complex conjugate of each element.

The appropriate extension to complex numbers of:

• a symmetric matrix satisfying A⊤ = A is a self-adjoint matrix satisfying A* = A;
• an orthogonal matrix satisfying P⁻¹ = P⊤ is a unitary matrix satisfying U⁻¹ = U*.


Orthogonal Projection Matrices

Definition. An n × n matrix P is an orthogonal projection if P² = P and u⊤v = 0 whenever Pv = 0 and u = Px for some x ∈ R^n.

Theorem. Suppose that the n × m matrix X has full rank m < n. Let L ⊂ R^n be the linear subspace spanned by the m linearly independent columns of X. Define the n × n matrix P := X(X⊤X)⁻¹X⊤. Then:

1. The matrix P is a symmetric orthogonal projection onto L.

2. The matrix I − P is a symmetric orthogonal projection onto the orthogonal complement L⊥ of L.

3. For each vector y ∈ R^n, its orthogonal projection onto L is the unique vector v = Py ∈ L that minimizes the distance ‖y − v‖ between y and L — i.e., ‖y − v‖ ≤ ‖y − u‖ for all u ∈ L.


Orthogonal Complements

Definition. A subset L ⊆ R^n is a linear subspace just in case λx + µy ∈ L for every pair of vectors x, y in L and every pair of scalars λ, µ in R.

Definition. Given any linear subspace L of R^n, its orthogonal complement L⊥ is the set of all vectors y ∈ R^n such that x · y = 0 for all x ∈ L.


Two Examples

Example. Suppose that L is the space spanned by any finite subset {e_i | i ∈ I} of the canonical basis {e_i | i = 1, 2, . . . , n} of R^n. Then L⊥ is the space spanned by the complementary set {e_i | i ∉ I} of canonical basis vectors.

Example. Any c ≠ 0 in R^n generates the straight line L(c) = {λc | λ ∈ R}, which is a one-dimensional linear subspace of R^n. Its orthogonal complement L(c)⊥ = {x | c · x = 0} is an (n − 1)-dimensional subspace: the unique hyperplane in R^n that has c as a normal.


Proof of Part 1

First note that if X is an n × m matrix, then X⊤X is m × m. Then, provided that (X⊤X)⁻¹ exists, so does the n × n matrix P = X(X⊤X)⁻¹X⊤.

Because of the rules for the transposes of products and inverses, the definition P := X(X⊤X)⁻¹X⊤ implies that P⊤ = P, and also

P² = X(X⊤X)⁻¹X⊤X(X⊤X)⁻¹X⊤ = X(X⊤X)⁻¹X⊤ = P

Moreover, if Pv = 0 and u = Px for some x ∈ R^n, then

u · v = u⊤v = x⊤P⊤v = x⊤Pv = 0

Finally, for every y ∈ R^n, the vector Py equals Xb, where b = (X⊤X)⁻¹X⊤y. Hence Py ∈ L.


Proof of Part 2

Evidently (I − P)⊤ = I − P⊤ = I − P, and

(I − P)² = I − 2P + P² = I − 2P + P = I − P

Hence I − P is a projection. This projection is also orthogonal because, if (I − P)v = 0 and u = (I − P)x for some x ∈ R^n, then

u · v = u⊤v = x⊤(I − P)⊤v = x⊤(I − P)v = 0

Next, suppose that v = Xb ∈ L and that y = (I − P)x belongs to the range of I − P. Because PX = X(X⊤X)⁻¹X⊤X = X, one has

y · v = y⊤v = x⊤(I − P)⊤Xb = x⊤Xb − x⊤PXb = x⊤Xb − x⊤Xb = 0

Hence y ∈ L⊥.


Proof of Part 3

For any vector v = Xb ∈ L and any y ∈ R^n, because y⊤Xb and b⊤X⊤y are equal scalars, one has

‖y − v‖² = (y − Xb)⊤(y − Xb) = y⊤y − 2y⊤Xb + b⊤X⊤Xb

Now define b̂ := (X⊤X)⁻¹X⊤y (which is the OLS estimator of b in the linear regression equation y = Xb + e) and v̂ := Xb̂ = Py.

Because P⊤P = P⊤ = P = P², one has

‖y − v‖² = y⊤y − 2y⊤Xb + b⊤X⊤Xb
         = (b − b̂)⊤X⊤X(b − b̂) + y⊤y − b̂⊤X⊤Xb̂
         = ‖v − v̂‖² + y⊤y − y⊤P⊤Py = ‖v − v̂‖² + y⊤y − y⊤Py

On the other hand, given that v̂ = Py, one also has

‖y − v̂‖² = y⊤y − 2y⊤v̂ + v̂⊤v̂ = y⊤y − 2y⊤Py + y⊤P⊤Py = y⊤y − y⊤Py

So ‖y − v‖² − ‖y − v̂‖² = ‖v − v̂‖² ≥ 0, with equality iff v = v̂.
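The three parts of the theorem can be checked numerically (a NumPy sketch; X and y are arbitrary random data):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((6, 2))           # full column rank: m = 2 < n = 6
y = rng.standard_normal(6)

P = X @ np.linalg.inv(X.T @ X) @ X.T      # P = X (X'X)^{-1} X'
assert np.allclose(P, P.T)                # symmetric
assert np.allclose(P @ P, P)              # idempotent: P^2 = P

v_hat = P @ y                             # orthogonal projection of y onto L
# The residual y - v_hat lies in the orthogonal complement of L ...
assert np.allclose(X.T @ (y - v_hat), 0.0)
# ... and v_hat = X b_hat, where b_hat is the OLS coefficient vector.
b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(v_hat, X @ b_hat)
```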


A Trick Function for Generating Eigenvalues

Given a symmetric n × n matrix A, for all x ≠ 0 define the Rayleigh quotient function

R^n \ {0} ∋ x ↦ f(x) := (x⊤Ax)/(x⊤x) = ∑_{i=1}^{n} ∑_{j=1}^{n} x_i a_ij x_j / ∑_{i=1}^{n} (x_i)²

It is homogeneous of degree zero, and left undefined at x = 0.

Its partial derivative w.r.t. any component x_h of the vector x is

∂f/∂x_h = [2/(x⊤x)²] [ (∑_{j=1}^{n} a_hj x_j)(x⊤x) − (x⊤Ax) x_h ]

At any critical point x ≠ 0 where ∂f/∂x_h = 0 for all h, one therefore has (x⊤x)Ax = (x⊤Ax)x. Hence Ax = λx, where λ = (x⊤Ax)/(x⊤x) = f(x).

That is, a stationary point x ≠ 0 must be an eigenvector, with the corresponding function value f(x) as the associated eigenvalue.

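Numerically (a NumPy sketch with an illustrative symmetric matrix, not one from the notes):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])               # symmetric, with eigenvalues 1 and 3

def f(x):
    """Rayleigh quotient x'Ax / x'x."""
    return (x @ A @ x) / (x @ x)

# At an eigenvector, the quotient equals the associated eigenvalue ...
assert np.isclose(f(np.array([1.0,  1.0])), 3.0)
assert np.isclose(f(np.array([1.0, -1.0])), 1.0)
# ... and every value of f lies between the extreme eigenvalues.
x = np.array([0.3, -2.0])
assert 1.0 <= f(x) <= 3.0
```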

More Properties of the Rayleigh Quotient

Using the Rayleigh quotient

R^n \ {0} ∋ x ↦ f(x) := (x⊤Ax)/(x⊤x) = ∑_{i=1}^{n} ∑_{j=1}^{n} x_i a_ij x_j / ∑_{i=1}^{n} (x_i)²

one can state and prove the following lemma.

Lemma. Every symmetric n × n matrix A:

1. has a maximum eigenvalue λ*, with λ* real, and any associated eigenvector x* as a maximum point of f;

2. has a minimum eigenvalue λ_*, with λ_* real, and any associated eigenvector x_* as a minimum point of f;

3. satisfies A = λI if and only if λ* = λ_* = λ.


Proof of Parts 1 and 2

The unit sphere S^{n−1} is a closed and bounded subset of R^n. Moreover, the Rayleigh quotient function f is continuous when restricted to S^{n−1}.

By the extreme value theorem, f restricted to S^{n−1} must have:

• a maximum value λ* attained at some point x*;
• a minimum value λ_* attained at some point x_*.

Because f is homogeneous of degree zero, these are the maximum and minimum values of f over the whole domain R^n \ {0}.

In particular, f must be critical at any maximum point x*, as well as at any minimum point x_*. But critical points must be eigenvectors.

This proves Parts 1 and 2 of the lemma. Part 3 is left as an exercise.


A Minor Lemma

Lemma. Let A be a symmetric n × n matrix. Suppose that λ and µ are distinct eigenvalues, with corresponding eigenvectors x and y.

Then x and y are orthogonal — that is, x · y = 0.

Proof. Suppose that the non-zero vectors x and y satisfy Ax = λx and Ay = µy.

Because A is symmetric, one has

λx>y = (Ax)>y = x>A>y = x>Ay = µx>y

In case λ ≠ µ, it follows that 0 = x>y = x · y. □
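The lemma is easy to check numerically; the sketch below (an illustrative addition, using an arbitrary 2 × 2 example) verifies both eigenpairs and the orthogonality x · y = 0.

```python
# A = [[2, 1], [1, 2]] is symmetric with distinct eigenvalues 3 and 1.
A = [[2, 1], [1, 2]]

def matvec(A, x):
    """Matrix-vector product Ax."""
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]

x = [1, 1]    # eigenvector for lambda = 3
y = [1, -1]   # eigenvector for mu = 1

assert matvec(A, x) == [3, 3]    # Ax = 3x
assert matvec(A, y) == [1, -1]   # Ay = 1*y

# Distinct eigenvalues => orthogonal eigenvectors.
assert sum(xi * yi for xi, yi in zip(x, y)) == 0
```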


A Useful Lemma

Lemma. Let A be a symmetric n × n matrix. Suppose that there are m < n eigenvectors {uk}_{k=1}^m which form an orthonormal set of column n-vectors, as well as the columns of an n × m matrix U.

Then there is at least one more eigenvector x ≠ 0 that satisfies U>x = 0 — i.e., it is orthogonal to each of the m eigenvectors uk.


Constructive Proof, Part 1

For each eigenvector uk, let λk be the associated eigenvalue, so that Auk = λk uk for k = 1, 2, . . . , m.

Then the n × m matrix U satisfies AU = UΛ, where Λ is the m × m diagonal matrix diag(λk)_{k=1}^m.

Also, because the eigenvectors {uk}_{k=1}^m form an orthonormal set, one has U>U = Im.

Hence U>AU = U>UΛ = Λ.

Also, transposing AU = UΛ and using the symmetry of both A and Λ gives U>A = ΛU>.


Constructive Proof, Part 2

Consider now the n × n matrix Â := (In − UU>)A(In − UU>).

Then Â is symmetric because both A and UU> are symmetric.

Note that, because AU = UΛ and so U>A = ΛU>, one has

Â = A − UU>A − AUU> + UU>AUU> = A − UΛU> − UΛU> + UΛU> = A − UΛU>

The symmetric matrix Â has at least one real eigenvalue λ.

Let x ≠ 0 be an associated real eigenvector, which must satisfy

Âx = (I − UU>)A(I − UU>)x = (A − UΛU>)x = λx

Pre-multiplying each side of the last equality by the m × n matrix U> shows that

λU>x = U>Ax − U>UΛU>x = ΛU>x − ΛU>x = 0m


Constructive Proof, Part 3

There are now two cases.

Consider first the generic case when the symmetric matrix Â = (I − UU>)A(I − UU>) has at least one eigenvalue λ ≠ 0.

Then there is a corresponding eigenvector x ≠ 0 of Â that satisfies λU>x = 0m, and so U>x = 0m.

But then the earlier equation

Âx = (I − UU>)A(I − UU>)x = (A − UΛU>)x = λx

implies that

Ax = (Â + UΛU>)x = Âx = λx

Hence x is an eigenvector of A as well as of Â.


Constructive Proof, Part 4

The remaining exceptional case occurs when the only eigenvalue of the symmetric matrix Â = A − UΛU> is λ = 0.

By part 3 of the lemma on the Rayleigh quotient, this implies that Â = 0, and so A = UΛU>.

Then any vector x ≠ 0 satisfying U>x = 0 must satisfy Ax = UΛU>x = 0.

This implies that x is an eigenvector of A associated with the eigenvalue λ = 0.

In both cases there is an eigenvector x of A, associated with the common eigenvalue λ of both A and Â, that satisfies U>x = 0m.

This completes the proof. □


Spectral Theorem

Theorem. Given any symmetric n × n matrix A:

1. the set of all its eigenvectors spans the whole of Rn;

2. there exists an orthogonal matrix P that diagonalizes A, in the sense that P>AP = P−1AP is a diagonal matrix Λ whose elements are the eigenvalues of A, all real.

When A is allowed to be self-adjoint rather than symmetric, so the eigenvectors may be complex, the corresponding result is:

Given any self-adjoint n × n matrix A:

1. the set of all its eigenvectors spans the whole of Cn;

2. there exists a unitary matrix U that diagonalizes A, in the sense that U*AU = U−1AU is a diagonal matrix Λ whose elements are the eigenvalues of A, all real.

We give a proof for the case when A is symmetric.


Proof of Spectral Theorem, Part 1

The symmetric matrix A has at least one eigenvalue λ, which must be real.

The associated eigenvector x, normalized so that x>x = 1, forms an orthonormal set {u1} of just one vector.

As the induction hypothesis, suppose that there are m < n eigenvectors {uk}_{k=1}^m which form an orthonormal set — i.e., u>j uk = δjk for all j, k = 1, 2, . . . , m.

We have just proved that this hypothesis holds for m = 1.

The “useful lemma” shows that, if the hypothesis holds for any m = 1, 2, . . . , n − 1, then it holds for m + 1.

So the result follows for m = n by induction.

In particular, when m = n, there exists an orthonormal setof n eigenvectors, which must then span the whole of Rn.


Proof of Spectral Theorem, Part 2

Also, by the previous result, we can take P to be an orthogonal matrix whose columns are an orthonormal set of n eigenvectors.

Then AP = PΛ.

So premultiplying by P> = P−1 gives P>AP = P−1AP = Λ. □
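To illustrate the conclusion numerically (an added sketch with an arbitrary 2 × 2 example), the normalized eigenvectors of A = [[2, 1], [1, 2]] form an orthogonal matrix P with P>AP = diag(3, 1):

```python
from math import sqrt

A = [[2.0, 1.0], [1.0, 2.0]]
s = 1.0 / sqrt(2.0)
# Columns of P are the orthonormal eigenvectors (1,1)/sqrt(2) and (1,-1)/sqrt(2).
P = [[s, s], [s, -s]]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

L = matmul(transpose(P), matmul(A, P))  # P'AP, should be diag(3, 1)

assert abs(L[0][0] - 3.0) < 1e-12 and abs(L[1][1] - 1.0) < 1e-12
assert abs(L[0][1]) < 1e-12 and abs(L[1][0]) < 1e-12  # off-diagonal entries vanish
```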



Definition of Quadratic Form

Definition. A quadratic form on the n-dimensional Euclidean space Rn is a mapping

Rn ∋ x ↦ q(x) = x>Qx = ∑_{i=1}^n ∑_{j=1}^n xi qij xj ∈ R

where Q is a symmetric n × n matrix.

The quadratic form x>Qx is diagonal just in case the matrix Q is diagonal, with Q = Λ = diag(λ1, λ2, . . . , λn).

In this case x>Qx reduces to x>Λx = ∑_{i=1}^n λi (xi)².


The Hessian Matrix of a Quadratic Form

Example

1. Given the quadratic form q(x, y) = (x, y) [ a b ; c d ] (x, y)>, show that, even if the matrix [ a b ; c d ] is not symmetric, its Hessian matrix of second-order partial derivatives is the symmetric matrix

[ q″xx q″xy ; q″yx q″yy ] = [ 2a b + c ; b + c 2d ]

2. Given the quadratic form defined for x ∈ Rn by q(x) = x>Ax, show that, even if A is not symmetric, its Hessian matrix (∂²q/∂xi∂xj)_{n×n} of second-order partial derivatives is the symmetric matrix A + A>.
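The first part of the example can be confirmed by finite differences; the sketch below (an illustrative addition, with arbitrary non-symmetric coefficients a, b, c, d) recovers the Hessian entries 2a, b + c, and 2d of q(x, y) = ax² + (b + c)xy + dy².

```python
a, b, c, d = 1.0, 4.0, 2.0, 3.0   # arbitrary, non-symmetric since b != c

def q(x, y):
    # (x, y) [[a, b], [c, d]] (x, y)'
    return a * x * x + b * x * y + c * y * x + d * y * y

h = 1e-3
x0, y0 = 0.3, -0.7

# Central second differences (exact for quadratics, up to rounding).
qxx = (q(x0 + h, y0) - 2 * q(x0, y0) + q(x0 - h, y0)) / (h * h)
qyy = (q(x0, y0 + h) - 2 * q(x0, y0) + q(x0, y0 - h)) / (h * h)
qxy = (q(x0 + h, y0 + h) - q(x0 + h, y0 - h)
       - q(x0 - h, y0 + h) + q(x0 - h, y0 - h)) / (4 * h * h)

assert abs(qxx - 2 * a) < 1e-6        # q''xx = 2a
assert abs(qyy - 2 * d) < 1e-6        # q''yy = 2d
assert abs(qxy - (b + c)) < 1e-6      # q''xy = b + c
```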


Symmetry Loses No Generality

Requiring Q in x>Qx to be symmetric loses no generality.

This is because, given a general non-symmetric n × n matrix A, the scalar x>Ax equals its own transpose, so repeated transposition implies that

x>Ax = (x>Ax)> = ½ [x>Ax + (x>Ax)>] = ½ x>(A + A>)x

Hence x>Ax = x>A>x = x>Qx, where Q is the symmetrized matrix ½(A + A>).

Note that Q is indeed symmetric because

Q> = ½(A + A>)> = ½ [A> + (A>)>] = ½(A> + A) = Q
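This symmetrization can be verified directly; the sketch below (an added illustration with an arbitrary non-symmetric 2 × 2 matrix) checks that x>Ax = x>Qx for Q = ½(A + A>).

```python
A = [[1.0, 4.0], [2.0, 3.0]]                      # non-symmetric
Q = [[(A[i][j] + A[j][i]) / 2 for j in range(2)]  # Q = (A + A')/2
     for i in range(2)]

def qform(M, x):
    """x'Mx for a 2x2 matrix M."""
    return sum(x[i] * M[i][j] * x[j] for i in range(2) for j in range(2))

assert Q == [[1.0, 3.0], [3.0, 3.0]]              # symmetrized matrix
for x in [(1.0, 0.0), (0.0, 1.0), (1.5, -0.5)]:
    assert abs(qform(A, x) - qform(Q, x)) < 1e-12  # same quadratic form
```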


Definiteness of a Quadratic Form

When x = 0, then x>Qx = 0. Otherwise:

Definition. The quadratic form Rn ∋ x ↦ x>Qx ∈ R is:

positive definite just in case x>Qx > 0 for all x ∈ Rn \ {0};

negative definite just in case x>Qx < 0 for all x ∈ Rn \ {0};

positive semi-definite just in case x>Qx ≥ 0 for all x ∈ Rn;

negative semi-definite just in case x>Qx ≤ 0 for all x ∈ Rn;

indefinite just in case there exist x+ and x− in Rn such that (x+)>Qx+ > 0 and (x−)>Qx− < 0.


Definiteness of a Diagonal Quadratic Form

Theorem. The diagonal quadratic form ∑_{i=1}^n λi (xi)² is:

positive definite if and only if λi > 0 for i = 1, 2, . . . , n;

negative definite if and only if λi < 0 for i = 1, 2, . . . , n;

positive semi-definite if and only if λi ≥ 0 for i = 1, 2, . . . , n;

negative semi-definite if and only if λi ≤ 0 for i = 1, 2, . . . , n;

indefinite if and only if there exist i, j ∈ {1, 2, . . . , n} such that λi > 0 and λj < 0.

Proof. The proof is left as an exercise.

The result is obvious if n = 1, and straightforward if n = 2.

Working out these two cases first suggests the proof for n > 2.



Diagonalizing Quadratic Forms

Consider a quadratic form Rn ∋ x ↦ x>Qx ∈ R where, without losing generality, we assume that the n × n matrix Q is symmetric.

By the spectral theorem for symmetric matrices, there exists a matrix P that diagonalizes Q, meaning that P−1QP is a diagonal matrix that we denote by Λ.

Moreover, P can be made orthogonal, meaning that P−1 = P>.

Given any x ≠ 0, because P−1 exists, we can define y = P−1x.

This implies that x = Py, where y ≠ 0 because (P−1)−1 = P.

Then x>Qx = y>P>QPy = y>Λy, so the diagonalization leads to a diagonal quadratic form.
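As a concrete check (an added sketch, reusing the arbitrary example Q = [[2, 1], [1, 2]] with eigenvalues 3 and 1), the change of variables y = P>x turns x>Qx into the diagonal form 3y1² + y2²:

```python
from math import sqrt

Q = [[2.0, 1.0], [1.0, 2.0]]   # symmetric; eigenvalues 3 and 1
s = 1.0 / sqrt(2.0)

def qform(M, v):
    return sum(v[i] * M[i][j] * v[j] for i in range(2) for j in range(2))

x = (0.7, -1.3)                              # arbitrary test point
# y = P'x, where the columns of P are (1,1)/sqrt(2) and (1,-1)/sqrt(2)
y = (s * (x[0] + x[1]), s * (x[0] - x[1]))

# x'Qx equals the diagonal quadratic form 3*y1^2 + 1*y2^2
assert abs(qform(Q, x) - (3.0 * y[0] ** 2 + 1.0 * y[1] ** 2)) < 1e-12
```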


The Eigenvalue Test of Definiteness

A standard result says that Q and its diagonalization P−1QP have the same set of eigenvalues.

From the theorem on the definiteness of a diagonal quadratic form, it follows that:

Theorem. The quadratic form x>Qx is:

positive definite if and only if all the eigenvalues of Q are positive;

negative definite if and only if all the eigenvalues of Q are negative;

positive semi-definite if and only if all the eigenvalues of Q are non-negative;

negative semi-definite if and only if all the eigenvalues of Q are non-positive;

indefinite if and only if Q has both positive and negative eigenvalues.
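For a 2 × 2 symmetric matrix [[a, h], [h, b]], the two eigenvalues are ((a + b) ± √((a − b)² + 4h²))/2, so the eigenvalue test is easy to apply; the sketch below (an added illustration with two arbitrary examples) classifies each matrix.

```python
from math import sqrt

def eig2(a, h, b):
    """Eigenvalues (smallest, largest) of the symmetric matrix [[a, h], [h, b]]."""
    t = a + b
    d = sqrt((a - b) ** 2 + 4 * h * h)
    return ((t - d) / 2, (t + d) / 2)

# [[2, 1], [1, 2]]: eigenvalues 1 and 3, both positive => positive definite.
lo, hi = eig2(2.0, 1.0, 2.0)
assert lo > 0 and hi > 0

# [[1, 2], [2, 1]]: eigenvalues -1 and 3 => indefinite.
lo2, hi2 = eig2(1.0, 2.0, 1.0)
assert lo2 < 0 < hi2
```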


Concavity or Convexity of a Quadratic Form

Theorem. As a function of x, the quadratic form x>Qx is:

strictly convex if and only if it is positive definite;

strictly concave if and only if it is negative definite;

convex if and only if it is positive semi-definite;

concave if and only if it is negative semi-definite.

Finally, x>Qx is neither concave nor convex if and only if it is indefinite.



The Case of a Quadratic Form in Two Variables

The general quadratic form in 2 variables is

(x, y) [ a h ; h b ] (x, y)> = ax² + 2hxy + by²

If it is positive definite, it is positive whenever x ≠ 0 and y = 0, implying that a > 0.

If a > 0, completing the square implies that ax² + 2hxy + by² = a(x + hy/a)² + (b − h²/a)y².

Given that a > 0, this is positive definite if and only if b > h²/a, or if and only if ab − h² = det [ a h ; h b ] > 0.

For the case of 2 variables, this proves Sylvester’s Criterion: the real symmetric matrix A is positive definite if and only if all the leading principal minors of A are positive.
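The completed-square identity above is an algebraic identity in x and y; the sketch below (an added check with arbitrary coefficients satisfying a > 0 and ab − h² > 0) confirms it at a few test points, and that the form stays positive there.

```python
def quad(a, h, b, x, y):
    return a * x * x + 2 * h * x * y + b * y * y

def completed(a, h, b, x, y):
    # a(x + hy/a)^2 + (b - h^2/a) y^2, valid whenever a != 0
    return a * (x + h * y / a) ** 2 + (b - h * h / a) * y * y

a, h, b = 3.0, 1.0, 2.0   # a > 0 and ab - h^2 = 5 > 0, so positive definite
for x, y in [(1.0, 0.0), (0.0, 1.0), (0.4, -1.1), (-2.0, 3.0)]:
    assert abs(quad(a, h, b, x, y) - completed(a, h, b, x, y)) < 1e-9
    assert quad(a, h, b, x, y) > 0   # positive at these nonzero points
```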


The Case of a Diagonal Quadratic Form

The general diagonal quadratic form in n variables is x>Λx, where x is an n-vector and Λ is the n × n diagonal matrix diag(λ1, . . . , λn).

Then the quadratic form x>Λx = ∑_{i=1}^n λi (xi)²:

1. is positive definite if and only if λi > 0 for i = 1, 2, . . . , n.

This is true if and only if the k-fold product ∏_{i=1}^k λi is positive for k = 1, 2, . . . , n.

But ∏_{i=1}^k λi = |diag(λ1, . . . , λk)| is the leading principal minor of order k for the diagonal matrix Λ = diag(λ1, . . . , λn).

2. is positive semi-definite if and only if λi ≥ 0 for i = 1, 2, . . . , n.

This is true if and only if the product ∏_{i∈K} λi is non-negative for every non-empty K ⊆ Nn = {1, 2, . . . , n}.

But each product ∏_{i∈K} λi is a principal minor for the diagonal matrix Λ = diag(λ1, . . . , λn).


Sylvester’s Criterion: General Statement

Theorem (Sylvester’s criterion). Given the symmetric matrix A, for each k = 1, . . . , n, and for each non-empty subset K ⊆ Nn with k = #K: let Dk denote the leading principal minor of order k, and let ∆K_k denote an arbitrary principal minor of order k (the one determined by K).

Then the quadratic form x>Ax is:

positive definite ⇐⇒ Dk > 0 for all k = 1, . . . , n

positive semi-definite ⇐⇒ ∆K_k ≥ 0 for all minors ∆K_k of any order k

negative definite ⇐⇒ (−1)^k Dk > 0 for all k = 1, . . . , n

negative semi-definite ⇐⇒ (−1)^k ∆K_k ≥ 0 for all minors ∆K_k of any order k

Note that the necessary and sufficient conditions for A to be negative (semi-)definite are exactly those for −A to be positive (semi-)definite.


Necessity of Sylvester’s Criterion

Necessity for a Positive (Semi-)Definite Matrix.

Given any non-empty subset K ⊆ Nn with k = #K, let AK denote the k × k matrix (aij)_{(i,j)∈K×K}.

Then, given any n-vector x, let xK denote (xj)_{j∈K} and let x−K denote (xj)_{j∉K}.

In case A is positive definite, whenever xK ≠ 0 and x−K = 0, then x>Ax = (xK)>AK xK > 0, so AK is positive definite.

Because the determinant of a symmetric matrix is the product of its eigenvalues, all of which are positive when the matrix is positive definite, it follows that |AK| > 0.

This holds in particular when K = Nk, in which case |AK| = Dk.

In case A is positive semi-definite, the same argument shows that AK is positive semi-definite, and so ∆K_k = |AK| ≥ 0.
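The necessity argument is easy to see numerically; the sketch below (an added illustration, reusing the arbitrary positive definite matrix [[2, 1], [1, 2]] with eigenvalues 1 and 3) checks that both leading principal minors are positive, and that D2 equals the product of the eigenvalues.

```python
A = [[2, 1], [1, 2]]   # positive definite: eigenvalues 1 and 3

D1 = A[0][0]                                 # leading principal minor of order 1
D2 = A[0][0] * A[1][1] - A[0][1] * A[1][0]   # det A = leading minor of order 2

assert D1 > 0 and D2 > 0    # Sylvester: both leading principal minors positive
assert D2 == 1 * 3          # determinant = product of eigenvalues
```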


Sufficiency of Sylvester’s Criterion

Apart from the 2 × 2 and diagonal cases, the proof of sufficiency is much more challenging!


An Improved (?) Eigenvalue Test of Definiteness

When restricted to the unit sphere Sn−1, the Rayleigh quotient Rn \ {0} ∋ x ↦ f(x) = (x>Qx)/(x>x) ∈ R reduces to the quadratic form x>Qx, whose:

1. maximum value is the largest eigenvalue λ^* of Q;

2. minimum value is the smallest eigenvalue λ_* of Q.

It follows that:

Theorem. The quadratic form x>Qx is:

positive definite iff λ_* > 0;

negative definite iff λ^* < 0;

positive semi-definite (but not positive definite) iff λ_* = 0;

negative semi-definite (but not negative definite) iff λ^* = 0;

indefinite if and only if λ^* > 0 > λ_*;

identically zero if and only if λ^* = λ_* = 0.


Envoi

See you, at least online, on Tuesday September 29th.

Enjoy the next five days of lectures with my colleague Pablo Beker!
