
Introduction to Linear Algebra, Second Edition, Serge Lang

Chapter I: Vectors

R^n defined.

Addition and scalar multiplication in R^n.

Two geometric interpretations for a vector: point and displacement.

As a point: place a dot at the coordinates.

As a displacement of a point: If the point is A and the displacement is B, the displaced point is A + B.

Addition of displacements and scalar multiplication of displacements: algebraic definition.

Addition of displacements and scalar multiplication of displacements: geometric definition.

Sum of displacements: interpret as the first displacement followed by the second displacement. To form A − B, draw an arrow from the endpoint of B to the endpoint of A.

Every point can be thought of as a displacement from the origin.

Every pair of points A, B gives rise to a displacement from A to B: the vector →AB.

Coordinates of the displacement: B − A.

Two displacements A and B are parallel if A = cB for some c ≠ 0. Reason: same slope. Same direction if c > 0, opposite directions if c < 0.

The quadrilateral produced by two displacements is a parallelogram.

We will refer to objects with coordinates as vectors.

Norm of a vector: square root of the sum of the squares of its coordinates. Produces the length of a vector in R^2 and R^3 by the Pythagorean Theorem.

When are two displacements perpendicular in R^3? The Pythagorean Theorem implies a_1b_1 + a_2b_2 + a_3b_3 = 0.

The Law of Cosines yields a_1b_1 + a_2b_2 + a_3b_3 = ||A|| ||B|| cos θ.

Scalar product of vectors: A · B = a_1b_1 + · · · + a_nb_n.

Properties: page 13.


Two vectors defined to be orthogonal if A · B = 0. Agrees with perpendicular in low dimensions.

Law of Cosines yields A ·B = ||A||||B|| cos θ.

Distance between two points: norm of the displacement between them.

Circles, spheres, open and closed discs, open and closed balls.

General Pythagorean Theorem: When two vectors are orthogonal, ||A + B||^2 = ||A||^2 + ||B||^2.

Proof: You can either use coordinates or properties of the dot product.

Orthogonal projection of A onto B, producing P: P = cB for some c. We require A − cB ⊥ B, hence (A − cB) · B = 0. This yields

c = (A · B)/(B · B).

c is called the component of A along B and is a number.
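As a quick numerical sketch of this formula (assuming NumPy; the two vectors are arbitrary examples, not from the text), the component c = (A · B)/(B · B) and the projection P = cB can be computed directly:

```python
import numpy as np

A = np.array([1.0, 2.0, 3.0])
B = np.array([2.0, 0.0, 1.0])

c = np.dot(A, B) / np.dot(B, B)    # component of A along B
P = c * B                          # orthogonal projection of A onto B

# A - P should be perpendicular to B (dot product ~ 0)
print(c, P, np.dot(A - P, B))
```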

Unit vectors: E_i. The component of A along E_i is a_i.

Schwarz Inequality: In R^n, |A · B| ≤ ||A|| ||B||.

Proof: Apply the Pythagorean Theorem to A = (A − cB) + cB to derive ||A||^2 ≥ c^2 ||B||^2. Multiply through by ||B||^2 and simplify.

Note: We knew this already in low dimensions using A · B = ||A|| ||B|| cos θ, but θ is not defined in high dimensions. But Schwarz implies that

−1 ≤ (A · B)/(||A|| ||B||) ≤ 1,

so this number is equal to cos θ for a unique θ ∈ [0, π], so we define θ in high dimensions using the formula

θ = cos^{-1}((A · B)/(||A|| ||B||)).

Triangle Inequality: ||A+B|| ≤ ||A||+ ||B||.

Proof: Compute the square norm of A + B using the dot product, then apply the Schwarz Inequality.


Lines: The equation of a line is y = mx + b. The y-intercept is (0, b) and the slope is m, so every time you run 1 you rise m. Every time you run t you rise mt. This brings you to (t, mt + b). The point corresponding to t is P(t) = (t, mt + b).

Geometric interpretation: Initial point is A = (0, b) and displacement is B = (1, m). So P(t) = A + tB.

Parametric equation of the line through A with displacement B: P(t) = A + tB. This yields equations for x(t) and y(t).

Recovering the equation satisfied by the coordinates on a parametric line: (x, y) = (a_1, a_2) + t(b_1, b_2). Now solve for y in terms of x.

Slope of P(t) = A + tB: ratio of the coordinates in B.

Equation of the line starting at A when t = 0 and ending at B when t = 1: P(t) = A + t(B − A). Also written P(t) = (1 − t)A + tB.

Note: when the equation is written this way, the distance from A to P(t) is t||B − A||. Since the distance between A and B is ||B − A||, t measures what fraction of the way you have traveled. Midpoint: use t = 1/2. One-third of the way there: use t = 1/3.

Given x(t) and y(t), one can either write y in terms of x or write (x(t), y(t)) = A + tB and figure out the slope. Since the line passes through A, the equation is y − a_2 = m(x − a_1).

Planes: A plane in R^3 is determined by 3 non-collinear points. Typical point in the plane through A, B, C: Starting at A, one can move in the direction from A to B, in the direction from A to C, and in any combination of these. So the typical point is P(s, t) = A + s(B − A) + t(C − A).

Example: If A = (1, 1, 1), B = (2, 3, 3), C = (5, 4, 7) then P(s, t) = (1, 1, 1) + s(1, 2, 2) + t(4, 3, 6) = (1 + s + 4t, 1 + 2s + 3t, 1 + 2s + 6t). We obtain parametric equations

x(s, t) = 1 + s + 4t,  y(s, t) = 1 + 2s + 3t,  z(s, t) = 1 + 2s + 6t.

Getting an equation out of this: Solve for s and t in terms of x and y, then express z in terms of x and y. This yields

s = (−1 − 3x + 4y)/5,  t = (−1 + 2x − y)/5,  z = (−3 + 6x + 2y)/5.

Normalized, the equation of the plane is

6x+ 2y − 5z = 3.


Using (1, 1, 1) as a solution we also get the equation

6(1) + 2(1)− 5(1) = 3.

Subtracting, we obtain

6(x− 1) + 2(y − 1)− 5(z − 1) = 0.

Generalizing this: the general equation of a plane is ax + by + cz = d. Assuming that it passes through the point (x_0, y_0, z_0), another equation is a(x − x_0) + b(y − y_0) + c(z − z_0) = 0. We call this the standard equation of the plane.

Geometrically: Let N = (a, b, c) and let Q = (x − x_0, y − y_0, z − z_0). Then N · Q = 0, so N and Q are perpendicular. The plane can be described as all points (x, y, z) such that (x − x_0, y − y_0, z − z_0) is perpendicular to (a, b, c). N is called the normal vector.

Example: find the equation of the plane through (1, 2, 3) and perpendicular to (4, 5, 6). Solution: (4, 5, 6) · (x − 1, y − 2, z − 3) = 0.

Finding a, b, c: consider the example A = (1, 1, 1), B = (2, 3, 3), C = (5, 4, 7) again. Two displacements in the plane are B − A = (1, 2, 2) and C − A = (4, 3, 6). So we want (a, b, c) · (1, 2, 2) = 0 and (a, b, c) · (4, 3, 6) = 0. Solving for a and b in terms of c we obtain a = −6c/5 and b = −2c/5. There are infinitely many choices for (a, b, c). Choosing the one where c = 5 we obtain (a, b, c) = (−6, −2, 5). This is consistent with the previous method of solution, but faster.
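A small numerical check of this example (a sketch assuming NumPy; it uses the cross product as a shortcut for solving the two dot-product equations, since the cross product of two vectors in R^3 is always orthogonal to both):

```python
import numpy as np

A = np.array([1, 1, 1])
B = np.array([2, 3, 3])
C = np.array([5, 4, 7])

N = np.cross(B - A, C - A)                 # a normal vector to the plane
print(N)                                   # [ 6  2 -5], a multiple of (-6, -2, 5)
print(np.dot(N, B - A), np.dot(N, C - A))  # both 0
d = np.dot(N, A)                           # plane equation N . X = d
print(d)                                   # 3, i.e. 6x + 2y - 5z = 3
```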

Angle between planes: defined to be the angle between the normal vectors.

Planes are parallel when their normal vectors are parallel.

Projection of the point Y onto the plane N · (X − X_0) = 0: We seek the vector X such that X is in the plane and Y − X is parallel to N. This yields the equations

N · (X −X0) = 0, Y −X = αN.

Substituting X = Y − αN in the first equation yields

N · (Y − αN −X0) = 0.


Solving for α yields

α = N · (Y − X_0) / (N · N).

Therefore

X = Y − [N · (Y − X_0) / (N · N)] N.
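A numerical sketch of these formulas (assuming NumPy; the plane and the point Y are arbitrary illustrative choices):

```python
import numpy as np

N  = np.array([6.0, 2.0, -5.0])    # normal vector of the plane N . (X - X0) = 0
X0 = np.array([1.0, 1.0, 1.0])     # a point on the plane
Y  = np.array([4.0, -2.0, 7.0])    # the point to project

alpha = np.dot(N, Y - X0) / np.dot(N, N)
X = Y - alpha * N                  # projection of Y onto the plane

print(X)
print(np.dot(N, X - X0))           # ~ 0, so X lies on the plane
print(np.linalg.norm(Y - X))       # distance from Y to the plane
```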

Distance from the point Y to the plane N · (X − X_0) = 0: This is defined to be the distance between Y and the projection of Y onto the plane. Since Y − X = αN and α = N · (Y − X_0)/(N · N), the distance is

|α| ||N|| = |N · (Y − X_0)| / ||N|| = ||Y − X_0|| |cos θ|,

where θ is the angle between Y − X_0 and N.

Remark: Let's call the projection of Y onto the plane the point P. We claim that P is the point in the plane closest to Y. Reason: Let X be any point in the plane. Then Y − X = (Y − P) + (P − X). By construction, Y − P is parallel to N. Since N is orthogonal to any displacement in the plane and P − X is a displacement in the plane, Y − P is orthogonal to P − X. By the Pythagorean Theorem, this implies

||(Y − P) + (P − X)||^2 = ||Y − P||^2 + ||P − X||^2.

Hence ||Y − X||^2 ≥ ||Y − P||^2.

In other words, the distance from Y to any arbitrary X in the plane is ≥ the distance from Y to P.

Chapter 2: Matrices and Linear Equations

Matrix: Another kind of vector, since it has coordinates. So addition and scalar multiplication are defined.

Matrix multiplication: the ij entry of AB is R_i · C_j, where the row decomposition of A is (R_1, . . . , R_m) and the column decomposition of B is (C_1, . . . , C_n). The number of coordinates in the rows of A must match the number of coordinates in the columns of B.

Formula for c_ij given AB = C.


Let A be a matrix and let X be a column vector. Then

AX = x_1 A^1 + · · · + x_n A^n,

where A^1, . . . , A^n are the columns of A.

Transforming a system of equations into a matrix equation: see the example on page 49. Write as xA^1 + yA^2 + zA^3 = B and re-write as AX = B.

Application: formula for rotation of a vector about the origin. Input vector (x, y)^T, output vector (x_θ, y_θ)^T. The relationship is

(x_θ, y_θ)^T = R(θ) (x, y)^T,

where R(θ) = [cos θ −sin θ; sin θ cos θ].
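A short sketch (assuming NumPy) of applying the rotation matrix to a vector, including the composition rule used below:

```python
import numpy as np

def R(theta):
    """2x2 rotation matrix through angle theta (radians)."""
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

v = np.array([1.0, 0.0])
print(R(np.pi / 2) @ v)            # ~ [0, 1]: a quarter turn counterclockwise
print(R(0.3) @ R(0.4) @ v)         # same as R(0.7) @ v
print(R(0.7) @ v)
```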

More matrix algebra:

1. Let the columns of B be (B^1, B^2, . . . , B^n). Then AB = (AB^1, AB^2, . . . , AB^n).

2. Elementary column vector: E_i. It satisfies AE_i = A^i where A^i is column i of A.

3. Identity matrix: I = (E1, E2, . . . , En). It satisfies AI = A by #2.

4. Distributive and associative laws: see page 53. Distributive law followsfrom dot product properties. Associative law can be done using brute force.

5. A property not satisfied: commutativity. Just do an example.

Invertible matrix: A is invertible if it is square and there exists a square matrix B such that AB = BA = I. Notation: A^{-1}.

Rotation matrices are invertible: First note that R(α)R(β)X = R(α + β)X for all column vectors X, in particular for X = E_i. So R(α)R(β) and R(α + β) have the same columns and are equal. This implies that R(α) has inverse R(−α).

Solving an equation of the form AX = B is easy if we know that A is invertible: X = A^{-1}B.

Not all square matrices are invertible: the zero matrix for example.

Homogeneous system of equations: a matrix equation of the form AX = 0 where X is a column vector.


Theorem: When there are more variables than equations in a homogeneous system of equations, there are infinitely many solutions.

Proof: By induction on the number of equations. One equation: true. Now assume the result holds for n homogeneous equations with more than n variables. Consider n + 1 equations in more than n + 1 variables. Take the first equation, express one of the variables in terms of the others, then substitute this into the remaining n equations. The remaining n equations involve more than n variables, so they have infinitely many solutions. Each of these, together with the resulting value of the expressed variable, also satisfies the first equation, so the full system has infinitely many solutions.

Corollary: More variables than equations in a homogeneous system guarantees at least one non-trivial solution.

Application to vectors: say that vectors A_1, . . . , A_k are linearly dependent if there is a non-trivial solution to x_1A_1 + · · · + x_kA_k = 0. Then any n + 1 vectors in R^n are linearly dependent. Reason: more variables than equations.

Application to square matrices, treated as vectors: any n^2 + 1 matrices of size n × n are linearly dependent.

Solving AX = B using Gaussian elimination: First, represent the system in augmented matrix form. Second, use the following elementary transformations, which do not change the solution set: swap equations, multiply an equation by a non-zero number, add one equation to another. Most importantly, add a multiple of a given row to another. Leading term in a row: the first non-zero coefficient. Pivoting on a leading term: adding multiples of the row it is in to other rows to get zeros in the column it is in. Iterate this procedure from top to bottom so that the surviving non-zero rows have different leading term columns (row echelon form). The variables fall into two categories: leading and slack. Slack variables can be assigned arbitrary values. Use back-substitution to get the leading variables expressed in terms of the slack variables.
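The following is a minimal sketch of the elimination step just described (assuming NumPy; it only produces a row echelon form of the augmented matrix, leaving back-substitution aside, and the sample system is an illustrative choice):

```python
import numpy as np

def row_echelon(M):
    """Reduce the augmented matrix M to a row echelon form by pivoting."""
    M = M.astype(float)
    rows, cols = M.shape
    r = 0
    for c in range(cols - 1):                  # last column is the augmented side
        pivot = None
        for i in range(r, rows):
            if abs(M[i, c]) > 1e-12:
                pivot = i
                break
        if pivot is None:
            continue                           # no leading term in this column
        M[[r, pivot]] = M[[pivot, r]]          # swap the pivot row up
        for i in range(r + 1, rows):           # clear the entries below the pivot
            M[i] -= (M[i, c] / M[r, c]) * M[r]
        r += 1
    return M

# x + y + z = 6,  2y + 5z = -4,  2x + 5y - z = 27
aug = np.array([[1, 1, 1, 6], [0, 2, 5, -4], [2, 5, -1, 27]])
print(row_echelon(aug))
```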

In a homogeneous system with more variables than equations, there will be at least one slack variable, so there will be an infinite number of solutions.

Application to finding the inverse of a matrix: We wish to solve AB = I. In other words, (AB^1, AB^2, . . . , AB^n) = (E_1, E_2, . . . , E_n). Consider solving AB^1 = E_1. Compare to solving AB^2 = E_2. The coefficient matrix is the same; all that changes is the augmented column. Do these simultaneously. If there is an inverse, we should be able to continue until the coefficient matrix looks like I, in which case the augmented side can be read off as B.


Matrix units: E_ab. Properties:

1. E_ab E_cd = 0 if b ≠ c, and E_ab E_bc = E_ac.

2. E_pp A zeros out all rows except row p in A.

3. E_pq A zeros out all rows except row q in A and moves it to row p.

4. A − E_pp A − E_qq A + E_pq A + E_qp A = (I − E_pp − E_qq + E_pq + E_qp) A swaps rows p and q in A.

5. A − E_pp A + x E_pp A = (I − E_pp + x E_pp) A multiplies row p of A by x.

6. A + x E_pq A = (I + x E_pq) A adds x copies of row q in A to row p of A.

The matrices in 4, 5, 6 mimic elementary row operations. They are invertible, since elementary row operations can be undone.

Theorem: If AB = I then BA = I.

Proof: We have seen the procedure for finding B such that AB = I: perform elementary row operations on [A|I] until it becomes [I|B]. We can see that every operation applied to A is also applied to I. So if E_n E_{n−1} · · · E_1 A = I then E_n E_{n−1} · · · E_1 I = B. But this says BA = I.

The last section can be read by students and skipped in lecture since we have covered the topics above.

Chapter 3: Vector Spaces

Vector Space: any set behaving like R^n. The required properties are listed on page 89.

Examples: matrices, polynomials.

Subspace of a vector space: a subset of a vector space which is closed with respect to vector addition and scalar multiplication.

Examples: solutions to a homogeneous system of equations, upper-triangular matrices, polynomials with a given root.

More examples: line through origin, plane through origin.

Intersection and sum of subspaces produces new subspace.

Skip Section 3, Convex Sets.

Linear combinations.


The set of linear combinations of v_1, . . . , v_k produces a subspace W. (They span the subspace.)

Linearly independent vectors: the opposite of linearly dependent vectors, i.e. the only linear combination of them that produces 0 is the trivial one.

Do examples of linearly independent vectors, including trig and exponentialfunctions.

Basis for a vector space: a set of linearly independent vectors that span the vector space. Basis for a subspace: same idea.

Examples: basis for R^n. Basis for the solution space of a homogeneous system of equations. Basis for the polynomials of degree ≤ 3 with 1 as a root.

Theorem 4.2, p. 107: The coefficients expressing a vector as a linear combination of basis vectors are unique.

Every spanning set yields a basis. Reason: If the spanning set is already linearly independent, you have a basis. But if there is a non-trivial linear combination of them that produces 0, you can discard one of the vectors appearing in it with a non-zero coefficient. Keep on going until what you have left is linearly independent. This is essentially Theorem 5.3.

Definition: when a vector space has a basis of n vectors, we say that the dimension of the vector space is n.

Problem with this definition: it seems to imply that every basis has the same number of vectors in it.

Question: can a vector space have bases of different sizes?

Answer: no.

Proof: Suppose that there is a basis (u_1, . . . , u_m) and another basis (v_1, . . . , v_n) where n > m. Express each v_i in terms of u_1, . . . , u_m:

v_1 = a_11 u_1 + · · · + a_1m u_m
v_2 = a_21 u_1 + · · · + a_2m u_m
· · ·
v_n = a_n1 u_1 + · · · + a_nm u_m.

Consider the coordinate vectors (a_11, . . . , a_1m), (a_21, . . . , a_2m), . . . , (a_n1, . . . , a_nm) in R^m. There are n of them, so they must be linearly dependent, with a non-trivial way to combine them into (0, . . . , 0) via (x_1, . . . , x_n). This implies

x_1 v_1 + · · · + x_n v_n = 0.


This cannot happen because the v_i are linearly independent. So you cannot have bases of different sizes.

This is Theorem 5.2.

Note: If you look at this proof carefully you see that it says that any n > m vectors in an m-dimensional space are linearly dependent. This is Theorem 5.1.

Every linearly independent set in a finite-dimensional vector space can be expanded to a basis. Reason: If the vectors already span the vector space, you have a basis. But if there is a vector outside the span, add it, and the larger set is still linearly independent. Keep on going. You must eventually arrive at a spanning set and basis, because above we showed that there is an upper limit to the number of linearly independent vectors you can produce. This is Theorem 5.7.

If V is a vector space of dimension n and W is a subspace then W has dimension k ≤ n. Reason: Find any non-zero vector in W. As before, keep on growing the list of linearly independent vectors. You cannot outrun the dimension n, so the process has to stop. This is Theorem 5.8.

If V has dimension n, any n linearly independent vectors in V form a basis. Reason: expand to a basis. This is Theorem 5.5. It also implies Theorem 5.6.

Row rank of a matrix: the dimension of its row space.

Column rank of a matrix: the dimension of its column space.

How to compute these: Every elementary row operation yields a matrix with the same row space. Reduce the matrix to reduced row echelon form and read off the dimension. Similarly, every elementary column operation yields a matrix with the same column space. Reduce the matrix to reduced column echelon form and read off the dimension.
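A quick numerical illustration (assuming NumPy; the matrix is an arbitrary example): the rank of A and of its transpose agree, reflecting the fact that row rank equals column rank.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [2, 4, 6],     # twice row 1
              [1, 0, 1]])

print(np.linalg.matrix_rank(A))      # 2  (row space dimension)
print(np.linalg.matrix_rank(A.T))    # 2  (column space dimension)
```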

Theorem: Let A be an m× n matrix. Then dim RS(A) = dim CS(A).

Proof: If we don't worry about the impact on the row space and the column space of A, we can always perform a series of elementary row operations followed by a series of elementary column operations so that the resulting matrix A′ has the very simple form depicted in Theorem 6.2 on page 118 of the textbook. One can see that both the row space and the column space of A′ have the same dimension r. All we need to do is to prove that the row space dimension of A is the same as the row space dimension of A′ and that the column space dimension of A is the same as the column space dimension of A′.

In class I proved that if a subset of columns of A forms a basis for the column space of A, then the corresponding columns of EA form a basis for the column space of EA, where E is an m × m elementary matrix representing elementary row operations on A. Therefore the column space dimension of EA is the same as the column space dimension of A. The argument was

Σ_i α_i (EA^i) = 0 ⟹ E(Σ_i α_i A^i) = 0 ⟹ E^{-1} E(Σ_i α_i A^i) = 0 ⟹ Σ_i α_i A^i = 0 ⟹ α_1 = α_2 = · · · = 0.

We also know that the row space dimension of EA is the same as the row space dimension of A because EA has the same row space as A. Summary: row operations on a matrix preserve both the row space dimension and the column space dimension. Similarly, since elementary column operations on A can be expressed as AF where F is an n × n elementary matrix representing column operations on A, AF has the same row space dimension and same column space dimension as A. So if A → A_1 → A_2 → · · · → A′ is a sequence of elementary row operations followed by a sequence of elementary column operations, all the row space dimensions and the column space dimensions are unchanged from what they are in A. Since they are the same in A′, they must be the same in A.

Note: if row or column swaps are involved, we must change the meaning of corresponding rows and columns accordingly.

Chapter 4: Linear Mappings

Linear mapping: A function T : V → W between two vector spaces that satisfies T(u + v) = T(u) + T(v) and T(cv) = cT(v).

Terminology: domain, range, image, inverse image, kernel.

A large source of examples: Tv = Av. Includes rotation and reflection.

Other examples: Among polynomials, multiplication by x and differentiation.


Another example: reflection across the plane in R^3 through the origin with normal vector (1, 2, 3). Formula: given input v, T(v) = v − (1/7)(v · (1, 2, 3))(1, 2, 3). Verify directly that this is linear.

Another example: projection onto the same plane: v ↦ v − (1/14)(v · (1, 2, 3))(1, 2, 3).
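A sketch of both maps (assuming NumPy; the test vector is arbitrary), checking that reflecting twice returns the input and that projecting twice is the same as projecting once:

```python
import numpy as np

n = np.array([1.0, 2.0, 3.0])                  # normal vector of the plane

def reflect(v):
    return v - (2.0 / np.dot(n, n)) * np.dot(v, n) * n   # 2/14 = 1/7

def project(v):
    return v - (1.0 / np.dot(n, n)) * np.dot(v, n) * n   # 1/14

v = np.array([4.0, -1.0, 2.0])
print(np.allclose(reflect(reflect(v)), v))            # True
print(np.allclose(project(project(v)), project(v)))   # True
print(np.dot(project(v), n))                          # ~ 0: the projection lies in the plane
```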

A linear map T : V → W is completely determined by where it sends a basis of V.

The range of T is the span of T(v_1), T(v_2), . . . , T(v_k) where v_1, . . . , v_k is a basis of V. Therefore im(T) is a subspace of W and dim(im(T)) ≤ dim(V).

The kernel of T is a subspace of the domain.

Use of the kernel: classifying all solutions to T(v) = b. If v_0 is one particular solution, the solution set is {v_0 + k : k ∈ kernel}. Example: solutions to an inhomogeneous system of equations. Example: solutions to the differential equation y′ = cos x. (The vector space is the set of differentiable functions, the linear transformation is differentiation, b = cos x, and the kernel consists of the constant functions.) See also Theorem 4.4, p. 148. (I have stated the more general result.)

A map is injective if it satisfies T(v) = T(v′) =⇒ v = v′. Reflection across a plane is one-to-one. Projection onto the plane is not one-to-one.

Criterion for injective linear map: kernel is trivial.

Theorem 3.1, p. 139: When T : V → W is injective, T sends linearly independent vectors to linearly independent vectors.

Exact relationship between dim(V) and dim(T(V)) for T : V → W: dim(V) = dim(kernel) + dim(image). (Theorem 3.2, p. 139)

Example: projection onto plane.

Proof: Find a basis for V of the right size. First, choose a basis for the image: w_1, . . . , w_i. Second, find v_1, . . . , v_i with T(v_j) = w_j for each j. They must be linearly independent. Let v′_1, v′_2, . . . , v′_k be a basis for the kernel. If we can just show that v_1, . . . , v_i, v′_1, . . . , v′_k is a basis for V, we are done.

Linearly independent: If a linear combination of them is zero, then the same linear combination of their images is 0. So the coefficients of the images w_1, . . . , w_i, and hence of v_1, . . . , v_i, are 0. That just leaves a linear combination of the kernel basis equal to zero, so all coefficients are 0.

Span: Choose any v ∈ V. Then T(v) ∈ span(w_1, . . . , w_i), therefore T(v) = T(Σ c_j v_j), therefore v − Σ c_j v_j is in the kernel, so v is in the span of the v_j and the v′_j vectors.


Example: projection.

Relation of the Rank-Nullity Theorem to matrices: Let A be an m × n matrix. It gives rise to a linear mapping T : R^n → R^m via T(v) = Av. The image of T is the column space of A. Therefore dim(image) = r where r is the rank of A. The kernel of T is the solution set to Av = 0, and we know that row operations on A produce r linearly independent rows and that there are n − r slack variables. This implies dim(kernel) = n − r. So we can see that dim(image) + dim(kernel) = n = dim(V).
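A numerical sketch of this bookkeeping (assuming NumPy; the matrix is an arbitrary example): rank plus nullity equals the number of columns.

```python
import numpy as np

A = np.array([[1, 2, 0, 1],
              [0, 1, 1, 2],
              [1, 3, 1, 3]])      # row 3 = row 1 + row 2

n = A.shape[1]                    # dimension of the domain R^n
r = np.linalg.matrix_rank(A)      # dim(image) = dim(column space)
nullity = n - r                   # dim(kernel) = number of slack variables

print(r, nullity, r + nullity == n)   # 2 2 True
```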

Geometric interpretation of the kernel of T when T(v) = Av: the set of vectors perpendicular to every row of A, i.e. to every vector in the row space of A. So if A has m rows and n columns, RS(A) is a subspace of R^n and ker(T) consists of its orthogonal complement. It has dimension n − r. For example, the equation of a plane through the origin is ax + by + cz = 0, so the vectors in the plane belong to the kernel of T defined by the matrix [a b c]. The rank is 1 and the nullity is 3 − 1 = 2. More generally, a hyperplane in R^n is the solution set to a_1x_1 + · · · + a_nx_n = 0. This corresponds to a 1 × n matrix, so the rank is 1 and the nullity is n − 1. In other words, a hyperplane has dimension n − 1. One can also try to compute the dimension of the intersection of m hyperplanes. This corresponds to the kernel of an m × n matrix. The dimension of the intersection has to be n − r.

The matrix associated with a linear map: If T : R^n → R^m is defined by T(v) = Av, then the matrix is A. If the matrix is not given, we can find it as follows:

Suppose that T(E_1) = v_1, T(E_2) = v_2, . . . , T(E_n) = v_n, vectors in R^m. Then by linearity

T((x_1, x_2, . . . , x_n)^T) = T(x_1E_1 + · · · + x_nE_n) = x_1v_1 + · · · + x_nv_n.

But this is exactly Av where the columns of A are v_1, . . . , v_n. Hence T(v) = Av and the matrix is A.

Example: projection onto a plane, reflection across a plane.

When T : V → W is a linear map but neither V nor W is in the form R^k, we can still find a matrix representation for T: Choose a basis {v_1, . . . , v_n} for V, choose a basis {w_1, . . . , w_m} for W, and identify each v ∈ V with a vector in R^n whose entries come from the unique way the basis produces v. Do the same for vectors in W. You can now identify T with a map S : R^n → R^m, and it has a matrix representation A. This represents T also, but it is only valid for the particular choice of bases.

Example: Let V = P_3 and let W = P_2 (polynomial vector spaces). Let T : P_3 → P_2 be given by T(p(x)) = p′(x). A basis for P_3 is {1, x, x^2, x^3}. A basis for P_2 is {1, x, x^2}. Since T sends the polynomial a_0 + a_1x + a_2x^2 + a_3x^3 to the polynomial a_1 + 2a_2x + 3a_3x^2, S sends the vector (a_0, a_1, a_2, a_3)^T to the vector (a_1, 2a_2, 3a_3)^T. The matrix representation is

A =
[ 0  1  0  0 ]
[ 0  0  2  0 ]
[ 0  0  0  3 ].

Example: Rotation through an angle θ about a directed line through the origin in R^3: If the line has direction vector (0, 0, 1) then we rotate (x, y) through θ and send z to z. This is represented by

A =
[ cos θ   −sin θ   0 ]
[ sin θ    cos θ   0 ]
[ 0        0       1 ].

The vector (x, y, z)^T is sent to the vector (x cos θ − y sin θ, x sin θ + y cos θ, z)^T.

But suppose instead we want to rotate about the line in the direction (1, 1, 1). We will find a new coordinate system in which (1, 1, 1) acts like the z-axis. The plane perpendicular to (1, 1, 1) through the origin is given by x + y + z = 0. The typical vector in this plane is (−a − b, a, b). We will find two perpendicular vectors in the plane. For the first one we choose (1, 0, −1). For the second one we choose a and b so that (1, 0, −1) · (−a − b, a, b) = 0. One choice is a = −2, b = 1, which yields (1, −2, 1). Dividing (1, −2, 1), (1, 0, −1), and (1, 1, 1) by their lengths √6, √2, and √3 we obtain

v_1 = (1/√6, −2/√6, 1/√6),  v_2 = (1/√2, 0, −1/√2),  v_3 = (1/√3, 1/√3, 1/√3).

(We want to rotate counterclockwise from v_1 to v_2.) The three vectors v_1, v_2, v_3 form an alternative coordinate system (basis) for R^3. Identifying the vector xv_1 + yv_2 + zv_3 with the vector (x, y, z)^T, the matrix representing rotation about the line through (1, 1, 1) is

A =
[ cos θ   −sin θ   0 ]
[ sin θ    cos θ   0 ]
[ 0        0       1 ].

So the vector xv_1 + yv_2 + zv_3 is sent to the vector (x cos θ − y sin θ)v_1 + (x sin θ + y cos θ)v_2 + zv_3. We can express this map in matrix form using matrix algebra.

Setting V = [v_1 | v_2 | v_3], this map can be described as

T(V (x, y, z)^T) = V A (x, y, z)^T.

Setting

(X, Y, Z)^T = V (x, y, z)^T,

we have

T((X, Y, Z)^T) = V A V^{-1} (X, Y, Z)^T.

Note that it is very easy to compute the inverse of this V because the columns have dot products which are all equal to 0 or 1. When θ = π/2 we obtain

V A V^{-1} =
[ 1/3           1/3 − 1/√3    1/3 + 1/√3 ]
[ 1/3 + 1/√3    1/3           1/3 − 1/√3 ]
[ 1/3 − 1/√3    1/3 + 1/√3    1/3        ].

One can check that this does send v_1 to v_2, v_2 to −v_1, and v_3 to v_3. For a general angle θ, we have

V A V^{-1} =
[ (1 + 2 cos θ)/3             (1 − cos θ − √3 sin θ)/3    (1 − cos θ + √3 sin θ)/3 ]
[ (1 − cos θ + √3 sin θ)/3    (1 + 2 cos θ)/3             (1 − cos θ − √3 sin θ)/3 ]
[ (1 − cos θ − √3 sin θ)/3    (1 − cos θ + √3 sin θ)/3    (1 + 2 cos θ)/3          ].
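A numerical sketch (assuming NumPy) that builds V from v_1, v_2, v_3, forms V A V^T (the columns are orthonormal, so V^{-1} = V^T), and checks the θ = π/2 behaviour described above:

```python
import numpy as np

v1 = np.array([1, -2, 1]) / np.sqrt(6)
v2 = np.array([1, 0, -1]) / np.sqrt(2)
v3 = np.array([1, 1, 1]) / np.sqrt(3)
V = np.column_stack([v1, v2, v3])

def A(theta):
    return np.array([[np.cos(theta), -np.sin(theta), 0],
                     [np.sin(theta),  np.cos(theta), 0],
                     [0, 0, 1]])

theta = np.pi / 2
M = V @ A(theta) @ V.T             # rotation by theta about the (1,1,1) axis

print(np.allclose(M @ v1, v2))     # True
print(np.allclose(M @ v2, -v1))    # True
print(np.allclose(M @ v3, v3))     # True
```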


Chapter 5: Composition of Linear Maps

Define composition. The composition is linear and associative. The matrix of a composition is the product of the matrices. Associativity of composition implies associativity of matrix multiplication. Look at Section 1 exercises.

A linear map has an inverse if unique inputs produce unique outputs and every vector in the codomain is the image of a vector in the domain. Injectivity can be detected using the kernel. Surjectivity can be determined using a dimension argument. When the linear map is given by a matrix, the dimension of the kernel is the number of slack variables and the dimension of the image is the dimension of the column space, which by the rank-nullity theorem is the number of columns minus the number of slack variables, i.e. the number of leading variables. See also Theorems 2.4 and 2.5. The inverse of a bijective linear map is linear and its matrix representation is the inverse matrix, assuming domain and codomain are Euclidean spaces of the same dimension. Look at Section 2 exercises.

Chapter 6: Scalar Products and Orthogonality

Let V be a vector space over F = R or F = C, finite or infinite-dimensional. An inner product on V is a function 〈·, ·〉 : V × V → F which satisfies the following axioms:

1. Positive-Definiteness: 〈v, v〉 ≥ 0 for all v ∈ V, and 〈v, v〉 = 0 if and only if v = 0_V.

2. Linearity in the first argument: 〈v + v′, w〉 = 〈v, w〉 + 〈v′, w〉 and 〈av, w〉 = a〈v, w〉 for all v, v′, w ∈ V and a ∈ F.

3. Conjugate Symmetry: 〈w, v〉 is the complex conjugate of 〈v, w〉 for all v, w ∈ V.

Inner-Product Space: A real or complex vector space V equipped with an inner product.

Note that axioms 2 and 3 imply 〈v, aw〉 = ā〈v, w〉 and 〈v, w + w′〉 = 〈v, w〉 + 〈v, w′〉 for all v, w, w′ ∈ V and a ∈ F, where ā denotes the complex conjugate of a.

Examples: The usual dot product on R^n, the generalized dot product on C^n, and the inner product on P([a, b]) defined by 〈f, g〉 = ∫_a^b f(x)g(x) dx.


Norm: ||v|| = √〈v, v〉. This satisfies ||av|| = |a| · ||v||, where |a| = √(a ā) is the absolute value (if real) or length (if complex).

Orthogonal vectors: u_1, . . . , u_n are mutually orthogonal iff 〈u_i, u_j〉 = 0 for all i ≠ j.

Orthonormal vectors: u_1, . . . , u_n are mutually orthonormal iff 〈u_i, u_j〉 = δ_ij for all i, j. In other words, they are mutually orthogonal and have length 1.

Orthonormal projection: Let u_1, . . . , u_n be mutually orthonormal. Let U = span(u_1, . . . , u_n). The linear operator P : V → U defined by Pv = Σ 〈v, u_i〉 u_i is called orthonormal projection onto U.

Properties of orthogonal and orthonormal vectors:

1. Mutually orthogonal non-zero vectors u_1, . . . , u_n are linearly independent.

Proof: Suppose Σ a_i u_i = 0_V. Taking the inner product with u_j we obtain

0 = 〈0_V, u_j〉 = 〈Σ a_i u_i, u_j〉 = Σ a_i 〈u_i, u_j〉 = a_j ||u_j||^2,

so a_j = 0 since u_j ≠ 0.

2. Let u_1, . . . , u_n be mutually orthogonal. Then ||Σ u_i||^2 = Σ ||u_i||^2. This is called the Pythagorean Theorem.

Proof: 〈Σ_i u_i, Σ_j u_j〉 = Σ_{i,j} 〈u_i, u_j〉 = Σ_i 〈u_i, u_i〉, since the terms with i ≠ j vanish.

3. Let u_1, . . . , u_n be mutually orthonormal. Then ||Σ a_i u_i|| = √(Σ |a_i|^2).

Proof: 〈Σ_i a_i u_i, Σ_j a_j u_j〉 = Σ_{i,j} a_i ā_j 〈u_i, u_j〉 = Σ_i a_i ā_i = Σ_i |a_i|^2.

4. Let u_1, . . . , u_n be mutually orthonormal. Let U = span(u_1, . . . , u_n). Then for any u ∈ U, u = Σ 〈u, u_i〉 u_i. In other words, u = Pu where P is orthonormal projection onto U. This also implies P^2 = P.

Proof: Write u = Σ a_i u_i. Then 〈u, u_j〉 = 〈Σ a_i u_i, u_j〉 = Σ a_i 〈u_i, u_j〉 = a_j.

Properties of orthonormal projection:

1. Let u_1, . . . , u_n be mutually orthonormal. Let U = span(u_1, . . . , u_n). Then for any v ∈ V and for any u ∈ U, v − Pv and u are orthogonal to each other, where P is orthonormal projection onto U.

Proof: For any j, 〈Pv, u_j〉 = 〈Σ 〈v, u_i〉 u_i, u_j〉 = Σ 〈v, u_i〉 〈u_i, u_j〉 = 〈v, u_j〉. Subtracting, 〈v − Pv, u_j〉 = 0. Since every u ∈ U is a linear combination of the u_j, it follows that 〈v − Pv, u〉 = 0.

2. Let u_1, . . . , u_n be mutually orthonormal. Let U = span(u_1, . . . , u_n). Then for any v ∈ V, the unique vector u ∈ U that minimizes ||v − u|| is Pv.


Proof: Let u ∈ U be given. Then we know that v − Pv and Pv − u are orthogonal to each other. By the Pythagorean Theorem, ||v − u||^2 = ||v − Pv||^2 + ||Pv − u||^2 ≥ ||v − Pv||^2, with equality iff ||Pv − u|| = 0 iff u = Pv.

Theorem: Every finite-dimensional subspace of an inner product space has an orthonormal basis.

Proof: Let V be the inner product space. Let U be a subspace of dimension n. We prove that U has an orthonormal basis by induction on n.

Base Case: n = 1. Let {u_1} be a basis for U. Then {u_1/||u_1||} is an orthonormal basis for U.

Induction Hypothesis: If U has dimension n then it has an orthonormal basis {u_1, . . . , u_n}.

Inductive Step: Let U be a subspace of dimension n + 1. Let {v_1, . . . , v_{n+1}} be a basis for U. Write U_n = span(v_1, . . . , v_n). By the induction hypothesis, U_n has an orthonormal basis {u_1, . . . , u_n}. Let P be orthonormal projection onto U_n. Then the vectors u_1, . . . , u_n, v_{n+1} − Pv_{n+1} are mutually orthogonal and form a basis for U. Setting

u_{n+1} = (v_{n+1} − Pv_{n+1}) / ||v_{n+1} − Pv_{n+1}||,

the vectors u_1, . . . , u_{n+1} form an orthonormal basis for U.

Remark: The proof of this last theorem provides an algorithm (Gram-Schmidt) for producing an orthonormal basis for a finite-dimensional subspace U: Start with any basis {v_1, . . . , v_n}. Set u_1 = v_1/||v_1||. This is an orthonormal basis for span(v_1). Having found an orthonormal basis {u_1, . . . , u_k} for span(v_1, . . . , v_k), one can produce an orthonormal basis for span(v_1, . . . , v_{k+1}) by appending the vector

u_{k+1} = (v_{k+1} − Pv_{k+1}) / ||v_{k+1} − Pv_{k+1}||,

where P is orthonormal projection onto span(u_1, . . . , u_k).
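A minimal sketch of this algorithm for vectors in R^n with the usual dot product (assuming NumPy; it omits the safeguards a robust version would need for nearly dependent inputs, and the sample vectors are arbitrary):

```python
import numpy as np

def gram_schmidt(vectors):
    """Return an orthonormal basis for the span of the given (independent) vectors."""
    basis = []
    for v in vectors:
        w = v.astype(float)
        for u in basis:
            w = w - np.dot(w, u) * u       # subtract the projection onto span(basis)
        basis.append(w / np.linalg.norm(w))
    return basis

vs = [np.array([1.0, 1.0, 0.0]), np.array([1.0, 0.0, 1.0]), np.array([0.0, 1.0, 1.0])]
us = gram_schmidt(vs)
G = np.array([[np.dot(a, b) for b in us] for a in us])
print(np.round(G, 6))                      # identity matrix: the output is orthonormal
```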

A Minimization Problem: Consider the problem of finding the best polynomial approximation p(x) ∈ P_5([−π, π]) of sin x, where by best we mean that

∫_{−π}^{π} (sin x − p(x))^2 dx

is as small as possible. To place this in an inner-product setting, we consider P_5([−π, π]) to be a subspace of C([−π, π]), where the latter is the vector space of continuous functions from [−π, π] to R. Then C([−π, π]) has inner product defined by 〈f, g〉 = ∫_{−π}^{π} f(x)g(x) dx. We are trying to minimize ||sin x − p(x)||^2. However, we know how to minimize ||sin x − p(x)||: p(x) = P(sin x) where P is orthogonal projection onto the finite-dimensional subspace P_5([−π, π]). The latter has basis

{1, x, x^2, x^3, x^4, x^5},

and Gram-Schmidt can be applied to produce an orthonormal basis

{u_0(x), u_1(x), u_2(x), u_3(x), u_4(x), u_5(x)}.

Therefore the best polynomial approximation is Σ α_i u_i(x) where

α_i = 〈sin x, u_i(x)〉 = ∫_{−π}^{π} sin x · u_i(x) dx.

The approximation to sin x given in the book on page 115 is

x/1.01229 − x^3/6.44035 + x^5/177.207,

in contrast to the Taylor polynomial

x − x^3/6 + x^5/120.
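The same projection can be computed without explicitly orthonormalizing, by solving the normal equations in the monomial basis. This is a sketch (assuming NumPy; the right-hand-side integrals are approximated on a fine grid), and the resulting coefficients should roughly match 1/1.01229 ≈ 0.98786, −1/6.44035 ≈ −0.15527, and 1/177.207 ≈ 0.00564:

```python
import numpy as np

deg = 5
# Gram matrix G[i, j] = integral of x^(i+j) over [-pi, pi] (exact formula)
G = np.array([[0.0 if (i + j) % 2 else 2 * np.pi ** (i + j + 1) / (i + j + 1)
               for j in range(deg + 1)] for i in range(deg + 1)])

# Right-hand side b[i] = integral of x^i * sin(x) over [-pi, pi], done numerically
xs = np.linspace(-np.pi, np.pi, 400001)
dx = xs[1] - xs[0]
b = np.array([np.sum(xs ** i * np.sin(xs)) * dx for i in range(deg + 1)])

coeffs = np.linalg.solve(G, b)     # coefficients of 1, x, ..., x^5
print(np.round(coeffs, 5))         # ~ [0, 0.98786, 0, -0.15527, 0, 0.00564]
```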

Cauchy-Schwarz Inequality: |〈u, v〉| ≤ ||u|| · ||v||.

Proof: Assume v ≠ 0 (the case v = 0 is clear). Project u onto v, yielding p = λv with λ = 〈u, v〉/〈v, v〉. We have u − p ⊥ p, therefore

||u||^2 = ||u − p + p||^2 = ||u − p||^2 + ||p||^2 ≥ ||p||^2 = |λ|^2 ||v||^2.

This yields ||u||^2 ||v||^4 ≥ |〈u, v〉|^2 ||v||^2, and dividing by ||v||^2 implies Cauchy-Schwarz.

Triangle Inequality: ||u+ v|| ≤ ||u||+ ||v||.


Proof: Square both sides and subtract the left-hand side from the right-hand side. The result is

2||u|| · ||v|| − 〈u, v〉 − 〈v, u〉 = 2||u|| · ||v|| − 2 Re 〈u, v〉 ≥ 2||u|| · ||v|| − 2|〈u, v〉| ≥ 0

by Cauchy-Schwarz.

The Orthogonal Complement of a Subspace: Let V be a finite-dimensional inner-product space and let U be a subspace. We define

U^⊥ = {v ∈ V : 〈v, u〉 = 0 for all u ∈ U}.

We can construct U^⊥ explicitly as follows: Let {u_1, . . . , u_k} be an orthonormal basis for U. Expand to an orthonormal basis {u_1, . . . , u_n} for V using Gram-Schmidt. The vectors in span(u_{k+1}, . . . , u_n) are orthogonal to the vectors in U. Moreover, for any v ∈ U^⊥, the coefficients of v in terms of the orthonormal basis are the inner products of v with each basis vector, which places v ∈ span(u_{k+1}, . . . , u_n). Therefore U^⊥ = span(u_{k+1}, . . . , u_n). This immediately implies that (U^⊥)^⊥ = span(u_1, . . . , u_k) = U. Note also that V = U ⊕ U^⊥. To decompose a vector in V into something in U plus something in U^⊥ we can use v = Pv + (v − Pv).

Chapter 7: Determinants

Prove directly that the 2×2 determinant has the following properties: det(I) = 1, and, as a function of the columns, det(A^1, A^2) is multilinear and skew-symmetric. In particular, the determinant of a matrix with repeated columns is 0. Moreover, det(AB) = det(A) det(B). Proof of the last statement: The fact that the determinant is skew-symmetric implies that the determinant is zero when there is a repeated column. AB = C has columns C^1 = b_11 A^1 + b_21 A^2 and C^2 = b_12 A^1 + b_22 A^2, therefore

det(AB) = det(b_11 A^1 + b_21 A^2, b_12 A^1 + b_22 A^2) = b_11 b_22 det(A^1, A^2) − b_21 b_12 det(A^1, A^2) = det(B) det(A).

Define the n × n determinant recursively and state that it also has the same properties as above.

Theorem: When the columns of a matrix are linearly dependent, the determinant is 0.


Proof: Expand one column in terms of the others, compute the determinant using multilinearity, and note that all resulting terms have a repeated column and are therefore zero.

Theorem: When the columns of a matrix are linearly independent, the determinant is not 0.

Proof: The linear map defined by the matrix is invertible, so the map has an inverse, so the matrix has an inverse. The determinant of the product is 1, so each determinant is non-zero.

Cramer's Rule: Suppose Ax = b. Then b = x_1A^1 + · · · + x_nA^n. The determinant of the matrix whose columns are A^1, . . . , b, . . . , A^n, where the replacement is in column i, is x_i det(A). So x_i is this determinant divided by det(A).
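A sketch of Cramer's rule (assuming NumPy; the 2×2 system is an arbitrary example), replacing each column of A by b in turn:

```python
import numpy as np

def cramer(A, b):
    """Solve Ax = b by Cramer's rule (requires det(A) != 0)."""
    d = np.linalg.det(A)
    x = np.empty(len(b))
    for i in range(len(b)):
        Ai = A.copy()
        Ai[:, i] = b                 # replace column i of A by b
        x[i] = np.linalg.det(Ai) / d
    return x

A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([5.0, 10.0])
print(cramer(A, b))                  # [1. 3.]
print(np.linalg.solve(A, b))         # same answer
```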

Chapter 8: Eigenvalues and Eigenvectors

Let A be an n × n matrix of real numbers. We say that a non-zero vector v is an eigenvector of A if there is a number λ such that Av = λv. How to find them: In matrix terms, we are solving Av = λIv, i.e. (A − λI)v = 0. This says that the columns of A − λI are linearly dependent, which implies that det(A − λI) = 0. Expand this in terms of the unknown λ, then solve for λ, then go back and calculate v.
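In practice the characteristic-polynomial computation is usually delegated to a library; a minimal sketch (assuming NumPy; the matrix is an arbitrary symmetric example):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)   # columns of eigenvectors are the v's
for lam, v in zip(eigenvalues, eigenvectors.T):
    print(lam, np.allclose(A @ v, lam * v))    # Av = lambda v holds for each pair
```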

Using the dot product as scalar product, the dot product of two column vectors x and y is x^T y.

Real symmetric matrices have real eigenvalues. Proof: Let Av = λv where λ ∈ C, v ∈ C^n, and v ≠ 0. Write λ = a + bi and v = x + iy. Comparing real and imaginary parts in

A(x + iy) = (a + bi)(x + iy)

we obtain

Ax = ax − by

and

Ay = ay + bx.

Therefore

x^T A y = a x^T y + b x^T x

and

x^T A y = (Ax)^T y = (a x^T − b y^T) y = a x^T y − b y^T y.


Comparing, b ||x||^2 = −b ||y||^2.

If b ≠ 0 then x = y = 0 since they have zero length. This contradicts v ≠ 0. Hence b = 0 and λ is real.

Let A be a square matrix with orthonormal columns. Then A^T = A^{-1}. Proof: multiply and look at the dot products.

For any two matrices A and B, (AB)^T = B^T A^T.

Let A and B be square matrices with orthonormal columns. Then AB has orthonormal columns. Proof: If B^1, . . . , B^n are the columns of B then AB^1, . . . , AB^n are the columns of AB. The dot product of columns i and j is (AB^i)^T (AB^j) = (B^i)^T A^T A B^j = (B^i)^T B^j = δ_ij.

Let A be a real symmetric matrix. Then there is a matrix C with orthonormal columns such that C^T A C is diagonal with the eigenvalues as diagonal entries. Proof: By induction on the number of rows and columns. Trivial when n = 1. More generally, let v be an eigenvector with eigenvalue λ_1. Find a basis incorporating v and use Gram-Schmidt to produce an orthonormal basis v_1, . . . , v_n. Then Av_1 = λ_1 v_1. Let

C = [v_1 | v_2 | · · · | v_n].

Then

AC = [Av_1 | Av_2 | · · · | Av_n] = [λ_1 v_1 | Av_2 | · · · | Av_n],

C^T A C = [λ_1 C^T v_1 | C^T A v_2 | · · · | C^T A v_n] =

a matrix with first row (λ_1, b_2, . . . , b_n), first column (λ_1, 0, . . . , 0), and lower right-hand submatrix B. See notes.

Corollary: C^T A C = diag(λ_1, . . . , λ_n) implies AC = (λ_1 C^1, λ_2 C^2, . . . , λ_n C^n). In other words, the columns of C are an orthonormal set of eigenvectors. Finding them:

First note that eigenvectors of a real symmetric matrix A corresponding to distinct eigenvalues are orthogonal: Suppose A^T = A and Au = αu and Av = βv where α ≠ β. Then

β u^T v = u^T (βv) = u^T (Av) = (u^T A) v = (u^T A^T) v = (Au)^T v = (αu)^T v = α (u^T v).


Since α ≠ β, this forces u^T v = 0. In other words, their dot product is 0.

Find each eigenvalue using the characteristic polynomial, then find a basis for each eigenspace, then use Gram-Schmidt to find an orthonormal basis for each eigenspace. The union of the bases will be orthonormal and there will be enough of them to form an orthonormal basis. These are the columns of C.
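A numerical sketch (assuming NumPy): for a real symmetric matrix, np.linalg.eigh returns orthonormal eigenvectors as the columns of C, and C^T A C comes out diagonal. The matrix used here is the one from the quadratic form example below.

```python
import numpy as np

A = np.array([[1.0, 1.5],
              [1.5, 5.0]])           # symmetric matrix of x^2 + 3xy + 5y^2

eigenvalues, C = np.linalg.eigh(A)   # columns of C: orthonormal eigenvectors
print(eigenvalues)                   # [0.5 5.5]
print(np.round(C.T @ C, 6))          # identity: the columns are orthonormal
print(np.round(C.T @ A @ C, 6))      # diag(0.5, 5.5)
```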

Applications: (1) Matrix powers and solutions to recurrence relations. (2) Diagonalizing a binary quadratic form. (3) Solution of a system of differential equations.

Example: Graph ax^2 + bxy + cy^2 = 1. In matrix form, this reads

[x y] [a b/2; b/2 c] [x; y] = [1].

We have already proved that for each symmetric matrix A there is a rotation matrix C of eigenvectors of A such that A = C D C^T where D is a diagonal matrix. Making the substitution we obtain

[x y] C [λ_1 0; 0 λ_2] C^T [x; y] = [1].

Writing

[X; Y] = C^T [x; y]

we obtain

[X Y] [λ_1 0; 0 λ_2] [X; Y] = [1].

In other words,

λ_1 X^2 + λ_2 Y^2 = 1.

This is much easier to graph. Since

[x; y] = C [X; Y]

and C is a rotation matrix, all we have to do is identify the angle of rotation θ and rotate the XY graph by θ to obtain the xy graph.


Example: Graph x^2 + 3xy + 5y^2 = 1. The eigenvalues of [1 3/2; 3/2 5] are λ_1 = 1/2 and λ_2 = 11/2. Eigenspace bases are {[3; −1]} and {[1; 3]}. This yields

C = [3/√10 1/√10; −1/√10 3/√10].

Identifying this with

R(θ) = [cos θ −sin θ; sin θ cos θ]

yields cos θ = 3/√10, sin θ = −1/√10, tan θ = −1/3, θ = tan^{-1}(−1/3) = −0.321751 radians or −18.4349 degrees. So we graph (1/2)X^2 + (11/2)Y^2 = 1, then rotate −18.4349 degrees. For example, one solution to (1/2)X^2 + (11/2)Y^2 = 1 is X = √2, Y = 0. This yields the solution

[x; y] = C [X; Y] = [3/√10 1/√10; −1/√10 3/√10] [√2; 0] = [3/√5; −1/√5].

We have

(3/√5)^2 + 3(3/√5)(−1/√5) + 5(−1/√5)^2 = 1.

[Graph of (1/2)X^2 + (11/2)Y^2 = 1.]

[Graph of x^2 + 3xy + 5y^2 = 1.]


A related problem: find the maximum and minimum value of x^2 + 3xy + 5y^2 subject to x^2 + y^2 = 1. Given that (x, y) is related to (X, Y) by a rotation, x^2 + y^2 = 1 is equivalent to X^2 + Y^2 = 1. So equivalently we can find the maximum of (1/2)X^2 + (11/2)Y^2 subject to X^2 + Y^2 = 1. Writing Y^2 = 1 − X^2, we want the maximum of 11/2 − 5X^2 where |X| ≤ 1. The maximum is 11/2, using (X, Y) = (0, 1), (x, y) = (1/√10, 3/√10). The minimum is 1/2, using (X, Y) = (1, 0), (x, y) = (3/√10, −1/√10).
