Corso di Interazione Naturale
Prof. Giuseppe Boccignone
Dipartimento di Informatica, Università di Milano
boccignone@di.unimi.it
boccignone.di.unimi.it/IN_2016.html

Computation for natural interaction: a review of linear algebra (3)
(and first examples of Machine Learning)
• A square matrix A is diagonalizable if there exists an invertible matrix Q that yields the decomposition $A = Q \Lambda Q^{-1}$, with $\Lambda$ diagonal.

• A real, symmetric square matrix A is always diagonalizable: its eigenvalues are real and its eigenvectors can be chosen mutually orthogonal. Normalizing the eigenvectors and collecting them as the columns of Q makes Q orthonormal, so $A = Q \Lambda Q^T$.

Spectral theorem
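As a minimal numerical sketch of the spectral theorem (NumPy assumed; the matrix is an arbitrary illustrative choice):

```python
import numpy as np

# A real symmetric matrix: the spectral theorem guarantees real
# eigenvalues and an orthonormal basis of eigenvectors.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh is specialized for symmetric matrices; it returns eigenvalues in
# ascending order and orthonormal eigenvectors as the columns of Q.
eigvals, Q = np.linalg.eigh(A)

assert np.allclose(Q.T @ Q, np.eye(2))              # Q is orthogonal
assert np.allclose(Q @ np.diag(eigvals) @ Q.T, A)   # A = Q Lambda Q^T

print(eigvals)  # the eigenvalues of [[2, 1], [1, 2]] are 1 and 3
```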
Some basic linear algebra // eigenvectors and eigenvalues

Some basic linear algebra // Singular Value Decomposition (SVD)

What if the matrix is not square?

• The following theorem holds:
[Figure: the SVD applied to a vector, $Ax = U D V^T x$ = (hanger)(stretcher)(aligner) $x$]
5 SOLUTIONS AND DECOMPOSITIONS

5.2.3 Symmetric

Assume A is symmetric; then

$$VV^T = I \quad \text{(i.e. } V \text{ is orthogonal)} \tag{260}$$
$$\lambda_i \in \mathbb{R} \quad \text{(i.e. } \lambda_i \text{ is real)} \tag{261}$$
$$\mathrm{Tr}(A^p) = \sum_i \lambda_i^p \tag{262}$$
$$\mathrm{eig}(I + cA) = 1 + c\lambda_i \tag{263}$$
$$\mathrm{eig}(A - cI) = \lambda_i - c \tag{264}$$
$$\mathrm{eig}(A^{-1}) = \lambda_i^{-1} \tag{265}$$

For a symmetric, positive matrix A,

$$\mathrm{eig}(A^T A) = \mathrm{eig}(A A^T) = \mathrm{eig}(A) \circ \mathrm{eig}(A) \tag{266}$$
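Identities (262)–(265) can be checked numerically; the sketch below (NumPy assumed, with an arbitrary symmetric positive definite test matrix) verifies the trace and eigenvalue-shift rules:

```python
import numpy as np

# Numerical check of eqs. (262)-(265) on a symmetric positive definite matrix.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
lam = np.linalg.eigvalsh(A)            # real eigenvalues, ascending order
c, p = 0.5, 3

# Tr(A^p) = sum_i lambda_i^p
assert np.isclose(np.trace(np.linalg.matrix_power(A, p)), np.sum(lam ** p))
# eig(I + cA) = 1 + c*lambda_i   (the shift preserves the ordering)
assert np.allclose(np.linalg.eigvalsh(np.eye(3) + c * A), 1 + c * lam)
# eig(A - cI) = lambda_i - c
assert np.allclose(np.linalg.eigvalsh(A - c * np.eye(3)), lam - c)
# eig(A^-1) = lambda_i^-1   (inversion reverses the ordering)
assert np.allclose(np.linalg.eigvalsh(np.linalg.inv(A)), np.sort(1 / lam))
```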
5.2.4 Characteristic polynomial

The characteristic polynomial for the matrix A is

$$0 = \det(A - \lambda I) \tag{267}$$
$$= \lambda^n - g_1 \lambda^{n-1} + g_2 \lambda^{n-2} - \dots + (-1)^n g_n \tag{268}$$

Note that the coefficients $g_j$ for $j = 1, \dots, n$ are the $n$ invariants under rotation of A. Thus, $g_j$ is the sum of the determinants of all the sub-matrices of A taken $j$ rows and columns at a time. That is, $g_1$ is the trace of A, and $g_2$ is the sum of the determinants of the $n(n-1)/2$ sub-matrices that can be formed from A by deleting all but two rows and columns, and so on; see [17].
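The invariants $g_1$ (the trace) and $g_2$ (the sum of $2 \times 2$ principal minors) can be read off the characteristic-polynomial coefficients; a sketch with NumPy, whose `np.poly` returns the coefficients of $\det(\lambda I - A)$ (the test matrix is arbitrary):

```python
import numpy as np
from itertools import combinations

# det(lambda*I - A) = lambda^n - g1*lambda^(n-1) + g2*lambda^(n-2) - ... + (-1)^n*gn
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
n = A.shape[0]

coeffs = np.poly(A)                  # [1, -g1, g2, -g3] for n = 3
g1, g2 = -coeffs[1], coeffs[2]

# g1 is the trace of A
assert np.isclose(g1, np.trace(A))

# g2 is the sum of the determinants of all 2x2 principal sub-matrices
g2_minors = sum(np.linalg.det(A[np.ix_(idx, idx)])
                for idx in combinations(range(n), 2))
assert np.isclose(g2, g2_minors)
```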
5.3 Singular Value Decomposition

Any $n \times m$ matrix A can be written as

$$A = U D V^T, \tag{269}$$

where

$$U = \text{eigenvectors of } A A^T \quad (n \times n)$$
$$D = \sqrt{\mathrm{diag}(\mathrm{eig}(A A^T))} \quad (n \times m) \tag{270}$$
$$V = \text{eigenvectors of } A^T A \quad (m \times m)$$

5.3.1 Symmetric Square decomposed into squares

Assume A to be $n \times n$ and symmetric. Then

$$[A] = [V][D][V^T], \tag{271}$$

where D is diagonal with the eigenvalues of A, and V is orthogonal and holds the eigenvectors of A.
Petersen & Pedersen, The Matrix Cookbook, Version: November 14, 2008, Page 30
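Equation (269) can be reproduced directly with NumPy's `svd`; this sketch (arbitrary $2 \times 3$ example matrix) checks that the singular values are the square roots of $\mathrm{eig}(AA^T)$ and that a rectangular D reconstructs A:

```python
import numpy as np

# Full SVD of a rectangular 2x3 matrix (eq. 269): A = U D V^T, with the
# singular values equal to the square roots of the eigenvalues of A A^T.
A = np.array([[1.0, 2.0, 0.0],
              [2.0, 0.0, 2.0]])

U, s, Vt = np.linalg.svd(A)          # s: singular values, descending

# eig(A A^T) = {9, 4}, so the singular values are 3 and 2
assert np.allclose(s ** 2, np.sort(np.linalg.eigvalsh(A @ A.T))[::-1])
assert np.allclose(s, [3.0, 2.0])

# Rebuild the rectangular 2x3 D (zeros pad the extra column) and reconstruct A
D = np.zeros(A.shape)
np.fill_diagonal(D, s)
assert np.allclose(U @ D @ Vt, A)
```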
The SVD decomposition

ROUGH DRAFT - BEWARE. Suggestions: kirklbaker@gmail.com

[Figure 1: Best-fit regression line reduces data from two dimensions into one.]

original data more clearly and orders it from most variation to the least. What makes SVD practical for NLP applications is that you can simply ignore variation below a particular threshold to massively reduce your data, assured that the main relationships of interest have been preserved.
8.1 Example of Full Singular Value Decomposition
SVD is based on a theorem from linear algebra which says that a rectangular matrix A can be broken down into the product of three matrices: an orthogonal matrix U, a diagonal matrix S, and the transpose of an orthogonal matrix V. The theorem is usually presented something like this:

$$A_{mn} = U_{mm} S_{mn} V^T_{nn}$$

where $U^T U = I$ and $V^T V = I$; the columns of U are orthonormal eigenvectors of $AA^T$, the columns of V are orthonormal eigenvectors of $A^T A$, and S is a diagonal matrix containing the square roots of the eigenvalues of $AA^T$ (equivalently, of $A^T A$) in descending order.

The following example merely applies this definition to a small matrix in order to compute its SVD. In the next section, I attempt to interpret the application of SVD to document classification.

Start with the matrix

$$A = \begin{pmatrix} 3 & 1 & 1 \\ -1 & 3 & 1 \end{pmatrix}$$
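Before working through the example by hand, the result can be previewed numerically (a sketch assuming NumPy):

```python
import numpy as np

# Full SVD of the example matrix, computed numerically so the hand-worked
# steps can be checked against it.
A = np.array([[3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])

U, s, Vt = np.linalg.svd(A)          # full SVD: U is 2x2, Vt is 3x3

# A A^T = [[11, 1], [1, 11]] has eigenvalues 12 and 10, so the singular
# values are sqrt(12) and sqrt(10).
assert np.allclose(s ** 2, [12.0, 10.0])

# Rebuild the rectangular 2x3 S and reconstruct A exactly
S = np.zeros_like(A)
np.fill_diagonal(S, s)
assert np.allclose(U @ S @ Vt, A)
```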
392 Chapter 12. Latent linear models
12.2.3 Singular value decomposition (SVD)
We have defined the solution to PCA in terms of eigenvectors of the covariance matrix. However, there is another way to obtain the solution, based on the singular value decomposition, or SVD. This basically generalizes the notion of eigenvectors from square matrices to any kind of matrix.

In particular, any (real) $N \times D$ matrix X can be decomposed as follows:

$$\underbrace{X}_{N \times D} = \underbrace{U}_{N \times N}\, \underbrace{S}_{N \times D}\, \underbrace{V^T}_{D \times D} \tag{12.46}$$
where U is an $N \times N$ matrix whose columns are orthonormal (so $U^T U = I_N$), V is a $D \times D$ matrix whose rows and columns are orthonormal (so $V^T V = V V^T = I_D$), and S is an $N \times D$ matrix containing the $r = \min(N, D)$ singular values $\sigma_i \ge 0$ on the main diagonal, with 0s filling the rest of the matrix. The columns of U are the left singular vectors, and the columns of V are the right singular vectors. See Figure 12.8(a) for an example.

Since there are at most D singular values (assuming N > D), the last $N - D$ columns of U are irrelevant, since they will be multiplied by 0. The economy-sized SVD, or thin SVD, avoids computing these unnecessary elements. Let us denote this decomposition by USV. If N > D, we have

$$\underbrace{X}_{N \times D} = \underbrace{U}_{N \times D}\, \underbrace{S}_{D \times D}\, \underbrace{V^T}_{D \times D} \tag{12.47}$$

as in Figure 12.8(a). If N < D, we have

$$\underbrace{X}_{N \times D} = \underbrace{U}_{N \times N}\, \underbrace{S}_{N \times N}\, \underbrace{V^T}_{N \times D} \tag{12.48}$$

Computing the economy-sized SVD takes $O(ND \min(N, D))$ time (Golub and van Loan 1996, p. 254).
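The shape difference between the full SVD (12.46) and the thin SVD (12.47) can be seen with NumPy's `full_matrices` flag (the data matrix here is an arbitrary random example):

```python
import numpy as np

N, D = 6, 3
X = np.random.default_rng(1).standard_normal((N, D))

Uf, sf, Vtf = np.linalg.svd(X, full_matrices=True)    # full SVD (12.46)
Ut, st, Vtt = np.linalg.svd(X, full_matrices=False)   # thin SVD (12.47)

assert Uf.shape == (N, N) and Vtf.shape == (D, D)     # full: U is N x N
assert Ut.shape == (N, D) and Vtt.shape == (D, D)     # thin: U is N x D
assert np.allclose(sf, st)                            # same singular values
assert np.allclose(Ut @ np.diag(st) @ Vtt, X)         # exact reconstruction
```

The thin form simply drops the last $N - D$ columns of U, which would only ever be multiplied by the zero rows of S.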
The connection between eigenvectors and singular vectors is the following. For an arbitrary real matrix X, if $X = U S V^T$, we have

$$X^T X = V S^T U^T U S V^T = V (S^T S) V^T = V D V^T \tag{12.49}$$

where $D = S^2$ is a diagonal matrix containing the squared singular values. Hence

$$(X^T X) V = V D \tag{12.50}$$

so the eigenvectors of $X^T X$ are equal to V, the right singular vectors of X, and the eigenvalues of $X^T X$ are equal to D, the squared singular values. Similarly,

$$X X^T = U S V^T V S^T U^T = U (S S^T) U^T \tag{12.51}$$
$$(X X^T) U = U (S S^T) = U D \tag{12.52}$$

so the eigenvectors of $X X^T$ are equal to U, the left singular vectors of X. Also, the eigenvalues of $X X^T$ are equal to the squared singular values. We can summarize all this as follows:

$$U = \mathrm{evec}(X X^T), \quad V = \mathrm{evec}(X^T X), \quad S^2 = \mathrm{eval}(X X^T) = \mathrm{eval}(X^T X) \tag{12.53}$$
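Equation (12.53) can be verified directly; this sketch (NumPy, arbitrary random $5 \times 3$ matrix) checks that V matches the eigenvectors of $X^T X$ up to sign and that $S^2$ matches its eigenvalues:

```python
import numpy as np

X = np.random.default_rng(2).standard_normal((5, 3))
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Eigendecomposition of X^T X, reordered to match the SVD's descending order
evals, evecs = np.linalg.eigh(X.T @ X)
evals, evecs = evals[::-1], evecs[:, ::-1]

assert np.allclose(s ** 2, evals)        # S^2 = eval(X^T X)
for i in range(3):                       # V = evec(X^T X), up to sign
    assert np.allclose(np.abs(evecs[:, i]), np.abs(Vt[i]))
```

Eigenvectors are only determined up to sign, hence the comparison of absolute values.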
392 Chapter 12. Latent linear models
12.2.3 Singular value decomposition (SVD)
We have defined the solution to PCA in terms of eigenvectors of the covariance matrix. However,there is another way to obtain the solution, based on the singular value decomposition, orSVD. This basically generalizes the notion of eigenvectors from square matrices to any kind ofmatrix.
In particular, any (real) N × D matrix X can be decomposed as follows
X!"#$N×D
= U!"#$N×N
S!"#$N×D
VT!"#$D×D
(12.46)
where U is an N × N matrix whose columns are orthornormal (so UTU = IN ), V is D × Dmatrix whose rows and columns are orthonormal (so VTV = VVT = ID ), and S is a N × Dmatrix containing the r = min(N,D) singular values σi ≥ 0 on the main diagonal, with 0sfilling the rest of the matrix. The columns of U are the left singular vectors, and the columnsof V are the right singular vectors. See Figure 12.8(a) for an example.Since there are at most D singular values (assuming N > D), the last N − D columns of U
are irrelevant, since they will be multiplied by 0. The economy sized SVD, or thin SVD, avoidscomputing these unnecessary elements. Let us denote this decomposition by USV. If N > D,we have
X!"#$N×D
= U!"#$N×D
S!"#$D×D
VT!"#$D×D
(12.47)
as in Figure 12.8(a). If N < D, we have
X!"#$N×D
= U!"#$N×N
S!"#$N×N
VT!"#$N×D
(12.48)
Computing the economy-sized SVD takes O(NDmin(N,D)) time (Golub and van Loan 1996,p254).
The connection between eigenvectors and singular vectors is the following. For an arbitraryreal matrix X, if X = USVT , we have
XTX = VSTUT USVT = V(STS)VT = VDVT (12.49)
where D = S2 is a diagonal matrix containing the squares singular values. Hence
(XTX)V = VD (12.50)
so the eigenvectors of XTX are equal to V, the right singular vectors of X, and the eigenvaluesof XTX are equal to D, the squared singular values. Similarly
XXT = USVT VSTUT = U(SST )UT (12.51)
(XXT )U = U(SST ) = UD (12.52)
so the eigenvectors of XXT are equal to U, the left singular vectors of X. Also, the eigenvaluesof XXT are equal to the squared singular values. We can summarize all this as follows:
U = evec(XXT ), V = evec(XTX), S2 = eval(XXT ) = eval(XTX) (12.53)
Un po’ di algebra lineare di base //Singular Value Decomposition (SVD)
• Vale il seguente teorema
5.3 Singular Value Decomposition5 SOLUTIONS AND DECOMPOSITIONS
5.2.3 Symmetric
Assume A is symmetric, then
VVT = I (i.e. V is orthogonal) (260)�i 2 R (i.e. �i is real) (261)
Tr(Ap) =P
i�pi (262)
eig(I + cA) = 1 + c�i (263)eig(A� cI) = �i � c (264)
eig(A�1) = ��1
i (265)
For a symmetric, positive matrix A,
eig(AT A) = eig(AAT ) = eig(A) � eig(A) (266)
5.2.4 Characteristic polynomial
The characteristic polynomial for the matrix A is
0 = det(A� �I) (267)= �n � g
1
�n�1 + g2
�n�2 � ... + (�1)ngn (268)
Note that the coe�cients gj for j = 1, ..., n are the n invariants under rotationof A. Thus, gj is the sum of the determinants of all the sub-matrices of A takenj rows and columns at a time. That is, g
1
is the trace of A, and g2
is the sumof the determinants of the n(n� 1)/2 sub-matrices that can be formed from Aby deleting all but two rows and columns, and so on – see [17].
5.3 Singular Value Decomposition
Any n⇥m matrix A can be written as
A = UDVT , (269)
whereU = eigenvectors of AAT n⇥ n
D =p
diag(eig(AAT )) n⇥mV = eigenvectors of AT A m⇥m
(270)
5.3.1 Symmetric Square decomposed into squares
Assume A to be n⇥ n and symmetric. Then⇥
A⇤
=⇥
V⇤ ⇥
D⇤ ⇥
VT⇤, (271)
where D is diagonal with the eigenvalues of A, and V is orthogonal and theeigenvectors of A.
Petersen & Pedersen, The Matrix Cookbook, Version: November 14, 2008, Page 30
392 Chapter 12. Latent linear models
12.2.3 Singular value decomposition (SVD)
We have defined the solution to PCA in terms of eigenvectors of the covariance matrix. However,there is another way to obtain the solution, based on the singular value decomposition, orSVD. This basically generalizes the notion of eigenvectors from square matrices to any kind ofmatrix.
In particular, any (real) N × D matrix X can be decomposed as follows
X!"#$N×D
= U!"#$N×N
S!"#$N×D
VT!"#$D×D
(12.46)
where U is an N × N matrix whose columns are orthornormal (so UTU = IN ), V is D × Dmatrix whose rows and columns are orthonormal (so VTV = VVT = ID ), and S is a N × Dmatrix containing the r = min(N,D) singular values σi ≥ 0 on the main diagonal, with 0sfilling the rest of the matrix. The columns of U are the left singular vectors, and the columnsof V are the right singular vectors. See Figure 12.8(a) for an example.Since there are at most D singular values (assuming N > D), the last N − D columns of U
are irrelevant, since they will be multiplied by 0. The economy sized SVD, or thin SVD, avoidscomputing these unnecessary elements. Let us denote this decomposition by USV. If N > D,we have
X!"#$N×D
= U!"#$N×D
S!"#$D×D
VT!"#$D×D
(12.47)
as in Figure 12.8(a). If N < D, we have
X!"#$N×D
= U!"#$N×N
S!"#$N×N
VT!"#$N×D
(12.48)
Computing the economy-sized SVD takes O(NDmin(N,D)) time (Golub and van Loan 1996,p254).
The connection between eigenvectors and singular vectors is the following. For an arbitraryreal matrix X, if X = USVT , we have
XTX = VSTUT USVT = V(STS)VT = VDVT (12.49)
where D = S2 is a diagonal matrix containing the squares singular values. Hence
(XTX)V = VD (12.50)
so the eigenvectors of XTX are equal to V, the right singular vectors of X, and the eigenvaluesof XTX are equal to D, the squared singular values. Similarly
XXT = USVT VSTUT = U(SST )UT (12.51)
(XXT )U = U(SST ) = UD (12.52)
so the eigenvectors of XXT are equal to U, the left singular vectors of X. Also, the eigenvaluesof XXT are equal to the squared singular values. We can summarize all this as follows:
U = evec(XXT ), V = evec(XTX), S2 = eval(XXT ) = eval(XTX) (12.53)
392 Chapter 12. Latent linear models
12.2.3 Singular value decomposition (SVD)
We have defined the solution to PCA in terms of eigenvectors of the covariance matrix. However,there is another way to obtain the solution, based on the singular value decomposition, orSVD. This basically generalizes the notion of eigenvectors from square matrices to any kind ofmatrix.
In particular, any (real) N × D matrix X can be decomposed as follows
X!"#$N×D
= U!"#$N×N
S!"#$N×D
VT!"#$D×D
(12.46)
where U is an N × N matrix whose columns are orthornormal (so UTU = IN ), V is D × Dmatrix whose rows and columns are orthonormal (so VTV = VVT = ID ), and S is a N × Dmatrix containing the r = min(N,D) singular values σi ≥ 0 on the main diagonal, with 0sfilling the rest of the matrix. The columns of U are the left singular vectors, and the columnsof V are the right singular vectors. See Figure 12.8(a) for an example.Since there are at most D singular values (assuming N > D), the last N − D columns of U
are irrelevant, since they will be multiplied by 0. The economy sized SVD, or thin SVD, avoidscomputing these unnecessary elements. Let us denote this decomposition by USV. If N > D,we have
X!"#$N×D
= U!"#$N×D
S!"#$D×D
VT!"#$D×D
(12.47)
as in Figure 12.8(a). If N < D, we have
X!"#$N×D
= U!"#$N×N
S!"#$N×N
VT!"#$N×D
(12.48)
Computing the economy-sized SVD takes O(NDmin(N,D)) time (Golub and van Loan 1996,p254).
The connection between eigenvectors and singular vectors is the following. For an arbitraryreal matrix X, if X = USVT , we have
XTX = VSTUT USVT = V(STS)VT = VDVT (12.49)
where D = S2 is a diagonal matrix containing the squares singular values. Hence
(XTX)V = VD (12.50)
so the eigenvectors of XTX are equal to V, the right singular vectors of X, and the eigenvaluesof XTX are equal to D, the squared singular values. Similarly
XXT = USVT VSTUT = U(SST )UT (12.51)
(XXT )U = U(SST ) = UD (12.52)
so the eigenvectors of XXT are equal to U, the left singular vectors of X. Also, the eigenvaluesof XXT are equal to the squared singular values. We can summarize all this as follows:
U = evec(XXT ), V = evec(XTX), S2 = eval(XXT ) = eval(XTX) (12.53)
392 Chapter 12. Latent linear models
12.2.3 Singular value decomposition (SVD)
We have defined the solution to PCA in terms of eigenvectors of the covariance matrix. However,there is another way to obtain the solution, based on the singular value decomposition, orSVD. This basically generalizes the notion of eigenvectors from square matrices to any kind ofmatrix.In particular, any (real) N × D matrix X can be decomposed as follows
X!"#$N×D
= U!"#$N×N
S!"#$N×D
VT!"#$D×D
(12.46)
where U is an N × N matrix whose columns are orthornormal (so UTU = IN ), V is D × Dmatrix whose rows and columns are orthonormal (so VTV = VVT = ID ), and S is a N × Dmatrix containing the r = min(N,D) singular values σi ≥ 0 on the main diagonal, with 0sfilling the rest of the matrix. The columns of U are the left singular vectors, and the columnsof V are the right singular vectors. See Figure 12.8(a) for an example.Since there are at most D singular values (assuming N > D), the last N − D columns of U
are irrelevant, since they will be multiplied by 0. The economy sized SVD, or thin SVD, avoidscomputing these unnecessary elements. Let us denote this decomposition by USV. If N > D,we have
X!"#$N×D
= U!"#$N×D
S!"#$D×D
VT!"#$D×D
(12.47)
as in Figure 12.8(a). If N < D, we have
X!"#$N×D
= U!"#$N×N
S!"#$N×N
VT!"#$N×D
(12.48)
Computing the economy-sized SVD takes O(NDmin(N,D)) time (Golub and van Loan 1996,p254).The connection between eigenvectors and singular vectors is the following. For an arbitrary
real matrix X, if X = USVT , we have
XTX = VSTUT USVT = V(STS)VT = VDVT (12.49)
where D = S2 is a diagonal matrix containing the squares singular values. Hence
(XTX)V = VD (12.50)
so the eigenvectors of XTX are equal to V, the right singular vectors of X, and the eigenvaluesof XTX are equal to D, the squared singular values. Similarly
XXT = USVT VSTUT = U(SST )UT (12.51)
(XXT )U = U(SST ) = UD (12.52)
so the eigenvectors of XXT are equal to U, the left singular vectors of X. Also, the eigenvaluesof XXT are equal to the squared singular values. We can summarize all this as follows:
U = evec(XXT ), V = evec(XTX), S2 = eval(XXT ) = eval(XTX) (12.53)
5.3 Singular Value Decomposition5 SOLUTIONS AND DECOMPOSITIONS
5.2.3 Symmetric
Assume A is symmetric, then
VVT = I (i.e. V is orthogonal) (260)�i 2 R (i.e. �i is real) (261)
Tr(Ap) =P
i�pi (262)
eig(I + cA) = 1 + c�i (263)eig(A� cI) = �i � c (264)
eig(A�1) = ��1
i (265)
For a symmetric, positive matrix A,
eig(AT A) = eig(AAT ) = eig(A) � eig(A) (266)
5.2.4 Characteristic polynomial
The characteristic polynomial for the matrix A is
0 = det(A� �I) (267)= �n � g
1
�n�1 + g2
�n�2 � ... + (�1)ngn (268)
Note that the coe�cients gj for j = 1, ..., n are the n invariants under rotationof A. Thus, gj is the sum of the determinants of all the sub-matrices of A takenj rows and columns at a time. That is, g
1
is the trace of A, and g2
is the sumof the determinants of the n(n� 1)/2 sub-matrices that can be formed from Aby deleting all but two rows and columns, and so on – see [17].
5.3 Singular Value Decomposition
Any n⇥m matrix A can be written as
A = UDVT , (269)
whereU = eigenvectors of AAT n⇥ n
D =p
diag(eig(AAT )) n⇥mV = eigenvectors of AT A m⇥m
(270)
5.3.1 Symmetric Square decomposed into squares
Assume A to be n⇥ n and symmetric. Then⇥
A⇤
=⇥
V⇤ ⇥
D⇤ ⇥
VT⇤, (271)
where D is diagonal with the eigenvalues of A, and V is orthogonal and theeigenvectors of A.
Petersen & Pedersen, The Matrix Cookbook, Version: November 14, 2008, Page 30
5.3 Singular Value Decomposition5 SOLUTIONS AND DECOMPOSITIONS
5.2.3 Symmetric
Assume A is symmetric, then
VVT = I (i.e. V is orthogonal) (260)�i 2 R (i.e. �i is real) (261)
Tr(Ap) =P
i�pi (262)
eig(I + cA) = 1 + c�i (263)eig(A� cI) = �i � c (264)
eig(A�1) = ��1
i (265)
For a symmetric, positive matrix A,
eig(AT A) = eig(AAT ) = eig(A) � eig(A) (266)
5.2.4 Characteristic polynomial
The characteristic polynomial for the matrix A is
0 = det(A� �I) (267)= �n � g
1
�n�1 + g2
�n�2 � ... + (�1)ngn (268)
Note that the coe�cients gj for j = 1, ..., n are the n invariants under rotationof A. Thus, gj is the sum of the determinants of all the sub-matrices of A takenj rows and columns at a time. That is, g
1
is the trace of A, and g2
is the sumof the determinants of the n(n� 1)/2 sub-matrices that can be formed from Aby deleting all but two rows and columns, and so on – see [17].
5.3 Singular Value Decomposition
Any n⇥m matrix A can be written as
A = UDVT , (269)
whereU = eigenvectors of AAT n⇥ n
D =p
diag(eig(AAT )) n⇥mV = eigenvectors of AT A m⇥m
(270)
5.3.1 Symmetric Square decomposed into squares
Assume A to be n⇥ n and symmetric. Then⇥
A⇤
=⇥
V⇤ ⇥
D⇤ ⇥
VT⇤, (271)
where D is diagonal with the eigenvalues of A, and V is orthogonal and theeigenvectors of A.
Petersen & Pedersen, The Matrix Cookbook, Version: November 14, 2008, Page 30
392 Chapter 12. Latent linear models
12.2.3 Singular value decomposition (SVD)
We have defined the solution to PCA in terms of eigenvectors of the covariance matrix. However,there is another way to obtain the solution, based on the singular value decomposition, orSVD. This basically generalizes the notion of eigenvectors from square matrices to any kind ofmatrix.
In particular, any (real) N × D matrix X can be decomposed as follows
X!"#$N×D
= U!"#$N×N
S!"#$N×D
VT!"#$D×D
(12.46)
where U is an N × N matrix whose columns are orthornormal (so UTU = IN ), V is D × Dmatrix whose rows and columns are orthonormal (so VTV = VVT = ID ), and S is a N × Dmatrix containing the r = min(N,D) singular values σi ≥ 0 on the main diagonal, with 0sfilling the rest of the matrix. The columns of U are the left singular vectors, and the columnsof V are the right singular vectors. See Figure 12.8(a) for an example.Since there are at most D singular values (assuming N > D), the last N − D columns of U
are irrelevant, since they will be multiplied by 0. The economy sized SVD, or thin SVD, avoidscomputing these unnecessary elements. Let us denote this decomposition by USV. If N > D,we have
X!"#$N×D
= U!"#$N×D
S!"#$D×D
VT!"#$D×D
(12.47)
as in Figure 12.8(a). If N < D, we have
X!"#$N×D
= U!"#$N×N
S!"#$N×N
VT!"#$N×D
(12.48)
Computing the economy-sized SVD takes O(NDmin(N,D)) time (Golub and van Loan 1996,p254).
The connection between eigenvectors and singular vectors is the following. For an arbitraryreal matrix X, if X = USVT , we have
XTX = VSTUT USVT = V(STS)VT = VDVT (12.49)
where D = S2 is a diagonal matrix containing the squares singular values. Hence
(XTX)V = VD (12.50)
so the eigenvectors of XTX are equal to V, the right singular vectors of X, and the eigenvaluesof XTX are equal to D, the squared singular values. Similarly
XXT = USVT VSTUT = U(SST )UT (12.51)
(XXT )U = U(SST ) = UD (12.52)
so the eigenvectors of XXT are equal to U, the left singular vectors of X. Also, the eigenvaluesof XXT are equal to the squared singular values. We can summarize all this as follows:
U = evec(XXT ), V = evec(XTX), S2 = eval(XXT ) = eval(XTX) (12.53)
392 Chapter 12. Latent linear models
12.2.3 Singular value decomposition (SVD)
We have defined the solution to PCA in terms of eigenvectors of the covariance matrix. However,there is another way to obtain the solution, based on the singular value decomposition, orSVD. This basically generalizes the notion of eigenvectors from square matrices to any kind ofmatrix.
In particular, any (real) N × D matrix X can be decomposed as follows
X!"#$N×D
= U!"#$N×N
S!"#$N×D
VT!"#$D×D
(12.46)
where U is an N × N matrix whose columns are orthornormal (so UTU = IN ), V is D × Dmatrix whose rows and columns are orthonormal (so VTV = VVT = ID ), and S is a N × Dmatrix containing the r = min(N,D) singular values σi ≥ 0 on the main diagonal, with 0sfilling the rest of the matrix. The columns of U are the left singular vectors, and the columnsof V are the right singular vectors. See Figure 12.8(a) for an example.Since there are at most D singular values (assuming N > D), the last N − D columns of U
are irrelevant, since they will be multiplied by 0. The economy sized SVD, or thin SVD, avoidscomputing these unnecessary elements. Let us denote this decomposition by USV. If N > D,we have
X!"#$N×D
= U!"#$N×D
S!"#$D×D
VT!"#$D×D
(12.47)
as in Figure 12.8(a). If N < D, we have
X!"#$N×D
= U!"#$N×N
S!"#$N×N
VT!"#$N×D
(12.48)
Computing the economy-sized SVD takes O(NDmin(N,D)) time (Golub and van Loan 1996,p254).
The connection between eigenvectors and singular vectors is the following. For an arbitraryreal matrix X, if X = USVT , we have
XTX = VSTUT USVT = V(STS)VT = VDVT (12.49)
where D = S2 is a diagonal matrix containing the squares singular values. Hence
(XTX)V = VD (12.50)
so the eigenvectors of XTX are equal to V, the right singular vectors of X, and the eigenvaluesof XTX are equal to D, the squared singular values. Similarly
XXT = USVT VSTUT = U(SST )UT (12.51)
(XXT )U = U(SST ) = UD (12.52)
so the eigenvectors of XXT are equal to U, the left singular vectors of X. Also, the eigenvaluesof XXT are equal to the squared singular values. We can summarize all this as follows:
U = evec(XXT ), V = evec(XTX), S2 = eval(XXT ) = eval(XTX) (12.53)
5.3 Singular Value Decomposition5 SOLUTIONS AND DECOMPOSITIONS
5.2.3 Symmetric
Assume A is symmetric, then
VVT = I (i.e. V is orthogonal) (260)�i 2 R (i.e. �i is real) (261)
Tr(Ap) =P
i�pi (262)
eig(I + cA) = 1 + c�i (263)eig(A� cI) = �i � c (264)
eig(A�1) = ��1
i (265)
For a symmetric, positive matrix A,
eig(AT A) = eig(AAT ) = eig(A) � eig(A) (266)
5.2.4 Characteristic polynomial
The characteristic polynomial for the matrix A is
0 = det(A� �I) (267)= �n � g
1
�n�1 + g2
�n�2 � ... + (�1)ngn (268)
Note that the coe�cients gj for j = 1, ..., n are the n invariants under rotationof A. Thus, gj is the sum of the determinants of all the sub-matrices of A takenj rows and columns at a time. That is, g
1
is the trace of A, and g2
is the sumof the determinants of the n(n� 1)/2 sub-matrices that can be formed from Aby deleting all but two rows and columns, and so on – see [17].
5.3 Singular Value Decomposition
Any n⇥m matrix A can be written as
A = UDVT , (269)
whereU = eigenvectors of AAT n⇥ n
D =p
diag(eig(AAT )) n⇥mV = eigenvectors of AT A m⇥m
(270)
5.3.1 Symmetric Square decomposed into squares
Assume A to be n⇥ n and symmetric. Then⇥
A⇤
=⇥
V⇤ ⇥
D⇤ ⇥
VT⇤, (271)
where D is diagonal with the eigenvalues of A, and V is orthogonal and theeigenvectors of A.
Petersen & Pedersen, The Matrix Cookbook, Version: November 14, 2008, Page 30
5.3 Singular Value Decomposition5 SOLUTIONS AND DECOMPOSITIONS
5.2.3 Symmetric
Assume A is symmetric, then
VVT = I (i.e. V is orthogonal) (260)�i 2 R (i.e. �i is real) (261)
Tr(Ap) =P
i�pi (262)
eig(I + cA) = 1 + c�i (263)eig(A� cI) = �i � c (264)
eig(A�1) = ��1
i (265)
For a symmetric, positive matrix A,
eig(AT A) = eig(AAT ) = eig(A) � eig(A) (266)
5.2.4 Characteristic polynomial
The characteristic polynomial for the matrix A is
0 = det(A� �I) (267)= �n � g
1
�n�1 + g2
�n�2 � ... + (�1)ngn (268)
Note that the coe�cients gj for j = 1, ..., n are the n invariants under rotationof A. Thus, gj is the sum of the determinants of all the sub-matrices of A takenj rows and columns at a time. That is, g
1
is the trace of A, and g2
is the sumof the determinants of the n(n� 1)/2 sub-matrices that can be formed from Aby deleting all but two rows and columns, and so on – see [17].
5.3 Singular Value Decomposition
Any n⇥m matrix A can be written as
A = UDVT , (269)
whereU = eigenvectors of AAT n⇥ n
D =p
diag(eig(AAT )) n⇥mV = eigenvectors of AT A m⇥m
(270)
5.3.1 Symmetric Square decomposed into squares
Assume A to be n⇥ n and symmetric. Then⇥
A⇤
=⇥
V⇤ ⇥
D⇤ ⇥
VT⇤, (271)
where D is diagonal with the eigenvalues of A, and V is orthogonal and theeigenvectors of A.
Petersen & Pedersen, The Matrix Cookbook, Version: November 14, 2008, Page 30
392 Chapter 12. Latent linear models
12.2.3 Singular value decomposition (SVD)
We have defined the solution to PCA in terms of eigenvectors of the covariance matrix. However,there is another way to obtain the solution, based on the singular value decomposition, orSVD. This basically generalizes the notion of eigenvectors from square matrices to any kind ofmatrix.
In particular, any (real) N × D matrix X can be decomposed as follows
X!"#$N×D
= U!"#$N×N
S!"#$N×D
VT!"#$D×D
(12.46)
where U is an N × N matrix whose columns are orthornormal (so UTU = IN ), V is D × Dmatrix whose rows and columns are orthonormal (so VTV = VVT = ID ), and S is a N × Dmatrix containing the r = min(N,D) singular values σi ≥ 0 on the main diagonal, with 0sfilling the rest of the matrix. The columns of U are the left singular vectors, and the columnsof V are the right singular vectors. See Figure 12.8(a) for an example.Since there are at most D singular values (assuming N > D), the last N − D columns of U
are irrelevant, since they will be multiplied by 0. The economy sized SVD, or thin SVD, avoidscomputing these unnecessary elements. Let us denote this decomposition by USV. If N > D,we have
X!"#$N×D
= U!"#$N×D
S!"#$D×D
VT!"#$D×D
(12.47)
as in Figure 12.8(a). If N < D, we have
X!"#$N×D
= U!"#$N×N
S!"#$N×N
VT!"#$N×D
(12.48)
Computing the economy-sized SVD takes O(NDmin(N,D)) time (Golub and van Loan 1996,p254).
The connection between eigenvectors and singular vectors is the following. For an arbitraryreal matrix X, if X = USVT , we have
XTX = VSTUT USVT = V(STS)VT = VDVT (12.49)
where D = S2 is a diagonal matrix containing the squares singular values. Hence
(XTX)V = VD (12.50)
so the eigenvectors of XTX are equal to V, the right singular vectors of X, and the eigenvaluesof XTX are equal to D, the squared singular values. Similarly
XXT = USVT VSTUT = U(SST )UT (12.51)
(XXT )U = U(SST ) = UD (12.52)
so the eigenvectors of XXT are equal to U, the left singular vectors of X. Also, the eigenvaluesof XXT are equal to the squared singular values. We can summarize all this as follows:
U = evec(XXT ), V = evec(XTX), S2 = eval(XXT ) = eval(XTX) (12.53)
5.3 Singular Value Decomposition5 SOLUTIONS AND DECOMPOSITIONS
5.2.3 Symmetric
Assume A is symmetric, then
VVT = I (i.e. V is orthogonal) (260)�i 2 R (i.e. �i is real) (261)
Tr(Ap) =P
i�pi (262)
eig(I + cA) = 1 + c�i (263)eig(A� cI) = �i � c (264)
eig(A�1) = ��1
i (265)
For a symmetric, positive matrix A,
eig(AT A) = eig(AAT ) = eig(A) � eig(A) (266)
5.2.4 Characteristic polynomial
The characteristic polynomial for the matrix A is
0 = det(A� �I) (267)= �n � g
1
�n�1 + g2
�n�2 � ... + (�1)ngn (268)
Note that the coe�cients gj for j = 1, ..., n are the n invariants under rotationof A. Thus, gj is the sum of the determinants of all the sub-matrices of A takenj rows and columns at a time. That is, g
1
is the trace of A, and g2
is the sumof the determinants of the n(n� 1)/2 sub-matrices that can be formed from Aby deleting all but two rows and columns, and so on – see [17].
5.3 Singular Value Decomposition
Any n⇥m matrix A can be written as
A = UDVT , (269)
whereU = eigenvectors of AAT n⇥ n
D =p
diag(eig(AAT )) n⇥mV = eigenvectors of AT A m⇥m
(270)
5.3.1 Symmetric Square decomposed into squares
Assume A to be n⇥ n and symmetric. Then⇥
A⇤
=⇥
V⇤ ⇥
D⇤ ⇥
VT⇤, (271)
where D is diagonal with the eigenvalues of A, and V is orthogonal and theeigenvectors of A.
Petersen & Pedersen, The Matrix Cookbook, Version: November 14, 2008, Page 30
5.3 Singular Value Decomposition5 SOLUTIONS AND DECOMPOSITIONS
5.2.3 Symmetric
Assume A is symmetric, then
VVT = I (i.e. V is orthogonal) (260)�i 2 R (i.e. �i is real) (261)
Tr(Ap) =P
i�pi (262)
eig(I + cA) = 1 + c�i (263)eig(A� cI) = �i � c (264)
eig(A�1) = ��1
i (265)
For a symmetric, positive matrix A,
eig(AT A) = eig(AAT ) = eig(A) � eig(A) (266)
5.2.4 Characteristic polynomial
The characteristic polynomial for the matrix A is
0 = det(A� �I) (267)= �n � g
1
�n�1 + g2
�n�2 � ... + (�1)ngn (268)
Note that the coe�cients gj for j = 1, ..., n are the n invariants under rotationof A. Thus, gj is the sum of the determinants of all the sub-matrices of A takenj rows and columns at a time. That is, g
1
is the trace of A, and g2
is the sumof the determinants of the n(n� 1)/2 sub-matrices that can be formed from Aby deleting all but two rows and columns, and so on – see [17].
5.3 Singular Value Decomposition
Any n⇥m matrix A can be written as
A = UDVT , (269)
whereU = eigenvectors of AAT n⇥ n
D =p
diag(eig(AAT )) n⇥mV = eigenvectors of AT A m⇥m
(270)
5.3.1 Symmetric Square decomposed into squares
Assume A to be n⇥ n and symmetric. Then⇥
A⇤
=⇥
V⇤ ⇥
D⇤ ⇥
VT⇤, (271)
where D is diagonal with the eigenvalues of A, and V is orthogonal and theeigenvectors of A.
Petersen & Pedersen, The Matrix Cookbook, Version: November 14, 2008, Page 30
392 Chapter 12. Latent linear models
12.2.3 Singular value decomposition (SVD)
We have defined the solution to PCA in terms of eigenvectors of the covariance matrix. However, there is another way to obtain the solution, based on the singular value decomposition, or SVD. This basically generalizes the notion of eigenvectors from square matrices to any kind of matrix.

In particular, any (real) N × D matrix X can be decomposed as follows

X = U S V^T,  with X: N × D, U: N × N, S: N × D, V^T: D × D   (12.46)

where U is an N × N matrix whose columns are orthonormal (so U^T U = I_N), V is a D × D matrix whose rows and columns are orthonormal (so V^T V = V V^T = I_D), and S is an N × D matrix containing the r = min(N, D) singular values σ_i ≥ 0 on the main diagonal, with 0s filling the rest of the matrix. The columns of U are the left singular vectors, and the columns of V are the right singular vectors. See Figure 12.8(a) for an example.

Since there are at most D singular values (assuming N > D), the last N − D columns of U are irrelevant, since they will be multiplied by 0. The economy sized SVD, or thin SVD, avoids computing these unnecessary elements. If N > D, we have

X = U S V^T,  with X: N × D, U: N × D, S: D × D, V^T: D × D   (12.47)

as in Figure 12.8(a). If N < D, we have

X = U S V^T,  with X: N × D, U: N × N, S: N × N, V^T: N × D   (12.48)

Computing the economy-sized SVD takes O(ND min(N, D)) time (Golub and van Loan 1996, p. 254).
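The shape bookkeeping above maps directly onto NumPy's `full_matrices` flag; a small sketch with random data (the dimensions N = 100, D = 5 are an illustrative assumption):

```python
import numpy as np

N, D = 100, 5                      # tall matrix, N > D
X = np.random.randn(N, D)

# full_matrices=False computes the economy-sized (thin) SVD,
# dropping the last N - D columns of U that would only meet zeros
U, S, Vt = np.linalg.svd(X, full_matrices=False)
print(U.shape, S.shape, Vt.shape)  # (100, 5) (5,) (5, 5)

# Reconstruction still recovers X exactly
assert np.allclose(U @ np.diag(S) @ Vt, X)
```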
The connection between eigenvectors and singular vectors is the following. For an arbitrary real matrix X, if X = USV^T, we have

X^T X = V S^T U^T U S V^T = V (S^T S) V^T = V D V^T   (12.49)

where D = S^2 is a diagonal matrix containing the squared singular values. Hence

(X^T X) V = V D   (12.50)

so the eigenvectors of X^T X are equal to V, the right singular vectors of X, and the eigenvalues of X^T X are equal to D, the squared singular values. Similarly

X X^T = U S V^T V S^T U^T = U (S S^T) U^T   (12.51)

(X X^T) U = U (S S^T) = U D   (12.52)

so the eigenvectors of X X^T are equal to U, the left singular vectors of X. Also, the eigenvalues of X X^T are equal to the squared singular values. We can summarize all this as follows:

U = evec(X X^T),   V = evec(X^T X),   S^2 = eval(X X^T) = eval(X^T X)   (12.53)
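Equation (12.53) is easy to verify numerically; a sketch with a small random matrix (size chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 4))

U, s, Vt = np.linalg.svd(X, full_matrices=False)

# eval(X^T X) equals the squared singular values
evals = np.linalg.eigvalsh(X.T @ X)
assert np.allclose(np.sort(evals), np.sort(s**2))

# Each row of Vt (i.e. each column of V) is an eigenvector of X^T X
for v, sigma in zip(Vt, s):
    assert np.allclose(X.T @ X @ v, sigma**2 * v)
```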
A bit of basic linear algebra // Singular Value Decomposition (SVD)

• Example
ROUGH DRAFT - BEWARE suggestions kirklbaker@gmail.com
Figure 1: Best-fit regression line reduces data from two dimensions into one.
original data more clearly and orders it from most variation to the least. What makes SVD practical for NLP applications is that you can simply ignore variation below a particular threshold to massively reduce your data but be assured that the main relationships of interest have been preserved.
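The "ignore variation below a threshold" idea is just a rank-k truncation of the SVD; a minimal sketch with random data (the matrix and the choice k = 2 are illustrative assumptions, not from the tutorial):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 6))    # stand-in for a term-document matrix

U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Keep only the k largest singular values: a rank-k approximation
k = 2
X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# The Frobenius-norm error equals the norm of the discarded
# singular values (Eckart-Young theorem)
err = np.linalg.norm(X - X_k, 'fro')
assert np.isclose(err, np.sqrt(np.sum(s[k:] ** 2)))
```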
8.1 Example of Full Singular Value Decomposition
SVD is based on a theorem from linear algebra which says that a rectangular matrix A can be broken down into the product of three matrices: an orthogonal matrix U, a diagonal matrix S, and the transpose of an orthogonal matrix V. The theorem is usually presented something like this:

A_{m×n} = U_{m×m} S_{m×n} V^T_{n×n}

where U^T U = I and V^T V = I; the columns of U are orthonormal eigenvectors of AA^T, the columns of V are orthonormal eigenvectors of A^T A, and S is a diagonal matrix containing the square roots of eigenvalues from U or V in descending order.

The following example merely applies this definition to a small matrix in order to compute its SVD. In the next section, I attempt to interpret the application of SVD to document classification.
Start with the matrix
A = [  3   1   1 ]
    [ −1   3   1 ]
Figure 2: Regression line along second dimension captures less variation in original data.
In order to find U, we have to start with AA^T. The transpose of A is
A^T = [ 3  −1 ]
      [ 1   3 ]
      [ 1   1 ]

so

AA^T = [  3  1  1 ] [ 3 −1 ]   [ 11   1 ]
       [ −1  3  1 ] [ 1  3 ] = [  1  11 ]
                    [ 1  1 ]
Next, we have to find the eigenvalues and corresponding eigenvectors of AA^T. We know that eigenvectors are defined by the equation Av = λv, and applying this to AA^T gives us

[ 11   1 ] [ x1 ]     [ x1 ]
[  1  11 ] [ x2 ] = λ [ x2 ]

We rewrite this as the set of equations

11x1 + x2 = λx1
x1 + 11x2 = λx2

and rearrange to get

(11 − λ)x1 + x2 = 0
x1 + (11 − λ)x2 = 0

Solve for λ by setting the determinant of the coefficient matrix to zero,

| (11 − λ)      1     |
|     1     (11 − λ)  | = 0

which works out as

(11 − λ)(11 − λ) − 1 · 1 = 0
(λ − 10)(λ − 12) = 0
λ = 10, λ = 12
to give us our two eigenvalues λ = 10, λ = 12. Plugging λ back into the original equations gives us our eigenvectors. For λ = 10 we get

(11 − 10)x1 + x2 = 0
x1 = −x2

which is true for lots of values, so we'll pick x1 = 1 and x2 = −1 since those are small and easier to work with. Thus, we have the eigenvector [1, −1] corresponding to the eigenvalue λ = 10. For λ = 12 we have

(11 − 12)x1 + x2 = 0
x1 = x2

and for the same reason as before we'll take x1 = 1 and x2 = 1. Now, for λ = 12 we have the eigenvector [1, 1]. These eigenvectors become column vectors in a matrix ordered by the size of the corresponding eigenvalue. In other words, the eigenvector of the largest eigenvalue is column one, the eigenvector of the next largest eigenvalue is column two, and so forth until we have the eigenvector of the smallest eigenvalue as the last column of our matrix. In the matrix below, the eigenvector for λ = 12 is column one, and the eigenvector for λ = 10 is column two.

[ 1    1 ]
[ 1   −1 ]
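The hand-derived eigenpairs can be confirmed with NumPy (a quick sketch, not part of Baker's text):

```python
import numpy as np

AAt = np.array([[11.0, 1.0],
                [1.0, 11.0]])

# eigh returns the eigenvalues of a symmetric matrix in ascending order
eigvals, eigvecs = np.linalg.eigh(AAt)
print(eigvals)   # [10. 12.]

# Check the unnormalized eigenvectors found by hand
assert np.allclose(AAt @ np.array([1, -1]), 10 * np.array([1, -1]))
assert np.allclose(AAt @ np.array([1, 1]), 12 * np.array([1, 1]))
```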
Finally, we have to convert this matrix into an orthogonal matrix, which we do by applying the Gram-Schmidt orthonormalization process to the column vectors. Begin by normalizing v1:

u1 = v1 / |v1| = [1, 1] / √(1² + 1²) = [1, 1] / √2 = [1/√2, 1/√2]

Compute

w2 = v2 − (u1 · v2) u1 =
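The Gram-Schmidt process can be sketched in code (a generic implementation assumed for illustration, not taken from the tutorial); applied to the columns [1, 1] and [1, −1] it reproduces the normalization above:

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize a sequence of vectors (classical Gram-Schmidt)."""
    basis = []
    for v in vectors:
        w = np.asarray(v, dtype=float).copy()
        # Subtract the projections onto the basis vectors found so far
        for u in basis:
            w = w - (u @ v) * u
        basis.append(w / np.linalg.norm(w))
    return np.column_stack(basis)

# Columns ordered by eigenvalue: [1, 1] (lambda = 12), [1, -1] (lambda = 10)
U = gram_schmidt([np.array([1, 1]), np.array([1, -1])])
# First column is [1/sqrt(2), 1/sqrt(2)], matching u1 above
assert np.allclose(U[:, 0], [1 / np.sqrt(2), 1 / np.sqrt(2)])
assert np.allclose(U.T @ U, np.eye(2))
```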
17
ROUGH DRAFT - BEWARE suggestions kirklbaker@gmail.com
x1 + (11− λ)x2 = 0
Solve for λ by setting the determinant of the coefficient matrix to zero,!!!!!(11− λ) 1
1 (11− λ)
!!!!! = 0
which works out as(11− λ)(11− λ)− 1 · 1 = 0
(λ− 10)(λ− 12) = 0
λ = 10,λ = 12
to give us our two eigenvalues λ = 10,λ = 12. Plugging λ back in to the original equationsgives us our eigenvectors. For λ = 10 we get
(11− 10)x1 + x2 = 0
x1 = −x2
which is true for lots of values, so we’ll pick x1 = 1 and x2 = −1 since those are small andeasier to work with. Thus, we have the eigenvector [1,−1] corresponding to the eigenvalueλ = 10. For λ = 12 we have
(11− 12)x1 + x2 = 0
x1 = x2
and for the same reason as before we’ll take x1 = 1 and x2 = 1. Now, for λ = 12 we have theeigenvector [1, 1]. These eigenvectors become column vectors in a matrix ordered by the sizeof the corresponding eigenvalue. In other words, the eigenvector of the largest eigenvalueis column one, the eigenvector of the next largest eigenvalue is column two, and so forthand so on until we have the eigenvector of the smallest eigenvalue as the last column of ourmatrix. In the matrix below, the eigenvector for λ = 12 is column one, and the eigenvectorfor λ = 10 is column two. "
1 11 −1
#
Finally, we have to convert this matrix into an orthogonal matrix which we do by applyingthe Gram-Schmidt orthonormalization process to the column vectors. Begin by normalizingv1.
u1 =v1|v1|
=[1, 1]√12 + 12
=[1, 1]√
2= [
1√2,1√2]
Computew2 = v2 − u1 · v2 ∗ u1 =
17
ROUGH DRAFT - BEWARE suggestions kirklbaker@gmail.com
x1 + (11− λ)x2 = 0
Solve for λ by setting the determinant of the coefficient matrix to zero,!!!!!(11− λ) 1
1 (11− λ)
!!!!! = 0
which works out as(11− λ)(11− λ)− 1 · 1 = 0
(λ− 10)(λ− 12) = 0
λ = 10,λ = 12
to give us our two eigenvalues λ = 10,λ = 12. Plugging λ back in to the original equationsgives us our eigenvectors. For λ = 10 we get
(11− 10)x1 + x2 = 0
x1 = −x2
which is true for lots of values, so we’ll pick x1 = 1 and x2 = −1 since those are small andeasier to work with. Thus, we have the eigenvector [1,−1] corresponding to the eigenvalueλ = 10. For λ = 12 we have
(11− 12)x1 + x2 = 0
x1 = x2
and for the same reason as before we’ll take x1 = 1 and x2 = 1. Now, for λ = 12 we have theeigenvector [1, 1]. These eigenvectors become column vectors in a matrix ordered by the sizeof the corresponding eigenvalue. In other words, the eigenvector of the largest eigenvalueis column one, the eigenvector of the next largest eigenvalue is column two, and so forthand so on until we have the eigenvector of the smallest eigenvalue as the last column of ourmatrix. In the matrix below, the eigenvector for λ = 12 is column one, and the eigenvectorfor λ = 10 is column two. "
1 11 −1
#
Finally, we have to convert this matrix into an orthogonal matrix which we do by applyingthe Gram-Schmidt orthonormalization process to the column vectors. Begin by normalizingv1.
u1 =v1|v1|
=[1, 1]√12 + 12
=[1, 1]√
2= [
1√2,1√2]
Computew2 = v2 − u1 · v2 ∗ u1 =
17
ROUGH DRAFT - BEWARE suggestions kirklbaker@gmail.com
x1 + (11− λ)x2 = 0
Solve for λ by setting the determinant of the coefficient matrix to zero,!!!!!(11− λ) 1
1 (11− λ)
!!!!! = 0
which works out as(11− λ)(11− λ)− 1 · 1 = 0
(λ− 10)(λ− 12) = 0
λ = 10,λ = 12
to give us our two eigenvalues λ = 10,λ = 12. Plugging λ back in to the original equationsgives us our eigenvectors. For λ = 10 we get
(11− 10)x1 + x2 = 0
x1 = −x2
which is true for lots of values, so we’ll pick x1 = 1 and x2 = −1 since those are small andeasier to work with. Thus, we have the eigenvector [1,−1] corresponding to the eigenvalueλ = 10. For λ = 12 we have
(11− 12)x1 + x2 = 0
x1 = x2
and for the same reason as before we’ll take x1 = 1 and x2 = 1. Now, for λ = 12 we have theeigenvector [1, 1]. These eigenvectors become column vectors in a matrix ordered by the sizeof the corresponding eigenvalue. In other words, the eigenvector of the largest eigenvalueis column one, the eigenvector of the next largest eigenvalue is column two, and so forthand so on until we have the eigenvector of the smallest eigenvalue as the last column of ourmatrix. In the matrix below, the eigenvector for λ = 12 is column one, and the eigenvectorfor λ = 10 is column two. "
1 11 −1
#
Finally, we have to convert this matrix into an orthogonal matrix which we do by applyingthe Gram-Schmidt orthonormalization process to the column vectors. Begin by normalizingv1.
u1 =v1|v1|
=[1, 1]√12 + 12
=[1, 1]√
2= [
1√2,1√2]
Computew2 = v2 − u1 · v2 ∗ u1 =
17
ROUGH DRAFT - BEWARE suggestions kirklbaker@gmail.com
x1 + (11− λ)x2 = 0
Solve for λ by setting the determinant of the coefficient matrix to zero,!!!!!(11− λ) 1
1 (11− λ)
!!!!! = 0
which works out as(11− λ)(11− λ)− 1 · 1 = 0
(λ− 10)(λ− 12) = 0
λ = 10,λ = 12
to give us our two eigenvalues λ = 10,λ = 12. Plugging λ back in to the original equationsgives us our eigenvectors. For λ = 10 we get
(11− 10)x1 + x2 = 0
x1 = −x2
which is true for lots of values, so we’ll pick x1 = 1 and x2 = −1 since those are small andeasier to work with. Thus, we have the eigenvector [1,−1] corresponding to the eigenvalueλ = 10. For λ = 12 we have
(11− 12)x1 + x2 = 0
x1 = x2
and for the same reason as before we’ll take x1 = 1 and x2 = 1. Now, for λ = 12 we have theeigenvector [1, 1]. These eigenvectors become column vectors in a matrix ordered by the sizeof the corresponding eigenvalue. In other words, the eigenvector of the largest eigenvalueis column one, the eigenvector of the next largest eigenvalue is column two, and so forthand so on until we have the eigenvector of the smallest eigenvalue as the last column of ourmatrix. In the matrix below, the eigenvector for λ = 12 is column one, and the eigenvectorfor λ = 10 is column two. "
1 11 −1
#
Finally, we have to convert this matrix into an orthogonal matrix which we do by applyingthe Gram-Schmidt orthonormalization process to the column vectors. Begin by normalizingv1.
u1 =v1|v1|
=[1, 1]√12 + 12
=[1, 1]√
2= [
1√2,1√2]
Computew2 = v2 − u1 · v2 ∗ u1 =
17
ROUGH DRAFT - BEWARE suggestions kirklbaker@gmail.com
x1 + (11− λ)x2 = 0
Solve for λ by setting the determinant of the coefficient matrix to zero,!!!!!(11− λ) 1
1 (11− λ)
!!!!! = 0
which works out as(11− λ)(11− λ)− 1 · 1 = 0
(λ− 10)(λ− 12) = 0
λ = 10,λ = 12
to give us our two eigenvalues λ = 10,λ = 12. Plugging λ back in to the original equationsgives us our eigenvectors. For λ = 10 we get
(11− 10)x1 + x2 = 0
x1 = −x2
which is true for lots of values, so we’ll pick x1 = 1 and x2 = −1 since those are small andeasier to work with. Thus, we have the eigenvector [1,−1] corresponding to the eigenvalueλ = 10. For λ = 12 we have
(11− 12)x1 + x2 = 0
x1 = x2
and for the same reason as before we’ll take x1 = 1 and x2 = 1. Now, for λ = 12 we have theeigenvector [1, 1]. These eigenvectors become column vectors in a matrix ordered by the sizeof the corresponding eigenvalue. In other words, the eigenvector of the largest eigenvalueis column one, the eigenvector of the next largest eigenvalue is column two, and so forthand so on until we have the eigenvector of the smallest eigenvalue as the last column of ourmatrix. In the matrix below, the eigenvector for λ = 12 is column one, and the eigenvectorfor λ = 10 is column two. "
1 11 −1
#
Finally, we have to convert this matrix into an orthogonal matrix which we do by applyingthe Gram-Schmidt orthonormalization process to the column vectors. Begin by normalizingv1.
u1 =v1|v1|
=[1, 1]√12 + 12
=[1, 1]√
2= [
1√2,1√2]
Computew2 = v2 − u1 · v2 ∗ u1 =
17
A bit of basic linear algebra // Singular Value Decomposition (SVD)

• we orthonormalize the matrix
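As a numerical cross-check of the hand computation, the eigendecomposition of $AA^T$ can be reproduced with NumPy. This is only a sketch: `A` is the 2×3 matrix from the worked example above, and `eigh` is used because $AA^T$ is symmetric.

```python
import numpy as np

# The 2x3 matrix from the worked example above.
A = np.array([[3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])

# Left singular vectors come from the eigendecomposition of A A^T.
AAt = A @ A.T                    # [[11, 1], [1, 11]]

# eigh is for symmetric matrices; it returns eigenvalues in ascending order.
lam, Q = np.linalg.eigh(AAt)

# Reorder so the largest eigenvalue (and its eigenvector) comes first,
# matching the convention used in the text.
order = np.argsort(lam)[::-1]
lam, Q = lam[order], Q[:, order]

print(lam)   # eigenvalues 12 and 10
print(Q)     # columns along the [1, 1] and [1, -1] directions (signs may flip)
```

The columns of `Q` agree with the hand-derived eigenvectors up to sign, which is the usual ambiguity of eigenvectors.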
$$w_2 = [1,-1] - \left(\left[\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right] \cdot [1,-1]\right)\left[\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right] = [1,-1] - 0 \cdot \left[\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right] = [1,-1] - [0,0] = [1,-1]$$

and normalize:

$$u_2 = \frac{w_2}{|w_2|} = \left[\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}\right]$$

to give

$$U = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \end{bmatrix}$$

The calculation of $V$ is similar. $V$ is based on $A^T A$, so we have

$$A^T A = \begin{bmatrix} 3 & -1 \\ 1 & 3 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} 3 & 1 & 1 \\ -1 & 3 & 1 \end{bmatrix} = \begin{bmatrix} 10 & 0 & 2 \\ 0 & 10 & 4 \\ 2 & 4 & 2 \end{bmatrix}$$

Find the eigenvalues of $A^T A$ from

$$\begin{bmatrix} 10 & 0 & 2 \\ 0 & 10 & 4 \\ 2 & 4 & 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \lambda \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$$

which represents the system of equations

$$10x_1 + 2x_3 = \lambda x_1$$
$$10x_2 + 4x_3 = \lambda x_2$$
$$2x_1 + 4x_2 + 2x_3 = \lambda x_3$$

which we rewrite as

$$(10-\lambda)x_1 + 2x_3 = 0$$
$$(10-\lambda)x_2 + 4x_3 = 0$$
$$2x_1 + 4x_2 + (2-\lambda)x_3 = 0$$

and solve by setting

$$\begin{vmatrix} (10-\lambda) & 0 & 2 \\ 0 & (10-\lambda) & 4 \\ 2 & 4 & (2-\lambda) \end{vmatrix} = 0$$
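Before expanding that 3×3 determinant by hand, its roots can be checked numerically; a short sketch assuming NumPy:

```python
import numpy as np

# A^T A from the example above.
AtA = np.array([[10.0, 0.0, 2.0],
                [0.0, 10.0, 4.0],
                [2.0, 4.0, 2.0]])

# The roots of det(A^T A - lambda I) are the eigenvalues; eigvalsh
# computes them directly for a symmetric matrix, in ascending order.
lam = np.linalg.eigvalsh(AtA)
print(np.round(lam, 8))
```

The three roots come out as 0, 10, and 12, matching the two eigenvalues of $AA^T$ plus a zero eigenvalue, as expected for a rank-2 matrix.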
• same procedure for V
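The Gram-Schmidt orthonormalization used for $U$, and reused below for $V$, can be sketched as a small function (a classical, non-pivoting variant; it assumes the input vectors are linearly independent):

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize a sequence of linearly independent vectors.

    For each vector, subtract its projections onto the already-built
    orthonormal basis, then normalize the remainder.
    """
    basis = []
    for v in vectors:
        w = np.asarray(v, dtype=float)
        for u in basis:
            w = w - (u @ w) * u          # remove the component along u
        basis.append(w / np.linalg.norm(w))
    return np.column_stack(basis)

# The eigenvector columns of A A^T from the example: [1, 1] and [1, -1].
U = gram_schmidt([[1, 1], [1, -1]])
print(U @ U.T)   # close to the identity, so U is orthogonal
```

Here the input vectors were already orthogonal, so the function only normalizes them; in general the projection step also removes overlaps between the columns.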
A bit of basic linear algebra // Singular Value Decomposition (SVD)

• same procedure for V
and use the Gram-Schmidt orthonormalization process to convert these to an orthonormal matrix:

$$u_1 = \frac{v_1}{|v_1|} = \left[\frac{1}{\sqrt{6}}, \frac{2}{\sqrt{6}}, \frac{1}{\sqrt{6}}\right]$$

$$w_2 = v_2 - (u_1 \cdot v_2)\,u_1 = [2, -1, 0]$$

$$u_2 = \frac{w_2}{|w_2|} = \left[\frac{2}{\sqrt{5}}, -\frac{1}{\sqrt{5}}, 0\right]$$

$$w_3 = v_3 - (u_1 \cdot v_3)\,u_1 - (u_2 \cdot v_3)\,u_2 = \left[-\frac{2}{3}, -\frac{4}{3}, \frac{10}{3}\right]$$

$$u_3 = \frac{w_3}{|w_3|} = \left[\frac{1}{\sqrt{30}}, \frac{2}{\sqrt{30}}, -\frac{5}{\sqrt{30}}\right]$$

(the overall sign of $u_3$ is immaterial here, since its singular value turns out to be zero). All this gives us

$$V = \begin{bmatrix} \frac{1}{\sqrt{6}} & \frac{2}{\sqrt{5}} & \frac{1}{\sqrt{30}} \\ \frac{2}{\sqrt{6}} & -\frac{1}{\sqrt{5}} & \frac{2}{\sqrt{30}} \\ \frac{1}{\sqrt{6}} & 0 & -\frac{5}{\sqrt{30}} \end{bmatrix}$$

when we really want its transpose

$$V^T = \begin{bmatrix} \frac{1}{\sqrt{6}} & \frac{2}{\sqrt{6}} & \frac{1}{\sqrt{6}} \\ \frac{2}{\sqrt{5}} & -\frac{1}{\sqrt{5}} & 0 \\ \frac{1}{\sqrt{30}} & \frac{2}{\sqrt{30}} & -\frac{5}{\sqrt{30}} \end{bmatrix}$$

For $S$ we take the square roots of the non-zero eigenvalues and populate the diagonal with them, putting the largest in $s_{11}$, the next largest in $s_{22}$, and so on until the smallest value ends up in $s_{mm}$. The non-zero eigenvalues of $AA^T$ and $A^T A$ are always the same, which is why it doesn't matter which one we take them from. Because we are doing full SVD, instead of reduced SVD (next section), we have to add a zero column to $S$ so that it has the proper dimensions to allow the multiplication between $U$ and $V^T$. The diagonal entries of $S$ are the singular values of $A$, the columns of $U$ are called the left singular vectors, and the columns of $V$ are called the right singular vectors.

$$S = \begin{bmatrix} \sqrt{12} & 0 & 0 \\ 0 & \sqrt{10} & 0 \end{bmatrix}$$

Now we have all the pieces of the puzzle:

$$A_{m \times n} = U_{m \times m}\, S_{m \times n}\, V^T_{n \times n} = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \end{bmatrix} \begin{bmatrix} \sqrt{12} & 0 & 0 \\ 0 & \sqrt{10} & 0 \end{bmatrix} \begin{bmatrix} \frac{1}{\sqrt{6}} & \frac{2}{\sqrt{6}} & \frac{1}{\sqrt{6}} \\ \frac{2}{\sqrt{5}} & -\frac{1}{\sqrt{5}} & 0 \\ \frac{1}{\sqrt{30}} & \frac{2}{\sqrt{30}} & -\frac{5}{\sqrt{30}} \end{bmatrix}$$

$$= \begin{bmatrix} \frac{\sqrt{12}}{\sqrt{2}} & \frac{\sqrt{10}}{\sqrt{2}} & 0 \\ \frac{\sqrt{12}}{\sqrt{2}} & -\frac{\sqrt{10}}{\sqrt{2}} & 0 \end{bmatrix} \begin{bmatrix} \frac{1}{\sqrt{6}} & \frac{2}{\sqrt{6}} & \frac{1}{\sqrt{6}} \\ \frac{2}{\sqrt{5}} & -\frac{1}{\sqrt{5}} & 0 \\ \frac{1}{\sqrt{30}} & \frac{2}{\sqrt{30}} & -\frac{5}{\sqrt{30}} \end{bmatrix} = \begin{bmatrix} 3 & 1 & 1 \\ -1 & 3 & 1 \end{bmatrix}$$
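The whole hand computation can be reproduced in one call with NumPy's built-in SVD; a quick sanity check on the example matrix (signs of individual singular vectors may differ from the hand derivation, which is harmless):

```python
import numpy as np

A = np.array([[3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])

# Full SVD: U is 2x2, Vt is 3x3, s holds singular values in decreasing order.
U, s, Vt = np.linalg.svd(A, full_matrices=True)
print(s)   # approximately [sqrt(12), sqrt(10)]

# Rebuild the rectangular S (with its zero column) and check A = U S V^T.
S = np.zeros(A.shape)
np.fill_diagonal(S, s)
print(np.allclose(U @ S @ Vt, A))
```

The reconstruction `U @ S @ Vt` matches `A` to machine precision, confirming the factorization assembled above.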
5.2.3 Symmetric

Assume $A$ is symmetric; then

$$VV^T = I \quad \text{(i.e. } V \text{ is orthogonal)} \quad (260)$$
$$\lambda_i \in \mathbb{R} \quad \text{(i.e. } \lambda_i \text{ is real)} \quad (261)$$
$$\mathrm{Tr}(A^p) = \sum_i \lambda_i^p \quad (262)$$
$$\mathrm{eig}(I + cA) = 1 + c\lambda_i \quad (263)$$
$$\mathrm{eig}(A - cI) = \lambda_i - c \quad (264)$$
$$\mathrm{eig}(A^{-1}) = \lambda_i^{-1} \quad (265)$$

For a symmetric, positive matrix $A$,

$$\mathrm{eig}(A^T A) = \mathrm{eig}(AA^T) = \mathrm{eig}(A) \circ \mathrm{eig}(A) \quad (266)$$

5.2.4 Characteristic polynomial

The characteristic polynomial for the matrix $A$ is

$$0 = \det(A - \lambda I) = \lambda^n - g_1 \lambda^{n-1} + g_2 \lambda^{n-2} - \dots + (-1)^n g_n \quad (267, 268)$$

Note that the coefficients $g_j$ for $j = 1, \dots, n$ are the $n$ invariants under rotation of $A$. Thus, $g_j$ is the sum of the determinants of all the sub-matrices of $A$ taken $j$ rows and columns at a time. That is, $g_1$ is the trace of $A$, and $g_2$ is the sum of the determinants of the $n(n-1)/2$ sub-matrices that can be formed from $A$ by deleting all but two rows and columns, and so on; see [17].

5.3 Singular Value Decomposition

Any $n \times m$ matrix $A$ can be written as

$$A = UDV^T, \quad (269)$$

where

$$U = \text{eigenvectors of } AA^T \quad (n \times n)$$
$$D = \sqrt{\mathrm{diag}(\mathrm{eig}(AA^T))} \quad (n \times m)$$
$$V = \text{eigenvectors of } A^T A \quad (m \times m) \quad (270)$$

5.3.1 Symmetric Square decomposed into squares

Assume $A$ to be $n \times n$ and symmetric. Then

$$A = V D V^T, \quad (271)$$

where $D$ is diagonal with the eigenvalues of $A$, and $V$ is orthogonal and holds the eigenvectors of $A$.

Petersen & Pedersen, The Matrix Cookbook, Version: November 14, 2008, Page 30
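The relation $D = \sqrt{\mathrm{diag}(\mathrm{eig}(AA^T))}$ from the excerpt can be checked numerically for an arbitrary matrix; a sketch (the random matrix and seed are just illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 6))            # arbitrary n x m matrix, n < m

s = np.linalg.svd(A, compute_uv=False)     # singular values, descending
lam = np.linalg.eigvalsh(A @ A.T)[::-1]    # eig(A A^T), descending

# Singular values are the square roots of the eigenvalues of A A^T
# (clip guards against tiny negative values from round-off).
print(np.allclose(s, np.sqrt(np.clip(lam, 0.0, None))))
```

The same check works with $A^T A$ instead of $AA^T$ after discarding its extra zero eigenvalues.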
• compute the square roots of the eigenvalues
392 Chapter 12. Latent linear models

12.2.3 Singular value decomposition (SVD)

We have defined the solution to PCA in terms of eigenvectors of the covariance matrix. However, there is another way to obtain the solution, based on the singular value decomposition, or SVD. This basically generalizes the notion of eigenvectors from square matrices to any kind of matrix.

In particular, any (real) $N \times D$ matrix $X$ can be decomposed as follows:

$$\underbrace{X}_{N \times D} = \underbrace{U}_{N \times N}\,\underbrace{S}_{N \times D}\,\underbrace{V^T}_{D \times D} \quad (12.46)$$

where $U$ is an $N \times N$ matrix whose columns are orthonormal (so $U^T U = I_N$), $V$ is a $D \times D$ matrix whose rows and columns are orthonormal (so $V^T V = V V^T = I_D$), and $S$ is an $N \times D$ matrix containing the $r = \min(N, D)$ singular values $\sigma_i \geq 0$ on the main diagonal, with 0s filling the rest of the matrix. The columns of $U$ are the left singular vectors, and the columns of $V$ are the right singular vectors. See Figure 12.8(a) for an example.

Since there are at most $D$ singular values (assuming $N > D$), the last $N - D$ columns of $U$ are irrelevant, since they will be multiplied by 0. The economy sized SVD, or thin SVD, avoids computing these unnecessary elements. Let us denote this decomposition by $USV$. If $N > D$, we have

$$\underbrace{X}_{N \times D} = \underbrace{U}_{N \times D}\,\underbrace{S}_{D \times D}\,\underbrace{V^T}_{D \times D} \quad (12.47)$$

as in Figure 12.8(a). If $N < D$, we have

$$\underbrace{X}_{N \times D} = \underbrace{U}_{N \times N}\,\underbrace{S}_{N \times N}\,\underbrace{V^T}_{N \times D} \quad (12.48)$$

Computing the economy-sized SVD takes $O(ND\min(N, D))$ time (Golub and van Loan 1996, p. 254).

The connection between eigenvectors and singular vectors is the following. For an arbitrary real matrix $X$, if $X = USV^T$, we have

$$X^T X = V S^T U^T U S V^T = V (S^T S) V^T = V D V^T \quad (12.49)$$

where $D = S^2$ is a diagonal matrix containing the squared singular values. Hence

$$(X^T X) V = V D \quad (12.50)$$

so the eigenvectors of $X^T X$ are equal to $V$, the right singular vectors of $X$, and the eigenvalues of $X^T X$ are equal to $D$, the squared singular values. Similarly,

$$X X^T = U S V^T V S^T U^T = U (S S^T) U^T \quad (12.51)$$
$$(X X^T) U = U (S S^T) = U D \quad (12.52)$$

so the eigenvectors of $X X^T$ are equal to $U$, the left singular vectors of $X$, and the eigenvalues of $X X^T$ are equal to the squared singular values. We can summarize all this as follows:

$$U = \mathrm{evec}(X X^T), \quad V = \mathrm{evec}(X^T X), \quad S^2 = \mathrm{eval}(X X^T) = \mathrm{eval}(X^T X) \quad (12.53)$$
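The shape bookkeeping in equations (12.46)-(12.48) is exactly what NumPy's `full_matrices` flag controls; a sketch for the $N > D$ case:

```python
import numpy as np

N, D = 5, 3
X = np.random.default_rng(1).standard_normal((N, D))

# Full SVD (12.46): U is N x N, but its last N - D columns multiply zeros in S.
Uf, sf, Vtf = np.linalg.svd(X, full_matrices=True)
print(Uf.shape, sf.shape, Vtf.shape)    # (5, 5) (3,) (3, 3)

# Economy / thin SVD (12.47): the irrelevant columns of U are dropped.
Ue, se, Vte = np.linalg.svd(X, full_matrices=False)
print(Ue.shape, se.shape, Vte.shape)    # (5, 3) (3,) (3, 3)

# Both variants reconstruct X; in the thin form S is simply diag(se).
print(np.allclose(Ue @ np.diag(se) @ Vte, X))
```

The thin form is the one used in practice for tall data matrices, since the discarded columns of $U$ carry no information about $X$.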
A bit of basic linear algebra // Singular Value Decomposition (SVD)

• in general
392 Chapter 12. Latent linear models

12.2.3 Singular value decomposition (SVD)

We have defined the solution to PCA in terms of eigenvectors of the covariance matrix. However, there is another way to obtain the solution, based on the singular value decomposition, or SVD. This basically generalizes the notion of eigenvectors from square matrices to any kind of matrix.

In particular, any (real) N × D matrix X can be decomposed as follows

X = U S V^T,   X: N × D,  U: N × N,  S: N × D,  V^T: D × D   (12.46)

where U is an N × N matrix whose columns are orthonormal (so U^T U = I_N), V is a D × D matrix whose rows and columns are orthonormal (so V^T V = V V^T = I_D), and S is an N × D matrix containing the r = min(N, D) singular values σ_i ≥ 0 on the main diagonal, with 0s filling the rest of the matrix. The columns of U are the left singular vectors, and the columns of V are the right singular vectors. See Figure 12.8(a) for an example.

Since there are at most D singular values (assuming N > D), the last N − D columns of U are irrelevant, since they will be multiplied by 0. The economy-sized SVD, or thin SVD, avoids computing these unnecessary elements. Let us denote this decomposition by Û Ŝ V̂^T. If N > D, we have

X = Û Ŝ V̂^T,   X: N × D,  Û: N × D,  Ŝ: D × D,  V̂^T: D × D   (12.47)

as in Figure 12.8(a). If N < D, we have

X = Û Ŝ V̂^T,   X: N × D,  Û: N × N,  Ŝ: N × N,  V̂^T: N × D   (12.48)

Computing the economy-sized SVD takes O(ND min(N, D)) time (Golub and van Loan 1996, p. 254).

The connection between eigenvectors and singular vectors is the following. For an arbitrary real matrix X, if X = U S V^T, we have

X^T X = V S^T U^T U S V^T = V (S^T S) V^T = V D V^T   (12.49)

where D = S^2 is a diagonal matrix containing the squared singular values. Hence

(X^T X) V = V D   (12.50)

so the eigenvectors of X^T X are equal to V, the right singular vectors of X, and the eigenvalues of X^T X are equal to D, the squared singular values. Similarly

X X^T = U S V^T V S^T U^T = U (S S^T) U^T   (12.51)

(X X^T) U = U (S S^T) = U D   (12.52)

so the eigenvectors of X X^T are equal to U, the left singular vectors of X. Also, the eigenvalues of X X^T are equal to the squared singular values. We can summarize all this as follows:

U = evec(X X^T),  V = evec(X^T X),  S^2 = eval(X X^T) = eval(X^T X)   (12.53)
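The shapes in (12.46)-(12.47) and the eigenvalue relations in (12.53) can be checked numerically. This is a sketch using NumPy (the specific matrix sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 6, 3
X = rng.standard_normal((N, D))

# Full SVD (12.46): U is N x N, s holds r = min(N, D) values, Vt is D x D
U, s, Vt = np.linalg.svd(X, full_matrices=True)
assert U.shape == (N, N) and Vt.shape == (D, D) and s.shape == (D,)

# Economy-sized (thin) SVD (12.47): U is only N x D, since the last
# N - D columns of the full U would be multiplied by zeros anyway
Ut, st, Vtt = np.linalg.svd(X, full_matrices=False)
assert Ut.shape == (N, D)

# (12.53): eigenvalues of X^T X equal the squared singular values
evals_XtX = np.linalg.eigvalsh(X.T @ X)[::-1]      # sort descending
assert np.allclose(evals_XtX, s**2)

# ... and so do the D nonzero eigenvalues of X X^T
evals_XXt = np.linalg.eigvalsh(X @ X.T)[::-1][:D]  # keep the nonzero part
assert np.allclose(evals_XXt, s**2)
```

Note that `np.linalg.eigvalsh` returns eigenvalues in ascending order while `svd` returns singular values in descending order, hence the `[::-1]` flips.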
5 SOLUTIONS AND DECOMPOSITIONS

5.2.3 Symmetric

Assume A is symmetric, then

V V^T = I  (i.e. V is orthogonal)   (260)
λ_i ∈ R  (i.e. λ_i is real)   (261)
Tr(A^p) = Σ_i λ_i^p   (262)
eig(I + cA) = 1 + c λ_i   (263)
eig(A − cI) = λ_i − c   (264)
eig(A^{−1}) = λ_i^{−1}   (265)

For a symmetric, positive matrix A,

eig(A^T A) = eig(A A^T) = eig(A) ∘ eig(A)   (266)

5.2.4 Characteristic polynomial

The characteristic polynomial for the matrix A is

0 = det(A − λI)   (267)
  = λ^n − g_1 λ^{n−1} + g_2 λ^{n−2} − ... + (−1)^n g_n   (268)

Note that the coefficients g_j for j = 1, ..., n are the n invariants under rotation of A. Thus, g_j is the sum of the determinants of all the sub-matrices of A taken j rows and columns at a time. That is, g_1 is the trace of A, and g_2 is the sum of the determinants of the n(n − 1)/2 sub-matrices that can be formed from A by deleting all but two rows and columns, and so on – see [17].

5.3 Singular Value Decomposition

Any n × m matrix A can be written as

A = U D V^T,   (269)

where
U = eigenvectors of A A^T   (n × n)
D = sqrt(diag(eig(A A^T)))   (n × m)
V = eigenvectors of A^T A   (m × m)   (270)

5.3.1 Symmetric Square decomposed into squares

Assume A to be n × n and symmetric. Then

A = V D V^T,   (271)

where D is diagonal with the eigenvalues of A, and V is orthogonal and contains the eigenvectors of A.

Petersen & Pedersen, The Matrix Cookbook, Version: November 14, 2008, Page 30
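A few of the Cookbook identities above, (260)-(265) and the spectral decomposition (271), are easy to spot-check numerically. A sketch in NumPy (the random symmetric matrix and the choices p = 3, c = 0.5 are assumptions of this example):

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((4, 4))
A = B + B.T                        # a symmetric matrix

lam, V = np.linalg.eigh(A)         # A = V diag(lam) V^T

# (260)-(261): V is orthogonal, eigenvalues are real (eigh returns floats)
assert np.allclose(V @ V.T, np.eye(4))

# (262): Tr(A^p) = sum_i lam_i^p, here with p = 3
assert np.isclose(np.trace(A @ A @ A), np.sum(lam**3))

# (263): eig(I + cA) = 1 + c * lam_i
c = 0.5
assert np.allclose(np.sort(np.linalg.eigvalsh(np.eye(4) + c * A)),
                   np.sort(1 + c * lam))

# (265): eig(A^{-1}) = lam_i^{-1}  (A is invertible here)
assert np.allclose(np.sort(np.linalg.eigvalsh(np.linalg.inv(A))),
                   np.sort(1.0 / lam))

# (271): the spectral decomposition reconstructs A
assert np.allclose(V @ np.diag(lam) @ V.T, A)
```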
Thin SVD (economy-sized SVD)
12.2. Principal components analysis (PCA) 393

[Figure 12.8: (a) SVD decomposition of non-square matrices, X = U S V^T. The shaded parts of S, and all the off-diagonal terms, are zero. The shaded entries in U and S are not computed in the economy-sized version, since they are not needed. (b) Truncated SVD approximation of rank L, X ≈ U_L S_L V_L^T.]
Since the eigenvectors are unaffected by linear scaling of a matrix, we see that the right singular vectors of X are equal to the eigenvectors of the empirical covariance Σ. Furthermore, the eigenvalues of Σ are a scaled version of the squared singular values. This means we can perform PCA using just a few lines of code (see pcaPmtk).

However, the connection between PCA and SVD goes deeper. From Equation 12.46, we can represent a rank r matrix as follows:

X = σ_1 u_1 v_1^T + · · · + σ_r u_r v_r^T   (12.54)

If the singular values die off quickly as in Figure 12.10, we can produce a rank L approximation to the matrix as follows:

X ≈ U_{:,1:L} S_{1:L,1:L} V_{:,1:L}^T   (12.55)

This is called a truncated SVD (see Figure 12.8(b)). The total number of parameters needed to represent an N × D matrix using a rank L approximation is

NL + LD + L = L(N + D + 1)   (12.56)
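Equations (12.54)-(12.56) can be sketched in a few lines of NumPy (the matrix sizes, the rank L = 2, and the noise level are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
N, D, L = 8, 5, 2

# Build a matrix whose singular values die off quickly:
# a rank-L matrix plus a little noise
X = rng.standard_normal((N, L)) @ rng.standard_normal((L, D)) \
    + 1e-3 * rng.standard_normal((N, D))

U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Rank-L truncation (12.55): keep only the L largest singular values
X_L = U[:, :L] @ np.diag(s[:L]) @ Vt[:L, :]

# Equivalent outer-product form (12.54), truncated at L terms
X_sum = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(L))
assert np.allclose(X_L, X_sum)

# The approximation is close because sigma_{L+1}, ..., sigma_r are tiny
assert np.linalg.norm(X - X_L) < 1e-1

# Parameter count of the rank-L representation (12.56)
params = N * L + L * D + L
assert params == L * (N + D + 1)
```

The Frobenius error of the truncation equals sqrt(σ_{L+1}^2 + ... + σ_r^2), so the approximation is good exactly when the discarded singular values are small.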
5.3 Singular Value Decomposition5 SOLUTIONS AND DECOMPOSITIONS
5.2.3 Symmetric
Assume A is symmetric, then
VVT = I (i.e. V is orthogonal) (260)�i 2 R (i.e. �i is real) (261)
Tr(Ap) =P
i�pi (262)
eig(I + cA) = 1 + c�i (263)eig(A� cI) = �i � c (264)
eig(A�1) = ��1
i (265)
For a symmetric, positive matrix A,
eig(AT A) = eig(AAT ) = eig(A) � eig(A) (266)
5.2.4 Characteristic polynomial
The characteristic polynomial for the matrix A is
0 = det(A� �I) (267)= �n � g
1
�n�1 + g2
�n�2 � ... + (�1)ngn (268)
Note that the coe�cients gj for j = 1, ..., n are the n invariants under rotationof A. Thus, gj is the sum of the determinants of all the sub-matrices of A takenj rows and columns at a time. That is, g
1
is the trace of A, and g2
is the sumof the determinants of the n(n� 1)/2 sub-matrices that can be formed from Aby deleting all but two rows and columns, and so on – see [17].
5.3 Singular Value Decomposition
Any n⇥m matrix A can be written as
A = UDVT , (269)
whereU = eigenvectors of AAT n⇥ n
D =p
diag(eig(AAT )) n⇥mV = eigenvectors of AT A m⇥m
(270)
5.3.1 Symmetric Square decomposed into squares
Assume A to be n⇥ n and symmetric. Then⇥
A⇤
=⇥
V⇤ ⇥
D⇤ ⇥
VT⇤, (271)
where D is diagonal with the eigenvalues of A, and V is orthogonal and theeigenvectors of A.
Petersen & Pedersen, The Matrix Cookbook, Version: November 14, 2008, Page 30
Un po’ di algebra lineare di base //Singular Value Decomposition (SVD) troncata
392 Chapter 12. Latent linear models
12.2.3 Singular value decomposition (SVD)
We have defined the solution to PCA in terms of eigenvectors of the covariance matrix. However,there is another way to obtain the solution, based on the singular value decomposition, orSVD. This basically generalizes the notion of eigenvectors from square matrices to any kind ofmatrix.
In particular, any (real) N × D matrix X can be decomposed as follows

X_{(N×D)} = U_{(N×N)} S_{(N×D)} V^T_{(D×D)}  (12.46)

where U is an N × N matrix whose columns are orthonormal (so U^T U = I_N), V is a D × D matrix whose rows and columns are orthonormal (so V^T V = V V^T = I_D), and S is an N × D matrix containing the r = min(N, D) singular values σ_i ≥ 0 on the main diagonal, with 0s filling the rest of the matrix. The columns of U are the left singular vectors, and the columns of V are the right singular vectors. See Figure 12.8(a) for an example.

Since there are at most D singular values (assuming N > D), the last N − D columns of U are irrelevant, since they will be multiplied by 0. The economy-sized SVD, or thin SVD, avoids computing these unnecessary elements. Let us denote this decomposition by USV. If N > D, we have

X_{(N×D)} = U_{(N×D)} S_{(D×D)} V^T_{(D×D)}  (12.47)

as in Figure 12.8(a). If N < D, we have

X_{(N×D)} = U_{(N×N)} S_{(N×N)} V^T_{(N×D)}  (12.48)

Computing the economy-sized SVD takes O(ND min(N, D)) time (Golub and van Loan 1996, p. 254).
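A short NumPy illustration of the full vs. economy-sized SVD shapes (the matrix is random, for illustration only; in NumPy the thin variant is selected with `full_matrices=False`):

```python
import numpy as np

rng = np.random.default_rng(1)
N, D = 10, 4
X = rng.standard_normal((N, D))

# Full SVD: U is N x N, but only the first D columns matter when N > D.
U_full, s_full, Vt_full = np.linalg.svd(X, full_matrices=True)
# Economy-sized (thin) SVD: U is N x D, S is D x D.
U_thin, s_thin, Vt_thin = np.linalg.svd(X, full_matrices=False)

print(U_full.shape, U_thin.shape)  # (10, 10) (10, 4)
# Both variants reconstruct X exactly.
print(np.allclose(U_thin @ np.diag(s_thin) @ Vt_thin, X))  # True
```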
The connection between eigenvectors and singular vectors is the following. For an arbitrary real matrix X, if X = U S V^T, we have

X^T X = V S^T U^T U S V^T = V (S^T S) V^T = V D V^T  (12.49)

where D = S^2 is a diagonal matrix containing the squared singular values. Hence

(X^T X) V = V D  (12.50)

so the eigenvectors of X^T X are equal to V, the right singular vectors of X, and the eigenvalues of X^T X are equal to D, the squared singular values. Similarly

X X^T = U S V^T V S^T U^T = U (S S^T) U^T  (12.51)
(X X^T) U = U (S S^T) = U D  (12.52)

so the eigenvectors of X X^T are equal to U, the left singular vectors of X. Also, the eigenvalues of X X^T are equal to the squared singular values. We can summarize all this as follows:

U = evec(X X^T),  V = evec(X^T X),  S^2 = eval(X X^T) = eval(X^T X)  (12.53)
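Equation 12.53 can be verified directly in NumPy (random matrix for illustration; eigenvectors are compared in absolute value, since each is only defined up to sign):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((8, 3))

U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Eigenvalues of X^T X equal the squared singular values.
eigvals = np.linalg.eigvalsh(X.T @ X)  # ascending order
print(np.allclose(np.sort(s ** 2), eigvals))  # True

# The top eigenvector of X^T X matches the first right singular vector
# (up to an arbitrary sign flip).
lam, V_eig = np.linalg.eigh(X.T @ X)
print(np.allclose(np.abs(V_eig[:, -1]), np.abs(Vt[0])))  # True
```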
Truncated SVD
12.2. Principal components analysis (PCA) 393
[Figure 12.8: panel (a) depicts X = U S V^T with singular values σ_1, ..., σ_D; panel (b) depicts the rank-L truncation X ≈ U_L S_L V_L^T.]
Figure 12.8 (a) SVD decomposition of non-square matrices X = U S V^T. The shaded parts of S, and all the off-diagonal terms, are zero. The shaded entries in U and S are not computed in the economy-sized version, since they are not needed. (b) Truncated SVD approximation of rank L.

Since the eigenvectors are unaffected by linear scaling of a matrix, we see that the right singular vectors of X are equal to the eigenvectors of the empirical covariance Σ̂. Furthermore, the eigenvalues of Σ̂ are a scaled version of the squared singular values. This means we can perform PCA using just a few lines of code (see pcaPmtk).
However, the connection between PCA and SVD goes deeper. From Equation 12.46, we canrepresent a rank r matrix as follows:
X = σ1
⎛
⎝|u1
|
⎞
⎠(− vT
1 −)+ · · ·+ σr
⎛
⎝|ur
|
⎞
⎠(− vT
r −)
(12.54)
If the singular values die off quickly, as in Figure 12.10, we can produce a rank L approximation to the matrix as follows:

X ≈ U_{:,1:L} S_{1:L,1:L} V^T_{:,1:L}  (12.55)

This is called a truncated SVD (see Figure 12.8(b)). The total number of parameters needed to represent an N × D matrix using a rank L approximation is

NL + LD + L = L(N + D + 1)  (12.56)
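A short NumPy sketch of the truncated SVD of Equation 12.55 (random data, with rank L = 3 chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(3)
N, D, L = 20, 10, 3
X = rng.standard_normal((N, D))

U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Keep only the L largest singular values/vectors.
X_L = U[:, :L] @ np.diag(s[:L]) @ Vt[:L, :]

print(np.linalg.matrix_rank(X_L))  # 3
# Storage: L(N + D + 1) numbers instead of N * D.
print(L * (N + D + 1), N * D)      # 93 200
```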
It is possible to construct an approximation of rank L < r.
Example: dimensionality reduction
% Read the picture of the faces, and convert to black and white.
faces = rgb2gray(imread('faces.png'));

% Downsample, just to avoid dealing with high-res images.
faces = im2double(imresize(faces, 0.5));

% Compute the SVD of the faces image.
[U, D, V] = svd(faces);

% Plot the magnitude of the singular values (log scale).
sigmas = diag(D);
figure; plot(log10(sigmas)); title('Singular Values (Log10 Scale)');
figure; plot(cumsum(sigmas) / sum(sigmas)); title('Cumulative Percent of Total Sigmas');

% Show the full-rank image.
figure; subplot(4, 2, 1), imshow(faces), title('Full-Rank Faces');

% Compute low-rank approximations of the faces, and show them.
ranks = [100, 50, 30, 20, 10, 3, 2];
for i = 1:length(ranks)
    % Keep the largest singular values, and nullify the others.
    approx_sigmas = sigmas;
    approx_sigmas(ranks(i):end) = 0;

    % Form the singular value matrix, padded as necessary.
    ns = length(sigmas);
    approx_S = D;
    approx_S(1:ns, 1:ns) = diag(approx_sigmas);

    % Compute the low-rank approximation by multiplying out the component matrices.
    approx_faces = U * approx_S * V';

    % Plot the approximation.
    subplot(4, 2, i + 1), imshow(approx_faces), title(sprintf('Rank %d Faces', ranks(i)));
end
Example: dimensionality reduction
12.2. Principal components analysis (PCA) 395
As an example, consider the 200 × 320 pixel image in Figure 12.9 (top left). This has 64,000 numbers in it. We see that a rank 20 approximation, with only (200 + 320 + 1) × 20 = 10,420 numbers, is a very good approximation.
One can show that the error in this approximation is given by

||X − X_L||_F ≈ σ_{L+1}  (12.57)

Furthermore, one can show that the SVD offers the best rank L approximation to a matrix (best in the sense of minimizing the above Frobenius norm).
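Equation 12.57 can be sharpened: the exact Frobenius error of the rank-L truncation is the root sum of squares of the discarded singular values, of which σ_{L+1} is the leading term (Eckart–Young theorem). A NumPy check on random data, for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((15, 8))
L = 4

U, s, Vt = np.linalg.svd(X, full_matrices=False)
X_L = U[:, :L] @ np.diag(s[:L]) @ Vt[:L, :]

err = np.linalg.norm(X - X_L, 'fro')
# Exact identity: ||X - X_L||_F = sqrt(sigma_{L+1}^2 + ... + sigma_r^2)
print(np.isclose(err, np.sqrt(np.sum(s[L:] ** 2))))  # True
# sigma_{L+1} (here s[L], 0-indexed) lower-bounds the error.
print(err >= s[L])  # True
```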
Let us connect this back to PCA. Let X ≈ U S V^T be a truncated SVD of X. We know that W = V, and that Z = X W, so

Z = U S V^T V = U S  (12.58)

Furthermore, the optimal reconstruction is given by X̂ = Z W^T, so we find

X̂ = U S V^T  (12.59)

This is precisely the same as a truncated SVD approximation! This is another illustration of the fact that PCA is the best low rank approximation to the data.
12.2.4 Probabilistic PCA
We are now ready to revisit PPCA. One can show the following remarkable result.
Theorem 12.2.2 (Tipping and Bishop 1999). Consider a factor analysis model in which Ψ = σ²I and W is orthogonal. The observed data log likelihood is given by

log p(X|W, σ²) = −(N/2) ln |C| − (1/2) Σ_{i=1}^N x_i^T C^{-1} x_i = −(N/2) [ln |C| + tr(C^{-1} S)]  (12.60)

where C = W W^T + σ²I and S = (1/N) Σ_{i=1}^N x_i x_i^T = (1/N) X^T X. (We are assuming centered data, for notational simplicity.) The maxima of the log-likelihood are given by
W = V (Λ − σ²I)^{1/2} R  (12.61)

where R is an arbitrary L × L orthogonal matrix, V is the D × L matrix whose columns are the first L eigenvectors of S, and Λ is the corresponding diagonal matrix of eigenvalues. Without loss of generality, we can set R = I. Furthermore, the MLE of the noise variance is given by

σ² = (1/(D − L)) Σ_{j=L+1}^D λ_j  (12.62)
which is the average variance associated with the discarded dimensions.
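A sketch of the MLE formulas (12.61)–(12.62) in NumPy, with R = I and synthetic centered data (purely illustrative; this is not the pcaPmtk code referenced earlier):

```python
import numpy as np

rng = np.random.default_rng(5)
N, D, L = 500, 6, 2
X = rng.standard_normal((N, D))
X = X - X.mean(axis=0)          # centered data, as the theorem assumes

S = (X.T @ X) / N               # empirical covariance
lam, V = np.linalg.eigh(S)      # ascending eigenvalues
lam, V = lam[::-1], V[:, ::-1]  # reorder to descending

# Eq. 12.62: noise variance = average of the discarded eigenvalues.
sigma2 = lam[L:].mean()
# Eq. 12.61 with R = I: W = V (Lambda - sigma^2 I)^{1/2}
W = V[:, :L] @ np.diag(np.sqrt(lam[:L] - sigma2))

print(W.shape)     # (6, 2)
print(sigma2 > 0)  # True
```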
Thus, as σ² → 0, we have W → V, as in classical PCA. What about Z? It is easy to see that the posterior over the latent factors is given by

p(z_i | x_i, θ) = N(z_i | F^{-1} W^T x_i, σ² F^{-1})  (12.63)
F ≜ W^T W + σ² I  (12.64)
Approximation error
A bit of basic linear algebra // Quadratic forms

Important: they appear in cost functions and in the Gaussian distribution.

A is positive definite if x^T A x > 0 for every x ≠ 0.
A is positive semi-definite if x^T A x ≥ 0 for every x.
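These definitions can be checked numerically: a symmetric matrix is positive definite iff all its eigenvalues are strictly positive. A minimal Python sketch (the two test matrices are arbitrary examples):

```python
import numpy as np

def is_positive_definite(A, tol=1e-10):
    """For symmetric A: positive definite iff every eigenvalue is > 0."""
    return bool(np.all(np.linalg.eigvalsh(A) > tol))

A_pd = np.array([[2.0, -1.0],
                 [-1.0, 2.0]])   # eigenvalues 1 and 3: x^T A x > 0
A_psd = np.array([[1.0, 1.0],
                  [1.0, 1.0]])   # eigenvalues 0 and 2: only semi-definite

print(is_positive_definite(A_pd))   # True
print(is_positive_definite(A_psd))  # False
```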