Chapter 2: Linear Algebra and Principal Component Analysis (PCA)
Dietrich Klakow, Spoken Language Systems
Saarland University, [email protected]
Neural Networks Implementation and Application
Outline
Dietrich Klakow Neural Networks Implementation and Application Chapter 2 1 / 1
Motivation
- Linear algebra (matrices) will be needed throughout the lecture
- Principal component analysis is
  - a machine learning algorithm solely using linear algebra
  - a simple example of representation learning
Scalars and Vectors
- Scalars: a scalar is a single number
- Vectors: a vector is an array of numbers
- Elements of the array are typically real numbers

\[
x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} \tag{1}
\]
Matrices
- Matrices: a matrix is a two-dimensional array of numbers
- Elements of the array are typically real numbers: $A \in \mathbb{R}^{m \times n}$, $A_{i,j} \in \mathbb{R}$
- For example, a $3 \times 2$ matrix is

\[
A = \begin{pmatrix} A_{1,1} & A_{1,2} \\ A_{2,1} & A_{2,2} \\ A_{3,1} & A_{3,2} \end{pmatrix} \tag{2}
\]
- First index is rows, second one columns
- Transpose of a matrix: mirror the matrix

\[
A^\top = \begin{pmatrix} A_{1,1} & A_{2,1} & A_{3,1} \\ A_{1,2} & A_{2,2} & A_{3,2} \end{pmatrix} \tag{3}
\]

- In general $(A^\top)_{i,j} = A_{j,i}$
Tensors
- Tensors: generalization of matrices to more dimensions
- E.g. a rank-three tensor is a three-dimensional array $A_{i,j,k}$ with $A \in \mathbb{R}^{l \times m \times n}$
Product of Matrices
- Let $A \in \mathbb{R}^{m \times n}$, $B \in \mathbb{R}^{n \times p}$ and $C \in \mathbb{R}^{m \times p}$
- The matrix product is denoted by $C = AB$
- Elements are calculated by:

\[ C_{i,k} = \sum_j A_{i,j} B_{j,k} \tag{4} \]

- The matrix product is distributive:

\[ C(A + B) = CA + CB \tag{5} \]

- The matrix product is associative:

\[ A(BC) = (AB)C \tag{6} \]

- The matrix product is in general not commutative:

\[ AB \neq BA \tag{7} \]
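These product rules are easy to check numerically. The following sketch uses NumPy (an assumption; it is not one of the course toolkits, and the matrices are made-up examples):

```python
import numpy as np

# Small hypothetical matrices to check the product rules numerically.
A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, 1.0], [1.0, 0.0]])
C = np.array([[2.0, 0.0], [0.0, 3.0]])

# Distributivity: C(A + B) == CA + CB
assert np.allclose(C @ (A + B), C @ A + C @ B)

# Associativity: A(BC) == (AB)C
assert np.allclose(A @ (B @ C), (A @ B) @ C)

# In general not commutative: AB != BA for these matrices
assert not np.allclose(A @ B, B @ A)
```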
Product of Matrices
- Transpose of a product:

\[ (AB)^\top = B^\top A^\top \tag{8} \]

- Linear system of equations:

\[ Ax = b \tag{9} \]
Identity and Inverse Matrices
- Identity matrix $I_n$:

\[ (I_n)_{i,i} = 1 \quad\text{and}\quad (I_n)_{i \neq j} = 0 \tag{10} \]

- Example:

\[ I_3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \tag{11} \]

- The inverse matrix is denoted by $A^{-1}$
- It satisfies

\[ A^{-1}A = I_n \tag{12} \]

- The linear system of equations is solved by

\[ x = A^{-1}b \tag{13} \]
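A quick numerical illustration of Eq. (13), again in NumPy (an assumption, with a made-up system):

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])

# x = A^{-1} b via the explicit inverse
x_inv = np.linalg.inv(A) @ b
# In practice np.linalg.solve is preferred (no explicit inverse is formed)
x = np.linalg.solve(A, b)

assert np.allclose(x, x_inv)                          # same solution
assert np.allclose(A @ x, b)                          # it solves Ax = b
assert np.allclose(np.linalg.inv(A) @ A, np.eye(2))   # A^{-1} A = I
```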
Norms
- A norm is a function $f$ that satisfies:
  - $f(x) = 0 \Rightarrow x = 0$
  - $f(x + y) \le f(x) + f(y)$
  - $\forall \alpha \in \mathbb{R}:\ f(\alpha x) = |\alpha| f(x)$
- Example: the $L^p$ norm

\[ \|x\|_p = \Big( \sum_i |x_i|^p \Big)^{\frac{1}{p}} \tag{14} \]

- $p = 2$ is the Euclidean norm, written $\|x\|$ for short
- Its square can also be calculated as $\|x\|^2 = x^\top x$
- The $L^1$ norm simplifies to $\|x\|_1 = \sum_i |x_i|$
- Frobenius norm for matrices: $\|A\|_F = \sqrt{\sum_{i,j} A_{i,j}^2}$
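The norms above can be computed directly; a short NumPy sketch (NumPy is an assumption, the vectors are made-up examples):

```python
import numpy as np

x = np.array([3.0, -4.0])

l2 = np.linalg.norm(x)          # Euclidean (L2) norm
l1 = np.linalg.norm(x, ord=1)   # L1 norm
assert np.isclose(l2, 5.0)                # sqrt(3^2 + (-4)^2)
assert np.isclose(l1, 7.0)                # |3| + |-4|
assert np.isclose(l2 ** 2, x @ x)         # ||x||^2 = x^T x

A = np.array([[1.0, 2.0], [3.0, 4.0]])
fro = np.linalg.norm(A, ord='fro')        # Frobenius norm
assert np.isclose(fro, np.sqrt((A ** 2).sum()))
```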
Special Kinds of Matrices and Vectors
- Symmetric matrix: $A = A^\top$
- Orthogonal matrix: $AA^\top = A^\top A = I$
- Orthogonality implies $A^{-1} = A^\top$
- Unit vector: $\|x\|_2 = 1$
- Orthogonal vectors: $x^\top y = 0$
Eigendecomposition and SVD
- An eigenvector $v$ of a matrix $A$ satisfies $Av = \lambda v$
- $\lambda$ is called an eigenvalue
- The eigendecomposition (for a real symmetric $A$) is $A = V \operatorname{diag}(\lambda)\, V^\top$
- $V$: matrix containing all eigenvectors
- $\operatorname{diag}(\lambda)$: diagonal matrix with all eigenvalues
- Singular Value Decomposition (SVD):

\[ A = UDV^\top \tag{15} \]

- $U$ and $V$ are orthogonal matrices
- $D$ is a diagonal matrix of so-called singular values
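Both decompositions are available in NumPy (an assumption; the symmetric example matrix is made up):

```python
import numpy as np

# Symmetric example matrix, so its eigenvectors are orthogonal.
A = np.array([[2.0, 1.0], [1.0, 2.0]])

lam, V = np.linalg.eigh(A)   # eigenvalues (ascending) and eigenvectors
assert np.allclose(A @ V[:, 0], lam[0] * V[:, 0])    # A v = lambda v
assert np.allclose(V @ np.diag(lam) @ V.T, A)        # A = V diag(lambda) V^T

U, s, Vt = np.linalg.svd(A)
assert np.allclose(U @ np.diag(s) @ Vt, A)           # A = U D V^T
assert np.allclose(U @ U.T, np.eye(2))               # U is orthogonal
```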
Moore-Penrose Pseudoinverse
- Matrix inversion is not defined for matrices that are not square
- Task: solve the linear equation $Ax = y$ approximately, so as to minimize $\|Ax - y\|_2$
- Solution: the Moore-Penrose pseudoinverse

\[ A^+ = \lim_{\alpha \searrow 0} (A^\top A + \alpha I)^{-1} A^\top \tag{16} \]

- Can be calculated using the SVD
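A NumPy sketch of Eq. (16) on a made-up overdetermined system (NumPy and the data are assumptions; `np.linalg.pinv` computes the pseudoinverse via the SVD internally):

```python
import numpy as np

# Overdetermined system: 3 equations, 2 unknowns, so no exact inverse.
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 2.0])

A_plus = np.linalg.pinv(A)     # pseudoinverse, computed via the SVD
x = A_plus @ y                 # least-squares solution of Ax = y

# Agrees with the explicit least-squares solver
x_ls, *_ = np.linalg.lstsq(A, y, rcond=None)
assert np.allclose(x, x_ls)

# For small alpha, (A^T A + alpha I)^{-1} A^T approaches A^+ as in Eq. (16)
alpha = 1e-8
approx = np.linalg.inv(A.T @ A + alpha * np.eye(2)) @ A.T
assert np.allclose(approx, A_plus, atol=1e-6)
```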
Motivation for PCA
- Lossy compression/encoding of data
- Remove less relevant dimensions by projection

Figure: Idea of PCA (source: Wikipedia)
Coding and Decoding
- Collection of $m$ points $\{x^{(1)}, \dots, x^{(m)}\}$ with $x^{(i)} \in \mathbb{R}^n$
- For each point $x^{(i)} \in \mathbb{R}^n$ find a code vector $c^{(i)} \in \mathbb{R}^l$
- $l$ is smaller than $n$
- Define an encoding function $f(\cdot)$: $c^{(i)} = f(x^{(i)})$
- Decoding function $g(\cdot)$: $x^{(i)} \approx g(f(x^{(i)}))$
Linear Coding and Decoding
- PCA is defined by linear coding and decoding
- Define $g(c) = Dc$ with $D \in \mathbb{R}^{n \times l}$
- Constraint: the columns of $D$ are orthonormal
- Objective for finding the code $c$: minimize the $L^2$ norm

\[ c^* = \arg\min_c \|x - g(c)\|_2 \tag{17} \]

- Result:

\[ c^* = D^\top x \tag{18} \]

- Thus:

\[ f(x) = D^\top x \tag{19} \]
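The encode/decode pair of Eqs. (18)-(19) can be checked numerically. In this NumPy sketch (NumPy, the QR construction of an orthonormal D, and the random data are all illustrative assumptions), the residual of the reconstruction is orthogonal to the columns of D, which is exactly why $c^* = D^\top x$ minimizes Eq. (17):

```python
import numpy as np

rng = np.random.default_rng(0)

# An orthonormal decoding matrix D in R^{n x l} (here n=3, l=2),
# obtained from the QR decomposition of a random matrix.
D, _ = np.linalg.qr(rng.normal(size=(3, 2)))
assert np.allclose(D.T @ D, np.eye(2))   # columns are orthonormal

x = rng.normal(size=3)
c = D.T @ x          # encoding: f(x) = D^T x
x_hat = D @ c        # decoding: g(c) = D c

# The residual x - D c* is orthogonal to the column space of D,
# so no other code c can reduce ||x - Dc|| further.
assert np.allclose(D.T @ (x - x_hat), 0.0)
```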
The Optimal Decoding Matrix D
- Introduce the reconstruction operation

\[ r(x) = g(f(x)) = DD^\top x \tag{20} \]

- Minimize the reconstruction error

\[ D^* = \arg\min_D \sum_i \|x^{(i)} - r(x^{(i)})\|_2 \tag{21} \]

\[ \phantom{D^*} = \arg\min_D \sum_i \|x^{(i)} - DD^\top x^{(i)}\|_2 \tag{22} \]

subject to $D^\top D = I_l$
Determining D in Theano
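The original Theano listing for this slide is not preserved in this extraction. As a stand-in, the following NumPy sketch minimizes the reconstruction error of Eq. (22) by projected gradient descent, re-orthonormalizing D after each step; the data, step size, and iteration count are illustrative choices, not from the course:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X = X - X.mean(axis=0)                 # PCA assumes zero-mean data

l = 2
D, _ = np.linalg.qr(rng.normal(size=(5, l)))   # random orthonormal start

def loss(D):
    # sum_i ||x^(i) - D D^T x^(i)||^2, with x^(i) as rows of X
    return ((X - X @ D @ D.T) ** 2).sum()

S = X.T @ X
losses = [loss(D)]
for _ in range(200):
    # Gradient of the objective w.r.t. D (simplified using D^T D = I)
    grad = -2 * (np.eye(5) - D @ D.T) @ S @ D
    D = D - 1e-4 * grad
    D, _ = np.linalg.qr(D)             # project back onto orthonormal columns
    losses.append(loss(D))

assert losses[-1] < losses[0]          # reconstruction error decreased
```

The exact eigenvector solution on the next slide reaches the same optimum without iteration.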
Exact Solution for Decoding Matrix D
- The mean of all training vectors needs to be 0 (remove the mean!)
- Let $X \in \mathbb{R}^{m \times n}$ be the matrix obtained by stacking all training vectors,
- that is, $X_{i,:} = x^{(i)\top}$
- $D$ contains the $l$ eigenvectors corresponding to the largest eigenvalues of $X^\top X$
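The exact solution above can be written in a few lines of NumPy (an assumption; the synthetic data with decreasing variance per dimension is made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data whose dimensions have decreasing variance
X = rng.normal(size=(200, 4)) @ np.diag([3.0, 2.0, 0.5, 0.1])
X = X - X.mean(axis=0)                 # remove the mean first

lam, V = np.linalg.eigh(X.T @ X)       # eigenvalues in ascending order
l = 2
D = V[:, -l:]                          # eigenvectors of the l largest eigenvalues

assert np.allclose(D.T @ D, np.eye(l))   # orthonormal columns

def err(D):
    # total reconstruction error sum_i ||x^(i) - D D^T x^(i)||^2
    return ((X - X @ D @ D.T) ** 2).sum()

# The top-l eigenvectors reconstruct better than, e.g., the bottom-l ones
assert err(V[:, -l:]) <= err(V[:, :l])
```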
Tensorflow: simple Examples
- Tensorflow is a popular software toolkit for neural networks
- Define the network as a computation graph
- Example for a computation graph: tensorboard_add_numbers.py
- Example for using placeholders: placeholder.py
- Example for symbolic calculation of derivatives: simple_gradient.py
- Example for automatic minimization of a function: minimize_function_tensorflow.py
- Useful for the example on the next slide: reduce_sum.py
Example: MNIST
- See code example pca_tensorflow.py
Summary
- Overview of elementary concepts in linear algebra
- Example application: Principal Component Analysis