Chapter 2: Linear Algebra and Principal Component Analysis (PCA)
Dietrich Klakow, Spoken Language Systems
Saarland University, [email protected]
Neural Networks Implementation and Application
Outline
Dietrich Klakow Neural Networks Implementation and Application Chapter 2 1 / 1
Motivation
- Linear algebra (matrices) will be needed throughout the lecture
- Principal component analysis is
  - a machine learning algorithm solely using linear algebra
  - a simple example of representation learning
Scalars and Vectors
- Scalars: a scalar is a single number
- Vectors: a vector is an array of numbers
- Elements of the array are typically real numbers

\[
x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} \tag{1}
\]
Matrices
- Matrices: a matrix is a two-dimensional array of numbers
- Elements of the array are typically real numbers: $A \in \mathbb{R}^{m \times n}$, $A_{i,j} \in \mathbb{R}$
- For example, a $3 \times 2$ matrix is

\[
A = \begin{pmatrix} A_{1,1} & A_{1,2} \\ A_{2,1} & A_{2,2} \\ A_{3,1} & A_{3,2} \end{pmatrix} \tag{2}
\]
- First index is rows, second one columns
- Transpose of a matrix: mirror the matrix

\[
A^\top = \begin{pmatrix} A_{1,1} & A_{2,1} & A_{3,1} \\ A_{1,2} & A_{2,2} & A_{3,2} \end{pmatrix} \tag{3}
\]

- In general $(A^\top)_{i,j} = A_{j,i}$
Tensors
- Tensors: generalization of matrices to more dimensions
- E.g. a rank-three tensor is a three-dimensional array $A_{i,j,k}$ with $A \in \mathbb{R}^{l \times m \times n}$
Product of Matrices
- Let $A \in \mathbb{R}^{m \times n}$, $B \in \mathbb{R}^{n \times p}$ and $C \in \mathbb{R}^{m \times p}$
- The matrix product is denoted by $C = AB$
- Elements are calculated by:

\[ C_{i,k} = \sum_j A_{i,j} B_{j,k} \tag{4} \]

- The matrix product is distributive:

\[ C(A + B) = CA + CB \tag{5} \]

- The matrix product is associative:

\[ A(BC) = (AB)C \tag{6} \]

- The matrix product is in general not commutative:

\[ AB \neq BA \tag{7} \]
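These product rules are easy to check numerically. The following sketch uses NumPy (an assumption; it is not one of the course toolkits, and the matrices are made-up examples):

```python
import numpy as np

# Small hypothetical matrices to check the product rules numerically.
A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, 1.0], [1.0, 0.0]])
C = np.array([[2.0, 0.0], [0.0, 3.0]])

# Distributivity: C(A + B) == CA + CB
assert np.allclose(C @ (A + B), C @ A + C @ B)

# Associativity: A(BC) == (AB)C
assert np.allclose(A @ (B @ C), (A @ B) @ C)

# In general not commutative: AB != BA for these matrices
assert not np.allclose(A @ B, B @ A)
```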
Product of Matrices
- Transpose of a product:

\[ (AB)^\top = B^\top A^\top \tag{8} \]

- Linear system of equations:

\[ Ax = b \tag{9} \]
Identity and Inverse Matrices
- Identity matrix $I_n$:

\[ (I_n)_{i,i} = 1 \quad\text{and}\quad (I_n)_{i \neq j} = 0 \tag{10} \]

- Example:

\[ I_3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \tag{11} \]

- The inverse matrix is denoted by $A^{-1}$
- It satisfies

\[ A^{-1}A = I_n \tag{12} \]

- The linear system of equations is solved by

\[ x = A^{-1}b \tag{13} \]
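A quick numerical illustration of Eq. (13), again in NumPy (an assumption, with a made-up system):

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])

# x = A^{-1} b via the explicit inverse
x_inv = np.linalg.inv(A) @ b
# In practice np.linalg.solve is preferred (no explicit inverse is formed)
x = np.linalg.solve(A, b)

assert np.allclose(x, x_inv)                          # same solution
assert np.allclose(A @ x, b)                          # it solves Ax = b
assert np.allclose(np.linalg.inv(A) @ A, np.eye(2))   # A^{-1} A = I
```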
Norms
- A norm is a function $f$ that satisfies:
  - $f(x) = 0 \Rightarrow x = 0$
  - $f(x + y) \le f(x) + f(y)$
  - $\forall \alpha \in \mathbb{R}:\ f(\alpha x) = |\alpha| f(x)$
- Example: the $L^p$ norm

\[ \|x\|_p = \Big( \sum_i |x_i|^p \Big)^{\frac{1}{p}} \tag{14} \]

- $p = 2$ is the Euclidean norm, written $\|x\|$ for short
- Its square can also be calculated as $\|x\|^2 = x^\top x$
- The $L^1$ norm simplifies to $\|x\|_1 = \sum_i |x_i|$
- Frobenius norm for matrices: $\|A\|_F = \sqrt{\sum_{i,j} A_{i,j}^2}$
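The norms above can be computed directly; a short NumPy sketch (NumPy is an assumption, the vectors are made-up examples):

```python
import numpy as np

x = np.array([3.0, -4.0])

l2 = np.linalg.norm(x)          # Euclidean (L2) norm
l1 = np.linalg.norm(x, ord=1)   # L1 norm
assert np.isclose(l2, 5.0)                # sqrt(3^2 + (-4)^2)
assert np.isclose(l1, 7.0)                # |3| + |-4|
assert np.isclose(l2 ** 2, x @ x)         # ||x||^2 = x^T x

A = np.array([[1.0, 2.0], [3.0, 4.0]])
fro = np.linalg.norm(A, ord='fro')        # Frobenius norm
assert np.isclose(fro, np.sqrt((A ** 2).sum()))
```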
Special Kinds of Matrices and Vectors
- Symmetric matrix: $A = A^\top$
- Orthogonal matrix: $AA^\top = A^\top A = I$
- Orthogonality implies $A^{-1} = A^\top$
- Unit vector: $\|x\|_2 = 1$
- Orthogonal vectors: $x^\top y = 0$
Eigendecomposition and SVD
- An eigenvector $v$ of a matrix $A$ satisfies $Av = \lambda v$
- $\lambda$ is called an eigenvalue
- The eigendecomposition (for a real symmetric $A$) is $A = V \operatorname{diag}(\lambda)\, V^\top$
- $V$: matrix containing all eigenvectors
- $\operatorname{diag}(\lambda)$: diagonal matrix with all eigenvalues
- Singular Value Decomposition (SVD):

\[ A = UDV^\top \tag{15} \]

- $U$ and $V$ are orthogonal matrices
- $D$ is a diagonal matrix of so-called singular values
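Both decompositions are available in NumPy (an assumption; the symmetric example matrix is made up):

```python
import numpy as np

# Symmetric example matrix, so its eigenvectors are orthogonal.
A = np.array([[2.0, 1.0], [1.0, 2.0]])

lam, V = np.linalg.eigh(A)   # eigenvalues (ascending) and eigenvectors
assert np.allclose(A @ V[:, 0], lam[0] * V[:, 0])    # A v = lambda v
assert np.allclose(V @ np.diag(lam) @ V.T, A)        # A = V diag(lambda) V^T

U, s, Vt = np.linalg.svd(A)
assert np.allclose(U @ np.diag(s) @ Vt, A)           # A = U D V^T
assert np.allclose(U @ U.T, np.eye(2))               # U is orthogonal
```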
Moore-Penrose Pseudoinverse
- Matrix inversion is not defined for matrices that are not square
- Task: solve the linear equation $Ax = y$ approximately, so as to minimize $\|Ax - y\|_2$
- Solution: the Moore-Penrose pseudoinverse

\[ A^+ = \lim_{\alpha \searrow 0} (A^\top A + \alpha I)^{-1} A^\top \tag{16} \]

- Can be calculated using the SVD
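A NumPy sketch of Eq. (16) on a made-up overdetermined system (NumPy and the data are assumptions; `np.linalg.pinv` computes the pseudoinverse via the SVD internally):

```python
import numpy as np

# Overdetermined system: 3 equations, 2 unknowns, so no exact inverse.
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 2.0])

A_plus = np.linalg.pinv(A)     # pseudoinverse, computed via the SVD
x = A_plus @ y                 # least-squares solution of Ax = y

# Agrees with the explicit least-squares solver
x_ls, *_ = np.linalg.lstsq(A, y, rcond=None)
assert np.allclose(x, x_ls)

# For small alpha, (A^T A + alpha I)^{-1} A^T approaches A^+ as in Eq. (16)
alpha = 1e-8
approx = np.linalg.inv(A.T @ A + alpha * np.eye(2)) @ A.T
assert np.allclose(approx, A_plus, atol=1e-6)
```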
Motivation for PCA
- Lossy compression/encoding of data
- Remove less relevant dimensions by projection

Figure: Idea of PCA (source: Wikipedia)
Coding and Decoding
- Collection of $m$ points $\{x^{(1)}, \dots, x^{(m)}\}$ with $x^{(i)} \in \mathbb{R}^n$
- For each point $x^{(i)} \in \mathbb{R}^n$ find a code vector $c^{(i)} \in \mathbb{R}^l$
- $l$ is smaller than $n$
- Define an encoding function $f(\cdot)$: $c^{(i)} = f(x^{(i)})$
- Decoding function $g(\cdot)$: $x^{(i)} \approx g(f(x^{(i)}))$
Linear Coding and Decoding
- PCA is defined by linear coding and decoding
- Define $g(c) = Dc$ with $D \in \mathbb{R}^{n \times l}$
- Constraint: the columns of $D$ are orthonormal
- Objective for finding the code $c$: minimize the $L^2$ norm

\[ c^* = \arg\min_c \|x - g(c)\|_2 \tag{17} \]

- Result:

\[ c^* = D^\top x \tag{18} \]

- Thus:

\[ f(x) = D^\top x \tag{19} \]
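The encode/decode pair of Eqs. (18)-(19) can be checked numerically. In this NumPy sketch (NumPy, the QR construction of an orthonormal D, and the random data are all illustrative assumptions), the residual of the reconstruction is orthogonal to the columns of D, which is exactly why $c^* = D^\top x$ minimizes Eq. (17):

```python
import numpy as np

rng = np.random.default_rng(0)

# An orthonormal decoding matrix D in R^{n x l} (here n=3, l=2),
# obtained from the QR decomposition of a random matrix.
D, _ = np.linalg.qr(rng.normal(size=(3, 2)))
assert np.allclose(D.T @ D, np.eye(2))   # columns are orthonormal

x = rng.normal(size=3)
c = D.T @ x          # encoding: f(x) = D^T x
x_hat = D @ c        # decoding: g(c) = D c

# The residual x - D c* is orthogonal to the column space of D,
# so no other code c can reduce ||x - Dc|| further.
assert np.allclose(D.T @ (x - x_hat), 0.0)
```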
The Optimal Decoding Matrix D
- Introduce the reconstruction operation

\[ r(x) = g(f(x)) = DD^\top x \tag{20} \]

- Minimize the reconstruction error

\[ D^* = \arg\min_D \sum_i \|x^{(i)} - r(x^{(i)})\|_2 \tag{21} \]

\[ \phantom{D^*} = \arg\min_D \sum_i \|x^{(i)} - DD^\top x^{(i)}\|_2 \tag{22} \]

subject to $D^\top D = I_l$
Determining D in Theano
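The original Theano listing for this slide is not preserved in this extraction. As a stand-in, the following NumPy sketch minimizes the reconstruction error of Eq. (22) by projected gradient descent, re-orthonormalizing D after each step; the data, step size, and iteration count are illustrative choices, not from the course:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X = X - X.mean(axis=0)                 # PCA assumes zero-mean data

l = 2
D, _ = np.linalg.qr(rng.normal(size=(5, l)))   # random orthonormal start

def loss(D):
    # sum_i ||x^(i) - D D^T x^(i)||^2, with x^(i) as rows of X
    return ((X - X @ D @ D.T) ** 2).sum()

S = X.T @ X
losses = [loss(D)]
for _ in range(200):
    # Gradient of the objective w.r.t. D (simplified using D^T D = I)
    grad = -2 * (np.eye(5) - D @ D.T) @ S @ D
    D = D - 1e-4 * grad
    D, _ = np.linalg.qr(D)             # project back onto orthonormal columns
    losses.append(loss(D))

assert losses[-1] < losses[0]          # reconstruction error decreased
```

The exact eigenvector solution on the next slide reaches the same optimum without iteration.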
Exact Solution for Decoding Matrix D
- The mean of all training vectors needs to be 0 (remove the mean!)
- Let $X \in \mathbb{R}^{m \times n}$ be the matrix obtained by stacking all training vectors,
- that is, $X_{i,:} = x^{(i)\top}$
- $D$ contains the $l$ eigenvectors corresponding to the largest eigenvalues of $X^\top X$
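The exact solution above can be written in a few lines of NumPy (an assumption; the synthetic data with decreasing variance per dimension is made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data whose dimensions have decreasing variance
X = rng.normal(size=(200, 4)) @ np.diag([3.0, 2.0, 0.5, 0.1])
X = X - X.mean(axis=0)                 # remove the mean first

lam, V = np.linalg.eigh(X.T @ X)       # eigenvalues in ascending order
l = 2
D = V[:, -l:]                          # eigenvectors of the l largest eigenvalues

assert np.allclose(D.T @ D, np.eye(l))   # orthonormal columns

def err(D):
    # total reconstruction error sum_i ||x^(i) - D D^T x^(i)||^2
    return ((X - X @ D @ D.T) ** 2).sum()

# The top-l eigenvectors reconstruct better than, e.g., the bottom-l ones
assert err(V[:, -l:]) <= err(V[:, :l])
```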
Tensorflow: simple Examples
- Tensorflow is a popular software toolkit for neural networks
- Define the network as a computation graph
- Example for a computation graph: tensorboard_add_numbers.py
- Example for using placeholders: placeholder.py
- Example for symbolic calculation of derivatives: simple_gradient.py
- Example for automatic minimization of a function: minimize_function_tensorflow.py
- Useful for the example on the next slide: reduce_sum.py
Example: MNIST
- See code example pca_tensorflow.py
Summary
- Overview of elementary concepts in linear algebra
- Example application: Principal Component Analysis