
Source: libao.in/files/Talk_Slides/PRML_Introduction_11.17.pdf




Pattern Recognition and Machine Learning: Introduction

Libao Jin

November 17, 2016


Example: Handwritten Digit Recognition

Training Set: x, to tune the parameters of an adaptive model

Target Vector: t, to express the category of a digit

Note that there is one such target vector t for each digit image x



The Result of Running the Machine Learning Algorithm

y = y(x), which is encoded in the same way as the target vectors

Once the model is trained, it can then determine the identity of new digit images, which are said to comprise a test set.

In practical applications, training data can comprise only a tiny fraction of all possible input vectors, and so generalization is a central goal in pattern recognition.



Polynomial Curve Fitting

Training Set (blue circles): x ≡ (x_1, …, x_N)^T

Target Vector (green line): t ≡ (t_1, …, t_N)^T

y(x, w) = w_0 + w_1 x + w_2 x^2 + … + w_M x^M = ∑_{j=0}^{M} w_j x^j



Sum-of-Squares Error Function

E(w) = (1/2) ∑_{n=1}^{N} {y(x_n, w) − t_n}^2


Minimize Sum-of-Squares Error Function

E(w) = (1/2) ∑_{n=1}^{N} {y(x_n, w) − t_n}^2 = (1/2) ∑_{n=1}^{N} ( ∑_{j=0}^{M} w_j x_n^j − t_n )^2

Setting each partial derivative to zero:

∂E(w)/∂w_j = ∑_{n=1}^{N} ( ∑_{k=0}^{M} w_k x_n^k − t_n ) x_n^j = [x_1^j ⋯ x_N^j] (Xw − t) = 0

where X is the N × (M+1) design matrix with entries X_{nj} = x_n^j, w = (w_0, w_1, …, w_M)^T, and t = (t_1, t_2, …, t_N)^T. Stacking these M + 1 equations gives X^T(Xw − t) = 0, hence

w = (X^T X)^{−1} X^T t
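A minimal numerical sketch of this closed-form solution, assuming NumPy; the data values below are made up for illustration:

```python
import numpy as np

# Closed-form least-squares fit w = (X^T X)^{-1} X^T t, where X is the
# N x (M+1) design matrix with entries X[n, j] = x_n ** j.
def fit_polynomial(x, t, M):
    X = np.vander(x, M + 1, increasing=True)  # columns x^0, x^1, ..., x^M
    # Solve the normal equations rather than forming the inverse explicitly.
    return np.linalg.solve(X.T @ X, X.T @ t)

# Noiseless quadratic data: the fit should recover the true coefficients.
x = np.array([0.0, 0.5, 1.0, 1.5])
t = 1.0 + 2.0 * x + 3.0 * x ** 2
w = fit_polynomial(x, t, M=2)
```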

0th Order Polynomial (figure)

1st Order Polynomial (figure)

3rd Order Polynomial (figure)

9th Order Polynomial (figure)

Over-fitting

Root-Mean-Square (RMS) Error: E_RMS = √(2E(w∗)/N)
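A short sketch of this measure, assuming NumPy; it rescales the sum-of-squares error so it is comparable across data-set sizes:

```python
import numpy as np

# E_RMS = sqrt(2 E(w*) / N), where E is the sum-of-squares error above.
def rms_error(y_pred, t):
    E = 0.5 * np.sum((y_pred - t) ** 2)   # sum-of-squares error E(w*)
    return np.sqrt(2.0 * E / len(t))      # measured on the same scale as t
```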

Polynomial Coefficients (table)

Data Set Size: N = 15, 9th Order Polynomial (figure)

Data Set Size: N = 100, 9th Order Polynomial (figure)

Probability Theory

Marginal Probability: p(X = x_i) = c_i / N

Joint Probability: p(X = x_i, Y = y_j) = n_ij / N

Conditional Probability: p(Y = y_j | X = x_i) = n_ij / c_i


Probability Theory

Sum Rule: p(X = x_i) = c_i / N = (1/N) ∑_{j=1}^{L} n_ij = ∑_{j=1}^{L} p(X = x_i, Y = y_j)

Product Rule: p(X = x_i, Y = y_j) = n_ij / N = (n_ij / c_i) · (c_i / N) = p(Y = y_j | X = x_i) p(X = x_i)
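Both rules can be checked on a small table of joint counts n_ij, assuming NumPy; the counts below are made up for illustration:

```python
import numpy as np

# Made-up joint counts n[i, j]: number of trials with X = x_i and Y = y_j.
n = np.array([[3.0, 1.0],
              [2.0, 4.0]])
N = n.sum()
joint = n / N                                   # p(X = x_i, Y = y_j)
marginal = joint.sum(axis=1)                    # sum rule: p(X = x_i) = c_i / N
conditional = n / n.sum(axis=1, keepdims=True)  # p(Y = y_j | X = x_i) = n_ij / c_i
recovered = conditional * marginal[:, None]     # product rule: rebuilds the joint
```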


The Rules of Probability

Sum Rule: p(X) = ∑_Y p(X, Y)

Product Rule: p(X, Y) = p(Y|X) p(X)


Bayes’ Theorem

By the product rule and the symmetry p(X, Y) = p(Y, X), we have p(Y|X)p(X) = p(X|Y)p(Y), so

p(Y|X) = p(X|Y) p(Y) / p(X)

where, by the sum rule, p(X) = ∑_Y p(X|Y) p(Y)

posterior ∝ likelihood × prior
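A numerical sketch of the theorem on made-up numbers, assuming NumPy: prior p(Y), likelihood p(X = x | Y) for a single observed x, evidence via the sum rule, and the resulting posterior.

```python
import numpy as np

# Made-up numbers over two states of Y.
prior = np.array([0.6, 0.4])                 # p(Y = y_k)
likelihood = np.array([0.2, 0.9])            # p(X = x | Y = y_k) for one observed x
evidence = (likelihood * prior).sum()        # sum rule: p(X = x)
posterior = likelihood * prior / evidence    # Bayes' theorem: p(Y = y_k | X = x)
```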


Probability Density

P(z) = ∫_{−∞}^{z} p(x) dx

p(x) ≥ 0

∫_{−∞}^{∞} p(x) dx = 1
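The two defining properties can be checked numerically, assuming NumPy and taking a standard Gaussian as the example density:

```python
import numpy as np

# Check p(x) >= 0 and that p integrates to (approximately) 1 for a standard
# Gaussian density, using a trapezoidal sum over a wide grid.
x = np.linspace(-8.0, 8.0, 10001)
p = np.exp(-x ** 2 / 2.0) / np.sqrt(2.0 * np.pi)     # N(0, 1) density
total = np.sum((p[1:] + p[:-1]) / 2.0 * np.diff(x))  # trapezoidal integral
```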


Expectations

E[f] = ∑_x p(x) f(x) (discrete)

E[f] = ∫ p(x) f(x) dx (continuous)

E[f|y] = ∑_x p(x|y) f(x) (conditional expectation)

E[f] ≈ (1/N) ∑_{n=1}^{N} f(x_n) (approximate expectation)
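A sketch of the approximate expectation, assuming NumPy: draw x_n from p(x) and average f(x_n). The example takes f(x) = x^2 under a standard Gaussian, whose exact expectation is 1.

```python
import numpy as np

# Monte Carlo estimate of E[f] = E[x^2] under a standard Gaussian.
rng = np.random.default_rng(0)            # fixed seed for reproducibility
samples = rng.standard_normal(100_000)    # x_n ~ N(0, 1)
approx = (samples ** 2).mean()            # (1/N) sum f(x_n), close to 1
```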


Variances and Covariances

var[f] = E[(f(x) − E[f(x)])^2] = E[f(x)^2] − E[f(x)]^2

cov[x, y] = E_{x,y}[{x − E[x]}{y − E[y]}] = E_{x,y}[xy] − E[x]E[y] (scalar x, y)

cov[x, y] = E_{x,y}[{x − E[x]}{y^T − E[y^T]}] = E_{x,y}[x y^T] − E[x]E[y^T] (vector x, y)
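The variance identity can be verified on a small discrete distribution, assuming NumPy; the probabilities and values below are made up:

```python
import numpy as np

# Check var[f] = E[f^2] - E[f]^2 on a made-up three-outcome distribution.
p = np.array([0.2, 0.5, 0.3])                 # p(x) over three outcomes
f = np.array([1.0, 2.0, 4.0])                 # f(x) at those outcomes
Ef = (p * f).sum()                            # E[f]
var_definition = (p * (f - Ef) ** 2).sum()    # E[(f - E[f])^2]
var_identity = (p * f ** 2).sum() - Ef ** 2   # E[f^2] - E[f]^2
```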