

Teacher: Gianni A. Di Caro

Lecture 18: Kernel methods 3

Introduction to Machine Learning, 10-315, Fall '19

Disclaimer: These slides may include material from different sources. I'll be happy to explicitly acknowledge a source if required. Contact me for requests.


Feature spaces can grow very large and very quickly!

n – number of input features, d – degree of the polynomial

For a polynomial kernel of degree d, the number of monomial terms of degree d is

    C(n + d − 1, d) = (n + d − 1)! / (d! (n − 1)!) ≈ n^d

E.g., n = 3, d = 2: φ(x₁, x₂, x₃) = (x₁², x₂², x₃², x₁x₂, x₂x₁, x₁x₃, x₃x₁, x₂x₃, x₃x₂)ᵀ → 9 terms of degree 2

E.g., d = 6, n = 100 → ≈ 1.6 billion terms → ≈ 1.6 billion dimensions in the feature space!
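As a quick sanity check of these counts (a minimal sketch, not from the slides; it uses only Python's standard library):

```python
from math import comb

def num_monomials(n: int, d: int) -> int:
    """Number of distinct monomials of degree exactly d in n variables."""
    return comb(n + d - 1, d)

# Large example from the slide: degree-6 polynomial over 100 input features.
print(num_monomials(100, 6))  # 1609344100  ->  ~1.6 billion feature dimensions

# Small example: n = 3, d = 2 gives 6 distinct monomials; the explicit feature
# map above lists 9 terms because it keeps ordered duplicates (x1*x2 and x2*x1).
print(num_monomials(3, 2))    # 6
```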


Kernel matrix can be very large too!
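The figures on this slide are not recoverable from the text version, but the point is easy to restate: the kernel (Gram) matrix has one entry per pair of training points, so its size grows as N² regardless of the feature-space dimension. A minimal sketch assuming NumPy (the kernel choice and sizes are illustrative):

```python
import numpy as np

N, n = 10_000, 100                 # training points, input features
X = np.random.randn(N, n)

# Polynomial kernel of degree 6: K[i, j] = (x_i . x_j + 1) ** 6
K = (X @ X.T + 1.0) ** 6           # N x N Gram matrix

print(K.shape)                     # (10000, 10000)
print(K.nbytes / 1e9, "GB")        # 0.8 GB of float64 entries; 100k points -> 80 GB
```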


Dual soft-margin kernelized training
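The body of this slide did not survive the text extraction. For reference, this is a sketch of the standard dual soft-margin training problem with a kernel K, which is what the slide title refers to:

```latex
\max_{\alpha}\ \sum_{i=1}^{N} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N}
      \alpha_i \alpha_j\, y_i y_j\, K(x_i, x_j)
\qquad \text{s.t.}\quad 0 \le \alpha_i \le C,\ \ \sum_{i=1}^{N} \alpha_i y_i = 0
```

The training data enter only through the N × N kernel matrix K(xᵢ, xⱼ), so the memory concern from the previous slide applies directly to training.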


Dual soft-margin kernelized prediction
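Again, only the title survived extraction. The standard kernelized prediction rule is f(x) = sign(Σᵢ αᵢ yᵢ K(xᵢ, x) + b), with the sum running over the support vectors. A minimal Python sketch, assuming alpha_sv, y_sv, X_sv, and b come from a training step like the one above:

```python
import numpy as np

def poly_kernel(X, Z, degree=6):
    """Polynomial kernel K(x, z) = (x . z + 1) ** degree (illustrative choice)."""
    return (X @ Z.T + 1.0) ** degree

def predict(X_test, X_sv, y_sv, alpha_sv, b, kernel=poly_kernel):
    """sign( sum_i alpha_i * y_i * K(x_i, x) + b ), summed over support vectors only."""
    K = kernel(X_test, X_sv)                  # (n_test, n_sv) kernel evaluations
    return np.sign(K @ (alpha_sv * y_sv) + b)
```

Only points with αᵢ > 0 (the support vectors) contribute, so prediction cost scales with the number of support vectors and the cost of each kernel evaluation, not with the dimension of the feature space.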


Kernel methods can be slow


Complexity and computational issues

Check the paper:

L. Bottou and C.-J. Lin. Support Vector Machine Solvers. In Large Scale Kernel Machines, L. Bottou, O. Chapelle, D. DeCoste, and J. Weston editors, 1-28, MIT Press, 2007.

Implementation: LIBSVM

https://www.csie.ntu.edu.tw/~cjlin/libsvm/
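As a usage example (not from the slides): scikit-learn's SVC is built on LIBSVM, so a kernelized soft-margin SVM can be trained and its support-vector sparsity inspected as follows; the dataset and hyperparameters are illustrative only:

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=2000, n_features=100, random_state=0)

# SVC wraps LIBSVM; soft margin controlled by C, polynomial kernel of degree 6.
clf = SVC(kernel="poly", degree=6, C=1.0).fit(X, y)

print("support vectors:", clf.n_support_.sum(), "of", len(X), "training points")
```

Training the dual QP naively costs roughly between O(N²) and O(N³) kernel evaluations; the Bottou and Lin chapter above surveys the decomposition solvers that LIBSVM uses to keep this tractable.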


SVMs vs. Logistic Regression

               SVMs          Logistic Regression
Loss function  Hinge loss    Log-loss

[Plot comparing the 0-1 loss, hinge loss, and log loss]


SVMs vs. Logistic Regression

SVM: Hinge loss

Logistic Regression: Log loss (negative log conditional likelihood)

[Plot comparing the 0-1 loss, hinge loss, and log loss, as on the previous slide]
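For concreteness (the original slides only had the plot): writing the margin as z = y·f(x), the three curves being compared are commonly defined as below; a minimal sketch assuming NumPy:

```python
import numpy as np

def zero_one_loss(z):
    """1 if the point is misclassified (margin <= 0), else 0."""
    return (z <= 0).astype(float)

def hinge_loss(z):
    """SVM loss: max(0, 1 - z)."""
    return np.maximum(0.0, 1.0 - z)

def log_loss(z):
    """Logistic loss: log(1 + exp(-z)), the negative log conditional likelihood."""
    return np.log1p(np.exp(-z))

z = np.linspace(-2.0, 2.0, 5)
print(zero_one_loss(z), hinge_loss(z), log_loss(z), sep="\n")
```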


SVMs vs. Logistic Regression

                                          SVMs          Logistic Regression
Loss function                             Hinge loss    Log-loss
High-dimensional features with kernels    Yes!          Yes!
Solution sparse                           Often yes!    Almost always no!
Semantics of output                       "Margin"      Real probabilities


Kernels in Logistic Regression

o Define the weights in terms of the training features (see the sketch below)

o Derive a simple gradient descent rule on the coefficients αᵢ
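The formulas on this slide are missing from the text extraction. What "define the weights in terms of the features" standardly means here is w = Σⱼ αⱼ φ(xⱼ), so the model becomes p(y = 1 | x) = σ(Σⱼ αⱼ K(xⱼ, x)) and gradient descent can be run directly on the αⱼ. A minimal sketch (function names and the step size are illustrative, not from the slides):

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def kernel_logreg(K, y, lr=0.01, steps=1000):
    """Gradient descent on alpha for kernelized logistic regression.

    Model: f(x) = sum_j alpha_j K(x_j, x), i.e. w = sum_j alpha_j phi(x_j).
    K is the N x N kernel matrix on the training set, y holds labels in {0, 1}.
    """
    alpha = np.zeros(K.shape[0])
    for _ in range(steps):
        p = sigmoid(K @ alpha)      # predicted probabilities on the training set
        grad = K @ (p - y)          # gradient of the negative log-likelihood in alpha
        alpha -= lr * grad
    return alpha
```

Nothing in this update drives most αⱼ to exactly zero, which is the "Solution sparse: Almost always no!" entry for logistic regression in the comparison table above.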


