Page 1

Teacher: Gianni A. Di Caro

Lecture 18: Kernel methods 3

Introduction to Machine Learning, 10-315, Fall '19

Disclaimer: These slides can include material from different sources. I'll be happy to explicitly acknowledge a source if required. Contact me for requests.

Page 2

Feature spaces can grow very large and very quickly!

n – number of input features, d – degree of the polynomial

For a polynomial kernel of degree d, the number of terms of degree d (i.e., the number of dimensions in the feature space) is

\[ \binom{d+n-1}{d} \;=\; \frac{(d+n-1)!}{d!\,(n-1)!} \;\sim\; n^d \]

E.g., n = 3, d = 2: $\phi(x_1, x_2, x_3) = (x_1^2,\, x_2^2,\, x_3^2,\, x_1x_2,\, x_2x_1,\, x_1x_3,\, x_3x_1,\, x_2x_3,\, x_3x_2)^T$ → 9 terms of degree 2

E.g., d = 6, n = 100 → roughly 1.6 billion terms → dimensions in the feature space!
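
The gap between the explicit feature map and the kernel can be checked numerically. The sketch below (my own illustration; phi_degree_d and poly_kernel are hypothetical helper names, not from the lecture) builds the ordered degree-d feature map from the example above and verifies that the polynomial kernel $(x^T z)^d$ returns the same inner product in O(n) time, without ever materializing the high-dimensional coordinates.

```python
import numpy as np
from itertools import product

def phi_degree_d(x, d):
    # Explicit feature map: all ordered degree-d products x_{i1} * ... * x_{id}.
    # It has n**d entries, which is exactly what blows up for large n and d.
    return np.array([np.prod(x[list(idx)]) for idx in product(range(len(x)), repeat=d)])

def poly_kernel(x, z, d):
    # Homogeneous polynomial kernel: the same inner product, computed in O(n).
    return float(x @ z) ** d

rng = np.random.default_rng(0)
x, z = rng.normal(size=3), rng.normal(size=3)

print(phi_degree_d(x, 2) @ phi_degree_d(z, 2))  # explicit phi: 3**2 = 9 coordinates
print(poly_kernel(x, z, 2))                     # identical value, no feature vector built
```

For d = 6 and n = 100 the explicit ordered map above would need 100^6 = 10^12 coordinates per point (counting only distinct monomials brings this down to the roughly 1.6 billion quoted above, still far too many), while poly_kernel costs a single 100-dimensional dot product.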

Page 3

Kernel matrix can be very large too!
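
To make "very large" concrete (a back-of-the-envelope calculation of my own, not from the slide): with N training points the kernel (Gram) matrix has N^2 entries, since every pair of points needs a value K(x_i, x_j).

```python
N = 100_000                      # illustrative number of training points
bytes_per_entry = 8              # one float64 per kernel value
gram_gb = N * N * bytes_per_entry / 1e9
print(gram_gb)                   # -> 80.0 GB just to store the kernel matrix
```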

Page 4

Dual soft-margin kernelized training
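
The optimization problem on this slide did not survive the extraction. For reference, the standard dual of the soft-margin SVM with kernel $K$, which is what the title refers to, is

\[
\max_{\alpha}\ \sum_{i=1}^{N} \alpha_i \;-\; \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} \alpha_i \alpha_j\, y_i y_j\, K(x_i, x_j)
\qquad \text{s.t.}\quad 0 \le \alpha_i \le C,\quad \sum_{i=1}^{N} \alpha_i y_i = 0 .
\]

The data enter only through the kernel values $K(x_i, x_j)$, which is what makes the kernel trick possible.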

Page 5

Dual soft-margin kernelized prediction
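
Likewise, the prediction rule lost in extraction is, in its standard form,

\[
\hat{y}(x) \;=\; \operatorname{sign}\!\Big(\sum_{i \in \mathrm{SV}} \alpha_i\, y_i\, K(x_i, x) + b\Big),
\]

where the sum runs only over the support vectors (the points with $\alpha_i > 0$), so prediction also needs only kernel evaluations, never the explicit feature map.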

Page 6

Kernel methods can be slow
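
A rough cost accounting (my own summary with illustrative numbers, not taken from the slide): training must touch all N^2 kernel values and solve a QP whose cost grows between quadratically and cubically in N, while each prediction needs one kernel evaluation per support vector.

```python
N, n, n_sv = 100_000, 100, 5_000          # illustrative: points, input features, support vectors
kernel_evals_train = N * N                # fill (or repeatedly recompute) the Gram matrix
flops_train = kernel_evals_train * n      # ~1e12 multiply-adds before the QP solver even starts
kernel_evals_per_test_point = n_sv        # K(x_i, x) for every support vector at test time
print(flops_train, kernel_evals_per_test_point)
```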

Page 7

Complexity and computational issues

Check the paper:

L. Bottou and C.-J. Lin. Support Vector Machine Solvers. In L. Bottou, O. Chapelle, D. DeCoste, and J. Weston (editors), Large Scale Kernel Machines, pp. 1–28, MIT Press, 2007.

Implementation: LIBSVM

https://www.csie.ntu.edu.tw/~cjlin/libsvm/
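
scikit-learn's SVC wraps LIBSVM, so a quick way to try the kernelized soft-margin SVM is the sketch below (the hyperparameters and toy dataset are illustrative choices, not from the lecture):

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

clf = SVC(kernel='rbf', C=1.0, gamma='scale')   # soft-margin SVM with an RBF kernel (LIBSVM backend)
clf.fit(X, y)

print(clf.n_support_)          # support vectors per class -- typically far fewer than 500
print(clf.dual_coef_.shape)    # the nonzero y_i * alpha_i, one column per support vector
print(clf.predict(X[:5]))      # predictions use only kernel evaluations against the support vectors
```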

Page 8

SVMs vs. Logistic Regression

                   SVMs           Logistic Regression
Loss function      Hinge loss     Log-loss

[Figure: 0-1 loss, hinge loss, and log loss plotted against the margin y·f(x)]

Page 9

SVMs vs. Logistic Regression

SVM: Hinge loss

Logistic Regression: Log loss (negative log conditional likelihood)

[Figure: 0-1 loss, hinge loss, and log loss plotted against the margin y·f(x)]
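
Written out explicitly (standard definitions, added here because the plot did not survive extraction), with $f(x)$ the classifier score and $y \in \{-1, +1\}$:

\[
\ell_{0\text{-}1}\big(y, f(x)\big) = \mathbf{1}\!\left[\, y\, f(x) \le 0 \,\right], \qquad
\ell_{\text{hinge}}\big(y, f(x)\big) = \max\!\big(0,\; 1 - y\, f(x)\big), \qquad
\ell_{\log}\big(y, f(x)\big) = \log\!\big(1 + e^{-y\, f(x)}\big).
\]

Both the hinge loss and the log loss are convex surrogates for the 0-1 loss, which is what makes the two training problems tractable.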

Page 10

SVMs vs. Logistic Regression

                                          SVMs           Logistic Regression
Loss function                             Hinge loss     Log-loss
High-dimensional features with kernels    Yes!           Yes!
Sparse solution                           Often yes!     Almost always no!
Semantics of output                       "Margin"       Real probabilities

Page 11

Kernels in Logistic Regression

o Define the weights in terms of the training feature vectors: $w = \sum_i \alpha_i \phi(x_i)$

o Derive a simple gradient descent rule on the $\alpha_i$ (see the sketch below)
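
The two bullets amount to the following minimal sketch (my own illustration, assuming labels $y_i \in \{-1,+1\}$, an RBF kernel, and plain full-batch gradient descent with an RKHS ridge penalty; none of these choices are specified on the slide):

```python
import numpy as np
from scipy.special import expit   # numerically stable sigmoid

def rbf_kernel(A, B, gamma=1.0):
    # Gram matrix K[i, j] = exp(-gamma * ||a_i - b_j||^2)
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * sq)

def fit_kernel_logreg(K, y, lr=1.0, n_iters=500, lam=1e-3):
    # Weights are w = sum_i alpha_i * phi(x_i), so the score on training
    # point j is f_j = (K @ alpha)_j and we descend directly on alpha.
    n = len(y)
    alpha = np.zeros(n)
    for _ in range(n_iters):
        f = K @ alpha                                   # scores on the training set
        p = expit(-y * f)                               # sigmoid(-y_i * f_i)
        grad = -(K @ (y * p)) / n + lam * (K @ alpha)   # log-loss gradient + ridge term
        alpha -= lr * grad
    return alpha

# Toy usage: an XOR-style problem that linear logistic regression cannot fit
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] * X[:, 1] > 0, 1.0, -1.0)
K = rbf_kernel(X, X)
alpha = fit_kernel_logreg(K, y)
print((np.sign(K @ alpha) == y).mean())   # training accuracy of the kernelized model
```

Unlike the SVM dual, nothing here pushes individual $\alpha_i$ exactly to zero, which is why the comparison table says the kernelized logistic regression solution is almost never sparse.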
