Teacher: Gianni A. Di Caro
Lecture 18: Kernel methods 3
Introduction to Machine Learning, 10-315, Fall '19
Disclaimer: These slides can include material from different sources. I'll be happy to explicitly acknowledge a source if required. Contact me for requests.
Feature spaces can grow very large and very quickly!

n – number of input features, d – degree of the polynomial

For a polynomial kernel of degree d, the number of terms (feature-space dimensions) of degree d is

\[
\binom{d+n-1}{d} = \frac{(d+n-1)!}{d!\,(n-1)!} \;\approx\; n^d
\]

E.g., d = 6, n = 100 → 1.6 billion terms, i.e., dimensions in the feature space!

E.g., n = 3, d = 2:
\[
\phi(x_1, x_2, x_3) = \big(x_1^2,\; x_2^2,\; x_3^2,\; x_1 x_2,\; x_2 x_1,\; x_1 x_3,\; x_3 x_1,\; x_2 x_3,\; x_3 x_2\big)^T
\]
→ 9 terms of degree 2
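A minimal sketch (NumPy; the vectors and degree are illustrative) of why this blow-up is tolerable in practice: the polynomial kernel (x·z)^d equals the inner product of exactly these ordered degree-d feature maps, but costs O(n) instead of O(n^d):

import numpy as np
from itertools import product

def poly_kernel(x, z, d):
    # K(x, z) = (x . z)^d -- O(n) work, never builds the feature space
    return (x @ z) ** d

def explicit_features(x, d):
    # All n^d ordered degree-d monomials, e.g. (x1x1, x1x2, ..., x3x3) for d = 2
    return np.array([np.prod(t) for t in product(x, repeat=d)])

x = np.array([1.0, 2.0, 3.0])
z = np.array([0.5, -1.0, 2.0])
print(poly_kernel(x, z, 2))                               # 20.25
print(explicit_features(x, 2) @ explicit_features(z, 2))  # 20.25 -- same value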
Kernel matrix can be very large too!

With m training examples, the kernel (Gram) matrix K, with entries K_ij = K(x_i, x_j), is m × m: just storing it costs O(m²) memory.
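A quick numerical illustration (NumPy, RBF kernel; the sizes are made up) of this quadratic memory cost:

import numpy as np

def rbf_gram(X, gamma=1.0):
    # Pairwise squared distances via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * d2)

X = np.random.randn(500, 100)         # m = 500 examples, n = 100 features
K = rbf_gram(X)                       # 500 x 500 Gram matrix
print(K.shape, K.nbytes / 1e6, "MB")  # (500, 500) 2.0 MB

# The quadratic blow-up in m for a float64 Gram matrix:
for m in (1e4, 1e5, 1e6):
    print(f"m = {m:.0e}: {m * m * 8 / 1e9:,.0f} GB")  # ~1, 80, 8000 GB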
Dual soft-margin kernelized training
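With m training pairs (x_i, y_i), y_i ∈ {−1, +1}, training solves the dual soft-margin problem with the kernel in place of the inner product:

\[
\max_{\alpha}\; \sum_{i=1}^{m} \alpha_i \;-\; \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j\, y_i y_j\, K(x_i, x_j)
\qquad \text{s.t.} \quad 0 \le \alpha_i \le C, \quad \sum_{i=1}^{m} \alpha_i y_i = 0
\]

Only kernel values K(x_i, x_j) appear, so the feature map φ is never computed explicitly; the price is the m × m kernel matrix above.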
Dual soft-margin kernelized prediction
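The learned classifier depends only on the training points with nonzero multipliers (the support vectors):

\[
\hat{y}(x) = \operatorname{sign}\!\Big( \sum_{i:\; \alpha_i > 0} \alpha_i\, y_i\, K(x_i, x) + b \Big)
\]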
Kernel methods can be slow

Training works with the m × m kernel matrix, and typical solvers scale somewhere between O(m²) and O(m³) in the number of training examples; prediction requires a kernel evaluation against every support vector.
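A back-of-the-envelope sketch (all numbers illustrative): each prediction needs one kernel evaluation per support vector, each O(n) for RBF or polynomial kernels.

# Rough per-prediction cost of a kernelized classifier
n_sv, n = 5_000, 100             # illustrative: 5k support vectors, 100 features
flops_per_kernel = 2 * n         # ~one multiply-add per feature
print(f"~{n_sv * flops_per_kernel:,} flops per prediction")  # ~1,000,000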
Complexity and computational issues

Check the paper:

L. Bottou and C.-J. Lin, "Support Vector Machine Solvers," in Large Scale Kernel Machines, L. Bottou, O. Chapelle, D. DeCoste, and J. Weston (eds.), pp. 1-28, MIT Press, 2007.

Implementation: LIBSVM
https://www.csie.ntu.edu.tw/~cjlin/libsvm/
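A minimal usage sketch via scikit-learn's SVC, which is built on LIBSVM (the toy data and hyperparameters below are illustrative):

import numpy as np
from sklearn.svm import SVC

# Toy data: two Gaussian blobs
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, (100, 2)), rng.normal(+1, 1, (100, 2))])
y = np.array([-1] * 100 + [+1] * 100)

# Soft-margin SVM with an RBF kernel; C and gamma are the usual knobs
clf = SVC(kernel="rbf", C=1.0, gamma=0.5).fit(X, y)
print("support vectors per class:", clf.n_support_)
print("training accuracy:", clf.score(X, y))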
SVMs vs. Logistic Regression

                 SVMs          Logistic Regression
Loss function    Hinge loss    Log-loss

[Plot: 0-1 loss, hinge loss, and log loss as functions of the margin y·f(x)]
SVMs vs. Logistic Regression

SVM: hinge loss, \( \ell(y, f(x)) = \max(0,\; 1 - y\, f(x)) \)
Logistic Regression: log loss (negative log conditional likelihood), \( \ell(y, f(x)) = \log\big(1 + e^{-y\, f(x)}\big) \)

[Plot: 0-1 loss, hinge loss, and log loss as functions of the margin y·f(x)]
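A small sketch (NumPy) evaluating the three losses at a few margins m = y·f(x): hinge loss is exactly zero past margin 1, while log loss decays but never reaches zero.

import numpy as np

def zero_one(m): return (m <= 0).astype(float)    # 0-1 loss
def hinge(m):    return np.maximum(0.0, 1.0 - m)  # SVM
def logloss(m):  return np.log(1.0 + np.exp(-m))  # logistic regression

margins = np.array([-2.0, 0.0, 0.5, 1.0, 2.0])
for name, fn in [("0-1", zero_one), ("hinge", hinge), ("log", logloss)]:
    print(f"{name:>5}: {np.round(fn(margins), 3)}")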
SVMs vs. Logistic Regression

                                          SVMs          Logistic Regression
Loss function                             Hinge loss    Log-loss
High-dimensional features with kernels    Yes!          Yes!
Solution sparse                           Often yes!    Almost always no!
Semantics of output                       "Margin"      Real probabilities
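The last two rows can be seen directly in code; a small illustration (scikit-learn, toy Gaussian data; everything below is illustrative):

import numpy as np
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1, 1, (200, 2)), rng.normal(+1, 1, (200, 2))])
y = np.array([-1] * 200 + [+1] * 200)

svm = SVC(kernel="linear", C=1.0).fit(X, y)
print("SVM solution uses", svm.n_support_.sum(), "of", len(X), "points")  # sparse

lr = LogisticRegression().fit(X, y)
print("LR  P(y=+1 | x):", lr.predict_proba(X[:1])[0, 1])    # a real probability
print("SVM margin:     ", svm.decision_function(X[:1])[0])  # just a signed distance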
Kernels in Logistic Regression

o Define the weights in terms of the training features: \( w = \sum_i \alpha_i\, \phi(x_i) \), so that \( w^\top \phi(x) = \sum_i \alpha_i\, K(x_i, x) \)
o Derive a simple gradient descent rule on the \( \alpha_i \)
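A minimal sketch of this recipe (RBF kernel, labels y ∈ {−1, +1}; the function names and hyperparameters are illustrative, and regularization is omitted):

import numpy as np

def rbf(A, B, gamma=1.0):
    # Matrix of RBF kernel values between rows of A and rows of B
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def fit_klr(X, y, gamma=1.0, lr=0.1, steps=500):
    # Gradient descent on alpha for f(x) = sum_j alpha_j K(x_j, x)
    K = rbf(X, X, gamma)                       # m x m kernel matrix
    alpha = np.zeros(len(X))
    for _ in range(steps):
        margins = y * (K @ alpha)              # y_i * f(x_i)
        grad = -K @ (y * sigmoid(-margins))    # gradient of sum_i log(1 + e^{-margin_i})
        alpha -= lr * grad / len(X)
    return alpha

def predict_proba(alpha, X_train, X_new, gamma=1.0):
    # Unlike the SVM margin, this output is a real probability P(y = +1 | x)
    return sigmoid(rbf(X_new, X_train, gamma) @ alpha)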