So far…
Supervised machine learning
Linear models
Least squares regression, SVR
Fisher’s discriminant, Perceptron, Logistic model, SVM
Non-linear models
Neural networks, Decision trees, Association rules
Unsupervised machine learning
Clustering/EM, PCA
Generic scaffolding
Probabilistic modeling, ML/MAP estimation
Performance evaluation, Statistical learning theory
Linear algebra, Optimization methods
Coming up next…
Supervised machine learning
Linear models
Least squares regression, SVR
Fisher’s discriminant, Perceptron, Logistic model, SVM
Non-linear models
Neural networks, Decision trees, Association rules
Kernel-XXX
Unsupervised machine learning
Clustering/EM, PCA, Kernel-XXX
Generic scaffolding
Probabilistic modeling, ML/MAP estimation
Performance evaluation, Statistical learning theory
Linear algebra, Optimization methods
Kernels
Too much linear
Logistic regression, Perceptron, Max. margin, Fisher’s discriminant, Linear regression, Ridge regression, LASSO, …:
$f(\boldsymbol{x}) = \boldsymbol{w}^T\boldsymbol{x} + b$
PCA, LDA, ICA, …:
$f(\boldsymbol{x}) = \boldsymbol{A}\boldsymbol{x}$
K-means:
$\boldsymbol{c}_i = \frac{1}{m}\boldsymbol{X}_i\boldsymbol{1}$
CCA, GLM, …
Linear is not enough
Limited generalization ability
Linear is not enough
Limited applicability
Text?
Ordinal/Nominal data?
Graphs/Trees/Networks?
Shapes?
Graph nodes?
Solutions
Feature space
Nonlinear feature spaces
Kernels
The Kernel Trick
Dual representation
Important idea #1
Important idea #2
Important idea #3
$f(x) = wx$
Nonlinear feature space
$x \to x' := \phi(x) := (x, x^2, x^3)$
$f(x') = w_1 x + w_2 x^2 + w_3 x^3$
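To make the idea concrete, here is a minimal NumPy sketch (not from the slides): map a scalar input through $\phi(x) = (x, x^2, x^3)$ and fit an ordinary linear model on the mapped features. The toy data and function names are illustrative choices.

```python
import numpy as np

def phi(x):
    """Explicit nonlinear feature map: x -> (x, x^2, x^3)."""
    return np.stack([x, x**2, x**3], axis=1)

rng = np.random.default_rng(0)
x = np.linspace(-2, 2, 50)
y = x**3 - x + 0.1 * rng.standard_normal(50)      # nonlinear target

# Linear model in the feature space: f(x') = w1*x + w2*x^2 + w3*x^3 + b
Phi = np.column_stack([phi(x), np.ones_like(x)])  # append bias column
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print("weights (w1, w2, w3, b):", np.round(w, 2))
```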
$x \to \phi(x) = (x, x^3 - x)$
Nonlinear feature space
$f(\boldsymbol{x}) = \boldsymbol{w}^T \phi(\boldsymbol{x})$
+ Support for arbitrary data types
$\phi(\text{text})$ = word counts
$\phi(\text{graph})$ = node degrees
$\phi(\text{tree})$ = path lengths
…
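As a toy illustration of a feature map over non-vector data, the sketch below builds $\phi(\text{text})$ = word counts over a small fixed vocabulary; the vocabulary and documents are made up, and a real system would use a richer representation.

```python
import numpy as np

def phi_text(doc, vocab):
    """phi(text) = word counts over a fixed vocabulary."""
    words = doc.lower().split()
    return np.array([words.count(v) for v in vocab], dtype=float)

vocab = ["kernel", "linear", "feature", "trick"]
docs = ["the kernel trick", "linear models use linear features", "feature space"]
X = np.stack([phi_text(d, vocab) for d in docs])
print(X)  # each row is a fixed-length vector that a linear model can consume
```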
What if the dimensionality is high?
$(x_1, x_2, \ldots, x_m) \to (x_1 x_1, x_1 x_2, \ldots, x_m x_m)$: $O(m^2)$ elements
For all $k$-wise products: $O(m^k)$
The Kernel Trick
Let $\phi(\boldsymbol{x}) = (x_1 x_1, x_1 x_2, \ldots, x_m x_m)$. Consider
$\langle \phi(\boldsymbol{x}), \phi(\boldsymbol{y}) \rangle = \sum_{ij} \phi(\boldsymbol{x})_{ij}\, \phi(\boldsymbol{y})_{ij} = \sum_{ij} x_i x_j y_i y_j = \sum_{ij} x_i y_i x_j y_j = \Big(\sum_i x_i y_i\Big) \Big(\sum_j x_j y_j\Big) = \Big(\sum_i x_i y_i\Big)^2 = \langle \boldsymbol{x}, \boldsymbol{y} \rangle^2$
Polynomial kernel: $K(\boldsymbol{x}, \boldsymbol{y}) = (\langle \boldsymbol{x}, \boldsymbol{y} \rangle + R)^d$
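A quick numerical check of the identity above, assuming $\phi(\boldsymbol{x})$ is the vector of all pairwise products $x_i x_j$; the helper names are mine.

```python
import numpy as np

def phi(x):
    """All pairwise products (x_i * x_j), flattened: m^2 components."""
    return np.outer(x, x).ravel()

rng = np.random.default_rng(1)
x, y = rng.standard_normal(5), rng.standard_normal(5)

lhs = phi(x) @ phi(y)        # explicit feature space: O(m^2) work
rhs = (x @ y) ** 2           # kernel trick: O(m) work
print(np.isclose(lhs, rhs))  # True
```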
The Kernel Trick
What about $K(\boldsymbol{x}, \boldsymbol{y}) = \langle \boldsymbol{x}, \boldsymbol{y} \rangle + 0.5\, \langle \boldsymbol{x}, \boldsymbol{y} \rangle^2$?
$K(\boldsymbol{x}, \boldsymbol{y}) = \sum_i x_i y_i + 0.5 \sum_{ij} \phi_{ij}(\boldsymbol{x})\, \phi_{ij}(\boldsymbol{y})$
$= \big\langle (x_1, \ldots, x_m, \sqrt{0.5}\, x_1 x_1, \ldots, \sqrt{0.5}\, x_m x_m),\; (y_1, \ldots, y_m, \sqrt{0.5}\, y_1 y_1, \ldots, \sqrt{0.5}\, y_m y_m) \big\rangle$
The Kernel Trick
What about $K(\boldsymbol{x}, \boldsymbol{y}) = 1 + \langle \boldsymbol{x}, \boldsymbol{y} \rangle + \frac{1}{2}\langle \boldsymbol{x}, \boldsymbol{y} \rangle^2 + \frac{1}{6}\langle \boldsymbol{x}, \boldsymbol{y} \rangle^3 + \frac{1}{24}\langle \boldsymbol{x}, \boldsymbol{y} \rangle^4$?
What about $K(\boldsymbol{x}, \boldsymbol{y}) = \sum_{i=0}^{\infty} \frac{\langle \boldsymbol{x}, \boldsymbol{y} \rangle^i}{i!} = \exp\langle \boldsymbol{x}, \boldsymbol{y} \rangle$?
Infinite-dimensional feature space!
Gaussian kernel: $K(\boldsymbol{x}, \boldsymbol{y}) = \exp(-\gamma \|\boldsymbol{x} - \boldsymbol{y}\|^2) = \exp\Big(-\frac{\|\boldsymbol{x} - \boldsymbol{y}\|^2}{2\sigma^2}\Big)$
Exponential kernel: $K(\boldsymbol{x}, \boldsymbol{y}) = \exp\Big(-\frac{\|\boldsymbol{x} - \boldsymbol{y}\|}{2\sigma^2}\Big)$
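A small sketch of computing a Gaussian kernel matrix in NumPy, using $\gamma = 1/(2\sigma^2)$ as above; the function name and parameter values are illustrative.

```python
import numpy as np

def gaussian_kernel_matrix(X, sigma=1.0):
    """K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)), i.e. gamma = 1/(2 sigma^2)."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2 * sigma**2))

rng = np.random.default_rng(2)
X = rng.standard_normal((6, 3))
K = gaussian_kernel_matrix(X, sigma=0.8)
print(K.shape, np.allclose(np.diag(K), 1.0))  # (6, 6) True
```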
Kernels
http://crsouza.blogspot.com/2010/03/kernel-functions-for-machine-learning.html
Structured data kernels
String kernels
P-spectrum kernels
All-subsequences kernels
Gap-weighted subsequences kernels
…
Graph & tree kernels
Co-rooted subtrees
All subtrees
Random walks
…
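As one concrete instance of the string kernels listed above, here is a minimal sketch of a p-spectrum kernel: it counts how many length-p substrings two strings share, with multiplicity. The function name and the choice p = 2 are my own.

```python
from collections import Counter

def p_spectrum_kernel(s, t, p=2):
    """K(s, t) = sum over all length-p substrings of (#occurrences in s) * (#occurrences in t)."""
    cs = Counter(s[i:i + p] for i in range(len(s) - p + 1))
    ct = Counter(t[i:i + p] for i in range(len(t) - p + 1))
    return sum(cs[sub] * ct[sub] for sub in cs)

print(p_spectrum_kernel("statistics", "computation", p=2))
```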
Kernel
A function $K(\boldsymbol{x}, \boldsymbol{y})$ is a kernel if
$K(\boldsymbol{x}, \boldsymbol{y}) = \langle \phi(\boldsymbol{x}), \phi(\boldsymbol{y}) \rangle$
for some feature map $\phi$.
Kernel matrix
For a given kernel function $K$ and a finite dataset $(\boldsymbol{x}_1, \boldsymbol{x}_2, \ldots, \boldsymbol{x}_n)$, the $n \times n$ matrix
$\boldsymbol{K}_{ij} := K(\boldsymbol{x}_i, \boldsymbol{x}_j)$
is called the kernel matrix.
Kernel matrix
Let $\boldsymbol{X}$ be the data matrix. Then
$\boldsymbol{K} = \boldsymbol{X}\boldsymbol{X}^T$
is the kernel matrix for the linear kernel $K(\boldsymbol{x}, \boldsymbol{y}) = \boldsymbol{x}^T \boldsymbol{y}$.
Let $\phi$ be a feature mapping, applied to the rows of $\boldsymbol{X}$. Then
$\boldsymbol{K} = \phi(\boldsymbol{X})\, \phi(\boldsymbol{X})^T$
is the kernel matrix for the corresponding kernel $K(\boldsymbol{x}, \boldsymbol{y}) = \langle \phi(\boldsymbol{x}), \phi(\boldsymbol{y}) \rangle$.
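A short sketch checking both statements numerically: $\boldsymbol{X}\boldsymbol{X}^T$ gives the linear-kernel matrix, and $\phi(\boldsymbol{X})\phi(\boldsymbol{X})^T$ for an explicit quadratic feature map matches the kernel-trick formula $\langle \boldsymbol{x}, \boldsymbol{y} \rangle^2$. Names and toy data are mine.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((5, 3))          # data matrix, one example per row

# Linear kernel: K = X X^T
K_lin = X @ X.T

# Explicit feature map phi applied row-wise: K = phi(X) phi(X)^T
def phi(x):
    return np.outer(x, x).ravel()        # quadratic features

Phi = np.apply_along_axis(phi, 1, X)
K_quad = Phi @ Phi.T

# Same matrix via the kernel trick, K(x, y) = <x, y>^2
K_trick = (X @ X.T) ** 2
print(np.allclose(K_quad, K_trick))      # True
```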
Kernel theorem
Not every function $K$ is a kernel!
e.g. $K(x, y) = -1$ is not.
Not every $n \times n$ matrix is a kernel matrix!
Kernel theorem
Theorem:
$K$ is a kernel function $\Leftrightarrow$ $K$ is symmetric and positive semidefinite.
A function is positive semidefinite iff for any finite dataset $\{\boldsymbol{x}_1, \boldsymbol{x}_2, \ldots, \boldsymbol{x}_n\}$ the corresponding kernel matrix is positive semidefinite.
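A sketch of how the theorem can be checked on a finite dataset: form the kernel matrix and test symmetry and the sign of its eigenvalues. The function $K(x, y) = -1$ from the previous slide indeed fails, while the linear kernel passes. Helper names are mine.

```python
import numpy as np

def is_psd_kernel_matrix(K, tol=1e-9):
    """Symmetric and all eigenvalues >= -tol."""
    return np.allclose(K, K.T) and np.linalg.eigvalsh(K).min() >= -tol

rng = np.random.default_rng(4)
X = rng.standard_normal((4, 2))

K_linear = X @ X.T                        # a valid kernel
K_minus1 = -np.ones((4, 4))               # K(x, y) = -1: not a kernel
print(is_psd_kernel_matrix(K_linear))     # True
print(is_psd_kernel_matrix(K_minus1))     # False
```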
Kernel closure
$K_1 + K_2$ is a kernel (feature space concatenation)
$c\,K$ is a kernel for $c \ge 0$ (feature space scaling)
$K_1 \cdot K_2$ is a kernel (feature space tensor product)
$K(\psi(\boldsymbol{x}), \psi(\boldsymbol{y}))$ is a kernel for any map $\psi$ (feature map composition)
Kernel normalization
Let $\phi'(\boldsymbol{x}) = \frac{\phi(\boldsymbol{x})}{\|\phi(\boldsymbol{x})\|}$. Then
$K'(\boldsymbol{x}, \boldsymbol{y}) = \langle \phi'(\boldsymbol{x}), \phi'(\boldsymbol{y}) \rangle = \Big\langle \frac{\phi(\boldsymbol{x})}{\|\phi(\boldsymbol{x})\|}, \frac{\phi(\boldsymbol{y})}{\|\phi(\boldsymbol{y})\|} \Big\rangle = \frac{\langle \phi(\boldsymbol{x}), \phi(\boldsymbol{y}) \rangle}{\sqrt{\|\phi(\boldsymbol{x})\|^2\, \|\phi(\boldsymbol{y})\|^2}} = \frac{K(\boldsymbol{x}, \boldsymbol{y})}{\sqrt{K(\boldsymbol{x}, \boldsymbol{x})\, K(\boldsymbol{y}, \boldsymbol{y})}}$
Kernel matrix normalization
The same rule, applied entrywise to the kernel matrix:
$\boldsymbol{K}'_{ij} := \frac{\boldsymbol{K}_{ij}}{\sqrt{\boldsymbol{K}_{ii}\, \boldsymbol{K}_{jj}}}$
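A minimal sketch of the matrix-form normalization above, illustrated on a linear kernel matrix (whose diagonal is not 1 to begin with); names are mine.

```python
import numpy as np

def normalize_kernel_matrix(K):
    """K'_ij = K_ij / sqrt(K_ii * K_jj)."""
    d = np.sqrt(np.diag(K))
    return K / np.outer(d, d)

rng = np.random.default_rng(5)
X = rng.standard_normal((5, 3))
K = X @ X.T                              # linear kernel matrix
Kn = normalize_kernel_matrix(K)
print(np.allclose(np.diag(Kn), 1.0))     # True: normalized features have unit norm
```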
Kernel matrix centering
$\boldsymbol{x}_i \to \boldsymbol{x}_i - \frac{1}{n}\sum_k \boldsymbol{x}_k$
$\boldsymbol{X} \to \boldsymbol{X} - \frac{1}{n}\boldsymbol{1}_n\boldsymbol{1}_n^T\boldsymbol{X}$
$\boldsymbol{X}\boldsymbol{X}^T \to \Big(\boldsymbol{X} - \frac{1}{n}\boldsymbol{1}\boldsymbol{1}^T\boldsymbol{X}\Big)\Big(\boldsymbol{X} - \frac{1}{n}\boldsymbol{1}\boldsymbol{1}^T\boldsymbol{X}\Big)^T = \boldsymbol{X}\boldsymbol{X}^T - \frac{1}{n}\boldsymbol{1}\boldsymbol{1}^T\boldsymbol{X}\boldsymbol{X}^T - \frac{1}{n}\boldsymbol{X}\boldsymbol{X}^T\boldsymbol{1}\boldsymbol{1}^T + \frac{1}{n^2}\boldsymbol{1}\boldsymbol{1}^T\boldsymbol{X}\boldsymbol{X}^T\boldsymbol{1}\boldsymbol{1}^T$
$\boldsymbol{K}_{\text{cent}} := \boldsymbol{K} - \frac{1}{n}\boldsymbol{1}\boldsymbol{1}^T\boldsymbol{K} - \frac{1}{n}\boldsymbol{K}\boldsymbol{1}\boldsymbol{1}^T + \frac{1}{n^2}\boldsymbol{1}\boldsymbol{1}^T\boldsymbol{K}\boldsymbol{1}\boldsymbol{1}^T$
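A sketch verifying the centering formula for the linear kernel: centering the kernel matrix should match explicitly centering the data and recomputing $\boldsymbol{X}\boldsymbol{X}^T$. Names are mine.

```python
import numpy as np

def center_kernel_matrix(K):
    """K_cent = K - (1/n) 1 1^T K - (1/n) K 1 1^T + (1/n^2) 1 1^T K 1 1^T."""
    n = K.shape[0]
    one = np.ones((n, n)) / n
    return K - one @ K - K @ one + one @ K @ one

rng = np.random.default_rng(6)
X = rng.standard_normal((6, 3))
K = X @ X.T

Xc = X - X.mean(axis=0)                                  # center in feature space explicitly
print(np.allclose(center_kernel_matrix(K), Xc @ Xc.T))   # True
```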
The Dual Representation
Let $A$ be the input space, and let $B$ be the higher-dimensional feature space.
Let $\phi: A \to B$ be the feature map.
Fix a dataset $\{\boldsymbol{x}_1, \boldsymbol{x}_2, \ldots, \boldsymbol{x}_n\} \subset A$.
Let $\boldsymbol{w} = \sum_i \alpha_i\, \phi(\boldsymbol{x}_i) \in B$.
We say that the $\alpha_i$ are the dual coordinates for $\boldsymbol{w}$.
Dual coordinates
$\boldsymbol{w} = \sum_i \alpha_i\, \phi(\boldsymbol{x}_i) = \phi(\boldsymbol{X})^T \boldsymbol{\alpha} = \boldsymbol{\Xi}^T \boldsymbol{\alpha}$
Note that $\boldsymbol{\Xi}\boldsymbol{\Xi}^T = \phi(\boldsymbol{X})\, \phi(\boldsymbol{X})^T = \boldsymbol{K}$.
Now we can do all of the useful stuff using dual coordinates only.
Dual coordinates
Let $\boldsymbol{w} = \boldsymbol{\Xi}^T \boldsymbol{\alpha}$ and $\boldsymbol{u} = \boldsymbol{\Xi}^T \boldsymbol{\beta}$. Then
$2\boldsymbol{w} = \boldsymbol{\Xi}^T (2\boldsymbol{\alpha})$
$\boldsymbol{w} + \boldsymbol{u} = \boldsymbol{\Xi}^T (\boldsymbol{\alpha} + \boldsymbol{\beta})$
$\langle \boldsymbol{w}, \boldsymbol{u} \rangle = \boldsymbol{w}^T \boldsymbol{u} = \boldsymbol{\alpha}^T \boldsymbol{\Xi}\boldsymbol{\Xi}^T \boldsymbol{\beta} = \boldsymbol{\alpha}^T \boldsymbol{K} \boldsymbol{\beta}$
$\|\boldsymbol{w} - \boldsymbol{u}\|^2 = \boldsymbol{w}^T\boldsymbol{w} + \boldsymbol{u}^T\boldsymbol{u} - 2\boldsymbol{w}^T\boldsymbol{u} = \boldsymbol{\alpha}^T \boldsymbol{K} \boldsymbol{\alpha} + \boldsymbol{\beta}^T \boldsymbol{K} \boldsymbol{\beta} - 2\,\boldsymbol{\alpha}^T \boldsymbol{K} \boldsymbol{\beta}$
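A small numerical check of the dual-coordinate identities, assuming $\boldsymbol{\Xi} = \phi(\boldsymbol{X})$ with an explicit quadratic feature map; names and data are mine.

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.standard_normal((5, 3))
Xi = np.apply_along_axis(lambda x: np.outer(x, x).ravel(), 1, X)  # Xi = phi(X)
K = Xi @ Xi.T

alpha, beta = rng.standard_normal(5), rng.standard_normal(5)
w, u = Xi.T @ alpha, Xi.T @ beta          # primal vectors from dual coordinates

print(np.isclose(w @ u, alpha @ K @ beta))                 # <w, u> = alpha^T K beta
print(np.isclose(np.sum((w - u)**2),
                 alpha @ K @ alpha + beta @ K @ beta
                 - 2 * alpha @ K @ beta))                   # ||w - u||^2 in dual form
```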
Kernelization
Recall the Perceptron:
Initialize $\boldsymbol{w} := \boldsymbol{0}$
Find a misclassified example $(\boldsymbol{x}_i, y_i)$: $\mathrm{sign}(\boldsymbol{w}^T \boldsymbol{x}_i + b) \ne y_i$
Update weights: $\boldsymbol{w} := \boldsymbol{w} + \mu y_i \boldsymbol{x}_i$, $b := b + \mu y_i$

With $\boldsymbol{w} = \sum_j \alpha_j \boldsymbol{x}_j$, each step has a dual counterpart:
$\boldsymbol{w} := \boldsymbol{0} \;\Leftrightarrow\; \boldsymbol{\alpha} := \boldsymbol{0}$
$\boldsymbol{w}^T \boldsymbol{x}_i + b = \sum_j \alpha_j \boldsymbol{x}_j^T \boldsymbol{x}_i + b = \boldsymbol{K}_i \boldsymbol{\alpha} + b$
$\boldsymbol{w} := \boldsymbol{w} + \mu y_i \boldsymbol{x}_i \;\Leftrightarrow\; \alpha_i := \alpha_i + \mu y_i$

Kernelized Perceptron:
Initialize $\boldsymbol{\alpha} := \boldsymbol{0}$
Find a misclassified example $(\boldsymbol{x}_i, y_i)$: $\mathrm{sign}(\boldsymbol{K}_i \boldsymbol{\alpha} + b) \ne y_i$
Update weights: $\alpha_i := \alpha_i + \mu y_i$, $b := b + \mu y_i$
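A compact sketch of the kernelized Perceptron above, written against a precomputed kernel matrix; the learning rate (playing the role of $\mu$), stopping rule, Gaussian kernel choice, and toy data are my own.

```python
import numpy as np

def kernel_perceptron(K, y, lr=1.0, epochs=100):
    """Dual Perceptron: keep one alpha per training point instead of a weight vector."""
    n = K.shape[0]
    alpha, b = np.zeros(n), 0.0
    for _ in range(epochs):
        errors = 0
        for i in range(n):
            if np.sign(K[i] @ alpha + b) != y[i]:   # misclassified: sign(K_i alpha + b) != y_i
                alpha[i] += lr * y[i]               # alpha_i := alpha_i + mu * y_i
                b += lr * y[i]                      # b := b + mu * y_i
                errors += 1
        if errors == 0:
            break
    return alpha, b

# Toy data that is not linearly separable, handled via a Gaussian kernel
rng = np.random.default_rng(8)
X = rng.standard_normal((40, 2))
y = np.where(np.sum(X**2, axis=1) < 1.0, 1, -1)          # circular decision boundary
sq = np.sum((X[:, None] - X[None, :])**2, axis=-1)
K = np.exp(-sq / 2.0)                                     # Gaussian kernel, sigma = 1

alpha, b = kernel_perceptron(K, y)
pred = np.sign(K @ alpha + b)
print("training accuracy:", np.mean(pred == y))
```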
Quiz
Today we heard three important ideas
Important idea #1: __________
Important idea #2: __________
Important idea #3: __________
Function/matrix 𝐾 is a kernel function/matrix
iff it is __________
Dual representation: ___ = ___ __
Quiz
These algorithms have kernelized versions:
___________________________ …