Kernel Methods Konstantin Tretyakov ([email protected]) MTAT.03.227 Machine Learning

Gene Expression Analysis - ut...Performance evaluation, Statistical learning theory Linear algebra, Optimization methods May 26, 2013 Coming up next… Supervised machine learning


Kernel Methods

Konstantin Tretyakov ([email protected])

MTAT.03.227 Machine Learning


So far…

Supervised machine learning

Linear models

Least squares regression, SVR

Fisher’s discriminant, Perceptron, Logistic model, SVM

Non-linear models

Neural networks, Decision trees, Association rules

Unsupervised machine learning

Clustering/EM, PCA

Generic scaffolding

Probabilistic modeling, ML/MAP estimation

Performance evaluation, Statistical learning theory

Linear algebra, Optimization methods


Coming up next…

Supervised machine learning

Linear models

Least squares regression, SVR

Fisher’s discriminant, Perceptron, Logistic model, SVM

Non-linear models

Neural networks, Decision trees, Association rules

Kernel-XXX

Unsupervised machine learning

Clustering/EM, PCA, Kernel-XXX

Generic scaffolding

Probabilistic modeling, ML/MAP estimation

Performance evaluation, Statistical learning theory

Linear algebra, Optimization methods

Kernels


Too much linear

Logistic regression, Perceptron, Max. margin, Fisher’s discriminant, Linear regression, Ridge Regression, LASSO, …:

$f(\boldsymbol{x}) = \boldsymbol{w}^T\boldsymbol{x} + b$

PCA, LDA, ICA, …:

$f(\boldsymbol{x}) = \boldsymbol{A}\boldsymbol{x}$

K-means:

$\boldsymbol{c}_i = \frac{1}{m}\boldsymbol{X}_i\boldsymbol{1}$

CCA, GLM, …

Linear is not enough

Limited generalization ability


Linear is not enough

Limited applicability

Text?

Ordinal/Nominal data?

Graphs/Trees/Networks?

Shapes?

Graph nodes?


Solutions

Feature space

Nonlinear feature spaces

Kernels

The Kernel Trick

Dual representation


Important idea #1

Important idea #2

Important idea #3

$f(x) = wx$

Nonlinear feature space

$x \to x' := \phi(x) := (x, x^2, x^3)$

$f(x') = w_1 x + w_2 x^2 + w_3 x^3$
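The idea on this slide can be sketched in a few lines: the model stays linear in the features $\phi(x)$, yet it is a cubic polynomial in the original input $x$. The weights below are hypothetical values chosen only for illustration; they do not come from the lecture.

```python
def phi(x):
    """Feature map from the slide: x -> (x, x^2, x^3)."""
    return (x, x ** 2, x ** 3)


def linear_model(w, features):
    """A plain linear model: inner product of weights and features."""
    return sum(wi * fi for wi, fi in zip(w, features))


# Hypothetical weights (w1, w2, w3), chosen only for illustration.
w = (1.0, -2.0, 0.5)


def f(x):
    # Linear in phi(x), but a cubic polynomial in x.
    return linear_model(w, phi(x))


values = [f(x) for x in (-1.0, 0.0, 1.0, 2.0)]
print(values)
```

Any method that fits a linear model can be run unchanged on `phi(x)` instead of `x`.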

$x \to \phi(x) = (x, x^3 - x)$

Nonlinear feature space

$f(\boldsymbol{x}) = \boldsymbol{w}^T\phi(\boldsymbol{x})$

+ Support for arbitrary data types

$\phi(\text{text})$ = word counts

$\phi(\text{graph})$ = node degrees

$\phi(\text{tree})$ = path lengths
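As a rough illustration of $\phi(\text{text})$ = word counts, the sketch below maps two strings to sparse count vectors and takes their inner product. The tokenization (lowercasing and splitting on whitespace) and the example sentences are assumptions made here, not part of the slides.

```python
from collections import Counter


def phi_text(text):
    """Feature map for text: word -> number of occurrences."""
    return Counter(text.lower().split())


def inner(a, b):
    """Inner product of two sparse count vectors."""
    return sum(count * b[word] for word, count in a.items())


doc1 = "the cat sat on the mat"
doc2 = "the dog sat on the log"

similarity = inner(phi_text(doc1), phi_text(doc2))
print(similarity)
```

Once the data is expressed as vectors like this, any linear method applies to it.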


What if the dimensionality is high?

$(x_1, x_2, \dots, x_m) \to (x_1x_1, x_1x_2, \dots, x_mx_m)$: $O(m^2)$ elements

For all k-wise products: $O(m^k)$


The Kernel Trick

Let $\phi(\boldsymbol{x}) = (x_1x_1, x_1x_2, \dots, x_mx_m)$

Consider

$$\langle\phi(\boldsymbol{x}), \phi(\boldsymbol{y})\rangle = \sum_{ij}\phi(\boldsymbol{x})_{ij}\,\phi(\boldsymbol{y})_{ij} = \sum_{ij} x_ix_jy_iy_j = \sum_{ij} x_iy_ix_jy_j$$

$$= \sum_i x_iy_i \sum_j x_jy_j = \Big(\sum_i x_iy_i\Big)^2 = \langle\boldsymbol{x}, \boldsymbol{y}\rangle^2$$

Polynomial kernel

$K(\boldsymbol{x}, \boldsymbol{y}) = (\langle\boldsymbol{x}, \boldsymbol{y}\rangle + R)^d$
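The identity $\langle\phi(\boldsymbol{x}), \phi(\boldsymbol{y})\rangle = \langle\boldsymbol{x}, \boldsymbol{y}\rangle^2$ can be checked numerically. This is a minimal sketch with two made-up vectors: the explicit route builds all $m^2$ pairwise products, while the kernel route needs only one inner product in the input space.

```python
def phi(x):
    """All pairwise products x_i * x_j: m^2 features."""
    return [xi * xj for xi in x for xj in x]


def dot(a, b):
    """Standard inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


# Toy vectors, chosen only for illustration.
x = [1.0, 2.0, 3.0]
y = [0.5, -1.0, 2.0]

explicit = dot(phi(x), phi(y))  # O(m^2) work in feature space
kernel = dot(x, y) ** 2         # O(m) work in input space

print(explicit, kernel)
```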


The Kernel Trick

What about:

$$K(\boldsymbol{x}, \boldsymbol{y}) = \langle\boldsymbol{x}, \boldsymbol{y}\rangle + 0.5\langle\boldsymbol{x}, \boldsymbol{y}\rangle^2 = \sum_i x_iy_i + 0.5\sum_{ij}\phi_{ij}(\boldsymbol{x})\,\phi_{ij}(\boldsymbol{y})$$

$$= \big\langle (x_1, \dots, x_m, \sqrt{0.5}\,x_1x_1, \dots, \sqrt{0.5}\,x_mx_m),\ (y_1, \dots, y_m, \sqrt{0.5}\,y_1y_1, \dots, \sqrt{0.5}\,y_my_m) \big\rangle$$

The Kernel Trick

What about:

$K(x, y) = 1 + \langle x, y\rangle + \frac{1}{2}\langle x, y\rangle^2 + \frac{1}{6}\langle x, y\rangle^3 + \frac{1}{24}\langle x, y\rangle^4$?


The Kernel Trick

What about:

$$K(x, y) = \sum_{i=0}^{\infty}\frac{\langle x, y\rangle^i}{i!} = \exp\langle x, y\rangle$$

Infinite-dimensional feature space!

Gaussian kernel

$$K(\boldsymbol{x}, \boldsymbol{y}) = \exp(-\gamma\|\boldsymbol{x} - \boldsymbol{y}\|^2) = \exp\left(-\frac{\|\boldsymbol{x} - \boldsymbol{y}\|^2}{2\sigma^2}\right)$$

Exponential kernel

$$K(\boldsymbol{x}, \boldsymbol{y}) = \exp\left(-\frac{\|\boldsymbol{x} - \boldsymbol{y}\|}{2\sigma^2}\right)$$
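The kernels named so far are short to write down as functions. The sketch below is one possible plain-Python rendering; the function names and default parameter values are assumptions made for this example. The last function sums the first terms of the series $\sum_i \langle x, y\rangle^i / i!$ to show that it approaches $\exp\langle x, y\rangle$.

```python
import math


def dot(a, b):
    """Standard inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def sq_dist(a, b):
    """Squared Euclidean distance ||a - b||^2."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def polynomial_kernel(x, y, R=1.0, d=2):
    return (dot(x, y) + R) ** d


def gaussian_kernel(x, y, sigma=1.0):
    return math.exp(-sq_dist(x, y) / (2 * sigma ** 2))


def exponential_kernel(x, y, sigma=1.0):
    return math.exp(-math.sqrt(sq_dist(x, y)) / (2 * sigma ** 2))


def truncated_exp_kernel(x, y, terms=20):
    """Partial sum of <x, y>^i / i!, which approaches exp(<x, y>)."""
    s = dot(x, y)
    return sum(s ** i / math.factorial(i) for i in range(terms))


x = [1.0, 0.5]
y = [0.5, 1.0]
print(truncated_exp_kernel(x, y), math.exp(dot(x, y)))
```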

Kernels

http://crsouza.blogspot.com/2010/03/kernel-functions-for-machine-learning.html

Structured data kernels

String kernels

P-spectrum kernels

All-subsequences kernels

Gap-weighted subsequences kernels

Graph & tree kernels

Co-rooted subtrees

All subtrees

Random walks


Kernel

A function $K(\boldsymbol{x}, \boldsymbol{y})$ is a kernel, if

$K(\boldsymbol{x}, \boldsymbol{y}) = \langle\phi(\boldsymbol{x}), \phi(\boldsymbol{y})\rangle$

for some feature map $\phi$.

Kernel matrix

For a given kernel function $K$ and a finite dataset $(\boldsymbol{x}_1, \boldsymbol{x}_2, \dots, \boldsymbol{x}_n)$ the $n \times n$ matrix

$\boldsymbol{K}_{ij} := K(\boldsymbol{x}_i, \boldsymbol{x}_j)$

is called the kernel matrix.


Kernel matrix

Let $\boldsymbol{X}$ be the data matrix, then

$\boldsymbol{K} = \boldsymbol{X}\boldsymbol{X}^T$

is the kernel matrix for the linear kernel $K(\boldsymbol{x}, \boldsymbol{y}) = \boldsymbol{x}^T\boldsymbol{y}$

Let $\phi$ be a feature mapping. Then*

$\boldsymbol{K} = \phi(\boldsymbol{X})\phi(\boldsymbol{X})^T$

is the kernel matrix for the corresponding kernel $K(\boldsymbol{x}, \boldsymbol{y}) = \langle\phi(\boldsymbol{x}), \phi(\boldsymbol{y})\rangle$.
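A small sketch of both statements, assuming the rows of the data matrix are the data points and that $\phi$ is applied row by row. The dataset and the choice of $\phi$ (all pairwise products, whose kernel is $\langle\boldsymbol{x}, \boldsymbol{y}\rangle^2$) are illustrative assumptions.

```python
def dot(a, b):
    """Standard inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def kernel_matrix(kernel, data):
    """n x n matrix with entries K_ij = kernel(x_i, x_j)."""
    return [[kernel(xi, xj) for xj in data] for xi in data]


def phi(x):
    """Illustrative feature map: all pairwise products x_i * x_j."""
    return [xi * xj for xi in x for xj in x]


# Rows of X are data points (a toy dataset made up for this sketch).
X = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]

K_linear = kernel_matrix(dot, X)      # equals X X^T
Phi = [phi(x) for x in X]             # phi applied row by row
K_explicit = kernel_matrix(dot, Phi)  # phi(X) phi(X)^T
K_trick = kernel_matrix(lambda a, b: dot(a, b) ** 2, X)  # same, without phi

print(K_linear)
print(K_explicit == K_trick)
```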


Kernel theorem

Not every function K is a kernel!

E.g. $K(x, y) = -1$ is not

Not every $n \times n$ matrix is a kernel matrix!

Kernel theorem

Theorem:

$K$ is a kernel function ⇔ $K$ is symmetric positive semidefinite

A function is positive semidefinite iff for any finite dataset $\{\boldsymbol{x}_1, \boldsymbol{x}_2, \dots, \boldsymbol{x}_n\}$ the corresponding kernel matrix is positive semidefinite.
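Positive semidefiniteness of a matrix $\boldsymbol{K}$ means $\boldsymbol{z}^T\boldsymbol{K}\boldsymbol{z} \ge 0$ for every vector $\boldsymbol{z}$, so a single $\boldsymbol{z}$ with a negative value is enough to rule a function out. The sketch below finds such a witness for $K(x, y) = -1$ and, for contrast, evaluates the quadratic form of a Gaussian kernel matrix on random vectors. The random test is only a sanity check, not a proof; the dataset and helper names are assumptions of this example.

```python
import math
import random


def quadratic_form(K, z):
    """Compute z^T K z for a square matrix K given as nested lists."""
    n = len(z)
    return sum(z[i] * K[i][j] * z[j] for i in range(n) for j in range(n))


def gaussian(x, y, sigma=1.0):
    """Gaussian kernel on one-dimensional inputs."""
    return math.exp(-(x - y) ** 2 / (2 * sigma ** 2))


points = [0.0, 1.0, 3.0]  # toy one-dimensional dataset

# K(x, y) = -1 gives the all-(-1) matrix; z = (1, 1, 1) exposes it.
K_bad = [[-1.0 for _ in points] for _ in points]
witness = quadratic_form(K_bad, [1.0, 1.0, 1.0])

# The Gaussian kernel matrix never produces a negative value here.
K_good = [[gaussian(a, b) for b in points] for a in points]
rng = random.Random(0)
smallest = min(
    quadratic_form(K_good, [rng.uniform(-1, 1) for _ in points])
    for _ in range(1000)
)

print(witness, smallest)
```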

Kernel closure

Feature space concatenation

Feature space scaling

Feature space tensor product

Feature map composition
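The four labels above presumably refer to the standard closure properties of kernels. As a sketch, assuming $K_1$ and $K_2$ are kernels with feature maps $\phi_1$ and $\phi_2$, $a \ge 0$ is a constant and $f$ is an arbitrary map into the input space of $K_1$ (all of these symbols are introduced here, not taken from the slides), each of the following is again a kernel:

```latex
\begin{align*}
K(x, y) &= K_1(x, y) + K_2(x, y)
  && \text{concatenation: } \phi(x) = (\phi_1(x), \phi_2(x)) \\
K(x, y) &= a\,K_1(x, y)
  && \text{scaling: } \phi(x) = \sqrt{a}\,\phi_1(x) \\
K(x, y) &= K_1(x, y)\,K_2(x, y)
  && \text{tensor product: } \phi(x) = \phi_1(x) \otimes \phi_2(x) \\
K(x, y) &= K_1(f(x), f(y))
  && \text{composition: } \phi(x) = \phi_1(f(x))
\end{align*}
```

In each case the stated $\phi$ reproduces $K$ as an inner product, which is exactly the definition of a kernel.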

Kernel normalization

Let $\phi'(x) = \dfrac{\phi(x)}{\|\phi(x)\|}$

Then

$$K'(x, y) = \langle\phi'(x), \phi'(y)\rangle = \left\langle\frac{\phi(x)}{\|\phi(x)\|}, \frac{\phi(y)}{\|\phi(y)\|}\right\rangle = \frac{\langle\phi(x), \phi(y)\rangle}{\sqrt{\|\phi(x)\|^2\,\|\phi(y)\|^2}} = \frac{K(x, y)}{\sqrt{K(x, x)\,K(y, y)}}$$

Kernel matrix normalization

$$K'_{ij} := \frac{K_{ij}}{\sqrt{K_{ii}\,K_{jj}}}$$
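Normalizing a kernel matrix needs only its own entries. A minimal sketch, assuming a strictly positive diagonal (no data point maps to the zero vector in feature space); the matrix `K` is a made-up example.

```python
import math


def normalize_kernel_matrix(K):
    """Return K' with K'_ij = K_ij / sqrt(K_ii * K_jj)."""
    n = len(K)
    return [
        [K[i][j] / math.sqrt(K[i][i] * K[j][j]) for j in range(n)]
        for i in range(n)
    ]


# Toy symmetric positive definite matrix, chosen only for illustration.
K = [[4.0, 2.0], [2.0, 9.0]]
K_norm = normalize_kernel_matrix(K)
print(K_norm)
```

After normalization every diagonal entry equals 1, so all points have unit length in feature space.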


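Kernel matrix centering

$$\boldsymbol{x}_i \to \boldsymbol{x}_i - \frac{1}{n}\sum_k \boldsymbol{x}_k$$

$$\boldsymbol{X} \to \boldsymbol{X} - \frac{1}{n}\boldsymbol{1}_n\boldsymbol{1}_n^T\boldsymbol{X}$$

$$\boldsymbol{X}\boldsymbol{X}^T \to \left(\boldsymbol{X} - \frac{1}{n}\boldsymbol{1}\boldsymbol{1}^T\boldsymbol{X}\right)\left(\boldsymbol{X} - \frac{1}{n}\boldsymbol{1}\boldsymbol{1}^T\boldsymbol{X}\right)^T$$

$$\boldsymbol{X}\boldsymbol{X}^T \to \boldsymbol{X}\boldsymbol{X}^T - \frac{1}{n}\boldsymbol{1}\boldsymbol{1}^T\boldsymbol{X}\boldsymbol{X}^T - \frac{1}{n}\boldsymbol{X}\boldsymbol{X}^T\boldsymbol{1}\boldsymbol{1}^T + \frac{1}{n^2}\boldsymbol{1}\boldsymbol{1}^T\boldsymbol{X}\boldsymbol{X}^T\boldsymbol{1}\boldsymbol{1}^T$$

$$\boldsymbol{K}_{\text{cent}} := \boldsymbol{K} - \frac{1}{n}\boldsymbol{1}\boldsymbol{1}^T\boldsymbol{K} - \frac{1}{n}\boldsymbol{K}\boldsymbol{1}\boldsymbol{1}^T + \frac{1}{n^2}\boldsymbol{1}\boldsymbol{1}^T\boldsymbol{K}\boldsymbol{1}\boldsymbol{1}^T$$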
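Entry by entry, the centering formula subtracts the column mean and the row mean of $\boldsymbol{K}$ and adds back the overall mean. The sketch below implements that and compares the result, for the linear kernel, with the kernel matrix of explicitly centered data. The dataset is a made-up example.

```python
def dot(a, b):
    """Standard inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def center_kernel_matrix(K):
    """K - (1/n) 11^T K - (1/n) K 11^T + (1/n^2) 11^T K 11^T."""
    n = len(K)
    row_mean = [sum(row) / n for row in K]  # ((1/n) K 1)_i
    col_mean = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_mean) / n          # (1/n^2) 1^T K 1
    return [
        [K[i][j] - col_mean[j] - row_mean[i] + total_mean for j in range(n)]
        for i in range(n)
    ]


# Toy data: rows are points.
X = [[1.0, 2.0], [3.0, 0.0], [5.0, 4.0]]
n, m = len(X), len(X[0])

# Route 1: center the kernel matrix directly.
K = [[dot(a, b) for b in X] for a in X]
K_cent = center_kernel_matrix(K)

# Route 2: center the data first, then build the kernel matrix.
mean = [sum(x[k] for x in X) / n for k in range(m)]
X_cent = [[x[k] - mean[k] for k in range(m)] for x in X]
K_from_centered = [[dot(a, b) for b in X_cent] for a in X_cent]

print(K_cent)
```

Route 1 never touches the data points themselves, which is why it also works when only the kernel matrix is available.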

The Dual Representation

Let $A$ be the input space, and let $B$ be the higher-dimensional feature space.

Let $\phi: A \to B$ be the feature map.

Fix a dataset $\{\boldsymbol{x}_1, \boldsymbol{x}_2, \dots, \boldsymbol{x}_n\} \subset A$

Let $\boldsymbol{w} = \sum_i \alpha_i\phi(\boldsymbol{x}_i) \in B$

We say that $\alpha_i$ are the dual coordinates for $\boldsymbol{w}$.

Dual coordinates

$$\boldsymbol{w} = \sum_i \alpha_i\phi(\boldsymbol{x}_i) = \phi(\boldsymbol{X})^T\boldsymbol{\alpha} = \boldsymbol{\Xi}^T\boldsymbol{\alpha}$$

Note that $\boldsymbol{\Xi}\boldsymbol{\Xi}^T = \phi(\boldsymbol{X})\phi(\boldsymbol{X})^T = \boldsymbol{K}$

Now we can do all of the useful stuff using dual coordinates only.


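Dual coordinates

Let

$\boldsymbol{w} = \boldsymbol{\Xi}^T\boldsymbol{\alpha}$

$\boldsymbol{u} = \boldsymbol{\Xi}^T\boldsymbol{\beta}$

Then

$2\boldsymbol{w} = \boldsymbol{\Xi}^T(2\boldsymbol{\alpha})$

$\boldsymbol{w} + \boldsymbol{u} = \boldsymbol{\Xi}^T(\boldsymbol{\alpha} + \boldsymbol{\beta})$

$\langle\boldsymbol{w}, \boldsymbol{u}\rangle = \boldsymbol{w}^T\boldsymbol{u} = \boldsymbol{\alpha}^T\boldsymbol{\Xi}\boldsymbol{\Xi}^T\boldsymbol{\beta} = \boldsymbol{\alpha}^T\boldsymbol{K}\boldsymbol{\beta}$

$\|\boldsymbol{w} - \boldsymbol{u}\|^2 = \boldsymbol{w}^T\boldsymbol{w} + \boldsymbol{u}^T\boldsymbol{u} - 2\boldsymbol{w}^T\boldsymbol{u} = \cdots$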
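These identities can be checked on a toy example. The sketch below builds $\boldsymbol{w}$ and $\boldsymbol{u}$ explicitly from their dual coordinates and then recomputes the inner product and the squared distance from $\boldsymbol{K}$ alone, using $\|\boldsymbol{w} - \boldsymbol{u}\|^2 = (\boldsymbol{\alpha} - \boldsymbol{\beta})^T\boldsymbol{K}(\boldsymbol{\alpha} - \boldsymbol{\beta})$, which follows from the inner product rule. The feature map, data and coefficients are all hypothetical.

```python
def dot(a, b):
    """Standard inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def phi(x):
    """Hypothetical feature map, used only for this check."""
    return [x[0], x[1], x[0] * x[1]]


X = [[1.0, 2.0], [0.0, 1.0], [2.0, -1.0]]
Xi = [phi(x) for x in X]                     # rows are phi(x_i)
K = [[dot(a, b) for b in Xi] for a in Xi]    # K = Xi Xi^T


def primal(coeffs):
    """w = Xi^T coeffs = sum_i coeffs_i * phi(x_i)."""
    return [sum(c * row[k] for c, row in zip(coeffs, Xi))
            for k in range(len(Xi[0]))]


def dual_inner(a, b):
    """a^T K b, computed without touching the feature space."""
    n = len(a)
    return sum(a[i] * K[i][j] * b[j] for i in range(n) for j in range(n))


alpha = [1.0, -1.0, 0.5]
beta = [0.0, 2.0, 1.0]
w, u = primal(alpha), primal(beta)

inner_primal = dot(w, u)
inner_dual = dual_inner(alpha, beta)

diff_primal = [wi - ui for wi, ui in zip(w, u)]
diff_dual = [a - b for a, b in zip(alpha, beta)]
dist_primal = dot(diff_primal, diff_primal)
dist_dual = dual_inner(diff_dual, diff_dual)

print(inner_primal, inner_dual, dist_primal, dist_dual)
```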


Kernelization

Recall the Perceptron:

Initialize $\boldsymbol{w} := \boldsymbol{0} \Leftrightarrow \boldsymbol{\alpha} := \boldsymbol{0}$

Find a misclassified example $(\boldsymbol{x}_i, y_i)$

Update weights:

$\boldsymbol{w} := \boldsymbol{w} + \mu y_i\boldsymbol{x}_i \Leftrightarrow \alpha_i := \alpha_i + \mu y_i$

$b := b + \mu y_i$

Kernelization

Recall the Perceptron:

Initialize $\boldsymbol{\alpha} := \boldsymbol{0}$

Find a misclassified example $(\boldsymbol{x}_i, y_i)$

$\boldsymbol{w}^T\boldsymbol{x}_i + b \ne y_i \Leftrightarrow \sum_j \alpha_j\boldsymbol{x}_j^T\boldsymbol{x}_i + b \ne y_i \Leftrightarrow \boldsymbol{K}_i\boldsymbol{\alpha} + b \ne y_i$

Update weights:

$\alpha_i := \alpha_i + \mu y_i$

$b := b + \mu y_i$
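The dual perceptron can be sketched as follows. Two assumptions are made here that the slides leave implicit: labels are $\pm 1$, and "misclassified" is tested through the sign of $\boldsymbol{K}_i\boldsymbol{\alpha} + b$ (that is, $y_i(\boldsymbol{K}_i\boldsymbol{\alpha} + b) \le 0$). The XOR-style dataset, the kernel $(\langle\boldsymbol{x}, \boldsymbol{y}\rangle + 1)^2$, and the `max_epochs` cap are choices made for this example.

```python
def dot(a, b):
    """Standard inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def poly_kernel(x, y):
    """Polynomial kernel with R = 1, d = 2."""
    return (dot(x, y) + 1.0) ** 2


def kernel_perceptron(K, y, mu=1.0, max_epochs=100):
    """Perceptron in dual coordinates; only the kernel matrix is used."""
    n = len(y)
    alpha = [0.0] * n
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for i in range(n):
            score = sum(K[i][j] * alpha[j] for j in range(n)) + b
            if y[i] * score <= 0:      # misclassified: sign disagrees
                alpha[i] += mu * y[i]  # dual form of w := w + mu*y_i*phi(x_i)
                b += mu * y[i]
                mistakes += 1
        if mistakes == 0:
            break
    return alpha, b


def predict(x, data, alpha, b, kernel):
    """Sign of sum_j alpha_j K(x_j, x) + b."""
    score = sum(a * kernel(xj, x) for a, xj in zip(alpha, data)) + b
    return 1 if score > 0 else -1


# XOR-like data: not linearly separable in the input space.
data = [[1.0, 1.0], [-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0]]
labels = [1, 1, -1, -1]

K = [[poly_kernel(a, c) for c in data] for a in data]
alpha, b = kernel_perceptron(K, labels)
predictions = [predict(x, data, alpha, b, poly_kernel) for x in data]
print(alpha, b, predictions)
```

The weight vector $\boldsymbol{w}$ is never formed; training and prediction touch the data only through kernel evaluations.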

Quiz

Today we heard three important ideas

Important idea #1: __________

Important idea #2: __________

Important idea #3: __________

Function/matrix 𝐾 is a kernel function/matrix

iff it is __________

Dual representation: ___ = ___ __

These algorithms have kernelized versions:

___________________________ …