8803 Fall 09, Lecture 8

Introduction Dual Representations Kernel Design Radial Basis Functions Summary

Kernel Methods - I

Henrik I Christensen

Robotics & Intelligent Machines @ GT
Georgia Institute of Technology, Atlanta, GA 30332-0280
hic@cc.gatech.edu

Henrik I Christensen (RIM@GT) Kernel Methods 1 / 22


Outline

1 Introduction

2 Dual Representations

3 Kernel Design

4 Radial Basis Functions

5 Summary


Introduction

So far the process has been about data compression and optimal regression / discrimination

Once the process is complete, the training set is discarded and the model is used for processing

What if data were kept and used directly for estimation?

Why, you ask?

The decision boundaries might not be simple, or the modelling is too complicated

Already discussed Nearest Neighbor (NN) as an example of direct data processing

A complete class of memory based techniques

Q: how to measure similarity between a data point and samples in memory?


Kernel Methods

What if we could predict based on a linear combination of features?

Assume a mapping to a new feature space using φ(x)

A kernel function is defined by

k(x, x′) = φ(x)Tφ(x′)

Characteristics:

The function is symmetric: k(x, x′) = k(x′, x)
It can be used on both continuous and symbolic data

The simplest kernel, k(x, x′) = xTx′, is the linear kernel.

A kernel is basically an inner product performed in a feature/mapped space.
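As a minimal sketch (not from the slides), the definition can be checked numerically for the linear kernel, where the feature map φ is simply the identity; the sample vectors are arbitrary:

```python
import numpy as np

def linear_kernel(x, xp):
    """Linear kernel k(x, x') = x^T x'; the feature map phi(x) = x is the identity."""
    return float(np.dot(x, xp))

x = np.array([1.0, 2.0])
xp = np.array([3.0, -1.0])

# Symmetry: k(x, x') = k(x', x)
print(linear_kernel(x, xp), linear_kernel(xp, x))  # 1.0 1.0
```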

Henrik I Christensen (RIM@GT) Kernel Methods 4 / 22

Introduction Dual Representations Kernel Design Radial Basis Functions Summary

Kernels

Consider a complete set of data in memory

How can we interpolate new values based on training values? I.e.,

y(x) = (1 / ∑_n k(x, x_n)) ∑_{n=1}^N k(x, x_n) x_n

Consider k(·, ·) a weight function that determines the contribution based on the distance between x and x_n
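A sketch of such a kernel-weighted average, using stored target values and a Gaussian weight function (both the weight choice and the data are illustrative assumptions):

```python
import numpy as np

def gaussian_weight(x, xn, sigma=0.5):
    """One possible weight function k(x, x_n), decaying with distance."""
    return np.exp(-(x - xn) ** 2 / (2 * sigma ** 2))

def interpolate(x, xs, ts, sigma=0.5):
    """Kernel-weighted average of stored values: closer samples contribute more."""
    w = gaussian_weight(x, xs, sigma)
    return np.sum(w * ts) / np.sum(w)

xs = np.array([0.0, 1.0, 2.0])   # stored sample locations
ts = np.array([0.0, 1.0, 0.0])   # stored target values
print(interpolate(1.0, xs, ts))  # dominated by the sample stored at x = 1
```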



Dual Representation

Consider a regression problem as seen earlier

J(w) = (1/2) ∑_{n=1}^N { wTφ(x_n) − t_n }² + (λ/2) wTw

with the solution

w = −(1/λ) ∑_{n=1}^N { wTφ(x_n) − t_n } φ(x_n) = ∑_{n=1}^N a_n φ(x_n) = ΦTa

where a is defined by

a_n = −(1/λ) { wTφ(x_n) − t_n }

Substitute w = ΦTa into J(w) to obtain

J(a) = (1/2) aTΦΦTΦΦTa − aTΦΦTt + (1/2) tTt + (λ/2) aTΦΦTa

which is termed the dual representation


Dual Representation II

Define the Gram matrix K = ΦΦT to get

J(a) = (1/2) aTKKa − aTKt + (1/2) tTt + (λ/2) aTKa

where
K_nm = φ(x_n)Tφ(x_m) = k(x_n, x_m)

J(a) is then minimized by

a = (K + λI_N)^−1 t

Through substitution we obtain

y(x) = wTφ(x) = aTΦφ(x) = k(x)T (K + λI_N)^−1 t

We have in reality mapped the problem to another (dual) space in which it is possible to optimize the regression/discrimination problem

Typically N >> M, so the immediate advantage is not obvious. See later.
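The dual solution can be sketched directly in code: solve for a = (K + λI_N)^−1 t, then predict via k(x)Ta. The Gaussian kernel, the noisy-sine data, and all parameter values here are illustrative assumptions, not from the slides:

```python
import numpy as np

def rbf_kernel(X1, X2, sigma=0.2):
    """Gaussian kernel matrix between two 1-D sample sets (an assumed kernel choice)."""
    return np.exp(-(X1[:, None] - X2[None, :]) ** 2 / (2 * sigma ** 2))

# Illustrative noisy training data
rng = np.random.default_rng(0)
X = np.linspace(0, 1, 20)
t = np.sin(2 * np.pi * X) + 0.1 * rng.standard_normal(20)

lam = 0.01
K = rbf_kernel(X, X)                              # Gram matrix K
a = np.linalg.solve(K + lam * np.eye(len(X)), t)  # a = (K + lambda I_N)^(-1) t

# Prediction: y(x) = k(x)^T a, where k(x)_n = k(x, x_n)
X_new = np.array([0.25, 0.75])
y = rbf_kernel(X_new, X) @ a
print(y)   # approximately the underlying sine values at 0.25 and 0.75
```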


Constructing Kernels

How would we construct kernel functions?

One approach is to choose a mapping and find corresponding kernels

A one-dimensional example:

k(x, x′) = φ(x)Tφ(x′) = ∑_{i=1}^M φ_i(x) φ_i(x′)

where the φ_i(·) are basis functions


Kernel Basis Functions - Example

[Figure: example basis functions (top row) and the corresponding kernel functions (bottom row), plotted over x ∈ [−1, 1].]


Construction of Kernels

We can also design kernels directly.

They must correspond to a scalar product in “some” space

Consider:
k(x, z) = (xTz)²

For a 2-dimensional space x = (x1, x2):

k(x, z) = (xTz)² = (x1z1 + x2z2)²
        = x1²z1² + 2x1z1x2z2 + x2²z2²
        = (x1², √2 x1x2, x2²)(z1², √2 z1z2, z2²)T
        = φ(x)Tφ(z)

In general, if the Gram matrix K is positive semi-definite, the kernel function is valid
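The expansion above can be verified numerically: the kernel (xTz)² and the explicit feature map φ give the same number (the test vectors are arbitrary):

```python
import numpy as np

def phi(v):
    """Explicit feature map for the quadratic kernel in 2-D."""
    return np.array([v[0] ** 2, np.sqrt(2) * v[0] * v[1], v[1] ** 2])

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

lhs = np.dot(x, z) ** 2        # k(x, z) = (x^T z)^2
rhs = np.dot(phi(x), phi(z))   # phi(x)^T phi(z)
print(lhs, rhs)                # both approximately (1*3 + 2*4)^2 = 121
```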


Techniques for construction of kernels

Given valid kernels k1 and k2, new valid kernels can be constructed as:

k(x, x′) = c k1(x, x′)          (c > 0 a constant)
k(x, x′) = f(x) k1(x, x′) f(x′)
k(x, x′) = q(k1(x, x′))         (q a polynomial with non-negative coefficients)
k(x, x′) = exp(k1(x, x′))
k(x, x′) = k1(x, x′) + k2(x, x′)
k(x, x′) = k1(x, x′) k2(x, x′)
k(x, x′) = xTAx′                (A symmetric positive semi-definite)
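Two of these rules, sum and product of valid kernels, can be spot-checked by confirming that the resulting Gram matrices stay positive semi-definite; the data and kernel choices are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((5, 2))

def linear_gram(X):
    """Gram matrix of the linear kernel."""
    return X @ X.T

def gaussian_gram(X, sigma=1.0):
    """Gram matrix of the Gaussian kernel."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

K1, K2 = linear_gram(X), gaussian_gram(X)

# Sums and element-wise products of valid Gram matrices remain PSD:
for K in (K1 + K2, K1 * K2):
    assert np.linalg.eigvalsh(K).min() > -1e-10  # non-negative up to round-off
print("sum and product Gram matrices are PSD")
```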


More kernel examples/generalizations

We could generalize k(x, x′) = (xTx′)² in various ways:

1 k(x, x′) = (xTx′ + c)²
2 k(x, x′) = (xTx′)^M
3 k(x, x′) = (xTx′ + c)^M

Example: correlation between image regions

Another option is

k(x, x′) = exp(−||x − x′||² / 2σ²)

called the “Gaussian kernel”

Several more examples in the book
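Both families are a few lines of code; a sketch with arbitrary test vectors (the parameter values are assumptions), noting that the Gaussian kernel always equals 1 at zero distance:

```python
import numpy as np

def poly_kernel(x, z, c=1.0, M=3):
    """Inhomogeneous polynomial kernel (x^T z + c)^M."""
    return (np.dot(x, z) + c) ** M

def gauss_kernel(x, z, sigma=1.0):
    """Gaussian kernel exp(-||x - z||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((x - z) ** 2) / (2 * sigma ** 2))

x = np.array([1.0, 0.0])
z = np.array([0.0, 1.0])
print(poly_kernel(x, z))   # (0 + 1)^3 = 1.0
print(gauss_kernel(x, x))  # 1.0 at zero distance
```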


Radial Basis Functions

What is a radial basis function?

φ_j(x) = h(||x − x_j||)

How to average/smooth across the entire data set based on distance?

y(x) = ∑_{n=1}^N w_n h(||x − x_n||)

The weights w_n could be estimated using least squares (LSQ)

A popular interpolation strategy is

y(x) = ∑_{n=1}^N t_n h(x − x_n)

where

h(x − x_n) = ν(x − x_n) / ∑_j ν(x − x_j)
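The least-squares route can be sketched as follows: one Gaussian basis function per training sample, weights solved from the design matrix. The data, bandwidth, and basis choice are illustrative assumptions:

```python
import numpy as np

def h(r, sigma=0.2):
    """Radial basis: a Gaussian profile of the distance (one common choice)."""
    return np.exp(-r ** 2 / (2 * sigma ** 2))

# Illustrative training data; one basis function centred on each sample
xs = np.linspace(0, 1, 10)
ts = np.sin(2 * np.pi * xs)

# Design matrix H with H[i, j] = h(|x_i - x_j|); solve for the weights w by LSQ
H = h(np.abs(xs[:, None] - xs[None, :]))
w, *_ = np.linalg.lstsq(H, ts, rcond=None)

# Evaluate y(x) = sum_n w_n h(|x - x_n|) at a new point
x_new = 0.25
y = h(np.abs(x_new - xs)) @ w
print(y)   # close to sin(pi/2) = 1 for this smooth target
```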


The effect of normalization?

[Figure: basis functions without (left) and with (right) normalization, plotted over x ∈ [−1, 1].]


Nadaraya-Watson Models

Let's interpolate across all data!
Using a Parzen density estimator we have

p(x, t) = (1/N) ∑_{n=1}^N f(x − x_n, t − t_n)

We can then estimate

y(x) = E[t|x] = ∫ t p(t|x) dt
     = ∫ t p(x, t) dt / ∫ p(x, t) dt
     = ∑_n g(x − x_n) t_n / ∑_m g(x − x_m)
     = ∑_n k(x, x_n) t_n

where

k(x, x_n) = g(x − x_n) / ∑_m g(x − x_m)

and

g(x) = ∫ f(x, t) dt


Gaussian Mixture Example

Assume a particular one-dimensional function (here a sine) with noise

Each data point is an isotropic Gaussian kernel

Smoothing factors are determined for the interpolation
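This setup can be sketched with the Nadaraya-Watson estimator above, using normalized isotropic Gaussian weights; the data, seed, and bandwidth are illustrative assumptions:

```python
import numpy as np

# Noisy samples of a sine function (illustrative)
rng = np.random.default_rng(42)
xs = np.linspace(0, 1, 30)
ts = np.sin(2 * np.pi * xs) + 0.1 * rng.standard_normal(30)

def nadaraya_watson(x, xs, ts, sigma=0.05):
    """y(x) = sum_n k(x, x_n) t_n with k the normalized Gaussian weights."""
    g = np.exp(-(x - xs) ** 2 / (2 * sigma ** 2))
    return np.sum(g * ts) / np.sum(g)

grid = np.linspace(0, 1, 5)
print([round(nadaraya_watson(x, xs, ts), 2) for x in grid])
```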


Gaussian Mixture Example

[Figure: the resulting kernel regression fit to the noisy sine data, plotted over x ∈ [0, 1].]


Summary

Memory based methods - keeping the data!

Design of distance metrics for weighting of data in the learning set

Kernels - a distance metric based on a dot-product in some feature space

Being creative about design of kernels

We’ll come back to the complexity issues
