LMS Algorithm in a Reproducing Kernel Hilbert Space

Weifeng Liu, P. P. Pokharel, J. C. Principe

Computational NeuroEngineering Laboratory,

University of Florida

Acknowledgment: This work was partially supported by NSF grants ECS-0300340 and ECS-0601271.

Outline

Introduction
Least Mean Square algorithm (easy)
Reproducing kernel Hilbert space (tricky)
The convergence and regularization analysis (important)
Learning from error models (interesting)

Introduction

Puskal (2006): Kernel LMS

Kivinen, Smola (2004): Online learning with kernels (more like leaky LMS)

Moody, Platt (1990s): Resource allocation networks (growing and pruning)

LMS (1960, Widrow and Hoff)

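Given a sequence of examples from $U \times \mathbb{R}$:

$$((u_1, y_1), \dots, (u_N, y_N))$$

where $U$ is a compact subset of $\mathbb{R}^L$. The model is assumed to be

$$y_n = (w^o)^T u_n + v(n)$$

where $v(n)$ is noise. The cost function:

$$J(w) = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - w^T u_i\right)^2$$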

The LMS algorithm

$$e_n^a = y_n - w_{n-1}^T u_n$$

$$w_n = w_{n-1} + \eta\, e_n^a u_n$$

The weight after n iterations (starting from $w_0 = 0$):

$$w_n = \eta\sum_{i=1}^{n} e_i^a u_i$$
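A minimal NumPy sketch of the LMS recursion above, assuming $w_0 = 0$. The function name `lms`, the true weight and the synthetic data are illustrative assumptions, not from the slides. The last line checks the unrolled form $w_N = \eta\sum_i e_i^a u_i$.

```python
import numpy as np


def lms(U, y, eta):
    """LMS with w_0 = 0. U: (N, L) inputs u_n; y: (N,) targets y_n.

    Returns the final weight w_N and the a priori errors e_n^a.
    """
    N, L = U.shape
    w = np.zeros(L)
    e = np.zeros(N)
    for n in range(N):
        e[n] = y[n] - w @ U[n]        # e_n^a = y_n - w_{n-1}^T u_n
        w = w + eta * e[n] * U[n]     # w_n = w_{n-1} + eta * e_n^a * u_n
    return w, e


rng = np.random.default_rng(0)
w_true = np.array([0.5, -1.0, 2.0])   # hypothetical w^o
U = rng.standard_normal((2000, 3))
y = U @ w_true + 0.01 * rng.standard_normal(2000)   # small noise v(n)
w, e = lms(U, y, eta=0.01)
print(w)                               # close to w_true
print(np.allclose(w, 0.01 * (e @ U)))  # unrolled form of the weight
```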

Reproducing kernel Hilbert space

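A continuous, symmetric, positive-definite kernel $\kappa: U \times U \to \mathbb{R}$, a mapping $\Phi$, and an inner product $\langle\cdot,\cdot\rangle_H$ such that

$$\langle \Phi(u_1), \Phi(u_2)\rangle_H = \kappa(u_1, u_2)$$

H is the closure of the span of all $\Phi(u)$.

Reproducing property:

$$\langle f, \Phi(u)\rangle_H = f(u)$$

The induced norm:

$$\|f\|_H^2 = \langle f, f\rangle_H$$

Kernel trick: the kernel is an inner product in the feature space, and it is the similarity measure you need.

Mercer's theorem:

$$\Phi(u) = [\phi_1(u), \phi_2(u), \dots, \phi_M(u)]^T$$

so that $\kappa(u_1, u_2) = \Phi(u_1)^T\Phi(u_2)$, where $M$ may be infinite.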
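To make the kernel trick and the Mercer feature map concrete, the sketch below writes out the feature map of the polynomial kernel $(u_1^T u_2 + 1)^p$ for $p = 2$ and 2-D inputs, where $M = 6$. This expansion is a standard algebraic identity rather than something taken from the slides, and the helper names `poly_kernel` and `phi_poly2` are hypothetical.

```python
import numpy as np


def poly_kernel(u, v, p=2):
    """Polynomial kernel kappa(u, v) = (u^T v + 1)^p."""
    return (u @ v + 1.0) ** p


def phi_poly2(u):
    """Explicit feature map for p = 2 and 2-D inputs (M = 6 features)."""
    u1, u2 = u
    s = np.sqrt(2.0)
    return np.array([u1**2, u2**2, s * u1 * u2, s * u1, s * u2, 1.0])


rng = np.random.default_rng(1)
u, v = rng.standard_normal(2), rng.standard_normal(2)
lhs = poly_kernel(u, v)             # kernel evaluated in the input space
rhs = phi_poly2(u) @ phi_poly2(v)   # inner product in the feature space
print(lhs, rhs)                     # the two values agree
```

For the Gaussian kernel no such finite expansion exists ($M$ is infinite), which is why the kernel value is used directly.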

Common kernels

Gaussian kernel

$$\kappa(u_i, u_j) = \exp\left(-a\|u_i - u_j\|^2\right)$$

Polynomial kernel

$$\kappa(u_i, u_j) = \left(u_i^T u_j + 1\right)^p$$
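A hedged sketch of the two kernels above: it builds Gram matrices $G(i,j) = \kappa(u_i,u_j)$ and checks numerically that they are symmetric and positive semi-definite, as a Mercer kernel requires. The function names and the values of $a$ and $p$ are assumptions for illustration.

```python
import numpy as np


def gaussian_gram(U, a=1.0):
    """G(i, j) = exp(-a * ||u_i - u_j||^2) for the rows u_i of U."""
    sq = np.sum(U**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * U @ U.T
    return np.exp(-a * np.maximum(d2, 0.0))   # clip tiny negative round-off


def poly_gram(U, p=2):
    """G(i, j) = (u_i^T u_j + 1)^p."""
    return (U @ U.T + 1.0) ** p


rng = np.random.default_rng(2)
U = rng.standard_normal((50, 3))
G1, G2 = gaussian_gram(U, a=0.5), poly_gram(U, p=3)
for G in (G1, G2):
    eig = np.linalg.eigvalsh(G)               # G is symmetric, so eigvalsh applies
    print(np.allclose(G, G.T), eig.min() >= -1e-8 * eig.max())
```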

Kernel LMS

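Transform the input $u_i$ to $\Phi(u_i)$ and assume $\Phi(u_i) \in \mathbb{R}^M$. The training sequence becomes

$$((\Phi(u_1), y_1), \dots, (\Phi(u_N), y_N))$$

The model is assumed to be

$$y_n = (\Omega^o)^T\Phi(u_n) + v(n)$$

The cost function:

$$J(\Omega) = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \Omega^T\Phi(u_i)\right)^2$$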

Kernel LMS

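The KLMS algorithm

$$e_n^a = y_n - \Omega_{n-1}^T\Phi(u_n)$$

$$\Omega_n = \Omega_{n-1} + \eta\, e_n^a\Phi(u_n)$$

The weight after n iterations (starting from $\Omega_0 = 0$):

$$\Omega_n = \eta\sum_{i=1}^{n} e_i^a\Phi(u_i)$$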

Kernel LMS

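$$\Omega_{n-1}^T\Phi(u_n) = \eta\sum_{i=1}^{n-1} e_i^a\langle\Phi(u_i), \Phi(u_n)\rangle_H = \eta\sum_{i=1}^{n-1} e_i^a\kappa(u_i, u_n)$$

$$e_n^a = y_n - \eta\sum_{i=1}^{n-1} e_i^a\kappa(u_i, u_n)$$

$$\Omega_n = \eta\sum_{i=1}^{n} e_i^a\Phi(u_i)$$

The weight itself never has to be formed: only kernel evaluations are needed.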
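A minimal sketch of KLMS as derived above, assuming a Gaussian kernel and $\Omega_0 = 0$. The weight is never formed explicitly; the model stores the centers $u_i$ and the coefficients $\eta e_i^a$, so step $n$ costs $O(n)$ kernel evaluations. The class name `KLMS`, the kernel width `a` and the toy target $\sin(3u)$ are illustrative assumptions, not the authors' code.

```python
import numpy as np


def gauss(u, v, a=1.0):
    """Gaussian kernel exp(-a * ||u - v||^2), broadcast over leading axes."""
    return np.exp(-a * np.sum((u - v) ** 2, axis=-1))


class KLMS:
    """Kernel LMS sketch with Omega_0 = 0 (hypothetical helper class)."""

    def __init__(self, eta=0.2, a=1.0):
        self.eta, self.a = eta, a
        self.centers, self.coefs = [], []

    def predict(self, u):
        # Omega_{n-1}^T Phi(u) = eta * sum_i e_i^a * kappa(u_i, u)
        if not self.centers:
            return 0.0
        k = gauss(np.array(self.centers), u, self.a)
        return float(np.dot(self.coefs, k))

    def update(self, u, y):
        e = y - self.predict(u)               # a priori error e_n^a
        self.centers.append(np.asarray(u, dtype=float))
        self.coefs.append(self.eta * e)       # new term eta * e_n^a * Phi(u_n)
        return e


rng = np.random.default_rng(3)
U = rng.uniform(-1, 1, size=(500, 1))
y = np.sin(3 * U[:, 0])
model = KLMS(eta=0.2, a=5.0)
errs = np.array([model.update(u, t) for u, t in zip(U, y)])
print(np.mean(errs[:50] ** 2), np.mean(errs[-50:] ** 2))  # early vs. late MSE
```

Because every update appends one center, total computation grows as $O(N^2)$ and memory as $O(N \cdot L)$ over a training run.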

Kernel LMS

After the learning, the input-output relation is

$$f(u) = \Omega_N^T\Phi(u) = \eta\sum_{i=1}^{N} e_i^a\kappa(u_i, u)$$

KLMS vs. RBF

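KLMS:

$$f(u) = \eta\sum_{i=1}^{N} e_i^a\kappa(u_i, u)$$

RBF:

$$f(u) = \sum_{i=1}^{N}\alpha_i\kappa(u_i, u)$$

where α satisfies $G\alpha = y$ and G is the Gram matrix, $G(i,j) = \kappa(u_i, u_j)$. RBF needs regularization. Does KLMS need regularization?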
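A sketch contrasting the two coefficient vectors on the same data, assuming a Gaussian kernel with $a = 50$ on 15 evenly spaced points (chosen so that $G$ is well conditioned and the unregularized solve is accurate). The RBF coefficients solve $G\alpha = y$ and reproduce the noisy targets exactly; the KLMS coefficients are $\alpha_i = \eta e_i^a$ from a single pass. All data and parameters are hypothetical.

```python
import numpy as np


def gram(U, a):
    """Gaussian Gram matrix G(i, j) = exp(-a * ||u_i - u_j||^2)."""
    d2 = (U[:, None, :] - U[None, :, :]) ** 2
    return np.exp(-a * d2.sum(axis=-1))


rng = np.random.default_rng(4)
U = np.linspace(-1, 1, 15)[:, None]
y = np.sin(3 * U[:, 0]) + 0.1 * rng.standard_normal(15)
a, eta = 50.0, 0.2
G = gram(U, a)

# RBF without regularization: solve G alpha = y, so f interpolates the data
alpha_rbf = np.linalg.solve(G, y)

# KLMS, one pass: alpha_i = eta * e_i^a, e_i^a = y_i - sum_{j<i} alpha_j G(j, i)
alpha_klms = np.zeros(15)
for i in range(15):
    alpha_klms[i] = eta * (y[i] - alpha_klms[:i] @ G[:i, i])

# Both predictors share the form f(u) = sum_i alpha_i kappa(u_i, u)
print(np.abs(G @ alpha_rbf - y).max())   # near 0: noisy targets reproduced
print(np.linalg.norm(alpha_rbf), np.linalg.norm(alpha_klms))
```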

KLMS vs. LMS

Kernel LMS is nothing but LMS in the feature space, a very high-dimensional reproducing kernel Hilbert space ($M > N$).

The eigenvalue spread is awful: does it converge?

Example: Mackey-Glass (MG) time-series prediction

Time embedding: 10.

Learning rate: 0.2. 500 training points and 100 test points. Additive Gaussian noise with variance 0.04.

[Figure: learning curves (MSE vs. iteration, 0 to 500) for linear LMS and KLMS]

MSE        Linear LMS   KLMS     RBF (λ=0)   RBF (λ=0.1)   RBF (λ=1)   RBF (λ=10)
Training   0.021        0.0060   0           0.0026        0.0036      0.010
Test       0.026        0.0066   0.019       0.0041        0.0050      0.014
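A sketch of the data preparation for this experiment: a time embedding of length 10, 500 training and 100 test pairs, and additive Gaussian noise with variance 0.04. The slides do not give the Mackey-Glass parameters, so a stand-in series is used here; adding the noise to the series before embedding is also an assumption, and `time_embed` is a hypothetical helper.

```python
import numpy as np


def time_embed(x, L=10):
    """Pairs (u_n, y_n) with u_n = [x[n-L], ..., x[n-1]] and y_n = x[n]."""
    U = np.lib.stride_tricks.sliding_window_view(x[:-1], L)
    y = x[L:]
    return U, y


rng = np.random.default_rng(5)
t = np.arange(700)
x = np.sin(0.05 * t) * np.cos(0.011 * t)   # stand-in series, NOT Mackey-Glass
x_noisy = x + rng.normal(0.0, np.sqrt(0.04), size=x.shape)   # variance 0.04
U, y = time_embed(x_noisy, L=10)
U_train, y_train = U[:500], y[:500]
U_test, y_test = U[500:600], y[500:600]
print(U.shape, U_train.shape, U_test.shape)
```

Any online learner, such as an LMS or KLMS routine, can then be run over `U_train`, `y_train` and evaluated on `U_test`, `y_test`.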

Complexity Comparison

              RBF              KLMS       LMS
Computation   O(N^3)           O(N^2)     O(L)
Memory        O(N^2 + N*L)     O(N*L)     O(L)

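The asymptotic analysis on convergence: small step-size theory

Denote $x_i = \Phi(u_i) \in \mathbb{R}^M$. The correlation matrix

$$R_x = \frac{1}{N}\sum_{i=1}^{N} x_i x_i^T$$

is singular, since its rank is at most $N < M$. Assume

$$\lambda_1 \ge \dots \ge \lambda_k > \lambda_{k+1} = \dots = \lambda_M = 0, \qquad R_x = P\Lambda P^T$$

where $P = [P_1, \dots, P_M]$ contains the orthonormal eigenvectors and $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_M)$.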
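A quick numerical illustration, with random vectors standing in for $x_i = \Phi(u_i)$ (an assumption; no kernel is involved): when $N < M$, $R_x$ has at most $N$ nonzero eigenvalues, so $k \le N$ and the other $M - k$ eigenvalues are zero.

```python
import numpy as np

rng = np.random.default_rng(6)
N, M = 20, 50                      # fewer samples than feature dimensions
X = rng.standard_normal((N, M))    # rows stand in for x_i = Phi(u_i)
R = X.T @ X / N                    # R_x = (1/N) sum_i x_i x_i^T, shape (M, M)
lam, P = np.linalg.eigh(R)         # ascending eigenvalues, orthonormal P
lam, P = lam[::-1], P[:, ::-1]     # reorder so lambda_1 >= ... >= lambda_M
k = int(np.sum(lam > 1e-10 * lam[0]))
print(k)                           # number of nonzero eigenvalues (at most N)
```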

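The asymptotic analysis on convergence: small step-size theory

Denote $\varepsilon_i(n) = P_i^T(\Omega^o - \Omega(n))$, so that

$$\Omega^o - \Omega(n) = \sum_{i=1}^{M}\varepsilon_i(n)P_i$$

For the nonzero eigenvalues ($1 \le i \le k$) we have

$$E[\varepsilon_i(n)] = (1 - \eta\lambda_i)^n\varepsilon_i(0)$$

$$E\left[|\varepsilon_i(n)|^2\right] = \frac{\eta J_{\min}}{2 - \eta\lambda_i} + (1 - \eta\lambda_i)^{2n}\left(|\varepsilon_i(0)|^2 - \frac{\eta J_{\min}}{2 - \eta\lambda_i}\right)$$

where $J_{\min}$ is the minimum MSE.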

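The weight stays at its initial value in the zero-eigenvalue directions, because every update $\eta e_n^a x_n$ lies in the span of the data, which is orthogonal to these directions. For $k < i \le M$ we have

$$E[\varepsilon_i(n)] = \varepsilon_i(0)$$

$$E\left[|\varepsilon_i(n)|^2\right] = |\varepsilon_i(0)|^2$$

The zero-eigenvalue directions do not affect the MSE.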
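A sketch of the claim that LMS leaves the zero-eigenvalue directions untouched, using random data with $N < M$ and a nonzero initial weight (all values hypothetical). The null-space basis is taken from the SVD of the data matrix, assuming it has full row rank $N$.

```python
import numpy as np

rng = np.random.default_rng(7)
N, M, eta = 20, 50, 0.01
X = rng.standard_normal((N, M))     # rows x_n; rank N with probability 1
y = rng.standard_normal(N)
w0 = rng.standard_normal(M)         # arbitrary nonzero initial weight

# Orthonormal basis of the null space of R_x (directions with lambda_i = 0)
_, s, Vt = np.linalg.svd(X)         # full_matrices=True, so Vt is (M, M)
null_basis = Vt[N:].T               # (M, M - N)

w = w0.copy()
for epoch in range(50):
    for x_n, y_n in zip(X, y):
        e = y_n - w @ x_n
        w = w + eta * e * x_n       # each update lies in span{x_i}
print(np.abs(null_basis.T @ (w - w0)).max())   # tiny: null-space part unchanged
```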

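Denote

$$J(n) = E\left[|y_i - \Omega(n)^T x_i|^2\right]$$

Then

$$J(n) = J_{\min} + \sum_{i=1}^{M}\frac{\eta J_{\min}\lambda_i}{2 - \eta\lambda_i} + \sum_{i=1}^{M}\lambda_i\left(|\varepsilon_i(0)|^2 - \frac{\eta J_{\min}}{2 - \eta\lambda_i}\right)(1 - \eta\lambda_i)^{2n}$$

Every term is weighted by $\lambda_i$, so the zero-eigenvalue modes drop out. It does not care about the null space! It only focuses on the data space!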
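A sketch that evaluates the small-step-size expression for $J(n)$; the function name `learning_curve` and the numbers are illustrative. Two initial conditions that differ only in the zero-eigenvalue modes give identical curves, since each mode's contribution is multiplied by $\lambda_i$.

```python
import numpy as np


def learning_curve(lam, eps0_sq, eta, J_min, n):
    """Small-step-size prediction of J(n).

    lam: eigenvalues lambda_i of R_x; eps0_sq: |epsilon_i(0)|^2 per mode.
    """
    lam = np.asarray(lam, dtype=float)
    eps0_sq = np.asarray(eps0_sq, dtype=float)
    floor = eta * J_min / (2.0 - eta * lam)        # eta*J_min / (2 - eta*lambda_i)
    transient = (eps0_sq - floor) * (1.0 - eta * lam) ** (2 * n)
    return J_min + np.sum(lam * floor) + np.sum(lam * transient)


lam = np.array([2.0, 1.0, 0.5, 0.0, 0.0])
eps_a = np.array([1.0, 1.0, 1.0, 0.0, 0.0])
eps_b = np.array([1.0, 1.0, 1.0, 7.0, 3.0])        # differs only in null modes
for n in (0, 10, 100):
    print(n, learning_curve(lam, eps_a, 0.1, 0.01, n),
          learning_curve(lam, eps_b, 0.1, 0.01, n))
```

At $n = 0$ the two floor terms cancel and $J(0) = J_{\min} + \sum_i \lambda_i|\varepsilon_i(0)|^2$.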

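The minimum norm initialization

The zero initialization gives the minimum-norm solution. Expand the weight in the eigenvectors of $R_x$, with $\hat w_i = P_i^T\Omega_n$:

$$\Omega_n = \sum_{i=1}^{M}\hat w_i P_i$$

$$\|\Omega_n\|^2 = \sum_{i=1}^{k}|\hat w_i|^2 + \sum_{i=k+1}^{M}|\hat w_i|^2 \ge \sum_{i=1}^{k}|\hat w_i|^2$$

Starting from $\Omega_0 = 0$, the components in the zero-eigenvalue directions ($i > k$) stay at zero, so equality holds: no other weight with the same data-space components has a smaller norm.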
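A sketch of the minimum-norm property, assuming a noise-free underdetermined linear problem ($N < M$) with infinitely many exact solutions. Running LMS from $w_0 = 0$ over repeated passes converges to the pseudoinverse (minimum-norm) solution `np.linalg.pinv(X) @ y`; the data and step size are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(8)
N, M, eta = 10, 30, 0.02
X = rng.standard_normal((N, M))   # underdetermined: infinitely many exact fits
y = rng.standard_normal(N)

w = np.zeros(M)                   # w_0 = 0: the minimum-norm initialization
for epoch in range(2000):
    for x_n, y_n in zip(X, y):
        w = w + eta * (y_n - w @ x_n) * x_n

w_min_norm = np.linalg.pinv(X) @ y   # minimum-norm solution of X w = y
print(np.linalg.norm(w - w_min_norm))
```

Starting from any nonzero $w_0$ instead would leave its null-space component in place and give a solution with a larger norm.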

[Figure: minimum norm solution]

Learning is Ill-posed

Over-learning

Regularization Technique

Learning from finite data is ill-posed. A priori information, namely smoothness, is needed: the norm of the function, which indicates the "slope" of the linear operator, is constrained. In statistical learning theory, the norm is associated with the confidence of uniform convergence!

Regularized RBF

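The cost function:

$$J(\Omega) = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \Omega^T\Phi(u_i)\right)^2 + \lambda\|\Omega\|^2$$

or equivalently

$$J(\Omega) = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \Omega^T\Phi(u_i)\right)^2 \quad \text{subject to} \quad \|\Omega\|^2 \le C$$

for a bound $C$ that corresponds to λ.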

KLMS as a learning algorithm

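The model is $y_n = (\Omega^o)^T x_n + v(n)$ with $x_n = \Phi(u_n)$. The following inequalities hold:

$$\|e^a\|^2 \le \eta^{-1}\|\Omega^o\|^2 + 2\|v\|^2$$

$$\|e^a\|^2 \le 2\|y\|^2$$

where $e^a$, $v$ and $y$ stack the a priori errors, the noise samples and the targets.

The proof: H∞ robustness + triangle inequality + matrix transformation + derivative + …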

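The numerical analysis

The solution of regularized RBF is

$$f(u) = \sum_{i=1}^{N}\alpha_i\kappa(u_i, u), \qquad \alpha = (G + \lambda I)^{-1}y$$

The source of the ill-posedness is the inversion of the matrix $(G + \lambda I)$:

$$\|(G + \lambda I)^{-1}\|_2 = \frac{1}{g_{\min} + \lambda}$$

where $g_{\min}$ is the smallest eigenvalue of G. Since G is typically near-singular, this norm blows up as $\lambda \to 0$.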
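A sketch of the ill-posedness argument, assuming a Gaussian kernel ($a = 1$) on 100 densely sampled points, for which $G$ is nearly singular. It checks $\|(G+\lambda I)^{-1}\|_2 = 1/(g_{\min}+\lambda)$, with $g_{\min}$ the smallest eigenvalue of $G$, and shows $\|\alpha\|$ growing as λ shrinks. λ = 0 is left out because the unregularized solve is numerically meaningless for this $G$; the data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(9)
U = np.sort(rng.uniform(-1, 1, 100))[:, None]   # densely sampled 1-D inputs
y = np.sin(3 * U[:, 0]) + 0.2 * rng.standard_normal(100)
G = np.exp(-1.0 * (U - U.T) ** 2)               # Gaussian Gram matrix, a = 1
g_min = np.linalg.eigvalsh(G)[0]                # smallest eigenvalue (near 0)

lams = [10.0, 1.0, 0.1, 0.01, 1e-4]
inv_norms, alpha_norms = [], []
for lam in lams:
    A = G + lam * np.eye(len(y))
    inv_norms.append(np.linalg.norm(np.linalg.inv(A), 2))   # spectral norm
    alpha_norms.append(np.linalg.norm(np.linalg.solve(A, y)))
for lam, r, s in zip(lams, inv_norms, alpha_norms):
    print(f"lambda={lam:g}  ||(G+lam I)^-1||={r:.3g}  ||alpha||={s:.3g}")
```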

The numerical analysis

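The solution of KLMS is

$$f(u) = \eta\sum_{i=1}^{N} e_i^a\kappa(u_i, u)$$

By the inequality $\|e^a\|^2 \le 2\|y\|^2$, the coefficients $\alpha_i = \eta e_i^a$ satisfy

$$\|\alpha\|^2 = \eta^2\|e^a\|^2 \le 2\eta^2\|y\|^2$$

No matrix inversion is involved, so the coefficients stay bounded.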

              KLMS    RBF (λ=0)   RBF (λ=0.1)   RBF (λ=1)   RBF (λ=10)
Weight norm   0.520   4.8e+3      10.90         1.37        0.231
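A sketch comparing solution norms in the spirit of the table above, assuming that "weight norm" means the RKHS norm $\|f\|_H = \sqrt{\alpha^T G \alpha}$ of $f = \sum_i \alpha_i\kappa(u_i, \cdot)$. The data, kernel width and step size are hypothetical, so the numbers will not match the table; the point is the trend that the RBF norm grows as λ shrinks.

```python
import numpy as np

rng = np.random.default_rng(10)
U = rng.uniform(-1, 1, (200, 1))
y = np.sin(3 * U[:, 0]) + 0.2 * rng.standard_normal(200)
G = np.exp(-5.0 * (U - U.T) ** 2)       # Gaussian Gram matrix, a = 5


def rkhs_norm(alpha):
    """||f||_H for f = sum_i alpha_i kappa(u_i, .), i.e. sqrt(alpha^T G alpha)."""
    return float(np.sqrt(max(alpha @ G @ alpha, 0.0)))


# KLMS coefficients alpha_i = eta * e_i^a from a single pass (eta = 0.2)
eta, alpha_klms = 0.2, np.zeros(200)
for i in range(200):
    alpha_klms[i] = eta * (y[i] - alpha_klms[:i] @ G[:i, i])

# Regularized RBF coefficients alpha = (G + lambda I)^{-1} y
rbf = {lam: rkhs_norm(np.linalg.solve(G + lam * np.eye(200), y))
       for lam in (10.0, 1.0, 0.1)}
print(rkhs_norm(alpha_klms), rbf)
```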

The conclusion

The LMS algorithm can be readily used in an RKHS to derive nonlinear algorithms.

From the machine learning view, the LMS method is a simple tool for obtaining a regularized solution.

LMS learning model

An event happens, and a decision is made. If the decision is correct, nothing happens. If an error is incurred, a correction is made to the original model. If we do things right, everything is fine and life goes on. If we do something wrong, lessons are drawn and our abilities are honed.

Would we over-learn?

If we attempt to model the real world mathematically, what dimension is appropriate?

Are we likely to over-learn? Are we using the LMS algorithm? What is good about remembering the past? What is bad about being a perfectionist?

"If you shut your door to all errors, truth will be shut out." (Rabindranath Tagore)
