Geometric Methods for Learning and Memory

A thesis presented by Dimitri Nowicki
to Université Paul Sabatier
in partial fulfillment for the degree of Doctor ès Science
in the subject of Applied Mathematics
Outline
• Introduction
• Geodesics, Newton Method and Geometric Optimization
• Generalized averaging over RM and Associative memories
• Kernel Machines and AM
• Quotient spaces for Signal Processing
• Application: Electronic Nose
Models and Algorithms Requiring a Geometric Approach
• Kalman–like filters
• Blind Signal Separation
• Feed-Forward Neural Networks
• Independent Component Analysis
Introduction
Spaces emerging in learning problems:
• Riemannian spaces
• Lie groups and homogeneous spaces
• Metric spaces without any Riemannian structure
Outline
• Some facts from Riemannian geometry
• Optimization algorithms
  – Smooth
  – Nonsmooth
• Implementation
  – The case of submanifolds
  – Computing exponential maps
  – Computing the Hessian, etc.
Some concepts from Riemannian Geometry
• Geodesics: curves $\gamma(t)$ satisfying

  $\frac{D}{dt}\frac{d\gamma}{dt} = 0$,

  where $D/dt$ denotes the covariant derivative along the curve.
Exponential map
$\exp_x : T_xM \to M$

$\exp_x(u) = y$, where $y = \gamma(1)$ for the geodesic $\gamma$ with $\gamma(0) = x$ and $\dot\gamma(0) = u \in T_xM$.
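As a concrete illustration (not from the thesis), the exponential map has a closed form on the unit sphere; a minimal Python sketch:

```python
import numpy as np

def exp_sphere(x, u):
    """Exponential map on the unit sphere: follow the great circle
    through x with initial velocity u for unit time."""
    nu = np.linalg.norm(u)
    if nu < 1e-12:                      # zero velocity: stay at x
        return x
    return np.cos(nu) * x + np.sin(nu) * (u / nu)

# Usage: a tangent vector is obtained by projecting out the normal component.
x = np.array([1.0, 0.0, 0.0])
v = np.array([0.0, 0.5, 0.5])
u = v - np.dot(v, x) * x                # project v onto T_x S^2
y = exp_sphere(x, u)                    # gamma(1) with gamma(0)=x, gamma'(0)=u
```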
Parallel transport
• Computing parallel transport using an exponential map
$\tau_{x \to y}\, v = \frac{d}{ds}\, \exp_x\big(t\,(u + s v)\big)\Big|_{t=1,\ s=0}$,

where $u$ is such that $\exp_x(u) = y$.
Newton Method for Geometric optimization
$N(x) = \exp_x\!\big(-(D^2 f)^{-1}\, D f\big)$

The modified Newton operator:

$\tilde N(x_k) = \exp_{x_k}\!\big(-B_k^{-1}\, D f(x_k)\big)$,

where $B_k = D^2 f(x_k) + \epsilon_k I$ is chosen such that $B_k \succ 0$.
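A minimal sketch of this step, assuming the Riemannian gradient, Hessian, and exponential map are supplied as callables (a hypothetical interface); the shift $\epsilon_k$ is chosen just large enough to make $B_k$ positive definite:

```python
import numpy as np

def modified_newton_step(x, grad, hess, exp_map, eps0=1e-8):
    """One step x_{k+1} = exp_x(-B^{-1} Df(x)) with B = D^2 f(x) + eps*I > 0."""
    g, H = grad(x), hess(x)                   # gradient/Hessian in T_x M coordinates
    lmin = np.linalg.eigvalsh(H).min()
    eps = eps0 if lmin > 0 else eps0 - lmin   # shift the spectrum above zero
    B = H + eps * np.eye(len(g))
    u = -np.linalg.solve(B, g)                # (modified) Newton direction in T_x M
    return exp_map(x, u)
```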
Wolfe condition for Riemannian manifolds
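The slide's formulas were lost in extraction; for reference, the standard Wolfe conditions along a geodesic $t \mapsto \exp_x(t\,d)$, with constants $0 < c_1 < c_2 < 1$ and parallel transport $\tau$ along the geodesic, read:

$f(\exp_x(t\,d)) \le f(x) + c_1\, t\, \langle \operatorname{grad} f(x),\, d \rangle$

$\langle \operatorname{grad} f(\exp_x(t\,d)),\, \tau d \rangle \ge c_2\, \langle \operatorname{grad} f(x),\, d \rangle$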
Global convergence of the modified Newton method
Nonsmooth methods
• The subgradient: $g \in \partial f(x_0)$ if and only if

  $f\big(\exp_{x_0}(u)\big) \ge f(x_0) + \langle g(x_0),\, u \rangle$ for all $u \in T_{x_0}M$.
The r-algorithm
$x_{k+1} = \exp_{x_k}\!\big(-h_k\, B_k\, \tilde g_k\big), \qquad \tilde g_k = \frac{B_k^{*} g_k}{\|B_k^{*} g_k\|}, \qquad g_k = g_f(x_k) \in \partial f(x_k)$

$B_{k+1} = B_k\, R_\beta(r_k), \qquad r_k = \frac{B_k^{*}\big(g_f(x_k) - \tau g_f(x_{k-1})\big)}{\big\|B_k^{*}\big(g_f(x_k) - \tau g_f(x_{k-1})\big)\big\|} \in T_{x_k}M, \qquad B_1 = I$

$R_\beta(r) = I + (\beta - 1)\, r\, r^{\top}$

Here $\tau$ denotes parallel transport from $T_{x_{k-1}}M$ to $T_{x_k}M$.
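In the Euclidean special case ($\exp_x(u) = x + u$, trivial parallel transport) this reduces to Shor's classical r-algorithm; a sketch with fixed step $h$ and dilation parameter $\beta$:

```python
import numpy as np

def r_algorithm(subgrad, x0, beta=0.5, h=1.0, iters=100):
    """Shor's r-algorithm in R^n; subgrad(x) returns any subgradient of f."""
    x, B = np.asarray(x0, float), np.eye(len(x0))
    g_prev = None
    for _ in range(iters):
        g = subgrad(x)
        if g_prev is not None:
            r = B.T @ (g - g_prev)          # difference of successive subgradients
            if np.linalg.norm(r) > 1e-12:
                r /= np.linalg.norm(r)
                # space dilation: B <- B R_beta(r), R_beta(r) = I + (beta-1) r r^T
                B = B + (beta - 1.0) * np.outer(B @ r, r)
        gt = B.T @ g
        if np.linalg.norm(gt) < 1e-10:      # (near-)zero transformed subgradient
            break
        x = x - h * (B @ gt) / np.linalg.norm(gt)
        g_prev = g
    return x
```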
Problem of constrained optimization
• Equality constraints:

  $\min f(x)$ subject to $F(x) = 0$,

  where $f : D \to \mathbb{R}$, $F : D \to \mathbb{R}^m$, $D \subseteq \mathbb{R}^n$.
Classical (extrinsic) methods
• The Lagrangian:

  $L(x, \lambda) = f(x) + \sum_{k=1}^{m} \lambda_k F_k(x)$

• Newton–Lagrange method

• Sequential quadratic programming
Classical methods
• Penalty functions and the augmented Lagrangian:

  $L_c(x, \lambda) = f(x) + \sum_{k=1}^{m} \lambda_k F_k(x) + \frac{c}{2}\, \|F(x)\|^2$
Advantages of Geometric methods
• The dimension of the manifold is n − m, versus n + m for Lagrangian-based methods
• The function may be convex on the manifold even when the Lagrangian is non-convex
• The geometric Hessian may be positive-definite even when the classical one is not
Implementation: The case of Submanifolds
$M = \{\, x : F(x) = 0 \,\}$

$F : D \to \mathbb{R}^m$, $D \subseteq \mathbb{R}^n$, $F \in C^2(D)$; $DF(x)$ is surjective.
Hamilton Equations for the Geodesics
• The Lagrangian:

  $L(x, \dot x) = \tfrac{1}{2}\, \|\dot x\|^2 + \sum_{i=1}^{m} \lambda_i F_i(x)$

• The Hamiltonian:

  $H(x, p) = \tfrac{1}{2}\, \|p\|^2 - \tfrac{1}{2} \sum_{i=1}^{m} \big(DF_i(x)\, p\big)^2$
Hamilton Equations for the Geodesics (continued)

$\dot x = \big(I - DF^{*}(x)\, DF(x)\big)\, p = \pi_{T_xM}\, p$

$\dot p = \sum_{i=1}^{m} \lambda_i\, D^{2}F_i(x)\, p, \qquad \lambda_i = DF_i(x)\, p$
Lagrange equations are also a constrained Hamiltonian system

• We can rewrite the Lagrange equations in the form:

  $\dot x = p, \qquad \dot p = -DF^{*}(x)\, \big(D^{2}F(x)(p, p)\big)$
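As a sanity check (an illustration, not the thesis's code): for the unit sphere $F(x) = (\|x\|^2 - 1)/2$ we have $DF(x) = x^{\top}$ and $D^2F = I$, so the system reduces to $\dot x = p$, $\dot p = -\|p\|^2 x$, whose solutions are great circles. A minimal RK4 sketch:

```python
import numpy as np

def geodesic_sphere(x0, p0, T=np.pi, n=200):
    """Integrate xdot = p, pdot = -||p||^2 x (geodesics on the unit sphere)."""
    def rhs(y):
        x, p = y[:len(x0)], y[len(x0):]
        return np.concatenate([p, -np.dot(p, p) * x])
    y, h, traj = np.concatenate([x0, p0]), T / n, [np.asarray(x0, float)]
    for _ in range(n):                       # classical RK4 step
        k1 = rhs(y); k2 = rhs(y + h/2*k1)
        k3 = rhs(y + h/2*k2); k4 = rhs(y + h*k3)
        y = y + h/6 * (k1 + 2*k2 + 2*k3 + k4)
        traj.append(y[:len(x0)].copy())
    return np.array(traj)

# Great circle: start at e1 with unit tangent e2; x(T) ~ (cos T, sin T, 0).
path = geodesic_sphere(np.array([1.0, 0, 0]), np.array([0, 1.0, 0]))
```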
Symplectic Numerical Integration
• A transformation is called symplectic if it preserves the following differential 2-form:

  $\omega^2 = \sum_{i=1}^{n} dp_i \wedge dx_i$
Implicit Runge-Kutta Integrators
$y(0) = y_0$ is given;

$Y_i = y_k + h \sum_{j=1}^{s} a_{ij}\, G(Y_j), \quad i = 1, \dots, s$

$y_{k+1} = y_k + h \sum_{i=1}^{s} b_i\, G(Y_i)$

The IRK method is called symplectic if the associated transformation $y_k \mapsto y_{k+1}$ preserves $\omega^2$; here $y = (x, p)$ and $G$ is the vector field of $\dot y = G(y)$.
The Gauss method of order 4
Butcher tableau of the two-stage Gauss method:

$a_{11} = 1/4$, $\quad a_{12} = 1/4 - \sqrt{3}/6$

$a_{21} = 1/4 + \sqrt{3}/6$, $\quad a_{22} = 1/4$

$b_1 = b_2 = 1/2$
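A sketch of one step of this two-stage Gauss method, solving the implicit stage equations by fixed-point iteration; $G$ is the vector field of $\dot y = G(y)$:

```python
import numpy as np

S3 = np.sqrt(3.0)
A = np.array([[1/4,        1/4 - S3/6],
              [1/4 + S3/6, 1/4       ]])    # a_ij of the Gauss method
B = np.array([1/2, 1/2])                    # b_i

def gauss4_step(G, y, h, iters=20):
    """One step of the symplectic two-stage Gauss IRK method (order 4)."""
    Y = np.array([y, y], dtype=float)       # stage values Y_i
    for _ in range(iters):                  # fixed-point iteration on the stages
        GY = np.array([G(Y[0]), G(Y[1])])
        Y = y + h * (A @ GY)
    GY = np.array([G(Y[0]), G(Y[1])])
    return y + h * (B @ GY)

# Example: harmonic oscillator y = (x, p), H = (x^2 + p^2)/2;
# the energy stays nearly constant over long integrations.
G = lambda y: np.array([y[1], -y[0]])
y = np.array([1.0, 0.0])
for _ in range(100):
    y = gauss4_step(G, y, 0.1)
```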
Backward error analysis
Covariant Derivative on the Submanifold
$\hat\nabla f(x) = \pi_{T_xM}\, \nabla f(x) = \big(I - DF^{*}(x)\, DF(x)\big)\, \nabla f(x)$
Computing the constrained Hessian
• Direct computation:

  $\hat D^{2} = \pi_{T_xM}\, D\, \big(\pi_{T_xM}\, D\big)$

• “Mixed” computation:

  $\hat D^{2}f = \pi_{T_xM}\, \big(D^{2}f - \lambda^{\top} D^{2}F\big), \qquad \lambda^{\top} = Df\; DF^{*}$,

  where $\pi_{T_xM} = I - DF^{*}(x)\, DF(x)$.
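A sketch of these computations for a numerically given constraint Jacobian; keeping the Gram factor $(DF\,DF^{*})^{-1}$ covers the case where the rows of $DF$ are not orthonormal, and the multiplier estimate used here is an assumption of the illustration:

```python
import numpy as np

def tangent_projector(DFx):
    """pi_{T_x M} = I - DF^T (DF DF^T)^{-1} DF for an m x n Jacobian DF(x)."""
    n = DFx.shape[1]
    G = DFx @ DFx.T                      # Gram matrix of constraint gradients
    return np.eye(n) - DFx.T @ np.linalg.solve(G, DFx)

def projected_hessian(DFx, grad_f, hess_f, hess_F):
    """Projected Hessian pi (D^2 f - sum_i lam_i D^2 F_i) pi, with the
    least-squares multipliers lam = (DF DF^T)^{-1} DF grad_f."""
    P = tangent_projector(DFx)
    lam = np.linalg.solve(DFx @ DFx.T, DFx @ grad_f)
    H = hess_f - sum(l * Hi for l, Hi in zip(lam, hess_F))
    return P @ H @ P                     # restriction to the tangent space
```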
Example of geometric iterations
Neural Associative memory
• Hopfield-type auto-associative memory. Memorized vectors are bipolar: $v_k \in \{-1, 1\}^n$, $k = 1 \dots m$. Suppose these vectors are the columns of the $n \times m$ matrix $V$. Then the synaptic matrix $C$ of the memory is given by:

  $C = V V^{\top}$

• Associative recall is performed by the following procedure: the input vector $x^0$ is the starting point of the iterations

  $x^{t+1} = f(C x^t)$,

  where $f$ is a monotonic odd function such that $\lim_{s \to \infty} f(s) = 1$.
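A minimal sketch of this memory, taking $f = \operatorname{sign}$ as the limiting case of the recall nonlinearity:

```python
import numpy as np

def train(V):
    """Synaptic matrix C = V V^T from the n x m matrix of bipolar patterns."""
    return V @ V.T

def recall(C, x0, iters=50):
    """Iterate x_{t+1} = f(C x_t) until a fixed point (an attractor)."""
    x = x0.copy()
    for _ in range(iters):
        x_new = np.sign(C @ x)
        x_new[x_new == 0] = 1            # keep the state bipolar
        if np.array_equal(x_new, x):
            break
        x = x_new
    return x
```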
Attraction radius
• We call a stable fixed point of this discrete-time dynamical system an attractor. The maximum Hamming distance between $x^0$ and a memorized pattern $v_k$ such that the recall procedure still converges to $v_k$ is called the attraction radius.
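One way (an illustration, not the thesis's protocol) to estimate the attraction radius empirically, reusing the recall sketch above:

```python
import numpy as np

def attraction_radius(C, v, trials=20):
    """Largest d such that recall recovers v from all sampled
    corruptions of v at Hamming distance d."""
    n = len(v)
    for d in range(1, n + 1):
        for _ in range(trials):
            x = v.copy()
            idx = np.random.choice(n, size=d, replace=False)
            x[idx] = -x[idx]             # flip d random components
            if not np.array_equal(recall(C, x), v):
                return d - 1             # recall first failed at distance d
    return n
```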
Problem statement
Generalized averaging on the manifold
$\bar x = \operatorname{argmin}_{x \in M} \sum_{k=1}^{N} d^{2}(x, x_k)$
Computing generalized average on the Grassmann manifold
m
N
kk
XXX
CXX
rank;
)(min
2
1
2
Generalized averaging as an optimization problem
N
k
n
jiijkijkijij
N
k
n
jiijkij ccxxcx
1 1,
2,,
2
1 1,
2, 2)()(X
Transforming objective function:
constconst1 2
2
1
CXCX NN
NN
kk
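Since the objective equals $N\,\|X - \bar C\|^2 + \mathrm{const}$, its minimizer over symmetric rank-$m$ projectors is the projector onto the span of the $m$ leading eigenvectors of $\bar C$ (a Ky Fan / Eckart–Young argument). A sketch, assuming the $C_k$ are given as numpy arrays:

```python
import numpy as np

def generalized_average(Cs, m):
    """Nearest rank-m orthogonal projector to the mean of symmetric C_k."""
    C_bar = np.mean(Cs, axis=0)
    w, U = np.linalg.eigh(C_bar)         # eigenvalues in ascending order
    Um = U[:, -m:]                       # m leading eigenvectors
    return Um @ Um.T                     # X = X^T = X^2, rank X = m
```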
Statistical estimation
Experimental results: simulated data
• n = 256 for all experiments
• Nature of the data
Experimental results: simulated data
[Figure: frequencies of attractors of the associative clustering network for different m, p = 8; frequency (log scale, 1–1000) vs. attractors (0–25), curves for m = 8, 16, 24, 32.]
Experimental results: simulated data
[Figure: frequencies of attractors of the associative clustering network for different p, with m = p; frequency (log scale, 1–1000) vs. attractors (0–35), curves for p = 8, 16, 24, 32.]
Experimental results: simulated data
• Distinction coefficients of attractors of the associative clustering network for different p, with m = p

[Figure: distinction coefficient (log scale, 0.0001–1) vs. attractors (0–35), curves for p = 8, 16, 24, 32.]
The MNIST database: data description
• Gray-scale images, 28×28
• 10 classes: digits from “0” to “9”
• Training sample: 60,000 images
• Test sample: 10,000 images
• Before being fed to the network, images were thresholded to obtain 784-dimensional bipolar vectors
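A sketch of this preprocessing step (the threshold value is an assumption; the slide does not specify it):

```python
import numpy as np

def to_bipolar(images, thresh=127):
    """Flatten 28x28 gray-scale images and threshold to {-1, +1}^784."""
    flat = images.reshape(len(images), -1)   # (N, 784)
    return np.where(flat > thresh, 1, -1)
```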
Experimental results: the MNIST database
• Examples of handwritten digits from the MNIST database
Experimental results: the MNIST database
• Generalized images of digits found by the network
Kernel AM
• The main algorithm
Kernel AM
• The basic algorithm (continued)
Algorithm Scheme
Experimental Results
• Gaussian kernel

[Figure: attraction radius (0–2.5) vs. alpha (0–0.05) for the Gaussian kernel.]
Model of Signal
Signal Trajectories in the phase space
The Manifold
Example of Signal Processing
[Figure: three example signal waveforms vs. t (msec), t ∈ [−4, 4], with amplitudes on the order of ±0.3, ±1, and ±0.25.]
Application to a Real-Life Problem
Electronic Nose: QCM setup overview
Variance distribution between principal components
Chemical images in the space spanned by the first 3 PCs