Geometric Methods for Learning and Memory

A thesis presented by Dimitri Nowicki
to Université Paul Sabatier
in partial fulfillment for the degree of Doctor ès Science
in the subject of Applied Mathematics
Outline
• Introduction
• Geodesics, Newton Method and Geometric Optimization
• Generalized averaging over RM and Associative memories
• Kernel Machines and AM
• Quotient spaces for Signal Processing
• Application: Electronic Nose
Models and Algorithms Requiring a Geometric Approach
• Kalman–like filters
• Blind Signal Separation
• Feed-Forward Neural Networks
• Independent Component Analysis
Introduction
Spaces emerging in learning problems:
• Riemannian spaces
• Lie groups and homogeneous spaces
• Metric spaces without any Riemannian structure
Outline
• Some facts from Riemannian geometry
• Optimization algorithms
  – Smooth
  – Nonsmooth
• Implementation
  – The case of submanifolds
  – Computing exponential maps
  – Computing the Hessian, etc.
Some concepts from Riemannian Geometry
• Geodesics: curves $\gamma(t)$ satisfying

  $\frac{D}{dt}\frac{d\gamma}{dt} = 0$,

  where $D/dt$ denotes the covariant derivative along the curve.
Exponential map
$\exp_x : T_xM \to M$

$\exp_x(u) = y$, where $y = \gamma(1)$ for the geodesic $\gamma$ with $\gamma(0) = x$ and $\dot\gamma(0) = u \in T_xM$.
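As a concrete illustration (not from the thesis), the exponential map has a closed form on the unit sphere; a minimal Python sketch:

```python
import numpy as np

def exp_sphere(x, u):
    """Exponential map on the unit sphere: follow the great circle
    through x with initial velocity u for unit time."""
    nu = np.linalg.norm(u)
    if nu < 1e-12:                      # zero velocity: stay at x
        return x
    return np.cos(nu) * x + np.sin(nu) * (u / nu)

# Usage: a tangent vector is obtained by projecting out the normal component.
x = np.array([1.0, 0.0, 0.0])
v = np.array([0.0, 0.5, 0.5])
u = v - np.dot(v, x) * x                # project v onto T_x S^2
y = exp_sphere(x, u)                    # gamma(1) with gamma(0)=x, gamma'(0)=u
```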
Parallel transport
• Computing parallel transport using an exponential map
$\tau_{x \to y}\, v = \frac{d}{ds}\, \exp_x\big(t\,(u + s v)\big)\Big|_{t=1,\ s=0}$,

where $u$ is such that $\exp_x(u) = y$.
Newton Method for Geometric optimization
$N(x) = \exp_x\!\big(-(D^2 f)^{-1}\, D f\big)$

The modified Newton operator:

$\tilde N(x_k) = \exp_{x_k}\!\big(-B_k^{-1}\, D f(x_k)\big)$,

where $B_k = D^2 f(x_k) + \epsilon_k I$ is chosen such that $B_k \succ 0$.
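A minimal sketch of this step, assuming the Riemannian gradient, Hessian, and exponential map are supplied as callables (a hypothetical interface); the shift $\epsilon_k$ is chosen just large enough to make $B_k$ positive definite:

```python
import numpy as np

def modified_newton_step(x, grad, hess, exp_map, eps0=1e-8):
    """One step x_{k+1} = exp_x(-B^{-1} Df(x)) with B = D^2 f(x) + eps*I > 0."""
    g, H = grad(x), hess(x)                   # gradient/Hessian in T_x M coordinates
    lmin = np.linalg.eigvalsh(H).min()
    eps = eps0 if lmin > 0 else eps0 - lmin   # shift the spectrum above zero
    B = H + eps * np.eye(len(g))
    u = -np.linalg.solve(B, g)                # (modified) Newton direction in T_x M
    return exp_map(x, u)
```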
Wolfe condition for Riemannian manifolds
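The slide's formulas were lost in extraction; for reference, the standard Wolfe conditions along a geodesic $t \mapsto \exp_x(t\,d)$, with constants $0 < c_1 < c_2 < 1$ and parallel transport $\tau$ along the geodesic, read:

$f(\exp_x(t\,d)) \le f(x) + c_1\, t\, \langle \operatorname{grad} f(x),\, d \rangle$

$\langle \operatorname{grad} f(\exp_x(t\,d)),\, \tau d \rangle \ge c_2\, \langle \operatorname{grad} f(x),\, d \rangle$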
Global convergence of the modified Newton method
Nonsmooth methods
• The subgradient: $g \in \partial f(x_0)$ if and only if

  $f\big(\exp_{x_0}(u)\big) \ge f(x_0) + \langle g(x_0),\, u \rangle$ for all $u \in T_{x_0}M$.
The r-algorithm
$x_{k+1} = \exp_{x_k}\!\big(-h_k\, B_k\, \tilde g_k\big), \qquad \tilde g_k = \frac{B_k^{*} g_k}{\|B_k^{*} g_k\|}, \qquad g_k = g_f(x_k) \in \partial f(x_k)$

$B_{k+1} = B_k\, R_\beta(r_k), \qquad r_k = \frac{B_k^{*}\big(g_f(x_k) - \tau g_f(x_{k-1})\big)}{\big\|B_k^{*}\big(g_f(x_k) - \tau g_f(x_{k-1})\big)\big\|} \in T_{x_k}M, \qquad B_1 = I$

$R_\beta(r) = I + (\beta - 1)\, r\, r^{\top}$

Here $\tau$ denotes parallel transport from $T_{x_{k-1}}M$ to $T_{x_k}M$.
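In the Euclidean special case ($\exp_x(u) = x + u$, trivial parallel transport) this reduces to Shor's classical r-algorithm; a sketch with fixed step $h$ and dilation parameter $\beta$:

```python
import numpy as np

def r_algorithm(subgrad, x0, beta=0.5, h=1.0, iters=100):
    """Shor's r-algorithm in R^n; subgrad(x) returns any subgradient of f."""
    x, B = np.asarray(x0, float), np.eye(len(x0))
    g_prev = None
    for _ in range(iters):
        g = subgrad(x)
        if g_prev is not None:
            r = B.T @ (g - g_prev)          # difference of successive subgradients
            if np.linalg.norm(r) > 1e-12:
                r /= np.linalg.norm(r)
                # space dilation: B <- B R_beta(r), R_beta(r) = I + (beta-1) r r^T
                B = B + (beta - 1.0) * np.outer(B @ r, r)
        gt = B.T @ g
        if np.linalg.norm(gt) < 1e-10:      # (near-)zero transformed subgradient
            break
        x = x - h * (B @ gt) / np.linalg.norm(gt)
        g_prev = g
    return x
```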
Problem of constrained optimization
• Equality constraints:

  $\min f(x)$ subject to $F(x) = 0$,

  where $f : D \to \mathbb{R}$, $F : D \to \mathbb{R}^m$, $D \subseteq \mathbb{R}^n$.
Classical (extrinsic) methods
• The Lagrangian:

  $L(x, \lambda) = f(x) + \sum_{k=1}^{m} \lambda_k F_k(x)$

• Newton–Lagrange method

• Sequential quadratic programming
Classical methods
• Penalty functions and the augmented Lagrangian:

  $L_c(x, \lambda) = f(x) + \sum_{k=1}^{m} \lambda_k F_k(x) + \frac{c}{2}\, \|F(x)\|^2$
Advantages of Geometric methods
• The dimension of the manifold is n − m, versus n + m for Lagrangian-based methods
• The function may be convex on the manifold even when the Lagrangian is non-convex
• The geometric Hessian may be positive-definite even when the classical one is not
Implementation: The case of Submanifolds
$M = \{\, x : F(x) = 0 \,\}$

$F : D \to \mathbb{R}^m$, $D \subseteq \mathbb{R}^n$, $F \in C^2(D)$; $DF(x)$ is surjective.
Hamilton Equations for the Geodesics
• The Lagrangian:

  $L(x, \dot x) = \tfrac{1}{2}\, \|\dot x\|^2 + \sum_{i=1}^{m} \lambda_i F_i(x)$

• The Hamiltonian:

  $H(x, p) = \tfrac{1}{2}\, \|p\|^2 - \tfrac{1}{2} \sum_{i=1}^{m} \big(DF_i(x)\, p\big)^2$
Hamilton Equations for the Geodesics (continued)

$\dot x = \big(I - DF^{*}(x)\, DF(x)\big)\, p = \pi_{T_xM}\, p$

$\dot p = \sum_{i=1}^{m} \lambda_i\, D^{2}F_i(x)\, p, \qquad \lambda_i = DF_i(x)\, p$
Lagrange equations are also a constrained Hamiltonian system

• We can rewrite the Lagrange equations in the form:

  $\dot x = p, \qquad \dot p = -DF^{*}(x)\, \big(D^{2}F(x)(p, p)\big)$
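As a sanity check (an illustration, not the thesis's code): for the unit sphere $F(x) = (\|x\|^2 - 1)/2$ we have $DF(x) = x^{\top}$ and $D^2F = I$, so the system reduces to $\dot x = p$, $\dot p = -\|p\|^2 x$, whose solutions are great circles. A minimal RK4 sketch:

```python
import numpy as np

def geodesic_sphere(x0, p0, T=np.pi, n=200):
    """Integrate xdot = p, pdot = -||p||^2 x (geodesics on the unit sphere)."""
    def rhs(y):
        x, p = y[:len(x0)], y[len(x0):]
        return np.concatenate([p, -np.dot(p, p) * x])
    y, h, traj = np.concatenate([x0, p0]), T / n, [np.asarray(x0, float)]
    for _ in range(n):                       # classical RK4 step
        k1 = rhs(y); k2 = rhs(y + h/2*k1)
        k3 = rhs(y + h/2*k2); k4 = rhs(y + h*k3)
        y = y + h/6 * (k1 + 2*k2 + 2*k3 + k4)
        traj.append(y[:len(x0)].copy())
    return np.array(traj)

# Great circle: start at e1 with unit tangent e2; x(T) ~ (cos T, sin T, 0).
path = geodesic_sphere(np.array([1.0, 0, 0]), np.array([0, 1.0, 0]))
```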
Symplectic Numerical Integration
• A transformation is called symplectic if it preserves the following differential 2-form:

  $\omega^2 = \sum_{i=1}^{n} dp_i \wedge dx_i$
Implicit Runge-Kutta Integrators
$y(0) = y_0$ is given;

$Y_i = y_k + h \sum_{j=1}^{s} a_{ij}\, G(Y_j), \quad i = 1, \dots, s$

$y_{k+1} = y_k + h \sum_{i=1}^{s} b_i\, G(Y_i)$

The IRK method is called symplectic if the associated transformation $y_k \mapsto y_{k+1}$ preserves $\omega^2$; here $y = (x, p)$ and $G$ is the vector field of $\dot y = G(y)$.
The Gauss method of order 4
Butcher tableau of the two-stage Gauss method:

$a_{11} = 1/4$, $\quad a_{12} = 1/4 - \sqrt{3}/6$

$a_{21} = 1/4 + \sqrt{3}/6$, $\quad a_{22} = 1/4$

$b_1 = b_2 = 1/2$
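A sketch of one step of this two-stage Gauss method, solving the implicit stage equations by fixed-point iteration; $G$ is the vector field of $\dot y = G(y)$:

```python
import numpy as np

S3 = np.sqrt(3.0)
A = np.array([[1/4,        1/4 - S3/6],
              [1/4 + S3/6, 1/4       ]])    # a_ij of the Gauss method
B = np.array([1/2, 1/2])                    # b_i

def gauss4_step(G, y, h, iters=20):
    """One step of the symplectic two-stage Gauss IRK method (order 4)."""
    Y = np.array([y, y], dtype=float)       # stage values Y_i
    for _ in range(iters):                  # fixed-point iteration on the stages
        GY = np.array([G(Y[0]), G(Y[1])])
        Y = y + h * (A @ GY)
    GY = np.array([G(Y[0]), G(Y[1])])
    return y + h * (B @ GY)

# Example: harmonic oscillator y = (x, p), H = (x^2 + p^2)/2;
# the energy stays nearly constant over long integrations.
G = lambda y: np.array([y[1], -y[0]])
y = np.array([1.0, 0.0])
for _ in range(100):
    y = gauss4_step(G, y, 0.1)
```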
Backward error analysis
Covariant Derivative on the Submanifold
$\hat\nabla f(x) = \pi_{T_xM}\, \nabla f(x) = \big(I - DF^{*}(x)\, DF(x)\big)\, \nabla f(x)$
Computing the constrained Hessian
• Direct computation:

  $\hat D^{2} = \pi_{T_xM}\, D\, \big(\pi_{T_xM}\, D\big)$

• “Mixed” computation:

  $\hat D^{2}f = \pi_{T_xM}\, \big(D^{2}f - \lambda^{\top} D^{2}F\big), \qquad \lambda^{\top} = Df\; DF^{*}$,

  where $\pi_{T_xM} = I - DF^{*}(x)\, DF(x)$.
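A sketch of these computations for a numerically given constraint Jacobian; keeping the Gram factor $(DF\,DF^{*})^{-1}$ covers the case where the rows of $DF$ are not orthonormal, and the multiplier estimate used here is an assumption of the illustration:

```python
import numpy as np

def tangent_projector(DFx):
    """pi_{T_x M} = I - DF^T (DF DF^T)^{-1} DF for an m x n Jacobian DF(x)."""
    n = DFx.shape[1]
    G = DFx @ DFx.T                      # Gram matrix of constraint gradients
    return np.eye(n) - DFx.T @ np.linalg.solve(G, DFx)

def projected_hessian(DFx, grad_f, hess_f, hess_F):
    """Projected Hessian pi (D^2 f - sum_i lam_i D^2 F_i) pi, with the
    least-squares multipliers lam = (DF DF^T)^{-1} DF grad_f."""
    P = tangent_projector(DFx)
    lam = np.linalg.solve(DFx @ DFx.T, DFx @ grad_f)
    H = hess_f - sum(l * Hi for l, Hi in zip(lam, hess_F))
    return P @ H @ P                     # restriction to the tangent space
```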
Example of geometric iterations
Neural Associative memory
• Hopfield-type auto-associative memory. Memorized vectors are bipolar: $v_k \in \{-1, 1\}^n$, $k = 1 \dots m$. Suppose these vectors are the columns of the $n \times m$ matrix $V$. Then the synaptic matrix $C$ of the memory is given by:

  $C = V V^{\top}$

• Associative recall is performed by the following procedure: the input vector $x^0$ is the starting point of the iterations

  $x^{t+1} = f(C x^t)$,

  where $f$ is a monotonic odd function such that $\lim_{s \to \infty} f(s) = 1$.
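A minimal sketch of this memory, taking $f = \operatorname{sign}$ as the limiting case of the recall nonlinearity:

```python
import numpy as np

def train(V):
    """Synaptic matrix C = V V^T from the n x m matrix of bipolar patterns."""
    return V @ V.T

def recall(C, x0, iters=50):
    """Iterate x_{t+1} = f(C x_t) until a fixed point (an attractor)."""
    x = x0.copy()
    for _ in range(iters):
        x_new = np.sign(C @ x)
        x_new[x_new == 0] = 1            # keep the state bipolar
        if np.array_equal(x_new, x):
            break
        x = x_new
    return x
```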
Attraction radius
• We call a stable fixed point of this discrete-time dynamical system an attractor. The maximum Hamming distance between $x^0$ and a memorized pattern $v_k$ such that the recall procedure still converges to $v_k$ is called the attraction radius.
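One way (an illustration, not the thesis's protocol) to estimate the attraction radius empirically, reusing the recall sketch above:

```python
import numpy as np

def attraction_radius(C, v, trials=20):
    """Largest d such that recall recovers v from all sampled
    corruptions of v at Hamming distance d."""
    n = len(v)
    for d in range(1, n + 1):
        for _ in range(trials):
            x = v.copy()
            idx = np.random.choice(n, size=d, replace=False)
            x[idx] = -x[idx]             # flip d random components
            if not np.array_equal(recall(C, x), v):
                return d - 1             # recall first failed at distance d
    return n
```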
Problem statement
Generalized averaging on the manifold
$\bar x = \operatorname{argmin}_{x \in M} \sum_{k=1}^{N} d^{2}(x, x_k)$
Computing generalized average on the Grassmann manifold
m
N
kk
XXX
CXX
rank;
)(min
2
1
2
Generalized averaging as an optimization problem
N
k
n
jiijkijkijij
N
k
n
jiijkij ccxxcx
1 1,
2,,
2
1 1,
2, 2)()(X
Transforming objective function:
constconst1 2
2
1
CXCX NN
NN
kk
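Since the objective equals $N\,\|X - \bar C\|^2 + \mathrm{const}$, its minimizer over symmetric rank-$m$ projectors is the projector onto the span of the $m$ leading eigenvectors of $\bar C$ (a Ky Fan / Eckart–Young argument). A sketch, assuming the $C_k$ are given as numpy arrays:

```python
import numpy as np

def generalized_average(Cs, m):
    """Nearest rank-m orthogonal projector to the mean of symmetric C_k."""
    C_bar = np.mean(Cs, axis=0)
    w, U = np.linalg.eigh(C_bar)         # eigenvalues in ascending order
    Um = U[:, -m:]                       # m leading eigenvectors
    return Um @ Um.T                     # X = X^T = X^2, rank X = m
```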
Statistical estimation
Experimental results: simulated data
• n = 256 for all experiments
• Nature of the data
Experimental results: simulated data
[Figure: frequencies of attractors of the associative clustering network for different m, p = 8; frequency (log scale, 1–1000) vs. attractors (0–25), curves for m = 8, 16, 24, 32.]
Experimental results: simulated data
[Figure: frequencies of attractors of the associative clustering network for different p, with m = p; frequency (log scale, 1–1000) vs. attractors (0–35), curves for p = 8, 16, 24, 32.]
Experimental results: simulated data
• Distinction coefficients of attractors of the associative clustering network for different p, with m = p

[Figure: distinction coefficient (log scale, 0.0001–1) vs. attractors (0–35), curves for p = 8, 16, 24, 32.]
The MNIST database: data description
• Gray-scale images, 28×28
• 10 classes: digits from “0” to “9”
• Training sample: 60,000 images
• Test sample: 10,000 images
• Before being fed to the network, images were thresholded to obtain 784-dimensional bipolar vectors
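A sketch of this preprocessing step (the threshold value is an assumption; the slide does not specify it):

```python
import numpy as np

def to_bipolar(images, thresh=127):
    """Flatten 28x28 gray-scale images and threshold to {-1, +1}^784."""
    flat = images.reshape(len(images), -1)   # (N, 784)
    return np.where(flat > thresh, 1, -1)
```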
Experimental results: the MNIST database
• Examples of handwritten digits from the MNIST database
Experimental results: the MNIST database
• Generalized images of digits found by the network
Kernel AM
• The main algorithm
Kernel AM
• The basic algorithm (continued)
Algorithm Scheme
Experimental Results
• Gaussian kernel

[Figure: attraction radius (0–2.5) vs. alpha (0–0.05) for the Gaussian kernel.]
Model of Signal
Signal Trajectories in the phase space
The Manifold
Example of Signal Processing
[Figure: three example signal waveforms vs. t (msec), t ∈ [−4, 4], with amplitudes on the order of ±0.3, ±1, and ±0.25.]
Application to a Real-Life Problem
Electronic Nose: QCM setup overview
Variance distribution between principal components
Chemical images in the space spanned by the first 3 PCs