Radial-Basis Function Networks

• A function is radial (RBF) if its output depends on (is a non-increasing function of) the distance of the input from a given stored vector.
• RBFs represent local receptors, as illustrated below, where each point is a stored vector used in one RBF.
• In an RBF network one hidden layer uses neurons with RBF activation functions describing local receptors. Then one output node is used to combine linearly the outputs of the hidden neurons.
[Figure: three local receptors with weights $w_1$, $w_2$, $w_3$ and a query point P]

The vector P is "interpolated" using the three vectors; each vector gives a contribution that depends on its weight and on its distance from the point P. In the picture we have

$w_1 < w_3 < w_2$
RBF ARCHITECTURE
• One hidden layer with RBF activation functions
• Output layer with linear activation function.
[Figure: network with inputs $x_1, x_2, \ldots, x_m$, hidden RBF units $\varphi_1, \ldots, \varphi_{m_1}$, and one linear output $y$ with weights $w_1, \ldots, w_{m_1}$]

$y = w_1 \varphi_1(\|x - t_1\|) + \ldots + w_{m_1} \varphi_{m_1}(\|x - t_{m_1}\|)$

where $\|x - t\|$ is the distance of $x = (x_1, \ldots, x_m)$ from the vector $t$.
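As an illustrative sketch of this forward pass (assuming NumPy and Gaussian radial functions, one of the choices introduced later in these slides; `rbf_forward` and the sample values are made up for the example):

```python
import numpy as np

def gaussian_rbf(r, sigma=1.0):
    """Gaussian radial function of the distance r."""
    return np.exp(-r**2 / (2 * sigma**2))

def rbf_forward(x, centers, weights, sigma=1.0):
    """y = w_1 phi(||x - t_1||) + ... + w_m1 phi(||x - t_m1||)."""
    distances = np.linalg.norm(centers - x, axis=1)  # ||x - t_j|| for every center
    return weights @ gaussian_rbf(distances, sigma)

# Example: three centers in the plane and arbitrary output weights
centers = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
weights = np.array([0.5, -0.2, 0.8])
print(rbf_forward(np.array([0.2, 0.1]), centers, weights))
```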
HIDDEN NEURON MODEL
• Hidden units use radial basis functions:

$\varphi_\sigma(\|x - t\|)$ : the output depends on the distance of the input $x$ from the center $t$.

$t$ is called center, $\sigma$ is called spread; center and spread are parameters.
Hidden Neurons
• A hidden neuron is more sensitive to data points near its center.
• For Gaussian RBF this sensitivity may be tuned by adjusting the spread σ, where a larger spread implies less sensitivity.
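For instance, evaluating a Gaussian unit at a fixed distance from its center for two spreads (a small illustrative computation):

```python
import numpy as np

r = 1.0  # fixed distance ||x - t|| from the center
for sigma in (0.5, 2.0):
    print(sigma, np.exp(-r**2 / (2 * sigma**2)))
# sigma = 0.5 -> ~0.135 (sharply tuned), sigma = 2.0 -> ~0.882 (less sensitive)
```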
Gaussian RBF φ

[Figure: two Gaussian curves with the same center; a large σ gives a wide, flat curve, a small σ a narrow, peaked one]

σ is a measure of how spread out the curve is.
Interpolation with RBF

The interpolation problem:

Given a set of $N$ different points $\{x_i \in \mathbb{R}^m, i = 1 \ldots N\}$ and a set of $N$ real numbers $\{d_i \in \mathbb{R}, i = 1 \ldots N\}$, find a function $F : \mathbb{R}^m \to \mathbb{R}$ that satisfies the interpolation condition: $F(x_i) = d_i$.

If $F(x) = \sum_{i=1}^{N} w_i \varphi(\|x - x_i\|)$ we have:

$\begin{bmatrix} \varphi(\|x_1 - x_1\|) & \ldots & \varphi(\|x_1 - x_N\|) \\ \vdots & & \vdots \\ \varphi(\|x_N - x_1\|) & \ldots & \varphi(\|x_N - x_N\|) \end{bmatrix} \begin{bmatrix} w_1 \\ \vdots \\ w_N \end{bmatrix} = \begin{bmatrix} d_1 \\ \vdots \\ d_N \end{bmatrix} \;\Rightarrow\; \Phi w = d$
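A minimal NumPy sketch of this construction, assuming a Gaussian $\varphi$ (one of the types listed on the next slide); the data and names are illustrative:

```python
import numpy as np

def interpolation_matrix(points, phi):
    """N-by-N matrix whose (j, i) element is phi(||x_j - x_i||)."""
    diffs = points[:, None, :] - points[None, :, :]
    return phi(np.linalg.norm(diffs, axis=2))

phi = lambda r: np.exp(-r**2 / 2.0)  # Gaussian with sigma = 1

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 2))          # N = 4 distinct points in R^2
d = np.array([0.0, 1.0, 1.0, 0.0])   # target values

Phi = interpolation_matrix(x, phi)
w = np.linalg.solve(Phi, d)          # solve Phi w = d

F = lambda p: w @ phi(np.linalg.norm(x - p, axis=1))
print(F(x[0]), F(x[1]))              # reproduces d_1 = 0 and d_2 = 1
```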
Types of φ
• Multiquadrics: $\varphi(r) = (r^2 + c^2)^{1/2}$, with $c > 0$

• Inverse multiquadrics: $\varphi(r) = \dfrac{1}{(r^2 + c^2)^{1/2}}$, with $c > 0$

• Gaussian functions (most used): $\varphi(r) = \exp\left(-\dfrac{r^2}{2\sigma^2}\right)$, with $\sigma > 0$

where $r = \|x - t\|$.
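The three radial functions above, as a short NumPy sketch:

```python
import numpy as np

def multiquadric(r, c=1.0):
    return np.sqrt(r**2 + c**2)

def inverse_multiquadric(r, c=1.0):
    return 1.0 / np.sqrt(r**2 + c**2)

def gaussian(r, sigma=1.0):
    return np.exp(-r**2 / (2 * sigma**2))

r = np.linspace(0.0, 3.0, 4)
for phi in (multiquadric, inverse_multiquadric, gaussian):
    print(phi.__name__, phi(r))
```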
Micchelli's theorem:

Let $\{x_i\}_{i=1}^{N}$ be a set of distinct points in $\mathbb{R}^m$. Then the N-by-N interpolation matrix $\Phi$, whose ji-th element is $\varphi_{ji} = \varphi(\|x_j - x_i\|)$, is nonsingular.
Example: the XOR problem
• Input space: the four points (0,0), (0,1), (1,0), (1,1) in the $(x_1, x_2)$ plane
• Output space: $y \in \{0, 1\}$
• Construct an RBF pattern classifier such that:
  – (0,0) and (1,1) are mapped to 0, class C1
  – (1,0) and (0,1) are mapped to 1, class C2
Example: the XOR problem

• In the feature (hidden layer) space:

$\varphi_1(\|x - t_1\|) = e^{-\|x - t_1\|^2}$ and $\varphi_2(\|x - t_2\|) = e^{-\|x - t_2\|^2}$, with $t_1 = (1,1)$ and $t_2 = (0,0)$

[Figure: the four inputs in the $(\varphi_1, \varphi_2)$ plane: (0,0) near (0.135, 1.0), (1,1) near (1.0, 0.135), (0,1) and (1,0) coinciding near (0.368, 0.368); a straight decision boundary separates the classes]

• When mapped into the feature space $\langle \varphi_1, \varphi_2 \rangle$ (hidden layer), C1 and C2 become linearly separable. So a linear classifier with $\varphi_1(x)$ and $\varphi_2(x)$ as inputs can be used to solve the XOR problem.
RBF NN for the XOR problem
$\varphi_1(\|x - t_1\|) = e^{-\|x - t_1\|^2}$ and $\varphi_2(\|x - t_2\|) = e^{-\|x - t_2\|^2}$, with $t_1 = (1,1)$ and $t_2 = (0,0)$

[Figure: network with inputs $x_1, x_2$, hidden units centered at $t_1$ and $t_2$ with output weights $-1$ and $-1$, a bias $+1$, and output $y$]

$y = -e^{-\|x - t_1\|^2} - e^{-\|x - t_2\|^2} + 1$

If $y > 0$ then class 1, otherwise class 0.
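Checking this network on the four inputs (an illustrative sketch):

```python
import numpy as np

t1, t2 = np.array([1.0, 1.0]), np.array([0.0, 0.0])

def xor_rbf(x):
    """y = -phi1 - phi2 + 1 with Gaussian units centered at t1 and t2."""
    phi1 = np.exp(-np.sum((x - t1)**2))
    phi2 = np.exp(-np.sum((x - t2)**2))
    return -phi1 - phi2 + 1

for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    y = xor_rbf(np.array(x, dtype=float))
    print(x, round(y, 3), "class 1" if y > 0 else "class 0")
# (0,0) and (1,1) give y < 0 (class 0); (0,1) and (1,0) give y > 0 (class 1)
```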
RBF network parameters
• What do we have to learn for an RBF NN with a given architecture?
  – The centers of the RBF activation functions
  – The spreads of the Gaussian RBF activation functions
  – The weights from the hidden to the output layer
• Different learning algorithms may be used for learning the RBF network parameters. We describe three possible methods for learning centers, spreads and weights.
Learning Algorithm 1
• Centers are selected at random:
  – centers are chosen randomly from the training set
• Spreads are chosen by normalization:

$\sigma = \dfrac{\text{Maximum distance between any 2 centers}}{\sqrt{2 \cdot \text{number of centers}}} = \dfrac{d_{max}}{\sqrt{2 m_1}}$

• Then the activation function of hidden neuron $i$ becomes:

$\varphi_i(\|x - t_i\|) = \exp\left(-\dfrac{m_1}{d_{max}^2} \|x - t_i\|^2\right)$
Learning Algorithm 1
• Weights are computed by means of the pseudo-inverse method.
  – For an example $(x_i, d_i)$ consider the output of the network:

$y(x_i) = w_1 \varphi_1(\|x_i - t_1\|) + \ldots + w_{m_1} \varphi_{m_1}(\|x_i - t_{m_1}\|)$

  – We would like $y(x_i) = d_i$ for each example, that is:

$w_1 \varphi_1(\|x_i - t_1\|) + \ldots + w_{m_1} \varphi_{m_1}(\|x_i - t_{m_1}\|) \approx d_i$
Learning Algorithm 1
• This can be re-written in matrix form for one example:

$\begin{bmatrix} \varphi_1(\|x_i - t_1\|) & \ldots & \varphi_{m_1}(\|x_i - t_{m_1}\|) \end{bmatrix} \begin{bmatrix} w_1 \\ \vdots \\ w_{m_1} \end{bmatrix} = d_i$

and for all the examples at the same time:

$\begin{bmatrix} \varphi_1(\|x_1 - t_1\|) & \ldots & \varphi_{m_1}(\|x_1 - t_{m_1}\|) \\ \vdots & & \vdots \\ \varphi_1(\|x_N - t_1\|) & \ldots & \varphi_{m_1}(\|x_N - t_{m_1}\|) \end{bmatrix} \begin{bmatrix} w_1 \\ \vdots \\ w_{m_1} \end{bmatrix} = [d_1 \ \ldots \ d_N]^T$
Learning Algorithm 1
Let

$\Phi = \begin{bmatrix} \varphi_1(\|x_1 - t_1\|) & \ldots & \varphi_{m_1}(\|x_1 - t_{m_1}\|) \\ \vdots & & \vdots \\ \varphi_1(\|x_N - t_1\|) & \ldots & \varphi_{m_1}(\|x_N - t_{m_1}\|) \end{bmatrix}$

then we can write

$\Phi \begin{bmatrix} w_1 \\ \vdots \\ w_{m_1} \end{bmatrix} = \begin{bmatrix} d_1 \\ \vdots \\ d_N \end{bmatrix}$

So the unknown vector $w$ is the solution of the linear systems

$\Phi w = d \quad \text{or} \quad \Phi^T \Phi w = \Phi^T d$
Pseudoinverse matrix
If we define $\Phi^+ \equiv (\Phi^T \Phi)^{-1} \Phi^T$ as the pseudo-inverse of the matrix $\Phi$, we can obtain the weights using the following formula:

$[w_1 \ldots w_{m_1}]^T = \Phi^+ [d_1 \ldots d_N]^T$

The evaluation of $\Phi^+$ usually requires storing in memory the entire matrix $\Phi$.
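As an illustrative sketch (assuming NumPy), the normal-equations formula above can be checked against the SVD-based library routine:

```python
import numpy as np

rng = np.random.default_rng(1)
Phi = rng.normal(size=(6, 3))   # N = 6 examples, m1 = 3 hidden units
d = rng.normal(size=6)

# Pseudo-inverse via the normal equations, as defined above
Phi_pinv = np.linalg.inv(Phi.T @ Phi) @ Phi.T
w = Phi_pinv @ d

# np.linalg.pinv gives the same weights (computed via SVD, numerically more stable)
print(np.allclose(w, np.linalg.pinv(Phi) @ d))  # True
```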
Pseudoinverse matrix

A good idea is to divide $\Phi$ into $Q$ subsets $a_i$ ($i = 1 \ldots Q$), each made of an arbitrary number of rows.

Then we build a set of new matrices named $A_i$; in $A_i$ the only set of rows different from zero is the i-th subset of rows of matrix $\Phi$, so we have

$\Phi = \sum_{i=1}^{Q} A_i$

[Figure: $\Phi$ drawn as stacked row blocks $A_1$, $A_2$, $A_3$]
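A small sketch of this decomposition (the subset sizes are arbitrary, as stated above):

```python
import numpy as np

rng = np.random.default_rng(2)
Phi = rng.normal(size=(6, 3))

# Q = 3 subsets of rows with arbitrary sizes (1, 2 and 3 rows)
subsets = [slice(0, 1), slice(1, 3), slice(3, 6)]

# A_i is zero everywhere except on the i-th subset of rows of Phi
A = []
for rows in subsets:
    Ai = np.zeros_like(Phi)
    Ai[rows] = Phi[rows]
    A.append(Ai)

print(np.allclose(Phi, sum(A)))  # True: Phi = A_1 + A_2 + A_3
```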
Pseudoinverse matrix

If we replace $\Phi = \sum_{i=1}^{Q} A_i$ in the definition of $\Phi^+$ we have:

$\Phi^+ = \left[ \Big( \sum_{i=1}^{Q} A_i \Big)^T \Big( \sum_{i=1}^{Q} A_i \Big) \right]^{-1} \Big( \sum_{i=1}^{Q} A_i \Big)^T$
Learning Algorithm 1: summary
1. Choose the centers randomly from the training set.
2. Compute the spread for the RBF function using the normalization method.
3. Find the weights using the pseudo-inverse method.
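Putting the three steps together, a compact sketch of Learning Algorithm 1 (assuming NumPy and Gaussian activations; `train_rbf` is an illustrative name):

```python
import numpy as np

def train_rbf(X, d, m1, seed=0):
    """Learning Algorithm 1: random centers, normalized spread, pseudo-inverse weights."""
    rng = np.random.default_rng(seed)
    # 1. Choose m1 centers randomly from the training set
    centers = X[rng.choice(len(X), size=m1, replace=False)]
    # 2. Spread by normalization: sigma = d_max / sqrt(2 * m1)
    d_max = max(np.linalg.norm(a - b) for a in centers for b in centers)
    sigma = d_max / np.sqrt(2 * m1)
    # 3. Weights by the pseudo-inverse method: w = Phi^+ d
    dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    Phi = np.exp(-dist**2 / (2 * sigma**2))
    w = np.linalg.pinv(Phi) @ d
    return centers, sigma, w

# Example on the XOR data from the previous slides
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
d = np.array([0.0, 1.0, 1.0, 0.0])
centers, sigma, w = train_rbf(X, d, m1=4)
dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
print(np.exp(-dist**2 / (2 * sigma**2)) @ w)  # close to the targets d
```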
Learning Algorithm 2: Centers
• Clustering algorithm for finding the centers:
  1. Initialization: $t_k(0)$ random, $k = 1, \ldots, m_1$
  2. Sampling: draw $x$ from the input space
  3. Similarity matching: find the index $k(x)$ of the center closest to $x$:
     $k(x) = \arg\min_k \|x(n) - t_k(n)\|$
  4. Updating: adjust the centers:
     $t_k(n+1) = t_k(n) + \eta \left[ x(n) - t_k(n) \right]$ if $k = k(x)$, otherwise $t_k(n+1) = t_k(n)$
  5. Continuation: increment $n$ by 1, go to 2 and continue until no noticeable changes of centers occur
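A sketch of this clustering loop (assuming NumPy; for simplicity a fixed iteration budget replaces the stopping criterion of step 5):

```python
import numpy as np

def find_centers(X, m1, eta=0.1, n_iters=500, seed=0):
    """Online clustering for the RBF centers (steps 1-4 above)."""
    rng = np.random.default_rng(seed)
    t = rng.normal(size=(m1, X.shape[1]))             # 1. random initialization
    for n in range(n_iters):
        x = X[rng.integers(len(X))]                   # 2. draw x from the input space
        k = np.argmin(np.linalg.norm(x - t, axis=1))  # 3. index of the closest center
        t[k] += eta * (x - t[k])                      # 4. move only the winning center
    return t

# Two well-separated clusters: the centers move toward them
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, scale=0.1, size=(50, 2)) for c in ([0, 0], [3, 3])])
print(find_centers(X, m1=2))
```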
Learning Algorithm 2: summary
• Hybrid Learning Process:
  – Clustering for finding the centers.
  – Spreads chosen by normalization.
  – LMS algorithm (see Adaline) for finding the weights.
Learning Algorithm 3

• Apply the gradient descent method for finding centers, spread and weights, by minimizing the (instantaneous) squared error

$E = \frac{1}{2} (y(x) - d)^2$

• Update for:
  – centers: $\Delta t_j = -\eta_{t_j} \dfrac{\partial E}{\partial t_j}$
  – spread: $\Delta \sigma_j = -\eta_{\sigma_j} \dfrac{\partial E}{\partial \sigma_j}$
  – weights: $\Delta w_{ij} = -\eta_{ij} \dfrac{\partial E}{\partial w_{ij}}$
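A sketch of one such update for a Gaussian RBF network $y(x) = \sum_j w_j \exp(-\|x - t_j\|^2 / 2\sigma_j^2)$, assuming a single output and a shared learning rate for simplicity; the gradients below follow from the chain rule:

```python
import numpy as np

def gradient_step(x, d, t, sigma, w, eta=0.05):
    """One gradient-descent update of centers t, spreads sigma and weights w."""
    diff = x - t                               # (x - t_j) for each center, shape (m1, dim)
    r2 = np.sum(diff**2, axis=1)               # ||x - t_j||^2
    phi = np.exp(-r2 / (2 * sigma**2))
    e = w @ phi - d                            # y(x) - d
    grad_w = e * phi                           # dE/dw_j     = e phi_j
    grad_t = (e * w * phi / sigma**2)[:, None] * diff   # dE/dt_j = e w_j phi_j (x - t_j) / sigma_j^2
    grad_sigma = e * w * phi * r2 / sigma**3   # dE/dsigma_j = e w_j phi_j ||x - t_j||^2 / sigma_j^3
    return t - eta * grad_t, sigma - eta * grad_sigma, w - eta * grad_w

# One update on a single training example (illustrative values)
t = np.array([[0.0, 0.0], [1.0, 1.0]])
sigma = np.array([1.0, 1.0])
w = np.array([0.5, -0.5])
t, sigma, w = gradient_step(np.array([0.2, 0.1]), 1.0, t, sigma, w)
print(w)
```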
Comparison with FF NN
RBF-Networks are used for regression and for performing complex (non-linear) pattern classification tasks.
Comparison between RBF networks and FFNN:
• Both are examples of non-linear layered feed-forward networks.
• Both are universal approximators.
Comparison with multilayer NN
• Architecture:
  – RBF networks have one single hidden layer.
  – FFNN networks may have more hidden layers.
• Neuron Model:
  – In RBF the neuron model of the hidden neurons is different from the one of the output nodes.
  – Typically in FFNN hidden and output neurons share a common neuron model.
  – The hidden layer of RBF is non-linear, the output layer of RBF is linear.
  – Hidden and output layers of FFNN are usually non-linear.
Comparison with multilayer NN
• Activation functions:
  – The argument of the activation function of each hidden neuron in an RBF NN computes the Euclidean distance between the input vector and the center of that unit.
  – The argument of the activation function of each hidden neuron in a FFNN computes the inner product of the input vector and the synaptic weight vector of that neuron.
• Approximation:
  – RBF NNs using Gaussian functions construct local approximations to non-linear I/O mappings.
  – FF NNs construct global approximations to non-linear I/O mappings.
Example: Astronomical image processing

We used neural networks to verify the similarity of real astronomical images to predefined reference profiles, so we analyse realistic images and deduce from them a set of parameters able to describe their discrepancy with respect to the ideal, non-aberrated image.

The focal plane image, in an ideal case, is the Airy function (blue line):

$I_p(r) = \dfrac{P S}{\lambda^2 R^2} \left[ \dfrac{2 J_1(\pi r)}{\pi r} \right]^2$

Usually the image is perturbed by aberrations associated to the characteristics of real-world instruments (the red line represents the perturbed image).
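A sketch of this ideal profile using SciPy's Bessel function $J_1$ (the constants $P$, $S$, $R$, $\lambda$ are set to 1 here purely for illustration):

```python
import numpy as np
from scipy.special import j1  # Bessel function of the first kind, order 1

def airy_profile(r, P=1.0, S=1.0, R=1.0, lam=1.0):
    """Ideal (non-aberrated) focal-plane intensity I_p(r)."""
    x = np.pi * np.asarray(r, dtype=float)
    safe = np.where(x == 0.0, 1.0, x)                      # avoid 0/0 at the origin
    core = np.where(x == 0.0, 1.0, 2.0 * j1(safe) / safe)  # limit of 2 J1(x)/x at 0 is 1
    return (P * S) / (lam**2 * R**2) * core**2

r = np.linspace(0.0, 3.0, 7)
print(airy_profile(r))  # peak at r = 0, first dark ring near r ~ 1.22
```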
Example: Astronomical image processing
A realistic image is associated to the Fourier transform of the aperture function:

$I_p(r, \phi) = \dfrac{P S}{\lambda^2 R^2} \left| \int_0^1 \rho \, d\rho \int_0^{2\pi} d\theta \; e^{i \Phi(\rho, \theta)} \, e^{-i \frac{2\pi}{\lambda} \rho r \cos(\theta - \phi)} \right|^2$

which includes the contributions of the classical (Seidel) aberrations:

$\Phi(\rho, \theta) = \dfrac{2\pi}{\lambda} \left[ A_s \rho^4 + A_c \rho^3 \cos\theta + A_a \rho^2 \cos^2\theta + A_d \rho^2 + A_t \rho \cos\theta \right]$

$A_s$: Spherical aberration, $A_c$: Coma, $A_a$: Astigmatism, $A_d$: Field curvature (d = defocus), $A_t$: Distortion (t = tilt)
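The aberration phase is straightforward to evaluate; a sketch (with illustrative coefficient values):

```python
import numpy as np

def seidel_phase(rho, theta, A_s, A_c, A_a, A_d, A_t, lam=1.0):
    """Classical (Seidel) aberration phase Phi(rho, theta)."""
    return (2 * np.pi / lam) * (A_s * rho**4
                                + A_c * rho**3 * np.cos(theta)
                                + A_a * rho**2 * np.cos(theta)**2
                                + A_d * rho**2
                                + A_t * rho * np.cos(theta))

# Phase at the pupil edge (rho = 1) for a pure defocus of 0.3 lambda
print(seidel_phase(1.0, 0.0, A_s=0, A_c=0, A_a=0, A_d=0.3, A_t=0))  # ~1.885 rad
```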
Example: Astronomical image processing

- To ease the computational task the image is encoded using the first moments: $\mu_{0,2}$, $\mu_{2,0}$, $\mu_{3,0}$, $\mu_{4,0}$, $\mu_{1,1}$, $\mu_{1,2}$, $\mu_{2,1}$, $\mu_{1,3}$, $\mu_{2,2}$, $\mu_{3,1}$

where:

$\mu_{m,n} = \dfrac{\iint dx \, dy \left( \dfrac{x - \mu_x}{\sigma_x} \right)^n \left( \dfrac{y - \mu_y}{\sigma_y} \right)^m I(x, y)}{\iint dx \, dy \; I(x, y)}$

- The maximum range of variation considered is ±0.3λ for the training set and ±0.25λ for the test set
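A discrete sketch of this encoding for an image stored as a NumPy array (here $\mu_x$, $\mu_y$, $\sigma_x$, $\sigma_y$ are taken to be the intensity-weighted centroid and widths, an assumption not spelled out on the slide):

```python
import numpy as np

def normalized_moment(I, m, n):
    """mu_{m,n}: central image moment normalized by the image widths."""
    y, x = np.indices(I.shape, dtype=float)
    total = I.sum()
    mu_x, mu_y = (x * I).sum() / total, (y * I).sum() / total
    sig_x = np.sqrt(((x - mu_x)**2 * I).sum() / total)
    sig_y = np.sqrt(((y - mu_y)**2 * I).sum() / total)
    u, v = (x - mu_x) / sig_x, (y - mu_y) / sig_y
    return (u**n * v**m * I).sum() / total

# Example: a synthetic Gaussian blob; mu_{0,2} and mu_{2,0} are ~1 by construction
yy, xx = np.indices((64, 64), dtype=float)
I = np.exp(-((xx - 32.0)**2 + (yy - 30.0)**2) / (2 * 4.0**2))
print(normalized_moment(I, 0, 2), normalized_moment(I, 2, 0))
```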
Example: Astronomical image processing

We performed this task using
• a FNN neural network (with 60 hidden nodes) and
• an RBF neural network (with 175 hidden nodes)
Usually the former requires fewer internal nodes, but more training iterations than the latter.

The training required 1000 iterations (FNN) and 175 iterations (RBF); both optimised networks provide reasonably good results in terms of approximation of the desired unknown function relating aberrations and moments.
Example: Astronomical image processing
[Figure] Performances of the sigmoidal (left) and radial (right) neural networks; from top to bottom, coma, astigmatism, defocus and distortion are shown. The blue (solid) line follows the test set targets, whereas the green dots are the values computed by the networks.