Radial-Basis Function Networks

• A function is radial (RBF) if its output depends on (is a non-increasing function of) the distance of the input from a given stored vector.
• RBFs represent local receptors, as illustrated below, where each point is a stored vector used in one RBF.
• In an RBF network one hidden layer uses neurons with RBF activation functions describing local receptors. Then one output node is used to combine linearly the outputs of the hidden neurons.
[Figure: three local receptors with weights $w_1$, $w_2$, $w_3$ and a query point P]

The vector P is "interpolated" using the three vectors; each vector gives a contribution that depends on its weight and on its distance from the point P. In the picture we have

$w_1 < w_3 < w_2$
RBF ARCHITECTURE
• One hidden layer with RBF activation functions
• Output layer with linear activation function.
[Figure: network with inputs $x_1, x_2, \ldots, x_m$, hidden RBF units $\varphi_1, \ldots, \varphi_{m_1}$, and one linear output $y$ with weights $w_1, \ldots, w_{m_1}$]

$y = w_1 \varphi_1(\|x - t_1\|) + \ldots + w_{m_1} \varphi_{m_1}(\|x - t_{m_1}\|)$

where $\|x - t\|$ is the distance of $x = (x_1, \ldots, x_m)$ from the vector $t$.
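As an illustrative sketch of this forward pass (assuming NumPy and Gaussian radial functions, one of the choices introduced later in these slides; `rbf_forward` and the sample values are made up for the example):

```python
import numpy as np

def gaussian_rbf(r, sigma=1.0):
    """Gaussian radial function of the distance r."""
    return np.exp(-r**2 / (2 * sigma**2))

def rbf_forward(x, centers, weights, sigma=1.0):
    """y = w_1 phi(||x - t_1||) + ... + w_m1 phi(||x - t_m1||)."""
    distances = np.linalg.norm(centers - x, axis=1)  # ||x - t_j|| for every center
    return weights @ gaussian_rbf(distances, sigma)

# Example: three centers in the plane and arbitrary output weights
centers = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
weights = np.array([0.5, -0.2, 0.8])
print(rbf_forward(np.array([0.2, 0.1]), centers, weights))
```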
HIDDEN NEURON MODEL
• Hidden units use radial basis functions:

$\varphi_\sigma(\|x - t\|)$ : the output depends on the distance of the input $x$ from the center $t$.

$t$ is called center, $\sigma$ is called spread; center and spread are parameters.
Hidden Neurons
• A hidden neuron is more sensitive to data points near its center.
• For Gaussian RBF this sensitivity may be tuned by adjusting the spread σ, where a larger spread implies less sensitivity.
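For instance, evaluating a Gaussian unit at a fixed distance from its center for two spreads (a small illustrative computation):

```python
import numpy as np

r = 1.0  # fixed distance ||x - t|| from the center
for sigma in (0.5, 2.0):
    print(sigma, np.exp(-r**2 / (2 * sigma**2)))
# sigma = 0.5 -> ~0.135 (sharply tuned), sigma = 2.0 -> ~0.882 (less sensitive)
```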
Gaussian RBF φ

[Figure: two Gaussian curves with the same center; a large σ gives a wide, flat curve, a small σ a narrow, peaked one]

σ is a measure of how spread out the curve is.
Interpolation with RBF

The interpolation problem:

Given a set of $N$ different points $\{x_i \in \mathbb{R}^m, i = 1 \ldots N\}$ and a set of $N$ real numbers $\{d_i \in \mathbb{R}, i = 1 \ldots N\}$, find a function $F : \mathbb{R}^m \to \mathbb{R}$ that satisfies the interpolation condition: $F(x_i) = d_i$.

If $F(x) = \sum_{i=1}^{N} w_i \varphi(\|x - x_i\|)$ we have:

$\begin{bmatrix} \varphi(\|x_1 - x_1\|) & \ldots & \varphi(\|x_1 - x_N\|) \\ \vdots & & \vdots \\ \varphi(\|x_N - x_1\|) & \ldots & \varphi(\|x_N - x_N\|) \end{bmatrix} \begin{bmatrix} w_1 \\ \vdots \\ w_N \end{bmatrix} = \begin{bmatrix} d_1 \\ \vdots \\ d_N \end{bmatrix} \;\Rightarrow\; \Phi w = d$
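A minimal NumPy sketch of this construction, assuming a Gaussian $\varphi$ (one of the types listed on the next slide); the data and names are illustrative:

```python
import numpy as np

def interpolation_matrix(points, phi):
    """N-by-N matrix whose (j, i) element is phi(||x_j - x_i||)."""
    diffs = points[:, None, :] - points[None, :, :]
    return phi(np.linalg.norm(diffs, axis=2))

phi = lambda r: np.exp(-r**2 / 2.0)  # Gaussian with sigma = 1

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 2))          # N = 4 distinct points in R^2
d = np.array([0.0, 1.0, 1.0, 0.0])   # target values

Phi = interpolation_matrix(x, phi)
w = np.linalg.solve(Phi, d)          # solve Phi w = d

F = lambda p: w @ phi(np.linalg.norm(x - p, axis=1))
print(F(x[0]), F(x[1]))              # reproduces d_1 = 0 and d_2 = 1
```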
Types of φ
• Multiquadrics: $\varphi(r) = (r^2 + c^2)^{1/2}$, with $c > 0$

• Inverse multiquadrics: $\varphi(r) = \dfrac{1}{(r^2 + c^2)^{1/2}}$, with $c > 0$

• Gaussian functions (most used): $\varphi(r) = \exp\left(-\dfrac{r^2}{2\sigma^2}\right)$, with $\sigma > 0$

where $r = \|x - t\|$.
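The three radial functions above, as a short NumPy sketch:

```python
import numpy as np

def multiquadric(r, c=1.0):
    return np.sqrt(r**2 + c**2)

def inverse_multiquadric(r, c=1.0):
    return 1.0 / np.sqrt(r**2 + c**2)

def gaussian(r, sigma=1.0):
    return np.exp(-r**2 / (2 * sigma**2))

r = np.linspace(0.0, 3.0, 4)
for phi in (multiquadric, inverse_multiquadric, gaussian):
    print(phi.__name__, phi(r))
```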
Micchelli's theorem:

Let $\{x_i\}_{i=1}^{N}$ be a set of distinct points in $\mathbb{R}^m$. Then the N-by-N interpolation matrix $\Phi$, whose ji-th element is $\varphi_{ji} = \varphi(\|x_j - x_i\|)$, is nonsingular.
Example: the XOR problem
• Input space: the four points (0,0), (0,1), (1,0), (1,1) in the $(x_1, x_2)$ plane
• Output space: $y \in \{0, 1\}$
• Construct an RBF pattern classifier such that:
  – (0,0) and (1,1) are mapped to 0, class C1
  – (1,0) and (0,1) are mapped to 1, class C2
Example: the XOR problem

• In the feature (hidden layer) space:

$\varphi_1(\|x - t_1\|) = e^{-\|x - t_1\|^2}$ and $\varphi_2(\|x - t_2\|) = e^{-\|x - t_2\|^2}$, with $t_1 = (1,1)$ and $t_2 = (0,0)$

[Figure: the four inputs in the $(\varphi_1, \varphi_2)$ plane: (0,0) near (0.135, 1.0), (1,1) near (1.0, 0.135), (0,1) and (1,0) coinciding near (0.368, 0.368); a straight decision boundary separates the classes]

• When mapped into the feature space $\langle \varphi_1, \varphi_2 \rangle$ (hidden layer), C1 and C2 become linearly separable. So a linear classifier with $\varphi_1(x)$ and $\varphi_2(x)$ as inputs can be used to solve the XOR problem.
RBF NN for the XOR problem
$\varphi_1(\|x - t_1\|) = e^{-\|x - t_1\|^2}$ and $\varphi_2(\|x - t_2\|) = e^{-\|x - t_2\|^2}$, with $t_1 = (1,1)$ and $t_2 = (0,0)$

[Figure: network with inputs $x_1, x_2$, hidden units centered at $t_1$ and $t_2$ with output weights $-1$ and $-1$, a bias $+1$, and output $y$]

$y = -e^{-\|x - t_1\|^2} - e^{-\|x - t_2\|^2} + 1$

If $y > 0$ then class 1, otherwise class 0.
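Checking this network on the four inputs (an illustrative sketch):

```python
import numpy as np

t1, t2 = np.array([1.0, 1.0]), np.array([0.0, 0.0])

def xor_rbf(x):
    """y = -phi1 - phi2 + 1 with Gaussian units centered at t1 and t2."""
    phi1 = np.exp(-np.sum((x - t1)**2))
    phi2 = np.exp(-np.sum((x - t2)**2))
    return -phi1 - phi2 + 1

for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    y = xor_rbf(np.array(x, dtype=float))
    print(x, round(y, 3), "class 1" if y > 0 else "class 0")
# (0,0) and (1,1) give y < 0 (class 0); (0,1) and (1,0) give y > 0 (class 1)
```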
RBF network parameters
• What do we have to learn for an RBF NN with a given architecture?
  – The centers of the RBF activation functions
  – The spreads of the Gaussian RBF activation functions
  – The weights from the hidden to the output layer
• Different learning algorithms may be used for learning the RBF network parameters. We describe three possible methods for learning centers, spreads and weights.
Learning Algorithm 1
• Centers are selected at random:
  – centers are chosen randomly from the training set
• Spreads are chosen by normalization:

$\sigma = \dfrac{\text{Maximum distance between any 2 centers}}{\sqrt{2 \cdot \text{number of centers}}} = \dfrac{d_{max}}{\sqrt{2 m_1}}$

• Then the activation function of hidden neuron $i$ becomes:

$\varphi_i(\|x - t_i\|) = \exp\left(-\dfrac{m_1}{d_{max}^2} \|x - t_i\|^2\right)$
Learning Algorithm 1
• Weights are computed by means of the pseudo-inverse method.
  – For an example $(x_i, d_i)$ consider the output of the network:

$y(x_i) = w_1 \varphi_1(\|x_i - t_1\|) + \ldots + w_{m_1} \varphi_{m_1}(\|x_i - t_{m_1}\|)$

  – We would like $y(x_i) = d_i$ for each example, that is:

$w_1 \varphi_1(\|x_i - t_1\|) + \ldots + w_{m_1} \varphi_{m_1}(\|x_i - t_{m_1}\|) \approx d_i$
Learning Algorithm 1
• This can be re-written in matrix form for one example:

$\begin{bmatrix} \varphi_1(\|x_i - t_1\|) & \ldots & \varphi_{m_1}(\|x_i - t_{m_1}\|) \end{bmatrix} \begin{bmatrix} w_1 \\ \vdots \\ w_{m_1} \end{bmatrix} = d_i$

and for all the examples at the same time:

$\begin{bmatrix} \varphi_1(\|x_1 - t_1\|) & \ldots & \varphi_{m_1}(\|x_1 - t_{m_1}\|) \\ \vdots & & \vdots \\ \varphi_1(\|x_N - t_1\|) & \ldots & \varphi_{m_1}(\|x_N - t_{m_1}\|) \end{bmatrix} \begin{bmatrix} w_1 \\ \vdots \\ w_{m_1} \end{bmatrix} = [d_1 \ \ldots \ d_N]^T$
Learning Algorithm 1
Let

$\Phi = \begin{bmatrix} \varphi_1(\|x_1 - t_1\|) & \ldots & \varphi_{m_1}(\|x_1 - t_{m_1}\|) \\ \vdots & & \vdots \\ \varphi_1(\|x_N - t_1\|) & \ldots & \varphi_{m_1}(\|x_N - t_{m_1}\|) \end{bmatrix}$

then we can write

$\Phi \begin{bmatrix} w_1 \\ \vdots \\ w_{m_1} \end{bmatrix} = \begin{bmatrix} d_1 \\ \vdots \\ d_N \end{bmatrix}$

So the unknown vector $w$ is the solution of the linear systems

$\Phi w = d \quad \text{or} \quad \Phi^T \Phi w = \Phi^T d$
Pseudoinverse matrix
If we define $\Phi^+ \equiv (\Phi^T \Phi)^{-1} \Phi^T$ as the pseudo-inverse of the matrix $\Phi$, we can obtain the weights using the following formula:

$[w_1 \ldots w_{m_1}]^T = \Phi^+ [d_1 \ldots d_N]^T$

The evaluation of $\Phi^+$ usually requires storing in memory the entire matrix $\Phi$.
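As an illustrative sketch (assuming NumPy), the normal-equations formula above can be checked against the SVD-based library routine:

```python
import numpy as np

rng = np.random.default_rng(1)
Phi = rng.normal(size=(6, 3))   # N = 6 examples, m1 = 3 hidden units
d = rng.normal(size=6)

# Pseudo-inverse via the normal equations, as defined above
Phi_pinv = np.linalg.inv(Phi.T @ Phi) @ Phi.T
w = Phi_pinv @ d

# np.linalg.pinv gives the same weights (computed via SVD, numerically more stable)
print(np.allclose(w, np.linalg.pinv(Phi) @ d))  # True
```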
Pseudoinverse matrix

A good idea is to divide $\Phi$ into $Q$ subsets $a_i$ ($i = 1 \ldots Q$), each made of an arbitrary number of rows.

Then we build a set of new matrices named $A_i$; in $A_i$ the only set of rows different from zero is the i-th subset of rows of matrix $\Phi$, so we have

$\Phi = \sum_{i=1}^{Q} A_i$

[Figure: $\Phi$ drawn as stacked row blocks $A_1$, $A_2$, $A_3$]
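A small sketch of this decomposition (the subset sizes are arbitrary, as stated above):

```python
import numpy as np

rng = np.random.default_rng(2)
Phi = rng.normal(size=(6, 3))

# Q = 3 subsets of rows with arbitrary sizes (1, 2 and 3 rows)
subsets = [slice(0, 1), slice(1, 3), slice(3, 6)]

# A_i is zero everywhere except on the i-th subset of rows of Phi
A = []
for rows in subsets:
    Ai = np.zeros_like(Phi)
    Ai[rows] = Phi[rows]
    A.append(Ai)

print(np.allclose(Phi, sum(A)))  # True: Phi = A_1 + A_2 + A_3
```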
Pseudoinverse matrix

If we replace $\Phi = \sum_{i=1}^{Q} A_i$ in the definition of $\Phi^+$ we have:

$\Phi^+ = \left[ \Big( \sum_{i=1}^{Q} A_i \Big)^T \Big( \sum_{i=1}^{Q} A_i \Big) \right]^{-1} \Big( \sum_{i=1}^{Q} A_i \Big)^T$
Learning Algorithm 1: summary
1. Choose the centers randomly from the training set.
2. Compute the spread for the RBF function using the normalization method.
3. Find the weights using the pseudo-inverse method.
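Putting the three steps together, a compact sketch of Learning Algorithm 1 (assuming NumPy and Gaussian activations; `train_rbf` is an illustrative name):

```python
import numpy as np

def train_rbf(X, d, m1, seed=0):
    """Learning Algorithm 1: random centers, normalized spread, pseudo-inverse weights."""
    rng = np.random.default_rng(seed)
    # 1. Choose m1 centers randomly from the training set
    centers = X[rng.choice(len(X), size=m1, replace=False)]
    # 2. Spread by normalization: sigma = d_max / sqrt(2 * m1)
    d_max = max(np.linalg.norm(a - b) for a in centers for b in centers)
    sigma = d_max / np.sqrt(2 * m1)
    # 3. Weights by the pseudo-inverse method: w = Phi^+ d
    dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    Phi = np.exp(-dist**2 / (2 * sigma**2))
    w = np.linalg.pinv(Phi) @ d
    return centers, sigma, w

# Example on the XOR data from the previous slides
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
d = np.array([0.0, 1.0, 1.0, 0.0])
centers, sigma, w = train_rbf(X, d, m1=4)
dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
print(np.exp(-dist**2 / (2 * sigma**2)) @ w)  # close to the targets d
```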
Learning Algorithm 2: Centers
• Clustering algorithm for finding the centers:
  1. Initialization: $t_k(0)$ random, $k = 1, \ldots, m_1$
  2. Sampling: draw $x$ from the input space
  3. Similarity matching: find the index $k(x)$ of the center closest to $x$:
     $k(x) = \arg\min_k \|x(n) - t_k(n)\|$
  4. Updating: adjust the centers:
     $t_k(n+1) = t_k(n) + \eta \left[ x(n) - t_k(n) \right]$ if $k = k(x)$, otherwise $t_k(n+1) = t_k(n)$
  5. Continuation: increment $n$ by 1, go to 2 and continue until no noticeable changes of centers occur
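A sketch of this clustering loop (assuming NumPy; for simplicity a fixed iteration budget replaces the stopping criterion of step 5):

```python
import numpy as np

def find_centers(X, m1, eta=0.1, n_iters=500, seed=0):
    """Online clustering for the RBF centers (steps 1-4 above)."""
    rng = np.random.default_rng(seed)
    t = rng.normal(size=(m1, X.shape[1]))             # 1. random initialization
    for n in range(n_iters):
        x = X[rng.integers(len(X))]                   # 2. draw x from the input space
        k = np.argmin(np.linalg.norm(x - t, axis=1))  # 3. index of the closest center
        t[k] += eta * (x - t[k])                      # 4. move only the winning center
    return t

# Two well-separated clusters: the centers move toward them
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, scale=0.1, size=(50, 2)) for c in ([0, 0], [3, 3])])
print(find_centers(X, m1=2))
```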
Learning Algorithm 2: summary
• Hybrid Learning Process:
  – Clustering for finding the centers.
  – Spreads chosen by normalization.
  – LMS algorithm (see Adaline) for finding the weights.
Learning Algorithm 3

• Apply the gradient descent method for finding centers, spread and weights, by minimizing the (instantaneous) squared error

$E = \frac{1}{2} (y(x) - d)^2$

• Update for:
  – centers: $\Delta t_j = -\eta_{t_j} \dfrac{\partial E}{\partial t_j}$
  – spread: $\Delta \sigma_j = -\eta_{\sigma_j} \dfrac{\partial E}{\partial \sigma_j}$
  – weights: $\Delta w_{ij} = -\eta_{ij} \dfrac{\partial E}{\partial w_{ij}}$
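A sketch of one such update for a Gaussian RBF network $y(x) = \sum_j w_j \exp(-\|x - t_j\|^2 / 2\sigma_j^2)$, assuming a single output and a shared learning rate for simplicity; the gradients below follow from the chain rule:

```python
import numpy as np

def gradient_step(x, d, t, sigma, w, eta=0.05):
    """One gradient-descent update of centers t, spreads sigma and weights w."""
    diff = x - t                               # (x - t_j) for each center, shape (m1, dim)
    r2 = np.sum(diff**2, axis=1)               # ||x - t_j||^2
    phi = np.exp(-r2 / (2 * sigma**2))
    e = w @ phi - d                            # y(x) - d
    grad_w = e * phi                           # dE/dw_j     = e phi_j
    grad_t = (e * w * phi / sigma**2)[:, None] * diff   # dE/dt_j = e w_j phi_j (x - t_j) / sigma_j^2
    grad_sigma = e * w * phi * r2 / sigma**3   # dE/dsigma_j = e w_j phi_j ||x - t_j||^2 / sigma_j^3
    return t - eta * grad_t, sigma - eta * grad_sigma, w - eta * grad_w

# One update on a single training example (illustrative values)
t = np.array([[0.0, 0.0], [1.0, 1.0]])
sigma = np.array([1.0, 1.0])
w = np.array([0.5, -0.5])
t, sigma, w = gradient_step(np.array([0.2, 0.1]), 1.0, t, sigma, w)
print(w)
```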
Comparison with FF NN
RBF-Networks are used for regression and for performing complex (non-linear) pattern classification tasks.
Comparison between RBF networks and FFNN:
• Both are examples of non-linear layered feed-forward networks.
• Both are universal approximators.
Comparison with multilayer NN
• Architecture:
  – RBF networks have one single hidden layer.
  – FFNN networks may have more hidden layers.
• Neuron Model:
  – In RBF the neuron model of the hidden neurons is different from the one of the output nodes.
  – Typically in FFNN hidden and output neurons share a common neuron model.
  – The hidden layer of RBF is non-linear, the output layer of RBF is linear.
  – Hidden and output layers of FFNN are usually non-linear.
Comparison with multilayer NN
• Activation functions:
  – The argument of the activation function of each hidden neuron in an RBF NN computes the Euclidean distance between the input vector and the center of that unit.
  – The argument of the activation function of each hidden neuron in a FFNN computes the inner product of the input vector and the synaptic weight vector of that neuron.
• Approximation:
  – RBF NNs using Gaussian functions construct local approximations to non-linear I/O mappings.
  – FF NNs construct global approximations to non-linear I/O mappings.
Example: Astronomical image processing

We used neural networks to verify the similarity of real astronomical images to predefined reference profiles, so we analyse realistic images and deduce from them a set of parameters able to describe their discrepancy with respect to the ideal, non-aberrated image.

The focal plane image, in an ideal case, is the Airy function (blue line):

$I_p(r) = \dfrac{P S}{\lambda^2 R^2} \left[ \dfrac{2 J_1(\pi r)}{\pi r} \right]^2$

Usually the image is perturbed by aberrations associated to the characteristics of real-world instruments (the red line represents the perturbed image).
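A sketch of this ideal profile using SciPy's Bessel function $J_1$ (the constants $P$, $S$, $R$, $\lambda$ are set to 1 here purely for illustration):

```python
import numpy as np
from scipy.special import j1  # Bessel function of the first kind, order 1

def airy_profile(r, P=1.0, S=1.0, R=1.0, lam=1.0):
    """Ideal (non-aberrated) focal-plane intensity I_p(r)."""
    x = np.pi * np.asarray(r, dtype=float)
    safe = np.where(x == 0.0, 1.0, x)                      # avoid 0/0 at the origin
    core = np.where(x == 0.0, 1.0, 2.0 * j1(safe) / safe)  # limit of 2 J1(x)/x at 0 is 1
    return (P * S) / (lam**2 * R**2) * core**2

r = np.linspace(0.0, 3.0, 7)
print(airy_profile(r))  # peak at r = 0, first dark ring near r ~ 1.22
```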
Example: Astronomical image processing
A realistic image is associated to the Fourier transform of the aperture function:

$I_p(r, \phi) = \dfrac{P S}{\lambda^2 R^2} \left| \int_0^1 \rho \, d\rho \int_0^{2\pi} d\theta \; e^{i \Phi(\rho, \theta)} \, e^{-i \frac{2\pi}{\lambda} \rho r \cos(\theta - \phi)} \right|^2$

which includes the contributions of the classical (Seidel) aberrations:

$\Phi(\rho, \theta) = \dfrac{2\pi}{\lambda} \left[ A_s \rho^4 + A_c \rho^3 \cos\theta + A_a \rho^2 \cos^2\theta + A_d \rho^2 + A_t \rho \cos\theta \right]$

$A_s$: Spherical aberration, $A_c$: Coma, $A_a$: Astigmatism, $A_d$: Field curvature (d = defocus), $A_t$: Distortion (t = tilt)
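The aberration phase is straightforward to evaluate; a sketch (with illustrative coefficient values):

```python
import numpy as np

def seidel_phase(rho, theta, A_s, A_c, A_a, A_d, A_t, lam=1.0):
    """Classical (Seidel) aberration phase Phi(rho, theta)."""
    return (2 * np.pi / lam) * (A_s * rho**4
                                + A_c * rho**3 * np.cos(theta)
                                + A_a * rho**2 * np.cos(theta)**2
                                + A_d * rho**2
                                + A_t * rho * np.cos(theta))

# Phase at the pupil edge (rho = 1) for a pure defocus of 0.3 lambda
print(seidel_phase(1.0, 0.0, A_s=0, A_c=0, A_a=0, A_d=0.3, A_t=0))  # ~1.885 rad
```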
Example: Astronomical image processing

- To ease the computational task the image is encoded using the first moments: $\mu_{0,2}$, $\mu_{2,0}$, $\mu_{3,0}$, $\mu_{4,0}$, $\mu_{1,1}$, $\mu_{1,2}$, $\mu_{2,1}$, $\mu_{1,3}$, $\mu_{2,2}$, $\mu_{3,1}$

where:

$\mu_{m,n} = \dfrac{\iint dx \, dy \left( \dfrac{x - \mu_x}{\sigma_x} \right)^n \left( \dfrac{y - \mu_y}{\sigma_y} \right)^m I(x, y)}{\iint dx \, dy \; I(x, y)}$

- The maximum range of variation considered is ±0.3λ for the training set and ±0.25λ for the test set
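A discrete sketch of this encoding for an image stored as a NumPy array (here $\mu_x$, $\mu_y$, $\sigma_x$, $\sigma_y$ are taken to be the intensity-weighted centroid and widths, an assumption not spelled out on the slide):

```python
import numpy as np

def normalized_moment(I, m, n):
    """mu_{m,n}: central image moment normalized by the image widths."""
    y, x = np.indices(I.shape, dtype=float)
    total = I.sum()
    mu_x, mu_y = (x * I).sum() / total, (y * I).sum() / total
    sig_x = np.sqrt(((x - mu_x)**2 * I).sum() / total)
    sig_y = np.sqrt(((y - mu_y)**2 * I).sum() / total)
    u, v = (x - mu_x) / sig_x, (y - mu_y) / sig_y
    return (u**n * v**m * I).sum() / total

# Example: a synthetic Gaussian blob; mu_{0,2} and mu_{2,0} are ~1 by construction
yy, xx = np.indices((64, 64), dtype=float)
I = np.exp(-((xx - 32.0)**2 + (yy - 30.0)**2) / (2 * 4.0**2))
print(normalized_moment(I, 0, 2), normalized_moment(I, 2, 0))
```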
Example: Astronomical image processing

We performed this task using
• a FNN neural network (with 60 hidden nodes) and
• an RBF neural network (with 175 hidden nodes)
Usually the former requires fewer internal nodes, but more training iterations than the latter.

The training required 1000 iterations (FNN) and 175 iterations (RBF); both optimised networks provide reasonably good results in terms of approximation of the desired unknown function relating aberrations and moments.
Example: Astronomical image processing
[Figure] Performances of the sigmoidal (left) and radial (right) neural networks; from top to bottom, coma, astigmatism, defocus and distortion are shown. The blue (solid) line follows the test set targets, whereas the green dots are the values computed by the networks.