
2005 International Conference on Neural Networks and Brain, Beijing, China, 13-15 October 2005



Application of Levenberg-Marquardt method to the training of spiking neural networks

Sergio M. Silva, António E. Ruano
Centre for Intelligent Systems, University of Algarve, Faro, Portugal

E-mail: [email protected], [email protected]

Abstract- One of the basic aspects of some neural networks is their attempt to approximate as much as possible their biological counterparts. The goal is to achieve a simple and robust network, easy to comprehend and capable of simulating the human brain at a computational level. This paper presents improvements to the Spikepro algorithm, by introducing a new encoding scheme, and illustrates the application of the Levenberg-Marquardt algorithm to this third generation of neural networks.

I. INTRODUCTION

The third generation of neural networks [1], the Spiking Neural Networks (SNN), have a stronger biological inspiration than those from the first and second generations.

Most neural networks use analog values to communicate information between neurons, which compute a non-linear function of their inputs. In SNN the time of an electrical pulse, or spike, is used to encode the information. The way by which a neuron of this type of neural network computes its output is quite simple. All incoming spikes are integrated, resulting in a time-decaying signal. When a spike arrives at the membrane its potential is altered; when the membrane potential crosses a prescribed threshold the neuron emits a spike at that exact time instant and the membrane potential is reverted to the initial value.

Based on this general description, much research effort has been put into the creation of models of spiking neurons [2] and into training algorithms for the various types of neurons. In the framework of supervised training, the Error Back Propagation (BP) algorithm is the one most frequently found in the literature. For SNN this training algorithm is called Spikepro [3]. This algorithm uses the time of occurrence of a single spike to encode the inputs and outputs of the network.

In order to apply the Levenberg-Marquardt (LM) algorithm to SNN, some parts of Spikepro need to be retained. The input encoding used is such that, to the 0 and 1 values correspond firing times of 0 and 6 ms, respectively. The bias neuron fires at 6 ms, analogously to other neural networks where the bias value is 1, given the previously stated correspondence. This condition is not the same as presented in [3], where the bias input is considered at a reference time with a value of 0 ms. The output encoding used is such that, to the 0 and 1 values correspond firing times of 16 and 10 ms, respectively.
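To make the encoding concrete, a minimal sketch is given below; the firing times are the ones stated above, while the function names and the example pattern are our own illustrative choices.

```python
# Illustrative sketch of the input/output encoding described above.
# Times are in milliseconds; function names are ours, not from the paper.

def encode_input(bit):
    """Logical input 0 fires at 0 ms, logical input 1 fires at 6 ms."""
    return 0.0 if bit == 0 else 6.0

def encode_bias():
    """The bias neuron always fires at 6 ms (the time assigned to a logical 1)."""
    return 6.0

def encode_target(bit):
    """Desired output firing time: 16 ms for a logical 0, 10 ms for a logical 1."""
    return 16.0 if bit == 0 else 10.0

# Example: a two-input pattern (0, 1) with target 1
inputs = [encode_input(0), encode_input(1), encode_bias()]   # [0.0, 6.0, 6.0]
target = [encode_target(1)]                                  # [10.0]
```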

The same network topology introduced in [3] is used, consisting of a feed-forward neural network where each synapse terminal is composed of sub-connections that have a delay and a weight associated. The initial conditions employed are the same, except for the case where inhibitory neurons are used and where only positive weights are allowed. In our case, inhibitory neurons need not be enforced, given the fact that the weights can become positive or negative after the training. The only condition that must be fulfilled is that the weights must be arranged in such a way that, before the first weight update, all neurons fire.

This paper is organized as follows: in the following two subsections the Spikepro and the LM algorithms are presented. In the next sections the methodology used and the results obtained with the Spikepro and Spikelm algorithms are presented. In the last section conclusions are drawn.

A. Spikepro

In order to explain the Spikepro algorithm it is necessary to review the spiking neuron model used, in our case the SRM model defined in [2]. This model can be adapted using the spike-response function, reflecting the dynamics necessary for this case.

The equation that reflects the dynamics of the membrane potential is defined by:

$$x_j(t) = \sum_{i \in \Gamma_j} \sum_{k=1}^{m} w_{i,j}^{k}\, y_{i,j}^{k}(t) \qquad (1)$$

where $w_{i,j}^{k}$ is the weight of sub-connection $k$, and $y_{i,j}^{k}(t)$ is the function of sub-connection $k$, both between neurons $i$ and $j$. The function of sub-connection $k$ is given by:

$$y_{i,j}^{k}(t) = \varepsilon\!\left(t - (t_i + d^{k})\right) \qquad (2)$$

where $t_i$ is the firing time of neuron $i$, $d^{k}$ is the delay associated with the sub-connection $k$ of neuron $i$, and $\varepsilon$ is the response function of a neuron to one spike, which in turn is defined by an exponential function that decays with time:

$$\varepsilon(t) = \begin{cases} e^{-t/\tau} & \text{if } t > 0 \\ 0 & \text{if } t \leq 0 \end{cases} \qquad (3)$$
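As an illustration of equations (1)-(3), a small numerical sketch is given below. The threshold value, time grid and all names are our own choices; only the functional form follows the equations above.

```python
import numpy as np

def epsilon(t, tau):
    """Spike-response function of eq. (3): exponential decay for t > 0, zero otherwise."""
    return np.exp(-t / tau) if t > 0 else 0.0

def membrane_potential(t, pre_times, weights, delays, tau):
    """Eq. (1)-(2): x_j(t) = sum_i sum_k w_{i,j}^k * eps(t - (t_i + d^k))."""
    x = 0.0
    for t_i, w_i, d_i in zip(pre_times, weights, delays):
        for w_k, d_k in zip(w_i, d_i):        # m sub-connections per synapse terminal
            x += w_k * epsilon(t - (t_i + d_k), tau)
    return x

def firing_time(pre_times, weights, delays, tau, theta=1.0, t_max=30.0, dt=0.01):
    """First time x_j(t) crosses the threshold theta; None if the neuron never fires."""
    for t in np.arange(0.0, t_max, dt):
        if membrane_potential(t, pre_times, weights, delays, tau) >= theta:
            return t
    return None

# Example: two presynaptic spikes (0 ms and 6 ms), 3 sub-connections with delays 1-3 ms
t_a = firing_time(pre_times=[0.0, 6.0],
                  weights=[[0.4, 0.3, 0.2], [0.4, 0.3, 0.2]],
                  delays=[[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]],
                  tau=7.0)
print(t_a)   # the neuron fires once its potential reaches the threshold
```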

The model functionality is illustrated in figure 1.

Fig. 1. Model of a neuron firing.

The Spikepro algorithm, like the BP algorithm for the second generation of neural networks, tries to minimize the error between the current and the desired output of the network. The error function used is the least mean square, defined by:





$$E = \frac{1}{2}\sum_{j=1}^{N}\left(t_j^{a} - t_j^{d}\right)^{2} \qquad (4)$$

where $t_j^{a}$ is the current firing time, $t_j^{d}$ is the desired firing time and $N$ is the number of neurons in the output layer. After some mathematical operations the update rule between the hidden and the output layer, considering a network with $i$ inputs, $h$ neurons in the hidden layer and $j$ neurons in the output layer, is defined by:

$$\Delta w_{h,j}^{k} = -\eta\, \delta_j\, y_{h,j}^{k}(t_j^{a}) \qquad (5)$$

where $\eta$ is the learning rate, $y_{h,j}^{k}$ is the sub-connection $k$ function evaluated at the firing time of neuron $j$, and $\delta_j$ is a mathematical expression given by:

$$\delta_j = \frac{t_j^{a} - t_j^{d}}{\sum_{k,h} w_{h,j}^{k}\, \dfrac{\partial y_{h,j}^{k}(t_j^{a})}{\partial t_j^{a}}} \qquad (6)$$

The update rule between the input and hidden layer is given by:

$$\Delta w_{i,h}^{k} = -\eta\, \delta_h\, y_{i,h}^{k}(t_h^{a}) \qquad (7)$$

where $\delta_h$ is defined by:

$$\delta_h = -\,\frac{\sum_{j \in \Gamma^{h}} \delta_j \sum_{k} w_{h,j}^{k}\, \dfrac{\partial y_{h,j}^{k}(t_j^{a})}{\partial t_h^{a}}}{\sum_{k}\sum_{i \in \Gamma_h} w_{i,h}^{k}\, \dfrac{\partial y_{i,h}^{k}(t_h^{a})}{\partial t_h^{a}}} \qquad (8)$$

$\Gamma^{h}$ is the set of immediate successor neurons of neuron $h$. Comparing equation 8 with the same equation in [3], a difference in sign becomes evident. This happens because in [3] the mathematical approach to the layers was not correctly applied. In more detail, the mathematical deduction of $\delta_h$ can be expressed by:

$$\delta_h = \frac{\partial t_h^{a}}{\partial x_h(t_h^{a})} \sum_{j \in \Gamma^{h}} \delta_j \sum_{k} w_{h,j}^{k}\, \frac{\partial y_{h,j}^{k}(t_j^{a})}{\partial t_h^{a}} \qquad (9)$$

where $x_h(t)$ is the membrane potential of neuron $h$. The first quotient in equation 9, using the approximation defined in [3], is given by:

$$\frac{\partial t_h^{a}}{\partial x_h(t_h^{a})} = \frac{-1}{\sum_{k}\sum_{i \in \Gamma_h} w_{i,h}^{k}\, \dfrac{\partial y_{i,h}^{k}(t_h^{a})}{\partial t_h^{a}}} \qquad (10)$$

Equation 8 appears from the substitution of (10) in (9). Equations 5 to 8 define the update rules used in Spikepro.
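The sketch below shows how the update rules (5)-(8) can be applied for one training pattern, with the synaptic responses y and their time derivatives assumed to have been collected during a forward pass such as the one sketched above; array names, shapes and the example values are our own.

```python
import numpy as np

# Shapes (illustrative): I inputs, H hidden neurons, J output neurons, K sub-connections.
# y_hj[h, j, k]  = y_{h,j}^k(t_j^a)      dy_hj[h, j, k] = d y_{h,j}^k(t_j^a) / dt_j^a
# y_ih[i, h, k]  = y_{i,h}^k(t_h^a)      dy_ih[i, h, k] = d y_{i,h}^k(t_h^a) / dt_h^a

def output_deltas(t_a, t_d, w_hj, dy_hj):
    # eq. (6): delta_j = (t_j^a - t_j^d) / sum_{k,h} w_{h,j}^k dy_{h,j}^k/dt
    return (t_a - t_d) / np.einsum('hjk,hjk->j', w_hj, dy_hj)

def hidden_deltas(delta_j, w_hj, dy_hj, w_ih, dy_ih):
    # eq. (8): delta_h = -(sum_j delta_j sum_k w_{h,j}^k dy_{h,j}^k/dt)
    #                     / (sum_{k,i} w_{i,h}^k dy_{i,h}^k/dt)
    num = np.einsum('j,hjk->h', delta_j, w_hj * dy_hj)
    den = np.einsum('ihk->h', w_ih * dy_ih)
    return -num / den

def spikeprop_step(eta, w_hj, w_ih, y_hj, dy_hj, y_ih, dy_ih, t_a, t_d):
    """One Spikepro weight update for a single pattern, following eqs. (5)-(8)."""
    delta_j = output_deltas(t_a, t_d, w_hj, dy_hj)
    delta_h = hidden_deltas(delta_j, w_hj, dy_hj, w_ih, dy_ih)
    w_hj = w_hj - eta * delta_j[None, :, None] * y_hj     # eq. (5)
    w_ih = w_ih - eta * delta_h[None, :, None] * y_ih     # eq. (7)
    return w_hj, w_ih

# Example with random responses for a 3-5-1 network and 16 sub-connections:
rng = np.random.default_rng(0)
I, H, J, K = 3, 5, 1, 16
w_hj, y_hj, dy_hj = (rng.uniform(0.1, 1.0, (H, J, K)) for _ in range(3))
w_ih, y_ih, dy_ih = (rng.uniform(0.1, 1.0, (I, H, K)) for _ in range(3))
w_hj, w_ih = spikeprop_step(0.01, w_hj, w_ih, y_hj, dy_hj, y_ih, dy_ih,
                            t_a=np.array([12.0]), t_d=np.array([10.0]))
```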

B. Levenberg-Marquardt algorithm

In order to apply the LM algorithm to SNN, the Jacobian matrix J must be computed and the update rule must be redefined. The Hessian matrix of the cost function can be approximated [4] by:

$$H[k] \approx J^{T}[k]\, J[k] \qquad (11)$$

Using this approach, the search direction of the LM algorithm, $p[k]$, can be computed as the solution of the system defined by:

$$\left(J^{T}[k]\, J[k] + v[k]\, I\right) p[k] = -J^{T}[k]\, e[k] \qquad (12)$$

where $v[k]$ controls the search direction and the magnitude of $p[k]$. When $v[k]$ is 0, $p[k]$ is identical to the Gauss-Newton direction [4]; when $v[k]$ tends to infinity, $p[k]$ tends to a vector of zeros and to a steepest descent direction. The Jacobian matrix and the gradient vector are given by:

$$J[k] = \left[\, J_{h,j} \;\; J_{i,h} \,\right] = \left[\, \frac{\delta_j}{t_j^{a} - t_j^{d}}\, y_{h,j}^{k}(t_j^{a}) \;\;\; \frac{\delta_h}{t_j^{a} - t_j^{d}}\, y_{i,h}^{k}(t_h^{a}) \,\right] \qquad (13)$$

$$g[k] = -J^{T}[k]\, e[k] \qquad (14)$$

where $e[k]$ is the error vector. Having the prediction of the error and the variation of the cost function, it is possible to calculate $r[k]$, given by:

$$r[k] = \frac{\Delta E[k]}{\Delta E^{p}[k]} \qquad (15)$$

where $\Delta E[k]$ is the variation of the cost function, given by equation 16, and $\Delta E^{p}[k]$ is the predicted variation of the same function, defined in equation 17.

$$\Delta E[k] = E(w[k]) - E(w[k] + p[k]) \qquad (16)$$

$$\Delta E^{p}[k] = E(w[k]) - \frac{\left(e^{p}[k]\right)^{T} e^{p}[k]}{2} \qquad (17)$$

$E(w[k] + p[k])$ is the cost function in the next training iteration, considering a $p[k]$ update of the weights. $e^{p}[k]$ is the predicted error, given by:

$$e^{p}[k] = e[k] - J[k]\, p[k] \qquad (18)$$




The ratio $r[k]$ is employed to control the update of $v$. A commonly used heuristic is:

$$v[k+1] = \begin{cases} 4\, v[k] & \text{if } r[k] < 0.25 \\ v[k]/2 & \text{if } r[k] > 0.75 \\ v[k] & \text{otherwise} \end{cases} \qquad (19)$$

If $r[k] < 0$ only $v$ is updated, the model parameters being kept unchanged.
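Putting equations (11)-(19) together, one Spikelm iteration can be sketched as follows. The Jacobian J and the error vector e are assumed to be assembled per pattern from equations (13) and (4), and evaluate_error stands for a forward pass of the network returning the new error vector; all names are ours.

```python
import numpy as np

def spikelm_step(w, e, J, v, evaluate_error):
    """One LM iteration over the flattened weight vector w (illustrative sketch).

    e              : error vector t^a - t^d over all patterns and output neurons
    J              : Jacobian of eq. (13), one row per error component
    v              : current regularisation factor v[k] (the paper starts it at
                     the order of magnitude of the error vector, e.g. 1e-3)
    evaluate_error : callable returning the error vector for a given weight vector
    """
    # eq. (12): solve (J^T J + v I) p = -J^T e for the search direction p[k]
    p = np.linalg.solve(J.T @ J + v * np.eye(w.size), -J.T @ e)

    E_old   = 0.5 * e @ e                         # cost of eq. (4)
    e_pred  = e - J @ p                           # predicted error, eq. (18)
    dE_pred = E_old - 0.5 * e_pred @ e_pred       # predicted variation, eq. (17)
    e_new   = evaluate_error(w + p)               # forward pass with updated weights
    dE      = E_old - 0.5 * e_new @ e_new         # actual variation, eq. (16)
    r       = dE / dE_pred                        # ratio of eq. (15)

    # eq. (19): adapt the regularisation factor from the ratio r[k]
    if r < 0.25:
        v = 4.0 * v
    elif r > 0.75:
        v = v / 2.0

    # if r[k] < 0 the step is rejected and only v is updated
    if r >= 0.0:
        w = w + p
    return w, v, r
```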

II. METHODOLOGY

In a first step the XOR function was trained with Spikepro. A vector of 400 input patterns was used and different values of the learning rate and synapse time constants were considered. The results considered for comparison are the criterion evolution, the number of iterations and the execution time. All simulations were carried out using the Matlab language on a computer with two AMD Athlon 2.4 GHz processors and 1 GB of RAM.

In a second step the same function was trained under the same conditions, with the proposed LM algorithm applied to the SNN, which will be referred to as Spikelm.

For both training algorithms the training process is stopped when E falls below 1 ms. The synapse time constant values considered were 7, 10, 13 and 16 ms. The learning rates used in Spikepro were 1000, 5000 and 10000. These values are different from those used in [3] due to a scaling factor of $10^{-6}$, used to scale all time values to seconds instead of milliseconds as used in the referenced work. For example, a learning rate of 10000 scaled by $10^{-6}$ corresponds to 0.01 in the work presented in [3]. The membrane potential threshold for all neurons was considered to be 1.

The results from these two sets of experiments were compared with those presented in [3]. In an attempt to improve the results obtained, the LM training algorithm was also applied to a different network structure.

Fig. 2. Evolution of the XOR function training, considering a learning rate of 1000 and different values of the time constant τ.

Fig. 3. Evolution of the XOR function training, considering a learning rate of 5000 and different values of the time constant τ.

III. RESULTS

A. Spikepro

Considering a network structure with 3 input neurons, where one is a bias neuron that is applied only to the hidden layer, 5 neurons in the hidden layer, 1 neuron in the output layer and 16 sub-connections for each synapse, the evolution of the cost function is shown in figures 2, 3 and 4, for the learning rate values of 1000, 5000 and 10000, respectively.

Observing figure 4, corresponding to the best learning rate presented in [3], it can be seen that the evolution of the training criterion is coherent, but the number of iterations is smaller than that presented in [3].

During the execution of the experiments a certain sensitivity of the Spikepro algorithm to the initial random weight values was found. Some initial values cause one neuron not to fire, resulting in an undesired early stop of the algorithm.

Table I presents the number of iterations obtained for the different learning rates and time constants considered.

Observing table I, it can be concluded that the values are all smaller when compared with the results obtained in [3], where the best case was around 250 iterations. The reduction is always bigger than 50%, and is due to the different encoding applied. It can also be said that the best learning rate is 5000. The time needed to accomplish the trainings is expressed in table II.

The execution times presented in table II are greater than those obtained when applying the BP algorithm to the second generation of neural networks (results not presented here). If the comparison is made with the original Spikepro, the smaller number of iterations means that the execution times in table II are certainly smaller.

B. Spikelm

The initial value of the LM algorithm regularization factor, $v[k]$, should be of the same order of magnitude as the error vector, in this case $10^{-3}$.



Fig. 4. Evolution of the XOR function training, considering a learning rate of 10000 and different values of the time constant τ.

TABLE I
NUMBER OF ITERATIONS WITH DIFFERENT LEARNING RATES AND TIME CONSTANTS τ

τ (ms)   1000   5000   10000
  7        97     28      72
 10        82     38      57
 13        61     32     109
 16        73     60     103

Fig. 5. Evolution of the XOR function training for different values of the time constant τ, considering Spikelm.

TABLE III
NUMBER OF ITERATIONS AND TIME NEEDED TO TRAIN THE FUNCTION XOR WITH SPIKELM

τ (ms)   Number of iterations   Time in seconds
  7               14                 93.1
 10               10                 65.9
 13               13                 82.9
 16               11                 67.0

Figure 5 shows the evolution of the training criterion, considering the same conditions and network topology as used for Spikepro. The evolution of the training criterion is stable and presents fast convergence. Clearly the number of iterations is very small when compared to the current implementation of Spikepro, and much smaller if compared with the results presented in [3]. Table III presents the number of iterations and execution times obtained with Spikelm.

The number of iterations is of the same order as found when using the LM algorithm with a Multi Layer Perceptron, but the time spent in training is higher due to a greater number of free parameters in the SNN.

TABLE II
TIME, IN SECONDS, NEEDED TO TRAIN THE FUNCTION XOR WITH DIFFERENT LEARNING RATES AND TIME CONSTANTS τ

τ (ms)    1000    5000    10000
  7      768.5   216.0   578.6
 10      675.2   283.3   509.9
 13      552.0   260.7   922.1
 16      575.8   518.6   842.3

A reduction of about 90% in execution time is found by comparing tables II and III, favouring the Spikelm algorithm.

With the aim of further reducing the training time of Spikelm, other network structures were tested. For these structures, sub-connection delays from 1 up to the number of sub-connections of the synapse between two neurons were considered. After some experimentation it was concluded that the minimum number of sub-connections should be larger than the time interval used to encode the inputs, in this case 6. In a first attempt a network with 2 hidden neurons and 7 sub-connections for each synapse terminal was used, but Spikelm failed to converge: after some iterations one neuron stopped firing, causing Spikelm to terminate abnormally. Augmenting this network with one more hidden neuron, the training was successfully applied. For this network, results are presented in figure 6 and table IV.

Figure 6 shows that the training criterion evolution is stable and that fast convergence is achieved. As expected, the execution times presented in table IV are smaller than those in table III, due to the great reduction in the number of network parameters. A reduction of around 80% was achieved. The change in the number of iterations is not so significant and varies with the initial weight values.
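As a rough cross-check of the parameter reduction mentioned above, the number of adjustable weights of the two structures can be counted as follows (assuming, as stated earlier, that the bias neuron feeds only the hidden layer and that the sub-connection delays are kept fixed; the counts are ours, not reported in the paper):

```python
def n_weights(n_in, n_hidden, n_out, n_subconn):
    """Adjustable weights of a feed-forward SNN with bias feeding only the hidden layer."""
    return (n_in * n_hidden + n_hidden * n_out) * n_subconn

full    = n_weights(3, 5, 1, 16)   # 3-5-1 topology, 16 sub-connections -> 320 weights
reduced = n_weights(3, 3, 1, 7)    # 3-3-1 topology,  7 sub-connections ->  84 weights
print(full, reduced, round(1 - reduced / full, 2))   # roughly 74% fewer free parameters
```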






Fig. 6. Evolution of the XOR function training in a 3-3-1 network topology with 7 sub-connections for each synapse terminal.

TABLE IV
NUMBER OF ITERATIONS AND TIME NEEDED TO TRAIN THE FUNCTION XOR IN A 3-3-1 NETWORK TOPOLOGY WITH 7 SUB-CONNECTIONS FOR EACH SYNAPSE TERMINAL

τ (ms)   Number of iterations   Time in seconds
  7               10                 15.2
 10                9                 12.6
 13                9                 12.6
 16               10                 13.9

IV. CONCLUSIONS

This paper presents corrections and improvements to the standard Spikepro training algorithm. It also demonstrates how the LM method can be applied to train Spiking Neural Networks.

It was shown that the neuron bias inputs can be encoded like any other input and need not be considered at a reference time. Given the results obtained, less than 50% of the iterations of the standard Spikepro, this constitutes a good improvement to the original Spikepro algorithm.

The application of the LM method to train SNNs resulted in an improvement of about 90% in execution time when compared to the corrected Spikepro algorithm. It was also found that the number of sub-connections for each synapse between two neurons should be greater than the time interval used to encode the inputs.

The problem of weight initialisation remains open and constitutes a good framework for future work, given the fact that random initialisation sometimes results in abnormal termination of the training algorithms caused by a non-firing neuron.

REFERENCES

[1] W. Maass, "Networks of spiking neurons: The third generation of neural network models," Institute for Theoretical Computer Science, August 1997.
[2] W. Gerstner and W. M. Kistler, Spiking Neuron Models: Single Neurons, Populations, Plasticity. Cambridge University Press, 2002.
[3] S. Bohte, J. Kok, and H. La Poutré, "Spike-prop: error-backpropagation in multi-layer networks of spiking neurons," Neurocomputing, vol. 48, pp. 17-37, 2002.
[4] A. E. Ruano, Ed., Intelligent Control Systems using Computational Intelligence Techniques. London, United Kingdom: The Institution of Electrical Engineers, 2005, ch. 2, pp. 37-79.


