Application of An Improved Particle Swarm Optimization Algorithm for Neural Network Training

Fuqing ZHAO*, Zongyi REN, Dongmei YU
School of Computer and Communication, Lanzhou University of Technology
730050 Lanzhou, P.R. China
{zhaofq & yudm}@mail2.lut.cn

Yahong YANG
College of Civil Engineering, Lanzhou University of Technology
Lanzhou, Gansu, 730050, P.R. China
[email protected]

Abstract - Particle Swarm Optimization (PSO) is an evolutionary computation technique developed by Kennedy and Eberhart in 1995 that has been applied successfully to various optimization problems. The PSO idea is inspired by natural concepts such as fish schooling, bird flocking and human social relations. It combines local search (by self experience) and global search (by neighboring experience), giving it high search efficiency. Backpropagation (BP) is generally used for neural network training, and choosing a proper training algorithm is very important. In this paper, we present a modified particle swarm optimization based training algorithm for neural networks. The proposed method modifies the trajectories (positions and velocities) of the particles based on the best positions visited earlier by themselves and by other particles, and also incorporates a population diversity method to avoid premature convergence. Experimental results demonstrate that the modified PSO is a useful tool for training neural networks.

I. INTRODUCTION

The role of artificial neural networks in present-day applications is gradually increasing, and faster algorithms are being developed for training neural networks. In general, backpropagation is the method used for training neural networks [1,2]. Gradient descent, conjugate gradient descent, resilient, BFGS quasi-Newton, one-step secant, Levenberg-Marquardt and Bayesian regularization are all different forms of the backpropagation training algorithm [3,4]. The storage and computational requirements of these algorithms differ; some are good for pattern recognition and others for function approximation, but they all have drawbacks of one kind or another, such as the network sizes they can handle and their associated storage requirements.


Certain training algorithms are suitable only for some types of applications; for example, an algorithm which performs well for pattern recognition may not for classification problems and vice versa, and in addition some cannot deliver high accuracy. It is difficult to find a particular training algorithm that is the best for all applications under all conditions all the time.

A newly developed algorithm known as particle swarm optimization is an addition to the existing evolutionary techniques, and is based on a simulation of the behavior of a flock of birds or a school of fish. The main concept is to utilize the communication involved in such swarms or schools. Some previous work on neural network training using particle swarm optimization has been reported [5]-[7], but none has compared it against conventional training techniques. In this paper, particle swarm optimization is compared with the conventional backpropagation (a gradient descent algorithm) for training a feedforward neural network to learn a non-linear function. The problem considered is how fast and how accurately the neural network weights can be determined by BP and by PSO when learning a common function. A detailed comparison of BP to PSO is presented with regard to their computational requirements.

The paper is organized as follows. In Section 2, the architecture of the feedforward neural network considered in this paper is explained, along with the forward path and the backward path of the backpropagation method. In Section 3, a brief overview of particle swarm optimization is given and its implementation is explained. Section 4 describes the experimental settings, followed by the results in Section 5 and concluding remarks in Section 6.

II. FEEDFORWARD NEURAL NETWORKS

Neural networks are known to be universal approximators for any non-linear function, and they are generally used for mapping in error-tolerant problems that have large amounts of noisy training data.

This work is supported by the 863 High Technology Plan Foundation of China (grant No. 2002AA415270) and the Natural Science Foundation of Gansu Province (grant No. 3ZS042-B25-005).



Training algorithms are critical when neural networks are applied to high speed applications with complex nonlinearities [8,9]. A neural network consists of many layers, namely an input layer, a number of hidden layers and an output layer. The input layer and the hidden layer are connected by synaptic links called weights, and likewise the hidden layer and the output layer also have connection weights. When more than one hidden layer exists, weights exist between such layers. Neural networks use some sort of "learning" rule by which the connection weights are determined in order to minimize the error between the neural network output and the desired output. A three-layer feedforward neural network is shown in Fig. 1.

The feedforward path equations for the network in Fig. 1, with two input neurons, four hidden neurons and one output neuron, are given below. The first input is x and the second is a bias input (1). The activation of the hidden neurons is given by Eq. (1):

    a_i = sum_j w_ij x_j,   i = 1,...,4,  j = 1,2    (1)

where w_ij is the weight and X = [x 1]^T is the input vector.

The hidden layer output, called the decision vector d, is calculated as follows for sigmoidal activation functions:

    d_i = 1 / (1 + e^(-a_i)),   i = 1,...,4    (2)

The output of the neural network is determined as follows:

    (3)

Fig. 1. Feedforward neural network with one hidden layer.
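To make Eqs. (1)-(2) concrete, the following minimal sketch computes the hidden activations and the decision vector of the Fig. 1 network; the output form and the hidden-to-output weights `w_out` are illustrative assumptions, since the body of Eq. (3) is not given above.

```python
import numpy as np

def forward_pass(x, w_hidden, w_out):
    """Forward path of the Fig. 1 network: inputs x and a bias of 1,
    four sigmoidal hidden neurons, one output neuron (output form assumed)."""
    X = np.array([x, 1.0])                # input vector [x, bias]
    a = w_hidden @ X                      # Eq. (1): a_i = sum_j w_ij * x_j
    d = 1.0 / (1.0 + np.exp(-a))          # Eq. (2): sigmoidal decision vector
    return w_out @ d                      # Eq. (3): assumed weighted sum of d

# example with random weights: 4x2 input-to-hidden, 4 hidden-to-output weights
rng = np.random.default_rng(0)
print(forward_pass(0.3, rng.standard_normal((4, 2)), rng.standard_normal(4)))
```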

III. IMPROVED PARTICLE SWARM OPTIMIZATION

Particle swarm optimization is an evolutionary algorithm similar to genetic algorithms [10] and simulated annealing, but it operates on a model of social interaction between independent agents and utilizes swarm intelligence to achieve the goal of optimizing a problem-specific fitness function [11].

A. Simple particle swarm optimization

The PSO randomly initializes the position and velocity of each particle within the swarm at the beginning of the optimization. Each position represents a possible solution to the problem, and is specified as the matrix

    X = [ x_11  x_12  ...  x_1N
          x_21  x_22  ...  x_2N
          ...
          x_M1  x_M2  ...  x_MN ]

where M is the number of particles in the simulation and N is the number of dimensions of the problem. Each particle also has an associated velocity, which is a function of the distance from its current position to the positions which have previously resulted in a good fitness value. The velocity matrix must be the same size as the position matrix, and is represented as

    V = [ v_11  v_12  ...  v_1N
          v_21  v_22  ...  v_2N
          ...
          v_M1  v_M2  ...  v_MN ]

In order to update the velocity matrix at each iteration of the algorithm, every particle must also "know" the global best and personal best position vectors. The global best-position vector specifies the location in solution space at which the best fitness value was obtained. The global best may be attained by any particle at any iteration up to the present one. Similarly, the personal best-position vector specifies the position at which any given particle achieved its best fitness value up to the current iteration. Therefore, although every particle in the swarm accesses the same global best position, the personal best positions are specific to a given particle. The personal best positions can also be represented by an M x N matrix:

    P = [ p_11  p_12  ...  p_1N
          p_21  p_22  ...  p_2N
          ...
          p_M1  p_M2  ...  p_MN ]

The global best position is an N-dimensional vector given by

    G = [ g_1  g_2  ...  g_N ]

X, V, P, and G together contain all of the information required by the particle-swarm algorithm. The heart of the algorithm, however, is the process by which these matrices are updated on each successive iteration.



In an effort to numerically model the behavior of groups of natural agents such as fish or birds, the algorithm requires that the position of each particle should move towards both the global best and its personal best positions. For this to occur, the velocity of the particle must be appropriately chosen. The velocity matrix is updated each iteration according to [11]:

    v_mn = v_mn + c1 η1 (p_mn - x_mn) + c2 η2 (g_n - x_mn)    (4)

where η1 and η2 are uniform random variables in the range [0,1]. For every dimension, the particles move in the direction specified by the velocity matrix according to the simple relationship

    X = X + V    (5)
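As an illustration of Eqs. (4)-(5), the following sketch performs one update of the position and velocity matrices. The acceleration coefficients c1 and c2 default to 2.0 here as an assumption (the paper does not state their values), and the random numbers are drawn per particle and per dimension.

```python
import numpy as np

def pso_step(X, V, P, G, c1=2.0, c2=2.0, rng=None):
    """One iteration of the basic PSO update of Eqs. (4)-(5).

    X, V, P are M x N arrays (positions, velocities, personal bests);
    G is the N-dimensional global best vector.
    """
    rng = rng or np.random.default_rng()
    eta1 = rng.uniform(size=X.shape)   # uniform random numbers in [0, 1]
    eta2 = rng.uniform(size=X.shape)
    V = V + c1 * eta1 * (P - X) + c2 * eta2 * (G - X)   # Eq. (4)
    X = X + V                                           # Eq. (5)
    return X, V
```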

B. Improved particle swarm optimization

Most versions of PSO operate in continuous, real-number space. In continuous versions of PSO, the velocity consists of three parts: the first is the previous velocity of the particle, and the second and third parts are the terms associated with the best positions visited in the past. The PSO algorithm updates a population of particles on the basis of information about each particle's previous best performance and the best particle in the population. In PSO, only the best positions give out information to the others. One notices that in GAs, chromosomes share information with each other. As the information sharing mechanism in PSO is significantly different from that of GAs, all the particles tend to converge to the best solution quickly. For a discrete problem expressed in binary notation, a particle moves in a search space restricted to 0 or 1 on each dimension. In a binary problem, updating a particle represents changing a bit, which should be in either state 1 or 0, and the velocity represents the probability of bit x_i taking the value 1 or 0.

According to the information sharing mechanism of PSO, a modified PSO for variable selection is proposed as follows. The velocity v_id of every individual is a random number in the range (0,1). The resulting change in position is then defined by the following rule:

    x_id = x_id    if 0 < v_id <= α
    x_id = p_id    if α < v_id <= (1 + α)/2        (6)
    x_id = p_gd    if (1 + α)/2 < v_id <= 1

where α is a random value in the range (0,1) named the static probability. The initial value of α is 0.5.

Though the velocity in the modified PSO is different from that in the continuous version of PSO, the information sharing mechanism and the way a particle is updated by following the two best positions are the same in the two PSO versions. Some elements of x_i are kept fixed according to Eq. (6), which is similar to the first term on the right side of Eq. (4). Without this part, the "flying" particles would be determined only by their best positions in history, and all particles would tend to move toward the same position, resembling a local search. In this sense, Eq. (6) really provides the particles a tendency to expand the search space, i.e. a global search ability. The second and third cases are similar to the second and third terms on the right side of Eq. (4). Without these two parts, the particles would keep on "flying" randomly and PSO would not be able to find a meaningful solution. There should be a balance between the local and global search ability. The static probability α plays the role of balancing the global and local search. That is to say, the larger the value of the parameter α, the greater the probability for the modified PSO to overleap local optima. On the other hand, a small value of the parameter α is favorable for the particles to follow the two best past positions and for the algorithm to converge more quickly. The modified PSO is designed to have more exploration ability at the beginning and more exploitation ability, searching the local area around the particles, towards the end. Accordingly, we let the parameter decrease with the generations: the static probability α starts at 0.5 and decreases to 0.33 when the iteration terminates. Even so, preliminary experiments indicate that such a PSO version still tends to converge to local optima. To circumvent this drawback and improve the ability of the modified PSO algorithm to overleap local optima, 10% of the particles are forced to fly randomly, not following the two best particles.

By using a decreasing static probability and a certain percentage of randomly flying particles to overleap local optima, the modified PSO retains satisfactory convergence characteristics.
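The sketch below illustrates the modified update of Eq. (6) together with the two mechanisms just described: a static probability α that decreases from 0.5 to 0.33 over the run, and 10% of the particles flying randomly instead of following the two best positions. The linear decrease schedule and the way the random particles are re-positioned (uniformly in (0,1)) are illustrative assumptions.

```python
import numpy as np

def static_probability(t, t_max, start=0.5, end=0.33):
    """Assumed linear decrease of the static probability over the run."""
    return start + (end - start) * t / t_max

def modified_pso_step(X, P, G, alpha, random_frac=0.1, rng=None):
    """One iteration of the modified (discrete) PSO update of Eq. (6)."""
    rng = rng or np.random.default_rng()
    M, N = X.shape
    v = rng.uniform(size=(M, N))             # velocities are random numbers in (0, 1)
    X_new = X.copy()                         # elements with v <= alpha keep their value
    follow_p = (v > alpha) & (v <= (1 + alpha) / 2)
    follow_g = v > (1 + alpha) / 2
    X_new[follow_p] = P[follow_p]                              # follow personal best
    X_new[follow_g] = np.broadcast_to(G, (M, N))[follow_g]     # follow global best
    # 10% of the particles fly randomly, not following the two best positions
    idx = rng.choice(M, size=max(1, int(random_frac * M)), replace=False)
    X_new[idx] = rng.uniform(size=(len(idx), N))
    return X_new
```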

In particle swarm optimization, the coefficients c1 and c2 should have different signs. Their signs indicate whether a particle will be moving towards or away from its global best. Pursuing multiple objectives using different particles improves the chances of finding new and better candidate solutions for the optimization problem to be solved. In some cases, this requires moving away from, rather than remaining in, the neighborhood of the best candidate solutions discovered so far. The number of phases and groups, and the temporary goals of search for each group and each phase, must be chosen prior to PSO algorithm execution.

Particles cycle through phases with different temporary goals. The change of temporary goals of search within each group is determined by the phase change frequency (PCF). For example, when PCF = 2, the phase of a particle changes after every two swarms (iterations) of algorithm execution. We also investigated an adaptive variant of the algorithm, in which no PCF parameter is needed: a phase change is initiated if no improvement in solution quality has been observed in a predetermined number of iterations executed within the current phase. This is the variant for which experimental results are reported in this paper.
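A rough sketch of the adaptive phase-change rule described above: instead of a fixed PCF, a group's phase changes when the best solution quality has not improved for a predetermined number of iterations. The class name and the `patience` threshold are illustrative; the paper does not specify this bookkeeping.

```python
class AdaptivePhaseChange:
    """Adaptive variant: switch a group's temporary search goal when quality stalls."""

    def __init__(self, num_phases, patience):
        self.num_phases = num_phases   # number of temporary goals a group cycles through
        self.patience = patience       # iterations without improvement before a change
        self.phase = 0
        self.best = float("inf")
        self.stall = 0

    def update(self, best_fitness):
        if best_fitness < self.best:   # improvement observed: reset the stall counter
            self.best = best_fitness
            self.stall = 0
        else:
            self.stall += 1
            if self.stall >= self.patience:
                self.phase = (self.phase + 1) % self.num_phases
                self.stall = 0
        return self.phase
```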

Many evolutionary algorithms rely on a restart mechanism to remedy the possibility of being stuck in local optima of the objective function.



In the improved PSO, particle velocities are periodically re-initialized to random values after a predetermined number of iterations governed by the velocity change variable (VC). Some evolutionary algorithms incorporate a hill climbing or local improvement operator, since many evolutionary global search mechanisms do not necessarily reach even local optima. In the improved PSO, the position of each particle is calculated as in the original PSO algorithm (Eq. (5)), but the particle does not move to the new location unless doing so improves its performance (fitness), ensuring that the current position of a particle is the best one it has encountered so far.
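The following sketch combines the two mechanisms just described: velocities re-initialized to random values every VC iterations, and a candidate move accepted only if it improves the particle's fitness, so the current position is always the best one encountered so far. The velocity bounds and the convention that lower fitness is better are assumptions.

```python
import numpy as np

def greedy_iteration(X, V, fitness, it, vc=10, v_max=5.0, rng=None):
    """Greedy position update with periodic velocity re-initialization
    (lower fitness = better)."""
    rng = rng or np.random.default_rng()
    if it > 0 and it % vc == 0:
        V = rng.uniform(-v_max, v_max, size=V.shape)   # periodic velocity restart
    X_cand = X + V                                     # candidate positions, Eq. (5)
    f_old = np.apply_along_axis(fitness, 1, X)
    f_new = np.apply_along_axis(fitness, 1, X_cand)
    improved = f_new < f_old
    X = np.where(improved[:, None], X_cand, X)         # move only the improving particles
    return X, V
```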

IV. EXPERIMENTAL DETAILS

In training feedforward neural networks (FNNs), the main goal is to obtain a set of weights that minimizes the mean squared error (MSE). In previous work, the standard PSO algorithm has been used to train FNNs. Other variations of PSO have also been used to train FNNs [5-7], and the performance was acceptable. The main advantage of using PSO and its variations in training FNNs is that there is no need for backward propagation. The PSO algorithm and its variants can be applied even if non-differentiable functions are used as node functions, since they do not rely on gradient descent. In addition, these algorithms can be used to minimize non-differentiable objectives such as the classification error, rather than just the mean squared error.

The improved PSO algorithm, as well as the PSO algorithm, is used as follows in training FNNs. The position of each particle in the swarm represents a set of weights of the network for the current epoch. The dimensionality of each particle is the number of weights associated with the network. The particle moves within the weight space attempting to minimize the error. Changing the position of a particle means updating the weights of the network in order to reduce the MSE of the current epoch.
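A minimal sketch of this encoding, assuming a single hidden layer with sigmoidal units: each particle is a flat vector holding all weights (including bias weights), and its fitness is the MSE over the training patterns. The helper names and the sigmoidal output layer are illustrative assumptions; the layer sizes match those listed below (e.g. 4x3x3 for Iris gives 27 weights).

```python
import numpy as np

def unpack(particle, n_in, n_hid, n_out):
    """Split a flat weight vector into the two weight matrices (with bias columns)."""
    k = n_hid * (n_in + 1)
    w1 = particle[:k].reshape(n_hid, n_in + 1)
    w2 = particle[k:].reshape(n_out, n_hid + 1)
    return w1, w2

def mse_fitness(particle, inputs, targets, n_in, n_hid, n_out):
    """Fitness of one particle: MSE of the network over the training patterns."""
    w1, w2 = unpack(particle, n_in, n_hid, n_out)
    ones = np.ones((inputs.shape[0], 1))
    h = 1.0 / (1.0 + np.exp(-(np.hstack([inputs, ones]) @ w1.T)))  # sigmoidal hidden layer
    y = 1.0 / (1.0 + np.exp(-(np.hstack([h, ones]) @ w2.T)))       # sigmoidal output (assumed)
    return np.mean((y - targets) ** 2)
```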

Three different test problems were used in our experiments: Iris, New-Thyroid, and Glass [14,15].

The FNNs used are of the following sizes:
1. Iris: a 4x3x3 network with 27 weights (including bias weights);
2. New-Thyroid: a 5x3x3 network with 30 weights; and
3. Glass: a 9x7x7 network with 126 weights.

The MPPSO algorithm's settings were as follows: Vmax was set to 5, the swarm size was set to 8, PCF was set to 4 using the adaptive method, sl was set to 1, and VC was set to 10. The maximum number of epochs was set to 30 for the Glass problem, 130 for the Thyroid problem, and 148 for the Iris problem (number of forward propagations = number of weights x swarm size x max. no. of epochs; note that the number of weights is considered as the dimensionality of the problem, n).
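Plugging the listed settings into the stated formula (an arithmetic check added here, assuming a swarm size of 8 throughout): Iris gives 27 x 8 x 148 = 31,968 forward propagations, New-Thyroid gives 30 x 8 x 130 = 31,200, and Glass gives 126 x 8 x 30 = 30,240, i.e. the chosen epoch limits give the three problems roughly the same computational budget.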

The results obtained were averaged over 20 runs. In each iteration, the training and test data are randomly generated using the same seed, so that the same set of patterns is used for all the algorithms.

Four values are used to compare the results: MSE_T, MSE_G, ε_T, and ε_G. MSE_T is the MSE calculated on the training patterns, and MSE_G is the MSE calculated on the test patterns. ε_T and ε_G are the classification error and the generalization error, respectively.

TABLE I
Results for the Iris problem

            BP        PSO       Improved PSO
MSE_T       0.0469    0.0101    0.0122
ε_T         3.8       2.15      2.45
MSE_G       0.0510    0.0222    0.0171
ε_G         3.9       4.5       2.6

TABLE II
Results for the New-Thyroid problem

            BP        PSO       Improved PSO
MSE_T       0.1787    0.0098    0.0082
ε_T         30.6      2.4       1.2
MSE_G       0.1722    0.0297    0.0214
ε_G         31.2      5.1       3.8

TABLE III
Results for the Glass problem

            BP        PSO       Improved PSO
MSE_T       0.1243    0.1101    0.0522
ε_T         64.6      53.7      31.8
MSE_G       0.1218    0.1143    0.0787
ε_G         71.2      63.4      39.8


Fig. 2. The MSE result over time (curves: improved PSO, PSO, BP): (a) Iris problem, (b) Thyroid problem, (c) Glass problem.

V. RESULTS

For all three problems, PSO as well as the improved PSO performed better than backpropagation. Tables I-III and Fig. 2 show the results averaged over 20 trials.

The improved PSO algorithm's performance for the Iris problem was better than that of the backpropagation algorithm but worse than that of the PSO algorithm. However, the generalization error (ε_G) of the improved PSO algorithm was the smallest, which was the opposite of what was observed for the classification error; the same was true for MSE_G. This suggests that the weights found by the improved PSO are more stable than the weights found by PSO. For the New-Thyroid problem, the MPPSO algorithm's performance was better than that of the backpropagation and PSO algorithms in both cases, on the training patterns and on the testing patterns. As shown in Table II, the classification error (ε_T) for the improved PSO was the smallest. The generalization error (ε_G) of the MPPSO algorithm was also the smallest.

For the Glass problem, the performance of the improved PSO algorithm was better than that of both the backpropagation and the PSO algorithms. From Table III, we can see that the classification error for the improved PSO was the smallest. The generalization error (ε_G) of the MPPSO algorithm was also the smallest.

Compared with the results obtained using SPSO, those obtained using the hybrid method are better. This indicates that the average performance of the hybrid method is nearly independent of the due date restriction and of the initial setting of the operations' start times. The chance of obtaining the best schedule is still very low. This means that the hybrid method is more powerful than SPSO alone, but it needs to be improved further.

The average computing time is 8.9 s using the hybrid approach, and the average computing time is 19.6 s using the hybrid method. Compared with SPSO, we can see that the performance of the hybrid method exceeds that of SPSO at a slight computing time cost. The convergence process is shown in Fig. 3. From this figure, it can be seen that the evolution process of the hybrid approach in this paper tends to become stable when the generation count reaches about 300, while SPSO and GA need about 400 and 900 generations, respectively.

"Best", "Mean'" and "Minimum" stand for the best one, themean one and the maximum aspects of the objective valuesachieved in 100 runs. The "Best rate" is the rate to reach thebest value. The algorithm was run 100 times with differentrandom seeds for each parameter setting to test the randomeffect on the solution. Therefore, the parameters with lowest"Best rate" are better than others.ok


Fig. 3. The generation process of the different algorithms.

TABLE V
Comparison of the different algorithms

            GA        SPSO      Hybrid approach
Best        71.12     57        55
Mean        83.45     60.56     59.45
Maximum     106.66    77.18     62.85
CPU time    533"      93"       49"




VI. CONCLUDING REMARKS

A feedforward neural network learning a nonlinear function with the backpropagation and particle swarm optimization algorithms has been presented in this paper. The number of computations required by each algorithm shows that PSO requires fewer computations than BP to achieve the same error goal. Thus, PSO is better suited to applications that require fast learning algorithms. Further work is to investigate using PSO to optimize the PSO parameters for neural network training and other tasks. The concept of PSO can also be incorporated into the BP algorithm to improve its global convergence rate. This is currently being studied for online learning, which is critical for adaptive real-time identification and control functions.


ACKNOWLEDGMENT

This research is supported by the Natural Science Foundation of Gansu Province (grant No. 3ZS042-B25-005).

REFERENCES

[1] Zhao, He-Ming; Xu, Jian-Jun; Zhou, Chun-Gui. Reliability prediction of fuze storage based on BP neural network. Journal of Test and Measurement Technology, 19(1): 95-97, 2005.
[2] Li, Haixiao; Jiang, Lu; Shu, Huazhong. Texture segmentation based on Zernike moment and BP neural network. Journal of Southeast University (Natural Science Edition), 35(2): 199-201, 2005.
[3] Griffin, David. High BP solvent solutions. Chemistry and Industry (London), 13(5): 2-3, 2004.
[4] Kim, N.; Park, H. Modified UMP-BP decoding algorithm based on mean square error. Electronics Letters, 40(13): 816-817, 2004.
[5] Fu, Qiang; Hu, Shang-Xu; Zhao, Sheng-Ying. PSO-based approach for neural network ensembles. Journal of Zhejiang University (Engineering Science), 38(12): 1596-1600, 2004.
[6] Baskar, S.; Alphones, A.; Suganthan, P.N. Concurrent PSO and FDR-PSO based reconfigurable phase-differentiated antenna array design. Proceedings of the 2004 Congress on Evolutionary Computation: 2173-2179, CEC 2004.
[7] Sun, Jun; Feng, Bin; Xu, Wenbo. Particle swarm optimization with particles having quantum behavior. Proceedings of the 2004 Congress on Evolutionary Computation: 325-331, CEC 2004.
[8] Rattan, Sanjay S.P.; Hsieh, William W. Complex-valued neural networks for nonlinear complex principal component analysis. Neural Networks, 18(1): 61-69, 2005.
[9] Jensen, Robert R.; Karki, Shankar; Salehfar, Hossein. Artificial neural network-based estimation of mercury speciation in combustion flue gases. Fuel Processing Technology, 85(6-7): 451-462, 2004.
[10] C. Zhang, H. Shao and Y. Li. Particle swarm optimization for evolving artificial neural network. Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, Vol. 4: 2487-2490, 2000.
[11] M. Settles and B. Rylander. Neural network learning using particle swarm optimizers. Advances in Information Science and Soft Computing, Vol. 22(11): 224-226, 2002.
[12] F. van den Bergh and A.P. Engelbrecht. Cooperative learning in neural networks using particle swarm optimizers. South African Computer Journal, Vol. 26: 84-90, 2000.
[13] Li, Hong-Xing; Lee, E.S. Interpolation functions of feedforward neural networks. Computers and Mathematics with Applications, 46(12): 1861-1874, 2003.
[14] Liu, Puyin; Li, Hongxing. Efficient learning algorithms for three-layer regular feedforward fuzzy neural networks. IEEE Transactions on Neural Networks, 15(3): 545-558, 2004.
[15] R. Eberhart and Y. Shi. Particle swarm optimization: developments, applications and resources. Proceedings of the 2001 Congress on Evolutionary Computation, Vol. 1: 81-86, 2001.
[16] Y. Shi and R. Eberhart. Parameter selection in particle swarm optimization. Proc. Seventh Annual Conf. on Evolutionary Programming, pp. 591-601, March 1998.
[17] James Kennedy, Russell C. Eberhart, Yuhui Shi. Swarm Intelligence. Morgan Kaufmann Publishers, 2001.
[18] H. Yoshida, Y. Fukuyama, S. Takayama and Y. Nakanishi. A particle swarm optimization for reactive power and voltage control in electric power systems considering voltage security assessment. IEEE SMC '99 Conference Proceedings, Vol. 6: 497-502, 1999.
