A Clustered Particle Swarm Algorithm for Retrieving all the Local Minima of a function C. Voglis & I. E. Lagaris Computer Science Department University

A Clustered Particle Swarm Algorithm for Retrieving all the Local Minima of a function

C. Voglis & I. E. Lagaris

Computer Science Department

University of Ioannina, GREECE

Presentation Outline

Global Optimization Problem

Particle Swarm Optimization

Modifying Particle Swarm to form clusters

Clustering Approach

Modifying the affinity matrix

Putting the pieces together

Determining the number of minima

Identification of the clusters

Preliminary results – Future research

Global Optimization

The goal is to find the global minimum inside a bounded domain:

$$\min_{x} f(x), \quad x \in S \subset R^N$$

One way to do that is to find all the local minima and choose among them the global one (or ones).

Popular methods of that kind are Multistart, MLSL, TMLSL*, etc.

*M. Ali
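The idea of collecting all local minima and then picking the best one can be sketched with a plain Multistart loop. This is only an illustration: the one-dimensional test function `f`, the fixed-step gradient descent in `local_search`, the number of starts, and the tolerance used to merge repeated minima are all assumptions made for the example and do not come from the slides.

```python
import numpy as np

def f(x):
    # Hypothetical 1-D test function with two local minima (near x = -1 and x = +1).
    return (x**2 - 1.0)**2 + 0.3 * x

def grad_f(x):
    # Analytic derivative of f.
    return 4.0 * x * (x**2 - 1.0) + 0.3

def local_search(x, step=0.01, n_iter=2000):
    # Fixed-step gradient descent, standing in for any local optimizer.
    for _ in range(n_iter):
        x = x - step * grad_f(x)
    return x

rng = np.random.default_rng(0)
starts = rng.uniform(-2.0, 2.0, size=20)   # random starting points in the domain

minima = []
for x0 in starts:
    x_star = local_search(x0)
    # Keep the minimizer only if it has not been found already.
    if all(abs(x_star - m) > 1e-3 for m in minima):
        minima.append(x_star)

# The global minimum is the best among the retrieved local minima.
global_min = min(minima, key=f)
```

Most of the local searches end at a minimizer that was already found, which is the waste that MLSL-type methods try to reduce.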

Particle Swarm Optimization

Developed in 1995 by James Kennedy and Russ Eberhart.

It was inspired by the social behavior of bird flocking or fish schooling.

PSO applies the concept of social interaction to problem solving.

It aims at finding a global optimum.

PSO - Description

The method allows the motion of particles to explore the space of interest.

Each particle updates its position in discrete unit time steps.

The velocity is updated by a linear combination of two terms:

The first along the direction pointing to the best position discovered by the particle.

The second towards the overall best position.

PSO - Relations

$$v_i^{(k+1)} = \chi \left[ v_i^{(k)} + c_1 \phi_1 \left( b_i^{(k)} - x_i^{(k)} \right) + c_2 \phi_2 \left( y^{(k)} - x_i^{(k)} \right) \right]$$

$$x_i^{(k+1)} = x_i^{(k)} + v_i^{(k+1)}$$

The $c_1 \phi_1$ term points to the particle's best position; the $c_2 \phi_2$ term points to the swarm's best position.

Where:

$x_i^{(k)}$ is the position of the ith particle at step k

$v_i^{(k+1)}$ is its velocity

$b_i^{(k)}$ is the best position visited by the ith particle

$y^{(k)}$ is the overall best position ever visited

$\chi$ is the constriction factor
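A minimal sketch of the update above, written for a whole swarm at once. The parameter values (`chi=0.729`, `c1=c2=2.05`), the choice of drawing the random weights uniformly from [0, 1), and the `sphere` test function are assumptions taken from common PSO practice, not from the slides.

```python
import numpy as np

def pso_step(x, v, b, y, rng, chi=0.729, c1=2.05, c2=2.05):
    """One constriction-factor PSO update.

    x, v, b have shape (n_particles, dim); y has shape (dim,).
    """
    phi1 = rng.random(x.shape)   # random weights, particle's-best term
    phi2 = rng.random(x.shape)   # random weights, swarm's-best term
    v_new = chi * (v + c1 * phi1 * (b - x) + c2 * phi2 * (y - x))
    x_new = x + v_new
    return x_new, v_new

def sphere(x):
    # Hypothetical objective with a single minimum at the origin.
    return np.sum(x**2, axis=-1)

rng = np.random.default_rng(0)
x = rng.uniform(-5.0, 5.0, size=(20, 2))
v = np.zeros_like(x)
b = x.copy()                     # best position visited by each particle
fb = sphere(b)                   # objective value at each personal best
initial_best = fb.min()

for _ in range(100):
    y = b[np.argmin(fb)]         # overall best position ever visited
    x, v = pso_step(x, v, b, y, rng)
    fx = sphere(x)
    improved = fx < fb
    b[improved] = x[improved]
    fb[improved] = fx[improved]

best_value = fb.min()
```

The bookkeeping of `b` and `y` sits outside `pso_step`, so the same loop can be reused when the velocity rule is changed.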

PS+Clustering Optimization

If the global component is weakened, the swarm is expected to form clusters around the minima.

If a bias is added towards the steepest descent direction, this will be accelerated.

Locating the minima may then be tackled, to a large extent, as a Clustering Problem (CP).

However, it is not a regular CP, since it can benefit from information supplied by the objective function.

Modified PSO

Global component is set to zero.

A component pointing towards the steepest descent direction* is added to accelerate the process.

So the swarm motion is described by:

$$v_i^{(k+1)} = \chi \left[ v_i^{(k)} + c_1 \phi_1 \left( b_i^{(k)} - x_i^{(k)} \right) - c_2 \phi_2 \frac{\nabla f(x_i^{(k)})}{\left\| \nabla f(x_i^{(k)}) \right\|} \right]$$

$$x_i^{(k+1)} = x_i^{(k)} + v_i^{(k+1)}$$

*A. Ismael F. Vaz, M.G.P. Fernandes
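A sketch of one step of this modified swarm motion, assuming the steepest descent component is the negative gradient scaled to unit length. The parameter values, the uniform random weights, and the double-well test gradient are hypothetical choices for illustration.

```python
import numpy as np

def modified_pso_step(x, v, b, grad_f, rng, chi=0.729, c1=2.05, c2=2.05):
    """One step with no global term and an added steepest descent term.

    x, v, b have shape (n_particles, dim); grad_f maps x to an array of the same shape.
    """
    g = grad_f(x)
    norm = np.linalg.norm(g, axis=1, keepdims=True)
    descent = -g / np.maximum(norm, 1e-12)     # unit steepest descent direction
    phi1 = rng.random(x.shape)
    phi2 = rng.random(x.shape)
    v_new = chi * (v + c1 * phi1 * (b - x) + c2 * phi2 * descent)
    return x + v_new, v_new

def grad_double_well(x):
    # Gradient of the hypothetical f(x) = sum_d (x_d^2 - 1)^2,
    # which has one minimum at each point (+-1, +-1).
    return 4.0 * x * (x**2 - 1.0)

rng = np.random.default_rng(0)
x0 = rng.uniform(-2.0, 2.0, size=(30, 2))
v0 = np.zeros_like(x0)
b0 = x0.copy()                                  # personal bests start at the positions
x1, v1 = modified_pso_step(x0, v0, b0, grad_double_well, rng)
```

Starting from rest with `b0 = x0`, the first displacement of every particle is a pure descent move, so no particle is drawn towards a minimum found by another particle.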

Modified PSO movie

Clustering

Clustering problem:

"Partition a data set $X = \{x_1, x_2, \ldots, x_n\}$, $x_i \in R^d$, into M disjoint subsets $C_1, C_2, \ldots, C_M$ containing points with one or more properties in common"

A commonly used property refers to topographical grouping based on distances.

Plethora of algorithms: K-Means, Hierarchical (Single linkage), Quantum-Newtonian clustering.

Global k-means

Minimize the clustering error

$$E(m_1, m_2, \ldots, m_M) = \sum_{i=1}^{N} \sum_{k=1}^{M} I(x_i \in C_k) \, \| x_i - m_k \|^2$$

$$I(x \in C) = \begin{cases} 1, & x \in C \\ 0, & x \notin C \end{cases}$$

It is an incremental procedure using the k-Means algorithm repeatedly.

Independent of the initialization choice.

Has been successfully applied to many problems.

A. Likas
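The incremental procedure can be sketched as follows: start from the one-cluster solution, then add one center at a time, trying every data point as the initial position of the new center and keeping the k-means result with the lowest clustering error. This is a simplified reading of global k-means, with each point assigned to its nearest center; the helper names, the iteration cap, and the three-blob data are assumptions for the example.

```python
import numpy as np

def clustering_error(X, centers):
    # E(m_1, ..., m_M) with every point assigned to its nearest center.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()

def kmeans(X, centers, n_iter=100):
    # Plain k-means iterations starting from the given centers.
    centers = centers.copy()
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = centers.copy()
        for k in range(len(centers)):
            members = X[labels == k]
            if len(members) > 0:             # leave an empty cluster's center in place
                new_centers[k] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers

def global_kmeans(X, M):
    # The mean of all points is the optimal single center.
    centers = X.mean(axis=0, keepdims=True)
    for _ in range(2, M + 1):
        best_err, best_centers = np.inf, None
        for x in X:                          # every point is a candidate new center
            trial = kmeans(X, np.vstack([centers, x]))
            err = clustering_error(X, trial)
            if err < best_err:
                best_err, best_centers = err, trial
        centers = best_centers
    return centers

rng = np.random.default_rng(0)
true_centers = ([0.0, 0.0], [5.0, 0.0], [0.0, 5.0])
X = np.vstack([rng.normal(loc=c, scale=0.1, size=(15, 2)) for c in true_centers])
centers = global_kmeans(X, 3)
```

No random initialization enters `global_kmeans`, which is why the result does not depend on an initialization choice; the price is one k-means run per data point for every added center.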

Global K-Means movie

Spectral Clustering

Algorithms that cluster points using eigenvectors of matrices derived from the data.

Obtain a data representation in a low-dimensional space that can be easily clustered.

Variety of methods that use the eigenvectors differently.

Useful information can be extracted from the eigenvalues.

The Affinity Matrix

$$A = \begin{pmatrix} 1 & g_{12} & g_{13} & \cdots & g_{1N} \\ g_{21} & 1 & g_{23} & \cdots & g_{2N} \\ g_{31} & g_{32} & 1 & \cdots & g_{3N} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ g_{N1} & g_{N2} & g_{N3} & \cdots & 1 \end{pmatrix}$$

This symmetric matrix is of key importance.

Each off-diagonal element is given by:

$$g_{ij} = \exp\left( -\frac{\| x_i - x_j \|^2}{2 \sigma^2} \right)$$
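A short sketch of how such a matrix might be assembled from the particle positions. The function name `affinity_matrix`, the three sample points, and the value of `sigma` are assumptions for illustration.

```python
import numpy as np

def affinity_matrix(X, sigma):
    """Gaussian affinities between the rows of X, with ones on the diagonal."""
    diff = X[:, None, :] - X[None, :, :]        # pairwise differences x_i - x_j
    d2 = (diff ** 2).sum(axis=2)                # squared distances
    A = np.exp(-d2 / (2.0 * sigma**2))
    np.fill_diagonal(A, 1.0)                    # explicit, although exp(0) is already 1
    return A

X = np.array([[0.0, 0.0],
              [0.0, 1.0],
              [3.0, 0.0]])
A = affinity_matrix(X, sigma=1.0)
# A[0, 1] = exp(-1/2) for the two close points,
# A[0, 2] = exp(-9/2) for the distant pair.
```

The width `sigma` sets the distance scale at which two points stop counting as neighbours, so it has to be chosen relative to the expected spacing of the minima.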

The Affinity Matrix

Let $D_{ii} = \sum_{j=1}^{N} A_{ij}$ and $D_{ij} = 0$ for $i \neq j$.

The matrix $M = D^{-1/2} A D^{-1/2}$ is diagonalized, and let $\lambda_1 \geq \lambda_2 \geq \lambda_3 \geq \cdots \geq \lambda_N$ be its eigenvalues sorted in descending order.

The gap $\lambda_k - \lambda_{k+1}$ which is biggest identifies the number of clusters (k).
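The eigengap rule can be sketched in a few lines. The function name `estimate_num_clusters`, the synthetic three-group data, and the kernel width 0.5 are assumptions for the example; the affinity matrix is built inline with the Gaussian formula so that the block stands on its own.

```python
import numpy as np

def estimate_num_clusters(A):
    """Return (k, eigenvalues) using the largest gap lambda_k - lambda_{k+1}."""
    d = A.sum(axis=1)                                   # D_ii = sum_j A_ij
    d_inv_sqrt = 1.0 / np.sqrt(d)
    M = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]   # D^{-1/2} A D^{-1/2}
    lam = np.sort(np.linalg.eigvalsh(M))[::-1]          # descending order
    gaps = lam[:-1] - lam[1:]                           # gaps[k-1] = lambda_k - lambda_{k+1}
    k = int(np.argmax(gaps)) + 1
    return k, lam

rng = np.random.default_rng(0)
group_centers = ([0.0, 0.0], [5.0, 0.0], [0.0, 5.0])
X = np.vstack([rng.normal(loc=c, scale=0.1, size=(10, 2)) for c in group_centers])

d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
A = np.exp(-d2 / (2.0 * 0.5**2))                        # Gaussian affinities, sigma = 0.5

k, lam = estimate_num_clusters(A)
```

For well-separated groups `A` is nearly block diagonal, each block contributes one eigenvalue close to 1, and the first large drop appears right after the k-th eigenvalue.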

Simple example

Subset of Cisi/Medline dataset

Two clusters: IR abstracts, Medical abstracts

650 documents, 3366 terms after pre-processing

Spectral embedded space constructed from the eigenvectors of the two largest eigenvalues:

[Figure: scatter plot of the documents in the two-dimensional spectral embedded space]

How to select k?

[Figure: eigenvalue versus K for K = 1 to 20; the largest eigengap lies between λ1 and λ2]

Eigengap: the difference between two consecutive eigenvalues.

Most stable clustering is generally given by the value k that maximises the eigengap $\Delta_k$. Here:

$$\max_k \Delta_k = |\lambda_2 - \lambda_1|$$

Choose k=2

Putting the pieces together

1. Apply modified particle swarm to form clusters around the minima.

2. Construct the affinity matrix A and compute the eigenvalues of M.

   A. Use only distance information

   B. Add gradient information

3. Find the largest eigengap and identify k.

4. Perform global k-means using the determined k.

   A. Use pairwise distances and centroids

   B. Use affinity matrix and medoids (with gradient info)

Adding information to Affinity matrix

Use the gradient vectors to zero out pairwise affinities.

New formula:

$$g_{ij} = \begin{cases} 0, & x_i, x_j \text{ move further apart} \\ \exp\left( -\frac{\| x_i - x_j \|^2}{2 \sigma^2} \right), & \text{otherwise} \end{cases}$$

Do not associate particles that would become more distant if they followed the negative gradient.
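One possible way to implement the zeroing rule is sketched below. The slides do not state how "move further apart" is tested; this sketch assumes a first-order test, namely that the squared distance between two particles grows when both take an infinitesimal step along their negative gradients. The function name and the one-dimensional double-well example are hypothetical.

```python
import numpy as np

def gradient_affinity_matrix(X, G, sigma):
    """Gaussian affinities, set to zero for pairs that move apart.

    X holds the positions and G the gradients, both of shape (n_particles, dim).
    """
    diff = X[:, None, :] - X[None, :, :]        # x_i - x_j
    gdiff = G[:, None, :] - G[None, :, :]       # grad f(x_i) - grad f(x_j)
    d2 = (diff ** 2).sum(axis=2)
    A = np.exp(-d2 / (2.0 * sigma**2))
    # Rate of change of ||x_i - x_j||^2 when both follow the negative gradient.
    rate = -2.0 * (diff * gdiff).sum(axis=2)
    A[rate > 0.0] = 0.0                         # the pair moves further apart
    np.fill_diagonal(A, 1.0)
    return A

# Hypothetical objective f(x) = (x^2 - 1)^2 with minima at x = -1 and x = +1.
X = np.array([[-0.8], [-1.2], [0.8]])
G = 4.0 * X * (X**2 - 1.0)
A = gradient_affinity_matrix(X, G, sigma=1.0)
# Particles 0 and 1 both head for the minimum at -1 and keep their affinity;
# particles 0 and 2 head for different minima and get zero affinity.
```

Being a first-order test, it can still associate two particles that belong to different basins when one of them temporarily moves faster in the same direction as the other, so it prunes affinities without fully separating the basins.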

Adding information to Affinity matrix

Black arrow: Gradient of particle i

Green arrows: Gradient of j with non zero affinity to i

Red arrows: Gradient of j with zero affinity to i

From global k-means to global k-medoids

Original global k-means

Rastrigin function (49 minima)

After modified particle swarm

Gradient information

Rastrigin function

Estimation of k using distance

Estimation of k using gradient info

Rastrigin function: Global k-means

Rastrigin function: Global k-medoids

Shubert function (100 minima)

Shubert function

Shubert function: Global k-means

Shubert function: Global k-medoids

Ackley function (25 minima)

Shubert function

Shubert function: Global k-means

Shubert function: Global k-medoids
