A Clustered Particle Swarm Algorithm for
Retrieving all the Local Minima of a function
C. Voglis & I. E. Lagaris
Computer Science Department
University of Ioannina, GREECE
Presentation Outline
Global Optimization Problem
Particle Swarm Optimization
Modifying Particle Swarm to form clusters
Clustering Approach
  Modifying the affinity matrix
Putting the pieces together
  Determining the number of minima
  Identification of the clusters
Preliminary results – Future research
Global Optimization
The goal is to find the global minimum inside a bounded domain:
$$\min_{x \in S} f(x), \quad S \subset R^N$$
One way to do that is to find all the local minima and choose among them the global one (or ones).
Popular methods of that kind are Multistart, MLSL, TMLSL*, etc.
*M. Ali
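As a rough illustration of the "find all the local minima, then pick the global one" idea, here is a minimal Multistart sketch. The 1-D objective `f`, its derivative `df`, the fixed-step gradient descent used as local search, and all parameter values are assumptions chosen for this example; they do not come from the slides.

```python
import random

def f(x):
    # Illustrative 1-D objective with two local minima (hypothetical test function).
    return (x * x - 1.0) ** 2 + 0.2 * x

def df(x):
    # Analytic derivative of f.
    return 4.0 * x * (x * x - 1.0) + 0.2

def local_search(x, step=0.01, n_steps=5000):
    # Plain fixed-step gradient descent; stands in for any local minimizer.
    for _ in range(n_steps):
        x -= step * df(x)
    return x

def multistart(n_starts=20, lo=-2.0, hi=2.0, tol=1e-3, seed=0):
    # Run the local search from random starting points and keep distinct minima.
    rng = random.Random(seed)
    minima = []
    for _ in range(n_starts):
        x = local_search(rng.uniform(lo, hi))
        if all(abs(x - m) > tol for m in minima):
            minima.append(x)
    return sorted(minima)

minima = multistart()
global_min = min(minima, key=f)  # the global minimum is the best of the local ones
```

Most of the starts end up in a minimum that was already found, which is the waste the clustering approach in the following slides tries to reduce.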
Particle Swarm Optimization
Developed in 1995 by James Kennedy and Russ Eberhart.
It was inspired by social behavior of bird flocking or fish schooling.
PSO applies the concept of social interaction to problem solving.
Finds a global optimum.
PSO-Description
The method allows the motion of particles to explore the space of interest.
Each particle updates its position in discrete unit time steps.
The velocity is updated by a linear combination of two terms:
  The first along the direction pointing to the best position discovered by the particle
  The second towards the overall best position.
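PSO - Relations
$$v_i^{(k+1)} = \chi \left[ v_i^{(k)} + c_1 r_1 \left( b_i^{(k)} - x_i^{(k)} \right) + c_2 r_2 \left( y^{(k)} - x_i^{(k)} \right) \right]$$
$$x_i^{(k+1)} = x_i^{(k)} + v_i^{(k+1)}$$
Where:
$x_i^{(k)}$ is the position of the ith particle at step k
$v_i^{(k)}$ is its velocity
$b_i^{(k)}$ is the best position visited by the ith particle (particle's best position)
$y^{(k)}$ is the overall best position ever visited (swarm's best position)
$\chi$ is the constriction factor
$c_1, c_2$ are the weights of the two terms and $r_1, r_2$ are random numbers drawn uniformly from [0, 1]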
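The update rule can be sketched in a few lines of NumPy. This is a minimal sketch of standard constriction-factor PSO, not the authors' code: the function name `pso`, the sphere test function, and the parameter values (`chi=0.7298`, `c1=c2=2.05`, 30 particles, 200 steps) are assumptions made for the example.

```python
import numpy as np

def sphere(p):
    # Hypothetical test objective with a single minimum at the origin.
    return float(np.sum(p ** 2))

def pso(f, lo, hi, n_particles=30, n_steps=200,
        chi=0.7298, c1=2.05, c2=2.05, seed=0):
    rng = np.random.default_rng(seed)
    dim = len(lo)
    x = rng.uniform(lo, hi, size=(n_particles, dim))   # positions x_i
    v = np.zeros_like(x)                               # velocities v_i
    b = x.copy()                                       # particle best positions b_i
    fb = np.array([f(p) for p in b])
    y = b[np.argmin(fb)].copy()                        # swarm best position y
    for _ in range(n_steps):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Velocity: pull towards the particle best and towards the swarm best.
        v = chi * (v + c1 * r1 * (b - x) + c2 * r2 * (y - x))
        x = x + v
        fx = np.array([f(p) for p in x])
        improved = fx < fb
        b[improved] = x[improved]
        fb[improved] = fx[improved]
        y = b[np.argmin(fb)].copy()
    return y, float(fb.min())

best_x, best_f = pso(sphere, [-5.0, -5.0], [5.0, 5.0])
```

With the swarm-best term present, all particles are drawn to one point, which is why the plain method returns a single optimum.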
PS+Clustering Optimization
If the global component is weakened, the swarm is expected to form clusters around the minima.
If a bias is added towards the steepest descent direction, this will be accelerated.
Locating the minima may then be tackled, to a large extent, as a Clustering Problem (CP).
However, it is not a regular CP, since it can benefit from information supplied by the objective function.
Modified PSO
Global component is set to zero.
A component pointing towards the steepest descent direction* is added to accelerate the process.
So the swarm motion is described by:
$$v_i^{(k+1)} = \chi \left[ v_i^{(k)} + c_1 r_1 \left( b_i^{(k)} - x_i^{(k)} \right) - c_2 r_2 \frac{\nabla f(x_i^{(k)})}{\left\| \nabla f(x_i^{(k)}) \right\|} \right]$$
$$x_i^{(k+1)} = x_i^{(k)} + v_i^{(k+1)}$$
*A. Ismael F. Vaz, M.G.P. Fernandes
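A minimal sketch of the modified swarm follows. It assumes the steepest-descent component is the negative gradient scaled to unit length; that normalization, the four-minima test function `f`, and the parameter values (in particular the smaller `c2=0.5`) are assumptions of this example rather than details taken from the slides.

```python
import numpy as np

def f(x):
    # Hypothetical 2-D test function with four minima at (+-1, +-1).
    return np.sum((x ** 2 - 1.0) ** 2, axis=-1)

def grad_f(x):
    # Analytic gradient of f, evaluated row by row.
    return 4.0 * x * (x ** 2 - 1.0)

def modified_pso(n_particles=40, n_steps=100, chi=0.7298,
                 c1=2.05, c2=0.5, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(-2.0, 2.0, size=(n_particles, 2))
    v = np.zeros_like(x)
    b = x.copy()                      # particle best positions
    fb = f(b)
    f_start = fb.copy()
    for _ in range(n_steps):
        r1 = rng.random(x.shape)
        r2 = rng.random(x.shape)
        g = grad_f(x)
        norm = np.linalg.norm(g, axis=1, keepdims=True)
        descent = -g / np.maximum(norm, 1e-12)   # unit steepest-descent direction
        # No swarm-best term: only the particle best and the descent bias remain.
        v = chi * (v + c1 * r1 * (b - x) + c2 * r2 * descent)
        x = x + v
        fx = f(x)
        improved = fx < fb
        b[improved] = x[improved]
        fb[improved] = fx[improved]
    return b, fb, f_start

b, fb, f_start = modified_pso()
```

Because no term couples the particles to each other, each one settles near whichever minimum its own history leads it to, so the best positions `b` are expected to gather in groups around the minima.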
Modified PSO movie
Clustering
Clustering problem:
“Partition a data set $X = \{x_1, x_2, \ldots, x_n\}$, $x_i \in R^d$, into M disjoint subsets $C_1, C_2, \ldots, C_M$ containing points with one or more properties in common”
A commonly used property refers to topographical grouping based on distances.
Plethora of algorithms: K-Means, Hierarchical (single linkage), Quantum-Newtonian clustering.
Global k-means
Minimize the clustering error
$$E(m_1, m_2, \ldots, m_M) = \sum_{i=1}^{N} \sum_{k=1}^{M} I(x_i \in C_k) \, \| x_i - m_k \|^2$$
$$I(x \in C) = \begin{cases} 1, & x \in C \\ 0, & x \notin C \end{cases}$$
It is an incremental procedure using the k-Means algorithm repeatedly
Independent of the initialization choice.
Has been successfully applied to many problems.
A. Likas
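The incremental procedure can be sketched as follows: start from one center (the data mean), and to go from k-1 to k centers, try every data point as the new center, run k-means from that start, and keep the run with the lowest clustering error. This is a simplified sketch under those assumptions; the function names and the toy data set are made up for the example.

```python
import numpy as np

def kmeans(X, centers, n_iter=100):
    """Plain k-means (Lloyd iterations) started from the given centers."""
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = centers.copy()
        for k in range(len(centers)):
            members = X[labels == k]
            if len(members) > 0:          # keep the old center if a cluster empties
                new_centers[k] = members.mean(axis=0)
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    error = d2[np.arange(len(X)), labels].sum()   # clustering error E
    return centers, labels, error

def global_kmeans(X, M):
    """Add one center at a time, trying every data point as the new center."""
    centers = X.mean(axis=0, keepdims=True)       # optimal solution for one cluster
    for _ in range(2, M + 1):
        best = None
        for x in X:
            trial = kmeans(X, np.vstack([centers, x]))
            if best is None or trial[2] < best[2]:
                best = trial
        centers = best[0]
    return kmeans(X, centers)

# Toy data: three tight groups of four points each (illustrative only).
offsets = np.array([[-0.1, 0.0], [0.1, 0.0], [0.0, -0.1], [0.0, 0.1]])
blob_centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + offsets for c in blob_centers])
centers, labels, error = global_kmeans(X, 3)
```

The procedure contains no random initialization, which is what makes the result independent of the starting choice, at the cost of one k-means run per data point for every added center.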
Global K-Means movie
Spectral Clustering
Algorithms that cluster points using eigenvectors of matrices derived from the data
Obtain data representation in the low-dimensional space that can be easily clustered
Variety of methods that use the eigenvectors differently
Useful information can be extracted from the eigenvalues
The Affinity Matrix
$$A = \begin{pmatrix} 1 & g_{12} & g_{13} & \cdots & g_{1N} \\ g_{21} & 1 & g_{23} & \cdots & g_{2N} \\ g_{31} & g_{32} & 1 & \cdots & g_{3N} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ g_{N1} & g_{N2} & g_{N3} & \cdots & 1 \end{pmatrix}$$
This symmetric matrix is of key importance.
Each off-diagonal element is given by:
$$g_{ij} = \exp\left( -\frac{\| x_i - x_j \|^2}{2 \sigma^2} \right)$$
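For concreteness, the Gaussian affinity matrix can be built with NumPy broadcasting. The function name `affinity_matrix`, the scale `sigma=1.0`, and the three sample points are assumptions of this sketch.

```python
import numpy as np

def affinity_matrix(X, sigma=1.0):
    # Pairwise differences x_i - x_j, shape (N, N, d).
    diff = X[:, None, :] - X[None, :, :]
    d2 = (diff ** 2).sum(axis=2)              # squared Euclidean distances
    # The diagonal comes out as exp(0) = 1, matching the matrix on the slide.
    return np.exp(-d2 / (2.0 * sigma ** 2))

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
A = affinity_matrix(X, sigma=1.0)
```

The scale `sigma` sets how quickly affinity decays with distance, so it should be comparable to the expected cluster size.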
The Affinity Matrix
Let $D_{ii} = \sum_{j=1}^{N} A_{ij}$ and $D_{ij} = 0$ for $i \neq j$.
The matrix $M = D^{-1/2} A D^{-1/2}$ is diagonalized; let $\lambda_1 \geq \lambda_2 \geq \lambda_3 \geq \cdots \geq \lambda_N$ be its eigenvalues sorted in descending order.
The largest gap $\lambda_k - \lambda_{k+1}$ identifies the number of clusters (k).
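The eigengap rule can be sketched as below, assuming a Gaussian affinity with `sigma=1.0` and a toy data set of three tight, well-separated groups; the function names and the data are illustrative only.

```python
import numpy as np

def affinity_matrix(X, sigma=1.0):
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def estimate_k(A):
    d = A.sum(axis=1)                          # D_ii = sum_j A_ij
    d_inv_sqrt = 1.0 / np.sqrt(d)
    M = A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]   # D^{-1/2} A D^{-1/2}
    eigvals = np.sort(np.linalg.eigvalsh(M))[::-1]      # descending order
    gaps = eigvals[:-1] - eigvals[1:]          # gaps[k-1] = lambda_k - lambda_{k+1}
    k = int(np.argmax(gaps)) + 1
    return k, eigvals

# Toy data: three tight groups of four points each.
offsets = np.array([[-0.1, 0.0], [0.1, 0.0], [0.0, -0.1], [0.0, 0.1]])
blob_centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + offsets for c in blob_centers])
k, eigvals = estimate_k(affinity_matrix(X, sigma=1.0))
```

For well-separated groups the matrix is nearly block diagonal, each block contributes one eigenvalue close to 1, and the gap after those eigenvalues dominates.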
Simple example
Subset of Cisi/Medline dataset
Two clusters: IR abstracts, Medical abstracts
650 documents, 3366 terms after pre-processing
[Figure: spectral embedded space constructed from the two largest eigenvectors]
[Figure: eigenvalue versus K for K = 1 to 20; the largest eigengap lies between λ1 and λ2]
How to select k?
Eigengap: the difference between two consecutive eigenvalues.
Most stable clustering is generally given by the value k that maximises the eigengap.
In this example the largest eigengap is the one between λ1 and λ2.
Choose k=2
Putting the pieces together
1. Apply modified particle swarm to form clusters around the minima
2. Construct the affinity matrix A and compute the eigenvalues of M.
   A. Use only distance information
   B. Add gradient information
3. Find the largest eigengap and identify k.
4. Perform global k-means using the determined k
   A. Use pairwise distances and centroids
   B. Use affinity matrix and medoids (with gradient info)
Adding information to Affinity matrix
Use the gradient vectors to zero out pairwise affinities.
New formula:
$$g_{ij} = \begin{cases} 0, & \text{if } x_i, x_j \text{ move further apart} \\ \exp\left( -\frac{\| x_i - x_j \|^2}{2 \sigma^2} \right), & \text{otherwise} \end{cases}$$
Do not associate particles that would become more distant if they followed the negative gradient.
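One way to make "move further apart" concrete is to look at the rate of change of the squared distance when both particles follow their negative gradients: d/dt ||x_i - x_j||^2 = 2 (x_i - x_j) · (∇f(x_j) - ∇f(x_i)). A positive value means the pair separates. This criterion is an assumption of the sketch below, since the slides do not spell out the test; the function names and the 1-D double-well example are hypothetical as well.

```python
import numpy as np

def gated_affinity(X, grads, sigma=1.0):
    diff = X[:, None, :] - X[None, :, :]             # x_i - x_j
    d2 = (diff ** 2).sum(axis=2)
    G = np.exp(-d2 / (2.0 * sigma ** 2))             # plain Gaussian affinity
    gdiff = grads[None, :, :] - grads[:, None, :]    # grad f(x_j) - grad f(x_i)
    rate = (diff * gdiff).sum(axis=2)                # sign of d/dt ||x_i - x_j||^2
    G[rate > 0.0] = 0.0                              # separating pairs get zero affinity
    return G

def grad_f(X):
    # Gradient of the double well f(x) = (x^2 - 1)^2, minima at x = -1 and x = +1.
    return 4.0 * X * (X ** 2 - 1.0)

X = np.array([[0.9], [1.1], [-0.5], [0.5]])
G = gated_affinity(X, grad_f(X))
```

In this example the points 0.9 and 1.1 both descend towards x = 1 and keep their affinity, while -0.5 and 0.5 descend into different wells and are decoupled even though they are only one unit apart.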
Adding information to Affinity matrix
Black arrow: Gradient of particle i
Green arrows: Gradient of j with nonzero affinity to i
Red arrows: Gradient of j with zero affinity to i
From global k-means to global k-medoids
Original global k-means
Rastrigin function (49 minima)
After modified particle swarm
Gradient information
Rastrigin function
Estimation of k using distance
Estimation of k using gradient info
Rastrigin function: Global k-means
Rastrigin function: Global k-medoids
Shubert function (100 minima)
After modified particle swarm
Gradient information
Shubert function
Estimation of k using distance
Estimation of k using gradient info
Shubert function: Global k-means
Shubert function: Global k-medoids
Ackley function (25 minima)
After modified particle swarm
Gradient information
Ackley function
Estimation of k using distance
Estimation of k using gradient info
Ackley function: Global k-means
Ackley function: Global k-medoids