On the Usage of Differential Evolution for Function Optimization

by Rainer Storn
Siemens AG, ZFE T SN2, Otto-Hahn-Ring 6, D-81739 Muenchen, Germany; currently on leave at ICSI, 1947 Center Street, Berkeley, CA 94704, [email protected]
Abstract

Differential Evolution (DE) has recently proven to be an efficient method for optimizing real-valued multi-modal objective functions. Besides its good convergence properties and suitability for parallelization, DE's main assets are its conceptual simplicity and ease of use. This paper describes several variants of DE and elaborates on the choice of DE's control parameters, which corresponds to the application of fuzzy rules. Finally, the design of a howling removal unit with DE is described to provide a real-world example of DE's applicability.

1 Introduction

Differential Evolution (DE) [1], [2] has proven to be a promising candidate for minimizing real-valued, multi-modal objective functions. Besides its good convergence properties, DE is very simple to understand and to implement. DE is also particularly easy to work with, having only a few control variables which remain fixed throughout the entire minimization procedure.

2 Scheme DE/rand/1

DE is a parallel direct search method which utilizes NP D-dimensional parameter vectors

x_{i,G},  i = 0, 1, 2, ..., NP-1,   (1)

as a population for each generation G, i.e. for each iteration of the minimization. NP does not change during the minimization process. The initial population is chosen randomly and should try to cover the entire parameter space uniformly. As a rule, a uniform probability distribution for all random decisions will be assumed unless otherwise stated.

Basically, DE generates new parameter vectors by adding the weighted difference between two population vectors to a third vector. If the resulting vector yields a lower objective function value than a predetermined population member, the newly generated vector replaces the vector with which it was compared in the next generation; otherwise, the old vector is retained. This basic principle, however, is extended when it comes to the practical variants of DE. For example, an existing vector can be perturbed by adding more than one weighted difference vector to it. In most cases it is also worthwhile to mix the parameters of the old vector with those of the perturbed one before comparing the objective function values. Several variants of DE which have proven to be useful are described in the following.

For each vector x_{i,G}, i = 0, 1, 2, ..., NP-1, a perturbed vector v_{i,G+1} is generated according to

v_{i,G+1} = x_{r1,G} + F * (x_{r2,G} - x_{r3,G})   (2)

with r1, r2, r3 ∈ [0, NP-1], integer and mutually different, and F > 0. The randomly chosen integers r1, r2 and r3 are also chosen to be different from the running index i. F is a real and constant factor ∈ [0, 2] which controls the amplification of the differential variation (x_{r2,G} - x_{r3,G}).
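Eq. (2) can be illustrated with a short sketch (Python and NumPy are my own choice here; the paper itself prescribes only the scheme, not an implementation):

```python
import numpy as np

def mutate_rand_1(pop, i, F, rng):
    """DE/rand/1 mutation, eq. (2): v = x_r1 + F * (x_r2 - x_r3).

    pop is an (NP, D) array of parameter vectors; r1, r2, r3 are
    mutually different and also different from the running index i.
    """
    NP = len(pop)
    # draw three distinct indices, all different from i
    candidates = [r for r in range(NP) if r != i]
    r1, r2, r3 = rng.choice(candidates, size=3, replace=False)
    return pop[r1] + F * (pop[r2] - pop[r3])

rng = np.random.default_rng(0)
pop = rng.uniform(-5.0, 5.0, size=(10, 3))  # NP=10, D=3
v = mutate_rand_1(pop, i=0, F=0.8, rng=rng)
```

With F = 0 the mutant degenerates to a plain copy of the randomly chosen x_{r1}, which makes the role of the weighted difference term easy to see.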
Note that the vector x_{r1,G}, which is perturbed to yield v_{i,G+1}, has no relation to x_{i,G} but is a randomly chosen population member. Fig. 1 shows a two-dimensional example that illustrates the different vectors which play a part in the vector generation scheme. The notation DE/rand/1 specifies that the vector to be perturbed is randomly chosen, and that the perturbation consists of one weighted difference vector.

Fig. 1: An example of a two-dimensional objective function showing its contour lines and the process for generating v_{i,G+1} in scheme DE/rand/1. (The plot marks the NP parameter vectors of generation G, the minimum, the vectors x_{r1,G}, x_{r2,G} and x_{r3,G}, the weighted difference F * (x_{r2,G} - x_{r3,G}), and the newly generated vector v_{i,G+1} = x_{r1,G} + F * (x_{r2,G} - x_{r3,G}); plot omitted.)

In order to increase the potential diversity of the perturbed parameter vectors, crossover is introduced. To this end, the vector

u_{i,G+1} = (u_{0i,G+1}, u_{1i,G+1}, ..., u_{(D-1)i,G+1})   (3)

is formed with

u_{ji,G+1} = v_{ji,G+1}  for j = <n>_D, <n+1>_D, ..., <n+L-1>_D
u_{ji,G+1} = x_{ji,G}    for all other j ∈ [0, D-1].   (4)

The acute brackets < >_D denote the modulo function with modulus D. The starting index n in (4) is a randomly chosen integer from the interval [0, D-1]. The integer L, which denotes the number of parameters that are going to be exchanged, is drawn from the interval [1, D]. The algorithm which determines L works according to the following lines of pseudo code, where rand() is supposed to generate a random number ∈ [0, 1):

    L = 0;
    do {
        L = L + 1;
    } while ((rand() < CR) AND (L < D));

Hence the probability Pr(L >= ν) = (CR)^(ν-1), ν > 0. CR is taken from the interval [0, 1] and constitutes a control variable in the design process. The random decisions for both n and L are made anew for each newly generated vector u_{i,G+1}.

To decide whether or not it should become a member of generation G+1, the new vector u_{i,G+1} is compared to x_{i,G}. If u_{i,G+1} yields a smaller objective function value than x_{i,G}, then x_{i,G+1} is set to u_{i,G+1}; otherwise, the old value x_{i,G} is retained.

3 Scheme DE/best/1

Basically, scheme DE/best/1 works the same way as DE/rand/1 except that it generates the vector v_{i,G+1} according to

v_{i,G+1} = x_{best,G} + F * (x_{r1,G} - x_{r2,G}).   (5)

This time, the vector to be perturbed is the best performing vector of the current generation. Again, the computation of u_{i,G+1} is defined by eq. (4). This will also be the case for the remaining variants.

4 Scheme DE/best/2

Scheme DE/best/2 uses two difference vectors as a perturbation:

v_{i,G+1} = x_{best,G} + F * (x_{r1,G} + x_{r2,G} - x_{r3,G} - x_{r4,G}).   (6)

Due to the central limit theorem the random variation is shifted slightly in the Gaussian direction, which seems to be beneficial for many functions.
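Putting together mutation (2), crossover (3) and (4), and the selection step, one generation of DE/rand/1 can be sketched as follows (a minimal Python sketch; the function names, the sphere test function and the control settings are my own demo choices, not taken from the paper):

```python
import numpy as np

def de_rand_1_generation(pop, f_obj, F, CR, rng):
    """One generation of DE/rand/1 with the crossover of eqs. (3)-(4)."""
    NP, D = pop.shape
    new_pop = pop.copy()
    for i in range(NP):
        # mutation, eq. (2)
        cand = [r for r in range(NP) if r != i]
        r1, r2, r3 = rng.choice(cand, size=3, replace=False)
        v = pop[r1] + F * (pop[r2] - pop[r3])
        # crossover, eqs. (3)-(4): copy L consecutive parameters
        # (counted modulo D) from v, starting at a random index n
        n = int(rng.integers(0, D))
        L = 1
        while rng.random() < CR and L < D:
            L += 1
        u = pop[i].copy()
        for j in range(L):
            u[(n + j) % D] = v[(n + j) % D]
        # selection: keep whichever of u and x_i is better
        if f_obj(u) < f_obj(pop[i]):
            new_pop[i] = u
    return new_pop

# usage: a few generations on the sphere function
rng = np.random.default_rng(1)
sphere = lambda x: float(np.sum(x * x))
pop = rng.uniform(-5.0, 5.0, size=(20, 5))  # NP=20, D=5
for _ in range(50):
    pop = de_rand_1_generation(pop, sphere, F=0.8, CR=0.3, rng=rng)
best = min(pop, key=sphere)
```

Because the selection step only ever replaces a vector with a better one, the best objective value in the population is non-increasing from generation to generation.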
5 Scheme DE/rand-to-best/1

Scheme DE/rand-to-best/1 places the perturbation at a location between a randomly chosen population member and the best population member:
v_{i,G+1} = x_{i,G} + λ * (x_{best,G} - x_{i,G}) + F * (x_{r2,G} - x_{r3,G}).   (7)

λ controls the greediness of the scheme. To reduce the number of control variables we usually set λ = F.

6 Rules for the usage of DE

Since its invention [1], DE has been tested extensively against artificial and real-world minimization problems. So far, the following set of linguistic rules has emerged as useful when it comes to choosing the control variables F, CR and NP:

# At initialization the population should be spread as much as possible over the objective function surface.

# Most often the crossover probability CR ∈ [0, 1] must be considerably lower than one (e.g. 0.3). If no convergence can be achieved, however, CR ∈ [0.8, 1] often helps.

# For many applications NP = 10*D is a good choice. F is usually chosen ∈ [0.5, 1].

# The higher the population size NP is chosen, the lower one should choose the weighting factor F.

# Watching the parameters: it is a good convergence sign if the parameters of the best population member change a lot from generation to generation, especially at the beginning of the minimization, even if the objective function value of the best population member decreases only slowly.

# Watching the objective function: it is not necessarily bad if the objective function value of the best population member exhibits plateaus during the minimization process. However, it is an indication that the minimization might take a long time or that an increase of the population size NP might be beneficial for convergence.

# The objective function value of the best population member shouldn't drop too fast; otherwise the optimization might get stuck in a local minimum.

# The proper choice of the objective function is crucial. The more knowledge one includes, the more likely the minimization is going to converge. The sum of error squares is not always a good choice as it has the potential to hide the path to the global minimum. Minimizing the maximum error is often a better objective but seems to yield more local minima.

7 Design of a howling remover

In order to demonstrate DE's applicability to real-world problems, a howling removal unit has been designed with DE. In modern audio communication applications, hands-free environments are the current trend, where headsets are replaced by loudspeakers and microphones. The preferred way of audio communication is full duplex, i.e. all loudspeakers and microphones are active, as opposed to half duplex or "walkie-talkie" mode where only one party is allowed to talk at a time. Howling is one of the problems in full duplex communication and builds up due to the acoustic feedback path. One way to reduce howling is to frequency-shift the signal that is picked up by a microphone by 10 Hz to 20 Hz before it is sent to the other communicating parties. This shift is usually not perceived as unnatural by the human ear. The shifted signal appears at the destination loudspeakers and travels back to the originator, shifted by another 10 Hz to 20 Hz. The signal travels many times through this acoustic path and is quickly shifted out of band, thus reducing the feedback problems. Fig. 2 shows the block diagram of the howling removal unit.

Fig. 2: Howling removal unit. (Block diagram: input x_k -> upsampler (factor 4) -> bandpass (BP) -> multiplication with a cosine carrier -> lowpass (LP) -> downsampler (factor 4) -> output y_k; A denotes the sampling frequency; diagram omitted.)

The upsampler fills in three zero samples between the adjacent signal samples x_k and x_{k+1}. The bandpass, which operates at four times the sampling frequency A, retains the components of the spectrum which are supposed to be frequency-shifted. The actual shift is performed via multiplication with an
appropriate cosine signal. The lowpass removes
some artifacts which appear due to the shifting
operation, and the downsampler takes every
fourth sample of the lowpass result to yield the
output signal y_k at the original sampling
frequency.
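The effect of the cosine multiplication can be illustrated in isolation (a simplified Python sketch; it omits the upsampling, bandpass and downsampling stages of the unit, and the tone frequency, sampling rate and shift value are illustrative choices of mine, not the paper's):

```python
import numpy as np

# Multiplying a signal by a cosine carrier moves each spectral
# component of the input to f0 - shift and f0 + shift; the unit's
# bandpass/lowpass stages then keep only the wanted sideband.
fs = 8192                                      # sampling frequency in Hz
n = np.arange(fs)                              # one second of signal
x = np.sin(2 * np.pi * 1000 * n / fs)          # a 1 kHz microphone tone
shift = 16                                     # shift in the 10-20 Hz range
y = x * np.cos(2 * np.pi * shift * n / fs)     # modulation with the carrier
spec = np.abs(np.fft.rfft(y))
freqs = np.fft.rfftfreq(len(y), 1.0 / fs)
top2 = sorted(freqs[np.argsort(spec)[-2:]])    # the two strongest components
# top2 is [984.0, 1016.0]: the tone reappears at 1000 -/+ 16 Hz
```

The parameters are chosen so that both components fall exactly on FFT bins, which keeps the spectrum clean without windowing.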
One of the most important features of the howling remover is low computational complexity, as the unit has to operate in a real-time environment. Therefore it was crucial to design a bandpass as well as a lowpass of minimum degree. To this end the bandpass was chosen to be a recursive digital filter (IIR filter), the transfer function of which had to meet specific magnitude constraints defined by a tolerance scheme. The lowpass was designed as a transversal filter (FIR filter) without the usual linear-phase requirement. The lowpass also had to meet magnitude specifications defined by a tolerance scheme.

To evaluate a design against its tolerance scheme, the magnitude response of the corresponding filter was sampled in the frequency domain. The number of samples used is indicated in figs. 3 and 4, which also show the final results of the design procedure.
The objective function in both cases was defined to be the maximum deviation from the corresponding tolerance scheme or to be one of the following penalty terms p_k, whichever value was greater:

a) For all parameters par[i]: p_1 = 20,000 + 100 * par[i] if par[i] < 0

Fig. 3: Magnitude response of the bandpass after the design process. (Tolerance scheme: pass band 0.99 to 1.01 sampled with 40 points, stop bands with tolerance 0.01 sampled with 10 and 20 points, transition bands sampled with 10 points each; normalized frequency 0 to 0.5; plot omitted.)

Fig. 4: Magnitude response of the lowpass after the design process. (Tolerance scheme: pass band 0.995 to 1.005 sampled with 40 points, stop band tolerances 0.01 and 0.001 sampled with 10 and 20 points; normalized frequency 0 to 0.5; plot omitted.)
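A tolerance-scheme objective of the kind described above can be sketched as follows (a hypothetical Python illustration; the band edges, tolerances and the direct FIR evaluation are my own stand-ins, not the paper's actual design data):

```python
import numpy as np

def max_tolerance_violation(h, n_samples=64):
    """Objective: maximum deviation of an FIR magnitude response from a
    simple, made-up lowpass tolerance scheme.

    h: FIR coefficients. Pass band |H| in [0.99, 1.01] up to f = 0.1,
    stop band |H| <= 0.01 above f = 0.2 (normalized frequency).
    """
    f = np.linspace(0.0, 0.5, n_samples)   # sample the frequency axis
    # magnitude response |H(exp(j*2*pi*f))| of the FIR filter
    H = np.abs(np.array([np.sum(h * np.exp(-2j * np.pi * fk * np.arange(len(h))))
                         for fk in f]))
    dev = 0.0
    for fk, m in zip(f, H):
        if fk <= 0.1:                      # pass band
            dev = max(dev, 0.99 - m, m - 1.01, 0.0)
        elif fk >= 0.2:                    # stop band
            dev = max(dev, m - 0.01, 0.0)
        # transition band: unconstrained
    return dev

# a crude averaging filter violates the scheme; a DE run would
# minimize this objective over the coefficient vector h
h = np.full(8, 1.0 / 8.0)
err = max_tolerance_violation(h)
```

Minimizing this maximum deviation directly implements the minimax objective mentioned in the rules above, as opposed to a sum-of-error-squares criterion.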
... was set to 0.005. The magnitude response of the filter still violates some parts of the tolerance scheme slightly, yet the design was satisfactory for the howling remover.

The lowpass filter result of fig. 4 was obtained using strategy DE/best/2 with NP=200, F=0.5 and CR=1. The entire design took 83,800 evaluations of the lowpass transfer function. A total of 16 parameters was used: 8 zero radii and 8 zero angles in the complex z-plane [3]. The overall gain constant a0 was set to 0.005.

The third optimization for the howling remover was concerned with the cosine function, the evaluation of which takes up a non-negligible amount of computing time. Hence an approximation of cos(x) in the interval x ∈ [0, π/2] was performed using a polynomial opti(x) of third degree. Fig. 5 shows that opti(x) yields an improved approximation compared to the Taylor polynomial taylor(x) of third degree. The optimization of the coefficients in opti(x) was performed by minimization of the sum of error squares obtained at 100 sampling points in the interval x ∈ [0, π/2]. The strategy used was DE/best/1 with NP=20, F=0.9 and CR=1. It took 30,020 function evaluations to get the result of fig. 5. The final speed increase of the cosine function computed by opti(x) was 17% compared to the library function cos(x).

Fig. 5: Approximation of cos(x) by means of a polynomial of third order. (Plotted for x from 0 to 2.5: cos(x); taylor(x) = 1 - 0.5*x^2; opti(x) = 0.9975575805 + 0.03400468081*x - 0.6044035554*x^2 + 0.1129638031*x^3; plot omitted.)

Conclusion

Several variants of Differential Evolution (DE) have been introduced and general hints about their usage have been provided. Three real-world design tasks appearing in the development of a howling remover for audio communications have been solved successfully by applying DE. All three design tasks could have been performed with specialized design tools; the advantage of using DE, however, was that neither specialized and most probably expensive tools nor expert knowledge concerning the design tasks themselves was necessary.

References

[1] Storn, R. and Price, K., Differential Evolution - a simple and efficient adaptive scheme for global optimization over continuous spaces, Technical Report TR-95-012, ICSI, http://http.icsi.berkeley.edu/~storn/litera.html
[2] Storn, R. and Price, K., Minimizing the real functions of the ICEC'96 contest by Differential Evolution, Int. Conf. on Evolutionary Computation, Nagoya, Japan.
[3] Mitra, S.K. and Kaiser, J.F., Handbook for Digital Signal Processing, John Wiley, 1993.