A Level-Value Estimation Algorithm and Its Stochastic Implementation for Global Optimization

Zheng Peng∗  Donghua Wu†  Quan Zheng‡
Abstract: In this paper, we propose a new method for finding the global optimum of continuous optimization problems, namely the Level-Value Estimation algorithm (LVEM). First we define the variance function v(c) and the mean deviation function m(c) with respect to a single variable (the level value c); both functions depend on the optimized function f(x). We verify that these functions have good properties for solving the equation v(c) = 0 by Newton's method, and prove that the largest root of this equation is equivalent to the global optimum of the corresponding optimization problem. We then propose the LVEM algorithm, based on applying Newton's method to the equation v(c) = 0, and prove its convergence. We also propose an implementable version of the LVEM algorithm, abbreviated as the ILVEM algorithm. In the ILVEM algorithm, we use importance sampling to calculate the integrals in the functions v(c) and m(c), and we use the main ideas of the cross-entropy method to update the parameters of the probability density function of the sampling distribution at each iteration. We verify that the ILVEM algorithm satisfies the convergence conditions of the (one-dimensional) inexact Newton method for solving nonlinear equations, and then prove its convergence. The numerical results suggest that the ILVEM algorithm is applicable and efficient in solving global optimization problems.
Keywords: Global Optimization; Level-Value Estimation Algorithm; Variance Equation; Inexact Newton Method; Importance Sampling.
∗Department of Mathematics, Nanjing University, PR China 210093, and Department of Mathematics, Shanghai University, PR China 200444; Corresponding author, E-mail: [email protected]
†Department of Mathematics, Shanghai University, PR China 200444
‡Department of Mathematics, Shanghai University, PR China 200444
1 Introduction
In this paper, we consider the following global optimization problem:
c* = min_{x∈D} f(x)   (1.1)

where D ⊆ Rn. When D = Rn, we call Problem (1.1) an unconstrained global optimization problem. When D = {x ∈ Rn : g_i(x) ≤ 0, i = 1, 2, ..., r} ≠ ∅, we call it a constrained one. We always assume D is a compact subset of Rn.
The following notations will be used in this paper:
ν(Ω): the Lebesgue measure of the set Ω ⊂ Rn;
c*: the global optimal value, i.e., c* = min_{x∈D} f(x);
c*_p: the global optimal value of the penalty function F(x, a), i.e., c*_p = min_{x∈Rn} F(x, a);
H_c: H_c = {x ∈ Rn : f(x) ≤ c}, c ∈ R;
H^p_c: H^p_c = {x ∈ Rn : F(x, a) ≤ c};
H*: the global optimal solutions set, i.e., H* = {x ∈ D : f(x) = c*};
H*_p: the global optimal solutions set of the penalty function F(x, a), i.e., H*_p = {x ∈ Rn : F(x, a) ≤ c*_p};
c*_ε: the ε-optimal value;
H*_ε: the ε-optimal solutions set.
Throughout this paper, we assume that
R). D is a robust set (for this definition, one can refer to [3]), i.e., cl(int D) = cl(D); moreover, when c > c*, 0 < ν(H_c ∩ D) ≤ U < +∞.

I). f is lower Lipschitzian, and f and g_i (i = 1, 2, ..., r) are finitely integrable on Rn.
Global optimization problem (1.1) arises in a wide range of applications, such as finance, allocation and location, operations research, structural optimization, engineering design and control. The problem has attracted specialists in various fields, who have proposed a variety of methods to solve it. These methods can be classified into two classes: deterministic methods and stochastic methods. A deterministic method exploits analytical properties of Problem (1.1) to generate a deterministic sequence of points which converges to a global optimal solution; these properties generally include differentiability, convexity and monotonicity. The filled function method and the tunnelling function method [1] are classical deterministic methods. A stochastic method generates a sequence of points which converges to a global optimal solution in probability [2, 24]; examples include genetic algorithms, tabu search and simulated annealing. By combining deterministic and stochastic methods, we may obtain a method with better performance: it can overcome some drawbacks of deterministic methods (e.g., the required analytical properties of the problem may be poorly known, or we may only get a local solution without being able to confirm whether it is global), as well as some drawbacks of stochastic methods (e.g., some stochastic methods do not converge to the global optimum with probability one [12]).
A representative of these combined methods is the integral level-set method (ILSM), which was first proposed by Chew and Quan [3]. In the integral level-set method, the authors construct the following two sequences:

c_{k+1} = (1/ν(H_{c_k})) ∫_{H_{c_k}} f(x) dν   (1.2)

H_{c_k} = {x ∈ D : f(x) ≤ c_k}   (1.3)

where ν is the Lebesgue measure. The stopping criterion of the ILSM method is:

v(c_k) = (1/ν(H_{c_k})) ∫_{H_{c_k}} (c_k − f(x))² dν = 0.   (1.4)
Under some suitable conditions, the sequences {c_k} and {H_{c_k}} converge to the optimal value and the optimal solutions set, respectively. However, the ILSM method has a drawback: the level set H_{c_k} in (1.3) is difficult to determine. So in the implementable algorithm of the ILSM method, the level set H_{c_k} is approximately determined by the Monte Carlo method. But this leads to another drawback: convergence of the implementable algorithm has not been proved to date. Wu and Tian [4, 5] gave some modifications of the implementable ILSM method.
Phu and Hoffmann [17] gave a method for finding the essential supremum of a summable function. They also proposed an accelerated algorithm in [20], and they implemented their conceptual algorithm by the Monte Carlo method. Convergence of the implementable algorithm of Phu and Hoffmann's method has not been proved either. Wu and Yu [18] used the main ideas of Phu and Hoffmann's method to solve global optimization problems with box constraints. Peng and Wu [19] extended the method of [18] to global optimization problems with general functional constraints. In [19], the authors used the uniform distribution in number theory to implement the conceptual algorithm and proved convergence of the implementable algorithm. However, as is well known, calculating integrals with the uniform distribution in number theory is inefficient, even though it is polynomial-time with respect to the dimension of the integrand.
In this paper, we define the variance function v(c) [see (2.1)] and the mean deviation function m(c) [see (2.2)]. We verify that these functions have properties useful for global optimization, such as convexity, monotonicity and differentiability. Based on solving the variance equation v(c) = 0 by Newton's method, we propose the level-value estimation method (LVEM) for solving the global optimization problem (1.1).
To implement the LVEM algorithm, we calculate the integrals in v(c) and m(c) by the Monte Carlo method with importance sampling. We introduce importance sampling [6, 7] and an updating mechanism for the sampling distribution based on the main idea of the cross-entropy method [8, 9, 10], and then propose the implementable ILVEM algorithm. By verifying that the ILVEM algorithm satisfies the convergence conditions of the inexact Newton method [11] (which we adapt to the one-dimensional case), we obtain convergence of the ILVEM algorithm.
We use a discontinuous exact penalty function [4] to convert the problem with functional constraints into an unconstrained one, and prove that the two are equivalent.
This paper is organized as follows. In Section 2, we define the variance function and the mean deviation function and study their properties. In Section 3, based on solving the variance equation v(c) = 0 by Newton's method, we propose the conceptual LVEM algorithm and prove its convergence. In Section 4, we introduce importance sampling and the updating mechanism of the sampling distribution based on the main idea of the cross-entropy method. In Section 5, we propose the implementable ILVEM algorithm and prove its convergence. Some numerical results are given in Section 6. Finally, conclusions are given in Section 7.
2 The variance function and the mean deviation function
We always assume H_c ∩ D ≠ ∅ when c ≥ c*. Moreover, when c > c*, we assume {x ∈ D : f(x) < c} ≠ ∅ and ν(H_c ∩ D) > 0. By assumption R), we know that ν(H_c ∩ D) ≤ U < +∞.
We define the variance function as

v(c) = ∫_{H_c∩D} (c − f(x))² dν,   (2.1)

and the mean deviation function as

m(c) = ∫_{H_c∩D} (c − f(x)) dν,   (2.2)

where the integration is with respect to x over the region H_c ∩ D.
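For intuition, v(c) and m(c) can be computed numerically for a simple one-dimensional example. The following sketch (our illustration, not part of the original development) takes f(x) = x² on D = [−1, 1], so that c* = 0, and approximates the integrals in (2.1) and (2.2) by Riemann sums; v(c) and m(c) vanish for c ≤ 0 and are positive and increasing for c > 0.

```python
# Illustration: v(c) and m(c) for f(x) = x^2 on D = [-1, 1] (so c* = 0).
# The integrals over H_c ∩ D = {x in D : f(x) <= c} are approximated
# by a Riemann sum on a uniform grid.

def v_and_m(c, f=lambda x: x * x, lo=-1.0, hi=1.0, n=100001):
    h = (hi - lo) / (n - 1)
    v = m = 0.0
    for i in range(n):
        d = c - f(lo + i * h)
        if d >= 0.0:          # the point lies in the level set H_c
            v += d * d * h    # contribution to the variance function (2.1)
            m += d * h        # contribution to the mean deviation function (2.2)
    return v, m

if __name__ == "__main__":
    for c in (-0.5, 0.0, 0.25, 1.0):
        v, m = v_and_m(c)
        print(f"c = {c:5.2f}:  v(c) = {v:.6f},  m(c) = {m:.6f}")
```

For c = 0.25 the exact values are m(c) = 1/6 and v(c) = 1/30, which the Riemann sums reproduce closely.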
Theorem 2.1 For all c ∈ (−∞, +∞), the mean deviation function m(c) is Lipschitz continuous, non-negative, monotone non-decreasing and convex, and almost everywhere it has the derivative

m′(c) = ν(H_c ∩ D).
Proof: The proof is similar to that of Theorem 2.1 in [17] or [20], and we omit it here.
Corollary 2.1 The mean deviation function m(c) is strictly increasing on (c*, +∞).

Proof: By direct computation, we get this assertion.
Theorem 2.2 The variance function v(c) is Lipschitz-continuously differentiable on (−∞, +∞), and

v′(c) = 2m(c).   (2.3)

Proof: We first assume ∆c > 0 in calculating v′(c) from its definition. When ∆c < 0, the same assertion follows in the same way.
Note that when ∆c > 0 we have H_c ⊂ H_{c+∆c}. Since f is finitely integrable, and

H_{c+∆c} ∩ D = ((H_{c+∆c} \ H_c) ∪ H_c) ∩ D = ((H_{c+∆c} \ H_c) ∩ D) ∪ (H_c ∩ D),

we have

v′(c) = lim_{∆c→0} (1/∆c) [v(c+∆c) − v(c)]
      = lim_{∆c→0} (1/∆c) [∫_{H_{c+∆c}∩D} (c+∆c−f(x))² dν − ∫_{H_c∩D} (c−f(x))² dν]
      = lim_{∆c→0} (1/∆c) [∫_{(H_{c+∆c}\H_c)∩D} (c+∆c−f(x))² dν + ∫_{H_c∩D} (c+∆c−f(x))² dν − ∫_{H_c∩D} (c−f(x))² dν]
      = lim_{∆c→0} (1/∆c) ∫_{(H_{c+∆c}\H_c)∩D} (c+∆c−f(x))² dν + lim_{∆c→0} (1/∆c) ∫_{H_c∩D} ∆c (2c+∆c−2f(x)) dν.

When x ∈ (H_{c+∆c} \ H_c) ∩ D we have c ≤ f(x) ≤ c+∆c, which implies 0 ≤ (c+∆c−f(x))² ≤ (∆c)². Thus

0 ≤ lim_{∆c→0} (1/∆c) ∫_{(H_{c+∆c}\H_c)∩D} (c+∆c−f(x))² dν
  ≤ lim_{∆c→0} (1/∆c) ∫_{(H_{c+∆c}\H_c)∩D} (∆c)² dν
  = lim_{∆c→0} ∆c ν((H_{c+∆c}\H_c) ∩ D) = 0,

and

lim_{∆c→0} (1/∆c) ∫_{H_c∩D} ∆c (2c+∆c−2f(x)) dν
  = lim_{∆c→0} ∫_{H_c∩D} (2c+∆c−2f(x)) dν
  = ∫_{H_c∩D} [lim_{∆c→0} (2c+∆c−2f(x))] dν
  = ∫_{H_c∩D} (2c−2f(x)) dν = 2m(c).

In the second step, since 2c+∆c−2f(x) > 0 when x ∈ H_c ∩ D, we can exchange the integral and limit operators. Therefore v′(c) = 2m(c).
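The identity v′(c) = 2m(c) can also be checked numerically with a central finite difference. The sketch below (ours, for illustration only) again uses f(x) = x² on D = [−1, 1] and compares (v(c+h) − v(c−h))/(2h) against 2m(c), with the integrals approximated by Riemann sums.

```python
# Numerical check of Theorem 2.2, v'(c) = 2 m(c), for f(x) = x^2 on [-1, 1].

def level_integrals(c, f=lambda x: x * x, lo=-1.0, hi=1.0, n=100001):
    """Riemann-sum approximations of v(c) and m(c) over H_c ∩ D."""
    step = (hi - lo) / (n - 1)
    v = m = 0.0
    for i in range(n):
        d = c - f(lo + i * step)
        if d >= 0.0:
            v += d * d * step
            m += d * step
    return v, m

def check(c, h=1e-4):
    """Return the central finite difference of v at c and the value 2 m(c)."""
    vp, _ = level_integrals(c + h)
    vm, _ = level_integrals(c - h)
    _, m = level_integrals(c)
    return (vp - vm) / (2 * h), 2 * m

if __name__ == "__main__":
    for c in (0.1, 0.5, 0.9):
        fd, tm = check(c)
        print(f"c = {c}:  (v(c+h)-v(c-h))/2h = {fd:.6f},  2 m(c) = {tm:.6f}")
```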
Theorem 2.3 The variance function v(c) has the following properties:

1). It is monotone non-decreasing on (−∞, +∞) and strictly increasing on (c*, +∞).

2). It is convex on (−∞, +∞) and strictly convex on (c*, +∞).

3). v(c) = 0 if and only if c ≤ c*.

Proof: Assertions 1) and 2) follow directly from Theorem 2.2 and the monotonicity of m(c). Assertion 3) follows from Theorem 2.9 in [3]. Since H_c ∩ D = ∅ and ν(∅) = 0 when c < c*, the slight generalization to c < c* is trivial by the definition of v(c).
Corollary 2.3 m(c) = 0 if and only if c ≤ c*.

We have assumed that D = {x ∈ Rn : g_i(x) ≤ 0, i = 1, 2, ..., r} ≠ ∅ is a compact subset of Rn. We now define the discontinuous exact penalty function as follows:

F(x, a) = f(x) + a p(x), a > 1.0,   (2.4)

where

p(x) = 0 if x ∈ D, and p(x) = ϑ + d(x) if x ∉ D, with ϑ > 0,   (2.5)

d(x) = Σ_{i=1}^{r} max(0, g_i(x)).

And we set H^p_c = {x ∈ Rn : F(x, a) ≤ c}.

Clearly, by assumption I), when 1 < a < B1 < +∞ and 0 < ϑ < B2 < +∞, F(x, a) is finitely integrable.
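As a concrete illustration of (2.4)-(2.5), the sketch below (our code; the constraints g1(x) = x − 1 ≤ 0 and g2(x) = −1 − x ≤ 0 giving D = [−1, 1], and the values a = 10, ϑ = 1, are hypothetical choices) evaluates the discontinuous exact penalty function F(x, a):

```python
# Sketch of the discontinuous exact penalty function (2.4)-(2.5).
# Illustrative constraints: g1(x) = x - 1 <= 0, g2(x) = -1 - x <= 0,
# so D = [-1, 1]. The parameters a and theta are hypothetical choices.

def make_penalty(f, gs, a=10.0, theta=1.0):
    """Return F(x, a) = f(x) + a * p(x), with jump `theta` outside D."""
    def d(x):
        # d(x) = sum_i max(0, g_i(x)): total constraint violation
        return sum(max(0.0, g(x)) for g in gs)
    def F(x):
        viol = d(x)
        if viol == 0.0:                         # x in D: no penalty
            return f(x)
        return f(x) + a * (theta + viol)        # x outside D: jump plus violation
    return F

if __name__ == "__main__":
    F = make_penalty(lambda x: x * x, [lambda x: x - 1.0, lambda x: -1.0 - x])
    print(F(0.5))   # inside D: just f(x) = 0.25
    print(F(2.0))   # outside D: 4 + 10 * (1 + 1) = 24.0
```

Note the jump ϑ at the boundary of D: F is discontinuous there, which is exactly what makes the penalty exact for finite a.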
Furthermore, let

v_p(c) = ∫_{H^p_c} (c − F(x, a))² dν,   (2.6)

m_p(c) = ∫_{H^p_c} (c − F(x, a)) dν.   (2.7)

Then we have the following theorem:

Theorem 2.4 When a > 0 is big enough, we have

v_p(c) = v(c).   (2.8)
Proof: Note that d(x) ≥ 0 and, for any a > 0, a p(x) ≥ 0. Hence F(x, a) ≤ c implies f(x) ≤ c. Thus we get H^p_c ⊇ H_c ∩ D = {x ∈ D : f(x) ≤ c}.

For any ε > 0, we have

|v_p(c) − v(c)|
= |∫_{H^p_c} (c − F(x, a))² dν − ∫_{H_c∩D} (c − f(x))² dν|
= |∫_{H^p_c \(H_c∩D)} (c − F(x, a))² dν + ∫_{H_c∩D} ((c − F(x, a))² − (c − f(x))²) dν|
≤ |∫_{H^p_c \(H_c∩D)} (c − F(x, a))² dν| + |∫_{H_c∩D} (−a p(x))(2c − 2f(x) − a p(x)) dν|
= A1 + A2.

When x ∈ H_c ∩ D, by (2.5) we get p(x) = 0, which implies A2 = 0.

On the other hand, note that if x ∈ H^p_c \ (H_c ∩ D), then x ∈ H^p_c, and if moreover x ∈ H_c, then x ∉ D. Thus we get c − F(x, a) ≥ 0 and p(x) = ϑ + d(x) > 0, since d(x) ≥ 0 and ϑ > 0. For any fixed c, when (c − f(x) − ε)/p(x) < a ≤ (c − f(x))/p(x), we have |c − F(x, a)| = |c − f(x) − a p(x)| < ε. Then we have

A1 = |∫_{H^p_c \(H_c∩D)} (c − F(x, a))² dν| < |∫_{H^p_c \(H_c∩D)} ε² dν| = ν(H^p_c \ (H_c∩D)) ε².

Therefore A1 → 0 as ε → 0. Moreover, when a > (c − f(x))/p(x), the set H^p_c \ (H_c ∩ D) is empty, which implies ν(H^p_c \ (H_c∩D)) = 0, and it follows that A1 = 0.

Summarizing, when a > 0 is big enough, we have v_p(c) = v(c).
Similarly, we have

Corollary 2.5 When a > 0 is big enough, we get

m_p(c) = m(c),   (2.9)

v′_p(c) = v′(c).   (2.10)

The following corollary follows directly from Theorem 2.4 and Corollary 2.5.

Corollary 2.6 When a > 0 is large enough, the functions m_p(c) and v_p(c) are convex, monotone and differentiable for c ∈ (−∞, +∞), the same as the functions m(c) and v(c).
3 Algorithm LVEM and convergence
From the properties of v(c) and m(c) discussed in the previous section, we know that c* = min_{x∈D} f(x) is the largest root of the variance equation v_p(c) = 0, and we can solve this equation by Newton's method. Based on this, we propose the conceptual Level-Value Estimation Method for solving the global optimization problem (1.1).
Algorithm 3.1: LVEM Algorithm

Step 1. Let ε > 0 be a small number. Given a point x0 ∈ Rn, calculate c0 = F(x0, a) and set k := 0.

Step 2. Calculate v_p(c_k) and m_p(c_k) defined as follows:

v_p(c_k) = ∫_{H^p_{c_k}} (c_k − F(x, a))² dν,   (3.1)

m_p(c_k) = ∫_{H^p_{c_k}} (c_k − F(x, a)) dν,   (3.2)

where

H^p_{c_k} = {x ∈ Rn : F(x, a) ≤ c_k}.   (3.3)

Let

λ_k = v_p(c_k) / (2 m_p(c_k)).   (3.4)

Step 3. If λ_k < ε, go to the next step; else let

c_{k+1} = c_k − λ_k,   (3.5)

set k := k + 1, and return to Step 2.

Step 4. Let c^ε_p = c_k and H^p_ε = H^p_{c_k} = {x ∈ Rn : F(x, a) ≤ c_k}, where c^ε_p is the approximate global optimal value and H^p_ε is the set of approximate global optimal solutions.
The LVEM algorithm is stated for the constrained problem; it is clearly applicable to the unconstrained problem as well.
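For a problem whose integrals can be evaluated by quadrature, the conceptual iteration of Algorithm 3.1 can be sketched as follows (our illustration for f(x) = x² on D = [−1, 1], with the integrals (3.1)-(3.2) replaced by Riemann sums; the level values c_k decrease monotonically toward c* = 0):

```python
# Sketch of the conceptual LVEM iteration (Algorithm 3.1) for
# f(x) = x^2 on D = [-1, 1], where c* = 0. The integrals (3.1)-(3.2)
# are approximated by Riemann sums on a uniform grid.

def lvem(f, lo, hi, c0, eps=1e-5, n=50001, max_iter=100):
    step = (hi - lo) / (n - 1)
    c = c0
    for _ in range(max_iter):
        v = m = 0.0
        for i in range(n):                 # quadrature for v_p(c_k), m_p(c_k)
            d = c - f(lo + i * step)
            if d >= 0.0:
                v += d * d * step
                m += d * step
        if m == 0.0:
            break                          # level set has measure zero: c <= c*
        lam = v / (2.0 * m)                # Newton step (3.4)
        if lam < eps:
            break                          # stopping rule of Step 3
        c -= lam                           # level-value update (3.5)
    return c

if __name__ == "__main__":
    c_star = lvem(lambda x: x * x, -1.0, 1.0, c0=1.0)
    print(f"estimated global minimum value: {c_star:.6f}")   # close to 0
```

For this f one can verify λ_k = 0.4 c_k exactly, so c_{k+1} = 0.6 c_k and the iteration converges geometrically, as predicted by Theorem 3.1.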
On convergence of the LVEM method, we have

Theorem 3.1 The sequence {c_k} generated by the LVEM method is monotonically decreasing. Set c*_p = min_{x∈Rn} F(x, a) and H*_p = {x ∈ Rn : F(x, a) = c*_p}. Then

lim_{k→∞} c_k = c*_p,   (3.6)

and

lim_{k→∞} H^p_{c_k} = H*_p.   (3.7)
Proof: By the properties of v_p(c), we know that c*_p is the largest root of the variance equation v_p(c) = 0. Solving this equation by Newton's method, we have

c_{k+1} = c_k − v_p(c_k)/v′_p(c_k).   (3.8)

Substituting v′_p(c_k) = 2m_p(c_k) into (3.8), we get the iterative formula (3.5). By (3.4) we know that λ_k ≥ 0, and hence the sequence {c_k} generated by the LVEM method is monotonically decreasing. By the convergence of Newton's method, c_k converges to a root of v_p(c) = 0 as k → ∞.

On the other hand, by Theorem 2.3 and Theorem 2.4, every c with c ≤ c*_p is a root of the equation v_p(c) = 0. Thus we only need to prove that c_k ≥ c*_p, k = 0, 1, 2, .... We prove this assertion by mathematical induction.

It is clear that c0 = F(x0, a) ≥ c*_p. If c0 = c*_p, then v_p(c0) = 0 and the process stops at Step 3; otherwise c0 > c*_p, v_p(c0) > 0 and the process goes on. Assume c_k ≥ c*_p; we have two cases. If c_k = c*_p, then v_p(c_k) = 0 and the process stops. If c_k > c*_p, since v_p(c) is convex on (−∞, +∞) and c*_p, c_k ∈ (−∞, +∞), then by Theorem 3.3.3 on page 89 of [16] we have

v_p(c*_p) − v_p(c_k) ≥ v′_p(c_k)(c*_p − c_k).   (3.9)

Since v_p(c*_p) = 0 and v′_p(c_k) = 2m_p(c_k) > 0 when c_k > c*_p, inequality (3.9) is equivalent to

−v_p(c_k) ≥ v′_p(c_k)(c*_p − c_k),

that is,

c*_p − c_k ≤ −v_p(c_k)/v′_p(c_k) = −λ_k.

By this inequality and the iterative formula (3.5), we have

c_{k+1} = c_k − λ_k ≥ c*_p.

By mathematical induction, we get c_k ≥ c*_p, k = 0, 1, 2, ....

Thus we conclude that

lim_{k→∞} c_k = c*_p.

To complete the proof, by the definition of H^p_{c_k} and Property 2 in [4], when c_k → c*_p we have

lim_{k→∞} H^p_{c_k} = H*_p.
Theorem 3.2 Set c* = min_{x∈D} f(x), H* = {x ∈ D : f(x) = c*}, and c*_p = min_{x∈Rn} F(x, a), H*_p = {x ∈ Rn : F(x, a) = c*_p}, respectively. When a is big enough, we have

c*_p = c*, H*_p = H*.   (3.10)
Proof: By the definition of the discontinuous exact penalty function (2.4), for ϑ > 0 and a > 0 big enough, F(x, a) attains its minimum over Rn only if p(x) = 0. Thus

c*_p = min_{x∈Rn} F(x, a) = min_{x∈Rn} [f(x) + a p(x)] = min_{x∈D} f(x) = c*,

where the third equality holds since p(x) = 0 if and only if x ∈ D.

Next we prove H*_p = H*. For any x ∈ H*_p, we have F(x, a) = c*_p = c*. Since F(x, a) attains its minimum only if p(x) = 0, and p(x) = 0 if and only if x ∈ D, we have x ∈ D. Therefore F(x, a) = f(x) + a p(x) = f(x) = c* and x ∈ D, i.e., x ∈ H*. This shows H*_p ⊆ H*. Conversely, if x ∈ H*, then x ∈ D and f(x) = c* = c*_p. Thus p(x) = 0 and F(x, a) = f(x) + a p(x) = f(x) = c* = c*_p, which shows x ∈ H*_p. This implies H* ⊆ H*_p. Summarizing, we have H*_p = H*.
4 Importance sampling and updating mechanism of the sampling distribution
To implement the LVEM algorithm, we need to calculate the integrals of v_p(c_k) and m_p(c_k) in (3.1) and (3.2). We do this by the Monte Carlo method based on importance sampling [6], and we use the main ideas of the cross-entropy method to update the parameters of the probability density function of the sampling distribution. This idea differs from the methods in the literature [3, 4, 5]. In this section we introduce importance sampling and the updating mechanism of the sampling distribution based on the main ideas of the cross-entropy method.
In this section, we assume that Ψ(x) is an arbitrary integrable function with respect to the measure ν, where Ψ is the integrand in formula (4.2). To calculate the integral by the Monte Carlo method, we use importance sampling with probability density function (pdf) g(x) = g(x; v), where x = (x1, ..., xn) and v is a parameter (or parameter vector). Given random samples X1, X2, ..., XN independent and identically distributed (iid) from the distribution with density g(x) on the region X, we use

Î_g = (1/N) Σ_{i=1}^{N} Ψ(X_i)/g(X_i)   (4.1)

as an estimator of the integral

I = ∫_X Ψ(x) dν = ∫_X [Ψ(x)/g(x)] g(x) dν.   (4.2)
It is easy to show that this estimator is unbiased. Furthermore, the variance of this estimator is

var(Î_g) = (1/N) ∫_X (Ψ(x)/g(x) − E_g[Ψ(X)/g(X)])² g(x) dν.   (4.3)

Moreover, if [Ψ(x)/g(x)]² has a finite expectation with respect to g(x), then the variance (4.3) can also be estimated from the sample (X1, X2, ..., XN) by

var(Î_g) ≈ (1/N²) Σ_{j=1}^{N} (Ψ(X_j)/g(X_j) − Î_g)² = O(1/N), as N → +∞.   (4.4)

See, for example, pp. 83-84 and Definition 3.9 on p. 92 in [26].
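A minimal sketch of the estimator (4.1) and the variance estimate (4.4) (our illustration: the integrand Ψ(x) = exp(−x²/2) on R, whose integral is √(2π), with a normal importance density; the choice s = 1.5 is ours and keeps E[(Ψ/g)²] finite, since that requires s² > 1/2 here):

```python
# Sketch of importance sampling (4.1)-(4.4): estimate
# I = ∫ Ψ(x) dx with Ψ(x) = exp(-x^2/2), exact value sqrt(2*pi).
# The importance density g is N(0, s^2); samples X_i are drawn from g.
import math
import random

def importance_estimate(N, s=1.5, seed=0):
    rng = random.Random(seed)
    def psi(x):
        return math.exp(-0.5 * x * x)
    def g(x):                                 # pdf of N(0, s^2)
        return math.exp(-0.5 * (x / s) ** 2) / (s * math.sqrt(2 * math.pi))
    ratios = []
    for _ in range(N):
        x = rng.gauss(0.0, s)                 # X_i ~ g
        ratios.append(psi(x) / g(x))          # Ψ(X_i) / g(X_i)
    est = sum(ratios) / N                                    # estimator (4.1)
    var = sum((r - est) ** 2 for r in ratios) / (N * N)      # estimate (4.4)
    return est, var

if __name__ == "__main__":
    est, var = importance_estimate(100000)
    print(f"estimate = {est:.4f}  (exact = {math.sqrt(2 * math.pi):.4f}),"
          f"  var ≈ {var:.2e}")
```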
We will use the following two assertions; see, for example, Theorem 3.12 in [26], pp. 94-95.

Assertion 4.1: If the expectation with respect to g(x) satisfies

E_g[(Ψ(X)/g(X))²] = ∫_X [Ψ²(x)/g(x)] dν < +∞,

then (4.1) converges almost surely to (4.2), that is,

Pr{ lim_{N→+∞} (1/N) Σ_{i=1}^{N} Ψ(X_i)/g(X_i) = ∫_X [Ψ(x)/g(x)] g(x) dν } = 1.   (4.5)
Remark 4.1. There are many approaches to estimating the sample size N in Monte Carlo integration; see [29]. We relax (4.5) into the following form:

Pr{ |(1/N) Σ_{i=1}^{N} Ψ(X_i)/g(X_i) − ∫_X [Ψ(x)/g(x)] g(x) dν| < ε } ≥ 1 − δ.   (4.6)

In the one-dimensional case (x ∈ R), using the large deviation rate function, the simplest estimate of the sample size N is N > (−ln δ)/(2ε²). In the worst case, since we always generate a sample X in Rn in a component-independent way, a coarse estimate of the sample size N satisfying (4.6) (for x ∈ Rn) is

N > ((−ln δ)/(2ε²))^n.   (4.7)

Taking the limit as ε → 0 and δ → 0, we get (4.5) from (4.6) directly.
Assertion 4.2: The choice of g that minimizes the variance of the estimator (4.1) is

g*(x) = |Ψ(x)| / ∫_X |Ψ(x)| dν.

But in this case, g*(x) depends on ∫_X |Ψ(x)| dν. So in practice, we preassign a family of pdfs g(x; v) and choose a proper parameter (or parameter vector) v to make g(x; v) approximate g*(x). The cross-entropy method gives an idea for choosing this parameter (or parameter vector) v.
The cross-entropy method was first developed by Rubinstein [10] to solve continuous multi-extremal and various discrete combinatorial optimization problems. The method is an iterative stochastic procedure making use of the importance sampling technique; the stochastic mechanism changes from iteration to iteration by minimizing the Kullback-Leibler cross-entropy. Recently, under the assumption that the problem under consideration has only one global optimal point on its region, convergence of the CE method was given in [30].
Let g(x; v) be the n-dimensional coordinate-normal density, denoted by N(x; µ, σ²) and defined as

N(x; µ, σ²) = [1/((√(2π))^n ∏_{i=1}^{n} σ_i)] exp(−Σ_{i=1}^{n} (x_i − µ_i)²/(2σ_i²)),   (4.8)

with component-independent mean vector µ = (µ1, µ2, ..., µn) and standard deviation vector σ = (σ1, σ2, ..., σn).
Then, based on the main ideas of the cross-entropy method, for estimating the integral (4.2) by the estimator (4.1), the updating mechanism of the sampling distribution can be described as follows.

Updating algorithm of the sampling distribution with pdf N(x; µ, σ²) (smoothed version [9] with a slight modification)

Given T samples X1, X2, ..., XT generated iid from the distribution with pdf N(x; µ, σ²), we get the new pdf N(x; µ̃, σ̃²) by the following procedure:

Step 1. Compute Ψ(X_i), i = 1, 2, ..., T; select the T^elite best samples (those with the T^elite least values Ψ(X_i)); and construct the elite sample set, denoted by

{X^e_1, X^e_2, ..., X^e_{T^elite}}.   (4.9)

Step 2. Update the parameters (µ, σ²) using the elite sample set (4.9), by the following formulas:

µ̃_j := (1/T^elite) Σ_{i=1}^{T^elite} X^e_{ij}, j = 1, 2, ..., n,   (4.10)

and

σ̃²_j := (1/T^elite) Σ_{i=1}^{T^elite} (X^e_{ij} − µ̃_j)², j = 1, 2, ..., n.   (4.11)

Step 3. Smooth:

µ̃ := αµ̃ + (1 − α)µ, σ̃²_j := βσ̃²_j + (1 − β)σ²_j, j = 1, 2, ..., n,   (4.12)

where 0.5 < α < 0.9, 0.8 < β < 0.99, 5 ≤ q ≤ 10.
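The updating steps (4.9)-(4.12) can be sketched as follows (our code; the elite fraction and the smoothing constants α, β are illustrative choices within the stated ranges):

```python
# Sketch of the CE-style parameter update (4.9)-(4.12) for N(x; mu, sigma^2).
# Elite samples = those with the smallest scores; alpha/beta are illustrative.
import random

def update_params(samples, scores, mu, sigma2, elite_frac=0.1,
                  alpha=0.7, beta=0.9):
    n = len(mu)
    t_elite = max(1, int(elite_frac * len(samples)))
    # Step 1: select the T^elite samples with the least scores (4.9)
    elite = [x for _, x in sorted(zip(scores, samples))[:t_elite]]
    new_mu, new_sigma2 = [], []
    for j in range(n):
        # Step 2: sample mean and variance over the elite set (4.10)-(4.11)
        mj = sum(x[j] for x in elite) / t_elite
        sj = sum((x[j] - mj) ** 2 for x in elite) / t_elite
        # Step 3: smoothing with the previous parameters (4.12)
        new_mu.append(alpha * mj + (1 - alpha) * mu[j])
        new_sigma2.append(beta * sj + (1 - beta) * sigma2[j])
    return new_mu, new_sigma2

if __name__ == "__main__":
    rng = random.Random(1)
    mu, sigma2 = [0.0, 0.0], [4.0, 4.0]
    xs = [[rng.gauss(mu[j], sigma2[j] ** 0.5) for j in range(2)]
          for _ in range(500)]
    scores = [(x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2 for x in xs]
    mu, sigma2 = update_params(xs, scores, mu, sigma2)
    print(mu, sigma2)   # mean shifts toward (1, -2); variance shrinks
```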
5 Implementable algorithm ILVEM and convergence
We first introduce the following function:

ψ(x, c) = c − F(x, a) if x ∈ H^p_c, and ψ(x, c) = 0 if x ∉ H^p_c,   (5.1)

where H^p_c = {x ∈ Rn : F(x, a) ≤ c}. Then it is clear that

ψ²(x, c) = (c − F(x, a))² if x ∈ H^p_c, and ψ²(x, c) = 0 if x ∉ H^p_c.   (5.2)
Then we rewrite v_p(c) and m_p(c) as

v_p(c) = ∫_{H^p_c} ψ²(x, c) dν = ∫_{Rn} [ψ²(x, c)/g(x)] dG(x) = E_g[ψ²(X, c)/g(X)],   (5.3)

m_p(c) = ∫_{H^p_c} ψ(x, c) dν = ∫_{Rn} [ψ(x, c)/g(x)] dG(x) = E_g[ψ(X, c)/g(X)],   (5.4)

where X is a stochastic vector in Rn distributed with density function g(x), and dG(x) = g(x) dν.

We use

v̂_p(c) = (1/N) Σ_{i=1}^{N} ψ²(X_i, c)/g(X_i),   (5.5)

and

m̂_p(c) = (1/N) Σ_{i=1}^{N} ψ(X_i, c)/g(X_i),   (5.6)

as the unbiased estimators of v_p(c) and m_p(c), respectively.
We choose the density family g(x) = N(x; µ, σ²), the n-dimensional coordinate-normal density defined by (4.8), and update its parameter vectors µ and σ² (from µ_k, σ²_k to µ_{k+1}, σ²_{k+1}) at the kth iteration by the updating mechanism introduced in the previous section, so that the density g(x) resembles the integrand ψ²(x, c) (and ψ(x, c)) in shape as closely as possible.
We propose the implementable version of the LVEM algorithm as follows:
Algorithm 5.1. ILVEM algorithm

Step 1. Let ε > 0 be a small number, and let ε0 > 0 and δ > 0 also be small numbers. Given a point x0 ∈ Rn, calculate c0 = F(x0, a); choose the initial parameter vectors µ0 = (µ01, ..., µ0n) and σ²0 = (σ²01, ..., σ²0n); and let 0 < ρ < 0.1. Set k := 0.

Step 2. Generate N = [((−ln δ)/(2ε_k²))^n] + 1 samples X^k_1, X^k_2, ..., X^k_N in Rn (where ε_k > 0 is a parameter discussed in Remark 5.3 below) iid from the distribution with density g(x) = N(x; µ_k, σ²_k). Calculate the estimators of the integrals by (5.5) and (5.6), and let

λ̂_k = v̂_p(c_k) / (2 m̂_p(c_k)).   (5.7)

Step 3. If λ̂_k < ε, go to Step 7; else go to the next step.

Step 4. Compute ψ(X^k_i, c_k), i = 1, 2, ..., N; select the N^elite = [ρN] best samples (those with the N^elite least values ψ(X^k_i)); and construct the elite sample set, denoted by

Ĥ^p_{c_k} = {X^k_1, X^k_2, ..., X^k_{N^elite}}.   (5.8)

Step 5. Update the parameter vectors (µ_k, σ²_k) by the following formulas:

µ_{k+1,j} := (1/N^elite) Σ_{i=1}^{N^elite} X^k_{ij}, j = 1, 2, ..., n,   (5.9)

and

σ²_{k+1,j} := (1/N^elite) Σ_{i=1}^{N^elite} (X^k_{ij} − µ_{k+1,j})², j = 1, 2, ..., n,   (5.10)

where X^k_i ∈ Ĥ^p_{c_k}.

Smooth these parameters by

µ_{k+1} := αµ_{k+1} + (1 − α)µ_k, σ²_{k+1,j} := β_k σ²_{k+1,j} + (1 − β_k) σ²_{k,j},   (5.11)

where

β_k = β − β(1 − 1/k)^q, k = 1, 2, ...,   (5.12)

0.5 < α < 0.9, 0.8 < β < 0.99, 5 ≤ q ≤ 10.

Step 6. Let

c_{k+1} = c_k − λ̂_k.   (5.13)

Set k := k + 1 and go to Step 2.

Step 7. Let H*_ε = Ĥ^p_{c_k} = {X^k_1, X^k_2, ..., X^k_{N^elite}} be the approximation of the global optimal solutions set, and c*_ε = c_{k+1} be the approximation of the global optimal value.
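The whole procedure can be sketched in code. The following is our simplified, unconstrained illustration of Algorithm 5.1 (so F(x, a) = f(x)); it uses a fixed, modest sample size instead of the growth rule of Step 2 and a constant smoothing factor instead of (5.12), and all parameter values are illustrative rather than the paper's:

```python
# Simplified sketch of the ILVEM algorithm (Algorithm 5.1), unconstrained
# case F(x, a) = f(x). Sample size and smoothing constants are illustrative.
import math
import random

def ilvem(f, n, mu0, sigma2_0, N=1000, rho=0.1, eps=1e-4,
          alpha=0.8, beta=0.9, max_iter=200, seed=0):
    rng = random.Random(seed)
    mu, sigma2 = list(mu0), list(sigma2_0)
    c = f(mu0)                                      # c_0 from the initial point
    for _ in range(max_iter):
        xs = [[rng.gauss(mu[j], math.sqrt(sigma2[j])) for j in range(n)]
              for _ in range(N)]
        vals = [f(x) for x in xs]
        def g(x):                                   # coordinate-normal pdf (4.8)
            e = sum((x[j] - mu[j]) ** 2 / (2 * sigma2[j]) for j in range(n))
            z = (2 * math.pi) ** (n / 2) * math.prod(math.sqrt(s) for s in sigma2)
            return math.exp(-e) / z
        # estimators (5.5)-(5.6) of v_p(c_k) and m_p(c_k)
        v = sum(max(0.0, c - fv) ** 2 / g(x) for x, fv in zip(xs, vals)) / N
        m = sum(max(0.0, c - fv) / g(x) for x, fv in zip(xs, vals)) / N
        if m <= 0.0:
            break
        lam = v / (2 * m)                           # step size (5.7)
        if lam < eps:
            break                                   # Step 3: stop
        # Steps 4-5: elite set (lowest f-values) and smoothed parameter update
        elite = [x for fv, x in sorted(zip(vals, xs))[:max(1, int(rho * N))]]
        for j in range(n):
            mj = sum(x[j] for x in elite) / len(elite)
            sj = sum((x[j] - mj) ** 2 for x in elite) / len(elite)
            mu[j] = alpha * mj + (1 - alpha) * mu[j]
            sigma2[j] = max(beta * sj + (1 - beta) * sigma2[j], 1e-12)
        c -= lam                                    # Step 6: level-value update
    return c, mu

if __name__ == "__main__":
    f = lambda x: (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2 + 3.0   # minimum 3
    c, mu = ilvem(f, 2, [0.0, 0.0], [9.0, 9.0])
    print(f"estimated optimal value {c:.4f} near point {mu}")
```

Since ψ(X_i) ≤ c − min f, the step λ̂ never exceeds (c − min f)/2, so the level value cannot jump below the optimum; this mirrors the induction in Theorem 5.7.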
Remark 5.1. According to Remark 4.1, given a small number δ > 0 (for example δ = 0.05%) and a precision ε_k, we can determine the sample size N used in Step 2 of the kth iteration such that the integral estimators satisfy the ε_k-precision condition with high probability (for example, not less than 99.95%). The parameter ε_k in the formula for the sample size N is discussed later; see Remark 5.3.
Remark 5.2. According to the analysis of Theorem 2.4, we set the penalty parameter a = a_k > c0/ϑ > c_k/ϑ (for all k = 1, 2, ...) when computing values of F(x, a) in Step 2 of the ILVEM algorithm, where ϑ > 0 is referred to as the jump. This setting ensures that F(x, a) has the characteristics of a discontinuous exact penalty function discussed in [3]. The initial estimate of a = a0 is a large positive real number, for example a0 = 1.0e+6. Indeed, numerical experiments show that the ILVEM algorithm is not sensitive to the parameter a.
We now prove convergence of the ILVEM algorithm. We do this by showing that the ILVEM algorithm satisfies the convergence conditions of the one-dimensional inexact Newton method.
Without loss of generality, we assume that f(x) and g_i(x) (i = 1, 2, ..., r) in Problem (1.1) satisfy suitable conditions such that, for ψ(x, c) defined by (5.1) and ψ²(x, c) defined by (5.2), both ψ²(x, c)/g(x) and ψ⁴(x, c)/g(x) have finite expectation with respect to the density function g(x).
Lemma 5.1 For a given ε > 0 and a given c > c*, if the function ψ²(x, c)/g(x) has a finite integral on Rn, then we have

Pr{ lim_{N→+∞} |m̂_p(c) − m_p(c)| < ε } = 1.   (5.14)

Proof: This is a direct consequence of Assertion 4.1 in the previous section.
Lemma 5.2

∂ψ²(x, c)/∂c = 2ψ(x, c).   (5.15)

Proof: Set ∆c > 0; then H^p_{c+∆c} ⊃ H^p_c, and H^p_{c+∆c} = H^p_c ∪ (H^p_{c+∆c} \ H^p_c), with H^p_c ∩ (H^p_{c+∆c} \ H^p_c) = ∅. Thus we have

ψ²(x, c) = (c − F(x, a))² for x ∈ H^p_c; 0 for x ∈ H^p_{c+∆c} \ H^p_c; 0 for x ∉ H^p_{c+∆c},

and

ψ²(x, c+∆c) = (c+∆c−F(x, a))² for x ∈ H^p_c; (c+∆c−F(x, a))² for x ∈ H^p_{c+∆c} \ H^p_c; 0 for x ∉ H^p_{c+∆c}.

Therefore,

ψ²(x, c+∆c) − ψ²(x, c) = ∆c(2c+∆c−2F(x, a)) for x ∈ H^p_c; (c+∆c−F(x, a))² for x ∈ H^p_{c+∆c} \ H^p_c; 0 for x ∉ H^p_{c+∆c}.

It is obvious that

lim_{∆c→0+} H^p_{c+∆c} = H^p_c and lim_{∆c→0+} (H^p_{c+∆c} \ H^p_c) = ∅.

Thus we have

lim_{∆c→0+} [ψ²(x, c+∆c) − ψ²(x, c)]/∆c = 2(c − F(x, a)) for x ∈ H^p_c, and 0 for x ∉ H^p_c; that is, it equals 2ψ(x, c).

When ∆c < 0, by H^p_{c+∆c} ⊂ H^p_c, we can prove in the same way that

lim_{∆c→0−} [ψ²(x, c+∆c) − ψ²(x, c)]/∆c = 2ψ(x, c).

Summarizing, we have

∂ψ²(x, c)/∂c = lim_{∆c→0} [ψ²(x, c+∆c) − ψ²(x, c)]/∆c = 2ψ(x, c).
Lemma 5.3 For all c > c* and a given ε > 0, if |m̂_p(c) − m_p(c)| < ε, then we have

|v̂_p(c) − v_p(c)| < 2ε(c − c*).   (5.16)

Proof: First assume that m̂_p(c) ≥ m_p(c), which obviously leads to v̂_p(c) ≥ v_p(c). (The case m̂_p(c) ≤ m_p(c) can be handled in the same way.) Noticing that m̂_p(c*) = m_p(c*) = 0 and |m̂_p(c) − m_p(c)| < ε, by Lemma 5.2 we have

|v̂_p(c) − v_p(c)| = (1/N) Σ_{i=1}^{N} ψ²(X_i, c)/g(X_i) − ∫_{H^p_c} ψ²(x, c) dν
= (1/N) Σ_{i=1}^{N} [∫_{c*}^{c} 2ψ(X_i, t) dt]/g(X_i) − ∫_{H^p_c} [∫_{c*}^{c} 2ψ(x, t) dt] dν
= ∫_{c*}^{c} [(1/N) Σ_{i=1}^{N} 2ψ(X_i, t)/g(X_i)] dt − ∫_{c*}^{c} [∫_{H^p_c} 2ψ(x, t) dν] dt
= ∫_{c*}^{c} [(1/N) Σ_{i=1}^{N} 2ψ(X_i, t)/g(X_i) − ∫_{H^p_c} 2ψ(x, t) dν] dt
= ∫_{c*}^{c} 2[m̂_p(t) − m_p(t)] dt
< 2ε(c − c*).

This completes the proof.
It is obvious that the functions c ↦ ψ(x, c) and c ↦ ψ²(x, c) are increasing convex functions of c. When c > F(x, a), the function c ↦ ψ²(x, c) is strictly increasing and strictly convex. Since v̂_p(c) and m̂_p(c) are finite positive combinations of ψ²(x, c) and ψ(x, c), they have the same monotonicity and convexity properties for all c ∈ (−∞, +∞).
Lemma 5.4 For a given c > c* and 0 < η < 1, there exists θ with 0 < η/(2+η) < θ < η/(2−η) < 1 such that if

ε ≤ min{ τ1 m_p(c), τ2 v_p(c)/(2(c − c*)) | 0 < τ1, τ2 < θ },   (5.17)

and |m̂_p(c) − m_p(c)| < ε, then we have

|1 − m_p(c) v̂_p(c)/(m̂_p(c) v_p(c))| < η < 1.   (5.18)

Proof: If |m̂_p(c) − m_p(c)| < ε ≤ τ1 m_p(c), then we have

|(m̂_p(c) − m_p(c))/m_p(c)| < τ1,

which implies

1 − τ1 < m̂_p(c)/m_p(c) < 1 + τ1.

The last inequality is equivalent to

1/(1 + τ1) < m_p(c)/m̂_p(c) < 1/(1 − τ1).   (5.19)

On the other hand, when |m̂_p(c) − m_p(c)| < ε ≤ τ2 v_p(c)/(2(c − c*)), by Lemma 5.3 we have

|(v̂_p(c) − v_p(c))/v_p(c)| < 2ε(c − c*)/v_p(c) < τ2,

which implies

1 − τ2 < v̂_p(c)/v_p(c) < 1 + τ2.   (5.20)

From (5.19) and (5.20), we get

(1 − θ)/(1 + θ) < (1 − τ2)/(1 + τ1) < m_p(c) v̂_p(c)/(m̂_p(c) v_p(c)) < (1 + τ2)/(1 − τ1) < (1 + θ)/(1 − θ).   (5.21)

Let

(1 − θ)/(1 + θ) > 1 − η, (1 + θ)/(1 − θ) < 1 + η.

By 0 < η < 1 we have

0 < η/(2 + η) < θ < η/(2 − η) < 1.   (5.22)

Since θ satisfies (5.22) and ε is defined by (5.17), when |m̂_p(c) − m_p(c)| < ε we have

1 − η < m_p(c) v̂_p(c)/(m̂_p(c) v_p(c)) < 1 + η,

which implies (5.18) and completes the proof.
On convergence of the one-dimensional inexact Newton method for solving a single-variable equation h(x) = 0, we have the following lemma:

Lemma 5.5 Let h : R → R be a continuously differentiable function, and let {x_k} be a sequence generated by the one-dimensional inexact Newton method for solving the equation h(x) = 0, that is,

x_{k+1} = x_k + s_k, where s_k satisfies h′(x_k) s_k = −h(x_k) + r_k.

Suppose the following conditions are satisfied:

a). There exists x* such that h(x*) = 0 and h′(x*) = 0.

b). The function h(x) is convex, and it has a continuous derivative on a neighborhood (x* − δ, x* + δ) for some δ > 0.

c). When x_t is in a neighborhood of x* and x_t ≠ x*, we have h′(x_t) ≠ 0, and

lim_{x_t→x*} h(x_t)/h′(x_t) = 0.   (5.23)

d). There exist 0 < η < 1 and a sequence {η_k : 0 < η_k < η} such that |r_k| < η_k h(x_k), k = 0, 1, 2, ....

Then the sequence {x_k} converges to x*.

Proof: From condition d), we get

|s_k| = |(−h(x_k) + r_k)/h′(x_k)|
     ≤ |h(x_k)/h′(x_k)| + |r_k/h′(x_k)|
     ≤ |h(x_k)/h′(x_k)| + |η_k h(x_k)/h′(x_k)|
     ≤ (1 + η) |h(x_k)/h′(x_k)|.   (5.24)

By Taylor's theorem, we get

h(x_{k+1}) = h(x_k) + h′(x_k) s_k + O(|s_k|²)
           = h(x_k) − (h(x_k) − r_k) + O(|h(x_k)/h′(x_k)|²)
           = r_k + O(|h(x_k)/h′(x_k)|²).

Taking absolute values on both sides of the above equation and recalling condition d), we have

|h(x_{k+1})| ≤ |r_k| + O(|h(x_k)/h′(x_k)|²)
            ≤ η_k |h(x_k)| + O(|h(x_k)/h′(x_k)|²)
            ≤ η |h(x_k)| + O(|h(x_k)/h′(x_k)|²).

Taking the limit as x_k → x* on both sides of the previous inequality, and using conditions a), c) and d), we get

lim_{x_k→x*} |h(x_{k+1})| = 0.

Thus, by the convexity of h(x) (condition b)), we know |x_{k+1} − x*| < γ|x_k − x*| with 0 < γ < 1. That is to say, the sequence {x_k} converges to x*.

Lemma 5.6
lim_{c→c*+} v_p(c)/m_p(c) = 0.

Proof: By the mean value theorem, we get

v_p(c) − v_p(c*) = v′_p(ξ)(c − c*), ξ ∈ (c*, c).   (5.25)

Noting that v_p(c*) = 0 and v′_p(ξ) = 2m_p(ξ), we have

lim_{c→c*+} v_p(c)/m_p(ξ) = lim_{c→c*+} 2(c − c*) = 0.   (5.26)

Since ξ ∈ (c*, c), ξ → c* as c → c*. Thus the assertion of Lemma 5.6 follows from (5.26) directly.
Theorem 5.7 Set the sequences ck and Hpck are generated by ILVEM algo-
rithm. Then the sequence ck converges to c∗p and the sequence Hpck converges
to H∗p = x ∈ Rn : F (x, a) = c∗p, respectively.
Proof: We first prove ck converges to the root of the variance equation vp(c) =0. We do this by verifying ILVEM algorithm satisfies the convergent conditionsof the one-dimensional inexact Newton method (Lemma 5.5).
At the kth iteration, the step size of the LVEM algorithm is λ_k = v_p(c_k)/(2m_p(c_k)) (which is essentially the Newton method for solving the equation v_p(c) = 0), and the step size of the ILVEM algorithm is λ̂_k = v̂_p(c_k)/(2m̂_p(c_k)), where v̂_p and m̂_p denote the sampling estimators of v_p and m_p (which is essentially the one-dimensional inexact Newton method for solving v_p(c) = 0). By comparing these two step sizes, we see that the residual of the ILVEM algorithm is

|r_k| = 2m_p(c_k)|λ_k - λ̂_k|. (5.27)
Thus

|r_k/v_p(c_k)| = |λ_k - λ̂_k| · 2m_p(c_k)/v_p(c_k)
              = |v_p(c_k)/(2m_p(c_k)) - v̂_p(c_k)/(2m̂_p(c_k))| · 2m_p(c_k)/v_p(c_k)
              = |1 - (m_p(c_k)/m̂_p(c_k)) · (v̂_p(c_k)/v_p(c_k))|.
On the other hand, the density function of the sample distribution g(x) = N(μ_k, σ_k^2) and the level value c_k are fixed at the kth iteration (they are given by the previous iteration), so Assertion 4.1 and Remark 4.1 are applicable. Therefore, for given 0 < η < 1 and η/(2 + η) < θ < η/(2 - η), if

ε_k ≤ min{τ_1 m_p(c_k), τ_2 v_p(c_k)/(2(c_k - c^*)) | 0 < τ_1, τ_2 < θ}, (5.28)

and the sample size is N = [(-ln δ/(2ε_k^2))n] + 1, then by Remark 4.1 we have

Pr{|m̂_p(c_k) - m_p(c_k)| < ε_k} = 1 - δ.
Taking the limit as δ → 0, we have

Pr{ lim_{N→∞} |m̂_p(c_k) - m_p(c_k)| < ε_k } = 1. (5.29)
Thus, by Lemma 5.4, the following inequality holds with probability one:

|r_k/v_p(c_k)| = |1 - (m_p(c_k)/m̂_p(c_k)) · (v̂_p(c_k)/v_p(c_k))| < η < 1. (5.30)
Inequality (5.30) shows that the ILVEM algorithm satisfies the convergence condition d) of the one-dimensional inexact Newton method (Lemma 5.5). Thus, by Lemma 5.6 and using Lemma 5.5 again, the sequence {c_k} generated by the ILVEM algorithm converges to the root of the variance equation v_p(c) = 0.
In addition, we know that c^*_p = min_{x ∈ R^n} F(x, a) is the largest root of the variance equation v_p(c) = 0. Therefore, it remains only to prove that c_k ≥ c^*_p (k = 0, 1, 2, ...). We do this by mathematical induction.
It is obvious that c_0 = F(x_0, a) ≥ c^*_p. Assume c_k ≥ c^*_p. If c_k = c^*_p, then λ_k = 0 and the process stops at step 3. If c_k > c^*_p, then by the convexity of v_p(c) (see the proof of Lemma 5.3) and v_p(c^*_p) = 0, we have

-v_p(c_k) = v_p(c^*_p) - v_p(c_k) ≥ 2m_p(c_k)(c^*_p - c_k),

from which it follows that

c_{k+1} = c_k - λ_k = c_k - v_p(c_k)/(2m_p(c_k)) ≥ c^*_p.

By mathematical induction, we have

c_k ≥ c^*_p, k = 0, 1, 2, ....
Summarizing, {c_k} converges to c^*_p. When c_k converges to c^*_p, by Property 2 in [4] we immediately have that H^p_{c_k} converges to H^*_p.
Theorem 3.2 shows that if a > 0 is large enough, then c^*_p = c^* and H^*_p = H^*. Thus, by Theorem 5.7, the sequences {c_k} and {H^p_{c_k}} generated by the ILVEM algorithm indeed converge to c^* and H^*, respectively.
Remark 5.3. By Lemma 5.4, the controlling convergence condition d) of Lemma 5.5, as used in the proof of Theorem 5.7, holds if and only if the estimator m̂(c_k) satisfies the precision condition (5.17); see (5.28), (5.29) and (5.30). On the other hand, in step 2 of the ILVEM algorithm the sample size N depends on ε_k, and the parameter ε_k depends on c_k; see (5.28). But in (5.28) the values of the functions m(c_k), v(c_k) and c^* are all unknown, and at the kth iteration c_k itself is not yet available. Thus, in practice, a suitable alternative to (5.28) is the following:

ε_k ≤ min{τ_1 m̂_p(c_{k-1}), τ_2 v̂_p(c_{k-1})/(2(c_{k-1} - c̄)) | 0 < τ_1, τ_2 < θ}, (5.31)

where

c̄ = min{ψ(X_i^{k-1}), i = 1, 2, ..., N, X_i^{k-1} ∈ X^{k-1}}. (5.32)

By substituting (5.31)-(5.32) into N = [(-ln δ/(2ε_k^2))n] + 1 in step 2 of the ILVEM algorithm, we obtain the sample size N at the kth iteration.
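For intuition, one pass of the level-value iteration can be sketched in a few lines. The definitions used here, m_p(c) = E[(c - f(X))_+] and v_p(c) = E[((c - f(X))_+)^2], are assumptions chosen only to match the properties this section relies on (v_p'(c) = 2m_p(c), and the largest root of v_p(c) = 0 equals the global minimum value); plain Monte Carlo with a fixed sampling density stands in for the importance-sampling and cross-entropy updates of the full ILVEM algorithm, and a fixed N replaces the adaptive sample-size rule (5.31)-(5.32).

```python
# Hypothetical sketch of the Newton iteration on the variance equation
# v_p(c) = 0.  The moment definitions below are assumptions that match
# the properties used in this section; this is NOT the authors' ILVEM code.
import random

def level_value_iteration(f, sampler, c0, n_samples=20000, tol=1e-8, max_iter=100):
    c = c0
    for _ in range(max_iter):
        xs = [sampler() for _ in range(n_samples)]
        devs = [max(c - f(x), 0.0) for x in xs]        # (c - f(X))_+
        m_hat = sum(devs) / n_samples                  # estimate of m_p(c)
        v_hat = sum(d * d for d in devs) / n_samples   # estimate of v_p(c)
        if m_hat == 0.0:                               # no sample below level c
            break
        lam = v_hat / (2.0 * m_hat)                    # (inexact) Newton step
        if lam < tol:
            break
        c -= lam
    return c

# Example: f(x) = x^2 sampled uniformly on [-1, 1]; global minimum value 0.
random.seed(1)
c_star = level_value_iteration(lambda x: x * x,
                               lambda: random.uniform(-1.0, 1.0), c0=1.0)
```

On this toy problem the level value decreases monotonically toward the global minimum value 0 while staying above it, mirroring the property c_k ≥ c^*_p established in the proof of Theorem 5.7.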
6 Numerical results
We choose some classical global optimization problems to test the performance of the ILVEM algorithm. They are typically multi-extremal, multidimensional and difficult to optimize by other methods (especially local search methods). These benchmarks are chosen from [25, 27, 28]. We classify them into three classes:
Class A: Two-dimensional, multimodal or difficult to optimize functions.
Class B: Multidimensional, multimodal with huge number of local extremes.
Class C: Constrained problems.
We run the ILVEM algorithm on these problems on a notebook personal computer (CPU 1.8 GHz, 256 MB memory; Matlab 6.5). The parameters are set as follows: α = 0.8, β = 0.80, q = 6. We compare the numerical results with those of the integral level-set method (abbreviated ILSM) and of the cross-entropy method (abbreviated CE). We list the problems below and state the numerical results in Tables 6.1 to 6.3, respectively.
Remark 6.1. In our ILVEM algorithm, the accelerated Newton-step method introduced in [20] is used.
Exam 1. [A, 29] Three-hump camel back function: In the search domain x_1, x_2 ∈ [-5, 5] this function is defined as follows and has f_min(0, 0) = 0.

f(x) = 2x_1^2 - 1.05x_1^4 + x_1^6/6 + x_1x_2 + x_2^2.
Exam 2. [A, 29] Six-hump camel back function is a global optimization test function. Within the bounded region it has six local minima, two of which are global. The test area is usually restricted to the rectangle x_1 ∈ [-3, 3], x_2 ∈ [-2, 2]. The two global minima equal f(x^*) = -1.0316, at x^* = (0.0898, -0.7126) and (-0.0898, 0.7126).

f(x) = (4 - 2.1x_1^2 + x_1^4/3)x_1^2 + x_1x_2 + (-4 + 4x_2^2)x_2^2.
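The stated minima of Exams 1-2 can be checked by direct evaluation (our own illustrative code, not the paper's test harness):

```python
# Hedged sketch: evaluate Exams 1-2 at their reported minimizers.
def three_hump_camel(x1, x2):
    return 2 * x1**2 - 1.05 * x1**4 + x1**6 / 6 + x1 * x2 + x2**2

def six_hump_camel(x1, x2):
    return ((4 - 2.1 * x1**2 + x1**4 / 3) * x1**2
            + x1 * x2 + (-4 + 4 * x2**2) * x2**2)

f1 = three_hump_camel(0.0, 0.0)       # reported minimum: 0
f2 = six_hump_camel(0.0898, -0.7126)  # reported minimum: -1.0316
```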
[Fig 1. Three humps camel back function]

[Fig 2. Six humps camel function]
Exam 3. [A, 30] Modified Schaffer function 4: In the search domain x_i ∈ [-100, 100], i = 1, 2, f_min(0, 1.253132) = 0.292579.

f(x) = 0.5 + (cos^2[sin(|x_1^2 - x_2^2|)] - 0.5) / [1 + 0.001(x_1^2 + x_2^2)]^2.
Exam 4. [A, 30] Goldstein-Price function is a global optimization test function. The test area is usually restricted to the square x_i ∈ [-2, 2], i = 1, 2. Its global minimum equals f(x^*) = 3 at x^* = (0, -1).

f(x) = [1 + (x_1 + x_2 + 1)^2 (19 - 14x_1 + 3x_1^2 - 14x_2 + 6x_1x_2 + 3x_2^2)]
     × [30 + (2x_1 - 3x_2)^2 (18 - 32x_1 + 12x_1^2 + 48x_2 - 36x_1x_2 + 27x_2^2)].
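Likewise, the stated minima of Exams 3-4 can be confirmed by direct evaluation (our own illustrative sketch):

```python
# Hedged sketch: evaluate Exams 3-4 at their reported minimizers.
import math

def schaffer4(x1, x2):
    num = math.cos(math.sin(abs(x1**2 - x2**2)))**2 - 0.5
    den = (1 + 0.001 * (x1**2 + x2**2))**2
    return 0.5 + num / den

def goldstein_price(x1, x2):
    a = 1 + (x1 + x2 + 1)**2 * (19 - 14*x1 + 3*x1**2 - 14*x2 + 6*x1*x2 + 3*x2**2)
    b = 30 + (2*x1 - 3*x2)**2 * (18 - 32*x1 + 12*x1**2 + 48*x2 - 36*x1*x2 + 27*x2**2)
    return a * b

f3 = schaffer4(0.0, 1.253132)    # reported minimum: 0.292579
f4 = goldstein_price(0.0, -1.0)  # reported minimum: 3
```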
[Fig 3. Modified Schaffer (4) function]

[Fig 4. Goldstein-Price function]
Table 6.1. Numerical Results on Class A

prob.        ILVEM                     ILSM                      CE
id     iter.  eval.  error      iter.  eval.  error      iter.  eval.  error
1       10    2020   1.08e-6     12    1278   1.03e-5     14    2800   4.02e-6
2       12    2424   1.02e-6     13     944   2.87e-5     37    7400   1.23e-5
3       11    2222   6.22e-5      7   10353   1.24e-5    498   99600   3.68e-7
4       18    3636   2.16e-5     13     980   3.80e-5     39    7800   3.43e-5
Remark 6.2. 1. In all tables, iter. denotes the number of iterations, eval. denotes the number of function evaluations, and error denotes the absolute error, measured by |c^*_ε - c^*|, where c^* is the known global minimum value.
2. In all of the test processes the stopping conditions are the same, stated as follows: in ILVEM we choose ε = 1.0e-6 and stop when λ < ε; in ILSM they choose ε = 1.0e-10 and stop when V < ε; and in CE they choose ε = 1.0e-3 and stop when max_i σ_i < ε.
3. In most cases the ILSM algorithm has an advantage in the numbers of iterations and evaluations, but it spends a lot of CPU time finding the approximating level set, so it is at a disadvantage in CPU time.
Exam 5. [B, 27] The Trigonometric function: η = 7, μ = 1, x^*_i = 0.9.

f(x) = Σ_{i=1}^n [8 sin^2(η(x_i - x^*_i)^2) + 6 sin^2(2η(x_i - x^*_i)^2) + μ(x_i - x^*_i)^2]
Exam 6. [B, 29] Ackley's function is a widely used multimodal test function. It is recommended to set a = 20, b = 0.2, c = 2π; the test area is usually restricted to the hypercube x_i ∈ (-32.78, 32.78), i = 1, 2, ..., n. f^*(x^* = 0) = 0.

f(x) = -a exp(-b sqrt((1/n) Σ_{i=1}^n x_i^2)) - exp((1/n) Σ_{i=1}^n cos(cx_i)) + a + exp(1).
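A direct implementation (our own sketch) confirms that f(0) = 0, since -a·exp(0) - exp(1) + a + exp(1) = 0:

```python
# Hedged sketch: Ackley's function (Exam 6) with the recommended parameters.
import math

def ackley(x, a=20.0, b=0.2, c=2 * math.pi):
    n = len(x)
    s1 = sum(xi * xi for xi in x) / n
    s2 = sum(math.cos(c * xi) for xi in x) / n
    return -a * math.exp(-b * math.sqrt(s1)) - math.exp(s2) + a + math.e

f_at_zero = ackley([0.0] * 10)   # expected: 0
```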
[Fig 5. Trigonometric function with n = 2]

[Fig 6. Ackley's function with n = 2]
Exam 7. [B, 30] The Shekel-5 function: Its search domain is -32.195 ≤ x_i ≤ 32.195, i = 1, 2, ..., n.

f(x) = 20 + e - 20 exp(-0.2 sqrt((1/n) Σ_{i=1}^n x_i^2)) - exp((1/n) Σ_{i=1}^n cos(πx_i))
Exam 8. [B, 30] Rastrigin's function is based on the function of De Jong with the addition of cosine modulation in order to produce frequent local minima; thus, the test function is highly multimodal. However, the locations of the minima are regularly distributed. The test area is usually restricted to the hypercube x_i ∈ [-5.12, 5.12], i = 1, 2, ..., n. f^*(x^* = 0) = 0.

f(x) = 10n + Σ_{i=1}^n [x_i^2 - 10 cos(2πx_i)].
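At x = 0 each summand is 0 - 10·cos(0) = -10, so f(0) = 10n - 10n = 0; a one-line implementation (our own sketch) confirms this:

```python
# Hedged sketch: Rastrigin's function (Exam 8).
import math

def rastrigin(x):
    return 10 * len(x) + sum(xi * xi - 10 * math.cos(2 * math.pi * xi) for xi in x)

f_at_zero = rastrigin([0.0] * 10)   # expected: 0
```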
[Fig 8. Rastrigin's function with n = 2]
Table 6.2. Numerical Results on Class B

prob.           ILVEM                      ILSM                       CE
id    n    iter.  eval.   error      iter.   eval.   error     iter.  eval.   error
5     5     21    21084   9.00e-3     26    119253   6.02e-5    178   89400   9.23e-4
6    10     31     7874   3.16e-3     70      3877   9.92e-5    283   28300   1.87e-3
7    10     37    12360   1.05e-4      6      4802   1.87e-4    260   26000   1.70e-3
8    10     43     4429   2.0e-10      7       974   3.59e-7     77    7700   1.8e-10
Remark 6.3. In this class, when the stopping rule is λ_k < 1.0e-6, almost all samples in the ILVEM algorithm lie in the current level set by the time the process stops.
Exam 9. [C, 27] Constrained problem (1): The bounds: U = (10, 10), L = (0, 0).

max_x f(x) = sin^3(2πx_1) sin(2πx_2) / (x_1^3(x_1 + x_2))
s.t. g_1(x) = x_1^2 - x_2 + 1 ≤ 0,
     g_2(x) = 1 - x_1 + (x_2 - 4)^2 ≤ 0.
Exam 10. [C, 27] Constrained problem (2): The theoretical solution is x^* = (1, 1), f(x^*) = 1.

min f(x) = (x_1 - 2)^2 + (x_2 - 1)^2
s.t. x_1 + x_2 - 2 ≤ 0,
     x_1^2 - x_2 ≤ 0.
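The reported solution of Exam 10 can be checked directly (our own illustrative code, assuming the problem data as stated): at x^* = (1, 1) both constraints hold with equality and f = 1.

```python
# Hedged sketch: verify feasibility and the objective value of Exam 10
# at the reported solution x* = (1, 1).
def f(x1, x2):
    return (x1 - 2)**2 + (x2 - 1)**2

def feasible(x1, x2):
    return x1 + x2 - 2 <= 0 and x1**2 - x2 <= 0

ok = feasible(1.0, 1.0)   # both constraints are active at x*
val = f(1.0, 1.0)         # expected: 1
```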
Exam 11. [C, 27] Constrained problem (3): The bounds: U = (10, 10, ..., 10), L = (-10, -10, ..., -10).

min_x f(x) = (x_1 - 10)^2 + 5(x_2 - 12)^2 + x_3^4 + 3(x_4 - 11)^2 + 10x_5^6 + 7x_6^2 + x_7^4 - 4x_6x_7 - 10x_6 - 8x_7
s.t. g_1(x) = 2x_1^2 + 3x_2^4 + x_3 + 4x_4^2 + 5x_5 - 127 ≤ 0,
     g_2(x) = 7x_1 + 3x_2 + 10x_3^2 + x_4 - x_5 - 282 ≤ 0,
     g_3(x) = 23x_1 + x_2^2 + 6x_6^2 - 8x_7 - 196 ≤ 0,
     g_4(x) = 4x_1^2 + x_2^2 - 3x_1x_2 + 2x_3^2 + 5x_6 - 11x_7 ≤ 0.
Exam 12. [C, 27] Constrained problem (4): Its known solution is x^* = (4.75, 5.0, -5.0), f(x^*) = -14.75.

min f(x) = -x_1 - x_2 + x_3
s.t. sin(4πx_1) - 2 sin^2(2πx_2) - 2 sin^2(2πx_3) ≥ 0,
     -5 ≤ x_j ≤ 5, j = 1, 2, 3.
Table 6.3. Numerical Results on Class C

prob.           ILVEM                      ILSM                       CE
id    n    iter.  eval.   error      iter.   eval.   error     iter.  eval.  error
9     2     51     2193   4.21e-7     11      2640   1.27e-5     33    6600   7.22e-8
10    2     36     1548   4.39e-3     15      1646   3.95e-1     39    3900   6.17e-4
11    7     40    28120   9.25e-2     55    121735   8.83e-1    200   5.6e5   1.51e-2
12    3     26     7878   1.32e-4     30      7299   6.56e-4    fail     –       –
Remark 6.4: 1. CE cannot stop regularly when solving Exam 11; it is broken off forcibly after 200 iterations.

2. In class C, the ILSM algorithm has almost no advantage in comparison with the ILVEM algorithm.
7 Further discussions and conclusions
In this paper, we propose a new method for solving continuous global optimization problems, namely the level-value estimation method (LVEM). Since the implementable algorithm updates the sample distribution based on the main idea of the cross-entropy method, the ILVEM algorithm automatically updates the "importance region" in which the "better" solutions lie. This overcomes the drawback that the level set is difficult to determine, which is the main difficulty in the integral level-set method [3]. Numerical experiments show the applicability, efficiency and good accuracy of the ILVEM algorithm.
However, the ILVEM algorithm may be inefficient on some particular problems, for example, the fifth function of De Jong:

min f(x) = {0.002 + Σ_{i=-2}^{2} Σ_{j=-2}^{2} [5(i + 2) + j + 3 + (x_1 - 16j)^6 + (x_2 - 16i)^6]^{-1}}^{-1},
s.t. x_1, x_2 ∈ [-65.536, 65.536].
Its figure is as follows:
[Figure: the fifth function of De Jong]

This function is hard to minimize; the ILVEM, ILSM and CE algorithms all fail in dealing with it.
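A direct implementation of the double-sum formula above (our own sketch, in the common Shekel's-foxholes form) confirms the usual behaviour of this benchmark: near the foxhole at (-32, -32) the function value is close to its known minimum of about 0.998.

```python
# Hedged sketch of the fifth function of De Jong, following the
# double-sum formula given in the text.
def de_jong5(x1, x2):
    s = 0.002
    for i in range(-2, 3):
        for j in range(-2, 3):
            s += 1.0 / (5 * (i + 2) + j + 3 + (x1 - 16 * j)**6 + (x2 - 16 * i)**6)
    return 1.0 / s

f_min = de_jong5(-32.0, -32.0)   # close to the known minimum of about 0.998
```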
When the number of global optima is more than one, the ILVEM algorithm can generally find one of them; see the result on Exam 2 (the six-hump camel function). However, it fails on the fifth function of De Jong. This difficult case may be improved by using a multi-kernel density function for the sample distribution; for example, Rubinstein [9] proposed a coordinate-normal density with two kernels. This conjecture needs verification through more numerical experiments in our further work.
It is possible to use a full covariance matrix for better approximation, but in the high-dimensional case drawing random samples with a full covariance matrix is, so far, expensive.
Acknowledgement: We would like to thank Professor A. Hoffmann for his help in improving the quality of the final version of this paper, and we are also grateful to the anonymous reviewers for their detailed comments. This work is supported by the Natural Sciences Foundation of China (No. 10671117).
References
[1] Xiaoli Wang and Guobiao Zhou, A new filled function for unconstrained global optimization, Applied Mathematics and Computation, 174 (2006) 419-429.

[2] P.M. Pardalos et al., Recent developments and trends in global optimization, J. Comp. and Appl. Math., 124 (2000) 209-228.

[3] Chew Soo Hong and Zheng Quan, Integral Global Optimization (Theory, Implementation and Applications), Berlin, Heidelberg: Springer-Verlag, 1988, 88-132.

[4] Tian WeiWen, Wu DongHua, et al., Modified Integral-Level Set Method for the Constrained Global Optimization, Applied Mathematics and Mechanics, 25(2) (2004) 202-210.

[5] Wu DongHua, Tian WeiWen, et al., An Algorithm of Modified Integral-Level Set for Solving Global Optimization, Acta Mathematicae Applicatae Sinica (in Chinese), 24(1) (2001) 100-110.
[6] S.M. Ross, Simulation, 3rd Edition, Beijing: Posts and Telecom Press, 2006, 167-179.

[7] S. Asmussen, D.P. Kroese and R.Y. Rubinstein, Heavy Tails, Importance Sampling and Cross-Entropy, Stochastic Models, 21-1 (2005).

[8] P-T. De Boer, D.P. Kroese, S. Mannor and R.Y. Rubinstein, A Tutorial on the Cross-Entropy Method, Annals of Operations Research, 134 (2005) 19-67.

[9] D.P. Kroese, R.Y. Rubinstein and S. Porotsky, The Cross-Entropy Method for Continuous Multi-extremal Optimization. On the web site of the Cross-Entropy Method.

[10] R.Y. Rubinstein, The cross-entropy method for combinatorial and continuous optimization, Methodology and Computing in Applied Probability, 2 (1999) 127-190.

[11] R.S. Dembo, S.C. Eisenstat and T. Steihaug, Inexact Newton methods, SIAM J. Numer. Anal., 19 (1982) 400-408.

[12] Wang Tao, Global Optimization for Constrained Nonlinear Programming, Ph.D. Thesis, University of Illinois at Urbana-Champaign, 2001.

[13] Kar-Ann Toh, Global Optimization by Monotonic Transformation, Computational Optimization and Applications, 23 (2002) 77-99.

[14] R. Chelouah and P. Siarry, Tabu Search Applied to Global Optimization, European Journal of Operational Research, 123 (2000) 256-270.

[15] D. Bunnag, Stochastic Algorithms for Global Optimization, Ph.D. Thesis, The University of Alabama, 2004.

[16] J. Nocedal and S.J. Wright, Numerical Optimization, Springer-Verlag, New York, 1999.

[17] H.X. Phu and A. Hoffmann, Essential supremum and supremum of summable functions, Numerical Functional Analysis and Optimization, 17(1/2) (1996) 167-180.

[18] D.H. Wu, W.Y. Yu and W.W. Tian, A Level-Value Estimation Method for Solving Global Optimization, Applied Mathematics and Mechanics, 27(7) (2006) 1001-1010.

[19] Z. Peng, D.H. Wu and W.W. Tian, A Level-Value Estimation Method for Solving Constrained Global Optimization, Mathematica Numerica Sinica, 29(3) (2007) 293-304.

[20] J. Hichert, A. Hoffmann and H.X. Phu, Convergence speed of an integral method for computing the essential supremum, in: I.M. Bomze, T. Csendes, R. Horst and P.M. Pardalos (eds.), Developments in Global Optimization, pages 153-170, Kluwer Academic Publishers, Dordrecht, Boston, London, 1997.

[21] J. Hichert, Methoden zur Bestimmung des wesentlichen Supremums mit Anwendung in der globalen Optimierung, Thesis, TU Ilmenau, 1999, N 231 (also published by Shaker Verlag, Aachen, 2000, ISBN 3-8265-8206-3).

[22] J. Hichert, A. Hoffmann, H.X. Phu and R. Reinhardt, A primal-dual integral method in global optimization, Discussiones Mathematicae - Differential Inclusions, Control and Optimization, 20(2) (2000) 257-278.

[23] A. Chinchuluun, E. Rentsen and P.M. Pardalos, Global Minimization Algorithms for Concave Quadratic Programming Problems, Optimization, Vol. 24, No. 6 (Dec. 2005) 627-639.
[24] P.M. Pardalos and E. Romeijn (Eds.), Handbook of Global Optimization - Volume 2: Heuristic Approaches, Kluwer Academic Publishers, 2002.

[25] C.A. Floudas, P.M. Pardalos, C.S. Adjiman, et al., Handbook of Test Problems for Local and Global Optimization, Kluwer Academic Publishers, 1999.

[26] C.P. Robert and G. Casella, Monte Carlo Statistical Methods (2nd Edition), Springer Science+Business Media, Inc., 2004.

[27] M. Molga and C. Smutnicki, Test functions for optimization needs, www.zsd.ict.pwr.wroc.pl/files/docs/functions.pdf

[28] S.K. Mishra, Some New Test Functions for Global Optimization and Performance of Repulsive Particle Swarm Method (August 23, 2006). Available at SSRN: http://ssrn.com/abstract=926132

[29] G.L. Gong and M.P. Qian, A Course in Applied Stochastic Processes - and Stochastic Models in Algorithms and Intelligent Computing, Beijing: Tsinghua University Press, 2004, 186-188.

[30] J.Q. Hu, M.C. Fu and S.I. Marcus, A Model Reference Adaptive Search Method for Global Optimization, Operations Research, 55(3) (2007) 549-568.