Problem Understanding through Landscape Theory

Francisco Chicano, Gabriel Luque and Enrique Alba

Presentation at the Workshop on Problem Understanding and Real-World Optimization (RWO) of the GECCO 2013 conference (Amsterdam), July 2013. 26 slides.

Page 1: Problem Understanding through Landscape Theory

Outline: Landscape Theory · Relevant Results · Software Tools · Conclusions & Future Work

Page 2: Problem Understanding through Landscape Theory


•  A landscape is a triple (X,N, f) where

Ø  X is the solution space

Ø  N is the neighbourhood operator

Ø  f is the objective function

Landscape Definition

The pair (X, N) is called the configuration space.

[Figure: example configuration space with ten solutions s0–s9 connected by the neighbourhood relation, each labelled with its fitness value]

•  The neighbourhood operator is a function N: X → P(X)

•  Solution y is a neighbour of x if y ∈ N(x)

•  Regular and symmetric neighbourhoods:

Ø  Regular: d = |N(x)| ∀ x ∈ X

Ø  Symmetric: y ∈ N(x) ⇔ x ∈ N(y)

•  The objective function is f: X → ℝ (or ℕ, ℤ, ℚ)
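As a concrete illustration (our own code, not from the slides), the Onemax landscape under the bit-flip neighbourhood can be written down directly as a triple (X, N, f); all names below are our own choices:

```python
from itertools import product

n = 4

# X: the solution space, here all binary strings of length n
X = list(product([0, 1], repeat=n))

# N: the bit-flip neighbourhood operator; N(x) returns all strings at Hamming distance 1
def N(x):
    return [x[:i] + (1 - x[i],) + x[i + 1:] for i in range(n)]

# f: the objective function, here Onemax (count of ones)
def f(x):
    return sum(x)

# This neighbourhood is regular (|N(x)| = d = n for all x) and symmetric
assert all(len(N(x)) == n for x in X)
assert all(x in N(y) for x in X for y in N(x))
print("regular and symmetric, d =", n)
```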

Page 3: Problem Understanding through Landscape Theory


Elementary Landscapes: Formal Definition

•  Graph Laplacian (depends only on the configuration space):

   Δ = A − D

   where A is the adjacency matrix of the configuration graph and D is the degree matrix.

•  An elementary function is an eigenvector of Δ (plus a constant): for some eigenvalue −λ,

   Δ (f − f̄) = −λ (f − f̄)

[Figure: example configuration space with solutions s0–s9]

Page 4: Problem Understanding through Landscape Theory


Elementary Landscapes: Characterization

•  An elementary landscape is a landscape for which

   avg_{y ∈ N(x)} f(y) = f(x) + (k/d) (f̄ − f(x))

   where f̄ =def (1/|X|) Σ_{y ∈ X} f(y) is the average fitness over the search space.

•  This is Grover’s wave equation: a linear relationship between the average fitness in the neighbourhood of x and f(x). The eigenvalue k and the average f̄ depend on the problem/instance, while d depends only on the configuration space.
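The wave equation can be verified numerically on a small instance. This sketch (our own, not from the slides) uses the known fact that Onemax under bit-flip is elementary with eigenvalue k = 2 and d = n:

```python
from itertools import product

# Numerical check of Grover's wave equation: Onemax under bit-flip is
# elementary with eigenvalue k = 2 and neighbourhood size d = n.
n = 4
d = n
space = list(product([0, 1], repeat=n))
f = lambda x: sum(x)                              # Onemax
f_bar = sum(f(x) for x in space) / len(space)     # average fitness = n/2
k = 2

for x in space:
    neigh_avg = sum(f(x[:i] + (1 - x[i],) + x[i + 1:]) for i in range(n)) / d
    # average of f over N(x) equals f(x) + (k/d) (f_bar - f(x))
    assert abs(neigh_avg - (f(x) + (k / d) * (f_bar - f(x)))) < 1e-12
print("wave equation verified for Onemax, k =", k)
```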

Page 5: Problem Understanding through Landscape Theory


Elementary Landscapes: Examples

Problem               Neighbourhood       d          k
Symmetric TSP         2-opt               n(n-3)/2   n-1
Symmetric TSP         swap two cities     n(n-1)/2   2(n-1)
Antisymmetric TSP     inversions          n(n-1)/2   n(n+1)/2
Antisymmetric TSP     swap two cities     n(n-1)/2   2n
Graph α-Coloring      recolor 1 vertex    (α-1)n     2α
Graph Matching        swap two elements   n(n-1)/2   2(n-1)
Graph Bipartitioning  Johnson graph       n²/4       2(n-1)
NAES                  bit-flip            n          4
Max Cut               bit-flip            n          4
Weight Partition      bit-flip            n          4

Page 6: Problem Understanding through Landscape Theory


•  What if the landscape is not elementary?

•  Any landscape can be written as the sum of elementary landscapes

•  There exists a set of elementary functions that form a basis of the function space (Fourier basis)

Landscape Decomposition

[Figure: a non-elementary function f is projected onto elementary functions e1, e2 of the Fourier basis; the projections ⟨e1, f⟩ and ⟨e2, f⟩ give the elementary components of f]
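On Bⁿ with the bit-flip neighbourhood, the Fourier basis is the Walsh basis ψ_w(x) = (−1)^(w·x), and the order-j component of f collects the terms with |w| = j. A minimal sketch of the decomposition (our own helper names, not from the slides):

```python
from itertools import product

# Decomposition of a pseudo-Boolean function into elementary components
# using the Walsh (Fourier) basis psi_w(x) = (-1)^(w . x).
n = 3
space = list(product([0, 1], repeat=n))

def psi(w, x):
    return (-1) ** sum(wi & xi for wi, xi in zip(w, x))

def elementary_components(f):
    # Walsh coefficients: a_w = (1/2^n) sum_x f(x) psi_w(x)
    a = {w: sum(f(x) * psi(w, x) for x in space) / 2 ** n for w in space}
    # f_[j](x) = sum over |w| = j of a_w psi_w(x)
    return [lambda x, j=j: sum(a[w] * psi(w, x) for w in space if sum(w) == j)
            for j in range(n + 1)]

f = lambda x: (x[0] ^ x[1]) + 2 * x[2]     # an arbitrary non-elementary example
comps = elementary_components(f)

# The components sum back to f at every point of the search space
for x in space:
    assert abs(sum(c(x) for c in comps) - f(x)) < 1e-12
print("f decomposes into", n + 1, "elementary components")
```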

Page 7: Problem Understanding through Landscape Theory


Landscape Decomposition: Examples

Problem               Neighbourhood        d          Components
General TSP           inversions           n(n-1)/2   2
General TSP           swap two cities      n(n-1)/2   2
Subset Sum Problem    bit-flip             n          2
MAX k-SAT             bit-flip             n          k
NK-landscapes         bit-flip             n          k+1
Radio Network Design  bit-flip             n          max. nb. of reachable antennae
Frequency Assignment  change 1 frequency   (α-1)n     2
QAP                   swap two elements    n(n-1)/2   3

Page 8: Problem Understanding through Landscape Theory


Autocorrelation

•  Let {x0, x1, ...} be a simple random walk on the configuration space, where xi+1 ∈ N(xi)

•  The random walk induces a time series {f(x0), f(x1), ...} on a landscape

•  The autocorrelation function is defined as:

   r(s) = ( ⟨f(x_t) f(x_{t+s})⟩ − ⟨f⟩² ) / ( ⟨f²⟩ − ⟨f⟩² )

•  The autocorrelation length ℓ and the autocorrelation coefficient ξ, with

   ℓ = Σ_{s=0}^∞ r(s)

•  Autocorrelation length conjecture: the number of local optima M in a search space is roughly

   M ≈ |X| / |X(x0, ℓ)|

   where X(x0, ℓ) is the set of solutions reached from x0 after ℓ moves.

[Figure: example configuration space with solutions s0–s9 and their fitness values]
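For an elementary landscape the autocorrelation function has the known closed form r(s) = (1 − k/d)^s. This sketch (our own) checks r(1) for Onemax exactly, by averaging over all edges of the configuration graph instead of sampling a walk:

```python
from itertools import product

# Exact one-step autocorrelation for Onemax under bit-flip: an elementary
# landscape with eigenvalue k satisfies r(s) = (1 - k/d)^s, so r(1) = 1 - 2/n.
n = 4
space = list(product([0, 1], repeat=n))
f = lambda x: sum(x)
fbar = sum(map(f, space)) / len(space)
var = sum((f(x) - fbar) ** 2 for x in space) / len(space)

def neighbours(x):
    return [x[:i] + (1 - x[i],) + x[i + 1:] for i in range(n)]

# r(1): correlation of f over all directed edges of the configuration graph
r1 = sum((f(x) - fbar) * (f(y) - fbar)
         for x in space for y in neighbours(x)) / (len(space) * n * var)
assert abs(r1 - (1 - 2 / n)) < 1e-12
print("r(1) =", r1)
```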

Page 9: Problem Understanding through Landscape Theory


Autocorrelation Length “Conjecture”

•  The higher the values of ℓ and ξ, the smaller the number of local optima

•  ℓ and ξ are thus measures of ruggedness

Ruggedness      SA (configuration 1)        SA (configuration 2)
                % rel. error  nb. of steps  % rel. error  nb. of steps
10 ≤ ζ < 20     0.2           50,500        0.1           101,395
20 ≤ ζ < 30     0.3           53,300        0.2           106,890
30 ≤ ζ < 40     0.3           58,700        0.2           118,760
40 ≤ ζ < 50     0.5           62,700        0.3           126,395
50 ≤ ζ < 60     0.7           66,100        0.4           133,055
60 ≤ ζ < 70     1.0           75,300        0.6           151,870
70 ≤ ζ < 80     1.3           76,800        1.0           155,230
80 ≤ ζ < 90     1.9           79,700        1.4           159,840
90 ≤ ζ < 100    2.0           82,400        1.8           165,610

Angel, Zissimopoulos. Theoretical Computer Science 263:159-172 (2001)

Page 10: Problem Understanding through Landscape Theory


Fitness-Distance Correlation: Definition

[Figure: scatter plot of fitness value vs. distance to the optimum]

Definition 2. Given a function f : Bⁿ → ℝ, the fitness-distance correlation for f is defined as

   r = Cov_fd / (σ_f σ_d),   (10)

where Cov_fd is the covariance of the fitness values and the distances of the solutions to their nearest global optimum, σ_f is the standard deviation of the fitness values in the search space and σ_d is the standard deviation of the distances to the nearest global optimum in the search space. Formally:

   Cov_fd = (1/2ⁿ) Σ_{x ∈ Bⁿ} (f(x) − f̄)(d(x) − d̄),

   f̄ = (1/2ⁿ) Σ_{x ∈ Bⁿ} f(x),   σ_f = √( (1/2ⁿ) Σ_{x ∈ Bⁿ} (f(x) − f̄)² ),

   d̄ = (1/2ⁿ) Σ_{x ∈ Bⁿ} d(x),   σ_d = √( (1/2ⁿ) Σ_{x ∈ Bⁿ} (d(x) − d̄)² ),   (11)

where the function d(x) is the Hamming distance between x and its nearest global optimum.

The FDC r is a value between −1 and 1. The lower the absolute value of r, the more difficult the optimization problem is supposed to be. The exact computation of the FDC using the previous definition requires the evaluation of the complete search space: the global optima must be determined to define d(x), and the statistics for d and f must be computed. If the objective function f is constant, the FDC is not well-defined, since σ_f = 0.

In the following we focus on the case in which there exists only one global optimum x* and we know the elementary landscape decomposition of f. The following lemma provides an expression for d̄ and σ_d in this case.

Lemma 1. Given an optimization problem defined over Bⁿ, if there is only one global optimum x*, then the distance function d(x) defined in Definition 2 is the Hamming distance between x and x*, and its average and standard deviation over the whole search space are

   d̄ = n/2,   σ_d = √n / 2.   (12)

Proof. Since there is only one global optimum, d(x) = H(x, x*). Given an integer 0 ≤ k ≤ n, the number of solutions at distance k from x* is C(n, k). Then we can compute the two first raw moments of d(x) over the search space as

   α₁ = d̄ = (1/2ⁿ) Σ_{k=0}^n C(n, k) k = n 2ⁿ⁻¹ / 2ⁿ = n/2,
   α₂ = (1/2ⁿ) Σ_{k=0}^n C(n, k) k² = n(n+1) 2ⁿ⁻² / 2ⁿ = n(n+1)/4,

which yield σ_d = √(α₂ − α₁²) = √( n(n+1)/4 − n²/4 ) = √(n/4) = √n / 2.   (13) ⊓⊔

Problems are considered difficult when |r| < 0.15 (Jones & Forrest).
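Definition 2 can be checked by exhaustive enumeration on a small Bⁿ. This sketch (our own code, not the authors') computes r for Onemax, whose fitness and distance to the unique optimum are perfectly anti-correlated:

```python
from itertools import product

def fdc(f, n):
    """Exact FDC over B^n per Definition 2 (maximization assumed)."""
    space = list(product([0, 1], repeat=n))
    fs = [f(x) for x in space]
    fmax = max(fs)
    optima = [x for x, fx in zip(space, fs) if fx == fmax]
    # d(x): Hamming distance to the nearest global optimum
    d = [min(sum(a != b for a, b in zip(x, o)) for o in optima) for x in space]
    N = len(space)
    fbar, dbar = sum(fs) / N, sum(d) / N
    cov = sum((fi - fbar) * (di - dbar) for fi, di in zip(fs, d)) / N
    sf = (sum((fi - fbar) ** 2 for fi in fs) / N) ** 0.5
    sd = (sum((di - dbar) ** 2 for di in d) / N) ** 0.5
    return cov / (sf * sd)

# Onemax: d(x) = n - f(x), so fitness and distance are perfectly anti-correlated
assert abs(fdc(sum, 4) + 1.0) < 1e-12
```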

Page 11: Problem Understanding through Landscape Theory


Fitness-Distance Correlation Formulas

•  Using the previous facts we get, for an elementary landscape of order p (with unique global optimum x*):

   r = −f_[1](x*) / (σ_f √n)   if p = 1,       r = 0   if p > 1

•  In general, for an arbitrary function

   f(x) = Σ_{p=0}^n f_[p](x) = f_[0](x) + f_[1](x) + f_[2](x) + … + f_[n](x)

   … the only component contributing to r is f_[1](x)

•  Rugged components are not considered by the FDC

Theorem 1. Let f be an objective function whose elementary landscape decomposition is f = Σ_{p=0}^n f_[p], where f_[0] is the constant function f_[0](x) = f̄ and each f_[p] with p > 0 is an order-p elementary function with zero offset. If there exists only one global optimum x* in the search space, the FDC can be exactly computed as

   r = −f_[1](x*) / (σ_f √n).   (14)

Proof. Let us expand the covariance as

   Cov_fd = (1/2ⁿ) Σ_{x ∈ Bⁿ} f(x) d(x) − f̄ d̄
          = (1/2ⁿ) Σ_{k=0}^n k Σ_{x: H(x,x*)=k} f(x) − f̄ (n/2)
          = (1/2ⁿ) Σ_{k=0}^n k Σ_{x: H(x,x*)=k} Σ_{p=0}^n f_[p](x) − f_[0] (n/2)
          = (1/2ⁿ) Σ_{k=0}^n k Σ_{p=0}^n K⁽ⁿ⁾_{k,p} f_[p](x*) − f_[0] (n/2)
          = Σ_{p=0}^n ( (1/2ⁿ) Σ_{k=0}^n k K⁽ⁿ⁾_{k,p} ) f_[p](x*) − f_[0] (n/2)
          = (n/2) f_[0] − (1/2) f_[1](x*) − f_[0] (n/2)
          = −(1/2) f_[1](x*),   (15)

where we used the result in Proposition 1. Substituting in (10) we obtain (14). ⊓⊔

The previous theorem shows that the only thing we need to know about the global optimum is the value of the first elementary component. With this information we can exactly compute the FDC. Some problems for which we know the elementary landscape decomposition based on the numerical data defining a problem instance are MAX-SAT and 0-1 Unconstrained Quadratic Optimization.

EvoCOP 2012, LNCS 7245: 111-123
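Theorem 1 can be verified against a brute-force computation on a linear function, whose zero-offset first component is f_[1](x) = Σᵢ aᵢ(xᵢ − 1/2). The coefficients below are arbitrary illustrative values, not from the slides:

```python
from itertools import product

# Check of Theorem 1 on an order-1 elementary (linear) function.
n, a = 4, [3.0, 1.0, 1.0, 1.0]
space = list(product([0, 1], repeat=n))
f = lambda x: sum(ai * xi for ai, xi in zip(a, x))

# Brute-force FDC; the unique optimum is the all-ones string, so d(x) = n - |x|
fs = [f(x) for x in space]
d = [n - sum(x) for x in space]
N = len(space)
fbar, dbar = sum(fs) / N, sum(d) / N
cov = sum((fi - fbar) * (di - dbar) for fi, di in zip(fs, d)) / N
sf = (sum((fi - fbar) ** 2 for fi in fs) / N) ** 0.5
sd = (sum((di - dbar) ** 2 for di in d) / N) ** 0.5
r_brute = cov / (sf * sd)

# Theorem 1: r = -f_[1](x*) / (sigma_f sqrt(n)), with x* = (1, ..., 1)
f1_at_opt = sum(ai * (1 - 0.5) for ai in a)
r_thm = -f1_at_opt / (sf * n ** 0.5)
assert abs(r_brute - r_thm) < 1e-12
print("r =", r_thm)
```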

Page 12: Problem Understanding through Landscape Theory


FDC: Implications for Linear Functions

•  Fitness-Distance Correlation for order-1 (linear) elementary landscapes (assuming maximization)

•  We can define a linear elementary landscape with any desired FDC ρ < 0

•  “Difficult” problems (|r| < 0.15) can be obtained starting at n = 45

The previous corollary states that only elementary landscapes of order p = 1 have a nonzero FDC. Furthermore, the FDC depends on the value of the objective function at the global optimum f(x*) and on the average value f̄, but not on the solution x* itself. We can also observe that if we are maximizing, then f(x*) > f̄ and the FDC is negative, while if we are minimizing, f(x*) < f̄ and the FDC is positive.

Interestingly, the order-1 elementary landscapes can always be written as linear functions, and they can be optimized in polynomial time. That is, if f is an order-1 elementary function then it can be written as

   f(x) = Σ_{i=1}^n a_i x_i + b,   (17)

where the a_i and b are real values. The following proposition provides the average and the standard deviation for this family of functions.

Proposition 2. Let f be an order-1 elementary function, written as (17). Then the average and the standard deviation of the function values over the whole search space are

   f̄ = b + (1/2) Σ_{i=1}^n a_i,   σ_f = (1/2) √( Σ_{i=1}^n a_i² ).   (18)

Proof. Using the linearity of the average we can write f̄ = Σ_{i=1}^n a_i x̄_i + b, and f̄ in (18) follows from the fact that x̄_i = 1/2. Now we can compute the variance of f as

   Var[f] = ⟨(f(x) − f̄)²⟩ = ⟨( Σ_{i=1}^n a_i x_i − (1/2) Σ_{i=1}^n a_i )²⟩ = ⟨( Σ_{i=1}^n a_i (x_i − 1/2) )²⟩
          = Σ_{i,j=1}^n a_i a_j ⟨(x_i − 1/2)(x_j − 1/2)⟩ = Σ_{i,j=1}^n a_i a_j ( ⟨x_i x_j⟩ − 1/4 )
          = Σ_{i,j=1}^n a_i a_j ( δᵢʲ/4 + 1/4 − 1/4 ) = (1/4) Σ_{i=1}^n a_i²,   (19)

where we used again x̄_i = x̄_j = 1/2 and ⟨x_i x_j⟩ = (δᵢʲ + 1)/4, δᵢʲ being the Kronecker delta. The expression for σ_f in (18) follows from (19). ⊓⊔

Using Proposition 2 we can compute the FDC for the order-1 elementary landscapes.

Proposition 3. Let f be an order-1 elementary function written as (17) such that all a_i ≠ 0. Then it has only one global optimum and its FDC (assuming maximization) is

   r = − Σ_{i=1}^n |a_i| / ( √n √( Σ_{i=1}^n a_i² ) ),   (20)

which is always in the interval −1 ≤ r < 0.

Proof. The global optimum x* has a 1 in all the positions i such that a_i > 0, and the maximum fitness value is

   f(x*) = b + Σ_{i: a_i > 0} a_i.   (21)

Using Proposition 2 we can write

   f̄ − f(x*) = ( b + (1/2) Σ_{i=1}^n a_i ) − ( b + Σ_{i: a_i > 0} a_i ) = −(1/2) Σ_{i=1}^n |a_i|.   (22)

Replacing the previous expression and σ_f in (16) we prove the claimed result. ⊓⊔

When all the values a_i are the same, the FDC computed with (20) is −1. This happens in particular for the Onemax problem. But if there exist different values for the a_i, then we can reach any arbitrary value in [−1, 0) for r. The following theorem provides a way to do it.

Theorem 2. Let ρ be an arbitrary real value in the interval [−1, 0). Then any linear function f(x) given by (17) where n > 1/ρ², a_2 = a_3 = … = a_n = 1 and

   a_1 = ( (n − 1) + n|ρ| √( (1 − ρ²)(n − 1) ) ) / ( nρ² − 1 )   (23)

has exactly FDC r = ρ.

Proof. The expression for a_1 is well-defined since nρ² > 1. Replacing all the a_i in (20) we get r = ρ. ⊓⊔

Theorem 2 provides a solid argument against the use of the FDC as a measure of the difficulty of a problem. In effect, we can always build an optimization problem based on a linear function, which can be solved in polynomial time, with an FDC as near to 0 (but not zero) as desired, that is, as “difficult” as desired according to the FDC. We have to highlight, however, that for a given FDC value ρ we need at least n > 1/ρ² variables; thus, an FDC nearer to 0 requires more variables.

The autocorrelation length ℓ has also been used as a measure of the difficulty of a problem. Chicano and Alba found a negative correlation between ℓ and the number of local optima in the 0-1 Unconstrained Quadratic Optimization problem (0-1 UQO), an NP-hard problem. Kinnear also studied the use of autocorrelation measures as problem difficulty, but the results were inconclusive.

Example: for ρ = −0.15 and n = 45, equation (23) gives a_1 = 7061.43, with a_2 = a_3 = … = a_45 = 1.
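Theorem 2 can be exercised directly; the sketch below (our own helper names) reproduces the deck's n = 45 example and checks the resulting FDC via the closed form (20):

```python
from math import sqrt

# Theorem 2 in action: build a polynomially-solvable linear function whose FDC
# is any target rho in [-1, 0).
def a1_for_fdc(rho, n):
    # eq. (23); requires n > 1/rho^2
    assert n * rho ** 2 > 1
    return ((n - 1) + n * abs(rho) * sqrt((1 - rho ** 2) * (n - 1))) / (n * rho ** 2 - 1)

def fdc_linear(a):
    # eq. (20): closed-form FDC of f(x) = sum_i a_i x_i + b (maximization, a_i != 0)
    n = len(a)
    return -sum(abs(ai) for ai in a) / (sqrt(n) * sqrt(sum(ai ** 2 for ai in a)))

rho, n = -0.15, 45
a = [a1_for_fdc(rho, n)] + [1.0] * (n - 1)
assert abs(fdc_linear(a) - rho) < 1e-9
print(round(a[0], 2))   # a_1 from the deck's example
```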

Page 13: Problem Understanding through Landscape Theory


Expectation for Bit-flip Mutation

•  The expected fitness after bit-flip mutation with probability p is

   E[f(M_p(x))] = Σ_{j=0}^n (1 − 2p)ʲ f_[j](x)   (1)

   where f_[j] is the elementary component of f with order j.

•  Example (j = 0, 1, 2, 3):  f = f_[0] + f_[1] + f_[2] + f_[3]

[Figure: expected fitness as a function of p for the example; p = 1/2 → start from scratch; p ≈ 0.23 → maximum expectation; the traditional p = 1/n could be around here]

Fitness Probability Distribution of Bit-Flip Mutation

Theorem 1 (expectation). Let x ∈ Bⁿ be a binary string, f : Bⁿ → ℝ a function, and M_p(x) the solution reached after applying the bit-flip mutation operator with probability p to solution x. Then

   E{f(M_p(x))} = Σ_{j=0}^n (1 − 2p)ʲ f_[j](x),   (35)

where f_[j] is the elementary component of f with order j.

Proof. We can write f as the sum of its elementary components, f = Σ_{j=0}^n f_[j]. Then we can compute the expected value as

   E{f(M_p(x))} = Σ_{j=0}^n E{f_[j](M_p(x))} = Σ_{j=0}^n (1 − 2p)ʲ f_[j](x),   (36)

where we used the result of Theorem 1.

Higher Order Moments. Equation (35) can also be extended to higher order moments, as in the following theorem.

Theorem 2. Let x ∈ Bⁿ be a binary string, f : Bⁿ → ℝ a function, and M_p(x) the solution reached after applying the bit-flip mutation operator with probability p to solution x. The m-th moment of the random variable f(M_p(x)) is

   μ_m{f(M_p(x))} = Σ_{j=0}^n (1 − 2p)ʲ fᵐ_[j](x),   (37)

where fᵐ_[j] is the order-j elementary component of fᵐ (this notation is used instead of (fᵐ)_[j] to simplify the expressions; fᵐ_[j] should not be confused with f_[j] to the power of m).

Proof. By definition, μ_m{f(M_p(x))} can be expressed as the expectation of the random variable fᵐ(M_p(x)). Then, using (35), we can write

   μ_m{f(M_p(x))} = E{fᵐ(M_p(x))} = Σ_{j=0}^n (1 − 2p)ʲ fᵐ_[j](x).

We define the 0-th moment μ_0{f(M_p(x))} = 1. We can observe from (37) that all the higher-order moments are polynomials in p, just like the expectation (the first moment).

Let μ{f(M_p(x))} denote the vector of moments, whose m-th component is the m-th moment. We do not limit the number of components of this vector (it can be considered infinite-dimensional); only a finite number of its elements will be required for our purposes. Define the matrix function F(x) by F_{m,j}(x) = fᵐ_[j](x) for 0 ≤ j ≤ n and m ≥ 0, and the vector Λ(p) by Λ_j(p) = (1 − 2p)ʲ for 0 ≤ j ≤ n. Using this notation, (37) can be written in vector form as

   ( μ_0{f(M_p(x))} )     ( 1           0           …   0           )   ( 1          )
   ( μ_1{f(M_p(x))} )     ( f_[0](x)    f_[1](x)    …   f_[n](x)    )   ( 1 − 2p     )
   (       ⋮        )  =  (    ⋮           ⋮        ⋱      ⋮        )   (    ⋮       )
   ( μ_m{f(M_p(x))} )     ( fᵐ_[0](x)   fᵐ_[1](x)   …   fᵐ_[n](x)   )   ( (1 − 2p)ⁿ  )
   (       ⋮        )     (    ⋮           ⋮        ⋱      ⋮        )

that is, μ{f(M_p(x))} = F(x) Λ(p).
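Equation (35) can be checked by exact enumeration on a small Bⁿ. This sketch (our own, with an arbitrary test function) compares the polynomial in (1 − 2p) against the expectation computed directly from the mutation distribution:

```python
from itertools import product

# Check of eq. (35): the expected fitness after bit-flip mutation is a
# polynomial in (1 - 2p) whose coefficients are the elementary components.
n = 3
space = list(product([0, 1], repeat=n))
psi = lambda w, x: (-1) ** sum(wi & xi for wi, xi in zip(w, x))

f = lambda x: 2 * x[0] + (x[1] ^ x[2])    # arbitrary test function

# Walsh coefficients and order-j elementary components of f
a = {w: sum(f(x) * psi(w, x) for x in space) / 2 ** n for w in space}
f_j = lambda j, x: sum(a[w] * psi(w, x) for w in space if sum(w) == j)

p, x = 0.3, (1, 0, 1)

# Exact E{f(M_p(x))}: each bit flips independently with probability p
def prob(z):
    q = 1.0
    for xi, zi in zip(x, z):
        q *= p if xi != zi else 1 - p
    return q

exact = sum(prob(z) * f(z) for z in space)
formula = sum((1 - 2 * p) ** j * f_j(j, x) for j in range(n + 1))
assert abs(exact - formula) < 1e-12
print("E[f(M_p(x))] =", formula)
```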

Page 14: Problem Understanding through Landscape Theory


Expectation for the Uniform Crossover

Proof. From (4) the Walsh coefficient $b_{w,\rho}(x,y)$ is:

$$
b_{w,\rho}(x,y) = \frac{1}{2^n}\sum_{z\in\mathbb{B}^n}\psi_w(z)\,\Pr\{U_\rho(x,y)=z\}
= \frac{1}{2^n}\sum_{z\in\mathbb{B}^n}\psi_w(z)\prod_{i=1}^{n}\Pr\{B_\rho(x_i,y_i)=z_i\}
$$
$$
= \frac{1}{2^n}\sum_{z\in\mathbb{B}^n}\left(\prod_{i=1}^{n}(-1)^{w_i z_i}\right)\prod_{i=1}^{n}\Pr\{B_\rho(x_i,y_i)=z_i\}
= \frac{1}{2^n}\prod_{i=1}^{n}\,\sum_{z_i\in\mathbb{B}}(-1)^{w_i z_i}\Pr\{B_\rho(x_i,y_i)=z_i\}. \qquad (26)
$$

For the inner sum we can write

$$
\sum_{z_i\in\mathbb{B}}(-1)^{w_i z_i}\Pr\{B_\rho(x_i,y_i)=z_i\}
= \Pr\{B_\rho(x_i,y_i)=0\} + (-1)^{w_i}\Pr\{B_\rho(x_i,y_i)=1\}
= 1 - 2\,\delta_{w_i}^{1}\,\Pr\{B_\rho(x_i,y_i)=1\}, \qquad (27)
$$

where we exploit the fact that we must get 0 or 1 in a bit after the crossover, that is:

$$\Pr\{B_\rho(x_i,y_i)=0\} + \Pr\{B_\rho(x_i,y_i)=1\} = 1.$$

Including this result in (26) we have

$$
b_{w,\rho}(x,y) = \frac{1}{2^n}\prod_{i=1}^{n}\left(1 - 2\,\delta_{w_i}^{1}\,\Pr\{B_\rho(x_i,y_i)=1\}\right)
= \frac{1}{2^n}\prod_{\substack{i=1\\ w_i=1}}^{n}\left(1 - 2\Pr\{B_\rho(x_i,y_i)=1\}\right). \qquad (28)
$$

Using the definition of $\Pr\{B_\rho(x_i,y_i)=z_i\}$ in (23):

$$
\Pr\{B_\rho(x_i,y_i)=1\} =
\begin{cases}
0 & \text{if } x_i = y_i = 0,\\
1 & \text{if } x_i = y_i = 1,\\
\rho & \text{if } x_i = 1 \text{ and } y_i = 0,\\
1-\rho & \text{if } x_i = 0 \text{ and } y_i = 1.
\end{cases} \qquad (29)
$$

And the factor inside (28) is

$$
1 - 2\Pr\{B_\rho(x_i,y_i)=1\} = (-1)^{y_i}\left(1 - 2\rho + 2\rho\,\delta_{y_i}^{x_i}\right). \qquad (30)
$$

We can develop (28) in the following way:

$$
b_{w,\rho}(x,y) = \frac{1}{2^n}\prod_{\substack{i=1\\ w_i=1}}^{n}(-1)^{y_i}\left(1 - 2\rho + 2\rho\,\delta_{y_i}^{x_i}\right)
= \frac{1}{2^n}\left(\prod_{\substack{i=1\\ w_i=1}}^{n}(-1)^{y_i}\right)\prod_{\substack{i=1\\ w_i=1}}^{n}\left(1 - 2\rho + 2\rho\,\delta_{y_i}^{x_i}\right)
= \frac{1}{2^n}\,\psi_w(y)\prod_{\substack{i=1\\ w_i=1}}^{n}\left(1 - 2\rho + 2\rho\,\delta_{y_i}^{x_i}\right). \qquad (31)
$$

The expression $\left(1 - 2\rho + 2\rho\,\delta_{y_i}^{x_i}\right)$ takes only two values: 1 if $y_i = x_i$, and $1-2\rho$ when $x_i \neq y_i$. A factor $1-2\rho$ is included in the product for all the positions $i$ in which $x_i \neq y_i$ and $w_i = 1$. Then the product in (31) becomes $(1-2\rho)^{|(x\oplus y)\wedge w|}$ and we obtain (25).

Now we are ready to present the main result of this work.

Theorem 1. Let $f$ be a pseudo-Boolean function defined over $\mathbb{B}^n$ and $a_w$ with $w\in\mathbb{B}^n$ its Walsh coefficients. The following identity holds for $E\{f(U_\rho(x,y))\}$:

$$
E\{f(U_\rho(x,y))\} = \sum_{r=0}^{n} A^{(r)}_{x,y}\,(1-2\rho)^{r}, \qquad (32)
$$

where the coefficients $A^{(r)}_{x,y}$ are defined in the following way:

$$
A^{(r)}_{x,y} = \sum_{\substack{w\in\mathbb{B}^n\\ |(x\oplus y)\wedge w| = r}} a_w\,\psi_w(y). \qquad (33)
$$

Proof. According to (22) and (25) we can write

$$
E\{f(U_\rho(x,y))\} = 2^n\sum_{w\in\mathbb{B}^n} a_w\, b_{w,\rho}(x,y)
= \sum_{w\in\mathbb{B}^n} a_w\,\psi_w(y)\,(1-2\rho)^{|(x\oplus y)\wedge w|}
= \sum_{r=0}^{n}(1-2\rho)^{r}\sum_{\substack{w\in\mathbb{B}^n\\ |(x\oplus y)\wedge w| = r}} a_w\,\psi_w(y), \qquad (34)
$$

and we get (32).

Note that the expression for the expected fitness after applying UX is a polynomial in $(1-2\rho)$. The degree of this polynomial depends on the Hamming distance between the parent solutions, $|x\oplus y|$, and the maximum order of the Walsh decomposition, $p_{\max}$. The degree of the polynomial will be the minimum of these two numbers, since $|(x\oplus y)\wedge w| \le |w|$ and $|(x\oplus y)\wedge w| \le |x\oplus y|$. This means that the maximum degree of the polynomial is $r_{\max} = \min(p_{\max}, |x\oplus y|)$.
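The identity (32)-(33) can be checked by brute force on a small instance. The sketch below is an illustration, not part of the paper: the toy function f and the parents x, y are assumptions chosen for the example. It computes the Walsh coefficients by enumeration, builds the coefficients A^(r) of (33), and compares the polynomial (32) against a direct enumeration of the offspring distribution defined by (29).

```python
import itertools

n = 4
def f(z):                        # toy pseudo-Boolean function (assumption)
    return sum(z) + 3 * z[0] * z[3]

B = list(itertools.product([0, 1], repeat=n))

def psi(w, z):                   # Walsh function psi_w(z) = (-1)^(w . z)
    return (-1) ** sum(wi * zi for wi, zi in zip(w, z))

# Walsh coefficients a_w = 2^-n * sum_z f(z) psi_w(z)
a = {w: sum(f(z) * psi(w, z) for z in B) / 2 ** n for w in B}

x, y = (0, 0, 1, 1), (1, 0, 1, 0)
d = tuple(xi ^ yi for xi, yi in zip(x, y))      # x XOR y

def A(r):                        # coefficients A^(r)_{x,y} of eq. (33)
    return sum(a[w] * psi(w, y) for w in B
               if sum(di & wi for di, wi in zip(d, w)) == r)

def expected_poly(rho):          # eq. (32): polynomial in (1 - 2 rho)
    return sum(A(r) * (1 - 2 * rho) ** r for r in range(n + 1))

def expected_direct(rho):        # expectation by enumerating offspring, per (29)
    total = 0.0
    for z in B:
        pr = 1.0
        for xi, yi, zi in zip(x, y, z):
            p1 = 0.0 if xi == yi == 0 else 1.0 if xi == yi == 1 \
                 else (rho if xi == 1 else 1 - rho)
            pr *= p1 if zi == 1 else 1 - p1
        total += f(z) * pr
    return total

for rho in (0.0, 0.25, 0.5, 1.0):
    assert abs(expected_poly(rho) - expected_direct(rho)) < 1e-9
```

For this toy function the polynomial has degree 2, which equals min(p_max, |x XOR y|) as noted above, and at rho = 0 the expectation collapses to f(y), matching b_{w,0} = psi_w(y) in (25).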

Proposition 4. Let $A^{(r)}_{x,y}$ be the polynomial coefficients for $f$ and $B^{(r)}_{x,y}$ the polynomial coefficients for $g$. Then, the polynomial coefficients for $h = f+g$ are $C^{(r)}_{x,y} = A^{(r)}_{x,y} + B^{(r)}_{x,y}$.

Proof. Let $a_w$ with $w\in\mathbb{B}^n$ be the Walsh coefficients of $f$ and $b_w$ the Walsh coefficients of $g$. Then, the Walsh coefficients of $h = f+g$ are $c_w = a_w + b_w$. Therefore:

$$
C^{(r)}_{x,y} = \sum_{\substack{w\in\mathbb{B}^n\\ |(x\oplus y)\wedge w| = r}} c_w\,\psi_w(y)
= \sum_{\substack{w\in\mathbb{B}^n\\ |(x\oplus y)\wedge w| = r}} a_w\,\psi_w(y) + \sum_{\substack{w\in\mathbb{B}^n\\ |(x\oplus y)\wedge w| = r}} b_w\,\psi_w(y)
= A^{(r)}_{x,y} + B^{(r)}_{x,y}.
$$

When UX is used in the literature, a common value for $\rho$ is $1/2$. In this case, the expression for the expected fitness value reduces to a single coefficient, as the following corollary proves.

Corollary 1. Let $f$ be a pseudo-Boolean function defined over $\mathbb{B}^n$ and $a_w$ with $w\in\mathbb{B}^n$ its Walsh coefficients.


$$
E\{f(C(x,y))\} = \sum_{z\in\mathbb{B}^n} f(z)\,\Pr\{C(x,y)=z\}
= \sum_{z\in\mathbb{B}^n}\left(\sum_{w\in\mathbb{B}^n} a_w\,\psi_w(z)\right)\Pr\{C(x,y)=z\}
$$
$$
= \sum_{w\in\mathbb{B}^n} a_w\left(\sum_{z\in\mathbb{B}^n}\psi_w(z)\,\Pr\{C(x,y)=z\}\right)
= \sum_{w\in\mathbb{B}^n} a_w\, b_w(x,y)
$$

$$
b_{w,\rho}(x,y) = \psi_w(y)\,(1-2\rho)^{|(x\oplus y)\wedge w|} \qquad (1)
$$

$$
E\{f(U_{1/2}(x,y))\} = A^{(0)}_{x,y} = \sum_{\substack{w\in\mathbb{B}^n\\ |(x\oplus y)\wedge w| = 0}} a_w\,\psi_w(y). \qquad (2)
$$
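The case rho = 1/2 in (2) also has a direct combinatorial reading that can be verified by enumeration: bits where the parents agree are inherited unchanged, and bits where they differ are uniform, so the expectation is the average of f over the subcube spanned by the differing positions. A minimal sketch (the function and the parents are illustrative assumptions):

```python
import itertools

def expected_ux_half(f, x, y):
    """E{f(U_{1/2}(x, y))}: average of f over the subcube where the
    parents disagree; agreeing bits are inherited unchanged."""
    free = [i for i in range(len(x)) if x[i] != y[i]]
    total = 0.0
    for bits in itertools.product([0, 1], repeat=len(free)):
        z = list(x)
        for i, b in zip(free, bits):
            z[i] = b
        total += f(tuple(z))
    return total / 2 ** len(free)

f = lambda z: sum(z)                 # toy function (assumption)
x, y = (0, 0, 1, 1), (1, 0, 1, 0)    # bits 1 and 2 agree, bits 0 and 3 differ
assert expected_ux_half(f, x, y) == 2.0
```

By Corollary 1 this subcube average equals the single coefficient A^(0)_{x,y} of the polynomial (32).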


Page 15: Problem Understanding through Landscape Theory

Only Expected Values?

•  Can we compute the probability distribution?

[Figure: fitness probability distribution around the current solution, with its expectation marked.]

Fitness Probability Distribution of Bit-Flip Mutation

Thus, if we find a polynomial-time algorithm to evaluate f[0] we can solve the decision problem in polynomial time. But, as SAT is NP-complete, this would imply NP = P.

As a consequence of the previous proposition, we cannot ensure that an efficient evaluation of the matrix function F(x) exists in general. The complexity of computing F(x) depends on the problem.

3.4 Fitness Probability Distribution

With the help of the moments vector $\mu\{f(M_p(x))\}$ we can compute the probability distribution of the values of f in a mutated solution. In order to do this we proceed in the same way as Sutton et al. (2011a).

Let us call $\xi_0 < \xi_1 < \cdots < \xi_{q-1}$ the $q$ possible values that the function $f$ can take in the search space. Since we are dealing with a finite search space, $q$ is a finite number (perhaps very large). We are interested in computing $\Pr\{f(M_p(x)) = \xi_i\}$ for $0 \le i < q$. In order to simplify the notation, in the following we define the vector of probabilities $\pi(f(M_p(x)))$ as $\pi_i(f(M_p(x))) = \Pr\{f(M_p(x)) = \xi_i\}$.

Theorem 3. Let us consider the binary hypercube and let us denote with $\xi_i$ the possible values that the objective function $f$ can take in the search space, where $\xi_i < \xi_{i+1}$ for $0 \le i < q-1$. Then, the vector of probabilities $\pi(f(M_p(x)))$ can be computed as:

$$
\pi(f(M_p(x))) = \underbrace{(V^T)^{-1} F(x)}_{\text{problem-dependent}}\;\Lambda(p), \qquad (39)
$$

where the matrix function $F(x)$ is limited to the first $q$ rows and $V$ denotes the Vandermonde matrix for the $\xi_i$ values, that is, $V_{i,j} = \xi_i^j$ for $0 \le i, j < q$.

Proof. We can compute the $m$-th moment $\mu_m\{f(M_p(x))\}$ using the following expression:

$$
\mu_m\{f(M_p(x))\} = \sum_{i=0}^{q-1}\xi_i^m\,\Pr\{f(M_p(x)) = \xi_i\}, \qquad (40)
$$

which we can write in vector form as:

$$
\mu\{f(M_p(x))\} =
\begin{pmatrix}
1 & 1 & \cdots & 1\\
\xi_0 & \xi_1 & \cdots & \xi_{q-1}\\
\vdots & \vdots & \ddots & \vdots\\
\xi_0^{q-1} & \xi_1^{q-1} & \cdots & \xi_{q-1}^{q-1}
\end{pmatrix}
\begin{pmatrix}
\Pr\{f(M_p(x)) = \xi_0\}\\
\Pr\{f(M_p(x)) = \xi_1\}\\
\vdots\\
\Pr\{f(M_p(x)) = \xi_{q-1}\}
\end{pmatrix}
= V^T\,\pi(f(M_p(x))). \qquad (41)
$$

Using (38) we can write:

$$
\mu\{f(M_p(x))\} = V^T\,\pi(f(M_p(x))) = F(x)\,\Lambda(p), \qquad (42)
$$

and solving for $\pi(f(M_p(x)))$ we finally get (39). The Vandermonde matrix is invertible because all the $\xi_i$ values are different by definition.

Again we can observe that $\pi(f(M_p(x)))$ is the product of a term which is problem-dependent and a vector which depends on the parameter of the mutation, $p$. From (39) it is clear that each particular probability $\pi_i(f(M_p(x)))$ is a polynomial in $p$.
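The moment-inversion step of (41) can be exercised numerically: given the moments vector, the probability vector is recovered by solving the transposed Vandermonde system. A minimal NumPy sketch, where the fitness values and the distribution are made-up assumptions for illustration:

```python
import numpy as np

xi = np.array([0.0, 1.0, 3.0])           # distinct fitness values xi_i (assumed)
pi_true = np.array([0.25, 0.5, 0.25])    # Pr{f(M_p(x)) = xi_i} (assumed)
q = len(xi)

# (V^T)[m, i] = xi_i^m, matching V_{i,j} = xi_i^j in Theorem 3
VT = np.vander(xi, q, increasing=True).T

mu = VT @ pi_true                         # moments mu_m, as in (40)-(41)
pi_rec = np.linalg.solve(VT, mu)          # invert the system, as in (39)

assert np.allclose(pi_rec, pi_true)
```

Note that Vandermonde systems become ill-conditioned quickly as q grows, so a direct solve like this is only reliable for small q.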


Polynomials in p

Page 16: Problem Understanding through Landscape Theory


•  The expected hitting time is a fraction of polynomials in p

•  Optimal probability of mutation for n=2

Fitness Probability Distribution of Bit-Flip Mutation

since $I - T$ is an upper triangular matrix and the system can be solved in $O(q^2)$. The components of $t$ will be, in general, fractions of polynomials in $p$. The vector $t$ has only $q-1$ components, indexed by numbers from 0 to $q-2$, but for the sake of completeness we can extend it with an additional component $t_{q-1} = 0$, which is the expected runtime of the algorithm when the initial solution is the global optimum.

Assuming that the algorithm starts from a random solution, the expected runtime is given by

$$
E\{\tau\} = \sum_{l=0}^{q-1} \frac{|X_l|}{|X|}\, t_l, \qquad (89)
$$

where $X$ is the set of solutions and $X_l$ is the set of solutions with fitness $f(x) = \xi_l$.

The expected runtime (89) is not an approximation or a bound; it is the exact expression of the expected runtime as a function of $p$, the probability of flipping a bit. However, this expression will only be practical if 1) we can define the $ matrix for the problem we are interested in, and 2) the evaluations of this matrix can be done efficiently in a computer. These conditions limit the number of problems whose runtime can be analyzed using this approach. However, we found in Section 4.1 that for any monotone function of Onemax we can efficiently construct and evaluate the $ matrix. In the next subsection we focus on Onemax.

5.2 Runtime of the (1+λ) EA for Onemax

The Onemax problem has been studied many times in the literature on runtime analysis. Garnier et al. (1999) derived an expression for the transition probability matrix of the (1+1) EA for the Onemax function that was later reported by He and Yao (2003). Their expression is the same as (86) for λ = 1. Tight upper and lower bounds have been derived for the (1+1) EA using different mutation rates. Recently, Witt (2012) proved that the (1+1) EA optimizes all linear functions (including Onemax) in expected time en ln n + O(n), and that the expected optimization time is polynomial as long as p = O((log n)/n) and p = Ω(1/poly(n)). Jansen et al. (2005) proved that, using a (1+λ) EA, the expected number of iterations until reaching the global optimum is O(n log n/λ + n). In summary, the Onemax problem is well known, and the content of this section adds little to the current knowledge of this problem. The goal of this section is, thus, to obtain the same results from a different perspective, that of landscape analysis. The advantage of this approach is that, with an exact expression for the expected runtime, we can find precise answers to some concrete questions for particular instances. The disadvantage is that the expression is quite complex to analyze and we need to use numerical methods, so it is not easy to generalize the answers obtained.

Let us first start by studying the (1+1) EA. Taking into account the $ matrix defined in (51) for Onemax, the expected number of iterations can be computed exactly as a function of $p$, the probability of flipping a bit. Just for illustration purposes, we present the expressions of this expectation for $n \le 3$:

$$
E\{\tau\} = \frac{1}{2p} \quad \text{for } n = 1, \qquad (90)
$$
$$
E\{\tau\} = \frac{7 - 5p}{4(p-2)(p-1)p} \quad \text{for } n = 2, \qquad (91)
$$
$$
E\{\tau\} = \frac{26p^4 - 115p^3 + 202p^2 - 163p + 56}{8(p-1)^2\, p\, (p^2 - 3p + 3)(2p^2 - 3p + 2)} \quad \text{for } n = 3. \qquad (92)
$$
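These closed forms can be cross-checked by building the fitness-level Markov chain of the (1+1) EA on Onemax and solving the upper triangular hitting-time system by back-substitution, then averaging over a random initial solution as in (89). The sketch below is an independent reimplementation, assuming the standard elitist acceptance rule, under which worse offspring are rejected and fold into the probability of staying at the same level:

```python
from math import comb

def expected_runtime_onemax(n, p):
    """Exact E{tau} of the (1+1) EA on Onemax, by fitness levels."""
    def trans(k, m):
        # Pr{mutation turns a solution with k ones into one with m ones}:
        # flip i of the n-k zeros and j of the k ones, with k + i - j = m
        pr = 0.0
        for j in range(k + 1):
            i = m - k + j
            if 0 <= i <= n - k:
                pr += comb(n - k, i) * comb(k, j) * p ** (i + j) * (1 - p) ** (n - i - j)
        return pr

    t = [0.0] * (n + 1)                 # t[k]: hitting time from level k; t[n] = 0
    for k in range(n - 1, -1, -1):      # back-substitution (upper triangular system)
        stay = sum(trans(k, m) for m in range(k + 1))   # worse offspring rejected
        t[k] = (1 + sum(trans(k, m) * t[m] for m in range(k + 1, n + 1))) / (1 - stay)

    # random initial solution: level k holds C(n,k)/2^n of the search space, eq. (89)
    return sum(comb(n, k) / 2 ** n * t[k] for k in range(n + 1))

# agrees with (91): E{tau} = (7 - 5p) / (4 (p-2)(p-1) p) for n = 2
p = 0.3
assert abs(expected_runtime_onemax(2, p) - (7 - 5*p) / (4*(p - 2)*(p - 1)*p)) < 1e-9
```

For Onemax the transition probabilities depend only on the number of ones, so this level chain is exact, not an approximation.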


F. Chicano, A. M. Sutton, L. D. Whitley and E. Alba

We can observe how the expressions grow very quickly as $n$ increases. The factor $p(p-1)$ is always present in the denominator for $n \ge 2$, which means that when $p$ takes extreme values, $p = 0$ or $p = 1$, it is not possible to reach the global optimum from every solution, since the algorithm will keep the same solution if $p = 0$ or will alternate between two solutions if $p = 1$. However, when $n = 1$ the probability $p = 1$ is valid and, furthermore, optimal, because if the global optimum is not present at the beginning we can reach it by flipping the only bit we have. In Figure 1 we show the expected runtime as a function of the probability of flipping a bit for $n = 1$ to $7$. We can observe how the optimal probability (the one obtaining the minimum expected runtime) decreases as $n$ increases.

[Figure 1: Expected runtime of the (1+1) EA for Onemax as a function of the probability of flipping a bit. Each line corresponds to a different value of n, from 1 to 7.]

Having the exact expressions, we can compute the optimal mutation probability for each $n$ by using classical optimization methods in one variable. In particular, for $n = 1$ the optimal value is $p = 1$, as we previously saw, and for $n = 2$ we have to solve a cubic polynomial in order to obtain the exact expression. The result is:

$$
p^*_2 = \frac{1}{5}\left(6 - \sqrt[3]{\frac{2}{23 - 5\sqrt{21}}} - \sqrt[3]{\frac{23 - 5\sqrt{21}}{2}}\right) \approx 0.561215, \qquad (93)
$$

which is slightly higher than the recommended value $p = 1/n$. As we increase $n$, analytical expressions for the optimal probability are no longer possible and we have to apply numerical methods. In our case we used Newton's method to find a root of the equation $\frac{dE\{\tau\}}{dp} = 0$. The results up to $n = 100$ can be found in Table 1. A quick look at the results reveals that the optimal probability is always a little higher than the recommended $p = 1/n$.
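Both routes can be reproduced in a few lines: evaluating the cube-root closed form (93) and locating the minimizer of (91) numerically. The crude grid search below is an illustrative stand-in for the Newton iteration used in the text:

```python
# n = 2: E{tau} from (91), and the closed form (93) for its minimizer
def e_tau_n2(p):
    return (7 - 5 * p) / (4 * (p - 2) * (p - 1) * p)

s = 21 ** 0.5
p_star = (6 - (2 / (23 - 5 * s)) ** (1 / 3) - ((23 - 5 * s) / 2) ** (1 / 3)) / 5

# grid search over (0, 1) in place of Newton's method on dE{tau}/dp = 0
best_p = min((k / 100000 for k in range(1, 100000)), key=e_tau_n2)

assert abs(p_star - 0.561215) < 1e-5
assert abs(best_p - p_star) < 1e-4
```

The minimum expected runtime at p*_2 matches the value 2.959 reported in Table 1.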

From previous work we know that the optimal probability is in the form c/n for a


Runtime of (1+1) EA: Expected Hitting Time

Page 17: Problem Understanding through Landscape Theory


Runtime of (1+1) EA: Curves

[Figure 1, repeated: expected runtime of the (1+1) EA for Onemax as a function of the probability of flipping a bit, one curve for each n from 1 to 7.]


Page 18: Problem Understanding through Landscape Theory


Runtime of (1+1) EA: Optimal Probabilities

 n   p*_n      E{τ}    |  n   p*_n      E{τ}    |   n   p*_n      E{τ}
 1   1.00000     0.500 | 35   0.03453   273.018 |  68   0.01741   648.972
 2   0.56122     2.959 | 36   0.03354   283.448 |  69   0.01715   661.189
 3   0.38585     6.488 | 37   0.03261   293.953 |  70   0.01690   673.445
 4   0.29700    10.808 | 38   0.03172   304.531 |  71   0.01665   685.740
 5   0.24147    15.758 | 39   0.03088   315.181 |  72   0.01642   698.073
 6   0.20323    21.222 | 40   0.03009   325.900 |  73   0.01618   710.444
 7   0.17526    27.120 | 41   0.02933   336.688 |  74   0.01596   722.852
 8   0.15391    33.391 | 42   0.02861   347.541 |  75   0.01574   735.298
 9   0.13710    39.990 | 43   0.02793   358.459 |  76   0.01553   747.779
10   0.12352    46.882 | 44   0.02727   369.441 |  77   0.01532   760.297
11   0.11233    54.039 | 45   0.02665   380.484 |  78   0.01512   772.849
12   0.10295    61.437 | 46   0.02605   391.587 |  79   0.01492   785.437
13   0.09499    69.057 | 47   0.02548   402.750 |  80   0.01473   798.059
14   0.08815    76.882 | 48   0.02493   413.970 |  81   0.01454   810.715
15   0.08220    84.898 | 49   0.02441   425.247 |  82   0.01436   823.405
16   0.07699    93.092 | 50   0.02391   436.580 |  83   0.01418   836.128
17   0.07239   101.454 | 51   0.02342   447.967 |  84   0.01400   848.884
18   0.06830   109.974 | 52   0.02296   459.407 |  85   0.01384   861.673
19   0.06463   118.642 | 53   0.02251   470.900 |  86   0.01367   874.493
20   0.06133   127.453 | 54   0.02208   482.444 |  87   0.01351   887.345
21   0.05835   136.398 | 55   0.02167   494.038 |  88   0.01335   900.229
22   0.05563   145.471 | 56   0.02127   505.682 |  89   0.01320   913.143
23   0.05316   154.667 | 57   0.02088   517.374 |  90   0.01304   926.088
24   0.05089   163.981 | 58   0.02051   529.114 |  91   0.01290   939.063
25   0.04880   173.406 | 59   0.02016   540.901 |  92   0.01275   952.069
26   0.04687   182.940 | 60   0.01981   552.734 |  93   0.01261   965.104
27   0.04509   192.578 | 61   0.01947   564.612 |  94   0.01247   978.168
28   0.04344   202.316 | 62   0.01915   576.535 |  95   0.01234   991.261
29   0.04190   212.151 | 63   0.01884   588.502 |  96   0.01221  1004.383
30   0.04046   222.079 | 64   0.01853   600.513 |  97   0.01208  1017.533
31   0.03912   232.097 | 65   0.01824   612.565 |  98   0.01195  1030.712
32   0.03786   242.203 | 66   0.01796   624.660 |  99   0.01183  1043.918
33   0.03668   252.393 | 67   0.01768   636.796 | 100   0.01170  1057.151
34   0.03557   262.666 |                        |

Table 1: Optimal probability values for a (1+1) EA solving Onemax.

constant $c$. We can use the results obtained by numerical analysis to find the value of $c$ and check its dependency on $n$. That is, using the optimal probability $p^*_n$ shown in Table 1, we can compute $c_n = p^*_n\, n$ in order to see what the value of $c_n$ is. In Figure 2 we plot $c_n$ as a function of $n$. We can observe that the optimal probability is not $p = c/n$ for a fixed $c$: the value of the constant $c_n$ is higher than 1 and depends on $n$. However, we can observe a clear trend $c_n \to 1$ as $n$ tends to infinity. The maximum value of $c_n$ is reached at $n = 11$, where $c_{11} = 1.23559$.

It is also well known that for this optimal probability the expected runtime is $\Theta(n \log n)$. We can also check this using numerical analysis. We used the optimal expected runtime for this computation and found the best-fit model including $n$ and
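The trend of c_n = n p*_n can be checked directly from a few rows of Table 1 (the pairs below are copied from the table):

```python
# selected (n, p*_n) pairs taken from Table 1
p_star = {1: 1.00000, 2: 0.56122, 5: 0.24147, 11: 0.11233,
          25: 0.04880, 50: 0.02391, 100: 0.01170}
c = {n: n * p for n, p in p_star.items()}

assert abs(c[11] - 1.23559) < 1e-3        # the maximum reported in the text
assert c[11] > c[25] > c[50] > c[100] > 1 # c_n decreases toward 1 after n = 11
```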


Page 19: Problem Understanding through Landscape Theory


Landscape Explorer

•  Main goals:

•  Easy to extend

•  Multiplatform

•  RCP architecture

RCP (Rich Client Platform)

Page 20: Problem Understanding through Landscape Theory


Rich Client Platform

•  Mechanisms for plugin interaction:

•  Extensions, extension points and package exports


Page 21: Problem Understanding through Landscape Theory


Architecture

•  Main plugins of the application

…is based on the concepts of extension point and extension. An extension represents a new piece of functionality incorporated into the software. The extension must be connected to one of the extension points existing in it. As a requirement for the connection to be valid, the extension must provide the information required by the extension point in the correct format. This information can range from text strings to Java classes implementing a given interface.

Plug-ins can define both extensions and extension points. Thus, a plug-in that defines an extension point favours the extensibility of the software, while plug-ins that define extensions contribute to enlarging its functionality. The description of the extensions and extension points present in a plug-in is stored in a file inside the plug-in's own jar archive, and the runtime environment reads it in order to provide this information to the plug-ins that wish to query it. However, it is the responsibility of the plug-ins that defined the extension points to check whether other plug-ins have created extensions for them, and to take the appropriate actions so that those functions become available to the end user.

In addition to extension points and extensions, plug-ins can declare certain packages defined in them as public; in this way, class libraries needed for particular functions of the tool can be added easily.

B. Application architecture

Given the open character of landscape theory, it is not possible to predict the kinds of analyses and software elements that will be needed in the future to do research in this field. For this reason, the architecture of Landscape Explorer has been designed to allow almost any possible extension. We have only specified what we think is a minimal set of classes and procedures that any analysis might need. However, even this minimal set could change over time, and the modularization of the application would allow such a change to be made relatively easily.

The basic classes and procedures of the application are distributed over two plug-ins (see Fig. 2):

neo.landscapes.theory.kernel: provides the interfaces and classes that define what a landscape is (from the implementation point of view) and offers several examples of them.

neo.landscapes.theory.tool: the plug-in that contains the entry point of the application; it is responsible for displaying the graphical user interface (GUI) of the application and for defining the main extension points of Landscape Explorer.

[Fig. 2. The set of plug-ins that make up the application: neo.landscapes.theory.tool (with its .procedures and .selectors extension points), neo.landscapes.theory.kernel (with its .landscapes extension point), and the problem plug-ins TSP, QAP, SS, UQO, DFA and WFFAP.]

The first plug-in, which we will call kernel for short, defines one extension point, neo.landscapes.theory.kernel.landscapes, which must be extended by every plug-in that wants to define landscapes to be used in the analyses provided by the tool. When a researcher wants to apply all the analyses implemented in the tool to a particular problem of interest, they must create a plug-in that defines an extension for this extension point. At the time of writing this article, the kernel itself defines six extensions for this point, corresponding to the six optimization problems we will mention in Section IV-C.

The second core plug-in of Landscape Explorer, which we will call tool, defines the following extension points:

neo.landscapes.theory.tool.procedures: allows other plug-ins to define landscape analysis methods. Plug-ins extending this point must provide a class implementing the IProcedure interface. Besides providing the name and a description of the analysis method, this class must build a GUI component through which the user interacts with the analysis method and observes its results. By letting the developer of the extension define the GUI for the analysis method, we minimize the requirements the method must fulfil.

neo.landscapes.theory.tool.selectors: one of the basic steps in analysing a landscape is selecting that landscape. This extension point allows developers to provide landscape selection procedures, so that the developers of new analysis methods do not have to implement selection again for every method. Developers who wish to provide a selection procedure must supply a class implementing the ISelector interface which, among other things, returns a graphical component associated with the selection procedure.

In addition to the extension points mentioned above, the tool plug-in exports a class, called SelectionServices, which offers public operations for landscape selection. This class is intended to be used by the analysis methods. When an analysis method needs…


Page 22: Problem Understanding through Landscape Theory


Included Procedures and Landscapes

•  Procedures

•  Elementary Landscape Check

•  Mathematica program to get the Elementary Landscape Decomposition

•  Computation of the reduced adjacency matrix

•  Theoretical Autocorrelation Measures

•  Experimental Autocorrelation

•  Estimation of the number of elementary components

•  Landscapes

•  QAP (Quadratic Assignment Problem)

•  UQO (Unconstrained Quadratic Optimization)

•  TSP (Traveling Salesman Problem)

•  Walsh Functions (linear combinations)

•  Subset Sum Problem


Page 23: Problem Understanding through Landscape Theory


Available on the Internet http://neo.lcc.uma.es/software/landexplorer


Page 24: Problem Understanding through Landscape Theory


On-line Computation of Autocorrelation for QAP http://neo.lcc.uma.es/software/qap.php


Page 25: Problem Understanding through Landscape Theory


$$
f(x) = \sum_{p=0}^{n} f_{[p]}(x) = f_{[0]}(x) + f_{[1]}(x) + \cdots + f_{[n]}(x)
$$

$$
R(A, f) = \underbrace{\Phi(f)}_{\text{problem}} \otimes \underbrace{\Lambda(A)}_{\text{algorithm}}
$$

Conclusions & Future Work

Conclusions

•  Landscape theory is very good at providing statistical information at a low cost
•  FDC, expected fitness value after mutation and after uniform crossover
•  Runtime?
•  Software tools have been developed to help non-experts use this knowledge

Future Work

Page 26: Problem Understanding through Landscape Theory


Thanks for your attention !!!

Problem Understanding through Landscape Theory