8/11/2019 AOR Syllabus20132014
ADVANCED OPERATIONS RESEARCH
Yves Crama
HEC Management School, University of Liege
January 2014
Contents
1 Introduction 1
2 Combinatorial optimization and computational complexity 3
2.1 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.1.1 The shortest path problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.1.2 The Chinese postman problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.1.3 The traveling salesman problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.1.4 The 0-1 linear programming problem . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.1.5 The graph equipartitioning problem . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.1.6 The graph coloring problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.1.7 Combinatorial optimization in practice . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.1.8 Exercises. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.2 A glimpse at computational complexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.2.1 Computational performance criteria . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.2.2 Problems and problem instances . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2.3 Easy and hard problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.2.4 Exercises. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3 Heuristics for combinatorial optimization problems 17
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.2 Reformulation, rounding and decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.3 List-processing heuristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.3.1 Exercises. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.4 Neighborhoods and neighbors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.4.1 Some examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.4.2 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.5 Steepest descent . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.5.1 Initialization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.5.2 Local minima . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.5.3 Choice of neighborhood structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.5.4 Selection of neighbor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.5.5 Fast computation of the objective function . . . . . . . . . . . . . . . . . . . . . . 31
3.5.6 Flat objective functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.5.7 Exercises. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.6 Simulated annealing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.6.1 The simulated annealing metaheuristic . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.6.2 Choice of the transition probabilities . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.6.3 Stopping criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.6.4 Implementing the SA algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.6.5 Variants of the SA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.7 Tabu search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
3.7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
3.7.2 The algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
3.7.3 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.8 Genetic algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.8.2 Diversification via crossover . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.8.3 A basic genetic algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.8.4 Intensification and local search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.8.5 Encodings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
3.8.6 Implementing a genetic algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4 Modeling languages for mathematical programming 59
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
5 Integer programming 63
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
5.1.1 Integer programming models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
5.1.2 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
5.2 Branch-and-bound method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
5.2.1 Partitioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
5.2.2 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
5.2.3 Heuristic solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
5.2.4 Tight formulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
5.2.5 Some final comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
6 Neural networks 77
6.1 Feedforward neural networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
6.2 Neural networks as computing devices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
6.3 Neural networks as function approximation devices . . . . . . . . . . . . . . . . . . . . . . 80
6.4 Unconstrained nonlinear optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
6.4.1 Minimization problems in one variable: introduction . . . . . . . . . . . . . . . . . 82
6.4.2 Equations in one variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
6.4.3 Minimization problems in one variable: algorithms . . . . . . . . . . . . . . . . . . 84
6.4.4 Multivariable minimization problems . . . . . . . . . . . . . . . . . . . . . . . . . . 85
6.5 Application to NN design: the backpropagation algorithm . . . . . . . . . . . . . . . . . . 86
6.5.1 Extensions of the delta rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
6.5.2 Model validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
6.6 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
6.7 Notes on PROPAGATOR software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
6.7.1 Input files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
6.7.2 Menus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
6.7.3 Main window . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
7 Cases 95
7.1 Container packing at Titanic Corp. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
7.2 Stacking boxes at Gizeh Inc. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
7.3 A high technology routing system for Meals-on-Wheels . . . . . . . . . . . . . . . . . . . . 96
7.4 Operations scheduling in Hobbitland . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
7.5 Setup optimization for the assembly of printed circuit boards . . . . . . . . . . . . . . . . 98
7.6 A new product line for Legiacom . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
Bibliography 110
Chapter 1
Introduction
The aim of the course Advanced Operations Research is to present several perspectives on mathematical
modeling and problem-solving strategies as they are used in operations research.
The course contains several independent parts, namely:
- general-purpose heuristic strategies for the solution of combinatorial optimization problems, such as simulated annealing, tabu search or genetic algorithms;
- learning of a modeling language, i.e., a computer language specially devoted to the formulation, the solution and the analysis of large-scale optimization models (linear or nonlinear programming problems);
- an introduction to mixed integer programming models and algorithms;
- other numerical methods, as time allows: neural networks, simulation, ...
These lecture notes are a preliminary draft of the material usually covered in the course. They concentrate mostly on combinatorial optimization heuristics, on mixed integer programming methods and on neural networks. Modeling languages are handled more superficially, as this topic is mostly illustrated through the development of numerical models in the computer lab.
The course assumes that the reader has had a first introduction to operations research and has some elementary knowledge of mathematical modeling, of mathematical programming and of graph theory.
Special thanks are due to Jean-Philippe Peters who drafted the first version of these classroom notes.
Chapter 2
Combinatorial optimization and
computational complexity: Basic
notions
The generic combinatorial optimization (CO) problem is

minimize {F(x) | x ∈ X}   (2.1)

where X is a finite (or at least, discrete¹) set of feasible solutions and F is a real-valued objective function defined on X. Of course, if X is given in extension, i.e., by a complete explicit list of its elements, then solving (CO) is quite easy: it suffices to compute the value of F(x) for all elements x ∈ X and to retain the best element. But when X is defined implicitly rather than in extension, the problem may become much harder.
2.1 Examples
2.1.1 The shortest path problem
Nowadays, lots of commercial software products allow you to select effortlessly the shortest possible
route from your current location to a chosen destination (for example, from Liege to Torremolinos). The
¹Intuitively, a set is discrete if it does not contain any continuous subset.
optimization problem which has to be solved whenever you address a query to the system can be modelled
as follows.
There is a graph G = (V, E), where V is a finite set of elements called vertices and E is a collection of pairs of vertices called edges (think of V as a list of geographical locations and of E as a road network; see e.g. Figure 2.1 for a representation). Assume that every edge e ∈ E has a nonnegative length ℓ(e), and let s and t be two vertices in G. The shortest path problem is to find a path (a connected sequence of edges) through the graph that starts at s and ends at t, and which has the shortest possible total length. This is clearly a CO problem, where X is the (finite) set of all paths from s to t and F(x) is the total length of path x. Note that the cardinality of X can be of the same order of magnitude as 2^|V|, which is quite large as compared to the size of the graph.
Figure 2.1: A graph with 6 vertices and 8 edges
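The shortest path problem is one of those CO problems for which complete enumeration can be avoided altogether: Dijkstra's algorithm (mentioned again in Section 2.2.2) solves it efficiently. The following sketch shows a standard Python implementation on a 6-vertex, 8-edge graph; the edge lengths are invented for illustration, since Figure 2.1 does not specify them.

```python
import heapq

def dijkstra(n, edges, s, t):
    """Length of a shortest s-t path in an undirected graph with
    nonnegative edge lengths (Dijkstra's algorithm)."""
    adj = {v: [] for v in range(1, n + 1)}
    for u, v, length in edges:
        adj[u].append((v, length))
        adj[v].append((u, length))
    dist = {v: float("inf") for v in range(1, n + 1)}
    dist[s] = 0
    heap = [(0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == t:                    # t is popped with its final distance
            return d
        if d > dist[u]:               # stale heap entry, skip
            continue
        for v, length in adj[u]:
            if d + length < dist[v]:
                dist[v] = d + length
                heapq.heappush(heap, (d + length, v))
    return float("inf")               # t unreachable from s

# A 6-vertex, 8-edge graph; the lengths are made up for illustration.
edges = [(1, 2, 4), (1, 3, 2), (2, 4, 5), (3, 4, 8),
         (2, 5, 3), (4, 6, 2), (5, 6, 6), (3, 5, 7)]
print(dijkstra(6, edges, 1, 6))  # → 11
```

Note the contrast with complete enumeration: the algorithm never lists the exponentially many paths from s to t; it only maintains one tentative distance per vertex.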
2.1.2 The Chinese postman problem
This problem is similar to the shortest path problem, except that we consider here the additional constraint that every edge of G should be traversed exactly once by the path from s to t (the postman has to visit every street in his district). It is also usual to assume that s = t in this problem (the postman returns to the depot at the end of the day).
Besides its postal illustration, this model finds applications in a variety of vehicle routing situations (garbage collection, snow plowing, street cleaning, etc.) and in the design of automatic drawing software.
2.1.3 The traveling salesman problem
The traveling salesman problem (denoted TSP) is again similar to the shortest path problem, with the
added requirement that every vertex should be visited exactly once by the path from s to t: the salesman
must visit each and every customer (located in the cities in V) along the way. In the sequel, we shall
always assume that G is a complete graph (i.e., it contains all possible edges) and that s = t. Thus, we speak of a traveling salesman tour rather than path. Then, X can simply be viewed as the set of all permutations of the elements of V and |X| = |V|!. For instance, if |V| = 30, then |V|! is roughly 2 × 10^32.
This famous combinatorial optimization problem has numerous applications, either in its pure form
or as a subproblem of more complex models. It arises for instance in many production scheduling settings
(sequencing of tasks on a single machine when the setup time between two successive tasks depends on
the identity of these tasks, sequencing of drilling operations in metal sheets, sequencing of component placement operations for the assembly of printed circuit boards, etc.) and in various types of vehicle routing models (truck delivery problems, mail pickup, etc.).
2.1.4 The 0-1 linear programming problem
We can express the 0-1 LP problem as

min cx
subject to Ax ≤ b and x ∈ {0, 1}^n

where c ∈ R^n, b ∈ R^m and A ∈ R^{m×n} are the parameters (or numerical data) of the problem and x ∈ R^n is a vector of (unknown) decision variables. Note that, if we drop the constraint x ∈ {0, 1}^n, then the problem is simply a linear programming problem which can be solved by a variety of efficient algorithms (e.g., the simplex method or an interior-point method). However, the requirement that x ∈ {0, 1}^n leads to a (much harder) CO problem where X = { x ∈ {0, 1}^n : Ax ≤ b }. The cardinality of this set, although finite, is potentially as large as 2^n (when n = 30, this is approximately 10^9).
The knapsack problem is the special case of 0-1 LP with only one inequality constraint:

max cx
subject to ax ≤ b and x ∈ {0, 1}^n

where a, c ∈ R^n_+ and b ∈ R. The usual interpretation of this problem is that the indices i = 1, 2, . . . , n denote n objects that a hiker may want to carry in her knapsack, c_i is the utility of object i, a_i is its weight and b is the maximum weight that the hiker is able to carry.
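To make the implicit set X concrete, the complete-enumeration approach sketched at the start of this chapter can be written down for a small knapsack instance as follows (the numerical data are invented for illustration). The loop examines all 2^n vectors of {0, 1}^n, which is precisely why the method does not scale.

```python
from itertools import product

def knapsack_brute_force(c, a, b):
    """Enumerate all 2^n vectors x in {0,1}^n and keep the best feasible
    one. Only practical for small n, as discussed in the text."""
    best_x, best_val = None, float("-inf")
    for x in product((0, 1), repeat=len(c)):
        # feasibility check: total weight within capacity b
        if sum(ai * xi for ai, xi in zip(a, x)) <= b:
            val = sum(ci * xi for ci, xi in zip(c, x))
            if val > best_val:
                best_x, best_val = x, val
    return best_x, best_val

# Illustrative data (not from the text): utilities c, weights a, capacity b.
c = [10, 7, 12, 8]
a = [4, 3, 6, 5]
b = 10
print(knapsack_brute_force(c, a, b))  # → ((1, 0, 1, 0), 22)
```

Here the hiker takes objects 1 and 3 (weight 4 + 6 = 10, utility 22); the enumeration checked 2^4 = 16 vectors, but for n = 50 it would have to check about 10^15 of them.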
Figure 2.2: A feasible coloring
Applegate, Bixby, Chvatal, and Cook (2006), Barnhart, Johnson, Nemhauser, Sigismondi, and Vance
(1993), Bartholdi, Platzman, Collins, and Warden (1983), Bollapragada, Cheng, Phillips, Garbiras, Scholes, Gibbs, and Humphreville (2002), Crama, van de Klundert, and Spieksma (2002), Crama, Oerlemans, and Spieksma (1996), Jain, Johnson, and Safai (1996), Glover and Laguna (1997), Kohli and
Krishnamurti (1987), Moonen and Spieksma (2003), Oliveira, Ferreira, and Vidal (1993), Tyagi and
Bollapragada (2003), etc.
2.1.8 Exercises.
Exercise 1. Consider the Meals-on-Wheels case in Section 7.3. Explain the similarities that this problem
shares with the traveling salesman problem, as well as the differences between the problems.
2.2 A glimpse at computational complexity
In order to fully appreciate the field of combinatorial optimization, it is necessary to understand, at least
at an intuitive level, some of the basic concepts of computational complexity. This part of theoretical computer science deals with fundamental but extremely deep questions like: what tasks can be carried out by a computer? or: how much time does a given computational task require?
In this section, we attempt to introduce some elements of computational complexity, in a very informal
and hand-waving way. We refer the interested reader to Tovey (2002) for a more formal tutorial, and to
Papadimitriou and Steiglitz (1982) for a rigorous treatment of the topic.
2.2.1 Computational performance criteria
What do we expect from a CO algorithm? Well, an obvious answer would be that this algorithm should always return an optimal solution of the problem. Is it the only game in town? Certainly not. We might also want it to be fast, or efficient. Combining these two expectations is the crucial issue. Indeed, the time required to solve a problem naturally increases with the size of the problem, where the size can be measured by the amount of data needed to describe a particular instance of the problem.
Let us take a look at an example. Suppose that we want to solve a 0-1 linear programming problem involving n variables x_j ∈ {0, 1}, j = 1, . . . , n. We can certainly find an optimal solution by listing all possible vectors (x_1, x_2, . . . , x_n), by checking for each of them whether it is feasible or not, by computing the value of the objective function for each such feasible solution, and by retaining the best solution found in the process. If we decide to go that way, then we must consider 2^n vectors. For n = 50, that means 2^50 ≈ 10^15 = 1,000,000,000,000,000 vectors! If our algorithm is able to enumerate one million (1,000,000) solutions per second, the whole procedure takes 10^9 seconds, or about 30 years. And for n = 60, the enumeration of the 2^60 solutions would take about 30,000 years!
Note that adding 10 variables to the problem increases the computing time by a multiplicative factor of 2^10 ≈ 1,000. So, with n = 80 variables (a rather modest problem size), the same algorithm would run for 30 billion years, which is about twice the age of the universe. Not really efficient, by any practical standards...
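The back-of-the-envelope figures above are easy to reproduce; the sketch below assumes, as the text does, an enumeration rate of one million solutions per second.

```python
SECONDS_PER_YEAR = 3600 * 24 * 365   # about 3.15e7 seconds
RATE = 10**6                         # solutions checked per second

for n in (50, 60, 80):
    years = 2**n / RATE / SECONDS_PER_YEAR
    print(f"n = {n}: enumerating 2^{n} vectors takes about {years:,.0f} years")
```

The exact values (about 36 years for n = 50, 37,000 years for n = 60, and 38 billion years for n = 80) are of the same order as the rounded figures quoted in the text.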
Let us look at this issue from another vantage point. Consider the well-known Moore's law: Gordon Moore, co-founder of the chip giant Intel, predicted in 1965 that the number of transistors per square inch on integrated circuits would double roughly every 18 months, starting from 1962, the year the integrated circuit was invented (see the original paper of Moore (1965) for more details). In other words, your PC processor works twice as fast every year and a half, meaning that its speed is multiplied by 100 in 10 years.² So, if you were able to enumerate 2^n solutions in one hour in 1997, you could enumerate 100 × 2^n solutions in one hour in 2007.
1. Matrix addition problem:
Instance size: 2n^2.
Algorithm: any naive addition algorithm.
Running time: n^2 (additions). We denote this by O(n^2), meaning that the running time grows at most like n^2.
2. Shortest path problem:
Instance size: O(n^2) where n = |V|.
Algorithm 1: enumerate all possible paths between s and t.
Running time of Algorithm 1: there could be exponentially many paths, and t_A1 = O(2^n).
Algorithm 2: Dijkstra's algorithm (see Nemhauser and Wolsey (1988)).
Running time of Algorithm 2: O(n^2) operations.
3. Traveling salesman problem:
Instance size: O(n^2) where n = |V|.
Algorithm: enumerate all possible tours.
Running time: O(n!).
In view of these examples, we are led to the following concept: the complexity of an algorithm A for a problem P is the function

c_A(n) = max { t_A(I) | I is an instance of P with size s(I) = n }.   (2.2)

This is sometimes called the worst-case complexity of A: indeed, the definition focuses on the worst-case running time of A on an instance of size n, rather than on its average running time.
2.2.3 Easy and hard problems
Figure 2.3 represents different types of complexity behaviors for algorithms.
The algorithm A is polynomial if c_A(n) is a polynomial (or is bounded by a polynomial) in n, and exponential if c_A(n) grows faster than any polynomial function of n. Intuitively, we can probably accept the idea that a polynomial algorithm is more efficient than an exponential one.
For instance, the obvious algorithms for the addition and the multiplication of matrices are polynomial. So is the Gaussian elimination algorithm for the solution of systems of linear equations. On the other hand, the simplex method (or at least, some variants of it) for linear programming problems
Figure 2.3: (a) Linear: F(n) = an + b. (b) Exponential: F(n) = a·2^n
is known to be exponential³ while interior point methods are polynomial. This clearly illustrates the
emphasis on the worst-case running time which was already underlined above: indeed, in an average
sense, the simplex algorithm is an efficient method.
The complete enumeration approach for shortest path, Chinese postman or traveling salesman problems is exponential, since all these problems have an exponential number of feasible solutions. But polynomial algorithms exist for the shortest path problem or the Chinese postman problem.
For the traveling salesman problem or for 0-1 integer programming problems, by contrast, only exponential algorithms are known. In fact, it is widely suspected that there does not exist any polynomial algorithm for these problems. This is a typical feature of so-called NP-hard problems, which we define (very informally again) as follows (see Papadimitriou and Steiglitz (1982) for details).
Definition 2.2.1. A problem P is NP-hard if it is at least as difficult as the 0-1 linear programming problem, in the sense that any algorithm for P can be used to solve the 0-1 LP problem with a polynomial increase in running time.
The next claim has resisted all proof attempts (and there have been many) since the early 70s, but
the vast majority of computer scientists and operations researchers believe that it holds true.
³Klee and Minty (1972) provide instances I of the LP problem such that t_simplex(I) ≥ 2^s(I).
Definition 2.2.2. A heuristic for an optimization problem P is an algorithm which is based on intuitively appealing principles, but which does not guarantee to provide an optimal solution of P.
So, when running on a particular CO problem, a heuristic could for instance
- return an optimal solution of the problem, or
- return a suboptimal solution, or
- return an infeasible solution, or
- fail to return any solution at all,
etc.
This very broad definition of a heuristic may seem rather surprising at first sight. It raises again the
question of the criteria which can be applied to analyze the performance of a particular heuristic. We
mention here two criteria which will be of particular concern in this course.
Computational complexity
Generally speaking, we want heuristics to be fast, at least when compared with the highly exponential
running times mentioned above. In fact, the main reason for giving up optimality is that we want the
heuristic to compute quickly a reasonably good solution. Thus, the basic trade-off that we want to achieve
reads
SOLUTION QUALITY vs. RUNNING TIME
Quality of approximation
The solution returned by the heuristic should provide a good approximation of the optimal solution. To understand how to measure this, let x_H be the solution computed by heuristic H for a particular instance and let x_opt be an optimal solution for this instance.
Then,

E(x_H) = (F(x_H) − F(x_opt)) / F(x_opt) ≥ 0   (2.3)

provides a relative error measure: the closer it is to 0, the better the solution x_H.
In general, however, F(x_opt) is unknown. So, suppose now that we know how to compute a lower bound on F(x_opt), i.e., a number F̲ such that F̲ ≤ F(x_opt) (this is often much easier to compute than
F(x_opt)). Define

Ē(x_H) = (F(x_H) − F̲) / F̲.   (2.4)

Then we have

E(x_H) = F(x_H)/F(x_opt) − 1 ≤ F(x_H)/F̲ − 1 = Ē(x_H),   (2.5)

which means that Ē(x_H) overestimates the relative error E(x_H). So, if Ē(x_H) is small, we can certainly be happy with the quality of the solution provided by H. (Note also that if the lower bound F̲ is reasonably close to F(x_opt), then Ē(x_H) actually provides a good estimate of the error.)
For example, consider the traveling salesman instance described by the (symmetric) distance matrix L, where ℓ_ij represents the distance from i to j, i, j = 1, 2, . . . , 6:

L =
    0 4 7 2 6 3
    4 0 3 5 5 7
    7 3 0 2 6 5
    2 5 2 0 9 8
    6 5 6 9 0 5
    3 7 5 8 5 0
Assume now that a heuristic returns the tour x_H = (1, 2, 3, 4, 5, 6) (displayed in Figure 2.4).

Figure 2.4: A feasible tour
The total length of this tour is F(x_H) = 4 + 3 + 2 + 9 + 5 + 3 = 26. On the other hand, an obvious lower bound on the optimal tour length is given by the sum of the 6 shortest distances in L. Thus F̲ = 2 + 2 + 3 + 3 + 4 + 5 = 19, and, consequently, Ē(x_H) = (26 − 19)/19 ≈ 0.37. We can therefore conclude
that x_H is at most 37% longer than the optimal tour.
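These computations are easy to check by brute force. The following sketch recomputes F(x_H), the lower bound F̲, and, since the instance is tiny, the true optimum by enumerating all tours with city 1 fixed; the code is an illustration, not part of the original text.

```python
from itertools import permutations

# Distance matrix L from the text (cities are 1-indexed in the text,
# 0-indexed here).
L = [[0, 4, 7, 2, 6, 3],
     [4, 0, 3, 5, 5, 7],
     [7, 3, 0, 2, 6, 5],
     [2, 5, 2, 0, 9, 8],
     [6, 5, 6, 9, 0, 5],
     [3, 7, 5, 8, 5, 0]]

def tour_length(tour):
    """Total length of a closed tour visiting the cities in order."""
    return sum(L[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

F_H = tour_length([0, 1, 2, 3, 4, 5])          # the heuristic tour x_H
# Lower bound: sum of the 6 shortest distances in L (upper triangle).
lb = sum(sorted(L[i][j] for i in range(6) for j in range(i + 1, 6))[:6])
# Exact optimum by enumerating the 5! tours that start at city 1.
F_opt = min(tour_length((0,) + p) for p in permutations(range(1, 6)))
print(F_H, lb, F_opt, (F_H - lb) / lb)         # → 26 19 20 0.368...
```

The enumeration confirms F(x_H) = 26 and F̲ = 19, and shows that the optimal tour has length 20: x_H is in fact 30% above the optimum, comfortably within the 37% guaranteed by the bound.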
In order to compute lower bounds for combinatorial optimization problems, a simple but powerful principle can often be used: when a constraint of a minimization problem P is relaxed (i.e., when the constraint is either removed or replaced by a weaker one), then the optimal value of the resulting relaxed problem provides a lower bound on the optimal value of P. This principle will be illustrated on the examples below.
2.2.4 Exercises.
Exercise 1. Consider again the traveling salesman problem. For every vertex v ∈ V, select the shortest edge e_v incident to v. Show that Σ_{v∈V} ℓ(e_v) is a lower bound on the length of the optimal tour. Compute this lower bound for the numerical example in Section 2.2.3. Can you improve this lower bound by taking into account the two shortest edges incident to every vertex v? What bound do you obtain for the numerical example?
Exercise 2. Consider the following problem: you want to save n electronic files with respective sizes s_1, s_2, . . . , s_n ≥ 0 on the smallest possible number of storage devices (say, floppy disks) with capacity C. This problem is known under the name of bin packing problem, and it is NP-hard. Can you compute a lower bound on its optimal value?
Exercise 3. Show that the optimal value of the linear programming problem

min cx subject to Ax ≤ b, 0 ≤ x_j ≤ 1 (j = 1, 2, . . . , n)

provides a lower bound on the optimal value of the 0-1 LP problem

min cx subject to Ax ≤ b, x_j ∈ {0, 1} (j = 1, 2, . . . , n).
Exercise 4. Show that the lower bounds obtained in Exercises 1-3 can all be viewed as the optimal values of relaxations of the original problem.
Chapter 3
Heuristics for combinatorial
optimization problems
3.1 Introduction
Even though there does not really exist any general theory of heuristics, certain common strategies
can be identified in many successful heuristics. The aim of this chapter is to present such fundamental
principles of heuristic algorithms for combinatorial optimization problems of the form
minimize {F(x) | x ∈ X}.   (CO)
In Sections 3.2 and 3.3 below, we successively describe a few simple ideas of this nature, namely
reformulation, decomposition, rounding, and list-processing.
Then, we turn to more elaborate frameworks, or guidelines, which have been proposed to develop
specific heuristics for a broad variety of optimization problems. These frameworks go by the name of
metaheuristic schemes, or metaheuristics for short. Thus, metaheuristics can be viewed as recipes for
the solution of (CO) problems.
We focus more particularly on so-called local search heuristics. Broadly speaking, local search heuristics rely on a common, rather natural and intuitive approach to find a good solution of (CO): starting
from an initial solution, they move from solution to solution in the feasible region X, in an attempt (or
hope) to locate a good solution along the way (see Figure 3.1, where N(x) represents the neighborhood
of a current solution x). Most metaheuristics (like simulated annealing or tabu search) specifically
generate local search heuristics. They constitute the main topic of this chapter.
Figure 3.1: Local search
Additional information on heuristics can be found for instance in Aarts and Lenstra (1997), Glover and Laguna (1997), Hoos and Stützle (2005), Papadimitriou and Steiglitz (1982), Pirlot (1992), and many other sources.
3.2 Reformulation, rounding and decomposition
Many heuristics rely on a few simple and natural ideas. One such idea is to replace the original hard problem (CO) by an easier, but closely related one, say (CO′). This can be accomplished, for instance, by changing the definition of the objective function, or by dropping some of the constraints of (CO). In the latter case, solving the simplified problem (CO′) usually produces an infeasible solution of (CO), and this solution needs to be somehow repaired in order to produce a feasible (but suboptimal) solution.
A specific, but extremely useful and common application of this idea is found in rounding algorithms
for 0-1 linear programming problems of the form
min cx
subject to Ax ≤ b and x ∈ {0, 1}^n.
We have already observed in Section 2.1.4 that, when we drop the constraint x ∈ {0, 1}^n from this problem formulation, we obtain a linear programming problem which can be easily solved. Of course, the optimal solution of the LP model is typically fractional, and hence infeasible for the original problem. However, it is sometimes possible to round this optimal solution in such a way as to obtain feasible 0-1 solutions of the 0-1 LP problem.
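The rounding idea can be sketched on the knapsack special case of Section 2.1.4. The LP relaxation of the knapsack problem (replace x_i ∈ {0, 1} by 0 ≤ x_i ≤ 1) is solved exactly by the classical greedy rule: sort the items by utility/weight ratio and fill the knapsack, taking at most one fractional item; rounding that fractional variable down then always restores feasibility. The data below are made up for illustration.

```python
def knapsack_lp_and_round(c, a, b):
    """Solve the LP relaxation of the knapsack problem by the classical
    greedy rule (sort by c_i/a_i, fill fractionally), then round the
    single fractional variable down to obtain a feasible 0-1 solution."""
    order = sorted(range(len(c)), key=lambda i: c[i] / a[i], reverse=True)
    x_lp = [0.0] * len(c)
    remaining = b
    for i in order:
        if a[i] <= remaining:
            x_lp[i] = 1.0
            remaining -= a[i]
        else:
            x_lp[i] = remaining / a[i]   # at most one fractional entry
            break
    # Rounding down keeps the weight constraint satisfied.
    x_rounded = [int(v) for v in x_lp]
    return x_lp, x_rounded

c = [10, 7, 12, 8]      # utilities (illustrative data)
a = [4, 3, 6, 5]        # weights
b = 10                  # capacity
x_lp, x01 = knapsack_lp_and_round(c, a, b)
print(x_lp, x01)        # → [1.0, 1.0, 0.5, 0.0] [1, 1, 0, 0]
```

On this instance the rounded solution has value 10 + 7 = 17, while the best 0-1 solution (take objects 1 and 3) has value 22: the repaired solution is feasible but suboptimal, exactly the trade-off described above.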
Rounding has been used in countless algorithms for 0-1 LP problems, be it in theoretical developments,
in implementations of generic solvers, or in specific industrial applications; see for instance Bollapragada
et al. (2002) for a recent illustration.
Another general idea for solving hard problems is to decompose them into a collection or a sequence
of simpler subproblems. Then each subproblem can be solved either optimally or heuristically, and the
solutions of the subproblems are patched together in order to provide a feasible solution of the original
problem. Similar decomposition approaches are sometimes called divide and conquer strategies in the
broader context of algorithmic design. Note, however, that they usually result in suboptimal solutions of
the original CO problem.
Examples of the decomposition strategy are abundant in real-world settings. In a very broad sense,
if we assume that the ultimate objective of management is to optimize the revenues, or the shareholders' profit, or the survivability of a firm, then the functional organization of the firm in marketing, production, and finance departments can be viewed as a way to decompose the global optimization issue into a number of subproblems, linked together by appropriate coordination mechanisms (e.g., strategic or business plans).
More specific examples can be found in classical production planning approaches, for instance in MRP
techniques (Material Resource Planning; see Crama (2002)). Here, for simplicity reasons, the optimal lot
size is usually determined independently for each component arising in a bill-of-materials. But in fact, the
actual cost-minimization problem faced by the firm involves many interactions among these components:
use of common production equipment, possibilities of joint orders from suppliers, etc. Therefore, the
component-wise decomposition only provides a heuristic way of handling the global issue.
Illustrations of decomposition approaches can also be found in the papers by Crama, van de Klundert,
and Spieksma (2002) or by Tyagi and Bollapragada (2003) and in numerous other publications.
3.3 List-processing heuristics
List-processing heuristics (also called greedy or myopic heuristics) can be viewed as a special, particularly simple type of local search heuristics (see Section 3.4 below). We do not want to try
to characterize them very precisely here: let us simply say that they apply in particular to CO problems
of the form
min (or max) F(S) subject to S ⊆ E, S ∈ I, (IS)
where E is a finite set of elements and I is a collection of subsets of E.
The elements of E can be viewed as the decision variables of the problem. For instance, the knapsack
problem (see Subsection 2.1.4) can be interpreted in this way: here, E is a set of objects, and S ∈ I if
the subset of objects S fits in the knapsack.
Now, list-processing heuristics construct a feasible solution of (IS) in successive iterations, starting
from the initial solution S = ∅ and adding elements to this solution, one by one, in the order prescribed
by some prespecified priority list. They terminate as soon as the priority list has been exhausted. In
particular, no effort is made to improve this solution in subsequent steps (which justifies the names
myopic or greedy).
Thus, the list-processing metaheuristic can be sketched as in Figure 3.2 below.
1. Establish a priority list L of the elements of E.
2. Set S := ∅.
3. Repeat: if L is empty then return S and stop; else
consider the next element in L, say ei, and remove ei from L;
if S ∪ {ei} is feasible, i.e. if S ∪ {ei} ∈ I, then set S := S ∪ {ei}.
Figure 3.2: The list-processing metaheuristic
Intuitively speaking, the choice of the list L should be dictated by the impact of each element of E on
the objective function F: those variables with a smaller marginal cost (for a minimization problem) or
a heavier marginal contribution (for a maximization problem) should receive higher priority. But these
general guidelines leave room for many possible implementations. Let us illustrate this discussion with a
few examples.
Example: The knapsack problem. Consider the knapsack problem
max cx
subject to ax ≤ b and x ∈ {0, 1}^n
where a, c ∈ R^n_+ and b ∈ R. Various list-processing strategies can be proposed for this problem.
Strategy 1.
1. Sort the variables by nonincreasing utility value: if ci > cj, then xi precedes xj in L.
2. Set x := (0, 0, . . . , 0).
3. Run through L; increase the current variable to 1 if the resulting partial solution is feasible; otherwise
leave it equal to 0.
Let us apply this strategy to the instance:
max 3x1 + 10x2 + 3x3 + 7x4 + 6x5
subject to 2x1 + 6x2 + 5x3 + 8x4 + 3x5 ≤ 16
xi ∈ {0, 1} (i = 1, 2, . . . , 5).
For this instance, we successively obtain:
L = (x2, x4, x5, x1, x3)
x := (0, 0, 0, 0, 0)
x2 := 1; x := (0, 1, 0, 0, 0);
x4 := 1; x := (0, 1, 0, 1, 0);
x5 := 0; x := (0, 1, 0, 1, 0) (since (0, 1, 0, 1, 1) is not feasible!);
x1 := 1; x := (1, 1, 0, 1, 0);
x3 := 0; x := (1, 1, 0, 1, 0).
So, the algorithm returns the heuristic solution (1, 1, 0, 1, 0), with value 20.
An obvious shortcoming of Strategy 1 is that it does not take the value of the coefficients aj into
account when fixing the priority list. So, in the previous instance, variable x4 is given higher priority
than x5 when in fact, for a comparable utility, x5 adds much less weight to the knapsack than x4. This
observation leads to the next strategy.
Strategy 2.
1. Sort the variables by nonincreasing value of the ratios ci/ai: if ci/ai > cj/aj, then xi precedes xj in L.
2. Set x := (0, 0, . . . , 0).
3. Run through L; increase the current variable to 1 if the resulting partial solution is feasible; otherwise
leave it equal to 0.
Going back to the numerical instance, we now obtain L = (x5, x2, x1, x4, x3). The resulting heuristic
solution is (1, 1, 1, 0, 1), with value 22.
Interestingly, it can be proved that this strategy is equivalent to the following one (which combines
rounding with list-processing): solve the LP relaxation of the knapsack problem to obtain a fractional
solution x*, then sort the variables by nonincreasing value of the components x*i and continue as in Steps
2-3 of Strategy 2 (see e.g. Nemhauser and Wolsey (1988)).
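Both strategies can be sketched in a few lines of Python (an illustrative sketch; the function and variable names are ours, not taken from the course materials):

```python
def greedy_knapsack(c, a, b, priority):
    """List-processing heuristic for max c.x s.t. a.x <= b, x binary.
    `priority` is the list L, given as variable indices in decreasing priority."""
    x = [0] * len(c)
    weight = 0
    for j in priority:
        if weight + a[j] <= b:   # setting x_j := 1 keeps the solution feasible
            x[j] = 1
            weight += a[j]
    return x, sum(cj * xj for cj, xj in zip(c, x))

# Instance from the text; index j stands for variable x_{j+1}.
c = [3, 10, 3, 7, 6]
a = [2, 6, 5, 8, 3]
b = 16

L1 = sorted(range(5), key=lambda j: -c[j])           # Strategy 1: utilities c_j
L2 = sorted(range(5), key=lambda j: -c[j] / a[j])    # Strategy 2: ratios c_j / a_j
print(greedy_knapsack(c, a, b, L1))  # ([1, 1, 0, 1, 0], 20)
print(greedy_knapsack(c, a, b, L2))  # ([1, 1, 1, 0, 1], 22)
```

Note that Python's `sorted` is stable, so ties (here, c1 = c3 = 3) are broken by index, exactly as in the trace above.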
Example: The traveling salesman problem. The TSP can be viewed as a minimization problem
of the form (IS), where E is the set of edges of the underlying graph, and S is in I if and only if S is
a subset of edges which can be extended to a TSP tour. Assume now that the priority list L sorts the
edges by nondecreasing length. The resulting greedy heuristic is known in the literature as the shortest
edge heuristic.
Example: The maximum forest (MFT) problem. Let G = (V, E) be a non-oriented graph with
weight w(e) ≥ 0 on each edge e ∈ E. If S is any subset of edges of G, the weight of S is w(S) = Σ(e∈S) w(e).
A forest of G is a subset of edges of G which does not contain any cycle (i.e., closed path). The maximum
forest problem asks for a forest of G of maximum weight.
The greedy (list-processing) algorithm for this problem is:
Greedy MFT
1. Sort the edges of G by nonincreasing weight: if w(ei) > w(ej), then ei precedes ej in L.
2. Set T := ∅.
3. Run through L; if T ∪ {ei} is a forest (i.e., is cycle-free), then set T := T ∪ {ei}.
Let us look at the instance in Figure 3.3, with the following weights (we denote by w(1, 2) the weight
of edge {1, 2}, etc.): w(1, 2) = 10, w(3, 5) = 8, w(1, 3) = 7, w(2, 3) = 7, w(5, 6) = 6, w(3, 6) = 6,
w(2, 4) = 2, w(4, 5) = 2. Note that we have listed the weights by nonincreasing value. So, the Greedy
algorithm successively produces:
Figure 3.3: A graph with 6 vertices and 8 edges
T := ∅
T :={(1, 2)}
T :={(1, 2), (3, 5)}
T :={(1, 2), (3, 5), (1, 3)}
T :={(1, 2), (3, 5), (1, 3), (5, 6)}
T :={(1, 2), (3, 5), (1, 3), (5, 6), (2, 4)}.
The resulting forest has weight 33 and it is easy to check that this is the optimal solution for this instance
(although there are several alternative optimal forests).
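In Python, Greedy MFT can be implemented with a simple union-find structure to test whether adding an edge creates a cycle (an illustrative sketch; the data structure choice is ours, not prescribed by the text):

```python
def greedy_max_forest(n_vertices, weighted_edges):
    """Greedy algorithm for the maximum forest problem.
    `weighted_edges` is a list of (u, v, w) triples; vertices are 1..n_vertices."""
    parent = list(range(n_vertices + 1))

    def find(v):                         # root of the component containing v
        while parent[v] != v:
            v = parent[v]
        return v

    forest, total = [], 0
    # Step 1: sort the edges by nonincreasing weight (stable, so ties keep input order).
    for u, v, w in sorted(weighted_edges, key=lambda e: -e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:                     # adding {u, v} creates no cycle
            parent[ru] = rv              # merge the two components
            forest.append((u, v))
            total += w
    return forest, total

# Instance of Figure 3.3.
edges = [(1, 2, 10), (3, 5, 8), (1, 3, 7), (2, 3, 7),
         (5, 6, 6), (3, 6, 6), (2, 4, 2), (4, 5, 2)]
print(greedy_max_forest(6, edges))  # ([(1, 2), (3, 5), (1, 3), (5, 6), (2, 4)], 33)
```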
Actually, Proposition 3.3.1 hereunder shows that the Greedy algorithm is not only a heuristic, but
also an exact algorithm for the maximum forest problem. Together with some of its far-reaching
generalizations, this result plays a central role in combinatorial theory.
We first need a lemma. Recall that a tree is a connected forest.
Lemma 3.3.1. If G = (V, E) is a connected graph, then every maximal forest of G is a tree containing
|V| − 1 edges. More generally, if G has c connected components Gi = (Vi, Ei) (i = 1, 2, . . . , c), then every
maximal forest of G is the union of c trees and contains Σ(i=1..c) (|Vi| − 1) edges.
Proof. We leave the proof to the reader. QED
Proposition 3.3.1. The Greedy algorithm delivers an optimal solution for every instance of the Maxi-
mum forest problem.
Proof. Let T = {e1, e2, . . . , et} be the solution returned by the greedy algorithm and let S = {e'1, e'2, . . . , e't}
be an optimal solution, where ei precedes ei+1 and e'i precedes e'i+1 in L. We want to show by induction
that, for k = 1, 2, . . . , t, w(ek) ≥ w(e'k), which will imply that w(T) ≥ w(S).
For k = 1, we have w(e1) ≥ w(e'1) by definition of the Greedy algorithm.
Consider now an index k > 1. Suppose that w(ei) ≥ w(e'i) for 1 ≤ i < k and w(e'k) > w(ek). Note
that e'k precedes ek in L.
Consider the edge-set R = {e ∈ E | w(e) ≥ w(e'k)} and the forests F = {e1, e2, . . . , ek−1} and
H = {e'1, e'2, . . . , e'k}. We claim that F is a maximal forest in R, i.e. every edge of R \ F creates a cycle
in F: indeed, if e ∈ R \ F, then w(e) ≥ w(e'k) > w(ek) and the greedy algorithm should have chosen e
rather than ek.
Since |F| = k − 1 < |H|, we conclude that the graph (V, R) contains two maximal forests of different
cardinalities, contradicting Lemma 3.3.1. QED
Beyond their application to CO problems of the form (IS), list-processing algorithms can be extended
to handle associated partitioning problems like
min m
subject to S1 ∪ S2 ∪ . . . ∪ Sm = E, (PART)
Si ∈ I (i = 1, 2, . . . , m).
Thus, problem (PART) asks to partition E into a smallest number of sets in I. This problem can
be attacked by solving a sequence of optimization subproblems over (IS), with F(S) = |S|: try first
to determine a large set S1 ∈ I, then remove from E all elements of S1, repeat the process in order to
determine S2, and so on. If each step is solved by a list-processing algorithm, then the resulting procedure
is also called a list-processing algorithm for (PART); see Exercises 3 and 4 hereunder.
Additional examples of list-processing algorithms can be found, for instance, in Crama (2002).
3.3.1 Exercises.
Exercise 1. Apply the shortest edge heuristic to the TSP instance given in Section 2.2.3. Compare the
length of this tour with the lower bounds computed in Section 2.2.4.
Exercise 2. Prove Lemma 3.3.1.
Exercise 3. Let G = (V, E) be a graph. A subset of vertices S ⊆ V is called stable (or independent)
in G if it contains no edges, that is if the following condition holds: for all u, v ∈ S, {u, v} ∉ E. The
maximum stable set problem consists in finding a stable set of maximum size in a given graph. Provide
a greedy heuristic for this problem.
Exercise 4. Show that the graph coloring problem (Section 2.1.6) and the bin packing problem (Section
2.2.4) are partitioning problems of the form (PART). Develop a greedy heuristic for each of these problems.
3.4 Neighborhoods and neighbors
In this and the following sections, we concentrate on local search procedures. A common feature of all
local search procedures is that they exploit the neighborhood concept (see Figure 3.1).
Definition 3.4.1. A neighborhood structure for the set X is a collection of subsets N(x) ⊆ X, one for
each x ∈ X. We call N(x) the neighborhood of solution x, and we say that every element in N(x) is a
neighbor of x.
The neighborhood concept is naturally linked to the concept of local optimality.
Definition 3.4.2. A solution x* ∈ X is a local minimum of CO with respect to the neighborhood structure
N(x) if N(x*) does not contain any solution better than x*, i.e. if F(x*) ≤ F(x) for all x ∈ N(x*).
Note for further reference that this definition does not only depend on the problem at hand (i.e.,
on X and F) but also on the neighborhood structure which has been adopted. However, when the
neighborhood structure is clear from the context and when no confusion can arise, it is common practice
to omit the qualifier with respect to the neighborhood structure N(x).
There are in general very many ways to define a neighborhood structure for a particular CO problem.
Although all possible definitions are not necessarily equally good from the point of view of local search
performance, it is often difficult to decide ex ante which ones will perform best. Some experimentation
and some experience will usually be necessary in order to make the best choice of neighborhoods.
3.4.1 Some examples
1. In the 0-1 linear programming problem, we have
X = {x : Ax ≤ b, xj ∈ {0, 1}, j = 1, . . . , n}.
For a solution x ∈ X, we can for instance define
N1(x) = {y ∈ X : x and y differ by at most one component}.
Note that, intuitively speaking, the number of components on which two binary vectors x and y differ
provides a measure of the distance between x and y (sometimes called the Hamming distance between x
and y). In some applications of local search, we may actually prefer to use the neighborhood structure
Nk(x) = {y ∈ X : x and y differ by at most k components}
where k may take any of the values 1, 2, 3, . . .
2. In the traveling salesman problem, a solution can be viewed as a permutation of the vertices. E.g., the
permutation π = (2, 3, 6, 5, 1, 4) represents the tour which visits vertices 2, 3, 6, 5, 1 and 4 in that order.
Note that every permutation of the vertices corresponds to a feasible tour.
Then, a neighborhood structure could be, for instance:
N(π) = {π' | permutation π' results from permutation π by transposition of two vertices}.
With this definition, permutations (3, 2, 6, 5, 1, 4), (6, 3, 2, 5, 1, 4), (2, 1, 6, 5, 3, 4), (4, 3, 6, 5, 1, 2), etc., are
neighbors of (2, 3, 6, 5, 1, 4).
An alternative, slightly more subtle neighborhood structure arises if we look at tours as lists of edges,
rather than lists of vertices (this is of course conceptually equivalent, but experience shows that different
representations of solutions may sometimes lead to very different algorithmic developments). Consider
now a tour C as represented in Figure 3.4, where i, j, k, l are four distinct vertices and the edges {l, j}
and {i, k} are not in the tour. Then, a neighbor C' of C can be obtained by removing the edges {l, k}
and {i, j} from C and by adding the edges {l, j} and {i, k} to it. The operation that transforms C into
C' is called a 2-exchange. The 2-exchange neighborhood structure is naturally defined by
N(C) = {C' | C' results from C by a 2-exchange}.
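With the permutation representation, a 2-exchange amounts to reversing a contiguous segment of the tour (this is the classical 2-opt move; the sketch below and its function names are ours):

```python
def two_exchange(tour, i, k):
    """Reverse the segment tour[i..k] (0-based, 0 < i <= k < len(tour)).
    For the cyclic tour this removes the edges {tour[i-1], tour[i]} and
    {tour[k], tour[(k+1) % n]} and replaces them by {tour[i-1], tour[k]}
    and {tour[i], tour[(k+1) % n]}."""
    return tour[:i] + tour[i:k + 1][::-1] + tour[k + 1:]

def two_exchange_neighborhood(tour):
    """All tours obtained from `tour` by one 2-exchange."""
    n = len(tour)
    return [two_exchange(tour, i, k)
            for i in range(1, n - 1) for k in range(i + 1, n)]

print(two_exchange([2, 3, 6, 5, 1, 4], 1, 3))  # [2, 5, 6, 3, 1, 4]
```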
3. With an instance G = (V, E) of the graph equipartitioning problem, we associate the feasible set of all
equipartitions of V, i.e.
X = {(V1, V2) : V1 ∪ V2 = V, V1 ∩ V2 = ∅, |V1| = |V2|}.
A possible neighborhood structure for this problem is defined as
N(V1, V2) = {(V1', V2') : V1' = V1 ∪ {v} \ {u}, V2' = V2 ∪ {u} \ {v} for some pair of nodes u ∈ V1, v ∈ V2}.
Figure 3.4: 2-exchange neighborhood concept for the traveling salesman problem
Figure 3.5: Neighborhood concept for the graph equipartitioning problem
We have imposed N(x) ⊆ X for all x. When X is small (i.e., for heavily constrained problems), it is
sometimes difficult, or overly restrictive, to define neighborhoods that obey this condition. For instance,
when partitioning a graph, it may be natural to consider the alternative neighborhood structure
N(V1, V2) = {(V1', V2') : V1' = V1 ∪ {v}, V2' = V2 \ {v} for some node v ∈ V2}.
In this case, a problem occurs as the feasibility condition |V1'| = |V2'| does not hold, that is (V1', V2') ∉ X.
One way around this difficulty is to reformulate the original CO problem into an equivalent problem that
admits more feasible solutions (i.e., to extend X) and to penalize all solutions that are not in X.
For example, for any partition (V1, V2) (not necessarily into equal parts) of the vertex set V, define
e(V1, V2) to be the number of edges from V1 to V2. Then, the graph equipartitioning problem
minimize e(V1, V2)
subject to V1 ∩ V2 = ∅, V1 ∪ V2 = V, |V1| = |V2|
has the same optimal solutions as the following one:
minimize h(V1, V2) = e(V1, V2) + M (|V1| − |V2|)^2
subject to V1 ∩ V2 = ∅, V1 ∪ V2 = V
where M is a very large number (penalty). Such a problem reformulation allows us to enlarge the feasible
set, hence to move more freely within this set and to find more easily an initial feasible solution x ∈ X.
(A similar reformulation is used in the big-M method of linear programming.)
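The penalized objective h is easy to sketch in Python (our own illustration; M = 1000 is an arbitrary choice that merely has to dominate the largest possible cut value):

```python
def h(V1, V2, edges, M=1000):
    """Penalized objective h(V1, V2) = e(V1, V2) + M * (|V1| - |V2|)**2,
    where e(V1, V2) counts the edges with one endpoint in each part."""
    V1, V2 = set(V1), set(V2)
    e = sum(1 for u, v in edges if (u in V1) != (v in V1))   # cut size e(V1, V2)
    return e + M * (len(V1) - len(V2)) ** 2

edges = [(1, 2), (2, 3), (3, 4), (4, 1)]       # a 4-cycle
print(h([1, 2], [3, 4], edges))   # 2: feasible equipartition, cut of size 2
print(h([1], [2, 3, 4], edges))   # 4002: unbalanced partition, heavily penalized
```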
3.4.2 Exercises.
Exercise 1. For each of the neighborhood structures defined in Section 3.4.1, estimate the size of the
neighborhood of a feasible solution as a function of the size of the instance (number of variables, number
of vertices, etc.).
Exercise 2. Consider problem (IS) in Section 3.3. Show that a list heuristic is obtained by applying the
local search principle to (IS) with the following neighborhood structure
N(T) = {S ∈ I | T ⊆ S and |S| = |T| + 1}
(i.e., S results from T by adding one element to it).
3.5 Steepest descent
The steepest descent metaheuristic is one of the most natural local search heuristics: it simply recommends
to keep moving from the current solution to the best solution in its neighborhood, until no further
improvement can be found.
A more formal description of the algorithm is given in Figure 3.6. We assume here that a particular
neighborhood structure has been selected. For k = 1, 2, . . ., we denote by xk the current solution at
iteration k. We denote by x* the best available solution and by F* the best available function value:
that is, F* = F(x*).
1. Select x1 ∈ X, set F* := F(x1), x* := x1 and k := 1.
2. Repeat:
find the best solution x in N(xk): F(x) = min{F(y) : y ∈ N(xk)};
if F(x) < F(xk) then xk+1 := x, F* := F(x), x* := x and k := k + 1
else return x*, F* and stop.
Figure 3.6: The steepest descent metaheuristic
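The metaheuristic translates almost literally into Python (an illustrative sketch; `neighbors(x)` must return the finite set N(x), and the toy instance below is ours):

```python
def steepest_descent(x1, F, neighbors):
    """Move to the best neighbor as long as it strictly improves F."""
    x, Fx = x1, F(x1)
    while True:
        N = neighbors(x)
        if not N:
            return x, Fx
        y = min(N, key=F)        # best solution in N(x)
        if F(y) >= Fx:           # no improving neighbor: x is a local minimum
            return x, Fx
        x, Fx = y, F(y)

# Toy illustration: minimize (x - 3)^2 over the integers, with N(x) = {x - 1, x + 1}.
print(steepest_descent(10, lambda x: (x - 3) ** 2, lambda x: [x - 1, x + 1]))  # (3, 0)
```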
Note that steepest descent really is a metaheuristic, not an algorithm. In particular, it cannot be
applied directly to any particular CO problem until the initialization procedure has been described or,
more fundamentally, until the neighborhood structure has been specified for this problem.
Note also that, when dealing with maximization (rather than minimization) problems, we speak of
steepest ascent rather than steepest descent.
We now proceed with a number of further comments on this framework.
3.5.1 Initialization
How should we select x1 ? Intuitively, it seems preferable to start from a good solution, such as a
solution selected by a list-processing heuristic. Experiments show, however, that this is not necessarily
the case and that starting from a random solution may sometimes be a good idea. The influence of the
initial solution may be reduced if we execute the algorithm several times with different initial solutions.
3.5.2 Local minima
By definition, steepest descent heuristics terminate with a local optimum of CO which is not necessarily
a global optimum.
For example, consider the following instance of the knapsack problem:
max 2x1 − 3x2 + x3 + 4x4 − 2x5
subject to 2x1 − 3x2 + 2x3 + 3x4 − x5 ≤ 2
xi ∈ {0, 1} for i = 1, 2, . . . , 5
and consider the neighborhood structure defined by N1(x) = {y ∈ X : Σ(i=1..5) |xi − yi| ≤ 1}.
Suppose that the initial solution is x1 = (0, 0, 0, 0, 0) and F* = 0. Then, steepest ascent sets x2 =
(1, 0, 0, 0, 0), F* = 2 and stops.
Suppose now that we start with another initial solution, say x1 = (0, 1, 0, 0, 1) and F* = −5. Then,
we successively get x2 = (0, 1, 0, 1, 1), F* = −1, and next x3 = (0, 0, 0, 1, 1), F* = 2. The algorithm stops
with the local maximum x* = x3.
So, in both cases, we have only found local maxima, whereas the global maximum is x* = (1, 1, 0, 1, 0),
with F(x*) = 3.
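The two runs above can be replayed in Python (our own sketch and naming; the neighborhood used is the Hamming-distance-1 neighborhood N1):

```python
def steepest_ascent_knapsack(x, c, a, b):
    """Steepest ascent for max c.x s.t. a.x <= b, x binary, with neighborhood N1."""
    value = lambda y: sum(ci * yi for ci, yi in zip(c, y))
    weight = lambda y: sum(ai * yi for ai, yi in zip(a, y))
    while True:
        # All solutions at Hamming distance exactly 1 from x.
        N1 = [x[:i] + (1 - x[i],) + x[i + 1:] for i in range(len(x))]
        feasible = [y for y in N1 if weight(y) <= b]
        if not feasible or value(max(feasible, key=value)) <= value(x):
            return x, value(x)            # local maximum reached
        x = max(feasible, key=value)

c, a, b = (2, -3, 1, 4, -2), (2, -3, 2, 3, -1), 2
print(steepest_ascent_knapsack((0, 0, 0, 0, 0), c, a, b))  # ((1, 0, 0, 0, 0), 2)
print(steepest_ascent_knapsack((0, 1, 0, 0, 1), c, a, b))  # ((0, 0, 0, 1, 1), 2)
```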
3.5.3 Choice of neighborhood structure
A further observation (closely related to the previous one) is that, when N(x) is too small, the risk of
missing the global optimum is high. But conversely, when N(x) is large, the heuristic may spend a lot
of time exploring the neighborhood of the current solution in order to determine the best neighbor x.
This is another manifestation of the quality vs. time trade-off already mentioned in Section 2.2.3.
This is illustrated (although caricaturally) by considering two extreme cases:
if N(x) = {x} for all x ∈ X (a very small neighborhood, indeed), then the algorithm stops at the first
iteration and simply returns the initial solution;
at the other extreme, if N(x) = X for all x ∈ X, then the best neighbor x is the global optimum of the
problem (which, of course, may be very hard to find).
More interestingly, this brief discussion points to the fact that the subproblem min{F(x) : x ∈ N(xk)}
which is to be solved at every iteration of steepest descent is fundamentally a problem of the same nature
as CO itself, but over a restricted region of the search space. In many cases, this subproblem will be
solved by exhaustive search, i.e., by complete enumeration of all solutions in N(xk). This observation may
guide the choice of an appropriate neighborhood structure.
3.5.4 Selection of neighbor
Some variants of the algorithm do not completely explore the neighborhood of the current solution xk in
order to find x, but rather select, for instance, the first solution x such that F(x) < F(xk) found during
the exploration phase, or the best solution among the first ten candidates, etc. (This is akin to the partial
pricing strategy used in certain implementations of the simplex method for linear programming.)
3.5.5 Fast computation of the objective function
When exploring the neighborhood N(xk) of the current solution, it is sometimes possible to improve
efficiency by avoiding recomputing F(x) from scratch for all x ∈ N(xk), and by making use of the
information that is already available about the value of F(xk).
For example, assume (as in a knapsack problem) that F(xk) = Σ(j=1..n) cj xkj and that x ∈ N1(xk) differs
from xk only on the 5th component. How should we compute F(x) in this case? Brute force computation
of the expression F(x) = Σ(j=1..n) cj xj requires n multiplications and n − 1 additions. By contrast, only 2
multiplications and 2 additions are required if we notice that F(x) = F(xk) − c5 xk5 + c5 x5. Similarly, if
X = {x | Σ(j=1..n) aj xj ≤ b}, then we can check whether x ∈ X by storing the value Σ(j=1..n) aj xkj and simply
checking whether Σ(j=1..n) aj xkj − a5 xk5 + a5 x5 ≤ b.
Let us consider another example. For the traveling salesman problem, let C be a feasible tour (set of
edges) with length L(C). After the 2-exchange displayed in Figure 3.7, we obtain a tour C' with length
L(C') = L(C) − dij − dkl + dik + djl,
and the computation of L(C') only requires 4 additions if L(C) is available.
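Both incremental updates are one-liners (a sketch; the function names are ours):

```python
def flip_delta(c, x, j):
    """Change in F(x) = sum_j c_j x_j when component j of the binary vector x
    is flipped: F(y) = F(x) + flip_delta(c, x, j), computed in O(1)."""
    return c[j] * ((1 - x[j]) - x[j])

def two_exchange_delta(d, i, j, k, l):
    """L(C') - L(C) for a 2-exchange replacing edges {i, j} and {k, l}
    by {i, k} and {j, l}; d is the symmetric distance matrix."""
    return d[i][k] + d[j][l] - d[i][j] - d[k][l]
```

Evaluating a neighbor from scratch costs O(n) per neighbor; the delta costs O(1), which matters when thousands of neighbors are screened at every iteration.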
3.5.6 Flat objective functions
Consider the graph coloring (or chromatic number) problem. Here, X is the set of all feasible colorings
and F(x) is the number of colors used in coloring x. We can define the neighborhood of coloring x as
consisting of all the colorings which can be obtained by changing the color of at most one vertex in x.
Suppose for instance that Figure 3.8 represents an arbitrary coloring. The color of each vertex
is indicated next to it, by a number between brackets (this is the coloring provided by the smallest
available color heuristic when the vertices are explored in the order v1, v2, . . . , v10). In this case, no
neighbor improves the initial solution. Intuitively, the objective function is flat in the neighborhood
Figure 3.7: 2-exchange
Figure 3.8: A feasible coloring with 4 colors (vertex colors: v1(1), v2(2), v3(2), v4(1), v5(3), v6(2), v7(1), v8(3), v9(4), v10(4))
of the current solution (all neighbors have the same objective function value) and it is difficult to find
descent directions (see Figure 3.9).
A possible remedy to this difficulty is to modify both F(x) and X in the definition of the problem! For instance, let us select a tentative number of colors C (for example, C = 3) and define
XC = {colorings of V using the colors {1, 2, . . . , C}},
where the colorings are not necessarily required to be feasible (cf. Section 2.1.6), and let
F'(x) = number of monochromatic edges induced by coloring x.
So, in Figure 3.10, F'(x) = 4.
Of course, the graph can be colored with C colors if and only if min{F'(x) | x ∈ XC} = 0. In other
words, the chromatic number of a graph is the smallest value of C for which min{F'(x) | x ∈ XC} = 0.
So, the original graph coloring problem can be transformed into a sequence of problems of the form
(XC, F'), for decreasing values of C.
Note now that the objective function F'(x) is not flat, as opposed to F(x). For instance, changing the
color of v4 from 1 to 3 yields F'(x) = 2. Next, changing the color of v8 from 1 to 3 leads to F'(x) = 0,
meaning that the graph is feasibly colored with 3 colors.
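Evaluating F' is a one-liner (a sketch; the triangle instance below is our own toy example, not the graph of Figures 3.8 and 3.10):

```python
def monochromatic_edges(coloring, edges):
    """F'(x): number of edges {u, v} with coloring[u] == coloring[v]."""
    return sum(1 for u, v in edges if coloring[u] == coloring[v])

triangle = [(1, 2), (2, 3), (1, 3)]
print(monochromatic_edges({1: 1, 2: 1, 3: 2}, triangle))  # 1 (edge {1, 2} is monochromatic)
print(monochromatic_edges({1: 1, 2: 2, 3: 3}, triangle))  # 0: a feasible coloring
```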
3.5.7 Exercises.
Exercise 1. Explain why the simplex algorithm for linear programming can be called a steepest descent
method.
Exercise 2. Show that changing the color of v9 from 1 to 3 in Figure 3.10 leads to a local optimum of
F'(x).
3.6 Simulated annealing
The major weakness of steepest descent algorithms is that they tend to stop too early, i.e. they get
trapped in local optima of poor quality. How can we avoid this weakness?
A possible solution is to run the algorithm repeatedly from multiple initial solutions. This multistart
strategy may work well in some cases, but other, more complex approaches have proved to be much more
powerful for large, difficult instances of CO problems.
In this section, we want to explore the following ideas.
Figure 3.9: Flat objective function
Figure 3.10: An infeasible coloring with 3 colors (vertex colors: v1(1), v2(2), v3(2), v4(1), v5(1), v6(2), v7(1), v8(1), v9(1), v10(3))
Idea # 1. In order to escape local minima, it may be useful to take steps which deteriorate the objective
function, at least once in a while. One way to achieve this goal may be to replace xk by a neighbor xk+1
chosen randomly in N(xk). This idea has proved to be especially useful when combined with the next
ingredient.
Idea # 2. Select a good neighbor with higher probability than a bad one.
Taken together, these two ideas result in the very popular simulated annealing algorithm. Various
aspects of the implementation of SA algorithms are discussed at length, for instance, in two papers by
Johnson, Aragon, McGeoch and Schevon (1989, 1991) or in Pirlot (1992). We only provide here some
basic elements of information and we refer to these papers for additional details.
3.6.1 The simulated annealing metaheuristic
The generic framework of the simulated annealing metaheuristic is shown in Figure 3.11. We suppose
again that a particular neighborhood structure has been selected and we use the same notations x*, F*, xk
as in the steepest descent heuristic. Moreover, we assume that for k = 1, 2, . . ., a number 0 < pk < 1
(called the transition probability) has been specified.
1. Select x1 ∈ X, set F* := F(x1), x* := x1 and k := 1.
2. Repeat:
Choose x randomly in N(xk) (Propose a move).
If F(x) < F(xk) then AcceptMove(x) else Toss(xk, x).
Evaluate the stopping conditions.
If Terminate = True then return x*, F* and stop, else continue.
Procedure AcceptMove(x)
xk+1 := x (Accept the move).
if F(x) < F* then F* := F(x), x* := x.
Procedure Toss(xk, x)
let xk+1 := x with probability equal to pk (Accept the move)
else, let xk+1 := xk (Reject the move).
Procedure Stopping conditions
if the stopping conditions are satisfied then Terminate := True
else k := k + 1 and Terminate := False.
Figure 3.11: The simulated annealing metaheuristic
In the implementation of Figure 3.15, pk = e^(−ΔF/T), where the temperature T follows a geometric
cooling schedule whereby the temperature decreases by a constant factor α (the cooling factor) after a
constant number L of iterations. The iterations performed at constant temperature constitute a plateau
(see Figure 3.14).
3.6.3 Stopping criteria
Note that, contrary to local search, simulated annealing may perform an infinite number of iterations ifwe do not impose some limitation on its running time. So, when should we terminate the process ?
A common criterion is to stop when a large number of iterations has been performed without any
improvement in the objective function and when the process seems to be stalling. One way to implement
this idea requires to select two positive numbers, say 2 and K2 (for example, 2 = 2 and K2 = 5). The
Figure 3.12: Fixing p(k)
Figure 3.13: Fixing p(k) II
Figure 3.14: A geometric cooling schedule (the temperature Tk, plotted against the iteration count k, is constant on plateaus of L iterations and takes the successive values T0, αT0, α^2 T0, α^3 T0, . . .)
1. Select x1 ∈ X, set F* := F(x1), x* := x1, k := 1 and T := T0.
2. Repeat:
Choose x randomly in N(xk) (Propose a move).
If F(x) < F(xk) then AcceptMove(x) else Toss(xk, x).
Evaluate the stopping conditions.
If Terminate = True then return x*, F* and stop, else continue.
Procedure AcceptMove(x)
xk+1 := x (Accept the move).
if F(x) < F* then F* := F(x), x* := x.
Procedure Toss(xk, x)
compute ΔF := F(x) − F(xk) and pk := e^(−ΔF/T) (transition probability)
draw a number u, randomly and uniformly distributed in [0, 1]
if u ≤ pk then xk+1 := x (Accept the move)
else xk+1 := xk (Reject the move).
Procedure Stopping conditions
if the number of iterations since the last decrease of temperature is less than L
then k := k + 1 and Terminate := False (Continue with the same plateau)
else
if no improvement of F* has been recorded and if fewer than ε2 % of the moves have been
accepted during the last K2 temperature plateaus
then Terminate := True
else T := αT (decrease T), k := k + 1 and Terminate := False. (Proceed to the next plateau)
Figure 3.15: Implementing the simulated annealing metaheuristic
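In Python, the skeleton of Figure 3.15 might look as follows (a sketch: the parameter values T0 = 10, α = 0.9, L = 100, the fixed plateau budget replacing the full stopping test, and the toy instance are illustrative choices of ours):

```python
import math
import random

def simulated_annealing(x1, F, random_neighbor,
                        T0=10.0, alpha=0.9, L=100, n_plateaus=50):
    """Simulated annealing with a geometric cooling schedule."""
    x, x_best, F_best = x1, x1, F(x1)
    T = T0
    for _ in range(n_plateaus):        # one pass of the outer loop = one plateau
        for _ in range(L):             # L iterations at constant temperature T
            y = random_neighbor(x)
            delta = F(y) - F(x)
            # Improving moves are always accepted; deteriorating moves are
            # accepted with the transition probability e^(-delta/T).
            if delta < 0 or random.random() <= math.exp(-delta / T):
                x = y
                if F(x) < F_best:
                    x_best, F_best = x, F(x)
        T *= alpha                     # proceed to the next plateau
    return x_best, F_best

# Toy illustration: minimize (x - 3)^2 over the integers, N(x) = {x - 1, x + 1}.
random.seed(1)
print(simulated_annealing(30, lambda x: (x - 3) ** 2,
                          lambda x: x + random.choice((-1, 1))))
```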
A potential problem is that the choice x1 = 1 is maybe very bad and, during each iteration, the
probability of reversing this choice is only 1/n. To solve this, here is a possible remedy: at the beginning of
steps 1, n + 1, 2n + 1, . . ., generate a random permutation of the indices 1, . . . , n; for example, (5, 3, 6, 2, 1, . . .).
During the next n iterations, generate the neighbors obtained by modifying each coordinate in the order
defined by the permutation. In other words:
step 2: x^1_5 becomes 1 − x^1_5 = x^2_5
step 3: x^2_3 becomes 1 − x^2_3 = x^3_3
step 4: x^3_6 becomes 1 − x^3_6 = x^4_6
step 5: x^4_2 becomes 1 − x^4_2 = x^5_2
or
(1, 0, 0, 1, 0, 1, . . .)
(1, 0, 0, 1, 1, 1, . . .)
(1, 0, 1, 1, 1, 1, . . .)
(1, 0, 1, 1, 1, 0, . . .)
(1, 1, 1, 1, 1, 0, . . .)
Thus, after n iterations, each coordinate has had the opportunity to be modified at least once (subject to
the acceptance of the move).
Approximate exponentiation
The computation time of e^(−ΔF/T) is quite high. A non-negligible speedup can be obtained if we replace this
expression by its approximation 1 − ΔF/T (25 times faster for comparable quality; see Oliveira, Ferreira,
and Vidal (1993) for details).
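The quality of this first-order approximation is easy to check numerically (a small illustration of ours; note that 1 − ΔF/T becomes negative when ΔF > T, in which case the move is simply always rejected):

```python
import math

for r in (0.01, 0.1, 0.5, 1.0):        # r stands for delta_F / T
    exact, approx = math.exp(-r), 1.0 - r
    print(f"r = {r:4}: exp(-r) = {exact:.4f}, 1 - r = {approx:.4f}")
```

The two acceptance probabilities are close for small ΔF/T, i.e. precisely in the regime where most deteriorating moves are proposed near the end of the run.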
Once again, we refer the reader to Aarts and Lenstra (1997), Johnson, Aragon, McGeoch and Schevon
(1989, 1991), Pirlot (1992) and to other references in the bibliography for more information on simulated
annealing algorithms.
3.7 Tabu search
(To be revised and completed...)
3.7.1 Introduction
Idea: at each iteration, choose a neighbor x of xk that minimizes F(x) in N(xk).1
Consider the following example:
max x1 + 10x2 + 3x3 + 7x4 + 6x5
subject to 2x1 + 6x2 + 5x3 + 8x4 + 3x5 ≤ 16
xj ∈ {0, 1}, j = 1, . . . , 5
The neighbors are solutions within a Hamming distance of 1. Let x0 = (0, 0, 0, 0, 0) be the initial
solution. Thus we might have
x0 = (0, 0, 0, 0, 0)
x1 = (0, 1, 0, 0, 0)
x2 = (0, 1, 0, 1, 0)
x3 = (1, 1, 0, 1, 0)
x4 = (0, 1, 0, 1, 0)
Here, x4 = x2, which illustrates a danger of this method: the cycling problem. Now, suppose that coming
back to the last explored solution is forbidden. We could then have
x4 = (1, 1, 0, 0, 0) (x2 is tabu)
x5 = (1, 1, 0, 0, 1) (x3 is tabu)
x6 = (1, 1, 1, 0, 1) (optimal solution)
Note that the interested reader will find a generic description of this method in Section 4.1 of Pirlot
(1992).
3.7.2 The algorithm
Initialization: select x1 ∈ X; F* := F(x1); x* := x1 and the tabu list TL := ∅.
Step k (with k = 1, 2, . . .):
Choose the best neighbor x of xk that is not tabu:
F(x) = min{F(y) : y ∈ N(xk), y ∉ TL}.
1This is the steepest descent mildest ascent method. See Hansen and Jaumard (1990) for more about this topic.
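A minimal tabu search for the example of Section 3.7.1 can be sketched as follows (our own sketch: the short-term tabu list stores recently visited solutions, and the tabu tenure 3 and iteration budget 20 are illustrative choices):

```python
from collections import deque

def tabu_search(x1, F, neighbors, n_iter=20, tenure=3):
    """Always move to the best non-tabu neighbor, even if it worsens F;
    the last `tenure` visited solutions are tabu, which prevents short cycles."""
    x, x_best = x1, x1
    tabu = deque([x1], maxlen=tenure)
    for _ in range(n_iter):
        candidates = [y for y in neighbors(x) if y not in tabu]
        if not candidates:
            break
        x = max(candidates, key=F)       # best move, improving or not
        tabu.append(x)
        if F(x) > F(x_best):
            x_best = x
    return x_best, F(x_best)

# Instance of Section 3.7.1 (maximization), Hamming-distance-1 neighborhood.
c, a, b = (1, 10, 3, 7, 6), (2, 6, 5, 8, 3), 16
F = lambda x: sum(ci * xi for ci, xi in zip(c, x))
def neighbors(x):
    flips = [x[:i] + (1 - x[i],) + x[i + 1:] for i in range(5)]
    return [y for y in flips if sum(ai * yi for ai, yi in zip(a, y)) <= b]

print(tabu_search((0, 0, 0, 0, 0), F, neighbors))  # ((1, 1, 1, 0, 1), 20)
```

With an empty tabu list this procedure would oscillate exactly as in the trace above; the short memory is what lets it escape toward the optimal solution (1, 1, 1, 0, 1) of value 20.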
[Figure 3.18: Tabu list]
x1: F(x1) = 5.
The best move is to change vertex (1) into V (or vertex (3) into V, or vertex (4) into V). That way, F(x2) = 3 and TL = {(1, B)}.
Then, the best move is to change vertex (3) into V. That way, F(x3) = 1 and TL = {(1, B), (3, B)}.
All the moves increase the F function. Choose for instance to change vertex (4) into V: F(x4) = 3 and TL = {(1, B), (3, B), (4, B)}.
The best move is to change vertex (1) into B, which is tabu, but we accept it since it satisfies the aspiration criterion. Thus, F(x5) = 1 and TL = {(1, B), (3, B), (4, B), (1, V)}.
...
[Figure 3.19: Chromatic number - Tabu search]
3.8 Genetic algorithms
3.8.1 Introduction
Steepest descent, simulated annealing and tabu search are designed to improve an initial solution by exploring solutions that are close to it. This approach is sometimes called an intensification strategy, since it intensifies the search in the vicinity of the current solution.
A major drawback of such strategies is that they cannot easily reach areas that are very distant from the initial solution; that is, they cannot diversify the exploration of the feasible set X.
A possible remedy to this drawback is to apply the algorithm a large number of times from many different initial solutions (multistart, or sampling strategy). But here again, several problems occur: first, a large number of runs (1000, or even 10000) can still be quite small as compared to the size of the space to explore. Second, it is hard to ensure that the sample of initial solutions faithfully represents the set X.2
Genetic algorithms (GA) offer a specific, quite powerful approach to the diversification issue (see e.g. Goldberg (1989)). In fact, they alternate diversification and intensification phases. At each iteration, they produce a population (i.e., a subset of solutions): at step k, the population is denoted X(k) = {x1(k), x2(k), ..., xN(k)} ⊆ X.
3.8.2 Diversification via crossover
Consider a pair of solutions x and y (to be called parents) in the current population. We can combine
these solutions to produce one or two new solutions (called children) u and v that share some features
of both x and y. The operator that associates a child (or two children) to a pair of parents is called
crossover.
Intuitively (and just as in real life), the children obtained by crossover should look like their parents,
but should also introduce some diversity in the current population.
Suppose for example that x and y are binary vectors:
x = (11010011)
y = (01100101)
A possible crossover operator produces the single child u, where ui = xi with probability 0.5 and ui = yi with probability 0.5. For our example, this operator could produce the child
2 This is a general problem with sampling methods.
8/11/2019 AOR Syllabus20132014
52/115
3.8. GENETIC ALGORITHMS 47
u = (11100111).
Note that the second, fifth and eighth elements (underlined) of u are predetermined, since they are common to x and y.
Another crossover method works by randomly choosing an index i, splitting x and y at coordinate i and exchanging the initial segments of x and y. For instance, with i = 4 in the previous example, we produce two children:
u = (1101|0101)
v = (0110|0011).
The crossover operators defined above are uniform operators, meaning: if z = (z1, ..., zn) is a child of x = (x1, ..., xn) and y = (y1, ..., yn), then either zi = xi or zi = yi for every i. Note that nonuniform crossovers are also frequently used in the literature (see Mühlenbein (1997) for details).
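Both operators can be sketched in a few lines of Python (our own illustration, using the parents of the example above):

```python
import random

def uniform_crossover(x, y, rng=random):
    """Copy each gene from one parent with probability 0.5; positions
    where the parents agree are necessarily inherited unchanged."""
    return tuple(xi if rng.random() < 0.5 else yi for xi, yi in zip(x, y))

def one_point_crossover(x, y, i):
    """Split both parents after coordinate i and exchange initial segments."""
    return x[:i] + y[i:], y[:i] + x[i:]

x = (1, 1, 0, 1, 0, 0, 1, 1)         # x = (11010011)
y = (0, 1, 1, 0, 0, 1, 0, 1)         # y = (01100101)
u, v = one_point_crossover(x, y, 4)  # u = (1101|0101), v = (0110|0011)
```

Note that whatever random choices are made, any child produced by `uniform_crossover` keeps the values at the positions where the parents coincide (here the second, fifth and eighth coordinates).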
Ideally, the new individuals created by crossover should inherit desirable features from their parents:
we would like to produce good children from good parents. This goal can be achieved by combining
the following elements:
When picking a pair of parents to mate, good parents should be selected with a higher probability than bad ones. For instance, x and y could be drawn in X(k) with probability equal to

Prob(x) = [Fmax - F(x)] / Σ_{j=1}^{N} [Fmax - F(xj)]    (3.1)

where Fmax = max{F(xj) : j = 1, ..., N}. See Table 3.1 for an example.
Common features of the parents (those that are expected to be typical of good solutions), or at least some of those features, should be preserved when producing children (see later). As an example, consider the traveling salesman problem. If the salesman is to visit every European capital then, in a reasonable tour, Helsinki and Madrid will never be visited successively (nor will London and Athens). This feature should be preserved when crossing two reasonable parents.
3.8.3 A basic genetic algorithm
We are now ready to describe a primitive genetic algorithm for the combinatorial optimization problem min{F(x) : x ∈ X}. The algorithm depends on the choice of a crossover operator, and on the choice of a probability distribution Prob(·) defined on every finite subset of X. Let us assume that the following parameters have also been selected: N (the population size) and M (the number of children produced in each generation), with M <= N. Then, the basic genetic metaheuristic is presented in Figure 3.20.
X(k) F(x) Prob
x1 F(x1) = 15 0
x2 F(x2) = 12 3/13
x3 F(x3) = 10 5/13
x4 F(x4) = 10 5/13
Table 3.1: Genetic algorithms: selecting good parents
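As a small check (our own Python sketch), the probabilities of Table 3.1 can be recomputed directly from equation (3.1):

```python
def selection_probabilities(F_values):
    """Equation (3.1): Prob(xj) = (Fmax - F(xj)) / sum_i (Fmax - F(xi)),
    for a minimization problem: lower F gets a higher selection probability."""
    F_max = max(F_values)
    weights = [F_max - F for F in F_values]
    total = sum(weights)
    return [w / total for w in weights]

# Reproduces Table 3.1: F = (15, 12, 10, 10) gives (0, 3/13, 5/13, 5/13).
probs = selection_probabilities([15, 12, 10, 10])
```

Note that the worst solution x1, with F(x1) = Fmax = 15, receives probability 0 and can never be selected as a parent.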
1. Initialization: Select an initial population X(1) ⊆ X with |X(1)| = N; set F* := min{F(x) : x ∈ X(1)}, x* := argmin{F(x) : x ∈ X(1)} and k := 1.
2. Repeat:
Selection of parents: Create a new temporary population Y(k) = {y1, ..., y2M}, drawn randomly (with replacement) from X(k) according to the distribution Prob(x).
Crossover: For j = 1, ..., M, cross the pair of parents (y2j-1, y2j) to produce the set of children Z(k) = {z1, ..., zM}.
Survival of the fittest: Draw randomly N - M elements from X(k) (with probability Prob(x)) and add them to Z(k) in order to create the next-generation population X(k+1) = {x1(k+1), ..., xN(k+1)}. (An alternative procedure would draw N elements from X(k) ∪ Z(k).)
Let x := argmin{F(x) : x ∈ X(k+1)}. If F(x) < F* then F* := F(x) and x* := x.
If the stopping criterion is satisfied then return x*, F* and stop; else let k := k + 1 and continue.
Figure 3.20: A simple genetic metaheuristic
Let us formulate some comments on this algorithm:
1. A mutation phase is sometimes added to this basic algorithm, for instance after the step Survival of the fittest. A mutation operator replaces each element of X(k+1), with some small probability ε, by a randomly selected neighbor of this element. In other words, with probability ε, each element is slightly perturbed. For example, (100101) could be replaced by its mutant (101101). The objective of this operation is to increase the amount of diversification in a population. However, many researchers nowadays consider that mutation does not significantly improve the performance of GAs.
2. Possible stopping criteria are, as usual: a limit on the total number of iterations, convergence of F*, a measure of the gap between F* and a lower bound on min F(x), etc. For GAs, another criterion is also commonly used. Let us define the fitness of population X(k) as the average value of F(x) over X(k), that is, the value

Qk = (1 / |X(k)|) Σ_{x ∈ X(k)} F(x).

Convergence of Qk toward a fixed value indicates that the population is increasingly homogeneous and that the procedure has reached a stationary state. Thus, if the difference |Qk+1 - Qk| is small for several successive iterations, then the algorithm can stop.
In its primitive form, the genetic algorithm presented above is generally not a very efficient approach
to the solution of hard combinatorial optimization problems. Before it becomes a practical method,
some enhancements have to be added to this basic scheme. In the next subsections, we proceed with a
discussion of such possible refinements.
3.8.4 Intensification and local search
In the simple GA outlined above, the average quality (or fitness) of a population is driven up by a single factor in the course of iterations, namely the random bias introduced in the selection of parents and in the "survival of the fittest" step. However, by itself, this bias is generally insufficient to significantly improve a bad initial population.
Moreover, in spite of everything we said earlier, solutions (children) arising from a crossover operation
are frequently quite different from their parents and may turn out to be much worse.
These observations lead to an improvement of the GA scheme which is conceptually simple, but
very powerful in practice: it consists in introducing a local search (intensification) phase within the
diversification strategy of GA. This is simply done, for instance, by adding the following step right after the crossover step. (Some authors speak of memetic algorithms when this step is introduced in the basic GA scheme.)
Local improvement: For j = 1, 2, ..., M, let z'j be the best solution produced by a local search algorithm (either greedy, or steepest descent, or SA, ...) starting from zj as initial solution. Replace zj by z'j in Z(k).
In picturesque terms, we could say that children must be "raised" before they can be incorporated into the population. More abstractly, with the above modification, we can view GA as performing a succession of multistart rounds, where each round is initialized from members of the current population.
Whatever the interpretation, interlacing the basic GA scheme with some form of local search seems to be a sine qua non condition for the efficiency of the procedure. Let us illustrate this on some examples.
Example: Knapsack problem. Consider the knapsack problem
max cx
subject to ax <= b and x ∈ {0, 1}^n
and the particular instance:
max 2x1 + 3x2 + 5x3 + x4 + 4x5
subject to 5x1 + 4x2 + 4x3 + 3x4 + 7x5 <= 14, xi ∈ {0, 1} for i = 1, ..., 5.
We use the following crossover operator: if the parents are x and y, then the child z has zi = 1 when xi = yi = 1, and zi = 0 otherwise (i = 1, 2, ..., n). (The child inherits an object only if both of its parents own it.) So we obtain for instance:
x = 11010, value = 6
y = 10001, value = 6
z = 10000, value = 2
Note that this crossover, even though it ensures feasibility of the children, will systematically produce
children of lower quality than their parents.
Assume now that we apply a variant of the classical greedy algorithm during the improvement phase: first, we sort the indices (1, 2, ..., n) into a priority list L by nonincreasing ratios cj/aj. Then, without changing the components
of z that are already equal to 1, we run through L and fix the next variables to 1 as long as the knapsack constraint is not violated.
In our example, this procedure yields the priority list L = (3, 2, 5, 1, 4), and successively produces the solutions: z = 10000 → 10100 → 11100 = z', stop (with value = 10).
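The improvement phase can be sketched as follows (our own Python code for the instance above; function names are ours):

```python
def greedy_repair(z, c, a, b):
    """Improvement step from the text: keep the components of z already equal
    to 1, then scan the priority list L (indices sorted by nonincreasing
    c[j]/a[j]) and set further variables to 1 while the constraint holds."""
    n = len(c)
    L = sorted(range(n), key=lambda j: c[j] / a[j], reverse=True)
    z = list(z)
    weight = sum(a[j] for j in range(n) if z[j] == 1)
    for j in L:
        if z[j] == 0 and weight + a[j] <= b:
            z[j] = 1
            weight += a[j]
    return tuple(z)

# The knapsack instance of the example, with the child z = 10000:
c, a, b = (2, 3, 5, 1, 4), (5, 4, 4, 3, 7), 14
z_improved = greedy_repair((1, 0, 0, 0, 0), c, a, b)
```

With 0-based indices, the computed list L corresponds to the priority list (3, 2, 5, 1, 4) of the text, and the repaired child is 11100 with value 10.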
An alternative interpretation of the previous approach is to consider the local optimization step as
a feature of the crossover operator itself, rather than as an addition to it. (Even though both points of
view are in a sense equivalent, it is sometimes interesting to look at them from different angles.)
To illustrate this idea, let us again consider the knapsack problem and a priority list L on {1, 2, ..., n}. Then, we can define an optimizing crossover operator as follows: to compute the child z of x and y, we go through L and we let
zi = 1 if either xi = 1 or yi = 1, and if this results in a feasible solution;
zi = 0 otherwise.
(Another description of the same heuristic is: restrict the attention to those objects that have been selected at least once in either x or y, and apply the greedy heuristic to this subset of objects.)
For the above example, the list L = (3, 2, 5, 1, 4) leads to
x = 11010
y = 10001
z = 01011
The resulting solution z has value 8 (better than both its parents).
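This optimizing crossover can be sketched directly (our own Python code for the same instance; function names are ours):

```python
def optimizing_crossover(x, y, c, a, b):
    """Optimizing crossover from the text: consider only the objects selected
    in at least one parent, and apply the greedy heuristic along the priority
    list L (indices sorted by nonincreasing c[j]/a[j]) to that subset."""
    n = len(c)
    L = sorted(range(n), key=lambda j: c[j] / a[j], reverse=True)
    z, weight = [0] * n, 0
    for j in L:
        if (x[j] == 1 or y[j] == 1) and weight + a[j] <= b:
            z[j] = 1
            weight += a[j]
    return tuple(z)

# Parents x = 11010 and y = 10001 from the example:
c, a, b = (2, 3, 5, 1, 4), (5, 4, 4, 3, 7), 14
child = optimizing_crossover((1, 1, 0, 1, 0), (1, 0, 0, 0, 1), c, a, b)
```

The computed child is 01011 with value 8, better than either parent (both of value 6), in contrast with the naive intersection crossover used earlier.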
Example: Traveling salesman problem.
Considering the local optimization step as a feature of the crossover operator can similarly be applied to the traveling salesman problem, as explained for instance in Hoos and Stützle (2005), Kolen and Pesch (1994), and Merz and Freisleben (2001).
Suppose that T and T' are two distinct solutions of the traveling salesman problem, viewed as sets of edges. A child of T and T' can be produced by keeping all edges that occur in both parent solutions, and by using a greedy procedure to complete the resulting partial solution T ∩ T'.
Merz and Freisleben (2001) propose more specifically to apply the DPX crossover operator shown in
Figure 3.21 (we skip some details). They show that variants of this crossover operator, when combined
with effective local improvement steps, provide excellent solutions for the TSP.
DPX Crossover:
1. Compute C := T ∩ T'; let P1, P2, ..., Pk be the subpaths that make up C, and let uj, vj be the endpoints of subpath Pj for j = 1, 2, ..., k.
2. While C is not a tour, repeat:
if C is a path containing all vertices, then add the missing edge that closes the tour; else,
choose randomly one of the endpoints uj;
choose the closest vertex to uj among all vertices w ∈ {u1, v1, u2, v2, ..., uk, vk}, w ∉ {uj, vj}, such that the edge (uj, w) is not included in T ∪ T';
add the edge (uj, w) to C.
3. Return C.
Figure 3.21: DPX crossover for the TSP
Such adaptations of the basic genetic algorithm make it possible to enrich it with heuristics that have been specifically developed for the problem at hand. Indeed, whereas the special features of a problem are usually included quite naturally in a steepest descent or in a simulated annealing algorithm (via the neighborhood structure), this is not immediately true in the basic GA formulation displayed in Figure 3.20.
A similar objective can sometimes be attained through a judicious encoding of the solutions. The