THEORY, VARIATIONS AND ENHANCEMENTS
JACQUES F. BENDERS
Benders’ Theory is a mathematical optimization methodology as transcendental as the Simplex
Method of George Dantzig.
Ing. Jesus Velásquez-Bermúdez, Dr. Eng.
Chief Scientist
DecisionWare & DO Analytics
Draft Version of Chapter One of the Book:
Large Scale Optimization in Supply Chain & Smart Manufacturing: Theory & Application
To be Published in the series Springer Optimization and Its Applications
April 21, 2019
INDEX
1. Benders Theory
1.1. Framework
1.2. Benders Partition Theory
1.3. Duality Theory & Benders Theory
1.3.1. Dual Coordinator Problem
1.3.2. Sub-Problem Degenerate Solutions
1.4. Benders Decomposition
1.4.1. Standard Benders Cuts (SBC)
1.4.2. Decoupled Benders Cuts (DBC)
1.4.3. Unified Benders Cuts (UBC)
1.5. Multilevel Benders
1.5.1. Benders' Tri-level Partition Theory
1.5.2. Benders' Multilevel Partition Theory
1.6. Multilevel Partition & Decomposition Benders Theory
2. Economic Interpretation
2.1. Taxonomy of Organizations
2.2. Cobb-Douglas Production Functions
2.3. Markets
2.4. Multisectoral Planning
3. Generalizations & Extensions
3.1. Generalized Benders Decomposition
3.2. Benders Integer Linear Subproblems
3.3. Benders Dual Decomposition
3.3.1. Benders Dual Decomposition Theory
3.3.2. Benders Dual Decomposition Implementation
3.4. Logic Based Benders Decomposition
3.5. Partial Benders Decomposition
4. Dynamic and Stochastic Benders’ Theory
5. Coordinator Enhancements
5.1. MIP/MINLP Coordinators
5.1.1. Multi-Phase Coordinator
5.1.2. Modified Optimality Cuts
5.1.3. Inexact Solutions
5.1.4. Inexact Cuts
5.1.5. Combinatorial Benders Cuts
5.2. Trust Region (Regularization)
5.2.1. Neighborhood Bounding
5.2.2. Penalizations Movements
5.2.3. Binary Variables
6. Cuts Enhancements
6.1. Strong Cuts
6.1.1. Pareto Optimal
6.1.2. Other Cuts
6.1.3. Hybrid Cuts
6.2. Hybrid Strategy
7. Benders Parallel Optimization
7.1. Parallel Optimization
7.2. The Asynchronous Benders Decomposition Method
8. Conclusions
J. F. Benders: Theory, Variations and Enhancements
Jesus Velásquez-Bermúdez
Abstract. In 1962, J. F. Benders published his seminal theory in the paper "Partitioning Procedures for Solving Mixed-Variables Programming Problems", oriented to the optimization of mixed integer problems (MIP), which has been the origin of multiple methodologies for solving large-scale problems related to stochastic, complex combinatorial and/or dynamic systems. Since its formulation in 1962, researchers in Benders Theory (BT) have proven that:
▪ BT is an effective methodology for solving complex problems that cannot be solved using only the "best" basic optimization algorithms (CPLEX, GUROBI, XPRESS, …).
▪ Algorithms based on Benders' Theory can solve NP-hard (non-deterministic polynomial-time) problems in reasonable time; for this type of problem, BT has proven to be an effective methodology for solving complex problems that cannot be solved using the "best" mathematical solvers.
▪ BT is a mature methodology that is in an accelerated growth phase.
▪ There is a gap between research in mathematical programming and the application of large-scale methodologies in real-world solutions.
In this book, four chapters are oriented to teaching Benders' Theory; they are:
1. J. F. Benders: Theory, Variations and Enhancements
2. Stochastic Optimization and Risk Management: Fundamentals
3. Dynamics and Stochastic Benders Decomposition
4. The Future: Mathematical Programming 4.0
The chapters present a mathematical review of many of the aspects that must be considered to understand the variations and enhancements oriented to: i) expanding the set of problems that can be solved based on Benders concepts, and ii) speeding up the solution of complex problems.
1. Benders Theory
1.1. Framework
In 1962, J. F. Benders published his seminal theory oriented to the optimization of mixed integer problems (MIP), which has been the origin of multiple methodologies for solving large-scale problems. The fundamental idea is the partition of a problem into two subproblems of reduced complexity, based on the division of the variables into coordination variables and subordinate variables. The solution of the original problem is obtained by the coordinated solution of two complementary subproblems: i) the coordinator, linked to the coordination variables, and ii) the sub-problem, linked to the subordinate variables. The sub-problem provides information to the coordinator through the dual variables associated with its constraints. The coordinator takes the information from the primary level and incorporates it in the form of hyperplanes (Benders cuts) limiting the feasible region of the coordination variables. Benders cuts represent the cost of the subordinate variables as a function of the coordination variables. The algorithm defined by Benders is convergent and solves the original problem through the solution of the coordinator problem; it applies only to linear subproblems.
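The coordinator/sub-problem loop described above can be sketched in a few lines of Python. The instance below is a toy problem invented for illustration (it is not from the text), chosen so that the sub-problem and its dual are available in closed form; a real implementation would call an LP solver at both levels.

```python
# Minimal Benders loop on a toy instance (illustrative data, not from the text):
#   min  y + x   s.t.  x >= 3 - y,  x >= 0,  y in {0, 1, 2, 3}
# Sub-problem for fixed y:  SP(y) = min { x : x >= 3 - y, x >= 0 },
# so Q(y) = max(0, 3 - y), and the dual of "x >= 3 - y" is pi = 1 when the
# constraint is active at the optimum, else 0.

def subproblem(y):
    """Solve SP(y) analytically; return Q(y) and the dual pi."""
    q = max(0.0, 3.0 - y)
    pi = 1.0 if 3.0 - y > 0 else 0.0
    return q, pi

def coordinator(cuts, Y=(0, 1, 2, 3)):
    """Solve CY by enumeration: min y + q  s.t.  q >= pi_k*(3 - y) per cut."""
    best = None
    for y in Y:
        q = max((pi * (3.0 - y) for pi in cuts), default=0.0)
        if best is None or y + q < best[0]:
            best = (y + q, y)
    return best  # (lower bound, incumbent y)

cuts, ub = [], float("inf")
for it in range(10):
    lb, y = coordinator(cuts)      # coordination level generates y
    q, pi = subproblem(y)          # second level, parameterized by y
    ub = min(ub, y + q)            # y + Q(y) gives a feasible upper bound
    if ub - lb < 1e-9:             # bounds meet: optimal
        break
    cuts.append(pi)                # add optimality cut  q >= pi*(3 - y)

print(y, ub)  # prints: 0 3.0
```

The loop alternates exactly as in the text: the coordinator proposes y, the sub-problem returns its cost and a dual vector, and the dual becomes a new cut until the lower and upper bounds coincide.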
The generalization of Benders Theory (BT) to several typical cases, where the structure of the optimization problem enables the effective use of BT, requires analyzing three basic cases:
▪ Decomposition Theory: useful when it is possible to group the subordinate variables into independent sets, so as to formulate multiple parallel subordinate subproblems.
▪ Multilevel Theory: used when there is a multilevel hierarchical relationship among the variables of the problem, and it is possible, inside the subordinate variables, to select a new set of coordination variables to establish an additional level of partition.
▪ Multilevel Decomposition Theory: the result of combining the two previous theories.
The use of these three concepts permits decomposing a mathematical problem into "atoms", in such a way as to facilitate its solution by: i) speeding up the solution time, and ii) reducing the memory requirements.
As a further result, the atomization of the problem supports two concepts that are the basis of optimization in the future:
i) Asynchronous Parallel Optimization (APO): solve the problem using multiple cores; and
ii) Distributed Real-Time Optimization (DRTO): solve the problem based on the interaction of multiple smart agents that exchange information continuously, in real time.
Three points of view must be considered to understand BT:
▪ Mathematical: the original Benders formulation has limitations:
i) BT requires that all subproblems be linear, which may not hold in several real-life cases; therefore, many researchers have worked on the development of methodologies that allow applying the Benders concepts to cases with non-linear and/or discrete subproblems.
ii) Although BT is convergent, its speed in finding the optimum can be significantly improved; this has turned BT into an important complement to basic solvers in the solution of very large problems.
▪ Applications: in a very aggregate manner, Benders applications can be divided into two major groups: i) combinatorial optimization, and ii) dynamic optimization. This fact has generated research aimed at accelerating the solution time of problems by considering their specific features.
▪ Uncertainty: BT has been applied to scenario-based stochastic optimization, which inherently includes the concept of scenario-based decomposition.
It is important to note that, even though the variations and improvements have been made in multiple independent research studies, it is possible to integrate "all" of them in a single paradigm, in such a way that an application can combine the improvements that are most convenient for it. To compare methodologies and to present their impact, two key performance indices (KPI) are considered: i) solution time, and ii) the gap between the best-known solution (primal bound) and the best-possible solution (dual bound); when the solution is optimal, the gap is zero.
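For the second KPI, a minimal sketch of the gap computation follows; the normalization is an assumed convention for illustration, since solvers differ in the denominator they use.

```python
def optimality_gap(primal_bound, dual_bound):
    """Relative gap between the best-known (primal) and best-possible (dual)
    bounds.  Conventions differ between solvers; the denominator used here
    (|primal|, guarded against zero) is one common, illustrative choice."""
    eps = 1e-10
    return abs(primal_bound - dual_bound) / max(abs(primal_bound), eps)

# A minimizing run with incumbent 105.0 and best lower bound 100.0:
print(round(optimality_gap(105.0, 100.0), 4))  # prints: 0.0476
# At optimality both bounds coincide and the gap is zero:
print(optimality_gap(7.0, 7.0))  # prints: 0.0
```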
There are several possibilities for improving the Benders methodology:
1. Formulating the mathematical problem "properly"
2. Using the appropriate Benders methodology, according to the mathematical problem
3. Modifying the master problem according to the formulation
4. Selecting the correct enhancements to speed-up the solution time; it includes selecting good cuts to add to the
master problem at each step.
5. Making a good selection of initial cuts
6. Using parallel optimization
The literature review conducted by Rahmaniani et al. (2017) presents evidence of the growth in the importance of BT in recent years, which can be attributed to the massive development, and the drop in prices, of multi-CPU PCs and GPUs, enabling an environment that takes advantage of the atomization/parallelization of optimization algorithmic procedures. The graphic, from Rahmaniani et al., presents the number of scientific papers related to BT until 2016. The following table presents a very small summary of papers showing the speed gains from the proper use of the improvements in BT.
Figure 1. The Power of Benders Theory (scientific publications related to BT per year, 1962–2016)
This leads to the conclusion that the point of reference for comparing the speed of mathematical programming in solving complex problems is not the basic solvers; the proper reference is the use of large-scale methodologies that make smart use of these solvers.
| Methodology | Case | Format | Solver | Problems solved (solver / Benders / total) | Time ratio solver/Benders (min / mean / max) | Gap ratio | Paper |
| Combinatorial Benders Cuts | Statistical Classification | 0-1 | CPLEX | 0 / 10 / 10 | 70.8 | – | Combinatorial Benders' Cuts for Mixed-Integer Linear Programming (2006), G. Codato & M. Fischetti |
| Combinatorial Benders Cuts | Map Labelling | 0-1 | CPLEX | 0 / 11 / 11 | 3.63 | – | idem |
| Combinatorial Benders Cuts | Statistical Classification & Map Labelling | 0-1 | CPLEX | 24 / 24 / 24 | 5 / 21.1 / 675 | – | idem |
| Benders Integer Subproblems | Stochastic Server Location | MIP | CPLEX | 7 / 11 / 11 | 3.93 / 28.6 / 178 | 5.65 | Decomposition Algorithms with Parametric Gomory Cuts for Two-Stage Stochastic Integer Programs (2014), D. Gade, S. Kucukyavuz, S. Sen |
| Generalized Benders Decomposition | Water Resources Management | NLP | MINOS | 4 / 4 / 4 | 4.4 / 23 / 39.8 | – | Solving Large Nonconvex Water Resources Management Models using Generalized Benders Decomposition (2001), X. Cai, D. McKinney, L. Lasdon & D. Watkins |
| Generalized Benders Decomposition | Water Resources Management | NLP | CONOPT | 4 / 4 / 4 | 1.3 / 5.2 / 8.5 | – | idem |
| Benders Strongest Cuts (Dynamically Updated Near-Maximal Cuts) | Petroleum Product Supply Chain | LP | CPLEX | 10 / 10 / 10 | 0.4 / 2.1 / 4 | – | Accelerating Benders Stochastic Decomposition for the Optimization under Uncertainty of the Petroleum Product Supply Chain (2014), F. Oliveira, I. E. Grossmann, S. Hamacher |
| Parallel Benders, Hybrid | – | LP | CPLEX | 8 / 8 / 8 | 20 / 326 / 519 | – | The Asynchronous Benders Decomposition Method (2018), R. Rahmaniani, T. Crainic, M. Gendreau, W. Rei |
| Parallel Benders, Asynchronous | – | LP | CPLEX | 8 / 8 / 8 | 19 / 287 / 459 | – | idem |
| Benders Dual Decomposition | Network Design | MIP | CPLEX | 26 / 30 / 35 | 0.50 / 1.12 / 1.16 | 1.96 | The Benders Dual Decomposition Method (2018), R. Rahmaniani, S. Ahmed, T. Crainic, M. Gendreau, W. Rei |
| Benders Dual Decomposition | Capacity Location | MIP | CPLEX | 16 / 16 / 16 | 1.94 / 3.34 / 7.12 | 28 | idem |
| Benders Dual Decomposition | Network Interdiction | MIP | CPLEX | 0 / 52 / 70 | 2.18 / 5.00 / 17.08 | 22.23 | idem |
Table 1. Why Benders Large-Scale Methodologies?
There are two ways to implement the BT enhancements:
▪ Reformulation of the BT using computer algebraic modeling languages
▪ Direct modification of the internal flow of the solvers.
This document concentrates on the first alternative; this does not imply a value judgment with respect to the second.
1.2. Benders Partition Theory
BT considers the problem P:, composed of two types of variables: y, the coordination variables, and x, the coordinated variables.
P: = { min z = cTx + f(y) |
F0(y) = b0 ; Ax + F(y) = b ; x ∈ R+ ; y ∈ S } (1)
BT restricts the model on x to be a linear problem, while it imposes no conditions on y, which may be continuous or discrete, and the functions f(y) and F(y) may be linear or non-linear convex functions. The problem P: is partitioned into two coordinated problems: CYBT:, over y, and SPBT(y):, over x, which is defined as
SPBT(y): = { min Q(y) = cTx | Ax = b - F(y) ; x ∈ R+ } (2)
The dual problem of SPBT(y): (independent of x) is
DSPBT(y): = { max Q(y) = πT(b - F(y)) | πT A ≤ c ; π ∈ R } (3)
The coordinator CYBT: on y can be formulated as
CYBT: = { min z = f(y) + Q(y) | F0(y) = b0 ; y ∈ S
Q(y) ≥ (πk)T[b - F(y)] , ∀k ∈ IT
0 ≥ (σk)T[b - F(y)] , ∀k ∈ IN } (4)
where π represents the vector of dual variables of the constraints Ax = b - F(y), IT the set of iterations, σ an extreme ray of the dual feasibility region, and IN the set of iterations in which no feasibility was obtained, which implies that DSPBT(y): has an unbounded solution.
Benders proposed the solution of P: by a hierarchical algorithm that works on two levels:
i) The coordination level solves the problem CYBT: and generates a sequence of yk values;
ii) On the second level, yk is used as a parameter of the sub-problem SPBT(y): to generate a sequence of feasible extreme points, πk, and extreme rays, σk, of the dual feasible region of SPBT(y):; these vectors are used to include cutting planes in CYBT:.
CYBT: includes two types of cuts. The first type, which we call optimality cutting planes (OCP) because the cut eliminates values of y that cannot be optimal, has the following structure
Q(y) ≥ (πk)T[b - F(y)] , ∀k ∈ IT (5)
The second type, the feasibility cutting planes (FCP), restricts the feasible region of y so as to maintain a feasible x in SPBT(y):; it has the following structure
0 ≥ (σk)T[b - F(y)] , ∀k ∈ IN (6)
1.3. Duality Theory & Benders Theory
The following aspects must be considered to define the relation between Duality Theory and Benders Theory.
1.3.1. Dual Coordinator Problem
A linear coordinator problem can be solved directly in its dual form; we consider the case
LP: = { min z = cTx + f(y) |
F0 y = b0 ; A x + F y = b ; x ∈ R+ ; y ∈ S } (7)
Considering the primal solution of LP:, on each iteration BT incorporates constraints, which implies that the dimension of the basic solution is resized in each iteration. If the dual is solved, the problem grows in variables (column generation) while keeping the size of the basic solution constant; it may therefore be more appropriate to work with the coordinator model in its dual version. If SPBT(y): is always feasible ("relatively complete recourse"), the feasibility cuts may be ignored; DCY:, the dual problem of CY:, is
DCY: = { max z = b0T λ2 + Σk∈ITE [(πk)T b] λ3k |
Σk∈ITE λ3k = 1
F0T λ2 + Σk∈ITE [(πk)T F]T λ3k ≤ c } (8)
The basic characteristics of this problem are:
▪ The dimension of λ2 is equal to the dimension of b0; it corresponds to the dual variables of the constraints F0(y) = b0.
▪ The variables λ3k correspond to the dual variables of the optimality cuts.
For the optimal solution of the general problem P:, the values of the objective functions of the primal and the dual problems must be equal, which implies:
f(y) + Q(y) = b0T λ2 + bT π (10)
Considering that λ2 corresponds to the dual variables of F0(y) = b0, the dual variables (π) of the constraints Ax + F(y) = b must satisfy
bT π = bT [Σk∈ITE πk λ3k]
π = Σk∈ITE πk λ3k
Σk∈ITE λ3k = 1 (11)
then π is a convex combination of the dual variables of the optimality cuts. This expression implies that the cuts generated by the sub-problem in the coordinator may be replaced by one single cut, generated from the surrogation of all cuts, where the weight of each cut is the dual variable associated with that cut. This is consistent with the results presented by different studies on the theory of Surrogate Mathematical Programming (SMP) (Greenberg & Pierskalla, 1970) (Velasquez, 1986).
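As a small worked illustration of this surrogation (the weights are invented for illustration), two optimality cuts whose coordinator dual variables are λ3¹ = 0.25 and λ3² = 0.75 collapse into one equivalent surrogate cut:

```latex
% Two optimality cuts returned by the sub-problem:
\begin{align*}
Q(y) &\ge (\pi^1)^{\mathsf T}\,[\,b - F(y)\,], \\
Q(y) &\ge (\pi^2)^{\mathsf T}\,[\,b - F(y)\,].
\end{align*}
% With dual weights \lambda_3^1 = 0.25 and \lambda_3^2 = 0.75 (illustrative
% values, summing to one), the single surrogate cut
\[
Q(y) \;\ge\; \left(0.25\,\pi^1 + 0.75\,\pi^2\right)^{\mathsf T}[\,b - F(y)\,]
       \;=\; \pi^{\mathsf T}[\,b - F(y)\,],
\qquad \pi = \sum_{k} \lambda_3^k\,\pi^k,\quad \sum_{k} \lambda_3^k = 1,
\]
% reproduces the convex combination of equation (11) and is tight at the optimum.
```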
If we include the feasibility cuts, the structure of the dual problem DCY: is
DCY: = { max z = b0T λ2 + Σk∈ITE [(πk)T b] λ3k + Σk∈ITN [(σk)T b] λ4k |
Σk∈ITE λ3k = 1
F0T λ2 + Σk∈ITE [(πk)T F]T λ3k + Σk∈ITN [(σk)T F]T λ4k ≤ c } (12)
where λ4k is the dual variable of the k-th feasibility cut, which exists for those iterations in which no feasible solution to the subproblem was found; the dual variables of the constraints Ax + F(y) = b are
π = Σk∈ITE πk λ3k + Σk∈ITN σk λ4k (13)
1.3.2. Sub-Problem Degenerate Solutions
In a normal problem any change in the vector of resources (b - Fy) implies change in the value of the objective function;
then the Lagrange multipliers (dual variables) must be different from zero (0). The degeneration occurs when the
objective function is collinear with one of the active constraints; then, any infinitesimal change in some of the
components of the resources vector doesn’t imply change in the objective function value. Then, when the linear
subproblem SPBT(y): is degenerated, it implies that may has multiple dual solutions; hence a subproblem may generate
multiple optimality cuts. In this case many of the component of the vector of dual variables may be equal to zero.
Since the addition of “empty” cuts makes CYBT: harder to solve.
Magnanti and Wong (1981) proposed a seminal methodology to accelerate convergence of BT by strengthening the
generated cuts (pareto-optimal cuts). This case will be studied in a posterior numeral.
1.4. Benders Decomposition
When a problem P: has a dual-angular matrix structure that includes sub-problems diagonal matrix is possible the use
BT for its solution. P: has angular dual-diagonal structure when it can be expressed as:
P: = { Min z = i=1,N ciTxi + f(y) |
F0(y) = b0
Ai xi + Fi(y) = bi , i=1,N
xiR+ , i=1,N , yS } (14)
The matrix has the following structure (figure 2).
Figure 2. Dual-Angular Matrix (coordination variables y; sub-problem variables x1, x2, …, xN over a block-diagonal matrix)
The index i is associated with sub-problems (areas related to industrial sectors, geographic areas, periods, realizations of a stochastic process, or a combination of them); then: i) y may be associated with the consumption/production of common resources, or with the transfer of resources between areas, and ii) xi with the operation within the area of action of the index i. The previous structure allows breaking down the problem into multiple subproblems, as shown below (figure 3).
Figure 3. Benders Decomposition Cuts (coordinator and sub-problem formulations for Standard, Decoupled, and Unified Benders Cuts; the UBC panel applies to periods and random scenarios)
There are at least three alternatives for the implementation of Benders Decomposition Theory:
▪ Standard Benders Cuts (SBC): corresponds to the basic BT methodology, which solves a single subproblem that integrates all the subproblems and generates only one cut in each iteration.
▪ Decoupled Benders Cuts (DBC): corresponds to a variation of BT that solves in each iteration a small subproblem for each index i. DBC solves N problems and generates one decoupled cut for each index i; the cuts are coupled in the objective function.
▪ Unified Benders Cuts (UBC): when the mathematical conditions are met, corresponds to a variation of BT that solves in each iteration a subset of the N subproblems and generates N decoupled cuts, one for each index i; the cuts are coupled in the objective function. This type of cut may be applied when the i-indexes are associated with periods or with random scenarios, or a combination of both.
1.4.1. Standard Benders Cuts (SBC)
P: can be solved using BT directly; y corresponds to the coordination variables and xi to the coordinated variables. Define Q(y) as the optimal value of the objective function of the problem over all xi, for a given value of y:
Q(y) = { min z = Σi=1,N ciTxi | Aixi = bi - Fi(y) , i=1,N ; xi ∈ R+ , i=1,N } (15)
Applying BT directly, the coordinator problem is
CY: = { min z = f(y) + Q(y) |
F0(y) = b0 , y ∈ S
Q(y) ≥ Σi=1,N (πik)T[bi - Fi(y)] , k=1,ITE
0 ≥ Σi=1,N (σik)T[bi - Fi(y)] , k=1,ITN } (16)
where πi represents the dual variables of the i-th set of constraints and σi the extreme rays for infeasible iterations. The associated subproblem integrates all the xi.
1.4.2. Decoupled Benders Cuts (DBC)
Birge and Louveaux (1988) developed a multi-cut enhancement to BT, in which a separate optimality cut is constructed for each subproblem, considering that it is possible to decouple the subproblem and formulate the function Q(y) as the sum of N functions Qi(y), each corresponding to a subproblem on xi:
Qi(y) = { min ciTxi | Aixi = bi - Fi(y) ; xi ∈ R+ } (17)
Q(y) is equal to
Q(y) = Σi=1,N Qi(y) (18)
The SPi(y): problem to calculate Qi(y) is formulated as
SPi(y): = { min Qi(y) = ciTxi | Aixi = bi - Fi(y) ; xi ∈ R+ } (19)
and its dual problem is
DSPi(y): = { max Qi(y) = πiT [bi - Fi(y)] | πiT Ai ≤ ciT } (20)
As in the previous case, from duality theory it is known that
Qi(y) ≥ πiT [bi - Fi(y)] (21)
with equality holding only for the optimal πi*. The coordinator model CY: can be formulated as
CY: = { min z = f(y) + Q(y) |
F0(y) = b0 ; y ∈ S
Q(y) = Σi=1,N Qi(y)
Qi(y) ≥ (πik)T[bi - Fi(y)] , i=1,N , k=1,ITE(i)
0 ≥ (σik)T[bi - Fi(y)] , i=1,N , k=1,ITN(i) } (22)
where ITE(i) represents the set of iterations in which the feasibility of SPi(y): has been achieved and ITN(i) the set of iterations in which feasibility has not been achieved. This type of cut is called Decoupled Benders Cuts (DBC).
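The decoupled-cut idea can be sketched on a toy instance with two analytically solvable subproblems (all data invented for illustration; a real implementation would call an LP solver). Each subproblem contributes its own cut qi ≥ πi(ri - y), and the coordinator, solved here by enumeration, sums the qi in its objective:

```python
# Multi-cut (DBC) sketch on a toy two-subproblem instance (illustrative data):
#   min  y + Q1(y) + Q2(y),   y in {0, ..., 4},  where
#   Qi(y) = min { x_i : x_i >= r_i - y, x_i >= 0 } = max(0, r_i - y),
# with r = (3, 5); the dual of "x_i >= r_i - y" is pi_i = 1 if r_i - y > 0 else 0.
# Each subproblem keeps its own variable q_i and its own pool of decoupled cuts.

R = (3.0, 5.0)

def sub(i, y):
    """Solve subproblem i analytically; return (Qi(y), dual pi_i)."""
    return max(0.0, R[i] - y), (1.0 if R[i] - y > 0 else 0.0)

def solve_multicut(Y=range(5), max_it=20, tol=1e-9):
    cuts = [[], []]                        # one cut pool per subproblem (DBC)
    ub = float("inf")
    for _ in range(max_it):
        best = None                        # coordinator solved by enumeration
        for y in Y:
            qs = [max((p * (R[i] - y) for p in cuts[i]), default=0.0)
                  for i in (0, 1)]
            z = y + sum(qs)
            if best is None or z < best[0]:
                best = (z, y)
        lb, y = best
        vals = [sub(i, y) for i in (0, 1)]
        ub = min(ub, y + sum(q for q, _ in vals))
        if ub - lb < tol:
            break
        for i, (_, pi) in enumerate(vals):
            cuts[i].append(pi)             # decoupled cut: q_i >= pi_i*(r_i - y)
    return y, ub

print(solve_multicut())  # prints: (3, 5.0)
```

A single-cut (SBC) variant would bound only the sum q ≥ Σi πi(ri - y); keeping one qi per subproblem lets each cut bound its own maximum, which is the "deeper condition" noted below.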
The advantages of the decomposition approach are:
1. The original formulation does not consider the possibility of decomposition, assuming a problem integrated over the xi. Under the decomposition scheme, at the lower level a subproblem of reduced complexity is solved for each item associated with the index i.
2. In the original scheme, a single cut that integrates the dual variables from all subproblems is generated at each iteration. The proposed formulation generates N decoupled cuts, one for each SPi(y):, which are coordinated by the equation that defines Q(y). The difference is that a single cut acts by limiting the maximum of a summation, while the decoupled cuts act by limiting the summation of N maximums, which is a deeper condition.
3. In the decoupled system, the information provided by each subproblem is independent of the others, and there is no reason that obliges, in an iteration between coordinator and subproblems, to solve all the subproblems. This feature allows implementing solution schemes that only solve those subproblems which provide "more information".
Despite these advantages, it should be considered that DBC may increase the computational effort required to solve the master problem when it is a MIP. Zang et al. (2018) present the impact of multiple cuts in a stochastic model of the electric sector, in two different experiments (figure 4).
Figure 4. Speed-up of Multiple/Decoupled Cuts (Zang et al., 2018)
1.4.3. Unified Benders Cuts (UBC)
Unified Benders Cuts (UBC) theory corresponds to the case in which the sub-problems SPi(y): belong to a family of problems characterized by their matrix/vector elements; its application is possible when the index i is associated with periods and/or random scenarios of a stochastic process. UBC are included by Chen and Powell (1997) in the CUPPS algorithm and by Velasquez (2018) in the G-DDP algorithm. This topic is analyzed in a later chapter.
1.5. Multilevel Benders Partition
Velasquez (1995) studied problems in which more than one level of coordination can be defined hierarchically. That is, inside a set of subordinated variables there exists a relationship such that some variables function as coordinators of the others (multi-dual-angular matrix). For ease of presentation, we first analyze a case of three levels and subsequently generalize the results to S levels (figure 5). The following sections only consider optimality cuts (OCP).
Figure 5. Multi-Dual Angular Matrix (higher-level coordination variables, intermediate-level coordination variables, and lower-level sub-problem variables over linear models)
1.5.1. Benders' Tri-level Partition Theory
Consider the problem P:, which can be partitioned into three hierarchical levels:
P: = { min cTx + eTw + f(y) |
F0(y) = b0
Gw w + Fw(y) = bw
A x + G w + F(y) = b
x ∈ R+ ; w ∈ R+ ; y ∈ S } (23)
where y corresponds to the general coordination variables, and w and x are variables coordinated by y; at the same time, w may act as coordinator of x once the value of y is defined. Applying BT, the first-level coordinator model is
CY: = { min z = Q(y) + f(y) |
F0(y) = b0 ; y ∈ S
Q(y) ≥ (πxk)T(b - F(y)) + (πwk)T(bw - Fw(y)) , k=1,ITE } (24)
where πx corresponds to the vector of dual variables of the constraints Ax + Gw = b - F(y), πw to the vector of dual variables of the constraints Gww = bw - Fw(y), and ITE to the number of cuts that have been generated from the lower level. Q(y) is the sum of two functions: Qx(y), which estimates the value of the cost cTx, and Qw(y), which estimates the cost eTw:
Qx(y) = (πx)T(b - F(y)) (25)
Qw(y) = (πw)T(bw - Fw(y)) (26)
Consider the subproblem coordinated by y for {x, w}, which provides feasible values for x and w:
SP1(y): = { min Q(y) = cTx + eTw |
Gww = bw - Fw(y)
Ax + Gw = b - F(y)
x ∈ R+ ; w ∈ R+ } (27)
The dual of SP1(y): is
DSP1(y): = { max πxT(b - F(y)) + πwT(bw - Fw(y)) |
πxT A ≤ cT
πxT G + πwT Gw ≤ eT } (28)
Since w coordinates x, it is possible to solve SP1(y): using BT. Let us consider the coordinator problem on w, conditioned on a value of y:
CW(y): = { min eTw + W(w|y) |
Gww = bw - Fw(y) ; w ∈ R+
W(w|y) = { min cTx | Ax = b - F(y) - Gw ; x ∈ R+ } } (29)
The function W(w|y) corresponds to the cost cTx(w|y) as a function of w when y is defined by the first-level coordinator. The subproblem for x is
SP2(w|y): = { min W(w|y) = cTx | Ax = b - F(y) - Gw ; x ∈ R+ } (30)
The second-level coordinator model CW(y): is formulated based on BT as
CW(y): = { min eTw + W(w|y) |
Gww = bw - Fw(y) ; w ∈ R+
W(w|y) ≥ (βn)T(b - F(y) - Gw) , n=1,ITEx } (31)
where βn represents the n-th vector of dual variables of the constraints Ax = b - F(y) - Gw that has been generated by SP2(w|y):, and ITEx the total number of cuts.
The coordinator problem CW(y): and the coordinated problem SP1(y): are equivalent. For purposes of coordination in CY:, πx and πw must be determined from the CW(y): solution. So, consider the dual problem of CW(y):
DCW(y): = { max [Σn=1,ITEx q(n) βn]T(b - F(y)) + πwT(bw - Fw(y)) |
[Σn=1,ITEx q(n) βn]T G + πwT Gw ≤ eT
Σn=1,ITEx q(n) = 1
q(n) ∈ R+ , n=1,ITEx } (32)
where q(n) is a component of the vector θ and corresponds to the dual variable of the n-th cut generated by the subproblem SP2(w|y):
θT = { q(1), q(2), ... , q(ITEx-1), q(ITEx) } (33)
In vector notation, DCW(y): can be expressed as
DCW(y): = { max θT(BITEx)T (b - F(y)) + πwT(bw - Fw(y)) |
θT(BITEx)T G + πwT Gw ≤ eT
θT 1 = 1 ; θ ∈ R+ } (34)
where Bk represents the matrix of all the dual-variable vectors that have been generated until iteration k of CW(y):
Bk = { β1, β2, ... , βk-1, βk } (35)
and 1 corresponds to a vector with all its components equal to 1.
Given that DSP1(y): and DCW(y): are equivalent, it is possible to prove that πxk is a weighted sum of the dual-variable vectors generated by SP2(w|y):, using as weighting factors the dual variables associated with the coordinator CW(y)::
πxk = Σn=1,ITEx(k) q(n) βn = BITEx(k) θ (36)
where ITEx(k) is the number of cuts that SP2(w|y): has generated in CW(y): until iteration k of CW(y):.
If Surrogate Mathematical Programming (SMP) is considered, it is possible to take advantage of this relationship. SMP proves that a set of constraints can be replaced by an equivalent constraint, generated from a convex combination of the constraints, provided that the weights are collinear with the Lagrange multipliers of the constraints. Based on this fact, the cuts generated by SP2(w|y): may be replaced by an equivalent surrogate cut, based on the surrogation of all cuts, where the weights correspond to the dual variables associated with each cut. This occurs every time that CW(y): reaches an optimal point {x(y), w(y)} and returns a vector of dual variables to CY:.
The surrogate cut synthesizes the information that has been processed in CW(y):. This may prevent the number of cuts coming from the lower level from exploding as the optimization process advances, since whenever a cycle of optimization begins in CW(y): all the generated cuts may be replaced by an equivalent surrogate cut, which preserves the memory of the system.
The definition of πxk is general: it serves to calculate, in any coordinator, the dual variables of the constraints which are not explicitly considered in it and which are managed at lower hierarchical levels. The dual variables of these constraints correspond to the surrogated vector of dual variables of the sub-problem. For the coordinator of the highest level, they correspond to the dual variables of the solution of the problem.
1.5.2. Benders' Multilevel Partition Theory
The extension of this theory, for cases in which exist more than two levels of coordination is direct. Each coordinator
of a lower level generates a cut to the top-level coordinator, summarizing information based on the subrogated vector
of dual variables; and on the lower level it may replace all cuts which so far have been used to generate the optimal
partial solution. In the case of S levels consider the problem P:
P: = { Min i=1,S ciTXi + f(y) |
F0(y) = b0 ;
AiXi + q=1,i-1 Ei,qXq + Fi(y) = bi i=1,S ;
xiR+ i=1,S ; yS } (37)
where y corresponds to the variables of coordination of first level, level 0, and xi to the variables of level i. Coordination
xS corresponds to the lower level, or primary level. The matrix of P: has a triangular structure in blocks (figure 6).
Figure 6. Triangular Matrix (variables y, x1, x2, ..., xS; diagonal blocks F0, Fi, Ai and coupling blocks Ei,q)
The level-0 model is
CY: = { min z = Q(y) + f(y) |
F0(y) = b0 ; y ∈ S
Q(y) ≥ Σi=1,S (πi,1k)T(bi - Fi(y)) , k=1,ITE } (38)
The coordinator associated with the variables xi, for i between 1 and S-1, is
CXi(y,x1,x2,...,xi-1): = { min ciTxi + Wi(xi|y,x1,x2,...,xi-1) |
Aixi = bi - Σq=1,i-1 Ei,qxq - Fi(y) ; xi ∈ R+
Wi(xi|y,x1,x2,...,xi-1) ≥ Σq=i+1,S (πq,i+1k)T(bq - Eq,ixi) , k=1,ITEX(i+1) } (39)
where πq,ik corresponds to the surrogated vector of dual variables at level i associated with the constraints of level q, and complies with
πq,ik = Σn=1,ITEX(i,k) qi(n) πq,i+1n = Πq,ik θik (40)
where qi(n) is a component of the vector θi and corresponds to the dual variable of the n-th cut generated by the subproblem CXi+1(y,x1,x2,...,xi):
θiT = { qi(1), qi(2), ... , qi(ITEX(i+1)-1), qi(ITEX(i+1)) } (41)
and the matrix Πq,ik groups the surrogated vectors of dual variables that have been generated at level i up to iteration k of CXi(y,x1,x2,...,xi-1):, ITEX(i,k) being the total number of cuts:
Πq,ik = { πq,i+11, πq,i+12, ... , πq,i+1ITEX(i,k) } (42)
The dual variables corresponding to the functional constraints of level i in the coordinator i are πi,i.
The primary subproblem SPS(y,x1,x2,...,xS-1): is
SPS(y,x1,x2,...,xS-1): = { min cSTxS | ASxS = bS - Σq=1,S-1 ES,qxq - FS(y) ; xS ∈ R+ } (43)
It is equivalent to a coordinator of level i, evaluated at i equal to S, without including the cuts and the W() function.
Given the above equivalence, the formulation of the algorithm is performed in terms of coordinator problems. In the original way of implementing the multilevel theory, each hierarchical level returns to the upper level only when it has obtained the optimal solution of its problem, which is parameterized by the decision vector preset at the higher levels; this implies that the lower levels perform cycles nested within the upper levels.
If the methodology is applied to a dynamic problem (figure 7), the result is similar to the so-called Nested Benders Decomposition (NBD) theory; but the multilevel theory is less restrictive, because it permits relations between two non-consecutive periods.
[The figure shows a chain of coordinators over the time periods t=1, t=2, …, t=S: an upper problem Miny f(y) + Q(y), an intermediate problem Minz eTz + Qz(z) and an inner problem Minx cTx, each of any type (LP, MIP, NLP or MINLP), linked downwards by the primal decisions (y, then z) and upwards by the Benders cuts built from the dual variables.]
Figure 7. Multilevel Nested (Dynamic) Benders
1.6. Multilevel Partition & Decomposition Benders Theory
There are many cases in which the combination of the Benders Partition and Benders Decomposition theories can be applied to atomize large problems, speeding up the solution time. This topic will be studied in the Chapter “The Future: Mathematical Programming 4.0” (Velásquez, 2019).
2. Techno-Economic Interpretation
There are several aspects that should consider when interpreting BT as a simple mathematical artifice instead
conceptualize it as a systemic vision of the organizations and business/industrial processes. These interpretations are
general of large-scale methodologies, they don’t depend of BT.
2.1. Taxonomy of Organizations
From the economic point of view, large scale methodologies allow to analyze systems at micro and macro level.
Holmberg (1995) analyzes the relationships between the mathematical structures encountered in optimization problems and organizational structures, drawing a parallel between organization charts, information flows and hierarchical algorithms, which give rise to different interpretations depending on the mathematical methodology.
A coordinator problem resembles the functions of headquarters that interact with the coordinated subproblems that
represent the subsidiaries, either by setting prices to use common resources (dual or "price-directive decomposition",
like Lagrangean Relaxation, LR) or by fixing the level of activities common to all dependencies (primal or "resource-
directive decomposition", like Benders Decomposition).
The proposals made by the “headquarters” are analyzed by the subsidiaries, who generate new information: the level of activity in LR, and the marginal costs/benefits in BT. Based on the information obtained from the subsidiaries, the “headquarters” makes a new proposal, fixing new prices or allocating new quantities.
Thus, the Benders coordinator represents an authority (a manager, a system operator, … ) that assigns resources to many agents in a market or in a supply chain. The first level defines the vector y of resources assigned to the agents (sectors, factories, departments, … ); the second level is a subproblem related to the agents that generates information about the marginal productivity of each resource for each agent, or the prices that the agents can pay for the resources, represented by the dual variables.
2.2. Cobb-Douglas Production Functions
In economics, the Cobb-Douglas (1928) function is a production function (Q), widely used to represent the relationship between one or several final products and the use of technology (T), labor (L) and capital (K) inputs. This type of aggregate modeling permits estimating a country's production function, as well as its expected economic growth, that is
Q = f (K, T, L, … ) (42)
This concept is also applicable to major industries. The following graph, taken from Wikipedia, presents a typical Cobb-Douglas function (figure 8), dependent on labor (L) and capital (K). In the case of an industrial system that produces a single product, the “optimal” production function can be constructed from the parametric analysis of an optimization model that minimizes production cost for different values of the quantity produced (Q); the solution of the problem fixes the optimal quantities of resources (T, L, K) to be used. Two functions are essential in the decision-making process: i) the total cost and ii) the marginal cost; the cost of production is the result of integrating the marginal cost function from zero up to Q.
Figure 8. Cobb-Douglas Production Function (source: https://www.econowiki.com)
The marginal cost is associated with the dual variable of the demand/production (Q) constraint; for linear systems (such as the BT linear subproblems), it corresponds to a step function whose integral is a function defined by the intersection of the hyperplanes associated with each step of the marginal cost function. For a real model that has hundreds or thousands of final products, it is impossible to define the production function explicitly; mathematical optimization models allow determining points of the production function corresponding to optimal solutions for certain conditions of the production environment.
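The step shape of the marginal cost function, and the total cost as its integral, can be illustrated with a small merit-order sketch (the generator data below are hypothetical, purely for illustration; this is not a model from the chapter):

```python
# Marginal cost of a linear production system as a step function
# (merit order): capacity blocks are used cheapest-first, so the dual
# of the demand constraint jumps at each capacity breakpoint.
# Data are hypothetical, purely for illustration.
units = [(40.0, 10.0), (30.0, 25.0), (50.0, 60.0)]  # (capacity, unit cost), sorted by cost

def marginal_cost(q):
    """Cost of the block that serves the last increment of demand q."""
    for cap, cost in units:
        if q <= cap:
            return cost
        q -= cap
    raise ValueError("demand exceeds total capacity")

def total_cost(q):
    """Integral of the marginal-cost step function from 0 to q."""
    spent = 0.0
    for cap, cost in units:
        served = min(q, cap)
        spent += served * cost
        q -= served
        if q <= 0:
            break
    return spent

# the total cost is piecewise linear and convex; its slope is the marginal cost
assert total_cost(40.0) == 400.0      # 40 * 10
assert total_cost(55.0) == 775.0      # 400 + 15 * 25
assert marginal_cost(55.0) == 25.0
```

Integrating the steps reproduces exactly the piecewise-linear total cost function whose facets the Benders cuts recover one by one.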
[The figure plots the Benders cuts f(Capital, Workforce, …) ≥ Q(Productionk), k=1,2,3,4, as the hyperplanes whose envelope approximates the Cobb-Douglas production function f(Capital, Workforce, …) over Production.]
Figure 9. Economic Interpretation of Benders Theory – Cobb-Douglas Approximate Production Function
Then, in BT the subproblem of the second level is associated with a linear production system, and the set of Benders cuts partially represents the total cost function; a full representation would require knowing all the Benders cutting planes. In this case, the main activity of the decision makers is to find the part of the total cost function which allows making the best decision, so the decision-maker has a “smart oracle” (the subproblem) that answers its questions about marginal costs.
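This oracle view can be sketched in a few lines; the cost function Q below is a hypothetical convex stand-in for the subproblem, and the cuts collected at the queried points form the partial lower representation described above:

```python
# The "smart oracle" view: the decision maker only learns Q(y) through
# cuts (value + marginal cost) at the points it asks about; the max of
# the collected cuts is a partial, lower approximation of the total
# cost function. Q below is a stand-in convex cost, purely illustrative.
def oracle(y):
    q = (y - 3.0) ** 2 + 1.0         # total cost at y
    slope = 2.0 * (y - 3.0)          # marginal cost (subgradient) at y
    return q, slope

cuts = []                            # each cut: Q(y) >= q_k + slope_k * (y - y_k)
for y_k in (0.0, 5.0, 2.0):
    q_k, g_k = oracle(y_k)
    cuts.append((q_k, g_k, y_k))

def lower_model(y):
    """Outer approximation of Q built only from the cuts collected so far."""
    return max(q + g * (y - yk) for q, g, yk in cuts)

assert lower_model(3.0) <= oracle(3.0)[0]   # cuts never overestimate Q
assert lower_model(2.0) == oracle(2.0)[0]   # exact where the oracle was queried
```

The coordinator works only with `lower_model`, which is exact at queried points and a lower bound everywhere else, which is precisely why only the relevant part of the cost function needs to be discovered.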
2.3. Markets
An issue of special importance in markets, for example the electricity market, is the analysis of the market clearing in which multiple agents interact; in this case the structure of the system could be conceptualized as presented in the following diagram.
The mathematical problem that must be solved can be formulated as
P: = { Min z = ∑a=1,A caTxa + dTw |
D w + ∑a=1,A Ba ya = bDEM
Aa xa + Fa ya = ba a=1,A
xa∈R+ a=1,A ; ya∈R+ a=1,A ; w∈R+ } (42)
where the vector xa represents the agent decisions, ya the purchasing decisions of the Independent System Operator (ISO) and w the unattended demand; D, Ba, Aa and Fa are matrices that represent the topology and the technology of the agents, bDEM represents the demand of the market and ba the resources of each agent; ca represents the cost of agent a and d the deficit cost. This problem may be solved using Benders’ decomposition.
From the economic point of view, the set of hyperplanes that limits the variable Qa in the coordinator defines the supply function of the agent. Then, BT can be interpreted as a conversation between the ISO and each agent to get the information that will allow the ISO to build the supply function of each agent; if the ISO knew the supply functions in advance, it would not require this conversation and could determine the optimum strategy without resorting to an integral problem involving the explicit modeling of each agent; the ISO only requires asking the agents for the marginal cost, the offer price, for an amount ya. This corresponds to an auction process in which the ISO must obtain the lowest cost to clear the market. The following graph presents the decomposition scheme.
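The ISO/agent conversation can be sketched as a merit-order auction; the agents and their offer blocks below are hypothetical:

```python
# Sketch of the ISO/agent "conversation": each agent answers marginal-
# price queries for a quantity, which is exactly the step supply
# function that BT would recover through cuts. Agent data hypothetical.
agents = {
    "a1": [(20.0, 12.0), (20.0, 30.0)],   # (block size, marginal price)
    "a2": [(25.0, 18.0), (25.0, 45.0)],
}

def clear_market(demand):
    """Merit-order auction: allocate demand to the cheapest blocks first."""
    blocks = sorted(
        (price, size, name)
        for name, offer in agents.items()
        for size, price in offer
    )
    allocation, cost = {name: 0.0 for name in agents}, 0.0
    for price, size, name in blocks:
        take = min(demand, size)
        allocation[name] += take
        cost += take * price
        demand -= take
        if demand <= 0:
            break
    return allocation, cost

alloc, cost = clear_market(50.0)
assert alloc == {"a1": 25.0, "a2": 25.0}   # cheap blocks first, then a1's 30-priced block
assert cost == 20.0 * 12.0 + 25.0 * 18.0 + 5.0 * 30.0
```

The sorted block list is the aggregated step supply function; in BT the ISO would discover it incrementally, one cut per query, instead of knowing it in advance.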
2.4. Multisectoral Planning
In the Chapter “The Future: Mathematical Programming 4.0” (Velásquez, 2019c) some ideas are presented about the importance of multisectoral planning using the multilevel Benders Partition.
3. Generalizations & Extensions
This section introduces extensions of the Benders Theory aimed at extending its use to non-linear subproblems.
3.1. Generalized Benders Decomposition
3.1.1. Basic Theory
Geoffrion (1972) generalized the Benders approach (GBD, Generalized Benders Decomposition) to a broader class of programs in which the subproblem need no longer be a linear program. Nonlinear convex duality theory was employed to derive the equivalent master problem. In GBD the algorithm alternates between the solution of relaxed master problems and convex nonlinear subproblems. GBD considers an optimization problem P: composed of two types of variables: y, corresponding to the coordination variables, and x, the coordinated variables.
P: = { max f(x, y) | G(x, y) ≤ 0 ; x∈X ; y∈Y } (43)
To guarantee convergence, GBD restricts the model in x to a convex problem for any value of y∈Y; this space may be continuous or discrete. The problem P: can be split into two subproblems, one over y and another over x. If we define v(y) as the optimal value of the objective function of the problem on x for a given value of y:
v(y) = { maxx f(x, y) | G(x, y) ≤ 0 ; x∈X } (44)
Then, it is possible to formulate an equivalent problem CY:
CY: = { max v(y) |
y∈Y ; y∈Ψ ;
v(y) = { maxx f(x, y) | G(x, y) ≤ 0 ; x∈X } } (45)
where Ψ corresponds to the set of y for which there is a feasible solution x to P:, this is
Ψ = { y | ∃ x∈X : G(x, y) ≤ 0 } (46)
The subproblem SP(y): to evaluate v(y) is
SP(y): = { maxx f(x, y) | G(x, y) ≤ 0 ; x∈X } (47)
Consider the Lagrangean function of SP(y):
L*(x, λ | y) = f(x, y) - λT G(x, y) (48)
where λ corresponds to the vector of Lagrange multipliers, which in accordance with the Karush-Kuhn-Tucker conditions (KKT, Karush 1939, Kuhn and Tucker 1951) must be nonnegative and satisfies
∇y v(y) = ∇y L*(x, λ | y) (49)
At the optimal point of SP(y): the Lagrangean function is a maximum with respect to x. Additionally, v(y) is equal to
v(y) = L*(x, λ | y) = f(x, y) - λT G(x, y) (50)
All points (xk, λk), optimum-feasible in SP(yk):, obtained for any value yk, must satisfy
v(y) ≤ L*(xk, λk | y) = f(xk, y) - (λk)T G(xk, y) (51)
Therefore, the problem CY: can be written as
CY: = { max v(y) |
y∈Y ;
v(y) ≤ L*(xk, λk | y) = f(xk, y) - (λk)T G(xk, y) ∀k∈IF } (52)
where IF represents the set of optimum-feasible points that are known as a result of solving SP(y):.
SP(yk): can have three possible outcomes: i) unbounded, ii) feasible and optimal, and iii) infeasible. In the event of an unbounded solution of SP(yk): it can be concluded that P: is also unbounded. If SP(yk): has a feasible and optimal solution, it provides the information to generate an optimality cut in the feasible zone of y; this cut has the form
v(y) ≤ f(xk, y) - (λk)T G(xk, y) (53)
If SP(yk): has no feasible solution, a cut should be included to account for the relationship between the feasibility area of y and the feasibility area of x. The infeasibility implies that it is not possible to satisfy
G(x, yk) ≤ 0 (54)
for all the vectorial functions that define the constraints. This means that, for at least one restriction,
gi(x, yk) > 0 ∀x∈X (55)
The feasibility condition that should be imposed on y is expressed as
L̄*(μk | y) = minx∈X (μk)T G(x, y) ≤ 0
μk ≥ 0 ; ∑i μik = 1 (56)
The GBD can be applied to an integrated subproblem or to a subproblem with several problems of the same hierarchy.
The rules for the decomposition are like the principles considered in standard Benders decomposition (figure 10).
[Static case: the coordinator MaxY q(y) s.t. y∈Y, with optimality cuts q(y) ≤ f(xk,y) - (λk)T G(xk,y), k∈IF, and feasibility cuts (μk)T G(xk,y) ≤ 0, μk ≥ 0, ∑i μik = 1, k∈INF, exchanges its primal variables yk downwards and receives primal and dual variables (xk, λk) from the subproblem Maxx f(x,yk) s.t. G(x,yk) ≤ 0, x∈X. Dynamic case: the same scheme with several subproblems Maxx fi(xi,yk) s.t. Gi(x,yk) ≤ 0, xi∈X, of the same hierarchy, i=1,…,m, the coordinator being MaxY ∑i qi(y) s.t. y∈Y; each problem may be of any type (LP, MIP, NLP or MINLP).]
Figure 10. Generalized Benders Decomposition
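A minimal sketch of the GBD loop on a toy convex problem, small enough that the subproblem is solvable in closed form and the master by grid search (the problem and its data are illustrative assumptions, not from the chapter):

```python
# A minimal Generalized Benders loop on the toy convex problem
#   max -x^2 - y^2  s.t.  1 - x - y <= 0,  y in [0, 1],
# whose optimum is x = y = 0.5 with value -0.5. The subproblem in x is
# solved in closed form; the master maximizes the cut envelope on a grid.
# This is a sketch of the GBD mechanics, not a general implementation.
def subproblem(y):
    x = max(0.0, 1.0 - y)            # argmax of -x^2 subject to x >= 1 - y
    lam = 2.0 * x                    # KKT multiplier of 1 - x - y <= 0
    value = -x * x - y * y
    return x, lam, value

cuts = []                            # v(y) <= -x_k^2 - y^2 - lam_k * (1 - x_k - y)
grid = [i / 500.0 for i in range(501)]
y = 0.0
for _ in range(10):                  # GBD iterations
    x_k, lam_k, v_k = subproblem(y)
    cuts.append((x_k, lam_k))

    def envelope(yy):                # master objective: minimum over all cuts
        return min(-x * x - yy * yy - l * (1.0 - x - yy) for x, l in cuts)

    y = max(grid, key=envelope)      # master: maximize the cut envelope

x_star, _, v_star = subproblem(y)
assert abs(y - 0.5) < 1e-2 and abs(v_star + 0.5) < 1e-2
```

Each optimality cut is (53) evaluated at the last subproblem solution; after three iterations the envelope already pins the master at y = 0.5.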
To show the efficiency of GBD, Cai et al. (2001) applied it to solve Nonconvex Nonlinear Programming (NLP) problems that arise frequently in water resources management (e.g., reservoir operations, groundwater remediation, and integrated water quantity and quality management). Such problems are usually large and sparse. The software existing in 2001 for global optimization could not cope with problems of this size, while the local sparse GAMS NLP solvers of the time, MINOS and CONOPT, could not guarantee a global solution. Cai et al. implemented GBD using a cut approximation proposed by Floudas et al. (1989) and Floudas (1995); they introduce slack variables, penalizing these slacks in the objective function. If the complicating variables are carefully selected, GBD leads to solutions with excellent objective values in run times much lower than GAMS using a non-linear solver. They concluded that GBD can be used to search for at least approximate global solutions to models with nonlinear and nonconvex constraints. The following table shows the comparison; the mean speed-up of GBD is 23.0 with respect to MINOS and 5.2 with respect to CONOPT (table 2).
Table 2. Speed-up Generalized Benders Decomposition (Cai et al., 2001)

Case        GBD (secs)   MINOS-5 (secs)   Ratio   CONOPT-2 (secs)   Ratio
Case 4-1        20.5           70.6         3.4         25.9          1.3
Case 4-2        18.6          739.7        39.8        136.6          7.3
Case 4-3        23.9          536.6        22.5        202.7          8.5
Case 4-4        19.8          523.9        26.5         74.8          3.8
MEAN            20.7          467.7        23.0        110.0          5.2
3.2. Benders Integer Linear Subproblems
One of the methods used to enable BT to handle integer subproblems is the incorporation of Gomory Cutting Planes (GCP).
The method of Gomory (1958, 1960) to generate cutting planes is a procedure for obtaining integer solutions using a modified continuous linear algorithm; it works by solving, initially, a continuous linear problem and then checking the solution found: if it is integer, it is the optimal integer solution; if it is not, a new restriction (GCP) that cuts off the continuous solution obtained is added, but the GCP doesn't cut any integer point of the original feasible region. This is repeated until an integer solution is obtained. The GCPs redefine the continuous feasible zone as an "integer convex hull" containing all the integer solutions, whose extreme points, at the optimum, correspond to an integer solution.
Figure 11. Gomory Cutting Planes – Integer Convex Hull (source: https://www.semanticscholar.org/)
In the case of subproblems with integer/binary variables, we can use Gomory's concepts to build a linear continuous sub-problem equivalent to the MIP sub-problem. In the solution of the sub-problem, the first step is to solve the linear problem relaxing the integer/binary character of the variables; if the solution is integer, the control returns to the coordinator problem; if it is not, GCPs are introduced; when the integer solution is obtained, a Benders cutting plane is generated in the coordinator problem. The values of the sub-problem dual variables are valid because they were generated by an equivalent continuous problem, and the generated Benders cut is valid. Figure 12 presents a flowchart of the implementation.
[The flowchart shows the coordinator Miny f(y) + Q(y) s.t. F0(y) = b0, y∈S, with cuts Q(y) ≥ πk(b - F(y)), k=1,ITERATIONS, sending its primal variables yk to the subproblem Min cTx s.t. Ax = b - F(y), x∈R, augmented with the Gomory cuts Gk x + Hk w = dGk. If the subproblem solution xk is not integer, new Gomory cuts Gk+1 x = dGk+1 are generated and the subproblem is re-solved; when xk is integer, the dual variables return to the coordinator to build the Benders cut. Each block may be LP, MIP, NLP or MINLP.]
Figure 12. Benders-Gomory Cuts
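The rounding idea behind the GCP can be sketched with a Chvátal-Gomory cut, the simplest member of the family (a full Gomory fractional cut from the simplex tableau is more involved; the inequality and multiplier below are a hypothetical illustration, not from the chapter):

```python
import math

# A Chvatal-Gomory rounding cut: any nonnegative combination u of the
# rows of A x <= b stays valid for integer x >= 0 after rounding down
# both the coefficients and the right-hand side. Illustrative numbers.
def cg_cut(rows, rhs, u):
    """Return (coeffs, rhs) of the rounded cut floor(uA) x <= floor(ub)."""
    n = len(rows[0])
    comb = [sum(u[i] * rows[i][j] for i in range(len(rows))) for j in range(n)]
    comb_rhs = sum(u[i] * rhs[i] for i in range(len(rows)))
    return [math.floor(c) for c in comb], math.floor(comb_rhs)

# max x1 + x2  s.t.  2 x1 + 2 x2 <= 3,  x integer.
# The LP relaxation allows x1 + x2 = 1.5; the CG cut with u = (0.5,)
# yields x1 + x2 <= 1, cutting the fractional optimum but no integer point.
coeffs, b = cg_cut([[2, 2]], [3], [0.5])
assert (coeffs, b) == ([1, 1], 1)
```

Repeating this tightening until the relaxed subproblem solves integer is what makes the dual variables returned to the coordinator valid.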
There are several algorithms to solve MIP subproblems; Ralphs and Hassanzadeh (2014) (table 3) present a review of the main approaches made until 2014 on the solution of two-stage stochastic mixed linear programming (2SLP:) models that have mixed and/or binary variables, including the case of a pure integer second-stage problem.
2SLP: = { Min z = fTy + ∑i=1,N qi ciTxi |
Ai xi = bi - Fi y , i=1,N
xi∈R+ , i=1,N ; y∈S } (57)
Table 3. Ralphs and Hassanzadeh – Report 2014
(columns: 1st stage Real/Integer/Binary; 2nd stage Real/Integer/Binary; Stochasticity in Matrix Ai, Matrix Fi, RHS bi, Cost ci)

Paper
Laporte and Louveaux (1993)
Carøe and Tind (1997)
Carøe and Tind (1998)
Carøe and Schultz (1998)
Schultz et al. (1998)
Sherali and Fraticelli (2002)
Ahmed et al. (2004)
Sen and Higle (2005)
Sen and Sherali (2006)
Sherali and Zhu (2006)
Kong et al. (2006)
Sherali and Smith (2009)
Yuan and Sen (2009)
Ntaimo (2010)
Gade et al. (2012)
Trapp et al. (2013)
Ralphs and Hassanzadeh (2014)
Gade et al. (2014) (table 4) present the speed-up provided by their proposed methodology compared with the CPLEX solver; the application problem was the Stochastic Server Location Problem (SSLP). It is evident that Benders-ISP is faster than CPLEX.
Table 4. Benders-Gomory Cuts for Two-Stage Stochastic Integer Subproblems

Problem            CPLEX Time (secs)   GAP (%)   Benders-ISP Time (secs)   GAP (%)   Time Ratio
SSLP-5-25-50              2.03            0               0.18                0         11.28
SSLP-5-25-100             1.72            0               0.22                0          7.82
SSLP-5-50-50              1.06            0               0.27                0          3.93
SSLP-5-50-100             3.56            0               0.48                0          7.42
SSLP-5-50-1000          212.64            0               2.88                0         73.83
SSLP-5-50-2000         1020.54            0               5.73                0        178.10
Mean                    206.93            0               1.63                0        127.21
SSLP-10-50-50           801.49            0.01          109.20                0.02       7.34
SSLP-10-50-100         3667.22            0.10          218.42                0.02    > 16.79
SSLP-10-50-500         3601.32            0.38          740.38                0.03     > 4.86
SSLP-10-50-1000        3610.06            3.56         1615.42                0.02     > 2.23
SSLP-10-50-2000        3601.55           18.59         2729.61                0.02     > 1.32
Mean                   3056.33            4.528        1082.61                0.022    > 2.82
3.3. Benders Dual Decomposition
Rahmaniani et al. (2018a) formulate the so-called Benders Dual Decomposition (BDD), in which Lagrangean duality is applied to a reformulation of the subproblem to price out the coupling constraints (z = y) that link the local copies z to the master variables y. This allows imposing the integrality requirements on the copied variables to obtain MIP subproblems, which are comparable to those defined in Lagrangean Dual Decomposition (LDD, Ruszczynski 1997, Rush and Collins 2012, Ahmed 2013).
3.3.1. Benders Dual Decomposition Theory
The sub-problem considered in BDD is
SPBDD(y): = { min Q(y) = cTx | Ax = b - F(z) ; x∈R+ ; F0(z) = b0 ; z = y ; z∈R } (58)
The optimality cut can be formulated as
Q(y) ≥ cTxk + (y - zk)T λk (59)
where λ represents the vector of dual variables of the coupling constraint (z = y) and defines a sub-gradient of the objective function. If the subproblem SPBDD(y): has no feasible solution, the feasibility cut can be written as
0 ≥ 1Tvk+ - 1Tvk- + (y - zk)T λk (60)
where 1 is a vector with all elements equal to one and vk+ and vk- are vectors of artificial variables of the following problem
FSPBDD(y): = { Minx,z,v 1Tv+ + 1Tv- | Ax + (v+ - v-) = b - F(z) ; F0(z) = b0 ;
z = y ; x∈R+ ; z∈R } (61)
The coordinator problem can be formulated as
CYBDD: = { min z = f(y) + Q(y) | F0(y) = b0 , y∈S
Q(y) ≥ cTxk + (y - zk)T λk ∀k∈IT
0 ≥ 1Tvk+ - 1Tvk- + (y - zk)T λk ∀k∈IN } (62)
It is important to note that the optimality and feasibility cuts include values of the subproblem primal variables, x and z, as in the case of GBD; it is then possible to call this type of cuts Generalized Benders Cuts (GBC), cuts that include dual and primal variables of the subproblems. This approach produces stronger cuts than standard Benders.
Because of obtaining these MIP subproblems, the BDD strategy efficiently mitigates the primal and dual inefficiencies of the BT method. Also, in contrast to the LDD method, BDD does not require an enumeration scheme (e.g., branch-and-bound) to close the duality gap.
Furthermore, the BDD strategy enables a faster convergence of the overall solution process. In summary, the main contributions of BDD are the following:
▪ Proposing a family of strengthened optimality and feasibility cuts that dominate the classical Benders cuts at
fractional points of the MP;
▪ Showing that the proposed feasibility and optimality cuts can give the convex hull representation of the MP at the
root node, i.e., no branching effort being required;
▪ Producing high quality incumbent values while extracting the optimality cuts; and
▪ Developing numerically efficient implementation methodologies for the proposed decomposition strategy and
presenting encouraging results on a wide range of hard combinatorial optimization problems.
3.3.2. Benders Dual Decomposition Implementation
Below, the cases analyzed by Rahmaniani et al. (2018) to test the convergence and the speed-up of BDD methodology
are presented. Three type of problems were tested:
▪ FMCND (Fixed-charge Multicommodity Capacitated Network Design)
▪ CFL-S (Stochastic Capacitated Facility Location)
▪ SNI (Stochastic Network Interdiction, Pan and Morton, 2008)
In all methods, cuts (both feasibility and optimality) are generated by solving each subproblem within an optimality
gap of 0.5%. Moreover, to generate the Lagrangean cuts for the FMCND and CFL instances, Partial Relaxed
Subproblems approach is applied (some of the integers variables are relaxed).
Rahmaniani et al. (2018a) implemented four variants of the strategy:
▪ BDD1: uses the strengthened Benders cuts by imposing the integrality requirements on all the copied variables
▪ BDD2: uses the strengthened Benders cuts by imposing the integrality requirements on a subset of the copied
variables
▪ BDD3: like BDD1 but also generates Lagrangean cuts
▪ BDD4: like BDD2 but also generates Lagrangean cuts.
Table 5 shows the relative speed-up of the four BDD variations compared with standard Benders (BT).
Table 5. Benders Dual Decomposition (Rahmaniani et al., 2018a)

            BT               LDD              BDD1             BDD2             BDD3             BDD4
Problem   Gap%  Time(s)   Gap%   Time(s)   Gap%  Time(s)   Gap%  Time(s)   Gap%  Time(s)   Gap%  Time(s)
FMCND    20.66  181.48    9.01  3129.79   16.36  574.21   16.62  577.1    6.00  2240.46   5.83  2065.15
CFL-S    18.61   60.37   10.3   3679.93   17.81  205.65   17.82  185.48   1.47  1877.97   1.23  2134.24
CFL      20.17    1.8     0.09     0.08   19.83    2.28   19.82    2.12   3.2    112.39   5.28   148.76
SNI      29.68  130.1    27.12  3832.34   29.67  176.9    29.67  156.62  20.7   1111.22  20.7   1134.86
Table 6 shows the relative speed-up of the two BDD variations compared with CPLEX; the GAP tolerance was 1%.
Table 6. Benders Dual Decomposition versus CPLEX (Rahmaniani et al., 2018a)

         CPLEX                          Ratio vs BBD3     BBD4                          Ratio vs BBD3     BBD3 (Reference)
Case     Time(s)   Gap(%)   #Sol.      Time    GAP       Time(s)  Gap(%)   #Sol.      Time    GAP       Time(s)  Gap(%)   #Sol.
FMCND   11142.26    3.8     26/35      1.12    1.96      7992.16  1.66     30/35      0.80    0.86      9976.44  1.94     30/35
CFL-S    1261.71    0.07    16/16      3.34   28.00       210.88  0.0025   16/16      0.56    1.00       377.27  0.0025   16/16
SNI     36156.89   23.79     0/70      5.00   22.23      8356.91  1.15     57/70      1.16    1.07      7226.1   1.07     52/70
3.4. Logic Based Benders Decomposition
Logic-Based Benders Decomposition (LBBD) was introduced by Hooker and Yan (1995) in the context of logic circuit verification. The idea was formally developed by Hooker (2000) and applied to 0-1 programming by Hooker and Ottosson (2003). In LBBD, the Benders cuts are obtained by solving the inference dual of the subproblem, of which the linear programming dual is a special case.
Fortunately, the idea of Benders Decomposition can be extended to an LBBD form that accommodates an arbitrary subproblem, such as a discrete scheduling problem. Unlike classical Benders, LBBD provides no standard scheme for generating Benders cuts. Cuts must be devised for each problem class, including the dependence on the objective functions. In the Chapter “Logic-based Benders Decomposition for Large-scale Optimization”, LBBD is presented directly by Professor Hooker (2019).
4. Dynamic and Stochastic Benders’ Theory
For this topic, the reader is invited to review the chapters:
▪ Stochastic Programming and Risk Management: Fundamentals (Velásquez, 2019a).
▪ Stochastic & Dynamic Benders Theory (Velásquez, 2019b).
5. Coordinator Enhancements
Mixed Integer Linear and Non-Linear Programs (MIP/MINLP) involving logical implications modelled through integer or binary variables and big-M coefficients are among the hardest to solve. This section covers improvements that can be applied to accelerate the solution of Benders coordinators that include discrete variables.
5.1. MIP/MINLP Coordinators
A MIP/MINLP coordinator CYBT: can be formulated as
CYBT: = { min z = f(y) + Q(y) | F0(y) = b0 ; y∈INTEGERS
Q(y) ≥ (πk)T[b - F(y)] ∀k∈IT
0 ≥ (μk)T[b - F(y)] ∀k∈IN } (63)
The solution of a MIP/MINLP problem can be divided into three stages: i) search for feasibility, ii) search for optimality, and iii) proof that the feasible solution is optimal.
[The figure plots the lower bound LB (dual) and the upper bound UB (primal) over time for two parameter settings (no cuts, and optimality emphasis), marking the three stages: 1) searching feasibility, 2) searching optimality, 3) probing optimality.]
Figure 13. Convergence of MIP/MINLP problems - Vehicle Routing Problem
The behavior of the algorithm used to solve a MIP/MINLP depends on the specific problem, it being "impossible" to characterize a general behavior. Figure 13 presents two possible behaviors of the solving process for the VRP (Vehicle Routing Problem) using two different sets of parameters with the CPLEX solver; it is easily seen that the parameters used in the solver, and not only the type of problem, affect the MIP-GAP. In several cases, the main problem of the MIP/MINLP coordinators is related to the large times required to prove the optimality of a feasible solution, which arises from the amount of time spent in each stage. To improve the behavior of the coordinator, which tends to be similar for families of problems, the modeler must know this behavior in order to apply the most appropriate improvements, considering that it is not possible to have a dominant approach.
The improvement of the coordinator can be considered from three points of view:
▪ Temporary relaxation of the discrete character of the coordinator problem
▪ Modify the standard cuts to improve the re-optimization when inserting a cut
▪ Stop the optimization process when the solver has a feasible solution and the gap is small but greater than zero
These enhancements can be used individually or collectively; the decision is based on the empirical experience of the modeler and not on formal mathematical proofs.
5.1.1. Multi-Phase Coordinator
Considering that the Benders cuts generated for a relaxed MIP/MINLP coordinator are valid for the MIP/MINLP coordinator, this improvement is based on dividing the optimization process into, at least, two phases (figure 14):
▪ Phase 1 relaxes the integer character of the coordinator to quickly derive valid cuts; during this phase the LP relaxation of the MP is solved with the classical Benders cuts. The oldest reference the author knows about this strategy is presented by McDaniel and Devine (1977); it has become one of the main methods used to efficiently apply the Benders algorithm on numerous MIP/MINLP applications, see Rahmaniani et al. (2017).
▪ Phase 2 solves integer problems without relaxation to strengthen optimality.
This is motivated by the fact that, at the initial iterations of BT, the master solutions are usually of very low quality. At this point, the derived cuts provide a poor approximation of the optimal objective function; the idea is that the relaxed model is solved in a "short time" and can generate a "large" number of cuts which serve to give "contour" to the subproblem objective function represented by the Benders cuts. The process is convergent, but the moment at which to pass from the 1st phase to the 2nd relies on empirical knowledge; a valid approximation may be the size of the gap in the Benders cycle.
However, a problem that should be kept in mind is the possibility of infeasibilities, or of irrationalities, in the subproblems, since the primal solutions provided by the relaxed coordinator are not integer; when this occurs, an alternative is to relax the integrality requirements only on a subset of the integer variables.
[First iterations: the coordinator Miny f(y) + Q(y) s.t. F0(y) = b0, y∈INTEGERS, with cuts Q(y) ≥ πk(b - F(y)), is solved as a relaxed LP while the subproblem Min cTx s.t. Ax = b - F(y), x∈R+, returns dual variables to build cuts. Final iterations: the same coordinator is solved as a MIP/MINLP with all the accumulated cuts.]
Figure 14. Enhancements MIP/MINLP Benders Coordinators - Relaxing Coordinator
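The two-phase idea can be sketched on a toy problem; the subproblem below is solved in closed form and both masters by enumeration, so this shows only the mechanics of reusing phase-1 cuts, not the McDaniel-Devine algorithm itself (all data are illustrative assumptions):

```python
# Two-phase Benders on the toy problem
#   min y + Q(y),  Q(y) = min { 3x : x >= 7 - y, x >= 0 },
# whose optimum is y = 7 with total cost 7. Phase 1 runs the master over
# a continuous relaxation of y; its cuts stay valid when phase 2
# switches to integer y, so phase 2 starts with a warm cut pool.
def subproblem(y):
    x = max(0.0, 7.0 - y)
    pi = 3.0 if x > 0 else 0.0        # dual of the constraint x >= 7 - y
    return 3.0 * x, pi

def solve_master(candidates, cuts):
    def total(y):
        eta = max([p * (7.0 - y) for p in cuts] + [0.0])  # cut envelope
        return y + eta
    best = min(candidates, key=total)
    return best, total(best)

cuts = []
relaxed = [i / 100.0 for i in range(1001)]    # phase 1: continuous y in [0, 10]
integer = [float(i) for i in range(11)]       # phase 2: integer y

for phase in (relaxed, integer):
    for _ in range(10):
        y, lower = solve_master(phase, cuts)
        q, pi = subproblem(y)
        if y + q <= lower + 1e-9:             # upper bound meets lower bound
            break
        cuts.append(pi)

assert y == 7.0 and y + subproblem(y)[0] == 7.0
```

Here phase 1 already produces the cut that makes the integer master converge in its first iteration, which is exactly the intended effect of the relaxation phase.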
Rahmaniani et al. (2018a) propose a three-phase strategy to generate the Benders cuts. The proposed multi-phase implementation works as follows.
▪ Phase 3 generates Lagrangean cuts. To do so, the Lagrangean dual problem SPBDD(y) is solved, defining a trust region (stabilization) for the Lagrangean multipliers. Details of this methodology are in Rahmaniani et al. (2018a).
5.1.2. Modified Optimality Cuts
To accelerate the optimization process, mainly for MIP/MINLP problems, the standard Benders cuts can be reformulated as
Q(y) + QAk ≥ (πk)T[b - F(y)]
QBk ≥ (μk)T[b - F(y)]
QAk ≥ 0 ; QBk ≥ 0 (64)
where QAk and QBk represent artificial variables that ensure that the solution yk remains feasible for the cut included in iteration k+1 (QAk+1 > 0 or QBk+1 > 0), so it can be used as the starting point of the search in iteration k+1; then penalizations must be included in the objective function of the coordinator problem, this is
CYBT: = { min z = f(y) + Q(y) + ∑k∈IT ∞·QAk + ∑k∈IN ∞·QBk | F0(y) = b0 ; y∈S
Q(y) + QAk ≥ (πk)T[b - F(y)] ∀k∈IT
QBk ≥ (μk)T[b - F(y)] ∀k∈IN } (65)
The objective of this enhancement is to reduce the time needed to reach feasibility in the coordinator problem.
5.1.3. Inexact Solutions
Inexact solutions of the master problem are related to the MIP-GAP in the coordinator; often the GAP is rapidly reduced, but the solver consumes much time to prove that the solution obtained is optimal or to get a "slightly better" solution that may be optimal. In this case BT can work with a dynamic tolerance, qk, that changes as the optimization process advances; this can be expressed as:
[ { f(yk) + Q(yk) } - { f(y*) + Q(y*) } ] / [ f(y*) + Q(y*) ] ≤ qk (63)
where yk is in the feasible qk zone near the optimal point y*; the series of values qk must be positive and tend to zero as the iterations (k) increase.
q1 ≥ q2 ≥ q3 ≥ … ≥ qk ≥ … ≥ q∞ = 0
The main idea is to quickly generate good tentative master solutions that can be used to obtain “good” Benders cuts in the subproblems, based on two main guidelines: i) cuts should be generated with a reasonable computational effort, and ii) cuts should be similar to those that would be obtained with an exact solution of the master problem.
Costa et al. (2012) developed a Benders approach based on “inexact solutions”, which they called Benders with extra-cuts, and applied it to the Fixed-charge Network Design (FND) problem; a total of 54 instances were used in the experiments. Table 7 shows the summary of the speed-up generated by the inexact solutions (BT-IS).
Table 7. Speed-up Inexact Solutions (Costa et al., 2012)

Solved (BT / BT-IS)   BT Time (secs)    BT-IS Time (secs)   Ratio BT/BT-IS (times)
Yes / Yes                  6081.02            1617.19              3.76
No / No                   21600               7910.26              2.73

Solved (BT / BT-IS)   BT GAP (%)        BT-IS GAP (%)       Ratio BT/BT-IS
No / No                      49.45              27.59              1.79
The results are presented in three groups according to whether the problem was solved. Figure 15 presents the relationship between the solution times, which increases linearly as a function of the complexity.
[The figure plots the CPLEX solution time (secs) against the BT-IS solution time (secs); the fitted regression is y = 4.9651x - 44.285 with R² = 0.9538.]
Figure 15. Relation CPLEX versus Inexact Solutions
5.1.4. Inexact Cuts
Early termination of the subproblems generated during the BT iterations produces valid cuts (if we are working with a dual feasible algorithm, which preserves dual feasibility in all iterations) which are inexact in the sense that they are not as constraining as the cuts derived from an exact solution. This approach is equivalent to relaxing the primal feasible zone of the sub-problem by a factor εk; the following formulation shows the differences.
SPBT(y): = { min Q(y) = cTx | Ax - [ b - F(y) ] = 0 ; x∈R+ } (64)
SPBT-IC(y): = { min Q(y) = cTx | -εk ≤ Ax - [ b - F(y) ] ≤ εk ; x∈R+ } (65)
where εk is the subproblem feasibility tolerance, which must be positive and tend to zero as the iterations (k) increase:
ε1 ≥ ε2 ≥ ε3 ≥ … ≥ εk ≥ … ≥ 0 (66)
Philpott et al. (1996) present an algorithm and its convergence conditions; Zakeri et al. (1999) use a primal-dual interior point algorithm (baropts) to make experiments with a model for the stochastic planning of the hydro-electric power generation systems of New Zealand. Table 8 shows the results; the gain of BT-IC is evident.
Table 8. Speed-up Inexact Cuts (Zakeri et al., 1999)

Problem   BT (secs)   BT-IC (secs)   Ratio BT/BT-IC (times)   Improvement (BT - BT-IC)/BT (%)
P1            170           68              2.50                      60.00
P2            261          159              1.64                      39.08
P3            124          109              1.14                      12.10
P4            640          398              1.61                      37.81
P5            594          546              1.09                       8.08
P6            626          585              1.07                       6.55
P7            324          150              2.16                      53.70
P8            376          304              1.24                      19.15
P9           1207         1087              1.11                       9.94
P10           979          780              1.26                      20.33
P11           150          134              1.12                      10.67
Total        5451         4320              1.26                      20.75
5.1.5. Combinatorial Benders Cuts
Codato and Fischetti (2006) proposed a generic problem reformulation, of quite general applicability, aimed at removing the model dependency on the big-M coefficients used in standard MIP formulations.
The master solutions are sent to a slave linear problem, which validates them and possibly returns combinatorial inequalities to be added to the coordinator. The inequalities are associated with minimal (or irreducible) infeasible subsystems of a certain linear system and can be separated efficiently when the coordinator solution is integer. The overall solution mechanism resembles Benders Partitioning, but the cuts produced are purely combinatorial. This produces an LP relaxation of the coordinator problem which can be considerably tighter than the one associated with the big-M formulation.
For ease of explanation, consider initially the following problem P:, which does not include the subproblem variables in the objective function:
P: = { min z = f(y) | F0(y) = b0 ; Ax + F(y) = b ; x ∈ R+ ; y ∈ {0,1} } (67)
In this case the subproblem is formulated as
SPBT(y): = { min Q(y) = 0Tx | Ax = b − F(y) ; x ∈ R+ } (68)
Because the subproblem SPBT(y): has no meaningful objective function, any feasible solution solves it. If the subproblem has no feasible solution, the combination of coordinator binary variables yk is infeasible, and a feasibility cut that eliminates that combination must be included in the coordinator; if the subproblem has a feasible solution, then the combination yk is the optimal solution of problem P:. The Benders feasibility cut may be formulated as
Σi∈CIT(k) yi ≤ | CIT(k) | − 1 (69)
where the index i represents the i-th binary variable and CIT(k) the set of binary variables equal to 1 in cycle k of the algorithm; this cut forces at least one of the variables equal to 1 in yk to become zero. The CBC thus substitutes the original Benders feasibility cut. In case the P: objective function includes the vector y, the optimality Benders cut is equal to the standard Benders optimality cut. The formulation is:
CYBT: = { min z = f(y) + Q(y) | F0(y) = b0 ; y ∈ S
Q(y) ≥ (πk)T[b − F(y)] ∀k ∈ IT
Σi∈CIT(k) yi ≤ | CIT(k) | − 1 ∀k ∈ IN } (70)
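A minimal sketch of generating the combinatorial feasibility cut (69) from a coordinator binary solution; the function names and the tuple-based cut representation are illustrative:

```python
# Given the coordinator 0/1 solution y_k, CIT(k) collects the indices fixed
# to 1, and the cut  sum_{i in CIT(k)} y_i <= |CIT(k)| - 1  forbids setting
# all those variables to 1 again.

def combinatorial_cut(y_k):
    """Build the cut (support, rhs) that eliminates the 0/1 vector y_k."""
    support = [i for i, v in enumerate(y_k) if v == 1]   # CIT(k)
    return support, len(support) - 1                     # rhs = |CIT(k)| - 1

def satisfies(cut, y):
    support, rhs = cut
    return sum(y[i] for i in support) <= rhs

cut = combinatorial_cut([1, 0, 1, 1, 0])
assert not satisfies(cut, [1, 0, 1, 1, 0])   # the infeasible combination is cut off
assert satisfies(cut, [1, 0, 0, 1, 0])       # flipping one variable restores feasibility
```

Note that (69) is purely combinatorial: no subproblem coefficients or big-M constants enter the cut, which is precisely the point of the reformulation.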
Codato and Fischetti compared the CBC with CPLEX v8.1 in terms of execution times and integrality gaps, computed with respect to the best integer solution found, for three cases:
1. Instances solved to proven optimality by CBC but not by CPLEX (table 9)
2. Instances solved to proven optimality by both CBC and CPLEX (table 10)
3. Instances not solved (to proven optimality) by either CBC or CPLEX
Notice that no instance was solved by CPLEX but not by CBC.
Table 9. NP-hard Problems Solved only by CBC. (Codato and Fischetti, 2006)

Problem               CPLEX GAP (%)
Statistical Classification
Chorales-134          51
Chorales-107          57
Breast-Cancer-600     99
Bridges-132           85
Mech-analysis-152     45
Monks-tr-124          70
Monks-tr-115          69
Solar-flare-323       90
BV-OS-376             65
BusVan-445            77
MEAN                  70.8
Map Labelling
CMS-600-0             1.35
CMS-650-0             1.88
CMS-650-1             0.46
CMS-700-1             2.04
CMS-750-1             1.63
CMS-750-4             1.9
CMS-800-0             3.49
CMS-800-1             2.04
Railway               8.42
CMS-600-0             10.5
CMS-600-1             6.19
MEAN                  3.63
Table 10. NP-hard Problems Solved by CPLEX & CBC. (Codato and Fischetti, 2006)

Problem               CPLEX      CBC       Time Ratio CPLEX/CBC
Statistical Classification
Chorales-116          1:24:52    10:18     8.2
Balloons76            0:10       0:14      71.4
BCW-367               8:33       0:13      39.4
BCW-683               2:02:29    0:32      229.7
WPBC-194              57:17      3:32      16.2
Breast-Cancer-400     2:50       0:16      1062
Glass-163             56:17      0:05      675.4
Horse-colic-151       4:50       0:23      12.6
Iris-150              9:29       1:10      8.1
Credit-300            19:35      0:02      587.5
Lymphography-142      0:11       0:01      11
Mech-analysis-107     0:05       0:01      5
Mech-analysis-137     7:44       0:27      17.2
Monks-tr-122          2:05       0:05      25
Pb-gr-txt-198         4:21       0:05      52.2
Pb-pict-txt-444       2:07       0:02      63.5
Pb-hl-pict-277        4:17       0:27      9.5
Postoperative-88      15:16      0:01      916
BV-OS-282             5:13       0:24      13
Opel-Saab-80          1:03       0:13      4.8
Bus-Van-437           9:17       0:28      19.9
HouseVotes84-435      4:59       0:11      27.2
Water-treat-206       1:10       0:06      11.7
Water-treat-213       17:00      0:51      20
MEAN                  18:23      00:50     21.93
Map Labelling
CMS-600-1             1:08:41    0:04:34   15
Computational results indicate that Combinatorial Benders Cuts (CBC) produce a reformulation which can be solved some orders of magnitude faster than the original MIP model using one of the best commercial solvers (CPLEX). Figure 16 shows two detailed cases.
Figure 16. Speed-up Combinatorial Benders Cuts (case: no cost for subproblem variables). [Panels: BRIDGES-132, CPLEX/CBC ≈ 9.16; CHORALES-116, CPLEX/CBC >> 45.]
5.2. Trust Region (Regularization)
Since BT iteratively solves a non-differentiable convex optimization problem, because the function Q(y) is piecewise linear, mathematical conditions related to the sub-gradients (or super-gradients) of Q(y) can be considered. From this point of view, regularization conditions on the step size of the algorithm should be imposed to avoid oscillations and obtain stronger convergence properties.
The first variation to consider is the proposal by Linderoth and Wright (2001), known as the "trust region"; it is a kind of regularizing technique adapted from regularized decomposition for continuous problems which helps to mitigate two kinds of difficulties in cutting plane methods:
▪ Growth in the number of cuts added to the master problem, and
▪ The fact that there is no easy way to use a good starting solution.
The existing literature shows that the solution oscillates wildly in early iterations. Thus, a trust region may be used to limit the early movements of the variables (continuous, integer and binary) around a previous point yk.
5.2.1. Neighborhood Bounding
This variation adds a hypercube bounding the maximum difference between the solution of the coordinator in stage k and the solution in the previous stage k−1, introducing the restrictions in the form of bounds
−Δ·1 ≤ y − yk-1 ≤ Δ·1 (71)
where 1 is a vector of unit components and Δ a vector of appropriate multipliers, which altogether determine the size of the "trust region".
The coordinator model is formulated as
CY(yk-1)TR: = { min f(y) + Q(y) |
F0(y) = b0
Q(y) ≥ (πk)T[b − F(y)] k = 1,…,ITE
0 ≥ (vk)T[b − F(y)] k ∈ ITN
−Δ·1 ≤ y − yk-1 ≤ Δ·1 } (72)
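A minimal sketch of building the hypercube bounds (71) around the previous iterate; the list-based interface and the default variable bounds are illustrative:

```python
# The next coordinator solution is restricted to [y_prev - delta, y_prev + delta],
# intersected with the variables' natural bounds [lo, hi].

def trust_region_bounds(y_prev, delta, lo=0.0, hi=float("inf")):
    """Bounds implementing -delta*1 <= y - y_prev <= delta*1, clipped to [lo, hi]."""
    lower = [max(y - delta, lo) for y in y_prev]
    upper = [min(y + delta, hi) for y in y_prev]
    return lower, upper

lower, upper = trust_region_bounds([2.0, 0.5, 4.0], delta=1.0)
# lower = [1.0, 0.0, 3.0]  (second component clipped at lo = 0.0)
# upper = [3.0, 1.5, 5.0]
```

Because the trust region enters only through variable bounds, it does not add rows to the coordinator, which is one reason this variant is considered relatively cheap.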
The definition of Δ is carefully analyzed by Linderoth and Wright (2001), who proposed a double loop: one in BT and another in the determination of Δ. There are many possibilities to determine the size of the "trust region", including adjusting Δ in each dimension depending on the behavior of the algorithm. Imposing the trust region in the form of bounds is relatively easy and can be implemented effectively.
5.2.2. Penalizations Movements
Another approach is called regularized decomposition (Ruszczynski, 1986), which introduces a quadratic term in the objective function to penalize the difference between yk and yk-1; in each cycle the coordinator objective function is
½ ρk (y − yk-1)T(y − yk-1) + f(y) + Q(y) (73)
where ρk is a positive penalty factor whose determination is part of the algorithm. In this case the coordinator corresponds to a quadratic problem
CY(yk)DR: = { min ½ ρk (y − yk-1)T(y − yk-1) + f(y) + Q(y) |
F0(y) = b0
Q(y) ≥ (πk)T[b − F(y)] , k = 1,…,ITE
0 ≥ (vk)T[b − F(y)] , k ∈ ITN } (74)
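An illustrative one-dimensional sketch of the proximal effect of (73), with Q(y) approximated by the pointwise maximum of accumulated Benders cuts; the cuts, the first-stage cost, ρ and the search grid are made-up values, and a real implementation solves the quadratic coordinator (74) with a QP solver:

```python
# 0.5*rho*(y - y_prev)^2 + f(y) + max_k (a_k + b_k*y): the quadratic term
# pulls the new iterate toward the previous one.

def regularized_objective(y, y_prev, rho, f, cuts):
    """Proximal master objective with a cutting-plane model of Q(y)."""
    q_hat = max(a + b * y for (a, b) in cuts)
    return 0.5 * rho * (y - y_prev) ** 2 + f(y) + q_hat

cuts = [(4.0, -1.0), (0.0, 1.0)]        # two optimality cuts Q(y) >= a + b*y
f = lambda y: 0.5 * y                    # first-stage cost
grid = [i / 100 for i in range(401)]     # candidate y in [0, 4]

free = min(grid, key=lambda y: regularized_objective(y, 0.0, 0.0, f, cuts))
prox = min(grid, key=lambda y: regularized_objective(y, 0.0, 5.0, f, cuts))
# With rho > 0 the iterate stays much closer to y_prev = 0 than the
# unregularized step, damping the early oscillations.
assert abs(prox) < abs(free)
```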
Zaourar and Malick (2015) report experiments that show the speed-up of BT as a consequence of implementing the regularization method; they used two sets of standard problems, related to the Hub Location Problem (table 11) and the Network Design Problem (table 12). The advantages of regularization are evident.
Table 11. Hub Location Problems (Zaourar and Malick, 2015)

Nodes   Transfer Cost   Standard   Stabilized   Ratio Standard/Stabilized
10      0.1             1.06       0.91         1.16
10      0.5             1.28       0.89         1.44
10      1               1.07       0.75         1.43
15      0.1             5.31       3.01         1.76
15      0.5             5.27       5.86         0.90
15      1               4.52       5.83         0.78
20      0.1             21.72      16.61        1.31
20      0.5             16.83      14.24        1.18
20      1               14.3       13.94        1.03
25      0.1             58.66      35.18        1.67
25      0.5             52.58      34.91        1.51
25      1               46.31      28.7         1.61
30      0.1             112.08     144.47       0.78
30      0.5             97.72      96.28        1.01
30      1               97.11      96.11        1.01
35      0.1             296.61     182.69       1.62
35      0.5             183.46     116.71       1.57
35      1               177.94     110.59       1.61
40      0.1             467.17     498.91       0.94
40      0.5             351.77     310.24       1.13
40      1               306.04     336.76       0.91
MEAN                    110.42     97.79        1.13
Table 12. Network Design Problems. (Zaourar and Malick, 2015)

Nodes   Commodities   Standard   Stabilized   Ratio Standard/Stabilized
5       5             0.27       0.31         0.87
5       10            0.38       0.07         5.43
5       15            0.58       0.12         4.83
5       20            0.69       0.08         8.63
8       5             1.24       0.65         1.91
8       10            42.13      53.43        0.79
8       15            72.49      60.6         1.20
10      5             7.09       3.95         1.79
10      10            555.79     252.69       2.20
10      15            20099.7    20289.8      0.99
12      5             37.58      12.8         2.94
12      10            34267.4    10661.6      3.21
15      5             677.5      53.54        12.65
20      5             10796.2    1481.89      7.29
MEAN                  4754.22    2347.97      2.02
5.2.3. Binary Variables
For problems with binary variables in the first stage, Santoso et al. (2005) and Oliveira et al. (2014) showed that the 2-norm or infinity-norm distance is not effective. Therefore, Yang et al. (2016) use the Hamming distance; the trust region is defined by the following equation
Σj∈IB1(k) (1 − yj) + Σj∈IB0(k) yj ≤ Δk (75)
where IB1(k) represents the set of binary variables equal to 1 in iteration k and IB0(k) the complementary set (binary variables equal to 0); Δk limits the number of variables that can change from iteration k to iteration k+1. The trust region cannot guarantee convergence (Keller and Bayraksan, 2009); therefore, the trust-region constraint must be dropped once the procedure has reached certain criteria.
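A minimal sketch of evaluating the Hamming trust-region constraint (75) for a candidate binary vector; names are illustrative:

```python
# The left-hand side of (75) is exactly the Hamming distance between the
# candidate y and the incumbent y_k: the number of binary variables that flip.

def hamming_lhs(y, y_k):
    """sum_{j in IB1(k)} (1 - y_j) + sum_{j in IB0(k)} y_j."""
    return sum(1 - yj if ykj == 1 else yj for yj, ykj in zip(y, y_k))

def in_trust_region(y, y_k, delta_k):
    return hamming_lhs(y, y_k) <= delta_k

incumbent = [1, 0, 1, 0]
assert in_trust_region([1, 0, 0, 0], incumbent, delta_k=1)      # one flip: allowed
assert not in_trust_region([0, 1, 0, 0], incumbent, delta_k=2)  # three flips: rejected
```

Unlike (71), this constraint is linear in y and therefore can be added directly to a MIP coordinator.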
Santoso et al. (2005) ran experiments with the Stochastic Supply Chain Network Design Problem and considered two problems: a domestic and a global supply chain; for the trust region only the domestic cases were used, whose dimensions are presented in table 13.
Table 13. Stochastic Supply Chain Network Design Problem, Domestic Cases. (Santoso et al., 2005)

Scenarios   Equality Constraints   Inequality Constraints   Continuous Variables   Binary Variables
1           3,498                  4,324                    20,912                 140
20          69,960                 86,480                   418,240                140
40          139,920                172,960                  836,480                140
60          209,880                259,440                  1,254,860              140
We selected combinations of the acceleration schemes used by Santoso et al. to solve instances with 20 scenarios, oriented to evaluate the marginal gain generated by the Hamming trust region. The IDs of the acceleration schemes are denoted as follows: LC (Logistics constraints); TR (Hamming Trust region); KI (Knapsack inequalities); and UH (Upper-bounding heuristic). All the results are coherent (table 14), except in the case of KI and TR, where the inclusion of TR implies less time (coherent) but a larger GAP (incoherent). This shows that in mathematical programming it is easy to find cases with small incoherencies.
Table 14. Speed Up Hamming Trust Region

Acceleration Scheme   1st GAP (%)   10th GAP (%)   Time (secs)   Iterations   Ratio Time   Marginal Ratio Time   Ratio 10th GAP   Marginal Ratio 10th GAP
Standard              100           60             > 4000        30           > 2.90       0                     6000             -2000
Standard + TR         100           40             > 4000        30           > 2.90                             4000
LC                    31            8              > 4000        30           > 2.90       0                     800              -730
LC + TR               31            0.70           > 4000        30           > 2.90                             70
LC + KI               31            0.10           3860          26           2.80         -0.19                 10               +10
LC + KI + TR          31            0.20           3600          23           2.61                               20
LC + KI + UH          31            0.01           1500          8            1.09         -0.09                 1                0
LC + KI + UH + TR     31            0.01           1380          7            1                                  1

(The marginal ratios compare each scheme against the same scheme plus TR.)
6. Cuts Enhancements
The traditional Benders decomposition might fail to achieve computational efficiency; within the context of generating more effective cuts, most researchers have sought to generate a set of "strong" cuts at each iteration, or to modify the way Benders cuts are generated.
6.1. Strong Cuts
6.1.1. Pareto Optimal (POP)
Considering the possible degeneration of the Benders primal subproblem, Magnanti and Wong (1981) proposed a seminal methodology to accelerate Benders convergence by strengthening the generated cuts. In a linear continuous problem, degeneration implies that the subproblem has multiple dual solutions; hence a subproblem may generate multiple optimality cuts, many with components equal to 0 (zero). The addition of such "empty" cuts during the iterations makes the MP harder to solve. Among these feasible cuts, one cut may dominate another; choosing the best one from these alternative cuts is beneficial in solving the MP by reducing the number of iterations.
Magnanti and Wong define a cut as Pareto-OPtimal (POP) if it is not dominated by any other cut that may be a solution of the degenerate subproblem; this cut is also called the "deepest" cut. To calculate this cut they use a core point yC, that is, a point in the interior of the convex hull of the feasible region of the original problem. Obtaining an exact core point is intractable; fortunately, many researchers have demonstrated that points which are close to core points can also generate strong cuts.
A core point yC must be a solution of the constraints that define the coordinator without Benders cuts:
F0(yC) = b0 ; yC ∈ S (76)
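Since exact core points are hard to compute, a common heuristic maintains an approximate core point as a running convex combination of coordinator solutions; a minimal sketch, where the mixing weight 0.5 is an illustrative choice (any λ in (0,1) keeps the point strictly interior when the starting point is interior):

```python
# Approximate core-point update: y_core <- lam*y_core + (1-lam)*y_k, componentwise.

def update_core_point(y_core, y_k, lam=0.5):
    """Convex combination of the current core point and the new coordinator solution."""
    return [lam * c + (1 - lam) * y for c, y in zip(y_core, y_k)]

y_core = [0.5, 0.5, 0.5]              # e.g. the center of the unit hypercube
y_core = update_core_point(y_core, [1, 0, 1])
# y_core is now [0.75, 0.25, 0.75], still strictly inside (0,1)^3
```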
A strong cut must be a solution of the following problem
DPOP(yk): = { maxπ πT(b − F(yC)) | πTA ≤ cT ;
πT(b − F(yk)) = Q(yk) = cTxk } (77)
where yk represents the primal values defined by the coordinator in stage k and xk the solution of the subproblem for yk. The following diagram presents the flow of the optimization process; alternatively, it is possible to solve the dual problem of DPOP(yk):
PPOP(yk): = { minx,w cTx + Q(yk)·w |
Ax + (b − F(yk))·w = b − F(yC) ; x ∈ R+ ; w ∈ R } (78)
[Flow diagram: the coordinator (LP/MIP/NLP/MINLP), min f(y) + Q(y) s.t. F0(y) = b0, y ∈ S, Q(y) ≥ (πk)T(b − F(y)) for k = 1,…,ITERATIONS, sends the primal variables yk to the Benders subproblem (LP), min cTx s.t. Ax = b − F(y), x ∈ R+, which returns the objective value cTx*; the strong-cut problem (LP), max πT(b − F(yC)) s.t. πTA ≤ cT, cTx* = πT(b − F(yk)), returns the strong dual variables πk used to build the strong cuts.]
Figure 17. Pareto-Optimal Benders Cuts
To obtain a POP cut, it may be necessary to solve twice the number of subproblems; there is a trade-off between the CPU time saved in solving the master problem and the CPU time consumed in obtaining the POP cuts.
Yang et al. (2016) study the speed-up generated by POP cuts using cases associated with the Process Flexibility Design (PFD) problem, which is related to a supply chain with m plants that can produce n types of products (table 15).
Table 15. Speed-up Pareto Optimal Cuts (Yang et al., 2016)

Case n/m   POP: GAP    POP: Time (secs)   POP: Iterations   BT: GAP      BT: Time (secs)   BT: Iterations   Ratio BT/POP (times)
4/4        5x10-4      26                 16.5              1.1x10-3     15                16.4             0.58
5/5        3x10-4      123                35.6              1.2x10-3     265               54.3             2.15
5/7        5x10-3      174                38.7              3.13x10-2    1769              88.5             10.17
6.1.2. Other Cuts
Based on the research presented by Magnanti & Wong, multiple works have been developed in the same direction; the following is a brief reference to several of the proposals made.
▪ Papadakos Cuts: Papadakos (2008) highlights that the dependency of the Magnanti–Wong cut-generation problem on the solution of the SP can sometimes decrease the algorithm's performance. To circumvent this difficulty, the author showed that one can obtain an independent formulation of the Magnanti–Wong cut-generation problem by dropping the constraint that implies the dependency on the solution of the subproblem. The author also provided guidelines for efficiently generating additional core points through convex combinations of previously known core points and feasible solutions of the MP.
▪ Maximal Non-Dominated Cuts (MND): More recently, Sherali and Lundy (2011) presented a different strategy for generating non-dominated cuts, using small perturbations on the right-hand side of the SP to generate maximal non-dominated Benders cuts. The authors also showed a strategy based on complementary slackness that simplifies cut generation when compared with the traditional strategy used by Magnanti and Wong.
▪ Dynamically Updated Near Maximal Cuts (DUNM): Oliveira et al. (2014) present the theory to formulate Dynamically Updated Near Maximal Cuts (DUNM) as an alternative way of dealing with the difficulty related to the correct definition of the weight μ.
Oliveira et al. (2014) show a comparison between the alternatives. The figure shows the time ratio when the DUNM cut is selected as reference. The conclusions are clear:
▪ The MIP solver (CPLEX) is faster for a low number of scenarios; this is due to the overhead implied by the use of more sophisticated methodologies.
▪ CPLEX time grows polynomially; hence, for a large number of scenarios the solution time becomes prohibitive.
▪ MND has better performance than POP.
▪ POP and MND have a "stable" performance, which implies that DUNM is approximately 2.5 times faster.
[Plot: ratio of solution time relative to DUNM versus number of scenarios (0–200) for POP, MND, DUNM and CPLEX; linear fits POP: y = 0.0021x + 2.4333 and MND: y = −0.0009x + 2.4888; polynomial fit CPLEX: y = 2x10-5 x² + 0.0165x − 0.0383.]
Figure 18. Speed-up Benders Cuts: POP, MND, DUNM, CPLEX
6.1.3. Hybrid Cuts
As the name implies, hybrid cuts are related to the mixing of two or more mathematical methodologies in a process defined by each BT researcher. This means it is not easy to generalize and standardize the concept of hybridization, since each case depends on the purpose sought by the authors of the hybrid cuts to be included in a partition/decomposition Benders process.
Jain & Grossmann (2001) present results of work whose goal was to develop models and methods that use the complementary strengths of Mixed Integer Programming (MIP) and Constraint Programming (CP). A scheduling model is formulated as a hybrid MIP/CP model that involves some of the MIP constraints, a reduced set of the CP constraints, and equivalence relations between the MIP and the CP variables.
The approach relaxes the integrality constraints in the master problem and sends a primal solution to the subproblem; if there exists a feasible solution, then this solution is the optimal solution of the problem and the optimization process ends. Otherwise, the causes of infeasibility are inferred as cuts and added to the coordinator:
Σj∈IX1(k) yj − Σj∈IX0(k) yj ≤ Bk − 1 (79)
where IX1(k) represents the set of binary variables equal to 1 in iteration k, IX0(k) the complementary set of binary variables equal to 0, and Bk the cardinality of IX1(k), this is
Bk = | IX1(k) | (80)
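A minimal sketch of one standard way to write this "no-good" cut, using Bk = |IX1(k)|, so that exactly the infeasible master solution yk is excluded; the function names and tuple representation are illustrative:

```python
# The cut  sum_{j in IX1(k)} y_j - sum_{j in IX0(k)} y_j <= Bk - 1  is
# violated only by the infeasible solution y_k itself; every other 0/1
# vector remains feasible.

def no_good_cut(y_k):
    ix1 = [j for j, v in enumerate(y_k) if v == 1]
    ix0 = [j for j, v in enumerate(y_k) if v == 0]
    return ix1, ix0, len(ix1) - 1            # (IX1(k), IX0(k), Bk - 1)

def satisfies(cut, y):
    ix1, ix0, rhs = cut
    return sum(y[j] for j in ix1) - sum(y[j] for j in ix0) <= rhs

y_k = [1, 1, 0]
cut = no_good_cut(y_k)
assert not satisfies(cut, y_k)               # the incumbent is excluded
others = [[a, b, c] for a in (0, 1) for b in (0, 1)
          for c in (0, 1) if [a, b, c] != y_k]
assert all(satisfies(cut, y) for y in others)
```

Unlike the combinatorial cut (69), this form also penalizes the variables fixed to 0, which is why it eliminates only the single point yk.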
These general "no-good" cuts may be rather weak; whenever possible, stronger cuts that exploit the special structure of the problem should be used. Jain and Grossmann (2001) integrated cuts obtained from CP with BT for the case of binary subproblems in a job shop scheduling problem (table 16).
For each problem two data sets were considered. Four technologies were compared: i) MIP using CPLEX 6.5 (single processor); ii) a CP solver; iii) a hybrid model (integrating CPLEX and CP); and iv) hybrid Benders cuts. All experiments were run on a dual-processor SUN Ultra 60 workstation.
Table 17 shows a summary of the results; the conclusion is evident: since at least 2000, BT enhanced with hybrid cuts may solve NP-hard problems faster than the "best" commercial solver. Further numerical results for this problem with different data are reported in Harjunkoski et al. (2000).
Table 16. Job Shop Scheduling Problem. (Jain and Grossmann, 2001)

Problem   Orders   Machines
1         3        2
2         7        3
3         12       3
4         15       5
5         20       5
Table 17. Speed-up Hybrid Benders Cuts. (Jain and Grossmann, 2001)
Times in seconds; ratios are relative to the Benders Hybrid Cuts total time.

Problem   Set   MIP (CPLEX): Time   Ratio      CP Optimizer: Time   Ratio       Hybrid (CPLEX+CP): Time   Ratio    Benders Hybrid Cuts: MILP   CP     Total
1         1     0.01                0.50       0                    0           0.04                      2.00     0.02                        0      0.02
1         2     0.03                3.00       0.02                 2           0.05                      5.00     0.01                        0      0.01
2         1     0.47                0.90       0.04                 0.076923    0.10                      0.19     0.47                        0.05   0.52
2         2     0.49                24.50      0.14                 7           0.27                      13.50    0.01                        0.01   0.02
3         1     220.0               52.63      3.84                 0.91866     4.21                      1.01     4.01                        0.17   4.18
3         2     1.77                88.50      0.38                 19          1.12                      56.00    0.02                        0      0.02
4         1     180.41              80.18      553.54               246.0178    91.59                     40.71    2.01                        0.24   2.25
4         2     61.82               1545.50    9.28                 232         5.58                      139.50   0.02                        0.02   0.04
5         1     > 20000             1415.43    > 68853.49           4872.858    13736.06                  972.12   13.69                       0.44   14.13
5         2     106.28              259.22     2673.87              6521.634    170.95                    416.95   0.29                        0.12   0.41
Consolidated    > 20571.29          > 952.37   > 72094.6            > 3337.713  14009.97                  648.61   20.55                       1.05   21.6
Performance rank: MIP 3, CP 4, Hybrid 2, Benders Hybrid Cuts 1
6.2. Hybrid Strategy
Considering that there exist many alternatives to enhance BT, speed it up, and close the GAP, a hybrid strategy may include several enhancements. Yang et al. (2016) analyze many hybrid strategies resulting from the combination of the following enhancements: Standard Benders (BT), Hybrid Cuts (HC), Trust Region (TR), Strengthening Cuts (SC), Approximating Master Solve (AM), Warm Startup (WS) and Parallel Computation (PC). Different grouping schemes can lead to different computational efficiencies.
Yang et al. (2016) performed experiments using the Process Flexibility Design (PFD) problem, related to a supply chain where there are m plants that can produce n types of products; in a balanced supply chain the numbers of products and plants are equal, in a full-flexibility supply chain each plant is able to produce all products, and there exist special cases with more general settings. The problem is formulated as a two-stage stochastic program. Table 18 shows a summary of all the experiments; each experiment is characterized by the pair <n,m>.
Table 18. Hybrid Benders Strategies. (Yang et al., 2016)
Each code denotes its combination of enhancements (BT, HC, PC, TR, AM, WS, SC); the binary indicator columns of the original table are implied by the code names. For each problem size the columns are Time (secs), GAP (%) and Ratio; runs that exceeded the 3600-sec time limit report only the GAP-based figures ("—" marks values not reported).

SMALL SIZE BALANCED PROBLEMS. Products–Plants: <4,4>, <5,5>, <6,6>
Code           | <4,4> Time  GAP   Ratio | <5,5> Time  GAP   Ratio | <6,6> Time  GAP   Ratio
MIP            | 143         0     9.53  | 2801        0.02  8.84  | —           —     0.23
Single-cut     | 96          0     6.40  | 1324        0.03  4.18  | —           —     1.31
HC             | 39          0     2.60  | 475         0     1.50  | —           —     0.18
HC-PC          | 15          0     1     | 374         0     1.18  | —           —     0.17
HC-PC-TR       | 15          0     1     | 284         0     0.90  | 1402        0.07  0.40
HC-AM          | 41          0     2.73  | 416         0     1.31  | 3562        0.03  1.02
HC-PC-AM       | 15          0     1     | 317         0     1     | 3504        0.03  1
HC-PC-TR-AM    | 15          0     1     | 186         0     0.59  | 802         0.03  0.23
HC-WS          | 37          0     2.47  | 328         0     1.03  | 2897        0.03  0.83
HC-WS-AM       | 32          0     2.13  | 327         0     1.03  | 2518        0.03  0.72
HC-PC-SC       | 40          0     2.67  | 174         0     0.55  | 1661        0.03  0.47

MEDIUM AND LARGE SIZE UNBALANCED PROBLEMS. Products–Plants: <8,8>, <10,10>, <20,20> (GAP % and GAP-based Ratio)
Code           | <8,8> GAP   Ratio  | <10,10> GAP  Ratio  | <20,20> GAP  Ratio
MIP            | 33.4        123.63 | 93.6         346.78 | 98.77        379.88
Single-cut     | 3.77        13.96  | 2.88         10.67  | 28.35        109.04
HC-WS          | 2.05        7.59   | 1.74         6.44   | 6.5          25.00
HC-WS-AM       | 2.02        7.48   | 1.74         6.44   | 7.16         27.54
HC-PC-TR       | 1.03        3.81   | 0.45         1.67   | 15.31        58.88
HC-PC-TR-AM    | 0.27        1      | 0.27         1      | 0.26         1
HC-PC-SC       | 2.44        9.04   | 0.83         3.07   | 13.28        51.08

NUMERIC STUDY RESULTS (UNBALANCED SYSTEMS). Products–Plants: <6,4>, <8,5>, <15,12>
Code           | <6,4> Time  GAP   Ratio | <8,5> Time  GAP   Ratio | <15,12> GAP  Ratio
MIP            | 583         0     6.48  | —           —     0.07  | 95.88        138.96
Single-cut     | 2506        0.19  27.84 | —           —     7.83  | 15.03        21.78
HC             | 579         0     6.43  | —           —     5.05  | 13           18.84
HC-PC          | 430         0     4.78  | —           —     5.05  | 13           18.84
HC-PC-TR       | 348         0     3.87  | 2052        0.01  1.12  | 0.94         1.36
HC-AM          | 485         0     5.39  | —           —     3.87  | 6.13         8.88
HC-PC-AM       | 345         0     3.83  | —           —     3.87  | 6.13         8.88
HC-PC-TR-AM    | 90          0     1     | 1831        0     1     | 0.69         1
HC-WS          | 271         0     3.01  | —           —     0.22  | 2.62         3.80
HC-WS-AM       | 268         0     2.98  | —           —     0.17  | 2.57         3.72
HC-PC-SC       | 513         0     5.70  | —           —     4.32  | 9.55         13.84
The best performance corresponds to HC-PC-TR-AM, which is also the reference case (ratio = 1); the worst cases are MIP and Single-cut. The results are evident: i) the more enhancements included, the better the performance of the Benders technologies; and ii) the MIP solver (CPLEX) cannot compete with the enhanced Benders methodologies. The maximum time is 3600 secs; when it is exceeded, the ratio is calculated based on GAPs. The possibility of using hybrid strategies depends on the ability to select and mix the improvements that deliver the best performance for a group of problems or for a specific type of problem.
7. Benders Parallel Optimization
7.1. Parallel Optimization
In the prologue of the book "Parallel Optimization: Theory, Algorithms and Applications", written by Censor and Zenios (1997), Professor Dantzig wrote about "the fascinating new world of parallel optimization using parallel processors, computers capable of doing an enormous number of complex operations in a nanosecond"; additionally he said: "according to an old adage, the whole can sometimes be much more than the sum of its parts. I am thoroughly in agreement with the authors' belief in the added value of bringing together applications, mathematical algorithms and parallel computing techniques". This is exactly what the mathematical modeler finds true in parallel optimization.
Despite the time elapsed since the first applications of parallel optimization, in 1991, this methodology is only beginning to develop, since only recently has multi-processing become massive with the arrival of low-cost multi-core computers. Therefore, it is expected that in the coming years research on parallel optimization, and the speed of solving complex problems, will increase significantly.
Note that parallelization is not limited to BT; many of the concepts used are valid for application in other large-scale methodologies. The idea of parallelism is not new, since it is at the core of decomposition via BT and was practically born with the idea of Van Slyke and Wets in 1969; what is new is the power of parallel computing to which researchers now have access. For a long time, implementing parallel algorithms on single-processor computers was merely academic; real practice was only available to researchers with access (money) to this type of resource.
Below, table 19 shows some applications of parallelism using BT. The papers were selected from a course by Professor Linderoth in 2003 (Parallel and High-Performance Computing for Stochastic Programming, Course: Stochastic Programming) and correspond to twelve papers that may be the first papers in parallel stochastic optimization.
Table 19. Papers in Parallel Stochastic Optimization. (Linderoth in 2003)
Dantzig, G., J. Ho, and G. Infanger (1991, August). “Solving Stochastic Linear Programs on a Hypercube Multicomputer”.
Technical Report SOL 91-10, Department of Operations Research, Stanford University.
Ariyawansa, K. A. and D. D. Hudson (1991). “Performance of a Benchmark Parallel Implementation of the Van Slyke and Wets
Algorithm for Two-Stage Stochastic Programs on The Sequent/Balance”. Concurrency Practice and Experience. 3, 109–128.
Ruszczynski, A. (1993). “Parallel Decomposition of Multistage Stochastic Programming Problems”. Mathematical Programming
58, 201–228
Jessup, E., D. Yang, and S. Zenios (1994). “Parallel Factorization of Structured Matrices arising in Stochastic Programming”.
SIAM Journal on Optimization 4, 833–846.
Mulvey, J. M. and A. Ruszczynski (1995). “A New Scenario Decomposition Method for Large Scale Stochastic Optimization”.
Operations Research 43, 477–490.
Birge, J. R., C. J. Donohue, D. F. Holmes, and O. G. Svintsitski (1996). "A Parallel Implementation of the Nested Decomposition Algorithm for Multistage Stochastic Linear Programs". Mathematical Programming 75, 327–352.
Nielsen, S. S. and S. A. Zenios (1997). “Scalable Parallel Benders Decomposition for Stochastic Linear Programming”. Parallel
Computing 23, 1069–1089.
Gondzio, J. and R. Kouwenberg (1999, May). “High Performance Computing for Asset Liability Management”. Technical Report
MS-99-004, Department of Mathematics and Statistics, The University of Edinburgh.
Fragniere, E., J. Gondzio, and J.-P. Vial (2000). “Building and Solving Large-Scale Stochastic Programs on an Affordable
Distributed Computing System”. Annals of Operations Research 99, 167–187.
Linderoth, J. T. and S. J. Wright (2001, April). “Decomposition Algorithms for Stochastic Programming on a Computational
Grid”. Preprint ANL/MCS-P875-0401, Mathematics and Computer Science Division, Argonne National Laboratory,
Argonne, Ill.
Blomvall, J. and P. O. Lindberg (2002). "A Riccati-Based Primal Interior Point Solver for Multistage Stochastic Programming – Extensions". Optimization Methods and Software, pp. 383–407.
Linderoth, J. T., A. Shapiro, and S. J. Wright (2002, January). “The Empirical Behavior of Sampling Methods for Stochastic
Programming Optimization”. Technical Report 02-01, Computer Sciences Department, University of Wisconsin-Madison.
In the case of parallel optimization, the modeler must consider at least the following aspects:
1. Timing: two cases must be considered:
▪ Synchronous: points (marks) are settled in the processes to synchronize the results of a phase/stage; for example, an L-shaped problem can be solved using N processors, one for each scenario/slave problem, and the coordinator problem must wait until all subordinate problems have been solved in each iteration. This approach implies that the processors that end their optimization before the last processor have idle time while waiting to receive new information from the coordinator. An important advantage of the synchronous method is the possibility of ensuring that the results are repeatable, which in many cases is a required feature, mainly in degenerate cases or with non-optimal solutions.
▪ Asynchronous: the coordinator does not have to wait until all the slave problems are solved, and can generate new primal information when it considers that there is enough new information (cutting planes) to justify a new optimization. In this case the processors' idle time is minimized, which should minimize the total completion time of the optimization process, but this is not guaranteed. This approach involves designing a dynamic strategy to assign roles to the processors during the optimization process.
2. Processor Role: related to the specific problem that must be solved by a specific processor at a specific time:
▪ Static: the assignment is made at the beginning of the optimization process and remains static throughout the process. It applies to synchronous cases.
▪ Dynamic: the assignment is carried out considering the events occurring and the status of the processors; its implementation requires an additional task responsible for assigning roles to processors. Its design has no preset rules and depends on the art/knowledge of the modeler and knowledge of the problem's behavior. Creativity is crucial in this process, since the number of variations that can be deployed may be practically unlimited. For example, with many scenarios, the modeler can think about having more than one processor responsible for generating primal variables for the subproblems, which implies more than one coordinator.
In the chapter "The Future: Mathematical Programming 4.0" (Velásquez, 2019c) some ideas are presented about the importance of parallel optimization in the short term.
7.2. The Asynchronous Benders Decomposition Method
Below, briefly, is the latest work published on this topic (found by the author). Rahmaniani et al. (2018b) present the so-called Asynchronous Benders Decomposition Method (ABD) and compare the asynchronous and hybrid parallel algorithms against the latest version of CPLEX.
Rahmaniani et al. (2018b) describe the state-of-the-art in Benders parallelization as: “the existing parallel BT method
can be summarized as follows: The MP (master problem) is assigned to a processor, the “master", which also
coordinates other processors, the “slaves", which solve the SPs (subproblems). At each iteration, the solution obtained
from solving the MP is broadcast to the SPs. They then return the objective values and the cuts obtained from solving
the SPs to MP and the same procedure repeats. Such master-slave parallelization schemes are known as low-level
parallelism as they do not modify the BT algorithmic logic or the search space (Crainic and Toulouse 1998)”.
They present several strategies to specify the scheduling and pool-management decisions; pool management implies managing the pool of solutions (strategies denoted by S1, S2, S3) and the pool of cuts:
1. Solution and Cut Pool Management: considering the previously partially evaluated solutions and the new one at the current iteration, ABD needs to decide which solution to choose in order to evaluate its associated (unevaluated) SPs. At each iteration, the master process broadcasts its solution to all slave processors. Each slave processor stores this solution in a pool and follows one of the following strategies to pick the appropriate one:
▪ S1: chooses solutions based on the FIFO rule;
▪ S2: chooses solutions based on the LIFO rule;
▪ S3: chooses solutions in the pool randomly.
ABD was tested with two selection rules: i) each solution in the pool has an equal chance of being selected, and ii) each solution is assigned a weight of 1/(1+k), where k is the number of iterations since that specific solution was generated, so that more recent solutions have a higher chance of being selected. Moreover, ABD uses local branching, Rei et al. (2009), to identify the solutions which no longer need to be evaluated. Finally, ABD makes use of techniques to manage the cut pool and eliminate the dominated cuts.
2. ABD implements static work allocation because, by distributing the scenario SPs equally, every process is almost equally loaded. Once a solution is chosen, ABD must decide the order in which the associated SPs will be evaluated, because ABD may not evaluate all of them, and it is important to give higher priority to those that tighten the master problem (MP) formulation the most.
The following strategies are considered:
▪ SP1: randomly choose the SPs;
▪ SP2: assign a weight to each SP and then randomly select one based on the roulette-wheel rule. The weights are set equal to the normalized demands of each SP;
▪ SP3: if a solution is infeasible, ABD may not need to solve all of its SPs. This strategy first orders the SPs by their demand sum and then assigns each SP a criticality counter that increases by one each time the SP is infeasible. The SP with the highest criticality value is then selected.
3. Solving the MP. This dimension specifies how long the master processor waits before it re-optimizes the MP. ABD proposes the following strategies:
▪ MP1: the master processor waits for at least a given percentage of new cuts at each iteration;
▪ MP2: the master processor waits for a given percentage of the cuts associated with the current solution;
▪ MP3: the same as MP2, but with a mechanism to synchronize the processors according to the current state of the algorithm: if the cuts added to the MP fail to improve the lower bound and/or regenerate the same solution, the MP waits until all the cuts associated with the current solution are delivered.
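The three strategy dimensions above can be sketched in a few lines. The classes, names, and the fraction thresholds in MP1/MP2 below are our own illustrative assumptions (the paper's percentage parameters were not reproduced here):

```python
import random
from collections import deque

class SolutionPool:
    """Rules S1-S3 plus the age-weighted 1/(1+k) selection variant."""
    def __init__(self, rng=None):
        self.pool = deque()               # entries: (solution, birth_iteration)
        self.rng = rng or random.Random()

    def add(self, solution, iteration):
        self.pool.append((solution, iteration))

    def pick(self, rule, current_iteration=0):
        if rule == "S1":                  # FIFO: evaluate the oldest solution
            return self.pool.popleft()[0]
        if rule == "S2":                  # LIFO: evaluate the newest solution
            return self.pool.pop()[0]
        if rule == "S3":                  # uniform random choice
            i = self.rng.randrange(len(self.pool))
        else:                             # "weighted": 1/(1+k), k = age in iterations
            w = [1.0 / (1 + current_iteration - born) for _, born in self.pool]
            i = self.rng.choices(range(len(self.pool)), weights=w)[0]
        solution = self.pool[i][0]
        del self.pool[i]
        return solution

class SPScheduler:
    """SP2 (roulette wheel on normalized demands) and SP3 (criticality counters)."""
    def __init__(self, demands, rng=None):
        self.demands = dict(demands)
        self.rng = rng or random.Random()
        # SP3 starts from the demand-sum ordering, all counters at zero
        self.criticality = {sp: 0 for sp in
                            sorted(demands, key=demands.get, reverse=True)}

    def pick_sp2(self):
        total = sum(self.demands.values())
        names = list(self.demands)
        weights = [self.demands[n] / total for n in names]
        return self.rng.choices(names, weights=weights)[0]

    def report_infeasible(self, sp):      # the SP cut off the candidate solution
        self.criticality[sp] += 1

    def pick_sp3(self):                   # highest criticality; demand order breaks ties
        return max(self.criticality, key=self.criticality.get)

def master_ready(rule, new_cuts, expected_new, cuts_of_current, sps_of_current,
                 fraction=0.5):
    """MP1/MP2: should the master re-optimize, or keep waiting for cuts?"""
    if rule == "MP1":                     # enough new cuts overall this iteration
        return new_cuts >= fraction * expected_new
    if rule == "MP2":                     # enough cuts for the current solution
        return cuts_of_current >= fraction * sps_of_current
    raise ValueError(rule)
```

For example, with demands {sp_a: 5, sp_b: 3}, `pick_sp3()` initially follows the demand ordering and switches to sp_b once that SP has repeatedly declared candidate solutions infeasible.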
To test the quality of the results, ABD was used to solve the Multi-Commodity Capacitated Fixed-charge Network Design Problem with Stochastic Demand (MCFNDSD), which implies a MIP coordinator. For the numerical tests, the R instances widely used in the literature were solved; these instances have up to 64 scenarios.
Rahmaniani et al. present an analysis of the implementation of several improvements to BT in parallel environments; those interesting results are not presented in this document; here we only show some results related to parallelism. The results indicate that ABD reaches higher speed-up rates than the conventional parallel methods. ABD is also several orders of magnitude faster than the state-of-the-art solvers (CPLEX). The ABD algorithm (and its variations) runs until reaching the same optimality gap obtained by CPLEX after 10 hours. Note that all ABD algorithms run on 15 processors. The average speed-up rates are reported in the next figure.
[Figure: two bar charts. The left panel, “Speed-up Rate vs CPLEX”, reports the speed-up of the Hybrid and Asynchronous algorithms over CPLEX for instances r04–r11 (roughly between 20 and 520). The right panel, “Speed-up Rate vs Processors”, reports the speed-up of the Synchronous, Hybrid and Asynchronous algorithms on 2, 3, 5, 10, 15 and 20 processors (roughly between 1.0 and 4.4). Source: Rahmaniani, R., Crainic, T., Gendreau, M. and Rei, W., The Asynchronous Benders Decomposition Method, CIRRELT 2018-07, January 2018.]
Figure 19. Speed-up Parallel Benders Decomposition
The procedure described by Rahmaniani et al. is an example of an empirical strategy for addressing the parallelization of large-scale methodologies. The best guide is the experience gained by solving problems in parallel and by studying what other researchers contribute and share with the scientific community.
8. Conclusions
Nothing is required, nothing is enough, everything is useful
A synthesis of the state of the art of applications using Benders Theory is presented below:
1. Since its formulation in 1962, Benders Theory has proven to be an effective methodology for solving complex problems that cannot be solved using only the “best” basic optimization algorithms.
2. Algorithms based on Benders' Theory can solve NP-hard problems in reasonable time; it has proven to be an effective methodology for solving complex problems that cannot be solved using the “best” mathematical solvers.
3. Benders Theory is a mature methodology that is still in an accelerated growth phase; there are many research opportunities in Benders Theory.
4. There is a gap between research in mathematical programming and the application of large-scale methodologies in real-world solutions.
5. There is a gap between education in mathematical programming and the application of large-scale methodologies in real-world solutions. For many young professionals the reference in mathematical programming is the basic solvers, but the reality is that the reference must be, at least, BT.
6. Benders Theory, its variations and enhancements significantly speed up mathematical programming algorithms.
7. Benders Theory is fundamental for exploiting the power of current computer technologies based on multiple cores and large amounts of RAM.
8. It is necessary to socialize (make easy) the use of BT for standard professionals, like the use of the basic solvers.
References
1. Ahmed, S. (2013). A Scenario Decomposition Algorithm for 0–1 Stochastic Programs. Operations Research
Letters 41(6):565-569.
2. Ahmed, S., Tawarmalani, M., and Sahinidis, N. (2004). A Finite Branch-and-Bound Algorithm for Two-Stage
Stochastic Integer Programs. Mathematical Programming, 100(2):355-377.
3. Benders, J. F. (1962). Partitioning Procedures for Solving Mixed Variables Programming Problems. Numerische Mathematik 4, 238–252.
4. Birge, J. R. and Louveaux, F. V. (1988). A Multicut Algorithm for Two-Stage Stochastic Linear Programs.
European Journal of Operational Research, 34(3): 384-392, 1988.
5. Cai, X., McKinney, D., Lasdon, L. and Watkins, D. (2001). Solving Large Nonconvex Water Resources Management Models using Generalized Benders Decomposition. Operations Research, Vol. 49, No. 2, March–April 2001, pp. 235–245.
6. Caroe, C. and Schultz, R. (1998). Dual Decomposition in Stochastic Integer Programming. Operations Research
Letters, 24(1):37-46.
7. Caroe, C. and Tind, J. (1997). A Cutting-Plane Approach to Mixed 0-1 Stochastic Integer Programs. European
Journal of Operational Research, 101(2):306-316.
8. Caroe, C. and Tind, J. (1998). L-Shaped Decomposition of Two-Stage Stochastic Programs with Integer
Recourse. Mathematical Programming, 83(1):451-464.
9. Censor, Y. and Zenios, S. (1997). Parallel Optimization: Theory, Algorithms and Applications. Publisher:
Oxford University Press. Series on Numerical Mathematics and Scientific Computation (1997).
10. Chen Z-L. and Powell W. (1998). A Convergent Cutting-Plane and Partial-Sampling Algorithm for Multistage
Stochastic Linear Programs with Recourse. Department of Civil Engineering and Operations Research
Princeton University Princeton, NJ 08544 Technical Report SOR-97-11.
11. Cobb, C. W., Douglas, P. H. (1928). "A Theory of Production". American Economic Review. 18
12. Codato, G. and Fischetti, M. (2006). Combinatorial Benders' Cuts for Mixed-Integer Linear Programming. Operations Research, Vol. 54, No. 4, July–August 2006.
13. Costa, A., Jean-Francois Cordeau, J. F, Gendron, B, and Laporte, G. (2012). Accelerating Benders
Decomposition with Heuristic Master Problem Solutions. Pesquisa Operacional (2012) 32(1): 3-19. © 2012
Brazilian Operations Research Society
14. Crainic, T. and Toulouse, M. (1998). Parallel Metaheuristics. Book: Fleet Management and Logistics. Editors:
Crainic T. and Laporte, G. Springer, Boston, MA, 205-251.
15. Floudas, C. (1995). Nonlinear and Mixed-Integer Optimization. Oxford University Press, New York.
16. Floudas, C., Aggarwal, A. and Ciric, R. (1989). Global Optimum Search for Nonconvex NLP and MINLP
Problems. Computers & Chemical Engineering. 13 1117–1132.
17. Gade, D., Küçükyavuz, S. and Sen, S. (2014). Decomposition Algorithms with Parametric Gomory Cuts for
Two-Stage Stochastic Integer Programs. Mathematical Programming, April 2014, Volume 144, Issue 1–2, pp
39–64
18. Gade, D., Kucukyavuz, S., and Sen, S. (2012). Decomposition Algorithms with Parametric Gomory Cuts for
Two-Stage Stochastic Integer Programs. Mathematical Programming, pages 1-26.
19. Geoffrion, A. M. (1972). Generalized Benders Decomposition. Journal of Optimization Theory and
Applications,10 237–259.
20. Gomory, R. (1958). Outline of an Algorithm for Integer Solutions to Linear Programs. Bulletin of the American Mathematical Society 64(5), 275–278 (1958).
21. Gomory, R. (1960). An Algorithm for the Mixed Integer Problem. Tech. Rep. RM-2597, RAND Corporation
(1960)
22. Greenberg, H. and Pierskalla, W. P. (1970). Surrogate Mathematical Programming. Operations Research 18 (1970), 924–939.
23. Harjunkoski, I., Jain, V. and Grossmann, I. E. (2000). Hybrid Mixed-Integer/Constraint Logic Programming Strategies for Solving Scheduling and Combinatorial Optimization Problems. Computers & Chemical Engineering, July 2000, 24(2):337–343.
24. Holmberg K. (1995). Primal and Dual Decomposition as Organizational Design: Price and/or Resource
Directive Decomposition. In: Burton R.M., Obel B. (eds) Design Models for Hierarchical Organizations.
Springer, Boston, MA
25. Hooker, J. and Ottosson, G. (2003) Logic-Based Benders Decomposition. Mathematical Programming, April
2003, Volume 96, Issue 1, pp 33–60
26. Hooker, J. N. (2000) Logic-Based Methods for Optimization: Combining Optimization and Constraint
Satisfaction, Wiley (2000).
27. Hooker, J. N. and Hong Yan. (1995) Logic Circuit Verification by Benders Decomposition, in V. Saraswat and
P. Van Hentenryck, eds., Principles and Practice of Constraint Programming: The Newport Papers, MIT Press
(Cambridge, MA, 1995) 267-288.
28. Hooker, J. N. (2019). Logic-based Benders Decomposition for Large-scale Optimization. In the book Large-Scale Optimization in Supply Chain & Smart Manufacturing: Theory & Application. Springer (2019).
29. Jain, V. and Grossmann, I. E. (2001). Algorithms for Hybrid MILP/CP Models for a Class of Optimization
Problems. INFORMS Journal on Computing Vol. 13, No. 4, Fall 2001 pp. 258–276
30. Karush, W. (1939). "Minima of Functions of Several Variables with Inequalities as Side Constraints". M.Sc.
Dissertation. Dept. of Mathematics, University of Chicago, Chicago, Illinois.
31. Keller, B. and Bayraksan, G. (2009). Scheduling Jobs Sharing Multiple Resources under Uncertainty: A Stochastic Programming Approach. IIE Transactions, 42:1, 16–30, DOI: 10.1080/07408170902942683.
32. Kong, N., Schaefer, A., and Hunsaker, B. (2006). Two-Stage Integer Programs with Stochastic Right-Hand
Sides: A Superadditive Dual Approach. Mathematical Programming, 108(2):275-296.
33. Kuhn, H. W.; Tucker, A. W. (1951). "Nonlinear Programming". Proceedings of 2nd Berkeley Symposium.
Berkeley: University of California Press. pp. 481–492. MR 0047303.
34. Laporte, G. and Louveaux, F. (1993). The Integer L-Shaped Method for Stochastic Integer Programs with Complete Recourse. Operations Research Letters, 13(3):133–142.
35. Linderoth, J. T. (2003). Parallel and High-Performance Computing for Stochastic Programming. Course:
Stochastic Programming, Lecture 13.
36. Linderoth, J. T. and S. J. Wright (2001). Decomposition Algorithms for Stochastic Programming on a
Computational Grid. Preprint ANL/MCS-P875-0401, Mathematics and Computer Science Division, Argonne
National Laboratory, Argonne, Ill.
37. Magnanti, T. and Wong, R. (1981). Accelerating Benders Decomposition: Algorithmic Enhancement and
Model Selection Criteria. Operations Research, 1981, Vol. 29, No. 3
38. McDaniel, D and Devine M. (1977). A Modified Benders’ Partitioning Algorithm for Mixed Integer
Programming. Management Science, 1977. 24: 312–319.
39. Ntaimo, L. (2010). Disjunctive Decomposition for Two-Stage Stochastic Mixed-Binary Programs with Random
Recourse. Operations Research, 58(1):229-243.
40. Oliveira, F., Grossmann, I. E. and Hamacher, S. (2014). Accelerating Benders Stochastic Decomposition for
the Optimization under Uncertainty of the Petroleum Product Supply Chain. Computers & Operations Research
49 (2014) 47–58
41. Pan, F. and Morton, D. P. (2008). Minimizing a Stochastic Maximum-Reliability Path. Networks 52(3):111–
119.
42. Papadakos, N. (2008). Practical Enhancements to the Magnanti–Wong Method. Operations Research Letters
Volume 36, Issue 4, July 2008, Pages 444-449
43. Philpott, Andrew B., Ryan, David M. and Zakeri, G. (1996). Inexact Cuts in Stochastic Benders’
Decomposition. 32nd ORSNZ Conference Proceedings, 29-30, August 1996.
44. Rahmaniani, R. Crainic, T. G., Gendreau, M. and Rei, W. (2017). The Benders Decomposition: A Literature
Review. European Journal of Operational Research, Volume 259, Issue 3, 16 June 2017, Pages 801-817
45. Rahmaniani, R. Crainic, T. G., Gendreau, M. and Rei, W. (2018b). The Asynchronous Benders Decomposition
Method. CIRRELT-2018-07 (January 2018).
46. Rahmaniani, R., Shabbir Ahmed, S., Crainic, T., Gendreau, M, and Rei, W. (2018a). The Benders Dual
Decomposition Method. CIRRELT-2018-03 (January 2018).
47. Ralphs, T. and Hassanzadeh, A. (2014). A Generalization of Benders' Algorithm for Two-Stage Stochastic
Optimization Problems with Mixed Integer Recourse. COR@L Technical Report 14T-005
48. Rush, A. M., Collins, M. (2012). A Tutorial on Dual Decomposition and Lagrangian Relaxation for Inference
in Natural Language Processing. Journal of Artificial Intelligence Research, Volume 45, pages 305-362, 2012
49. Ruszczyński, A. (1986). A Regularized Decomposition Method for Minimizing a Sum of Polyhedral Functions.
Mathematical Programming (1986) 35: 309.
50. Ruszczyński, A. (1997). Decomposition Methods in Stochastic Programming. Mathematical Programming,
October 1997, Volume 79, Issue 1–3, pp 333–353
51. Santoso, T., Ahmed S., Goetschalckx, M. and Shapiro, A. (2005). A Stochastic Programming Approach for
Supply Chain Network Design under Uncertainty. European Journal of Operational Research 167 (2005) 96–
115
52. Schultz, R., Stougie, L., and Van Der Vlerk, M. (1998). Solving Stochastic Programs with Integer Recourse by
Enumeration: A Framework using Grobner Basis. Mathematical Programming, 83(1):229-252.
53. Sen, S. and Higle, J. (2005). The C3 Theorem and a D2 Algorithm for Large Scale Stochastic Mixed Integer
Programming: Set Convexification. Mathematical Programming, 104(1):1-20.
54. Sen, S. and Sherali, H. (2006). Decomposition with Branch-and-Cut Approaches for Two-Stage Stochastic
Mixed-Integer Programming. Mathematical Programming, 106(2):203-223.
55. Sherali, H. and Fraticelli, B. (2002). A Modification of Benders' Decomposition Algorithm for Discrete Subproblems: An Approach for Stochastic Programs with Integer Recourse. Journal of Global Optimization, 22(1):319–342.
56. Sherali, H. and Lunday, B. (2011). On Generating Maximal Nondominated Benders Cuts. Annals of Operations
Research (2011), pp. 1-16
57. Sherali, H. and Zhu, X. (2006). On Solving Discrete Two-Stage Stochastic Programs having Mixed Integer
First-and Second-Stage Variables. Mathematical Programming, 108(2):597-616.
58. Sherali, H. D. and Smith, J. C. (2009). Two-Stage Stochastic Hierarchical Multiple Risk Problems: Models and
Algorithms. Mathematical programming, 120(2):403-427.
59. Trapp, A. C., Prokopyev, O. A., and Schaefer, A. J. (2013). On a Level-Set Characterization of the Value
Function of an Integer Program and its Application to Stochastic Programming. Operations Research,
61(2):498-511.
60. Velásquez, J. M. (1986). Primal-Dual Subrogated Algorithm. White paper
http://www.doanalytics.net/Documents/Primal-Dual-Subrogated-Algorithm.pdf
61. Velásquez, J. M. (1995). OEDM: Optimización Estocástica Dinámica Multinivel. Teoría General. Revista
Energética No. 13 (http://www.doanalytics.net/Documents/OEDM.pdf).
62. Velásquez, J. M. (2018) Benders Decomposition Using Unified Cuts.
http://www.doanalytics.net/Documents/Benders-Decomposition-Using-Unified-Cuts.pdf
63. Velásquez, J. (2019a). Stochastic Programming: Fundamentals. In the book Large Scale Optimization in Supply
Chain & Smart Manufacturing: Theory & Application. Springer 2019.
64. Velásquez, J. (2019b). Stochastic & Dynamic Benders Theory. In the book Large Scale Optimization in Supply
Chain & Smart Manufacturing: Theory & Application. Springer 2019.
65. Velásquez, J. (2019c). The Future: Mathematical Programming 4.0. In the book Large Scale Optimization in
Supply Chain & Smart Manufacturing: Theory & Application. Springer 2019.
66. Yang, H., Gupta, J., Yu, L. and Zheng. (2016). An Improved L-Shaped Method for Solving Process Flexibility
Design Problems. Mathematical Problems in Engineering. Volume 2016, Article ID 4329613
67. Yuan, Y. and Sen, S. (2009). Enhanced cut Generation Methods for Decomposition-Based Branch and Cut for
Two-Stage Stochastic Mixed-Integer Programs. INFORMS Journal on Computing, 21(3):480-487.
68. Zakeri, G., Philpott, A. and Ryan, D. (1999). Inexact Cuts in Benders Decomposition. SIAM Journal on
Optimization Volume 10 Issue 3, 1999 , Pages 643-657
69. Zang, Y., Wang, J., Ding, T. and Wang, X. (2018). Conditional Value-At-Risk Based Stochastic Unit Commitment considering the Uncertainty of Wind Power Generation. IET Generation, Transmission & Distribution, Volume 12, Issue 2, 2018.
70. Zaourar, S. and Malick, J. (2015) Quadratic Stabilization of Benders Decomposition. HAL Id: hal-01181273
https://hal.archives-ouvertes.fr/hal-01181273
List of Figures
Figure 1. The Power of Benders Theory
Figure 2. Dual-Angular Matrix
Figure 3. Benders Decomposition Cuts
Figure 4. Speed-up of Multiple/Decoupled Cuts (Zang et al., 2018)
Figure 5. Multi Dual Angular Matrix
Figure 6. Triangular Matrix
Figure 7. Multilevel Nested (Dynamic) Benders
Figure 8. Cobb-Douglas Production Function
Figure 9. Economic Interpretation of Benders Theory
Figure 10. Generalized Benders Decomposition
Figure 11. Gomory Cutting Planes – Integer Convex Hull
Figure 12. Benders-Gomory Cuts
Figure 13. Convergence of MIP/MINLP problems - Vehicle Routing Problem
Figure 14. Enhancements MIP/MINLP Benders Coordinators - Relaxing Coordinator
Figure 15. Relation CPLEX versus Inexact Solutions
Figure 16. Speed-up Combinatorial Benders Cuts
Figure 17. Pareto-Optimal Benders Cuts
Figure 18. Speed-up Benders Cuts: POP, MND, DUMN, CPLEX
Figure 19. Speed-up Parallel Benders Decomposition
List of Tables
Table 1. Why Benders Large Scale Methodologies?
Table 2. Speed-up Generalized Benders Decomposition. (Cai et al., 2001)
Table 3. Ralphs and Hassanzadeh – Report 2014
Table 4. Benders-Gomory Cuts for Two-Stage Stochastic Integer Subproblems
Table 5. Benders Dual Decomposition (Rahmaniani et al., 2018a)
Table 6. Benders Dual Decomposition versus CPLEX (Rahmaniani et al., 2018a)
Table 7. Speed-up Inexact Solutions (Costa et al., 2012)
Table 8. Speed-up Inexact Cuts
Table 9. NP-hard Problems Solved only by CBC. (Codato and Fischetti, 2006)
Table 10. NP-hard Problems Solved by CPLEX & CBC. (Codato and Fischetti, 2006)
Table 11. Hub Location Problems (Zaourar and Malick, 2015)
Table 12. Network Design Problems. (Zaourar and Malick, 2015)
Table 13. Stochastic Supply Chain Network Design Problem. (Santoso et al., 2005)
Table 14. Speed Up Hamming Trust Region
Table 15. Speed-up Pareto Optimal Cuts (Yang et al., 2016)
Table 16. Job Shop Scheduling Problem. (Jain and Grossmann, 2001)
Table 17. Speed-up Hybrid Benders Cuts. (Jain and Grossmann, 2001)
Table 18. Hybrid Benders Strategies. (Yang et al., 2016)
Table 19. Papers in Parallel Stochastic Optimization. (Linderoth, 2003)