THEORY, VARIATIONS AND ENHANCEMENTS
JACQUES F. BENDERS
Benders’ Theory is a mathematical optimization methodology as transcendental as the Simplex
Method of George Dantzig.
Ing. Jesus Velásquez-Bermúdez, Dr. Eng.
Chief Scientist
DecisionWare & DO Analytics
Draft Version of Chapter One of the Book:
Large Scale Optimization in Supply Chain & Smart Manufacturing: Theory & Application
To be Published in the series Springer Optimization and Its Applications
April 21, 2019
INDEX
1. Benders Theory
1.1. Framework
1.2. Benders Partition Theory
1.3. Duality Theory & Benders Theory
1.3.1. Dual Coordinator Problem
1.3.2. Sub-Problem Degenerate Solutions
1.4. Benders Decomposition
1.4.1. Standard Benders Cuts (SBC)
1.4.2. Decoupled Benders Cuts (DBC)
1.4.3. Unified Benders Cuts (UBC)
1.5. Multilevel Benders
1.5.1. Benders' Tri-level Partition Theory
1.5.2. Benders' Multilevel Partition Theory
1.6. Multilevel Partition & Decomposition Benders Theory
2. Economic Interpretation
2.1. Taxonomy of Organizations
2.2. Cobb-Douglas Production Functions
2.3. Markets
2.4. Multisectoral Planning
3. Generalizations & Extensions
3.1. Generalized Benders Decomposition
3.2. Benders Integer Linear Subproblems
3.3. Benders Dual Decomposition
3.3.1. Benders Dual Decomposition Theory
3.3.2. Benders Dual Decomposition Implementation
3.4. Logic Based Benders Decomposition
3.5. Partial Benders Decomposition
4. Dynamic and Stochastic Benders’ Theory
5. Coordinator Enhancements
5.1. MIP/MINLP Coordinators
5.1.1. Multi-Phase Coordinator
5.1.2. Modified Optimality Cuts
5.1.3. Inexact Solutions
5.1.4. Inexact Cuts
5.1.5. Combinatorial Benders Cuts
5.2. Trust Region (Regularization)
5.2.1. Neighborhood Bounding
5.2.2. Penalizations Movements
5.2.3. Binary Variables
6. Cuts Enhancements
6.1. Strong Cuts
6.1.1. Pareto Optimal
6.1.2. Other Cuts
6.1.3. Hybrid Cuts
6.2. Hybrid Strategy
7. Benders Parallel Optimization
7.1. Parallel Optimization
7.2. The Asynchronous Benders Decomposition Method
8. Conclusions
J. F. Benders: Theory, Variations and Enhancements
Jesus Velásquez-Bermúdez
Abstract. In 1962, J. F. Benders published his seminal theory in the paper "Partitioning Procedures for Solving Mixed-Variables Programming Problems", oriented to the optimization of mixed integer problems (MIP), which has been the origin of multiple methodologies for solving large-scale problems related to stochastic, complex combinatorial and/or dynamic systems. Since its formulation in 1962, researchers in Benders Theory (BT) have proven that:
▪ BT is an effective methodology for solving complex problems that cannot be solved using only the "best" basic optimization algorithms (CPLEX, GUROBI, XPRESS, …).
▪ Algorithms based on Benders' Theory can solve NP-hard (non-deterministic polynomial-time) problems in reasonable time; for this type of problem, BT has proven to be an effective methodology for solving complex problems that cannot be solved using the "best" mathematical solvers.
▪ BT is a mature methodology that is in an accelerated growth phase.
▪ There is a gap between research in mathematical programming and the application of large-scale methodologies in real-world solutions.
In this book, four chapters are oriented to teaching Benders' Theory; they are:
1. J. F. Benders: Theory, Variations and Enhancements
2. Stochastic Optimization and Risk Management: Fundamentals
3. Dynamics and Stochastic Benders Decomposition
4. The Future: Mathematical Programming 4.0
The chapters present a mathematical review of many of the aspects that must be considered to understand the variations and enhancements oriented to: i) expanding the set of problems that can be solved based on Benders concepts, and ii) speeding up the solution of complex problems.
1. Benders Theory
1.1. Framework
In 1962, J. F. Benders published his seminal theory oriented to the optimization of mixed integer problems (MIP), which has been the origin of multiple methodologies for solving large-scale problems. The fundamental idea is the partition of a problem into two subproblems of reduced complexity, based on the division of the variables into coordination variables and subordinate variables. The solution of the original problem is obtained by the coordinated solution of two complementary subproblems: i) the coordinator, linked to the coordination variables, and ii) the sub-problem, linked to the subordinate variables. The sub-problem provides information to the coordinator through the dual variables associated with its constraints. The coordinator takes the information from the primary level and incorporates it in the form of hyperplanes (Benders cuts) limiting the feasible region of the coordination variables. Benders cuts represent the cost of the subordinate variables as a function of the coordination variables. The algorithm defined by Benders is convergent and solves the original problem through the solution of the coordinator problem; it applies only to linear subproblems.
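The coordinator/sub-problem loop described above can be sketched in a few lines of Python. The instance below is a toy problem invented for illustration (it is not from the text), chosen so that the sub-problem and its dual are available in closed form; a real implementation would call an LP solver at both levels.

```python
# Minimal Benders loop on a toy instance (illustrative data, not from the text):
#   min  y + x   s.t.  x >= 3 - y,  x >= 0,  y in {0, 1, 2, 3}
# Sub-problem for fixed y:  SP(y) = min { x : x >= 3 - y, x >= 0 },
# so Q(y) = max(0, 3 - y), and the dual of "x >= 3 - y" is pi = 1 when the
# constraint is active at the optimum, else 0.

def subproblem(y):
    """Solve SP(y) analytically; return Q(y) and the dual pi."""
    q = max(0.0, 3.0 - y)
    pi = 1.0 if 3.0 - y > 0 else 0.0
    return q, pi

def coordinator(cuts, Y=(0, 1, 2, 3)):
    """Solve CY by enumeration: min y + q  s.t.  q >= pi_k*(3 - y) per cut."""
    best = None
    for y in Y:
        q = max((pi * (3.0 - y) for pi in cuts), default=0.0)
        if best is None or y + q < best[0]:
            best = (y + q, y)
    return best  # (lower bound, incumbent y)

cuts, ub = [], float("inf")
for it in range(10):
    lb, y = coordinator(cuts)      # coordination level generates y
    q, pi = subproblem(y)          # second level, parameterized by y
    ub = min(ub, y + q)            # y + Q(y) gives a feasible upper bound
    if ub - lb < 1e-9:             # bounds meet: optimal
        break
    cuts.append(pi)                # add optimality cut  q >= pi*(3 - y)

print(y, ub)  # prints: 0 3.0
```

The loop alternates exactly as in the text: the coordinator proposes y, the sub-problem returns its cost and a dual vector, and the dual becomes a new cut until the lower and upper bounds coincide.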
The generalization of Benders Theory (BT) to several typical cases, where the structure of the optimization problem enables the effective use of BT, requires analyzing three basic cases:
▪ Decomposition Theory: useful when it is possible to group the subordinate variables into independent sets, so as to formulate multiple parallel subordinate subproblems.
▪ Multilevel Theory: used when there is a multilevel hierarchical relationship among the variables of the problem, and it is possible, inside the subordinate variables, to select a new set of coordination variables to establish an additional level of partition.
▪ Multilevel Decomposition Theory: the result of combining the two previous theories.
The use of these three concepts permits decomposing a mathematical problem into "atoms", in such a way as to facilitate its solution by: i) speeding up the solution time, and ii) reducing the memory requirements.
As a further result, the atomization of the problem supports two concepts that are the basis of optimization in the future:
i) Asynchronous Parallel Optimization (APO): solve the problem using multiple cores; and
ii) Distributed Real-Time Optimization (DRTO): solve the problem based on the interaction of multiple smart agents that exchange information continuously, in real time.
Three points of view must be considered to understand BT:
▪ Mathematical: the original Benders formulation has limitations:
i) BT requires that all subproblems be linear, which may not hold in several real-life cases; therefore, many researchers have worked on the development of methodologies that allow applying the Benders concepts to cases with non-linear and/or discrete subproblems.
ii) Although BT is convergent, its speed in finding the optimum can be significantly improved; this has turned BT into an important complement to basic solvers in the solution of very large problems.
▪ Applications: in a very aggregate manner, Benders applications can be divided into two major groups: i) combinatorial optimization, and ii) dynamic optimization. This fact has generated research aimed at accelerating the solution time of problems by considering their specific features.
▪ Uncertainty: BT has been applied to scenario-based stochastic optimization, which inherently includes the concept of scenario-based decomposition.
It is important to note that, even though the variations and improvements have been made in multiple independent research studies, it is possible to integrate "all" of them in a single paradigm, in such a way that an application can combine the improvements that are most convenient for it. To compare methodologies and to present their impact, two key performance indices (KPI) are considered: i) solution time, and ii) the gap between the best-known solution (primal bound) and the best-possible solution (dual bound); when the solution is optimal, the gap is zero.
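For the second KPI, a minimal sketch of the gap computation follows; the normalization is an assumed convention for illustration, since solvers differ in the denominator they use.

```python
def optimality_gap(primal_bound, dual_bound):
    """Relative gap between the best-known (primal) and best-possible (dual)
    bounds.  Conventions differ between solvers; the denominator used here
    (|primal|, guarded against zero) is one common, illustrative choice."""
    eps = 1e-10
    return abs(primal_bound - dual_bound) / max(abs(primal_bound), eps)

# A minimizing run with incumbent 105.0 and best lower bound 100.0:
print(round(optimality_gap(105.0, 100.0), 4))  # prints: 0.0476
# At optimality both bounds coincide and the gap is zero:
print(optimality_gap(7.0, 7.0))  # prints: 0.0
```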
There are several possibilities for improving the Benders methodology:
1. Formulating the mathematical problem "properly"
2. Using the appropriate Benders methodology, according to the mathematical problem
3. Modifying the master problem according to the formulation
4. Selecting the correct enhancements to speed-up the solution time; it includes selecting good cuts to add to the
master problem at each step.
5. Making a good selection of initial cuts
6. Using parallel optimization
The literature review conducted by Rahmaniani et al. (2017) presents evidence of the growth in the importance of BT in recent years, which can be attributed to the massive development, and the drop in prices, of multi-CPU PCs and GPUs, enabling an environment that takes advantage of the atomization/parallelization of optimization algorithmic procedures. The graphic, from Rahmaniani et al., presents the number of scientific papers related to BT until 2016. The following table presents a very small summary of papers showing the speed gains from the proper use of the improvements in BT.
Figure 1. The Power of Benders Theory (scientific publications related to BT per year, 1962–2016)
This leads to the conclusion that the point of reference for comparing the speed of mathematical programming in solving complex problems is not the basic solvers; the proper reference is the use of large-scale methodologies that make smart use of these solvers.
| Methodology | Case | Format | Solver | Problems solved (solver / Benders / total) | Time ratio solver/Benders (min / mean / max) | Gap ratio | Paper |
| Combinatorial Benders Cuts | Statistical Classification | 0-1 | CPLEX | 0 / 10 / 10 | 70.8 | – | Combinatorial Benders' Cuts for Mixed-Integer Linear Programming (2006), G. Codato & M. Fischetti |
| Combinatorial Benders Cuts | Map Labelling | 0-1 | CPLEX | 0 / 11 / 11 | 3.63 | – | idem |
| Combinatorial Benders Cuts | Statistical Classification & Map Labelling | 0-1 | CPLEX | 24 / 24 / 24 | 5 / 21.1 / 675 | – | idem |
| Benders Integer Subproblems | Stochastic Server Location | MIP | CPLEX | 7 / 11 / 11 | 3.93 / 28.6 / 178 | 5.65 | Decomposition Algorithms with Parametric Gomory Cuts for Two-Stage Stochastic Integer Programs (2014), D. Gade, S. Kucukyavuz, S. Sen |
| Generalized Benders Decomposition | Water Resources Management | NLP | MINOS | 4 / 4 / 4 | 4.4 / 23 / 39.8 | – | Solving Large Nonconvex Water Resources Management Models using Generalized Benders Decomposition (2001), X. Cai, D. McKinney, L. Lasdon & D. Watkins |
| Generalized Benders Decomposition | Water Resources Management | NLP | CONOPT | 4 / 4 / 4 | 1.3 / 5.2 / 8.5 | – | idem |
| Benders Strongest Cuts (Dynamically Updated Near-Maximal Cuts) | Petroleum Product Supply Chain | LP | CPLEX | 10 / 10 / 10 | 0.4 / 2.1 / 4 | – | Accelerating Benders Stochastic Decomposition for the Optimization under Uncertainty of the Petroleum Product Supply Chain (2014), F. Oliveira, I. E. Grossmann, S. Hamacher |
| Parallel Benders, Hybrid | – | LP | CPLEX | 8 / 8 / 8 | 20 / 326 / 519 | – | The Asynchronous Benders Decomposition Method (2018), R. Rahmaniani, T. Crainic, M. Gendreau, W. Rei |
| Parallel Benders, Asynchronous | – | LP | CPLEX | 8 / 8 / 8 | 19 / 287 / 459 | – | idem |
| Benders Dual Decomposition | Network Design | MIP | CPLEX | 26 / 30 / 35 | 0.50 / 1.12 / 1.16 | 1.96 | The Benders Dual Decomposition Method (2018), R. Rahmaniani, S. Ahmed, T. Crainic, M. Gendreau, W. Rei |
| Benders Dual Decomposition | Capacity Location | MIP | CPLEX | 16 / 16 / 16 | 1.94 / 3.34 / 7.12 | 28 | idem |
| Benders Dual Decomposition | Network Interdiction | MIP | CPLEX | 0 / 52 / 70 | 2.18 / 5.00 / 17.08 | 22.23 | idem |
Table 1. Why Benders Large-Scale Methodologies?
There are two ways to implement the BT enhancements:
▪ Reformulation of the BT using computer algebraic modeling languages
▪ Direct modification of the internal flow of the solvers.
This document concentrates on the first alternative; this does not imply a value judgment with respect to the second.
1.2. Benders Partition Theory
BT considers the problem P:, composed of two types of variables: y, the coordination variables, and x, the coordinated variables.
P: = { min z = cTx + f(y) |
F0(y) = b0 ; Ax + F(y) = b ; x ∈ R+ ; y ∈ S } (1)
BT restricts the model on x to be a linear problem, while it imposes no conditions on y, which may be continuous or discrete, and the functions f(y) and F(y) may be linear or non-linear convex functions. The problem P: is partitioned into two coordinated problems: CYBT:, over y, and SPBT(y):, over x, which is defined as
SPBT(y): = { min Q(y) = cTx | Ax = b - F(y) ; x ∈ R+ } (2)
The dual problem of SPBT(y): (independent of x) is
DSPBT(y): = { max Q(y) = πT(b - F(y)) | πT A ≤ c ; π ∈ R } (3)
The coordinator CYBT: on y can be formulated as
CYBT: = { min z = f(y) + Q(y) | F0(y) = b0 ; y ∈ S
Q(y) ≥ (πk)T[b - F(y)] , ∀k ∈ IT
0 ≥ (σk)T[b - F(y)] , ∀k ∈ IN } (4)
where π represents the vector of dual variables of the constraints Ax = b - F(y), IT the set of iterations, σ an extreme ray of the dual feasibility region, and IN the set of iterations in which no feasibility was obtained, which implies that DSPBT(y): has an unbounded solution.
Benders proposed the solution of P: by a hierarchical algorithm that works on two levels:
i) The coordination level solves the problem CYBT: and generates a sequence of yk values;
ii) On the second level, yk is used as a parameter of the sub-problem SPBT(y): to generate a sequence of feasible extreme points, πk, and extreme rays, σk, of the dual feasible region of SPBT(y):; these vectors are used to include cutting planes in CYBT:.
CYBT: includes two types of cuts. The first type, which we call optimality cutting planes (OCP) because the cut eliminates values of y that cannot be optimal, has the following structure
Q(y) ≥ (πk)T[b - F(y)] , ∀k ∈ IT (5)
The second type, the feasibility cutting planes (FCP), restricts the feasible region of y so as to maintain a feasible x in SPBT(y):; it has the following structure
0 ≥ (σk)T[b - F(y)] , ∀k ∈ IN (6)
1.3. Duality Theory & Benders Theory
The following aspects must be considered to define the relation between Duality Theory and Benders Theory.
1.3.1. Dual Coordinator Problem
A linear coordinator problem can be solved directly in its dual form; we consider the case
LP: = { min z = cTx + f(y) |
F0 y = b0 ; A x + F y = b ; x ∈ R+ ; y ∈ S } (7)
Considering the primal solution of LP:, on each iteration BT incorporates constraints, which implies that the dimension of the basic solution is resized in each iteration. If the dual is solved, the problem grows in variables (column generation) while keeping the size of the basic solution constant; it may therefore be more appropriate to work with the coordinator model in its dual version. If SPBT(y): is always feasible ("relatively complete recourse"), the feasibility cuts may be ignored; DCY:, the dual problem of CY:, is
DCY: = { max z = b0T λ2 + Σk∈ITE [(πk)T b] λ3k |
Σk∈ITE λ3k = 1
F0T λ2 + Σk∈ITE [(πk)T F]T λ3k ≤ c } (8)
The basic characteristics of this problem are:
▪ The dimension of λ2 is equal to the dimension of b0; it corresponds to the dual variables of the constraints F0(y) = b0.
▪ The variables λ3k correspond to the dual variables of the optimality cuts.
For the optimal solution of the general problem P:, the values of the objective functions of the primal and the dual problems must be equal, which implies:
f(y) + Q(y) = b0T λ2 + bT π (10)
Considering that λ2 corresponds to the dual variables of F0(y) = b0, the dual variables (π) of the constraints Ax + F(y) = b must satisfy
bT π = bT [Σk∈ITE πk λ3k]
π = Σk∈ITE πk λ3k
Σk∈ITE λ3k = 1 (11)
then π is a convex combination of the dual variables of the optimality cuts. This expression implies that the cuts generated by the sub-problem in the coordinator may be replaced by one single cut, generated from the surrogation of all cuts, where the weight of each cut is the dual variable associated with that cut. This is consistent with the results presented by different studies on the theory of Surrogate Mathematical Programming (SMP) (Greenberg & Pierskalla, 1970) (Velasquez, 1986).
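As a small worked illustration of this surrogation (the weights are invented for illustration), two optimality cuts whose coordinator dual variables are λ3¹ = 0.25 and λ3² = 0.75 collapse into one equivalent surrogate cut:

```latex
% Two optimality cuts returned by the sub-problem:
\begin{align*}
Q(y) &\ge (\pi^1)^{\mathsf T}\,[\,b - F(y)\,], \\
Q(y) &\ge (\pi^2)^{\mathsf T}\,[\,b - F(y)\,].
\end{align*}
% With dual weights \lambda_3^1 = 0.25 and \lambda_3^2 = 0.75 (illustrative
% values, summing to one), the single surrogate cut
\[
Q(y) \;\ge\; \left(0.25\,\pi^1 + 0.75\,\pi^2\right)^{\mathsf T}[\,b - F(y)\,]
       \;=\; \pi^{\mathsf T}[\,b - F(y)\,],
\qquad \pi = \sum_{k} \lambda_3^k\,\pi^k,\quad \sum_{k} \lambda_3^k = 1,
\]
% reproduces the convex combination of equation (11) and is tight at the optimum.
```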
If we include the feasibility cuts, the structure of the dual problem DCY: is
DCY: = { max z = b0T λ2 + Σk∈ITE [(πk)T b] λ3k + Σk∈ITN [(σk)T b] λ4k |
Σk∈ITE λ3k = 1
F0T λ2 + Σk∈ITE [(πk)T F]T λ3k + Σk∈ITN [(σk)T F]T λ4k ≤ c } (12)
where λ4k is the dual variable of the k-th feasibility cut, which exists for those iterations in which no feasible solution to the subproblem was found; the dual variables of the constraints Ax + F(y) = b are
π = Σk∈ITE πk λ3k + Σk∈ITN σk λ4k (13)
1.3.2. Sub-Problem Degenerate Solutions
In a normal problem any change in the vector of resources (b - Fy) implies change in the value of the objective function;
then the Lagrange multipliers (dual variables) must be different from zero (0). The degeneration occurs when the
objective function is collinear with one of the active constraints; then, any infinitesimal change in some of the
components of the resources vector doesn’t imply change in the objective function value. Then, when the linear
subproblem SPBT(y): is degenerated, it implies that may has multiple dual solutions; hence a subproblem may generate
multiple optimality cuts. In this case many of the component of the vector of dual variables may be equal to zero.
Since the addition of “empty” cuts makes CYBT: harder to solve.
Magnanti and Wong (1981) proposed a seminal methodology to accelerate convergence of BT by strengthening the
generated cuts (pareto-optimal cuts). This case will be studied in a posterior numeral.
1.4. Benders Decomposition
When a problem P: has a dual-angular matrix structure that includes sub-problems diagonal matrix is possible the use
BT for its solution. P: has angular dual-diagonal structure when it can be expressed as:
P: = { Min z = i=1,N ciTxi + f(y) |
F0(y) = b0
Ai xi + Fi(y) = bi , i=1,N
xiR+ , i=1,N , yS } (14)
The matrix has the following structure (figure 2).
Figure 2. Dual-Angular Matrix (coordination variables y; sub-problem variables x1, x2, …, xN over a block-diagonal matrix)
The index i is associated with sub-problems (areas related to industrial sectors, geographic areas, periods, realizations of a stochastic process, or a combination of them); then: i) y may be associated with the consumption/production of common resources, or with the transfer of resources between areas, and ii) xi with the operation within the area of action of the index i. The previous structure allows breaking down the problem into multiple subproblems, as shown below (figure 3).
Figure 3. Benders Decomposition Cuts (coordinator and sub-problem formulations for Standard, Decoupled, and Unified Benders Cuts; the UBC panel applies to periods and random scenarios)
There are at least three alternatives for the implementation of Benders Decomposition Theory:
▪ Standard Benders Cuts (SBC): corresponds to the basic BT methodology, which solves a single subproblem that integrates all the subproblems and generates only one cut in each iteration.
▪ Decoupled Benders Cuts (DBC): corresponds to a variation of BT that solves in each iteration a small subproblem for each index i. DBC solves N problems and generates one decoupled cut for each index i; the cuts are coupled in the objective function.
▪ Unified Benders Cuts (UBC): when the mathematical conditions are met, corresponds to a variation of BT that solves in each iteration a subset of the N subproblems and generates N decoupled cuts, one for each index i; the cuts are coupled in the objective function. This type of cut may be applied when the i-indexes are associated with periods or with random scenarios, or a combination of both.
1.4.1. Standard Benders Cuts (SBC)
P: can be solved using BT directly; y corresponds to the coordination variables and xi to the coordinated variables. Define Q(y) as the optimal value of the objective function of the problem over all xi, for a given value of y:
Q(y) = { min z = Σi=1,N ciTxi | Aixi = bi - Fi(y) , i=1,N ; xi ∈ R+ , i=1,N } (15)
Applying BT directly, the coordinator problem is
CY: = { min z = f(y) + Q(y) |
F0(y) = b0 , y ∈ S
Q(y) ≥ Σi=1,N (πik)T[bi - Fi(y)] , k=1,ITE
0 ≥ Σi=1,N (σik)T[bi - Fi(y)] , k=1,ITN } (16)
where πi represents the dual variables of the i-th set of constraints and σi the extreme rays for infeasible iterations. The associated subproblem integrates all the xi.
1.4.2. Decoupled Benders Cuts (DBC)
Birge and Louveaux (1988) developed a multi-cut enhancement to BT, in which a separate optimality cut is constructed for each subproblem, considering that it is possible to decouple the subproblem and formulate the function Q(y) as the sum of N functions Qi(y), each corresponding to a subproblem on xi:
Qi(y) = { min ciTxi | Aixi = bi - Fi(y) ; xi ∈ R+ } (17)
Q(y) is equal to
Q(y) = Σi=1,N Qi(y) (18)
The SPi(y): problem to calculate Qi(y) is formulated as
SPi(y): = { min Qi(y) = ciTxi | Aixi = bi - Fi(y) ; xi ∈ R+ } (19)
and its dual problem is
DSPi(y): = { max Qi(y) = πiT [bi - Fi(y)] | πiT Ai ≤ ciT } (20)
As in the previous case, from duality theory it is known that
Qi(y) ≥ πiT [bi - Fi(y)] (21)
with equality holding only for the optimal πi*. The coordinator model CY: can be formulated as
CY: = { min z = f(y) + Q(y) |
F0(y) = b0 ; y ∈ S
Q(y) = Σi=1,N Qi(y)
Qi(y) ≥ (πik)T[bi - Fi(y)] , i=1,N , k=1,ITE(i)
0 ≥ (σik)T[bi - Fi(y)] , i=1,N , k=1,ITN(i) } (22)
where ITE(i) represents the set of iterations in which the feasibility of SPi(y): has been achieved and ITN(i) the set of iterations in which feasibility has not been achieved. This type of cut is called Decoupled Benders Cuts (DBC).
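The decoupled-cut idea can be sketched on a toy instance with two analytically solvable subproblems (all data invented for illustration; a real implementation would call an LP solver). Each subproblem contributes its own cut qi ≥ πi(ri - y), and the coordinator, solved here by enumeration, sums the qi in its objective:

```python
# Multi-cut (DBC) sketch on a toy two-subproblem instance (illustrative data):
#   min  y + Q1(y) + Q2(y),   y in {0, ..., 4},  where
#   Qi(y) = min { x_i : x_i >= r_i - y, x_i >= 0 } = max(0, r_i - y),
# with r = (3, 5); the dual of "x_i >= r_i - y" is pi_i = 1 if r_i - y > 0 else 0.
# Each subproblem keeps its own variable q_i and its own pool of decoupled cuts.

R = (3.0, 5.0)

def sub(i, y):
    """Solve subproblem i analytically; return (Qi(y), dual pi_i)."""
    return max(0.0, R[i] - y), (1.0 if R[i] - y > 0 else 0.0)

def solve_multicut(Y=range(5), max_it=20, tol=1e-9):
    cuts = [[], []]                        # one cut pool per subproblem (DBC)
    ub = float("inf")
    for _ in range(max_it):
        best = None                        # coordinator solved by enumeration
        for y in Y:
            qs = [max((p * (R[i] - y) for p in cuts[i]), default=0.0)
                  for i in (0, 1)]
            z = y + sum(qs)
            if best is None or z < best[0]:
                best = (z, y)
        lb, y = best
        vals = [sub(i, y) for i in (0, 1)]
        ub = min(ub, y + sum(q for q, _ in vals))
        if ub - lb < tol:
            break
        for i, (_, pi) in enumerate(vals):
            cuts[i].append(pi)             # decoupled cut: q_i >= pi_i*(r_i - y)
    return y, ub

print(solve_multicut())  # prints: (3, 5.0)
```

A single-cut (SBC) variant would bound only the sum q ≥ Σi πi(ri - y); keeping one qi per subproblem lets each cut bound its own maximum, which is the "deeper condition" noted below.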
The advantages of the decomposition approach are:
1. The original formulation does not consider the possibility of decomposition, assuming a problem integrated over the xi. Under the decomposition scheme, at the lower level a subproblem of reduced complexity is solved for each item associated with the index i.
2. In the original scheme, a single cut that integrates the dual variables from all subproblems is generated at each iteration. The proposed formulation generates N decoupled cuts, one for each SPi(y):, which are coordinated by the equation that defines Q(y). The difference is that a single cut acts by limiting the maximum of a summation, while the decoupled cuts act by limiting the summation of N maximums, which is a deeper condition.
3. In the decoupled system, the information provided by each subproblem is independent of the others, and there is no reason that obliges, in an iteration between coordinator and subproblems, to solve all the subproblems. This feature allows implementing solution schemes that only solve those subproblems which provide "more information".
Despite these advantages, it should be considered that DBC may increase the computational effort required to solve the master problem when it is a MIP. Zang et al. (2018) present the impact of multiple cuts in a stochastic model of the electric sector, in two different experiments (figure 4).
Figure 4. Speed-up of Multiple/Decoupled Cuts (Zang et al., 2018)
1.4.3. Unified Benders Cuts (UBC)
Unified Benders Cuts (UBC) theory corresponds to the case in which the sub-problems SPi(y): belong to a family of problems characterized by their matrix/vector elements; its application is possible when the index i is associated with periods and/or random scenarios of a stochastic process. UBC are included by Chen and Powell (1997) in the CUPPS algorithm and by Velasquez (2018) in the G-DDP algorithm. This topic is analyzed in a later chapter.
1.5. Multilevel Benders Partition
Velasquez (1995) studied problems in which more than one level of coordination can be defined hierarchically. That is, inside a set of subordinated variables there exists a relationship such that some variables function as coordinators of the others (multi-dual-angular matrix). For ease of presentation, we first analyze a case of three levels and subsequently generalize the results to S levels (figure 5). The following sections only consider optimality cuts (OCP).
Figure 5. Multi-Dual Angular Matrix (higher-level coordination variables, intermediate-level coordination variables, and lower-level sub-problem variables over linear models)
1.5.1. Benders' Tri-level Partition Theory
Consider the problem P:, which can be partitioned into three hierarchical levels:
P: = { min cTx + eTw + f(y) |
F0(y) = b0
Gw w + Fw(y) = bw
A x + G w + F(y) = b
x ∈ R+ ; w ∈ R+ ; y ∈ S } (23)
where y corresponds to the general coordination variables, and w and x are variables coordinated by y; at the same time, w may act as coordinator of x once the value of y is defined. Applying BT, the first-level coordinator model is
CY: = { min z = Q(y) + f(y) |
F0(y) = b0 ; y ∈ S
Q(y) ≥ (πxk)T(b - F(y)) + (πwk)T(bw - Fw(y)) , k=1,ITE } (24)
where πx corresponds to the vector of dual variables of the constraints Ax + Gw = b - F(y), πw to the vector of dual variables of the constraints Gww = bw - Fw(y), and ITE to the number of cuts that have been generated from the lower level. Q(y) is the sum of two functions: Qx(y), which estimates the value of the cost cTx, and Qw(y), which estimates the cost eTw:
Qx(y) = (πx)T(b - F(y)) (25)
Qw(y) = (πw)T(bw - Fw(y)) (26)
Consider the subproblem coordinated by y for {x, w}, which provides feasible values for x and w:
SP1(y): = { min Q(y) = cTx + eTw |
Gww = bw - Fw(y)
Ax + Gw = b - F(y)
x ∈ R+ ; w ∈ R+ } (27)
The dual of SP1(y): is
DSP1(y): = { max πxT(b - F(y)) + πwT(bw - Fw(y)) |
πxT A ≤ cT
πxT G + πwT Gw ≤ eT } (28)
Since w coordinates x, it is possible to solve SP1(y): using BT. Let us consider the coordinator problem on w, conditioned on a value of y:
CW(y): = { min eTw + W(w|y) |
Gww = bw - Fw(y) ; w ∈ R+
W(w|y) = { min cTx | Ax = b - F(y) - Gw ; x ∈ R+ } } (29)
The function W(w|y) corresponds to the cost cTx(w|y) as a function of w when y is defined by the first-level coordinator. The subproblem for x is
SP2(w|y): = { min W(w|y) = cTx | Ax = b - F(y) - Gw ; x ∈ R+ } (30)
The second-level coordinator model CW(y): is formulated based on BT as
CW(y): = { min eTw + W(w|y) |
Gww = bw - Fw(y) ; w ∈ R+
W(w|y) ≥ (βn)T(b - F(y) - Gw) , n=1,ITEx } (31)
where βn represents the n-th vector of dual variables of the constraints Ax = b - F(y) - Gw that has been generated by SP2(w|y):, and ITEx the total number of cuts.
The coordinator problem CW(y): and the coordinated problem SP1(y): are equivalent. For purposes of coordination in CY:, πx and πw must be determined from the CW(y): solution. So, consider the dual problem of CW(y):
DCW(y): = { max [Σn=1,ITEx q(n) βn]T(b - F(y)) + πwT(bw - Fw(y)) |
[Σn=1,ITEx q(n) βn]T G + πwT Gw ≤ eT
Σn=1,ITEx q(n) = 1
q(n) ∈ R+ , n=1,ITEx } (32)
where q(n) is a component of the vector θ and corresponds to the dual variable of the n-th cut generated by the subproblem SP2(w|y):
θT = { q(1), q(2), ... , q(ITEx-1), q(ITEx) } (33)
In vector notation, DCW(y): can be expressed as
DCW(y): = { max θT(BITEx)T (b - F(y)) + πwT(bw - Fw(y)) |
θT(BITEx)T G + πwT Gw ≤ eT
θT 1 = 1 ; θ ∈ R+ } (34)
where Bk represents the matrix of all the dual-variable vectors that have been generated until iteration k of CW(y):
Bk = { β1, β2, ... , βk-1, βk } (35)
and 1 corresponds to a vector with all its components equal to 1.
Given that DSP1(y): and DCW(y): are equivalent, it is possible to prove that πxk is a weighted sum of the dual-variable vectors generated by SP2(w|y):, using as weighting factors the dual variables associated with the coordinator CW(y)::
πxk = Σn=1,ITEx(k) q(n) βn = BITEx(k) θ (36)
where ITEx(k) is the number of cuts that SP2(w|y): has generated in CW(y): until iteration k of CW(y):.
If Surrogate Mathematical Programming (SMP) is considered, it is possible to take advantage of this relationship. SMP proves that a set of constraints can be replaced by an equivalent constraint, generated from a convex combination of the constraints, provided that the weights are collinear with the Lagrange multipliers of the constraints. Based on this fact, the cuts generated by SP2(w|y): may be replaced by an equivalent surrogate cut, based on the surrogation of all cuts, where the weights correspond to the dual variables associated with each cut. This occurs every time that CW(y): reaches an optimal point {x(y), w(y)} and returns a vector of dual variables to CY:.
The surrogate cut synthesizes the information that has been processed in CW(y):. This may prevent the number of cuts coming from the lower level from exploding as the optimization process advances, since whenever a cycle of optimization begins in CW(y): all the generated cuts may be replaced by an equivalent surrogate cut, which preserves the memory of the system.
The definition of πxk is general: it serves to calculate, in any coordinator, the dual variables of the constraints which are not explicitly considered in it and which are managed at lower hierarchical levels. The dual variables of these constraints correspond to the surrogated vector of dual variables of the sub-problem. For the coordinator of the highest level, they correspond to the dual variables of the solution of the problem.
1.5.2. Benders' Multilevel Partition Theory
The extension of this theory, for cases in which exist more than two levels of coordination is direct. Each coordinator
of a lower level generates a cut to the top-level coordinator, summarizing information based on the subrogated vector
of dual variables; and on the lower level it may replace all cuts which so far have been used to generate the optimal
partial solution. In the case of S levels consider the problem P:
P: = { Min i=1,S ciTXi + f(y) |
F0(y) = b0 ;
AiXi + q=1,i-1 Ei,qXq + Fi(y) = bi i=1,S ;
xiR+ i=1,S ; yS } (37)
where y corresponds to the variables of coordination of first level, level 0, and xi to the variables of level i. Coordination
xS corresponds to the lower level, or primary level. The matrix of P: has a triangular structure in blocks (figure 6).
Figure 6. Triangular Matrix (variables y, x1, x2, ..., xS; diagonal blocks F0, Fi, Ai and coupling blocks Ei,q)
The level-0 model is
CY: = { min z = Q(y) + f(y) |
F0(y) = b0 ; y ∈ S
Q(y) ≥ Σi=1,S (πi,1k)T(bi - Fi(y)) , k=1,ITE } (38)
The coordinator associated with the variables xi, for i between 1 and S-1, is
CXi(y,x1,x2,...,xi-1): = { min ciTxi + Wi(xi|y,x1,x2,...,xi-1) |
Aixi = bi - Σq=1,i-1 Ei,qxq - Fi(y) ; xi ∈ R+
Wi(xi|y,x1,x2,...,xi-1) ≥ Σq=i+1,S (πq,i+1k)T(bq - Eq,ixi) , k=1,ITEX(i+1) } (39)
where πq,ik corresponds to the surrogated vector of dual variables at level i associated with the constraints of level q, and complies with
πq,ik = Σn=1,ITEX(i,k) qi(n) πq,i+1n = Πq,ik θik (40)
where qi(n) is a component of the vector θi and corresponds to the dual variable of the n-th cut generated by the subproblem CXi+1(y,x1,x2,...,xi):
θiT = { qi(1), qi(2), ... , qi(ITEX(i+1)-1), qi(ITEX(i+1)) } (41)
and the matrix Πq,ik groups the surrogated vectors of dual variables that have been generated at level i up to iteration k of CXi(y,x1,x2,...,xi-1):, ITEX(i,k) being the total number of cuts:
Πq,ik = { πq,i+11, πq,i+12, ... , πq,i+1ITEX(i,k) } (42)
The dual variables corresponding to the functional constraints of level i in the coordinator i are πi,i.
The primary subproblem SPS(y,x1,x2,...,xS-1): is
SPS(y,x1,x2,...,xS-1): = { min cSTxS | ASxS = bS - Σq=1,S-1 ES,qxq - FS(y) ; xS ∈ R+ } (43)
It is equivalent to a coordinator of level i, evaluated at i equal to S, without including the cuts and the W() function.
Given the above equivalence, the formulation of the algorithm is performed in terms of coordinator problems. In the original way of implementing the multilevel theory, each hierarchical level returns to the upper level only when it has obtained the optimal solution of its problem, which is parameterized by the decision vector preset at the higher levels; this implies that the lower levels perform cycles nested within the upper levels.
If the methodology is applied to a dynamic problem (figure 7), the result is similar to the so-called Nested Benders Decomposition (NBD) theory; but the multilevel theory is less restrictive, because it permits relations between two non-consecutive periods.
[The figure shows a chain of coordinators over the time periods t=1, t=2, …, t=S: an upper problem Miny f(y) + Q(y), an intermediate problem Minz eTz + Qz(z) and an inner problem Minx cTx, each of any type (LP, MIP, NLP or MINLP), linked downwards by the primal decisions (y, then z) and upwards by the Benders cuts built from the dual variables.]
Figure 7. Multilevel Nested (Dynamic) Benders
1.6. Multilevel Partition & Decomposition Benders Theory
There are many cases in which the combination of the Benders Partition and Benders Decomposition theories can be applied to atomize large problems, speeding up the solution time. This topic will be studied in the Chapter “The Future: Mathematical Programming 4.0” (Velásquez, 2019).
2. Techno-Economic Interpretation
There are several aspects that should consider when interpreting BT as a simple mathematical artifice instead
conceptualize it as a systemic vision of the organizations and business/industrial processes. These interpretations are
general of large-scale methodologies, they don’t depend of BT.
2.1. Taxonomy of Organizations
From the economic point of view, large scale methodologies allow to analyze systems at micro and macro level.
Holmberg (1995) analyzes the relationships between the mathematical structures encountered in optimization problems and organizational structures, drawing a parallel between organization charts, information flows and hierarchical algorithms, which give rise to different interpretations depending on the mathematical methodology.
A coordinator problem resembles the functions of headquarters that interact with the coordinated subproblems that
represent the subsidiaries, either by setting prices to use common resources (dual or "price-directive decomposition",
like Lagrangean Relaxation, LR) or by fixing the level of activities common to all dependencies (primal or "resource-
directive decomposition", like Benders Decomposition).
The proposals made by the “headquarters” are analyzed by the subsidiaries, who generate new information: the level of activity in LR, and the marginal costs/benefits in BT. Based on the information obtained from the subsidiaries, the “headquarters” makes a new proposal, fixing new prices or allocating new quantities.
Thus, the Benders coordinator represents an authority (a manager, a system operator, … ) that assigns resources to many agents in a market or in a supply chain. The first level defines the vector y of resources assigned to the agents (sectors, factories, departments, … ); the second level is a subproblem related to the agents that generates information about the marginal productivity of each resource for each agent, or the prices that the agents can pay for the resources, represented by the dual variables.
2.2. Cobb-Douglas Production Functions
In economics, the Cobb-Douglas (1928) function is a production function (Q), widely used to represent the relationship between one or several final products and the use of technology (T), labor (L) and capital (K) inputs. This type of aggregate modeling permits estimating a country's production function, as well as its expected economic growth, that is
Q = f (K, T, L, … ) (42)
This concept is also applicable to major industries. The following graph, taken from Wikipedia, presents a typical Cobb-Douglas function (figure 8), dependent on labor (L) and capital (K). In the case of an industrial system that produces a single product, the “optimal” production function can be constructed from the parametric analysis of an optimization model that minimizes production cost for different values of the quantity produced (Q); the solution of the problem fixes the optimal quantities of resources (T, L, K) to be used. Two functions are essential in the decision-making process: i) the total cost and ii) the marginal cost; the cost of production is the result of integrating the marginal cost function from zero up to Q.
Figure 8. Cobb-Douglas Production Function (source: https://www.econowiki.com)
The marginal cost is associated with the dual variable of the demand/production (Q) constraint; for linear systems (such as the BT linear subproblems), it corresponds to a step function whose integral is a function defined by the intersection of the hyperplanes associated with each step of the marginal cost function. For a real model that has hundreds or thousands of final products, it is impossible to define the production function explicitly; mathematical optimization models allow determining points of the production function corresponding to optimal solutions for certain conditions of the production environment.
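The step shape of the marginal cost function, and the total cost as its integral, can be illustrated with a small merit-order sketch (the generator data below are hypothetical, purely for illustration; this is not a model from the chapter):

```python
# Marginal cost of a linear production system as a step function
# (merit order): capacity blocks are used cheapest-first, so the dual
# of the demand constraint jumps at each capacity breakpoint.
# Data are hypothetical, purely for illustration.
units = [(40.0, 10.0), (30.0, 25.0), (50.0, 60.0)]  # (capacity, unit cost), sorted by cost

def marginal_cost(q):
    """Cost of the block that serves the last increment of demand q."""
    for cap, cost in units:
        if q <= cap:
            return cost
        q -= cap
    raise ValueError("demand exceeds total capacity")

def total_cost(q):
    """Integral of the marginal-cost step function from 0 to q."""
    spent = 0.0
    for cap, cost in units:
        served = min(q, cap)
        spent += served * cost
        q -= served
        if q <= 0:
            break
    return spent

# the total cost is piecewise linear and convex; its slope is the marginal cost
assert total_cost(40.0) == 400.0      # 40 * 10
assert total_cost(55.0) == 775.0      # 400 + 15 * 25
assert marginal_cost(55.0) == 25.0
```

Integrating the steps reproduces exactly the piecewise-linear total cost function whose facets the Benders cuts recover one by one.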
[The figure plots the Benders cuts f(Capital, Workforce, …) ≥ Q(Productionk), k=1,2,3,4, as the hyperplanes whose envelope approximates the Cobb-Douglas production function f(Capital, Workforce, …) over Production.]
Figure 9. Economic Interpretation of Benders Theory – Cobb-Douglas Approximate Production Function
Then, in BT the subproblem of the second level is associated with a linear production system, and the set of Benders cuts partially represents the total cost function; a full representation would require knowing all the Benders cutting planes. In this case, the main activity of the decision makers is to find the part of the total cost function which allows making the best decision, so the decision-maker has a “smart oracle” (the subproblem) that answers its questions about marginal costs.
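This oracle view can be sketched in a few lines; the cost function Q below is a hypothetical convex stand-in for the subproblem, and the cuts collected at the queried points form the partial lower representation described above:

```python
# The "smart oracle" view: the decision maker only learns Q(y) through
# cuts (value + marginal cost) at the points it asks about; the max of
# the collected cuts is a partial, lower approximation of the total
# cost function. Q below is a stand-in convex cost, purely illustrative.
def oracle(y):
    q = (y - 3.0) ** 2 + 1.0         # total cost at y
    slope = 2.0 * (y - 3.0)          # marginal cost (subgradient) at y
    return q, slope

cuts = []                            # each cut: Q(y) >= q_k + slope_k * (y - y_k)
for y_k in (0.0, 5.0, 2.0):
    q_k, g_k = oracle(y_k)
    cuts.append((q_k, g_k, y_k))

def lower_model(y):
    """Outer approximation of Q built only from the cuts collected so far."""
    return max(q + g * (y - yk) for q, g, yk in cuts)

assert lower_model(3.0) <= oracle(3.0)[0]   # cuts never overestimate Q
assert lower_model(2.0) == oracle(2.0)[0]   # exact where the oracle was queried
```

The coordinator works only with `lower_model`, which is exact at queried points and a lower bound everywhere else, which is precisely why only the relevant part of the cost function needs to be discovered.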
2.3. Markets
An issue of special importance in markets, for example the electricity market, is the analysis of the market clearing in which multiple agents interact; in this case the structure of the system could be conceptualized as presented in the following diagram.
The mathematical problem that must be solved can be formulated as
P: = { Min z = ∑a=1,A caTxa + dTw |
D w + ∑a=1,A Ba ya = bDEM
Aa xa + Fa ya = ba a=1,A
xa∈R+ a=1,A ; ya∈R+ a=1,A ; w∈R+ } (42)
where the vector xa represents the agent decisions, ya the purchasing decisions of the Independent System Operator (ISO) and w the unattended demand; D, Ba, Aa and Fa are matrices that represent the topology and the technology of the agents, bDEM represents the demand of the market and ba the resources of each agent; ca represents the cost of agent a and d the deficit cost. This problem may be solved using Benders’ decomposition.
From the economic point of view, the set of hyperplanes that limits the variable Qa in the coordinator defines the supply function of the agent. Then, BT can be interpreted as a conversation between the ISO and each agent to get the information that will allow the ISO to build the supply function of each agent; if the ISO knew the supply functions in advance, it would not require this conversation and could determine the optimum strategy without resorting to an integral problem involving the explicit modeling of each agent; the ISO only requires asking the agents for the marginal cost, the offer price, for an amount ya. This corresponds to an auction process in which the ISO must obtain the lowest cost to clear the market. The following graph presents the decomposition scheme.
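The ISO/agent conversation can be sketched as a merit-order auction; the agents and their offer blocks below are hypothetical:

```python
# Sketch of the ISO/agent "conversation": each agent answers marginal-
# price queries for a quantity, which is exactly the step supply
# function that BT would recover through cuts. Agent data hypothetical.
agents = {
    "a1": [(20.0, 12.0), (20.0, 30.0)],   # (block size, marginal price)
    "a2": [(25.0, 18.0), (25.0, 45.0)],
}

def clear_market(demand):
    """Merit-order auction: allocate demand to the cheapest blocks first."""
    blocks = sorted(
        (price, size, name)
        for name, offer in agents.items()
        for size, price in offer
    )
    allocation, cost = {name: 0.0 for name in agents}, 0.0
    for price, size, name in blocks:
        take = min(demand, size)
        allocation[name] += take
        cost += take * price
        demand -= take
        if demand <= 0:
            break
    return allocation, cost

alloc, cost = clear_market(50.0)
assert alloc == {"a1": 25.0, "a2": 25.0}   # cheap blocks first, then a1's 30-priced block
assert cost == 20.0 * 12.0 + 25.0 * 18.0 + 5.0 * 30.0
```

The sorted block list is the aggregated step supply function; in BT the ISO would discover it incrementally, one cut per query, instead of knowing it in advance.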
2.4. Multisectoral Planning
In the Chapter “The Future: Mathematical Programming 4.0” (Velásquez, 2019c) some ideas are presented about the importance of multisectoral planning using the multilevel Benders Partition.
3. Generalizations & Extensions
This section introduces extensions of the Benders Theory aimed at extending its use to non-linear subproblems.
3.1. Generalized Benders Decomposition
3.1.1. Basic Theory
Geoffrion (1972) generalized the Benders approach (GBD, Generalized Benders Decomposition) to a broader class of programs in which the subproblem need no longer be a linear program. Nonlinear convex duality theory was employed to derive the equivalent master problem. In GBD the algorithm alternates between the solution of relaxed master problems and convex nonlinear subproblems. GBD considers an optimization problem P: composed of two types of variables: y, corresponding to the coordination variables, and x, the coordinated variables.
P: = { max f(x, y) | G(x, y) ≤ 0 ; x∈X ; y∈Y } (43)
To guarantee convergence, GBD restricts the model in x to a convex problem for any value of y∈Y; this space may be continuous or discrete. The problem P: can be split into two subproblems, one over y and another over x. If we define v(y) as the optimal value of the objective function of the problem on x for a given value of y:
v(y) = { maxx f(x, y) | G(x, y) ≤ 0 ; x∈X } (44)
Then, it is possible to formulate an equivalent problem CY:
CY: = { max v(y) |
y∈Y ; y∈Ψ ;
v(y) = { maxx f(x, y) | G(x, y) ≤ 0 ; x∈X } } (45)
where Ψ corresponds to the set of y for which there is a feasible solution x to P:, this is
Ψ = { y | ∃ x∈X : G(x, y) ≤ 0 } (46)
The subproblem SP(y): to evaluate v(y) is
SP(y): = { maxx f(x, y) | G(x, y) ≤ 0 ; x∈X } (47)
Consider the Lagrangean function of SP(y):
L*(x, λ | y) = f(x, y) - λT G(x, y) (48)
where λ corresponds to the vector of Lagrange multipliers, which in accordance with the Karush-Kuhn-Tucker conditions (KKT, Karush 1939, Kuhn and Tucker 1951) must be nonnegative and satisfies
∇y v(y) = ∇y L*(x, λ | y) (49)
At the optimal point of SP(y): the Lagrangean function is a maximum with respect to x. Additionally, v(y) is equal to
v(y) = L*(x, λ | y) = f(x, y) - λT G(x, y) (50)
All points (xk, λk), optimum-feasible in SP(yk):, obtained for any value yk, must satisfy
v(y) ≤ L*(xk, λk | y) = f(xk, y) - (λk)T G(xk, y) (51)
Therefore, the problem CY: can be written as
CY: = { max v(y) |
y∈Y ;
v(y) ≤ L*(xk, λk | y) = f(xk, y) - (λk)T G(xk, y) ∀k∈IF } (52)
where IF represents the set of optimum-feasible points that are known as a result of solving SP(y):.
SP(yk): can have three possible outcomes: i) unbounded, ii) feasible and optimal, and iii) infeasible. In the event of an unbounded solution of SP(yk): it can be concluded that P: is also unbounded. If SP(yk): has a feasible and optimal solution, it provides the information to generate an optimality cut in the feasible zone of y; this cut has the form
v(y) ≤ f(xk, y) - (λk)T G(xk, y) (53)
If SP(yk): has no feasible solution, a cut should be included to account for the relationship between the feasibility area of y and the feasibility area of x. The infeasibility implies that it is not possible to satisfy
G(x, yk) ≤ 0 (54)
for all the vectorial functions that define the constraints. This means that, for at least one restriction,
gi(x, yk) > 0 ∀x∈X (55)
The feasibility condition that should be imposed on y is expressed as
L̄*(μk | y) = minx∈X (μk)T G(x, y) ≤ 0
μk ≥ 0 ; ∑i μik = 1 (56)
The GBD can be applied to an integrated subproblem or to a subproblem with several problems of the same hierarchy.
The rules for the decomposition are like the principles considered in standard Benders decomposition (figure 10).
[Static case: the coordinator MaxY q(y) s.t. y∈Y, with optimality cuts q(y) ≤ f(xk,y) - (λk)T G(xk,y), k∈IF, and feasibility cuts (μk)T G(xk,y) ≤ 0, μk ≥ 0, ∑i μik = 1, k∈INF, exchanges its primal variables yk downwards and receives primal and dual variables (xk, λk) from the subproblem Maxx f(x,yk) s.t. G(x,yk) ≤ 0, x∈X. Dynamic case: the same scheme with several subproblems Maxx fi(xi,yk) s.t. Gi(x,yk) ≤ 0, xi∈X, of the same hierarchy, i=1,…,m, the coordinator being MaxY ∑i qi(y) s.t. y∈Y; each problem may be of any type (LP, MIP, NLP or MINLP).]
Figure 10. Generalized Benders Decomposition
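A minimal sketch of the GBD loop on a toy convex problem, small enough that the subproblem is solvable in closed form and the master by grid search (the problem and its data are illustrative assumptions, not from the chapter):

```python
# A minimal Generalized Benders loop on the toy convex problem
#   max -x^2 - y^2  s.t.  1 - x - y <= 0,  y in [0, 1],
# whose optimum is x = y = 0.5 with value -0.5. The subproblem in x is
# solved in closed form; the master maximizes the cut envelope on a grid.
# This is a sketch of the GBD mechanics, not a general implementation.
def subproblem(y):
    x = max(0.0, 1.0 - y)            # argmax of -x^2 subject to x >= 1 - y
    lam = 2.0 * x                    # KKT multiplier of 1 - x - y <= 0
    value = -x * x - y * y
    return x, lam, value

cuts = []                            # v(y) <= -x_k^2 - y^2 - lam_k * (1 - x_k - y)
grid = [i / 500.0 for i in range(501)]
y = 0.0
for _ in range(10):                  # GBD iterations
    x_k, lam_k, v_k = subproblem(y)
    cuts.append((x_k, lam_k))

    def envelope(yy):                # master objective: minimum over all cuts
        return min(-x * x - yy * yy - l * (1.0 - x - yy) for x, l in cuts)

    y = max(grid, key=envelope)      # master: maximize the cut envelope

x_star, _, v_star = subproblem(y)
assert abs(y - 0.5) < 1e-2 and abs(v_star + 0.5) < 1e-2
```

Each optimality cut is (53) evaluated at the last subproblem solution; after three iterations the envelope already pins the master at y = 0.5.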
To show the efficiency of GBD, Cai et al. (2001) applied it to solve Nonconvex Nonlinear Programming (NLP) problems that arise frequently in water resources management (e.g., reservoir operations, groundwater remediation, and integrated water quantity and quality management). Such problems are usually large and sparse. The software existing in 2001 for global optimization could not cope with problems of this size, while the local sparse GAMS NLP solvers of the time, MINOS and CONOPT, could not guarantee a global solution. Cai et al. implemented GBD using a cut approximation proposed by Floudas et al. (1989) and Floudas (1995); they introduce slack variables, penalizing these slacks in the objective function. If the complicating variables are carefully selected, GBD leads to solutions with excellent objective values in run times much lower than GAMS using a non-linear solver. They concluded that GBD can be used to search for at least approximate global solutions to models with nonlinear and nonconvex constraints. The following table shows the comparison; the mean speed-up of GBD is 23.0 with respect to MINOS and 5.2 with respect to CONOPT (table 2).
Table 2. Speed-up Generalized Benders Decomposition (Cai et al., 2001)

Case        GBD (secs)   MINOS-5 (secs)   Ratio   CONOPT-2 (secs)   Ratio
Case 4-1        20.5           70.6         3.4         25.9          1.3
Case 4-2        18.6          739.7        39.8        136.6          7.3
Case 4-3        23.9          536.6        22.5        202.7          8.5
Case 4-4        19.8          523.9        26.5         74.8          3.8
MEAN            20.7          467.7        23.0        110.0          5.2
3.2. Benders Integer Linear Subproblems
One of the methods used to enable BT to handle integer subproblems is the incorporation of Gomory Cutting Planes (GCP).
The method of Gomory (1958, 1960) to generate cutting planes is a procedure for obtaining integer solutions using a modified continuous linear algorithm; it works by solving, initially, a continuous linear problem and then checking the solution found: if it is integer, it is the optimal integer solution; if it is not, a new restriction (GCP) that cuts off the continuous solution obtained is added, but the GCP doesn't cut any integer point of the original feasible region. This is repeated until an integer solution is obtained. The GCPs redefine the continuous feasible zone as an "integer convex hull" containing all the integer solutions, whose extreme points, at the optimum, correspond to an integer solution.
Figure 11. Gomory Cutting Planes – Integer Convex Hull (source: https://www.semanticscholar.org/)
In the case of subproblems with integer/binary variables, we can use Gomory's concepts to build a linear continuous sub-problem equivalent to the MIP sub-problem. In the solution of the sub-problem, the first step is to solve the linear problem relaxing the integer/binary character of the variables; if the solution is integer, the control returns to the coordinator problem; if it is not, GCPs are introduced; when the integer solution is obtained, a Benders cutting plane is generated in the coordinator problem. The values of the sub-problem dual variables are valid because they were generated by an equivalent continuous problem, and the generated Benders cut is valid. Figure 12 presents a flowchart of the implementation.
[The flowchart shows the coordinator Miny f(y) + Q(y) s.t. F0(y) = b0, y∈S, with cuts Q(y) ≥ πk(b - F(y)), k=1,ITERATIONS, sending its primal variables yk to the subproblem Min cTx s.t. Ax = b - F(y), x∈R, augmented with the Gomory cuts Gk x + Hk w = dGk. If the subproblem solution xk is not integer, new Gomory cuts Gk+1 x = dGk+1 are generated and the subproblem is re-solved; when xk is integer, the dual variables return to the coordinator to build the Benders cut. Each block may be LP, MIP, NLP or MINLP.]
Figure 12. Benders-Gomory Cuts
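The rounding idea behind the GCP can be sketched with a Chvátal-Gomory cut, the simplest member of the family (a full Gomory fractional cut from the simplex tableau is more involved; the inequality and multiplier below are a hypothetical illustration, not from the chapter):

```python
import math

# A Chvatal-Gomory rounding cut: any nonnegative combination u of the
# rows of A x <= b stays valid for integer x >= 0 after rounding down
# both the coefficients and the right-hand side. Illustrative numbers.
def cg_cut(rows, rhs, u):
    """Return (coeffs, rhs) of the rounded cut floor(uA) x <= floor(ub)."""
    n = len(rows[0])
    comb = [sum(u[i] * rows[i][j] for i in range(len(rows))) for j in range(n)]
    comb_rhs = sum(u[i] * rhs[i] for i in range(len(rows)))
    return [math.floor(c) for c in comb], math.floor(comb_rhs)

# max x1 + x2  s.t.  2 x1 + 2 x2 <= 3,  x integer.
# The LP relaxation allows x1 + x2 = 1.5; the CG cut with u = (0.5,)
# yields x1 + x2 <= 1, cutting the fractional optimum but no integer point.
coeffs, b = cg_cut([[2, 2]], [3], [0.5])
assert (coeffs, b) == ([1, 1], 1)
```

Repeating this tightening until the relaxed subproblem solves integer is what makes the dual variables returned to the coordinator valid.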
There are several algorithms to solve MIP subproblems; Ralphs and Hassanzadeh (2014) (table 3) present a review of the main approaches made until 2014 on the solution of two-stage stochastic mixed linear programming (2SLP:) models that have mixed and/or binary variables, including the case of a pure integer second-stage problem.
2SLP: = { Min z = fTy + ∑i=1,N qi ciTxi |
Ai xi = bi - Fi y , i=1,N
xi∈R+ , i=1,N ; y∈S } (57)
Table 3. Ralphs and Hassanzadeh – Report 2014
(columns: 1st stage Real/Integer/Binary; 2nd stage Real/Integer/Binary; Stochasticity in Matrix Ai, Matrix Fi, RHS bi, Cost ci)

Paper
Laporte and Louveaux (1993)
Carøe and Tind (1997)
Carøe and Tind (1998)
Carøe and Schultz (1998)
Schultz et al. (1998)
Sherali and Fraticelli (2002)
Ahmed et al. (2004)
Sen and Higle (2005)
Sen and Sherali (2006)
Sherali and Zhu (2006)
Kong et al. (2006)
Sherali and Smith (2009)
Yuan and Sen (2009)
Ntaimo (2010)
Gade et al. (2012)
Trapp et al. (2013)
Ralphs and Hassanzadeh (2014)
Gade et al. (2014) (table 4) present the speed-up provided by their proposed methodology compared with the CPLEX solver; the application problem was the Stochastic Server Location Problem (SSLP). It is evident that Benders-ISP is faster than CPLEX.
Table 4. Benders-Gomory Cuts for Two-Stage Stochastic Integer Subproblems

Problem            CPLEX Time (secs)   GAP (%)   Benders-ISP Time (secs)   GAP (%)   Time Ratio
SSLP-5-25-50              2.03            0               0.18                0         11.28
SSLP-5-25-100             1.72            0               0.22                0          7.82
SSLP-5-50-50              1.06            0               0.27                0          3.93
SSLP-5-50-100             3.56            0               0.48                0          7.42
SSLP-5-50-1000          212.64            0               2.88                0         73.83
SSLP-5-50-2000         1020.54            0               5.73                0        178.10
Mean                    206.93            0               1.63                0        127.21
SSLP-10-50-50           801.49            0.01          109.20                0.02       7.34
SSLP-10-50-100         3667.22            0.10          218.42                0.02    > 16.79
SSLP-10-50-500         3601.32            0.38          740.38                0.03     > 4.86
SSLP-10-50-1000        3610.06            3.56         1615.42                0.02     > 2.23
SSLP-10-50-2000        3601.55           18.59         2729.61                0.02     > 1.32
Mean                   3056.33            4.528        1082.61                0.022    > 2.82
3.3. Benders Dual Decomposition
Rahmaniani et al. (2018a) formulate the so-called Benders Dual Decomposition (BDD), in which Lagrangean duality is applied to a reformulation of the subproblem to price out the coupling constraints (z = y) that link the local copies z to the master variables y. This allows imposing the integrality requirements on the copied variables to obtain MIP subproblems, which are comparable to those defined in Lagrangean Dual Decomposition (LDD, Ruszczynski 1997, Rush and Collins 2012, Ahmed 2013).
3.3.1. Benders Dual Decomposition Theory
The sub-problem considered in BDD is
SPBDD(y): = { min Q(y) = cTx | Ax = b - F(z) ; x∈R+ ; F0(z) = b0 ; z = y ; z∈R } (58)
The optimality cut can be formulated as
Q(y) ≥ cTxk + (y - zk)T λk (59)
where λ represents the vector of dual variables of the coupling constraint (z = y) and defines a sub-gradient of the objective function. If the subproblem SPBDD(y): has no feasible solution, the feasibility cut can be written as
0 ≥ 1Tvk+ - 1Tvk- + (y - zk)T λk (60)
where 1 is a vector with all elements equal to one and vk+ and vk- are vectors of artificial variables of the following problem
FSPBDD(y): = { Minx,z,v 1Tv+ + 1Tv- | Ax + (v+ - v-) = b - F(z) ; F0(z) = b0 ;
z = y ; x∈R+ ; z∈R } (61)
The coordinator problem can be formulated as
CYBDD: = { min z = f(y) + Q(y) | F0(y) = b0 , y∈S
Q(y) ≥ cTxk + (y - zk)T λk ∀k∈IT
0 ≥ 1Tvk+ - 1Tvk- + (y - zk)T λk ∀k∈IN } (62)
It is important to note that the optimality and feasibility cuts include values of the subproblem primal variables, x and z, as in the case of GBD; it is then possible to call this type of cuts Generalized Benders Cuts (GBC), cuts that include dual and primal variables of the subproblems. This approach produces stronger cuts than standard Benders.
Because of obtaining these MIP subproblems, the BDD strategy efficiently mitigates the primal and dual inefficiencies of the BT method. Also, in contrast to the LDD method, BDD does not require an enumeration scheme (e.g., branch-and-bound) to close the duality gap.
Furthermore, the BDD strategy enables a faster convergence of the overall solution process. In summary, the main contributions of BDD are the following:
▪ Proposing a family of strengthened optimality and feasibility cuts that dominate the classical Benders cuts at
fractional points of the MP;
▪ Showing that the proposed feasibility and optimality cuts can give the convex hull representation of the MP at the
root node, i.e., no branching effort being required;
▪ Producing high quality incumbent values while extracting the optimality cuts; and
▪ Developing numerically efficient implementation methodologies for the proposed decomposition strategy and
presenting encouraging results on a wide range of hard combinatorial optimization problems.
3.3.2. Benders Dual Decomposition Implementation
Below, the cases analyzed by Rahmaniani et al. (2018) to test the convergence and the speed-up of BDD methodology
are presented. Three type of problems were tested:
▪ FMCND (Fixed-charge Multicommodity Capacitated Network Design)
▪ CFL-S (Stochastic Capacitated Facility Location)
▪ SNI (Stochastic Network Interdiction, Pan and Morton, 2008)
In all methods, cuts (both feasibility and optimality) are generated by solving each subproblem within an optimality
gap of 0.5%. Moreover, to generate the Lagrangean cuts for the FMCND and CFL instances, Partial Relaxed
Subproblems approach is applied (some of the integers variables are relaxed).
Rahmaniani et al. (2018a) implemented four variants of the strategy:
▪ BDD1: uses the strengthened Benders cuts by imposing the integrality requirements on all the copied variables
▪ BDD2: uses the strengthened Benders cuts by imposing the integrality requirements on a subset of the copied
variables
▪ BDD3: like BDD1 but also generates Lagrangean cuts
▪ BDD4: like BDD2 but also generates Lagrangean cuts.
Table 5 shows the relative speed-up of the four BDD variations compared with standard Benders (BT).
Table 5. Benders Dual Decomposition (Rahmaniani et al., 2018a)

            BT               LDD              BDD1             BDD2             BDD3             BDD4
Problem   Gap%  Time(s)   Gap%   Time(s)   Gap%  Time(s)   Gap%  Time(s)   Gap%  Time(s)   Gap%  Time(s)
FMCND    20.66  181.48    9.01  3129.79   16.36  574.21   16.62  577.1    6.00  2240.46   5.83  2065.15
CFL-S    18.61   60.37   10.3   3679.93   17.81  205.65   17.82  185.48   1.47  1877.97   1.23  2134.24
CFL      20.17    1.8     0.09     0.08   19.83    2.28   19.82    2.12   3.2    112.39   5.28   148.76
SNI      29.68  130.1    27.12  3832.34   29.67  176.9    29.67  156.62  20.7   1111.22  20.7   1134.86
Table 6 shows the relative speed-up of the two BDD variations compared with CPLEX; the GAP tolerance was 1%.
Table 6. Benders Dual Decomposition versus CPLEX (Rahmaniani et al., 2018a)

         CPLEX                          Ratio vs BBD3     BBD4                          Ratio vs BBD3     BBD3 (Reference)
Case     Time(s)   Gap(%)   #Sol.      Time    GAP       Time(s)  Gap(%)   #Sol.      Time    GAP       Time(s)  Gap(%)   #Sol.
FMCND   11142.26    3.8     26/35      1.12    1.96      7992.16  1.66     30/35      0.80    0.86      9976.44  1.94     30/35
CFL-S    1261.71    0.07    16/16      3.34   28.00       210.88  0.0025   16/16      0.56    1.00       377.27  0.0025   16/16
SNI     36156.89   23.79     0/70      5.00   22.23      8356.91  1.15     57/70      1.16    1.07      7226.1   1.07     52/70
3.4. Logic Based Benders Decomposition
Logic-Based Benders Decomposition (LBBD) was introduced by Hooker and Yan (1995) in the context of logic circuit verification. The idea was formally developed by Hooker (2000) and applied to 0-1 programming by Hooker and Ottosson (2003). In LBBD, the Benders cuts are obtained by solving the inference dual of the subproblem, of which the linear programming dual is a special case.
Fortunately, the idea of Benders Decomposition can be extended to an LBBD form that accommodates an arbitrary subproblem, such as a discrete scheduling problem. Unlike classical Benders, LBBD provides no standard scheme for generating Benders cuts. Cuts must be devised for each problem class, including the dependence on the objective functions. In the Chapter “Logic-based Benders Decomposition for Large-scale Optimization”, LBBD is presented directly by Professor Hooker (2019).
4. Dynamic and Stochastic Benders’ Theory
For this topic, the reader is invited to review the chapters:
▪ Stochastic Programming and Risk Management: Fundamentals (Velásquez, 2019a).
▪ Stochastic & Dynamic Benders Theory (Velásquez, 2019b).
5. Coordinator Enhancements
Mixed Integer Linear and Non-Linear Programs (MIP/MINLP) involving logical implications modelled through integer or binary variables and big-M coefficients are among the hardest to solve. This section covers improvements that can be applied to accelerate the solution of Benders coordinators that include discrete variables.
5.1. MIP/MINLP Coordinators
A MIP/MINLP coordinator CYBT: can be formulated as
CYBT: = { min z = f(y) + Q(y) | F0(y) = b0 ; y∈INTEGERS
Q(y) ≥ (πk)T[b - F(y)] ∀k∈IT
0 ≥ (μk)T[b - F(y)] ∀k∈IN } (63)
The solution of a MIP/MINLP problem can be divided into three stages: i) search for feasibility, ii) search for optimality, and iii) proof that the feasible solution is optimal.
[The figure plots the lower bound LB (dual) and the upper bound UB (primal) over time for two parameter settings (no cuts, and optimality emphasis), marking the three stages: 1) searching feasibility, 2) searching optimality, 3) probing optimality.]
Figure 13. Convergence of MIP/MINLP problems - Vehicle Routing Problem
The behavior of the algorithm used to solve a MIP/MINLP depends on the specific problem, it being "impossible" to characterize a general behavior. Figure 13 presents two possible behaviors of the solving process for the VRP (Vehicle Routing Problem) using two different sets of parameters with the CPLEX solver; it is easily seen that the parameters used in the solver, and not only the type of problem, affect the MIP-GAP. In several cases, the main problem of the MIP/MINLP coordinators is related to the large times required to prove the optimality of a feasible solution, which arises from the amount of time spent in each stage. To improve the behavior of the coordinator, which tends to be similar for families of problems, the modeler must know this behavior in order to apply the most appropriate improvements, considering that it is not possible to have a dominant approach.
The improvement of the coordinator can be considered from three points of view:
▪ Temporary relaxation of the discrete character of the coordinator problem
▪ Modify the standard cuts to improve the re-optimization when inserting a cut
▪ Stop the optimization process when the solver has a feasible solution and the gap is small but greater than zero
These enhancements can be used individually or collectively; the decision is based on the empirical experience of the modeler and not on formal mathematical proofs.
5.1.1. Multi-Phase Coordinator
Considering that the Benders cuts generated for a relaxed MIP/MINLP coordinator are valid for the MIP/MINLP coordinator, this improvement is based on dividing the optimization process into, at least, two phases (figure 14):
▪ Phase 1 relaxes the integer character of the coordinator to quickly derive valid cuts; during this phase the LP relaxation of the MP is solved with the classical Benders cuts. The oldest reference the author knows about this strategy is presented by McDaniel and Devine (1977); it has become one of the main methods used to efficiently apply the Benders algorithm on numerous MIP/MINLP applications, see Rahmaniani et al. (2017).
▪ Phase 2 solves integer problems without relaxation to strengthen optimality.
This is motivated by the fact that, at the initial iterations of BT, the master solutions are usually of very low quality. At this point, the derived cuts provide a poor approximation of the optimal objective function; the idea is that the relaxed model is solved in a "short time" and can generate a "large" number of cuts which serve to give "contour" to the subproblem objective function represented by the Benders cuts. The process is convergent, but the moment at which to pass from the 1st phase to the 2nd relies on empirical knowledge; a valid approximation may be the size of the gap in the Benders cycle.
However, a problem that should be kept in mind is the possibility of infeasibilities, or of irrationalities, in the subproblems, since the primal solutions provided by the relaxed coordinator are not integer; when this occurs, an alternative is to relax the integrality requirements only on a subset of the integer variables.
[First iterations: the coordinator Miny f(y) + Q(y) s.t. F0(y) = b0, y∈INTEGERS, with cuts Q(y) ≥ πk(b - F(y)), is solved as a relaxed LP while the subproblem Min cTx s.t. Ax = b - F(y), x∈R+, returns dual variables to build cuts. Final iterations: the same coordinator is solved as a MIP/MINLP with all the accumulated cuts.]
Figure 14. Enhancements MIP/MINLP Benders Coordinators - Relaxing Coordinator
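The two-phase idea can be sketched on a toy problem; the subproblem below is solved in closed form and both masters by enumeration, so this shows only the mechanics of reusing phase-1 cuts, not the McDaniel-Devine algorithm itself (all data are illustrative assumptions):

```python
# Two-phase Benders on the toy problem
#   min y + Q(y),  Q(y) = min { 3x : x >= 7 - y, x >= 0 },
# whose optimum is y = 7 with total cost 7. Phase 1 runs the master over
# a continuous relaxation of y; its cuts stay valid when phase 2
# switches to integer y, so phase 2 starts with a warm cut pool.
def subproblem(y):
    x = max(0.0, 7.0 - y)
    pi = 3.0 if x > 0 else 0.0        # dual of the constraint x >= 7 - y
    return 3.0 * x, pi

def solve_master(candidates, cuts):
    def total(y):
        eta = max([p * (7.0 - y) for p in cuts] + [0.0])  # cut envelope
        return y + eta
    best = min(candidates, key=total)
    return best, total(best)

cuts = []
relaxed = [i / 100.0 for i in range(1001)]    # phase 1: continuous y in [0, 10]
integer = [float(i) for i in range(11)]       # phase 2: integer y

for phase in (relaxed, integer):
    for _ in range(10):
        y, lower = solve_master(phase, cuts)
        q, pi = subproblem(y)
        if y + q <= lower + 1e-9:             # upper bound meets lower bound
            break
        cuts.append(pi)

assert y == 7.0 and y + subproblem(y)[0] == 7.0
```

Here phase 1 already produces the cut that makes the integer master converge in its first iteration, which is exactly the intended effect of the relaxation phase.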
Rahmaniani et al. (2018a) propose a three-phase strategy to generate the Benders cuts. The proposed multi-phase implementation works as follows.
▪ Phase 3 generates Lagrangean cuts. To do so, the Lagrangean dual problem SPBDD(y) is solved, defining a trust region (stabilization) for the Lagrangean multipliers. Details of this methodology are in Rahmaniani et al. (2018a).
5.1.2. Modified Optimality Cuts
To accelerate the optimization process, mainly for MIP/MINLP problems, the standard Benders cuts can be reformulated as
Q(y) + QAk ≥ (πk)T[b - F(y)]
QBk ≥ (μk)T[b - F(y)]
QAk ≥ 0 ; QBk ≥ 0 (64)
where QAk and QBk represent artificial variables that ensure that the solution yk remains feasible for the cut included in iteration k+1 (QAk+1 > 0 or QBk+1 > 0), so it can be used as the starting point of the search in iteration k+1; then penalizations must be included in the objective function of the coordinator problem, this is
CYBT: = { min z = f(y) + Q(y) + ∑k∈IT ∞·QAk + ∑k∈IN ∞·QBk | F0(y) = b0 ; y∈S
Q(y) + QAk ≥ (πk)T[b - F(y)] ∀k∈IT
QBk ≥ (μk)T[b - F(y)] ∀k∈IN } (65)
The objective of this enhancement is to reduce the time needed to reach feasibility in the coordinator problem.
5.1.3. Inexact Solutions
Inexact solutions of the master problem are related to the MIP-GAP in the coordinator; often the GAP is rapidly reduced, but the solver consumes much time to prove that the solution obtained is optimal or to get a "slightly better" solution that may be optimal. In this case BT can work with a dynamic tolerance, qk, that changes as the optimization process advances; this can be expressed as:
[ { f(yk) + Q(yk) } - { f(y*) + Q(y*) } ] / [ f(y*) + Q(y*) ] ≤ qk (63)
where yk is in the feasible qk zone near the optimal point y*; the series of values qk must be positive and tend to zero as the iterations (k) increase.
q1 ≥ q2 ≥ q3 ≥ … ≥ qk ≥ … ≥ q∞ = 0
The main idea is to quickly generate good tentative master solutions that can be used to obtain “good” Benders cuts in the subproblems, based on two main guidelines: i) cuts should be generated with a reasonable computational effort, and ii) cuts should be similar to those that would be obtained with an exact solution of the master problem.
Costa et al. (2012) developed a Benders approach based on “inexact solutions”, which they called Benders with extra-cuts, and applied it to the Fixed-charge Network Design (FND) problem; a total of 54 instances were used in the experiments. Table 7 shows the summary of the speed-up generated by the inexact solutions (BT-IS).
Table 7. Speed-up Inexact Solutions (Costa et al., 2012)

Solved (BT / BT-IS)   BT Time (secs)    BT-IS Time (secs)   Ratio BT/BT-IS (times)
Yes / Yes                  6081.02            1617.19              3.76
No / No                   21600               7910.26              2.73

Solved (BT / BT-IS)   BT GAP (%)        BT-IS GAP (%)       Ratio BT/BT-IS
No / No                      49.45              27.59              1.79
The results are presented in three groups according to whether the problem was solved. Figure 15 presents the relationship between the solution times, which increases linearly as a function of the complexity.
[The figure plots the CPLEX solution time (secs) against the BT-IS solution time (secs); the fitted regression is y = 4.9651x - 44.285 with R² = 0.9538.]
Figure 15. Relation CPLEX versus Inexact Solutions
5.1.4. Inexact Cuts
Early termination of the subproblems generated during the BT iterations produces valid cuts (if we are working with a dual feasible algorithm, which preserves dual feasibility in all iterations) which are inexact in the sense that they are not as constraining as the cuts derived from an exact solution. This approach is equivalent to relaxing the primal feasible zone of the sub-problem by a factor εk; the following formulation shows the differences.
SPBT(y): = { min Q(y) = cTx | Ax - [ b - F(y) ] = 0 ; x∈R+ } (64)
SPBT-IC(y): = { min Q(y) = cTx | -εk ≤ Ax - [ b - F(y) ] ≤ εk ; x∈R+ } (65)
where εk is the subproblem feasibility tolerance, which must be positive and tend to zero as the iterations (k) increase:
ε1 ≥ ε2 ≥ ε3 ≥ … ≥ εk ≥ … ≥ 0 (66)
Philpott et al. (1996) present an algorithm and its convergence conditions; Zakeri et al. (1999) use a primal-dual interior point algorithm (baropts) to make experiments with a model for the stochastic planning of the hydro-electric power generation systems of New Zealand. Table 8 shows the results; the gain of BT-IC is evident.
Table 8. Speed-up Inexact Cuts (Zakeri et al., 1999)

Problem   BT (secs)   BT-IC (secs)   Ratio BT/BT-IC (times)   Improvement (BT - BT-IC)/BT (%)
P1            170           68              2.50                      60.00
P2            261          159              1.64                      39.08
P3            124          109              1.14                      12.10
P4            640          398              1.61                      37.81
P5            594          546              1.09                       8.08
P6            626          585              1.07                       6.55
P7            324          150              2.16                      53.70
P8            376          304              1.24                      19.15
P9           1207         1087              1.11                       9.94
P10           979          780              1.26                      20.33
P11           150          134              1.12                      10.67
Total        5451         4320              1.26                      20.75
5.1.5. Combinatorial Benders Cuts
Codato and Fischetti (2006) proposed a generic problem reformulation, of quite general applicability, aimed at removing the model dependency on the big-M coefficients used in standard MIP formulations.
The master solutions are sent to a slave linear problem, which validates them and possibly returns combinatorial inequalities to be added to the coordinator. The inequalities are associated with minimal (or irreducible) infeasible subsystems of a certain linear system and can be separated efficiently when the coordinator solution is integer. The overall solution mechanism resembles Benders Partitioning, but the cuts produced are purely combinatorial. This produces an LP relaxation of the coordinator problem which can be considerably tighter than the one associated with the big-M formulation.
For ease of explanation, consider initially the following problem P:, which does not include the subproblem variables in the objective function:
P: = { min z = f(y) | F0(y) = b0 ; Ax + F(y) = b ; x ∈ R+ ; y ∈ {0,1} } (67)
In this case the subproblem is formulated as
SPBT(y): = { min Q(y) = 0Tx | Ax = b − F(y) ; x ∈ R+ } (68)
Because the subproblem SPBT(y): has no meaningful objective function, any feasible solution solves it. If the subproblem has no feasible solution, the combination of coordinator binary variables yk is infeasible, and a feasibility cut that eliminates that combination must be included in the coordinator; if the subproblem has a feasible solution, then the combination yk is the optimal solution of problem P:. The Benders feasibility cut may be formulated as
Σi∈CIT(k) yi ≤ | CIT(k) | − 1 (69)
where the index i represents the i-th binary variable and CIT(k) the set of binary variables equal to 1 in cycle k of the algorithm; this cut forces at least one of the variables equal to 1 in yk to become zero. The CBC thus substitutes the original Benders feasibility cut. In case the P: objective function includes the vector y, the optimality Benders cut is equal to the standard Benders optimality cut. The formulation is:
CYBT: = { min z = f(y) + Q(y) | F0(y) = b0 ; y ∈ S
Q(y) ≥ (πk)T[b − F(y)] ∀k ∈ IT
Σi∈CIT(k) yi ≤ | CIT(k) | − 1 ∀k ∈ IN } (70)
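A minimal sketch of generating the combinatorial feasibility cut (69) from a coordinator binary solution; the function names and the tuple-based cut representation are illustrative:

```python
# Given the coordinator 0/1 solution y_k, CIT(k) collects the indices fixed
# to 1, and the cut  sum_{i in CIT(k)} y_i <= |CIT(k)| - 1  forbids setting
# all those variables to 1 again.

def combinatorial_cut(y_k):
    """Build the cut (support, rhs) that eliminates the 0/1 vector y_k."""
    support = [i for i, v in enumerate(y_k) if v == 1]   # CIT(k)
    return support, len(support) - 1                     # rhs = |CIT(k)| - 1

def satisfies(cut, y):
    support, rhs = cut
    return sum(y[i] for i in support) <= rhs

cut = combinatorial_cut([1, 0, 1, 1, 0])
assert not satisfies(cut, [1, 0, 1, 1, 0])   # the infeasible combination is cut off
assert satisfies(cut, [1, 0, 0, 1, 0])       # flipping one variable restores feasibility
```

Note that (69) is purely combinatorial: no subproblem coefficients or big-M constants enter the cut, which is precisely the point of the reformulation.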
Codato and Fischetti compared the CBC with CPLEX v8.1 in terms of execution times and integrality gaps, computed with respect to the best integer solution found, for three cases:
1. Instances solved to proven optimality by CBC but not by CPLEX (table 9)
2. Instances solved to proven optimality by both CBC and CPLEX (table 10)
3. Instances not solved (to proven optimality) by either CBC or CPLEX
Notice that no instance was solved by CPLEX but not by CBC.
Table 9. NP-hard Problems Solved only by CBC. (Codato and Fischetti, 2006)

Problem               CPLEX GAP (%)
Statistical Classification
Chorales-134          51
Chorales-107          57
Breast-Cancer-600     99
Bridges-132           85
Mech-analysis-152     45
Monks-tr-124          70
Monks-tr-115          69
Solar-flare-323       90
BV-OS-376             65
BusVan-445            77
MEAN                  70.8
Map Labelling
CMS-600-0             1.35
CMS-650-0             1.88
CMS-650-1             0.46
CMS-700-1             2.04
CMS-750-1             1.63
CMS-750-4             1.9
CMS-800-0             3.49
CMS-800-1             2.04
Railway               8.42
CMS-600-0             10.5
CMS-600-1             6.19
MEAN                  3.63
Table 10. NP-hard Problems Solved by CPLEX & CBC. (Codato and Fischetti, 2006)

Problem               CPLEX      CBC       Time Ratio CPLEX/CBC
Statistical Classification
Chorales-116          1:24:52    10:18     8.2
Balloons76            0:10       0:14      71.4
BCW-367               8:33       0:13      39.4
BCW-683               2:02:29    0:32      229.7
WPBC-194              57:17      3:32      16.2
Breast-Cancer-400     2:50       0:16      1062
Glass-163             56:17      0:05      675.4
Horse-colic-151       4:50       0:23      12.6
Iris-150              9:29       1:10      8.1
Credit-300            19:35      0:02      587.5
Lymphography-142      0:11       0:01      11
Mech-analysis-107     0:05       0:01      5
Mech-analysis-137     7:44       0:27      17.2
Monks-tr-122          2:05       0:05      25
Pb-gr-txt-198         4:21       0:05      52.2
Pb-pict-txt-444       2:07       0:02      63.5
Pb-hl-pict-277        4:17       0:27      9.5
Postoperative-88      15:16      0:01      916
BV-OS-282             5:13       0:24      13
Opel-Saab-80          1:03       0:13      4.8
Bus-Van-437           9:17       0:28      19.9
HouseVotes84-435      4:59       0:11      27.2
Water-treat-206       1:10       0:06      11.7
Water-treat-213       17:00      0:51      20
MEAN                  18:23      00:50     21.93
Map Labelling
CMS-600-1             1:08:41    0:04:34   15
Computational results indicate that Combinatorial Benders Cuts (CBC) produce a reformulation which can be solved some orders of magnitude faster than the original MIP model using one of the best commercial solvers (CPLEX). Figure 16 shows two detailed cases.
Figure 16. Speed-up Combinatorial Benders Cuts (case: no cost for subproblem variables). [Panels: BRIDGES-132, CPLEX/CBC ≈ 9.16; CHORALES-116, CPLEX/CBC >> 45.]
5.2. Trust Region (Regularization)
Since BT iteratively solves a non-differentiable convex optimization problem, because the function Q(y) is piecewise linear, mathematical conditions related to the sub-gradients (or super-gradients) of Q(y) can be considered. From this point of view, regularization conditions on the step size of the algorithm should be imposed to avoid oscillations and obtain stronger convergence properties.
The first variation to consider is the proposal by Linderoth and Wright (2001), known as the "trust region"; it is a kind of regularizing technique adapted from regularized decomposition for continuous problems which helps to mitigate two kinds of difficulties in cutting plane methods:
▪ Growth in the number of cuts added to the master problem, and
▪ The fact that there is no easy way to use a good starting solution.
The existing literature shows that the solution oscillates wildly in early iterations. Thus, a trust region may be used to limit the early movements of the variables (continuous, integer and binary) around a previous point yk.
5.2.1. Neighborhood Bounding
This variation adds a hypercube bounding the maximum difference between the solution of the coordinator in stage k and the solution in the previous stage k−1, introducing the restrictions in the form of bounds
−Δ·1 ≤ y − yk-1 ≤ Δ·1 (71)
where 1 is a vector of unit components and Δ a vector of appropriate multipliers, which altogether determine the size of the "trust region".
The coordinator model is formulated as
CY(yk-1)TR: = { min f(y) + Q(y) |
F0(y) = b0
Q(y) ≥ (πk)T[b − F(y)] k = 1,…,ITE
0 ≥ (vk)T[b − F(y)] k ∈ ITN
−Δ·1 ≤ y − yk-1 ≤ Δ·1 } (72)
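A minimal sketch of building the hypercube bounds (71) around the previous iterate; the list-based interface and the default variable bounds are illustrative:

```python
# The next coordinator solution is restricted to [y_prev - delta, y_prev + delta],
# intersected with the variables' natural bounds [lo, hi].

def trust_region_bounds(y_prev, delta, lo=0.0, hi=float("inf")):
    """Bounds implementing -delta*1 <= y - y_prev <= delta*1, clipped to [lo, hi]."""
    lower = [max(y - delta, lo) for y in y_prev]
    upper = [min(y + delta, hi) for y in y_prev]
    return lower, upper

lower, upper = trust_region_bounds([2.0, 0.5, 4.0], delta=1.0)
# lower = [1.0, 0.0, 3.0]  (second component clipped at lo = 0.0)
# upper = [3.0, 1.5, 5.0]
```

Because the trust region enters only through variable bounds, it does not add rows to the coordinator, which is one reason this variant is considered relatively cheap.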
The definition of Δ is carefully analyzed by Linderoth and Wright (2001), who proposed a double loop: one in BT and another in the determination of Δ. There are many possibilities to determine the size of the "trust region", including adjusting Δ in each dimension depending on the behavior of the algorithm. Imposing the trust region in the form of bounds is relatively easy and can be implemented effectively.
5.2.2. Penalizations Movements
Another approach is called regularized decomposition (Ruszczynski, 1986), which introduces a quadratic term in the objective function to penalize the difference between yk and yk-1; in each cycle the coordinator objective function is
½ ρk (y − yk-1)T(y − yk-1) + f(y) + Q(y) (73)
where ρk is a positive penalty factor whose determination is part of the algorithm. In this case the coordinator corresponds to a quadratic problem
CY(yk)DR: = { min ½ ρk (y − yk-1)T(y − yk-1) + f(y) + Q(y) |
F0(y) = b0
Q(y) ≥ (πk)T[b − F(y)] , k = 1,…,ITE
0 ≥ (vk)T[b − F(y)] , k ∈ ITN } (74)
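An illustrative one-dimensional sketch of the proximal effect of (73), with Q(y) approximated by the pointwise maximum of accumulated Benders cuts; the cuts, the first-stage cost, ρ and the search grid are made-up values, and a real implementation solves the quadratic coordinator (74) with a QP solver:

```python
# 0.5*rho*(y - y_prev)^2 + f(y) + max_k (a_k + b_k*y): the quadratic term
# pulls the new iterate toward the previous one.

def regularized_objective(y, y_prev, rho, f, cuts):
    """Proximal master objective with a cutting-plane model of Q(y)."""
    q_hat = max(a + b * y for (a, b) in cuts)
    return 0.5 * rho * (y - y_prev) ** 2 + f(y) + q_hat

cuts = [(4.0, -1.0), (0.0, 1.0)]        # two optimality cuts Q(y) >= a + b*y
f = lambda y: 0.5 * y                    # first-stage cost
grid = [i / 100 for i in range(401)]     # candidate y in [0, 4]

free = min(grid, key=lambda y: regularized_objective(y, 0.0, 0.0, f, cuts))
prox = min(grid, key=lambda y: regularized_objective(y, 0.0, 5.0, f, cuts))
# With rho > 0 the iterate stays much closer to y_prev = 0 than the
# unregularized step, damping the early oscillations.
assert abs(prox) < abs(free)
```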
Zaourar and Malick (2015) report experiments that show the speed-up of BT as a consequence of implementing the regularization method; they used two sets of standard problems, related to the Hub Location Problem (table 11) and the Network Design Problem (table 12). The advantages of regularization are evident.
Table 11. Hub Location Problems (Zaourar and Malick, 2015)

Nodes   Transfer Cost   Standard   Stabilized   Ratio Standard/Stabilized
10      0.1             1.06       0.91         1.16
10      0.5             1.28       0.89         1.44
10      1               1.07       0.75         1.43
15      0.1             5.31       3.01         1.76
15      0.5             5.27       5.86         0.90
15      1               4.52       5.83         0.78
20      0.1             21.72      16.61        1.31
20      0.5             16.83      14.24        1.18
20      1               14.3       13.94        1.03
25      0.1             58.66      35.18        1.67
25      0.5             52.58      34.91        1.51
25      1               46.31      28.7         1.61
30      0.1             112.08     144.47       0.78
30      0.5             97.72      96.28        1.01
30      1               97.11      96.11        1.01
35      0.1             296.61     182.69       1.62
35      0.5             183.46     116.71       1.57
35      1               177.94     110.59       1.61
40      0.1             467.17     498.91       0.94
40      0.5             351.77     310.24       1.13
40      1               306.04     336.76       0.91
MEAN                    110.42     97.79        1.13
Table 12. Network Design Problems. (Zaourar and Malick, 2015)

Nodes   Commodities   Standard   Stabilized   Ratio Standard/Stabilized
5       5             0.27       0.31         0.87
5       10            0.38       0.07         5.43
5       15            0.58       0.12         4.83
5       20            0.69       0.08         8.63
8       5             1.24       0.65         1.91
8       10            42.13      53.43        0.79
8       15            72.49      60.6         1.20
10      5             7.09       3.95         1.79
10      10            555.79     252.69       2.20
10      15            20099.7    20289.8      0.99
12      5             37.58      12.8         2.94
12      10            34267.4    10661.6      3.21
15      5             677.5      53.54        12.65
20      5             10796.2    1481.89      7.29
MEAN                  4754.22    2347.97      2.02
5.2.3. Binary Variables
For problems with binary variables in the first stage, Santoso et al. (2005) and Oliveira et al. (2014) showed that the 2-norm or infinity-norm distance is not effective. Therefore, Yang et al. (2016) use the Hamming distance; the trust region is defined by the following equation
Σj∈IB1(k) (1 − yj) + Σj∈IB0(k) yj ≤ Δk (75)
where IB1(k) represents the set of binary variables equal to 1 in iteration k and IB0(k) the complementary set (binary variables equal to 0); Δk limits the number of variables that can change from iteration k to iteration k+1. The trust region cannot guarantee convergence (Keller and Bayraksan, 2009); therefore, the trust-region constraint must be dropped once the procedure has reached certain criteria.
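A minimal sketch of evaluating the Hamming trust-region constraint (75) for a candidate binary vector; names are illustrative:

```python
# The left-hand side of (75) is exactly the Hamming distance between the
# candidate y and the incumbent y_k: the number of binary variables that flip.

def hamming_lhs(y, y_k):
    """sum_{j in IB1(k)} (1 - y_j) + sum_{j in IB0(k)} y_j."""
    return sum(1 - yj if ykj == 1 else yj for yj, ykj in zip(y, y_k))

def in_trust_region(y, y_k, delta_k):
    return hamming_lhs(y, y_k) <= delta_k

incumbent = [1, 0, 1, 0]
assert in_trust_region([1, 0, 0, 0], incumbent, delta_k=1)      # one flip: allowed
assert not in_trust_region([0, 1, 0, 0], incumbent, delta_k=2)  # three flips: rejected
```

Unlike (71), this constraint is linear in y and therefore can be added directly to a MIP coordinator.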
Santoso et al. (2005) ran experiments with the Stochastic Supply Chain Network Design Problem and considered two problems: a domestic and a global supply chain; for the trust region only the domestic cases were used, whose dimensions are presented in table 13.
Table 13. Stochastic Supply Chain Network Design Problem, Domestic Cases. (Santoso et al., 2005)

Scenarios   Equality Constraints   Inequality Constraints   Continuous Variables   Binary Variables
1           3,498                  4,324                    20,912                 140
20          69,960                 86,480                   418,240                140
40          139,920                172,960                  836,480                140
60          209,880                259,440                  1,254,860              140
We selected combinations of the acceleration schemes used by Santoso et al. to solve instances with 20 scenarios, oriented to evaluate the marginal gain generated by the Hamming trust region. The IDs of the acceleration schemes are denoted as follows: LC (Logistics constraints); TR (Hamming Trust region); KI (Knapsack inequalities); and UH (Upper-bounding heuristic). All the results are coherent (table 14), except in the case of KI and TR, where the inclusion of TR implies less time (coherent) but a larger GAP (incoherent). This shows that in mathematical programming it is easy to find cases with small incoherencies.
Table 14. Speed Up Hamming Trust Region

Acceleration Scheme   1st GAP (%)   10th GAP (%)   Time (secs)   Iterations   Ratio Time   Marginal Ratio Time   Ratio 10th GAP   Marginal Ratio 10th GAP
Standard              100           60             > 4000        30           > 2.90       0                     6000             -2000
Standard + TR         100           40             > 4000        30           > 2.90                             4000
LC                    31            8              > 4000        30           > 2.90       0                     800              -730
LC + TR               31            0.70           > 4000        30           > 2.90                             70
LC + KI               31            0.10           3860          26           2.80         -0.19                 10               +10
LC + KI + TR          31            0.20           3600          23           2.61                               20
LC + KI + UH          31            0.01           1500          8            1.09         -0.09                 1                0
LC + KI + UH + TR     31            0.01           1380          7            1                                  1

(The marginal ratios compare each scheme against the same scheme plus TR.)
6. Cuts Enhancements
The traditional Benders decomposition might fail to achieve computational efficiency; within the context of generating more effective cuts, most researchers have sought to generate a set of "strong" cuts at each iteration, or to modify the way Benders cuts are generated.
6.1. Strong Cuts
6.1.1. Pareto Optimal (POP)
Considering the possible degeneration of the Benders primal subproblem, Magnanti and Wong (1981) proposed a seminal methodology to accelerate Benders convergence by strengthening the generated cuts. In a linear continuous problem, degeneration implies that the subproblem has multiple dual solutions; hence a subproblem may generate multiple optimality cuts, many with components equal to 0 (zero). The addition of such "empty" cuts during the iterations makes the MP harder to solve. Among these feasible cuts, one cut may dominate another; choosing the best one from these alternative cuts is beneficial in solving the MP by reducing the number of iterations.
Magnanti and Wong define a cut as Pareto-OPtimal (POP) if it is not dominated by any other cut that may be a solution of the degenerate subproblem; this cut is also called the "deepest" cut. To calculate this cut they use a core point yC, that is, a point in the interior of the convex hull of the feasible region of the original problem. Obtaining an exact core point is intractable; fortunately, many researchers have demonstrated that points which are close to core points can also generate strong cuts.
A core point yC must be a solution of the constraints that define the coordinator without Benders cuts:
F0(yC) = b0 ; yC ∈ S (76)
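Since exact core points are hard to compute, a common heuristic maintains an approximate core point as a running convex combination of coordinator solutions; a minimal sketch, where the mixing weight 0.5 is an illustrative choice (any λ in (0,1) keeps the point strictly interior when the starting point is interior):

```python
# Approximate core-point update: y_core <- lam*y_core + (1-lam)*y_k, componentwise.

def update_core_point(y_core, y_k, lam=0.5):
    """Convex combination of the current core point and the new coordinator solution."""
    return [lam * c + (1 - lam) * y for c, y in zip(y_core, y_k)]

y_core = [0.5, 0.5, 0.5]              # e.g. the center of the unit hypercube
y_core = update_core_point(y_core, [1, 0, 1])
# y_core is now [0.75, 0.25, 0.75], still strictly inside (0,1)^3
```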
A strong cut must be a solution of the following problem
DPOP(yk): = { maxπ πT(b − F(yC)) | πTA ≤ cT ;
πT(b − F(yk)) = Q(yk) = cTxk } (77)
where yk represents the primal values defined by the coordinator in stage k and xk the solution of the subproblem for yk. The following diagram presents the flow of the optimization process; alternatively, it is possible to solve the dual problem of DPOP(yk):
PPOP(yk): = { minx,w cTx + Q(yk)·w |
Ax + (b − F(yk))·w = b − F(yC) ; x ∈ R+ ; w ∈ R } (78)
[Flow diagram: the coordinator (LP/MIP/NLP/MINLP), min f(y) + Q(y) s.t. F0(y) = b0, y ∈ S, Q(y) ≥ (πk)T(b − F(y)) for k = 1,…,ITERATIONS, sends the primal variables yk to the Benders subproblem (LP), min cTx s.t. Ax = b − F(y), x ∈ R+, which returns the objective value cTx*; the strong-cut problem (LP), max πT(b − F(yC)) s.t. πTA ≤ cT, cTx* = πT(b − F(yk)), returns the strong dual variables πk used to build the strong cuts.]
Figure 17. Pareto-Optimal Benders Cuts
To obtain a POP cut, it may be necessary to solve twice the number of subproblems; there is a trade-off between the CPU time saved in solving the master problem and the CPU time consumed in obtaining the POP cuts.
Yang et al. (2016) study the speed-up generated by POP cuts using cases associated with the Process Flexibility Design (PFD) problem, which is related to a supply chain with m plants that can produce n types of products (table 15).
Table 15. Speed-up Pareto Optimal Cuts (Yang et al., 2016)

Case n/m   POP: GAP    POP: Time (secs)   POP: Iterations   BT: GAP      BT: Time (secs)   BT: Iterations   Ratio BT/POP (times)
4/4        5x10-4      26                 16.5              1.1x10-3     15                16.4             0.58
5/5        3x10-4      123                35.6              1.2x10-3     265               54.3             2.15
5/7        5x10-3      174                38.7              3.13x10-2    1769              88.5             10.17
6.1.2. Other Cuts
Based on the research presented by Magnanti & Wong, multiple works have been developed in the same direction; the following is a brief reference to several of the proposals made.
▪ Papadakos Cuts: Papadakos (2008) highlights that the dependency of the Magnanti–Wong cut-generation problem on the solution of the SP can sometimes decrease the algorithm's performance. To circumvent this difficulty, the author showed that one can obtain an independent formulation of the Magnanti–Wong cut-generation problem by dropping the constraint that implies the dependency on the solution of the subproblem. The author also provided guidelines for efficiently generating additional core points through convex combinations of previously known core points and feasible solutions of the MP.
▪ Maximal Non-Dominated Cuts (MND): More recently, Sherali and Lundy (2011) presented a different strategy for generating non-dominated cuts, using small perturbations on the right-hand side of the SP to generate maximal non-dominated Benders cuts. The authors also showed a strategy based on complementary slackness that simplifies cut generation when compared with the traditional strategy used by Magnanti and Wong.
▪ Dynamically Updated Near Maximal Cuts (DUNM): Oliveira et al. (2014) present the theory to formulate Dynamically Updated Near Maximal Cuts (DUNM) as an alternative way of dealing with the difficulty related to the correct definition of the weight μ.
Oliveira et al. (2014) show a comparison between the alternatives. The figure shows the time ratio when the DUNM cut is selected as reference. The conclusions are clear:
▪ The MIP solver (CPLEX) is faster for a low number of scenarios; this is due to the overhead implied by the use of more sophisticated methodologies.
▪ CPLEX time grows polynomially; hence, for a large number of scenarios the solution time becomes prohibitive.
▪ MND has better performance than POP.
▪ POP and MND have a "stable" performance, which implies that DUNM is approximately 2.5 times faster.
[Plot: ratio of solution time relative to DUNM versus number of scenarios (0–200) for POP, MND, DUNM and CPLEX; linear fits POP: y = 0.0021x + 2.4333 and MND: y = −0.0009x + 2.4888; polynomial fit CPLEX: y = 2x10-5 x² + 0.0165x − 0.0383.]
Figure 18. Speed-up Benders Cuts: POP, MND, DUNM, CPLEX
6.1.3. Hybrid Cuts
As the name implies, hybrid cuts are related to the mixing of two or more mathematical methodologies in a process defined by each BT researcher. This means it is not easy to generalize and standardize the concept of hybridization, since each case depends on the purpose sought by the authors of the hybrid cuts to be included in a partition/decomposition Benders process.
Jain & Grossmann (2001) present results of work whose goal was to develop models and methods that use the complementary strengths of Mixed Integer Programming (MIP) and Constraint Programming (CP). A scheduling model is formulated as a hybrid MIP/CP model that involves some of the MIP constraints, a reduced set of the CP constraints, and equivalence relations between the MIP and the CP variables.
The approach relaxes the integrality constraints in the master problem and sends a primal solution to the subproblem; if there exists a feasible solution, then this solution is the optimal solution of the problem and the optimization process ends. Otherwise, the causes of infeasibility are inferred as cuts and added to the coordinator:
Σj∈IX1(k) yj − Σj∈IX0(k) yj ≤ Bk − 1 (79)
where IX1(k) represents the set of binary variables equal to 1 in iteration k, IX0(k) the complementary set of binary variables equal to 0, and Bk the cardinality of IX1(k), this is
Bk = | IX1(k) | (80)
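A minimal sketch of one standard way to write this "no-good" cut, using Bk = |IX1(k)|, so that exactly the infeasible master solution yk is excluded; the function names and tuple representation are illustrative:

```python
# The cut  sum_{j in IX1(k)} y_j - sum_{j in IX0(k)} y_j <= Bk - 1  is
# violated only by the infeasible solution y_k itself; every other 0/1
# vector remains feasible.

def no_good_cut(y_k):
    ix1 = [j for j, v in enumerate(y_k) if v == 1]
    ix0 = [j for j, v in enumerate(y_k) if v == 0]
    return ix1, ix0, len(ix1) - 1            # (IX1(k), IX0(k), Bk - 1)

def satisfies(cut, y):
    ix1, ix0, rhs = cut
    return sum(y[j] for j in ix1) - sum(y[j] for j in ix0) <= rhs

y_k = [1, 1, 0]
cut = no_good_cut(y_k)
assert not satisfies(cut, y_k)               # the incumbent is excluded
others = [[a, b, c] for a in (0, 1) for b in (0, 1)
          for c in (0, 1) if [a, b, c] != y_k]
assert all(satisfies(cut, y) for y in others)
```

Unlike the combinatorial cut (69), this form also penalizes the variables fixed to 0, which is why it eliminates only the single point yk.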
These general "no-good" cuts may be rather weak; whenever possible, stronger cuts that exploit the special structure of the problem should be used. Jain and Grossmann (2001) integrated cuts obtained from CP with BT for the case of binary subproblems in a job shop scheduling problem (table 16).
For each problem two data sets were considered. Four technologies were compared: i) MIP using CPLEX 6.5 (single processor); ii) a CP solver; iii) a hybrid model (integrating CPLEX and CP); and iv) hybrid Benders cuts. All experiments were run on a dual-processor SUN Ultra 60 workstation.
Table 17 shows a summary of the results; the conclusion is evident: since at least 2000, BT enhanced with hybrid cuts may solve NP-hard problems faster than the "best" commercial solver. Further numerical results for this problem with different data are reported in Harjunkoski et al. (2000).
Table 16. Job Shop Scheduling Problem. (Jain and Grossmann, 2001)

Problem   Orders   Machines
1         3        2
2         7        3
3         12       3
4         15       5
5         20       5
Table 17. Speed-up Hybrid Benders Cuts. (Jain and Grossmann, 2001)
Times in seconds; ratios are relative to the Benders Hybrid Cuts total time.

Problem   Set   MIP (CPLEX): Time   Ratio      CP Optimizer: Time   Ratio       Hybrid (CPLEX+CP): Time   Ratio    Benders Hybrid Cuts: MILP   CP     Total
1         1     0.01                0.50       0                    0           0.04                      2.00     0.02                        0      0.02
1         2     0.03                3.00       0.02                 2           0.05                      5.00     0.01                        0      0.01
2         1     0.47                0.90       0.04                 0.076923    0.10                      0.19     0.47                        0.05   0.52
2         2     0.49                24.50      0.14                 7           0.27                      13.50    0.01                        0.01   0.02
3         1     220.0               52.63      3.84                 0.91866     4.21                      1.01     4.01                        0.17   4.18
3         2     1.77                88.50      0.38                 19          1.12                      56.00    0.02                        0      0.02
4         1     180.41              80.18      553.54               246.0178    91.59                     40.71    2.01                        0.24   2.25
4         2     61.82               1545.50    9.28                 232         5.58                      139.50   0.02                        0.02   0.04
5         1     > 20000             1415.43    > 68853.49           4872.858    13736.06                  972.12   13.69                       0.44   14.13
5         2     106.28              259.22     2673.87              6521.634    170.95                    416.95   0.29                        0.12   0.41
Consolidated    > 20571.29          > 952.37   > 72094.6            > 3337.713  14009.97                  648.61   20.55                       1.05   21.6
Performance rank: MIP 3, CP 4, Hybrid 2, Benders Hybrid Cuts 1
6.2. Hybrid Strategy
Considering that there exist many alternatives to enhance BT, speed it up, and close the GAP, a hybrid strategy may include several enhancements. Yang et al. (2016) analyze many hybrid strategies resulting from the combination of the following enhancements: Standard Benders (BT), Hybrid Cuts (HC), Trust Region (TR), Strengthening Cuts (SC), Approximating Master Solve (AM), Warm Startup (WS) and Parallel Computation (PC). Different grouping schemes can lead to different computational efficiencies.
Yang et al. (2016) performed experiments using the Process Flexibility Design (PFD) problem, related to a supply chain where there are m plants that can produce n types of products; in a balanced supply chain the numbers of products and plants are equal, in a full-flexibility supply chain each plant is able to produce all products, and there exist special cases with more general settings. The problem is formulated as a two-stage stochastic program. Table 18 shows a summary of all the experiments; each experiment is characterized by the pair <n,m>.
Table 18. Hybrid Benders Strategies. (Yang et al., 2016)
Each code denotes its combination of enhancements (BT, HC, PC, TR, AM, WS, SC); the binary indicator columns of the original table are implied by the code names. For each problem size the columns are Time (secs), GAP (%) and Ratio; runs that exceeded the 3600-sec time limit report only the GAP-based figures ("—" marks values not reported).

SMALL SIZE BALANCED PROBLEMS. Products–Plants: <4,4>, <5,5>, <6,6>
Code           | <4,4> Time  GAP   Ratio | <5,5> Time  GAP   Ratio | <6,6> Time  GAP   Ratio
MIP            | 143         0     9.53  | 2801        0.02  8.84  | —           —     0.23
Single-cut     | 96          0     6.40  | 1324        0.03  4.18  | —           —     1.31
HC             | 39          0     2.60  | 475         0     1.50  | —           —     0.18
HC-PC          | 15          0     1     | 374         0     1.18  | —           —     0.17
HC-PC-TR       | 15          0     1     | 284         0     0.90  | 1402        0.07  0.40
HC-AM          | 41          0     2.73  | 416         0     1.31  | 3562        0.03  1.02
HC-PC-AM       | 15          0     1     | 317         0     1     | 3504        0.03  1
HC-PC-TR-AM    | 15          0     1     | 186         0     0.59  | 802         0.03  0.23
HC-WS          | 37          0     2.47  | 328         0     1.03  | 2897        0.03  0.83
HC-WS-AM       | 32          0     2.13  | 327         0     1.03  | 2518        0.03  0.72
HC-PC-SC       | 40          0     2.67  | 174         0     0.55  | 1661        0.03  0.47

MEDIUM AND LARGE SIZE UNBALANCED PROBLEMS. Products–Plants: <8,8>, <10,10>, <20,20> (GAP % and GAP-based Ratio)
Code           | <8,8> GAP   Ratio  | <10,10> GAP  Ratio  | <20,20> GAP  Ratio
MIP            | 33.4        123.63 | 93.6         346.78 | 98.77        379.88
Single-cut     | 3.77        13.96  | 2.88         10.67  | 28.35        109.04
HC-WS          | 2.05        7.59   | 1.74         6.44   | 6.5          25.00
HC-WS-AM       | 2.02        7.48   | 1.74         6.44   | 7.16         27.54
HC-PC-TR       | 1.03        3.81   | 0.45         1.67   | 15.31        58.88
HC-PC-TR-AM    | 0.27        1      | 0.27         1      | 0.26         1
HC-PC-SC       | 2.44        9.04   | 0.83         3.07   | 13.28        51.08

NUMERIC STUDY RESULTS (UNBALANCED SYSTEMS). Products–Plants: <6,4>, <8,5>, <15,12>
Code           | <6,4> Time  GAP   Ratio | <8,5> Time  GAP   Ratio | <15,12> GAP  Ratio
MIP            | 583         0     6.48  | —           —     0.07  | 95.88        138.96
Single-cut     | 2506        0.19  27.84 | —           —     7.83  | 15.03        21.78
HC             | 579         0     6.43  | —           —     5.05  | 13           18.84
HC-PC          | 430         0     4.78  | —           —     5.05  | 13           18.84
HC-PC-TR       | 348         0     3.87  | 2052        0.01  1.12  | 0.94         1.36
HC-AM          | 485         0     5.39  | —           —     3.87  | 6.13         8.88
HC-PC-AM       | 345         0     3.83  | —           —     3.87  | 6.13         8.88
HC-PC-TR-AM    | 90          0     1     | 1831        0     1     | 0.69         1
HC-WS          | 271         0     3.01  | —           —     0.22  | 2.62         3.80
HC-WS-AM       | 268         0     2.98  | —           —     0.17  | 2.57         3.72
HC-PC-SC       | 513         0     5.70  | —           —     4.32  | 9.55         13.84
The best performance corresponds to HC-PC-TR-AM, which is also the reference case (ratio = 1); the worst cases are MIP and Single-cut. The results are evident: i) the more enhancements included, the better the performance of the Benders technologies; and ii) the MIP solver (CPLEX) cannot compete with the enhanced Benders methodologies. The maximum time is 3600 secs; when it is exceeded, the ratio is calculated based on GAPs. The possibility of using hybrid strategies depends on the ability to select and mix the improvements that deliver the best performance for a group of problems or for a specific type of problem.
7. Benders Parallel Optimization
7.1. Parallel Optimization
In the prologue of the book "Parallel Optimization: Theory, Algorithms and Applications", written by Censor and Zenios (1997), Professor Dantzig wrote about "the fascinating new world of parallel optimization using parallel processors, computers capable of doing an enormous number of complex operations in a nanosecond"; additionally he said: "according to an old adage, the whole can sometimes be much more than the sum of its parts. I am thoroughly in agreement with the authors' belief in the added value of bringing together applications, mathematical algorithms and parallel computing techniques". This is exactly what the mathematical modeler finds true in parallel optimization.
Despite the time elapsed since the first applications of parallel optimization, in 1991, this methodology is only beginning to develop, since only recently has multi-processing become massive with the arrival of low-cost multi-core computers. Therefore, it is expected that in the coming years research on parallel optimization, and the speed of solving complex problems, will increase significantly.
Note that parallelization is not limited to BT; many of the concepts used are valid for application in other large-scale methodologies. The idea of parallelism is not new, since it is at the core of decomposition via BT and was practically born with the idea of Van Slyke and Wets in 1969; what is new is the power of parallel computing to which researchers now have access. For a long time, implementing parallel algorithms on single-processor computers was merely academic; real practice was only available to researchers with access (money) to this type of resource.
Below, table 19 shows some applications of parallelism using BT. The papers were selected from a course by Professor Linderoth in 2003 (Parallel and High-Performance Computing for Stochastic Programming, Course: Stochastic Programming) and correspond to twelve papers that may be the first papers in parallel stochastic optimization.
Table 19. Papers in Parallel Stochastic Optimization. (Linderoth in 2003)
Dantzig, G., J. Ho, and G. Infanger (1991, August). “Solving Stochastic Linear Programs on a Hypercube Multicomputer”.
Technical Report SOL 91-10, Department of Operations Research, Stanford University.
Ariyawansa, K. A. and D. D. Hudson (1991). “Performance of a Benchmark Parallel Implementation of the Van Slyke and Wets
Algorithm for Two-Stage Stochastic Programs on The Sequent/Balance”. Concurrency Practice and Experience. 3, 109–128.
Ruszczynski, A. (1993). “Parallel Decomposition of Multistage Stochastic Programming Problems”. Mathematical Programming
58, 201–228
Jessup, E., D. Yang, and S. Zenios (1994). “Parallel Factorization of Structured Matrices arising in Stochastic Programming”.
SIAM Journal on Optimization 4, 833–846.
Mulvey, J. M. and A. Ruszczynski (1995). “A New Scenario Decomposition Method for Large Scale Stochastic Optimization”.
Operations Research 43, 477–490.
Birge, J. R., C. J. Donohue, D. F. Holmes, and O. G. Svintsitski (1996). "A Parallel Implementation of the Nested Decomposition Algorithm for Multistage Stochastic Linear Programs". Mathematical Programming 75, 327–352.
Nielsen, S. S. and S. A. Zenios (1997). “Scalable Parallel Benders Decomposition for Stochastic Linear Programming”. Parallel
Computing 23, 1069–1089.
Gondzio, J. and R. Kouwenberg (1999, May). “High Performance Computing for Asset Liability Management”. Technical Report
MS-99-004, Department of Mathematics and Statistics, The University of Edinburgh.
Fragniere, E., J. Gondzio, and J.-P. Vial (2000). “Building and Solving Large-Scale Stochastic Programs on an Affordable
Distributed Computing System”. Annals of Operations Research 99, 167–187.
Linderoth, J. T. and S. J. Wright (2001, April). “Decomposition Algorithms for Stochastic Programming on a Computational
Grid”. Preprint ANL/MCS-P875-0401, Mathematics and Computer Science Division, Argonne National Laboratory,
Argonne, Ill.
Blomvall, J. and P. O. Lindberg (2002). "A Riccati-Based Primal Interior Point Solver for Multistage Stochastic Programming – Extensions". Optimization Methods and Software, pp. 383–407.
Linderoth, J. T., A. Shapiro, and S. J. Wright (2002, January). “The Empirical Behavior of Sampling Methods for Stochastic
Programming Optimization”. Technical Report 02-01, Computer Sciences Department, University of Wisconsin-Madison.
In the case of parallel optimization, the modeler must consider at least the following aspects:
1. Timing: two cases must be considered:
▪ Synchronous: points (marks) are settled in the processes to synchronize the results of a phase/stage; for example, an L-shaped problem can be solved using N processors, one for each scenario/slave problem, and the coordinator problem must wait until all subordinate problems have been solved in each iteration. This approach implies that the processors that end their optimization before the last processor have idle time while waiting to receive new information from the coordinator. An important advantage of the synchronous method is the possibility of ensuring that the results are repeatable, which in many cases is a required feature, mainly in degenerate cases or with non-optimal solutions.
▪ Asynchronous: the coordinator does not have to wait until all the slave problems are solved, and can generate new primal information when it considers that there is enough new information (cutting planes) to justify a new optimization. In this case the processors' idle time is minimized, which should minimize the total completion time of the optimization process, but this is not guaranteed. This approach involves designing a dynamic strategy to assign roles to the processors during the optimization process.
2. Processor Role: related to the specific problem that must be solved by a specific processor at a specific time:
▪ Static: the assignment is made at the beginning of the optimization process and remains static throughout the process. It applies to synchronous cases.
▪ Dynamic: the assignment is carried out considering the events occurring and the status of the processors; its implementation requires an additional task responsible for assigning roles to processors. Its design has no preset rules and depends on the art/knowledge of the modeler and knowledge of the problem's behavior. Creativity is crucial in this process, since the number of variations that can be deployed may be practically unlimited. For example, with many scenarios, the modeler can think about having more than one processor responsible for generating primal variables for the subproblems, which implies more than one coordinator.
In the chapter "The Future: Mathematical Programming 4.0" (Velásquez, 2019c) some ideas are presented about the importance of parallel optimization in the short term.
7.2. The Asynchronous Benders Decomposition Method
Below, briefly, is the latest work published on this topic (found by the author). Rahmaniani et al. (2018b) present the so-called Asynchronous Benders Decomposition Method (ABD) and compare the asynchronous and hybrid parallel algorithms against the latest version of CPLEX.
Rahmaniani et al. (2018b) describe the state-of-the-art in Benders parallelization as: “the existing parallel BT method
can be summarized as follows: The MP (master problem) is assigned to a processor, the “master", which also
coordinates other processors, the “slaves", which solve the SPs (subproblems). At each iteration, the solution obtained
from solving the MP is broadcast to the SPs. They then return the objective values and the cuts obtained from solving
the SPs to MP and the same procedure repeats. Such master-slave parallelization schemes are known as low-level
parallelism as they do not modify the BT algorithmic logic or the search space (Crainic and Toulouse 1998)”.
They present several strategies to specify the scheduling and pool-management decisions; pool management implies managing the pool of solutions (strategies denoted by S1, S2, S3) and the pool of cuts:
1. Solution and Cut Pool Management: considering the previously partially evaluated solutions and the new one at the current iteration, ABD needs to decide which solution to choose in order to evaluate its associated (unevaluated) SPs. At each iteration, the master process broadcasts its solution to all slave processors. Each slave processor stores this solution in a pool and follows one of the following strategies to pick the appropriate one:
▪ S1: chooses solutions based on the FIFO rule;
▪ S2: chooses solutions based on the LIFO rule;
▪ S3: chooses solutions in the pool randomly.
ABD was tested with two selection rules: i) each solution in the pool has an equal chance of being selected, and ii) each solution is assigned a weight of 1/(1+k), where k is the number of iterations since that specific solution was generated, so that more recent solutions have a higher chance of being selected. Moreover, ABD uses local branching, Rei et al. (2009), to identify the solutions which no longer need to be evaluated. Finally, ABD makes use of techniques to manage the cut pool and eliminate the dominated cuts.
2. ABD implements static work allocation because, by distributing the scenario SPs equally, every process is almost equally loaded. Once a solution is chosen, ABD must decide the order in which the associated SPs will be evaluated, because ABD may not evaluate all of them, and it is important to give higher priority to those that tighten the master problem (MP) formulation the most.
The following strategies are considered:
▪ SP1: randomly choose the SPs;
▪ SP2: assign a weight to each SP and then randomly select one based on the roulette-wheel rule. The weights are set equal to the normalized demands of each SP;
▪ SP3: if a solution is infeasible, ABD may not need to solve all of its SPs. This strategy first orders the SPs by their demand sum and then assigns each SP a criticality counter that increases by one each time the SP is infeasible. The SP with the highest criticality value is then selected.
3. Solving the MP. This dimension specifies how long the master processor waits before it re-optimizes the MP. ABD proposes the following strategies:
▪ MP1: the master processor waits for at least a given percentage of new cuts at each iteration;
▪ MP2: the master processor waits for a given percentage of the cuts associated with the current solution;
▪ MP3: the same as MP2, but with a mechanism to synchronize the processors according to the current state of the algorithm: if the cuts added to the MP fail to improve the lower bound and/or regenerate the same solution, the MP waits until all the cuts associated with the current solution are delivered.
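The three strategy dimensions above can be sketched in a few lines. The classes, names, and the fraction thresholds in MP1/MP2 below are our own illustrative assumptions (the paper's percentage parameters were not reproduced here):

```python
import random
from collections import deque

class SolutionPool:
    """Rules S1-S3 plus the age-weighted 1/(1+k) selection variant."""
    def __init__(self, rng=None):
        self.pool = deque()               # entries: (solution, birth_iteration)
        self.rng = rng or random.Random()

    def add(self, solution, iteration):
        self.pool.append((solution, iteration))

    def pick(self, rule, current_iteration=0):
        if rule == "S1":                  # FIFO: evaluate the oldest solution
            return self.pool.popleft()[0]
        if rule == "S2":                  # LIFO: evaluate the newest solution
            return self.pool.pop()[0]
        if rule == "S3":                  # uniform random choice
            i = self.rng.randrange(len(self.pool))
        else:                             # "weighted": 1/(1+k), k = age in iterations
            w = [1.0 / (1 + current_iteration - born) for _, born in self.pool]
            i = self.rng.choices(range(len(self.pool)), weights=w)[0]
        solution = self.pool[i][0]
        del self.pool[i]
        return solution

class SPScheduler:
    """SP2 (roulette wheel on normalized demands) and SP3 (criticality counters)."""
    def __init__(self, demands, rng=None):
        self.demands = dict(demands)
        self.rng = rng or random.Random()
        # SP3 starts from the demand-sum ordering, all counters at zero
        self.criticality = {sp: 0 for sp in
                            sorted(demands, key=demands.get, reverse=True)}

    def pick_sp2(self):
        total = sum(self.demands.values())
        names = list(self.demands)
        weights = [self.demands[n] / total for n in names]
        return self.rng.choices(names, weights=weights)[0]

    def report_infeasible(self, sp):      # the SP cut off the candidate solution
        self.criticality[sp] += 1

    def pick_sp3(self):                   # highest criticality; demand order breaks ties
        return max(self.criticality, key=self.criticality.get)

def master_ready(rule, new_cuts, expected_new, cuts_of_current, sps_of_current,
                 fraction=0.5):
    """MP1/MP2: should the master re-optimize, or keep waiting for cuts?"""
    if rule == "MP1":                     # enough new cuts overall this iteration
        return new_cuts >= fraction * expected_new
    if rule == "MP2":                     # enough cuts for the current solution
        return cuts_of_current >= fraction * sps_of_current
    raise ValueError(rule)
```

For example, with demands {sp_a: 5, sp_b: 3}, `pick_sp3()` initially follows the demand ordering and switches to sp_b once that SP has repeatedly declared candidate solutions infeasible.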
To test the quality of the results, ABD was used to solve the Multi-Commodity Capacitated Fixed-charge Network Design Problem with Stochastic Demand (MCFNDSD), which implies a MIP coordinator. For the numerical tests, the R instances widely used in the literature were solved; these instances have up to 64 scenarios.
Rahmaniani et al. present an analysis of the implementation of several improvements to BT in parallel environments; those interesting results are not presented in this document; here we only show some results related to parallelism. The results indicate that ABD reaches higher speed-up rates than the conventional parallel methods. ABD is also several orders of magnitude faster than the state-of-the-art solvers (CPLEX). The ABD algorithm (and its variations) runs until reaching the same optimality gap obtained by CPLEX after 10 hours. Note that all ABD algorithms run on 15 processors. The average speed-up rates are reported in the next figure.
[Figure: two bar charts. The left panel, “Speed-up Rate vs CPLEX”, reports the speed-up of the Hybrid and Asynchronous algorithms over CPLEX for instances r04–r11 (roughly between 20 and 520). The right panel, “Speed-up Rate vs Processors”, reports the speed-up of the Synchronous, Hybrid and Asynchronous algorithms on 2, 3, 5, 10, 15 and 20 processors (roughly between 1.0 and 4.4). Source: Rahmaniani, R., Crainic, T., Gendreau, M. and Rei, W., The Asynchronous Benders Decomposition Method, CIRRELT 2018-07, January 2018.]
Figure 19. Speed-up Parallel Benders Decomposition
The procedure described by Rahmaniani et al. is an example of an empirical strategy for addressing the parallelization of large-scale methodologies. The best guide is the experience gained by solving problems in parallel and by studying what other researchers contribute and share with the scientific community.
8. Conclusions
Nothing is required, nothing is enough, everything is useful
A synthesis of the state of the art of applications using Benders Theory is presented below:
1. Since its formulation in 1962, Benders Theory has proven to be an effective methodology for solving complex problems that cannot be solved using only the “best” basic optimization algorithms.
2. Algorithms based on Benders' Theory can solve NP-hard problems in reasonable time; it has proven to be an effective methodology for solving complex problems that cannot be solved using the “best” mathematical solvers.
3. Benders Theory is a mature methodology that is still in an accelerated growth phase; there are many research opportunities in Benders Theory.
4. There is a gap between research in mathematical programming and the application of large-scale methodologies in real-world solutions.
5. There is a gap between education in mathematical programming and the application of large-scale methodologies in real-world solutions. For many young professionals the reference in mathematical programming is the basic solvers, but the reality is that the reference must be, at least, BT.
6. Benders Theory, its variations and enhancements significantly speed up mathematical programming algorithms.
7. Benders Theory is fundamental for exploiting the power of current computer technologies based on multiple cores and large amounts of RAM.
8. It is necessary to socialize (make easy) the use of BT for standard professionals, like the use of the basic solvers.
References
1. Ahmed, S. (2013). A Scenario Decomposition Algorithm for 0–1 Stochastic Programs. Operations Research
Letters 41(6):565-569.
2. Ahmed, S., Tawarmalani, M., and Sahinidis, N. (2004). A Finite Branch-and-Bound Algorithm for Two-Stage
Stochastic Integer Programs. Mathematical Programming, 100(2):355-377.
3. Benders, J. F. (1962). Partitioning Procedures for Solving Mixed Variables Programming Problems. Numerische Mathematik 4, 238–252.
4. Birge, J. R. and Louveaux, F. V. (1988). A Multicut Algorithm for Two-Stage Stochastic Linear Programs.
European Journal of Operational Research, 34(3): 384-392, 1988.
5. Cai, X., McKinney, D., Lasdon, L. and Watkins, D. (2001). Solving Large Nonconvex Water Resources Management Models using Generalized Benders Decomposition. Operations Research, Vol. 49, No. 2, March–April 2001, pp. 235–245.
6. Caroe, C. and Schultz, R. (1998). Dual Decomposition in Stochastic Integer Programming. Operations Research
Letters, 24(1):37-46.
7. Caroe, C. and Tind, J. (1997). A Cutting-Plane Approach to Mixed 0-1 Stochastic Integer Programs. European
Journal of Operational Research, 101(2):306-316.
8. Caroe, C. and Tind, J. (1998). L-Shaped Decomposition of Two-Stage Stochastic Programs with Integer
Recourse. Mathematical Programming, 83(1):451-464.
9. Censor, Y. and Zenios, S. (1997). Parallel Optimization: Theory, Algorithms and Applications. Publisher:
Oxford University Press. Series on Numerical Mathematics and Scientific Computation (1997).
10. Chen Z-L. and Powell W. (1998). A Convergent Cutting-Plane and Partial-Sampling Algorithm for Multistage
Stochastic Linear Programs with Recourse. Department of Civil Engineering and Operations Research
Princeton University Princeton, NJ 08544 Technical Report SOR-97-11.
11. Cobb, C. W., Douglas, P. H. (1928). "A Theory of Production". American Economic Review. 18
12. Codato, G. and Fischetti, M. (2006). Combinatorial Benders' Cuts for Mixed-Integer Linear Programming. Operations Research, Vol. 54, No. 4, July–August 2006.
13. Costa, A., Jean-Francois Cordeau, J. F, Gendron, B, and Laporte, G. (2012). Accelerating Benders
Decomposition with Heuristic Master Problem Solutions. Pesquisa Operacional (2012) 32(1): 3-19. © 2012
Brazilian Operations Research Society
14. Crainic, T. and Toulouse, M. (1998). Parallel Metaheuristics. Book: Fleet Management and Logistics. Editors:
Crainic T. and Laporte, G. Springer, Boston, MA, 205-251.
15. Floudas, C. (1995). Nonlinear and Mixed-Integer Optimization. Oxford University Press, New York.
16. Floudas, C., Aggarwal, A. and Ciric, R. (1989). Global Optimum Search for Nonconvex NLP and MINLP
Problems. Computers & Chemical Engineering. 13 1117–1132.
17. Gade, D., Küçükyavuz, S. and Sen, S. (2014). Decomposition Algorithms with Parametric Gomory Cuts for
Two-Stage Stochastic Integer Programs. Mathematical Programming, April 2014, Volume 144, Issue 1–2, pp
39–64
18. Gade, D., Kucukyavuz, S., and Sen, S. (2012). Decomposition Algorithms with Parametric Gomory Cuts for
Two-Stage Stochastic Integer Programs. Mathematical Programming, pages 1-26.
19. Geoffrion, A. M. (1972). Generalized Benders Decomposition. Journal of Optimization Theory and
Applications,10 237–259.
20. Gomory, R. (1958). Outline of an Algorithm for Integer Solutions to Linear Programs. Bulletin of the American Mathematical Society 64(5), 275–278 (1958).
21. Gomory, R. (1960). An Algorithm for the Mixed Integer Problem. Tech. Rep. RM-2597, RAND Corporation
(1960)
22. Greenberg, H. and Pierskalla, W. P. (1970). Surrogate Mathematical Programming. Operations Research 18 (1970), 924–939.
23. Harjunkoski, I., Jain, V. and Grossmann, I. E. (2000). Hybrid Mixed-Integer/Constraint Logic Programming Strategies for Solving Scheduling and Combinatorial Optimization Problems. Computers & Chemical Engineering, July 2000, 24(2):337–343.
24. Holmberg K. (1995). Primal and Dual Decomposition as Organizational Design: Price and/or Resource
Directive Decomposition. In: Burton R.M., Obel B. (eds) Design Models for Hierarchical Organizations.
Springer, Boston, MA
25. Hooker, J. and Ottosson, G. (2003) Logic-Based Benders Decomposition. Mathematical Programming, April
2003, Volume 96, Issue 1, pp 33–60
26. Hooker, J. N. (2000) Logic-Based Methods for Optimization: Combining Optimization and Constraint
Satisfaction, Wiley (2000).
27. Hooker, J. N. and Hong Yan. (1995) Logic Circuit Verification by Benders Decomposition, in V. Saraswat and
P. Van Hentenryck, eds., Principles and Practice of Constraint Programming: The Newport Papers, MIT Press
(Cambridge, MA, 1995) 267-288.
28. Hooker, J. N. (2019). Logic-based Benders Decomposition for Large-scale Optimization. In the book Large-Scale Optimization in Supply Chain & Smart Manufacturing: Theory & Application. Springer (2019).
29. Jain, V. and Grossmann, I. E. (2001). Algorithms for Hybrid MILP/CP Models for a Class of Optimization
Problems. INFORMS Journal on Computing Vol. 13, No. 4, Fall 2001 pp. 258–276
30. Karush, W. (1939). "Minima of Functions of Several Variables with Inequalities as Side Constraints". M.Sc.
Dissertation. Dept. of Mathematics, University of Chicago, Chicago, Illinois.
31. Keller, B. and Bayraksan, G. (2009). Scheduling Jobs Sharing Multiple Resources under Uncertainty: A Stochastic Programming Approach. IIE Transactions, 42:1, 16–30, DOI: 10.1080/07408170902942683.
32. Kong, N., Schaefer, A., and Hunsaker, B. (2006). Two-Stage Integer Programs with Stochastic Right-Hand
Sides: A Superadditive Dual Approach. Mathematical Programming, 108(2):275-296.
33. Kuhn, H. W.; Tucker, A. W. (1951). "Nonlinear Programming". Proceedings of 2nd Berkeley Symposium.
Berkeley: University of California Press. pp. 481–492. MR 0047303.
34. Laporte, G. and Louveaux, F. (1993). The Integer L-Shaped Method for Stochastic Integer Programs with Complete Recourse. Operations Research Letters, 13(3):133–142.
35. Linderoth, J. T. (2003). Parallel and High-Performance Computing for Stochastic Programming. Course:
Stochastic Programming, Lecture 13.
36. Linderoth, J. T. and S. J. Wright (2001). Decomposition Algorithms for Stochastic Programming on a
Computational Grid. Preprint ANL/MCS-P875-0401, Mathematics and Computer Science Division, Argonne
National Laboratory, Argonne, Ill.
37. Magnanti, T. and Wong, R. (1981). Accelerating Benders Decomposition: Algorithmic Enhancement and
Model Selection Criteria. Operations Research, 1981, Vol. 29, No. 3
38. McDaniel, D and Devine M. (1977). A Modified Benders’ Partitioning Algorithm for Mixed Integer
Programming. Management Science, 1977. 24: 312–319.
39. Ntaimo, L. (2010). Disjunctive Decomposition for Two-Stage Stochastic Mixed-Binary Programs with Random
Recourse. Operations Research, 58(1):229-243.
40. Oliveira, F., Grossmann, I. E. and Hamacher, S. (2014). Accelerating Benders Stochastic Decomposition for
the Optimization under Uncertainty of the Petroleum Product Supply Chain. Computers & Operations Research
49 (2014) 47–58
41. Pan, F. and Morton, D. P. (2008). Minimizing a Stochastic Maximum-Reliability Path. Networks 52(3):111–
119.
42. Papadakos, N. (2008). Practical Enhancements to the Magnanti–Wong Method. Operations Research Letters
Volume 36, Issue 4, July 2008, Pages 444-449
43. Philpott, Andrew B., Ryan, David M. and Zakeri, G. (1996). Inexact Cuts in Stochastic Benders’
Decomposition. 32nd ORSNZ Conference Proceedings, 29-30, August 1996.
44. Rahmaniani, R. Crainic, T. G., Gendreau, M. and Rei, W. (2017). The Benders Decomposition: A Literature
Review. European Journal of Operational Research, Volume 259, Issue 3, 16 June 2017, Pages 801-817
45. Rahmaniani, R. Crainic, T. G., Gendreau, M. and Rei, W. (2018b). The Asynchronous Benders Decomposition
Method. CIRRELT-2018-07 (January 2018).
46. Rahmaniani, R., Shabbir Ahmed, S., Crainic, T., Gendreau, M, and Rei, W. (2018a). The Benders Dual
Decomposition Method. CIRRELT-2018-03 (January 2018).
47. Ralphs, T. and Hassanzadeh, A. (2014). A Generalization of Benders' Algorithm for Two-Stage Stochastic
Optimization Problems with Mixed Integer Recourse. COR@L Technical Report 14T-005
48. Rush, A. M., Collins, M. (2012). A Tutorial on Dual Decomposition and Lagrangian Relaxation for Inference
in Natural Language Processing. Journal of Artificial Intelligence Research, Volume 45, pages 305-362, 2012
49. Ruszczyński, A. (1986). A Regularized Decomposition Method for Minimizing a Sum of Polyhedral Functions.
Mathematical Programming (1986) 35: 309.
50. Ruszczyński, A. (1997). Decomposition Methods in Stochastic Programming. Mathematical Programming,
October 1997, Volume 79, Issue 1–3, pp 333–353
51. Santoso, T., Ahmed S., Goetschalckx, M. and Shapiro, A. (2005). A Stochastic Programming Approach for
Supply Chain Network Design under Uncertainty. European Journal of Operational Research 167 (2005) 96–
115
52. Schultz, R., Stougie, L., and Van Der Vlerk, M. (1998). Solving Stochastic Programs with Integer Recourse by
Enumeration: A Framework using Grobner Basis. Mathematical Programming, 83(1):229-252.
53. Sen, S. and Higle, J. (2005). The C3 Theorem and a D2 Algorithm for Large Scale Stochastic Mixed Integer
Programming: Set Convexification. Mathematical Programming, 104(1):1-20.
54. Sen, S. and Sherali, H. (2006). Decomposition with Branch-and-Cut Approaches for Two-Stage Stochastic
Mixed-Integer Programming. Mathematical Programming, 106(2):203-223.
55. Sherali, H. and Fraticelli, B. (2002). A Modification of Benders' Decomposition Algorithm for Discrete Subproblems: An Approach for Stochastic Programs with Integer Recourse. Journal of Global Optimization, 22(1):319–342.
56. Sherali, H. and Lunday, B. (2011). On Generating Maximal Nondominated Benders Cuts. Annals of Operations
Research (2011), pp. 1-16
57. Sherali, H. and Zhu, X. (2006). On Solving Discrete Two-Stage Stochastic Programs having Mixed Integer
First-and Second-Stage Variables. Mathematical Programming, 108(2):597-616.
58. Sherali, H. D. and Smith, J. C. (2009). Two-Stage Stochastic Hierarchical Multiple Risk Problems: Models and
Algorithms. Mathematical programming, 120(2):403-427.
59. Trapp, A. C., Prokopyev, O. A., and Schaefer, A. J. (2013). On a Level-Set Characterization of the Value
Function of an Integer Program and its Application to Stochastic Programming. Operations Research,
61(2):498-511.
60. Velásquez, J. M. (1986). Primal-Dual Subrogated Algorithm. White paper
http://www.doanalytics.net/Documents/Primal-Dual-Subrogated-Algorithm.pdf
61. Velásquez, J. M. (1995). OEDM: Optimización Estocástica Dinámica Multinivel. Teoría General. Revista
Energética No. 13 (http://www.doanalytics.net/Documents/OEDM.pdf).
62. Velásquez, J. M. (2018) Benders Decomposition Using Unified Cuts.
http://www.doanalytics.net/Documents/Benders-Decomposition-Using-Unified-Cuts.pdf
63. Velásquez, J. (2019a). Stochastic Programming: Fundamentals. In the book Large Scale Optimization in Supply
Chain & Smart Manufacturing: Theory & Application. Springer 2019.
64. Velásquez, J. (2019b). Stochastic & Dynamic Benders Theory. In the book Large Scale Optimization in Supply
Chain & Smart Manufacturing: Theory & Application. Springer 2019.
65. Velásquez, J. (2019c). The Future: Mathematical Programming 4.0. In the book Large Scale Optimization in
Supply Chain & Smart Manufacturing: Theory & Application. Springer 2019.
66. Yang, H., Gupta, J., Yu, L. and Zheng. (2016). An Improved L-Shaped Method for Solving Process Flexibility
Design Problems. Mathematical Problems in Engineering. Volume 2016, Article ID 4329613
67. Yuan, Y. and Sen, S. (2009). Enhanced cut Generation Methods for Decomposition-Based Branch and Cut for
Two-Stage Stochastic Mixed-Integer Programs. INFORMS Journal on Computing, 21(3):480-487.
68. Zakeri, G., Philpott, A. and Ryan, D. (1999). Inexact Cuts in Benders Decomposition. SIAM Journal on
Optimization Volume 10 Issue 3, 1999 , Pages 643-657
69. Zang, Y., Wang, J., Ding, T. and Wang, X. (2018). Conditional Value-At-Risk Based Stochastic Unit Commitment considering the Uncertainty of Wind Power Generation. IET Generation, Transmission & Distribution, Volume 12, Issue 2, 2018.
70. Zaourar, S. and Malick, J. (2015) Quadratic Stabilization of Benders Decomposition. HAL Id: hal-01181273
https://hal.archives-ouvertes.fr/hal-01181273
List of Figures
Figure 1. The Power of Benders Theory
Figure 2. Dual-Angular Matrix
Figure 3. Benders Decomposition Cuts
Figure 4. Speed-up of Multiple/Decoupled Cuts (Zang et al., 2018)
Figure 5. Multi Dual Angular Matrix
Figure 6. Triangular Matrix
Figure 7. Multilevel Nested (Dynamic) Benders
Figure 8. Cobb-Douglas Production Function
Figure 9. Economic Interpretation of Benders Theory
Figure 10. Generalized Benders Decomposition
Figure 11. Gomory Cutting Planes – Integer Convex Hull
Figure 12. Benders-Gomory Cuts
Figure 13. Convergence of MIP/MINLP problems - Vehicle Routing Problem
Figure 14. Enhancements MIP/MINLP Benders Coordinators - Relaxing Coordinator
Figure 15. Relation CPLEX versus Inexact Solutions
Figure 16. Speed-up Combinatorial Benders Cuts
Figure 17. Pareto-Optimal Benders Cuts
Figure 18. Speed-up Benders Cuts: POP, MND, DUMN, CPLEX
Figure 19. Speed-up Parallel Benders Decomposition
List of Tables
Table 1. Why Benders Large Scale Methodologies?
Table 2. Speed-up Generalized Benders Decomposition. (Cai et al., 2001)
Table 3. Ralphs and Hassanzadeh – Report 2014
Table 4. Benders-Gomory Cuts for Two-Stage Stochastic Integer Subproblems
Table 5. Benders Dual Decomposition (Rahmaniani et al., 2018a)
Table 6. Benders Dual Decomposition versus CPLEX (Rahmaniani et al., 2018a)
Table 7. Speed-up Inexact Solutions (Costa et al., 2012)
Table 8. Speed-up Inexact Cuts
Table 9. NP-hard Problems Solved only by CBC. (Codato and Fischetti, 2006)
Table 10. NP-hard Problems Solved by CPLEX & CBC. (Codato and Fischetti, 2006)
Table 11. Hub Location Problems (Zaourar and Malick, 2015)
Table 12. Network Design Problems. (Zaourar and Malick, 2015)
Table 13. Stochastic Supply Chain Network Design Problem. (Santoso et al., 2005)
Table 14. Speed Up Hamming Trust Region
Table 15. Speed-up Pareto Optimal Cuts (Yang et al., 2016)
Table 16. Job Shop Scheduling Problem. (Jain and Grossmann, 2001)
Table 17. Speed-up Hybrid Benders Cuts. (Jain and Grossmann, 2001)
Table 18. Hybrid Benders Strategies. (Yang et al., 2016)
Table 19. Papers in Parallel Stochastic Optimization. (Linderoth, 2003)